
Recent Posts

Pages: [1] 2 3 ... 10
WebCopy / Re: Remapping of links
« Last post by Richard Moss on December 11, 2018, 05:13:44 PM »

Welcome to the forums and thank you for the question. Unfortunately, if WebCopy crashed then it won't have any reference to anything you've already downloaded, and so won't be able to reuse those files.

What you could do is try splitting the job into two. I tested this process on the demonstration website and it worked fine.

Firstly, make sure the "Folder | Empty website folder before copy" and "Links | Clear link information before scan" options are not set.

Secondly, add a rule for the expression .* with the Exclude and Crawl Content options set. This will instruct WebCopy to scan all HTML files, but not to download any resources or keep any of the HTML.

Next, add a rule to download the media. What expression you use depends on the contents of your site, but I used (\.png|\.jpg) to download images. For this rule you need to set the Include option. This rule overrides the first rule and states that any matching URLs should be fully downloaded.
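As a quick sketch of what those two expressions match, here they are tested with Python's re module. (This only exercises the regular expressions themselves; WebCopy's exact matching behaviour is assumed to be a substring-style regex search, and the URLs are made up for illustration.)

```python
import re

# The crawl-only rule matches every URL.
crawl_only = re.compile(r".*")

# The include rule matches URLs containing an image extension.
media = re.compile(r"(\.png|\.jpg)")

urls = [
    "https://example.com/index.html",        # crawled but not kept
    "https://example.com/images/photo.jpg",  # downloaded
    "https://example.com/logo.png",          # downloaded
]

for url in urls:
    downloaded = bool(media.search(url))
    print(url, "-> download" if downloaded else "-> crawl only")
```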

Now run the job. (Perhaps this should be tested on a subset of your site first to avoid having to download everything again only for it to fail).

Once complete, save the project so we can at least resume from this point.

Now, change the second rule from Include to Exclude and completely disable (or delete) the first rule.

If the job is then run again, WebCopy will start downloading all the HTML, CSS, and other files that it ignored before. It knows about the media files that were previously downloaded, and so when it remaps the files, it will use the local filenames of the existing media.

With regard to your "Always download latest version" question, again this relies on WebCopy already having the metadata for a given URL, and on the website returning last-modified or ETag headers. So again, if WebCopy crashed and the project file wasn't saved, as far as it is concerned it doesn't know anything about any local files - it will ignore them and download fresh copies. I shall log something to reconsider this behaviour.

I hope this helps and I do apologise for the poor behaviour on the part of WebCopy. I do have plans to fix this issue (it is logged as #326), but I was leaving this until after #61 (multi-threaded crawling) was implemented.

Richard Moss
WebCopy / Re: Is there a way to copy a select range of pages in a larger group?
« Last post by Richard Moss on December 09, 2018, 08:00:27 PM »

Welcome to the forums and thanks for the question. Unfortunately, there isn't a way to resume a failed scan. Although you can only copy changed files if supported by the web server and WebCopy is configured correctly, this assumes a successful initial crawl.

Work is ongoing in a separate branch to implement pause/resume support (#165), as part of which I planned to autosave state as the crawl progressed to allow recovery from crash scenarios. This was originally planned for 1.7 but I'm quite behind on getting that build shipped so it may have to be deferred.

If the software crashed due to memory errors do you have a reference number to help pin down the issue?

Richard Moss
WebCopy / Re: Copying only ONE base URL
« Last post by Richard Moss on December 09, 2018, 07:52:00 PM »

      Welcome to the forums and thanks for the question. Sorry it has taken so long to get around to answering it.

      You can use the Rules feature to do this:

      • From the main window, click Rules
      • Click Add
      • Enter /listings into the Expression field
      • Uncheck Exclude and check Include
      • Click Advanced and then check Stop processing more rules
      • Follow the above steps to add another rule with the expression /pittsburgh
      • Click Add
      • Enter .* into the Expression field
      • Click OK

      The first and second rules make sure that any URL containing "listings" or "pittsburgh" is downloaded, and also stop checking further rules once a match is found. The final rule specifies that everything else will be excluded.
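A rough Python sketch of that first-match evaluation may make the interaction clearer. (This is purely illustrative: the should_download helper and the default-allow behaviour are assumptions for the sketch, not WebCopy's published internals.)

```python
import re

# Each rule: (expression, include?, stop processing more rules?)
rules = [
    (re.compile(r"/listings"),   True,  True),
    (re.compile(r"/pittsburgh"), True,  True),
    (re.compile(r".*"),          False, False),
]

def should_download(url):
    """Return True if the URL should be downloaded under the rules above."""
    decision = True  # assumed default when no rule matches
    for expression, include, stop in rules:
        if expression.search(url):
            decision = include
            if stop:
                break  # "Stop processing more rules" was set
    return decision

print(should_download("https://example.com/listings/123"))    # included by rule 1
print(should_download("https://example.com/pittsburgh/abc"))  # included by rule 2
print(should_download("https://example.com/about"))           # excluded by rule 3
```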

      Hope this helps.

      Richard Moss
WebCopy / Remapping of links
« Last post by epiktek on December 09, 2018, 02:56:14 PM »

I'm archiving a site that's 50GB+ with external media (hosted on AWS). Since this job was so large, there were errors partway through which made the process stop. What I didn't realize was that links would not get remapped until the very END of the job. Cyotek did a great job of downloading everything but since the operation was interrupted partway through, a lot of the links were not remapped.

What are my options now?
1) Can I exclude all external media and start the process to download only the HTML/ASP files and have the process complete to get the links remapped? My concern is that links to the external media files will not get remapped due to the fact that I excluded WebCopy from downloading external media.
2) Can I force WebCopy to only update the HTML/ASP files? If I set "Always download latest version", will it re-download all the media files?
3) Delete all the HTML/ASP files (keeping the large media files), UNCHECK "Always download latest version" (so it won't force a re-download of the media files), and let WebCopy run through to completion so it remaps the links?

Any suggestions?

WebCopy / Is there a way to copy a select range of pages in a larger group?
« Last post by dumbnoob on December 06, 2018, 03:33:25 AM »
Thank you for this amazing program! Your work is very much appreciated.

As a complete idiot, I was able to run the program and get almost everything I was hoping for. Unfortunately, after a few days, I ran out of memory just shy of completion. There are about 10,000 pages left and they all have the same name format:  page-65000.html, page-65001.html, page-65002.html...

Note: I've already copied page-20000.html to page-64999.html

Is there a way I can direct WebCopy to just copy the pages that I need and not have to start over?

I apologize if this has already been answered elsewhere in the forum. I have looked and found a lot of helpful info, but nothing that addresses this exact possibility.

Thanks again for WebCopy and for any help you can provide!

WebCopy / Copying only ONE base URL
« Last post by magnao on December 04, 2018, 10:33:44 PM »
I'm trying to add a rule to make WebCopy only scan links on a page that start with a specific URL: for instance, if I'm having it search the URL "", and I want it to only scan and download pages that have the URL "", how do I do this? This would be assuming that the "listings" URL provides links to every single "pittsburgh" link on the website.

Any help or advice would be greatly appreciated!
WebCopy / Re: Tumblr
« Last post by Richard Moss on November 20, 2018, 07:16:06 PM »

Welcome to the forums and thanks for the question. The current nightly builds of WebCopy 1.7 "should" be able to handle the login via the new "Use web browser to login" option, which will display a browser window for you to authenticate with - it simply isn't possible for WebCopy to handle reCAPTCHA or multi-factor authentication otherwise.

Aside from that, I'm not aware of any explicit issues with copying Tumblr sites, assuming the site isn't JavaScript-based.

Richard Moss

Welcome to the forums and thanks for the question. Create a rule which contains the value about; any URI containing about will then be ignored.

Richard Moss
WebCopy / Re: How to create a new rule to exclude specific directory?
« Last post by Richard Moss on November 20, 2018, 06:33:19 AM »

To exclude everything in a given directory, create a rule which contains the directory name surrounded by forward slashes. For example, if the site had a directory named forums you wanted to exclude, you would create an exclude rule with the expression /forums/.

For bonus points you could use the rule /forums(/|$|\?). This slightly more complicated expression will match the folder itself, a URI that ends with the bare folder name, or the folder name followed by a query string - without matching any document names that merely start with "forums".
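To see what that pattern does and does not match, here it is tested with Python's re module. (Again, the URLs are invented for illustration, and WebCopy's matching is assumed to behave like a regex search over the URI.)

```python
import re

pattern = re.compile(r"/forums(/|$|\?)")

matching = [
    "https://example.com/forums/",        # folder with trailing slash
    "https://example.com/forums",         # bare folder name at end of URI
    "https://example.com/forums?page=2",  # folder followed by a query string
]
non_matching = [
    "https://example.com/forumsearch",    # "forums" is only a prefix here
]

for url in matching + non_matching:
    print(url, "->", bool(pattern.search(url)))
```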

Hope this helps.

Richard Moss
WebCopy / Tumblr
« Last post by designationme on November 17, 2018, 10:49:34 PM »

First post. 

There is a Tumblr site for motivational images.   I am looking to grab some images and want to know if there is a way to get through to Tumblr.

I know there is a lot of discussion about how to log in to sites, but Tumblr has a login, a separate page for the password, and I think a captcha...

Any information is appreciated. I would like to know if it's not possible too, just so I can stop banging my head against the wall.   

Thank you