Recent posts

#1
WebCopy / Asking for a real case explana...
Last post by jlomo - Today at 06:27:17 PM
Hello,
What is the step-by-step procedure for downloading only the pages linked from https://www.lacoccinelle.net/211213.html on the site https://www.lacoccinelle.net?
Thank you so much.
#2
WebCopy / Failed to load image resource ...
Last post by Steve_Gibson - July 03, 2024, 07:35:58 PM
I have long been a huge fan and promoter of WebCopy. I have a long-running (19 years) popular podcast ("Security Now!") where I've several times promoted and strongly recommended WebCopy. It has never given me any trouble... until today.

Today, upon startup, the opening splash screen came up and stalled. Finally, after quite a pause, the following series of five "Failed to load image resource" errors were presented in succession:


Since I hadn't run WebCopy for a while I figured I'd download the latest and greatest. This jumped me from a 2021 release to 2023. But... no change.

The only thing funky about my world may be that this machine I'm using is still Win7... but WebCopy states that it's still compatible with Win7, and the last time I used it I had no problems.

Note that I searched high and low on this site for any way to create a "ticket" or just to let Richard know about this little glitch. Also... after dismissing the five error messages, WebCopy ran and performed flawlessly.

So... just a heads-up, Richard.  Thanks for the terrific world-class solution.  I've sent money a few times, including again today, since I would LOVE to have you keep this alive, though it's already excellent!
#3
WebCopy / archive.org and "above root" e...
Last post by ben - June 26, 2024, 01:19:14 PM
I am trying to copy a website from archive.org; the URL format is something like
https://web.archive.org/web/20180914202352/http://www.motorboards.org/index.php/Main_Page

So I say "Download Everything" and in rules
I download everything that matches "motorboards.org"
and I exclude everything that doesn't match "motorboards.org"

However the vast majority of pages that I want are ignored because of "above root" error.
Why does "above root" even appear, if I say to download everything?
Thank you
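
For reference, a Wayback Machine address like the one above carries a capture timestamp and the original URL inside its path. Here is a minimal Python sketch of splitting it apart. The helper name `split_wayback_url` is hypothetical, and the sketch makes no claim about how WebCopy itself decides that a URL is "above root".

```python
import re

# Capture URLs look like: https://web.archive.org/web/<timestamp>/<original URL>
# The timestamp may be followed by a short modifier suffix such as "id_" or "im_".
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d+)[a-z_]*/(.+)$")

def split_wayback_url(url):
    """Return (timestamp, original_url), or None if url is not a capture URL."""
    match = WAYBACK_RE.match(url)
    if match is None:
        return None
    return match.group(1), match.group(2)

capture = ("https://web.archive.org/web/20180914202352/"
           "http://www.motorboards.org/index.php/Main_Page")
timestamp, original = split_wayback_url(capture)
print(timestamp)
print(original)
```

Because each capture can have its own timestamp, two pages from the same original site do not necessarily share a path prefix on web.archive.org, which may matter when a crawler compares paths against the start URL.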
#4
WebCopy / What is meant by "Site Only"? ...
Last post by therogoc - June 11, 2024, 10:45:05 PM
This webpage has PDF and LaTeX files linked:

https://borisbabic.com/teaching/inseadqm/home/index.html

I would like to download the page with the linked files for my own use.

1. Is the "Site Only" crawl mode with "Download all resources" really the correct approach? The documentation is confusing in regard to domain and URL.

2. The linked LaTeX files are incorrectly saved with an HTML file. How can I download them correctly?
#5
WebCopy / Using 'input-file'
Last post by Jake Morrison - June 11, 2024, 09:19:27 PM
Hi,
Both the CLI & GUI command lines allow '/input-file <file>' to be used.
I'm wondering if there is a description available of the file format or any other help with using 'input-file'.

Also, I do not find a way to use an input file within the program. Have I missed it somehow?

thanks,
Jake
#6
WebCopy / Re: How to crawl remote sitema...
Last post by lychniscuddly - June 10, 2024, 03:28:15 AM
Hi, I'm a new member.
Do you have any updates? I'm experiencing the same issue as mentioned earlier. The same version.
#7
WebCopy / Need to control timeout for we...
Last post by Awful_Genius - June 08, 2024, 08:14:29 AM
Hi all!

I've been using WebCopy for a long time and often get the error "The operation has timed out" for some files while crawling:



After some research, I think WebCopy most likely uses something like the .NET WebRequest class with a short timeout:

System.Net.WebException: The operation has timed out

Sometimes the exception is raised for large files that the web server can't send quickly.

So, could you add a way to control request timeouts in WebCopy?

Another option would be a re-download option on the Errors tab.
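
To illustrate the kind of control being asked for, here is a minimal Python sketch of a download with a caller-chosen timeout and a simple retry on timeouts. It is not WebCopy's code; `fetch_with_retry`, its parameters, and the injectable `opener` are hypothetical names used only for illustration.

```python
import socket
import urllib.error
import urllib.request

def is_timeout(error):
    """True if error is a timeout, either raised directly or wrapped in URLError."""
    if isinstance(error, socket.timeout):
        return True
    return (isinstance(error, urllib.error.URLError)
            and isinstance(error.reason, socket.timeout))

def fetch_with_retry(url, timeout_seconds=30.0, attempts=3,
                     opener=urllib.request.urlopen):
    """Fetch url with the given timeout, retrying only when a timeout occurs."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            with opener(url, timeout=timeout_seconds) as response:
                return response.read()
        except OSError as error:
            # Re-raise anything that is not a timeout, and the final timeout too.
            if not is_timeout(error) or attempt == attempts - 1:
                raise
```

A call such as `fetch_with_retry(url, timeout_seconds=120)` would give a slow server more time to send a large file, and the retry loop plays the role of the suggested re-download option.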

#8
WebCopy / How to crawl remote sitemap.xm...
Last post by chillybang - June 03, 2024, 05:42:33 PM
Hey!

I must say, I'm highly impressed after switching from HTTrack to Cyotek - it is like a millennium switch :)

As a newbie I'm missing a feature, or rather the way to find it: crawling URLs from a remote, live sitemap.

I tried to add a sitemap as https://example.com/sitemap.xml to "Load additional URLs from file" - but this doesn't work (it raises an alert on saving).

Am I missing something? What is the way to crawl all URLs from a remote live sitemap? Saving a site's sitemap locally to use it as a URL list isn't very useful because sites update their sitemaps...
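
As a stopgap, the URLs in a sitemap can be pulled out with a short script and saved as a plain list. The Python sketch below assumes a standard sitemaps.org `urlset` document. `extract_sitemap_urls` is a hypothetical helper, the one-URL-per-line output is an assumption rather than a confirmed format for "Load additional URLs from file", and fetching the XML over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def extract_sitemap_urls(xml_text):
    """Return the <loc> values of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]

# Small inline sample standing in for a downloaded sitemap.xml.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

urls = extract_sitemap_urls(sample)
print("\n".join(urls))
```

Re-running such a script before each crawl would keep the list in step with the live sitemap, though it does not replace built-in support.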
#9
WebCopy / Local File Links issue + some ...
Last post by DavesNotHereDude - June 02, 2024, 04:07:54 PM
I'm having a hard time getting URLs to link properly on a site where the pages are hosted in multiple subdirectories.

Example.

Although the root directory is https://guides.goingviralnewsletter.hookpoint.com/

The page I want to download started at https://guides.goingviralnewsletter.hookpoint.com/0c6dd8e9/school-of-hard-knockz/

I'm able to download the site, and when I open the index.htm below, I'm able to load the page (some images don't load; more on that later).

C:\Downloaded Web Sites\guides.goingviralnewsletter.hookpoint.com\0c6dd8e9\school-of-hard-knockz\index.htm

When I click a link on the loaded page, it redirects to file:///C:/0c6dd8e9/school-of-hard-knockz/LINK

Instead of the full URL.

Note, this is only when I disable "Remap references within downloaded file". When this is enabled, the index.htm on https://guides.goingviralnewsletter.hookpoint.com/0c6dd8e9/school-of-hard-knockz/ also loads "Your file couldn't be accessed" because it references file:///C:/0c6dd8e9/school-of-hard-knockz/

Second issue is some of the content doesn't load.

Couple examples:

If I download https://guides.goingviralnewsletter.hookpoint.com/0c6dd8e9/school-of-hard-knockz/ the banner image doesn't load.
If I download https://guides.goingviralnewsletter.hookpoint.com/0c6dd8e9/school-of-hard-knockz/video-references/ there are links to videos that don't render (not sure if that's fixable?)

The secondary issue is moot if I can't get the primary issue resolved.

I've searched through the forum and seen a couple of suggestions. I thought "Remap references within downloaded file" was the fix for this issue, but oddly, when it's turned on, it seems to break the main index file within the root directory I added to start the crawl from.
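
A link that resolves to `file:///C:/0c6dd8e9/...` is what one would expect if the saved pages still contain root-relative links such as `/0c6dd8e9/school-of-hard-knockz/LINK`. That is an assumption, since the page source is not shown here. Below is a minimal Python sketch of how such a link could be rewritten relative to the page that contains it; `relative_href` is a hypothetical helper and not WebCopy's remapping logic.

```python
import posixpath
from urllib.parse import urlsplit

def relative_href(page_url, target_url):
    """Return target_url's path expressed relative to the directory of page_url."""
    page_dir = posixpath.dirname(urlsplit(page_url).path)
    target_path = urlsplit(target_url).path
    return posixpath.relpath(target_path, start=page_dir)

base = "https://guides.goingviralnewsletter.hookpoint.com"
page = base + "/0c6dd8e9/school-of-hard-knockz/"
target = base + "/0c6dd8e9/school-of-hard-knockz/video-references/"

print(relative_href(page, target))      # video-references
print(relative_href(page, base + "/"))  # ../..
```

Relative paths like these keep working when the copy is opened from any local folder, whereas a leading `/` is resolved against the drive root.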
#10
WebCopy / Re: How to auto skip warning o...
Last post by haultalented - May 30, 2024, 10:31:43 AM
Hi, I'm a new member.
Are there any updates? I have the same problem as mentioned, on the same version.