Cyotek Forums

Products => WebCopy => Topic started by: Manuela on July 13, 2023, 05:34:30 PM

Title: Why are external links processed?
Post by: Manuela on July 13, 2023, 05:34:30 PM
This is very time-consuming, especially when the linked site is unavailable or a large PDF is linked.
Title: Re: Why are external links processed?
Post by: Richard Moss on July 13, 2023, 09:05:00 PM
Because I can't win.

If external links are excluded, then people complain that resources on CDNs aren't downloaded automatically.

If external links are included, then people complain that external links are being scanned.

As WebCopy has evolved, it tries to have a sensible set of defaults, and the sensible default in this pair is to assume that a user wants to copy linked resources. That means while it won't automatically download or crawl HTML on external sites, it will still try to get the content type in order to determine if it is a resource (such as your PDF) that it will download.
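
To make the behaviour described above concrete, here is a rough Python sketch of that decision. It is not WebCopy's actual code: the names `classify_link`, `get_content_type` and `query_external_types` are hypothetical, and treating every same-host link as crawlable is a simplifying assumption. It also shows why an unreachable external host costs time, since each external link needs a lookup before anything can be decided.

```python
from urllib.parse import urlsplit

# Media types treated as HTML pages (assumed list, for illustration only).
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def classify_link(url, base_host, get_content_type, query_external_types=True):
    """Return 'crawl', 'download' or 'skip' for a discovered link.

    get_content_type is a caller-supplied function (hypothetical) that
    returns the Content-Type header for a URL, or None if the host
    could not be reached.
    """
    host = (urlsplit(url).hostname or "").lower()
    if host == base_host.lower():
        return "crawl"  # same site: simplified to "always process"

    if not query_external_types:
        return "skip"  # external link and type lookups are turned off

    content_type = get_content_type(url)  # one extra request per external link
    if content_type is None:
        return "skip"  # host unavailable, nothing to download

    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in HTML_TYPES:
        return "skip"  # external HTML is neither downloaded nor crawled
    return "download"  # external non-HTML resource, such as a PDF
```

With this sketch, an external PDF comes back as `download`, an external HTML page as `skip`, and with `query_external_types=False` the lookup function is never called.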

See the Downloading all resources (https://docs.cyotek.com/cyowcopy/current/settingsresources.html) setting for details. It controls whether external URLs are queried for their content type in order to download non-HTML resources.

If there are certain domains you don't care about, then create rules to exclude them.
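
As an illustration of what a host-based exclusion could look like, here is a small Python sketch using a regular expression. The exact pattern syntax that WebCopy rules accept is not shown in this thread, so treat this only as a way to test such a pattern; the domains `ads.example` and `tracker.example` and the helper `is_excluded` are made up.

```python
import re

# Matches URLs whose host is one of the listed domains or a subdomain of them.
# The domain names are placeholders; replace them with the hosts to ignore.
EXCLUDE_HOSTS = re.compile(
    r"^https?://(?:[^/?#]*\.)?(?:ads\.example|tracker\.example)"
    r"(?::\d+)?(?:[/?#]|$)",
    re.IGNORECASE,
)


def is_excluded(url):
    """Return True if the URL belongs to one of the excluded hosts."""
    return EXCLUDE_HOSTS.search(url) is not None
```

The optional leading group lets subdomains such as `cdn.ads.example` match, while the trailing group prevents a longer host such as `ads.example.com` from matching by accident.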

Regards;
Richard Moss
Title: Re: Why are external links processed?
Post by: Manuela on July 14, 2023, 12:04:35 PM
Thank you for your answer.

I guess I misunderstood this option. Now I have it unchecked and will see what happens.
Title: Re: Why are external links processed?
Post by: Manuela on July 14, 2023, 12:07:00 PM
The download stops immediately when I uncheck "download all resources". This is clearly not what I want.

What about an option "do not download any resources from external hosts"?