Why are external links processed?

Started by Manuela, July 13, 2023, 05:34:30 PM

Manuela

This is very time-consuming, especially when the linked site is unavailable or the link points to a large PDF.

Richard Moss

Because I can't win.

If external links are excluded, then people complain that resources on CDNs aren't downloaded automatically.

If external links are included, then people complain that external links are being scanned.

As WebCopy has evolved, it tries to have a sensible set of defaults, and the sensible default in this pair is to assume that a user wants to copy linked resources. That means while it won't automatically download or crawl HTML on external sites, it will still try to get the content type in order to determine if it is a resource (such as your PDF) that it will download.

See the Downloading all resources setting for details. It controls whether external URLs are queried for their content type in order to download non-HTML resources.
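
To make the behaviour described above concrete, here is a minimal Python sketch of the decision. It is not WebCopy's actual code; the function names, the HEAD request, and the list of HTML content types are all assumptions made for illustration. It also shows why external links cost time: each one needs a round trip to a server that may be slow or unavailable.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Assumption: these are the content types treated as "HTML" (crawlable pages).
HTML_TYPES = ("text/html", "application/xhtml+xml")


def fetch_content_type(url, timeout=10):
    """Query the content type with a HEAD request, so no body is transferred.

    Hypothetical helper: this is the step that is slow when a host is down.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type()


def is_external(url, site_host):
    """True when the URL's host differs from the host being copied."""
    return urlparse(url).hostname != site_host


def should_download(url, site_host, content_type):
    """Internal URLs are always copied; external ones only if they are not HTML."""
    if not is_external(url, site_host):
        return True
    return content_type not in HTML_TYPES
```

Under these assumptions, an external PDF is downloaded while an external HTML page is skipped, but both still need the content type request first.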

If there are certain domains you don't care about, then create rules to exclude them.
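
As a rough illustration of such an exclusion, the Python sketch below tests a pattern against a few URLs before you rely on it. It assumes the rule is a regular expression matched against the full URL; check the WebCopy documentation for how rules are actually evaluated. The domain names are hypothetical placeholders.

```python
import re

# Hypothetical domains to skip; replace them with the hosts you don't care about.
# The optional group also covers subdomains such as cdn.example-ads.com.
EXCLUDED = re.compile(
    r"^https?://(?:[^/]+\.)?(?:example-ads\.com|slow-host\.example)(?:[/:]|$)",
    re.IGNORECASE,
)


def is_excluded(url):
    """Return True when the URL belongs to one of the excluded domains."""
    return EXCLUDED.search(url) is not None
```

Trying the pattern against a handful of known URLs first helps confirm it skips only the hosts you intend.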

Regards;
Richard Moss
Read "Before You Post" before posting (https://forums.cyotek.com/cyotek-webcopy/before-you-post/). Do not send me private messages. Do not expect instant replies.

All responses are hand crafted. No AI involved. Possibly no I either.

Manuela

Thank you for your answer.

I guess I misunderstood this option. Now I have it unchecked and will see what happens.

Manuela

The download stops immediately when I uncheck "download all resources". This is clearly not what I want.

What about an option "do not download any resources from external hosts"?