Limit distance from root URL - always one step less for HTML files

Started by hajzlik, December 09, 2021, 11:02:09 AM

hajzlik

There should be an option to set different limits for HTML and for non-HTML files.

Let's say you have a sitemap. You want to download all the linked pages, but not any further links.

You can limit the distance from root URL to 1, but then you end up with HTML pages without images and other content.

You can limit the distance from root URL to 2, and the other content will download, but you will end up with a bunch of unwanted HTML files (but without any images they contain, which makes them even more useless).
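
To make the request concrete, here is a minimal sketch of a crawl with separate distance limits for HTML and non-HTML files. It is not WebCopy's code: the `SITE` map, the `crawl` function, and the `html_limit` / `other_limit` parameters are all made up for illustration, and it assumes the root URL counts as distance 0.

```python
from collections import deque

# Hypothetical in-memory site: URL -> (is_html, list of linked URLs).
SITE = {
    "/sitemap.html": (True, ["/a.html", "/b.html"]),
    "/a.html": (True, ["/a.png", "/c.html"]),
    "/b.html": (True, ["/b.css"]),
    "/c.html": (True, ["/c.png"]),
    "/a.png": (False, []),
    "/b.css": (False, []),
    "/c.png": (False, []),
}


def crawl(site, root, html_limit, other_limit):
    """Breadth-first crawl; returns the set of URLs that would be kept.

    Assumption: the root is at distance 0. A URL at distance d is kept
    if d <= html_limit (HTML) or d <= other_limit (everything else).
    """
    seen = {root}
    queue = deque([(root, 0)])
    downloaded = set()
    while queue:
        url, dist = queue.popleft()
        is_html, links = site[url]
        limit = html_limit if is_html else other_limit
        if dist > limit:
            continue  # too far from the root: skip it and its links
        downloaded.add(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, dist + 1))
    return downloaded


same_limit_1 = crawl(SITE, "/sitemap.html", 1, 1)   # pages, no images
same_limit_2 = crawl(SITE, "/sitemap.html", 2, 2)   # extra /c.html, no /c.png
split_limits = crawl(SITE, "/sitemap.html", 1, 2)   # pages plus their content
```

With a single limit of 1 the sketch keeps the pages but not `/a.png` or `/b.css`; with a single limit of 2 it also keeps `/c.html` without `/c.png`; with the limits split as 1 and 2 it keeps the linked pages and their content, and nothing further.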

david72

I am trying to only scan the folders without saving HTML files.

Does anyone have an idea how to do this?
Because of the huge files and deep folders, it's difficult to store the index files.

Thanks
Dav

Richard Moss

Quote from: hajzlik on December 09, 2021, 11:02:09 AM
There should be an option to set different limits for HTML and for non-HTML files.

Let's say you have a sitemap. You want to download all the linked pages, but not any further links.

You can limit the distance from root URL to 1, but then you end up with HTML pages without images and other content.

You can limit the distance from root URL to 2, and the other content will download, but you will end up with a bunch of unwanted HTML files (but without any images they contain, which makes them even more useless).

Hello,

Belated thanks for the feedback. This makes a lot of sense! I don't think it makes sense to add another option (or maybe add it but not expose it for now, or hide it away somewhere), but I do think it makes sense to completely ignore distance for non-HTML. I've logged that as issue #464.

Thanks again!

Regards;
Richard Moss
Read "Before You Post" before posting (https://forums.cyotek.com/cyotek-webcopy/before-you-post/). Do not send me private messages. Do not expect instant replies.

All responses are hand crafted. No AI involved. Possibly no I either.

Richard Moss

Quote from: david72 on March 19, 2023, 08:04:24 AM
I am trying to only scan the folders without saving HTML files.

Does anyone have an idea how to do this?
Because of the huge files and deep folders, it's difficult to store the index files.

Thanks
Dav


I haven't quite worked out whether this is an AI-generated spam post, but I decided to err on the side of caution this time instead of deleting the account and post as I usually do. I did edit the post to remove the link, though.

Open the Project menu and click Scan Website.