From Landing Page How To Follow Other Pages

Started by transfield, April 12, 2019, 06:26:11 PM


transfield

Hi,
Let's say my initial landing page is https://www.somesite.com/en/job-search/human-resource-manager-jobs/ and, from this page, I want to download all pages whose URLs begin with https://www.somesite.com/en/job/human-resource-

After that, go to page 2 of the initial landing page, which would look something like https://www.somesite.com/en/job-search/human-resource-manager-jobs/2/?src=16&srcr=12 and continue downloading all pages whose URLs begin with https://www.somesite.com/en/job/human-resource-

Repeat this process until all pages paginated from the initial landing page have been crawled.

How do I do this?

Richard Moss

Hello,

Thanks for the question, and sorry for the delay in replying. You can add an "exclude" rule with the expression .* to exclude all content, then add an "include" rule with the expression /en/job/human-resource- to include matching URLs.

At present WebCopy doesn't let you specify a "distance" from the source page before crawling halts; that is on the todo list. But based on your description I think the two rules above should suffice.
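
To see which URLs the two expressions would match, here is a minimal Python sketch. It is only a simplified model of the rules, not WebCopy's actual rule engine: the function name is_included, the assumption that an include match overrides the catch-all exclude, and the sample job URL ending in manager-123 are all hypothetical.

```python
import re

# Simplified model of the two rules described above.
# Assumption: an "include" match overrides the catch-all "exclude".
EXCLUDE = re.compile(r".*")                        # matches every URL
INCLUDE = re.compile(r"/en/job/human-resource-")   # matches the job pages


def is_included(url: str) -> bool:
    """Return True if the URL survives both rules under this model."""
    if INCLUDE.search(url):
        return True
    # Anything the exclude expression matches is dropped.
    return not EXCLUDE.search(url)


if __name__ == "__main__":
    sample_urls = [
        # Hypothetical job page; the real URL suffix is not given in the thread.
        "https://www.somesite.com/en/job/human-resource-manager-123",
        "https://www.somesite.com/en/job-search/human-resource-manager-jobs/",
        "https://www.somesite.com/en/job-search/human-resource-manager-jobs/2/?src=16&srcr=12",
    ]
    for url in sample_urls:
        print(is_included(url), url)
```

Under this model only the /en/job/human-resource- page is included. The landing page and its page 2 URL do not match the include expression, so it is worth checking in WebCopy that the paginated listing pages are still being scanned for links.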

Regards,
Richard Moss