Hi everyone, I'm new to WebCopy and have just started exploring its features. I'm trying to use it to scrape content from a website, but I need some help with the settings.
Specifically, I want to focus on crawling only certain pages of a website—let's say blog posts or product pages—while ignoring things like the homepage or contact page. How can I set WebCopy to only target specific types of pages? I'm familiar with the basic options, but I haven't figured out how to filter the pages for my particular needs.
If anyone has experience with fine-tuning the crawl settings or knows of a way to set up custom filters, I'd really appreciate the help! Thank you in advance!
Quote from: scenebozo on February 17, 2025, 08:38:44 AM
Hi everyone, I'm new to WebCopy and have just started exploring its features. I'm trying to use it to scrape content from a website, but I need some help with the settings.
Specifically, I want to focus on crawling only certain pages of a website—let's say blog posts or product pages—while ignoring things like the homepage or contact page. How can I set WebCopy to only target specific types of pages? I'm familiar with the basic options, but I haven't figured out how to filter the pages for my particular needs.
If anyone has experience with fine-tuning the crawl settings or knows of a way to set up custom filters, I'd really appreciate the help! Thank you in advance!
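You can use "Rules" in WebCopy to limit the copy to specific pages. Set up URL patterns that match the blog posts or product pages and exclude the rest. As far as I know, WebCopy rule expressions are regular expressions rather than wildcards, so if blog posts have "/blog/" in the URL, add an "Include" rule with the expression /blog/ (not */blog/*) and "Exclude" rules for unwanted sections such as /contact/. Because pages are copied by default, you will probably also need a catch-all "Exclude" rule such as .* alongside the "Include" rules, otherwise the homepage and other pages still get copied. Hope that helps!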
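If you want to check your patterns before entering them in WebCopy, here is a rough Python sketch of the include/exclude idea. It is not how WebCopy itself is implemented, just a way to see which URLs a set of regular expressions would keep. The pattern lists, the `should_copy` helper, and the example.com URLs are all made up for illustration; swap in the real URL layout of your site.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns (assumptions, not taken from any real site).
INCLUDE_PATTERNS = [r"/blog/", r"/products/"]
EXCLUDE_PATTERNS = [r"/contact"]


def should_copy(url: str) -> bool:
    """Return True if the URL path matches an include pattern and no exclude pattern."""
    path = urlparse(url).path
    # An exclude match wins over any include match.
    if any(re.search(pattern, path) for pattern in EXCLUDE_PATTERNS):
        return False
    # Anything that matches no include pattern is dropped (catch-all exclude).
    return any(re.search(pattern, path) for pattern in INCLUDE_PATTERNS)


if __name__ == "__main__":
    sample_urls = [
        "https://example.com/",
        "https://example.com/blog/first-post",
        "https://example.com/products/widget",
        "https://example.com/contact/",
    ]
    for url in sample_urls:
        print(url, "->", "copy" if should_copy(url) else "skip")
```

Run it with a handful of real URLs from the site and adjust the patterns until only the pages you want are marked "copy", then enter the same expressions as rules in WebCopy.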