
Recent Posts

Pages: [1] 2 3 ... 10
WebCopy / Re: Remapping of links
« Last post by Richard Moss on December 11, 2018, 05:13:44 PM »

Welcome to the forums and thank you for the question. Unfortunately, if WebCopy crashed then it won't have any reference to anything you've already downloaded, and so won't be able to reuse those files.

What you could do is try splitting the job into two. I tested this process on the demonstration website and it worked fine.

Firstly, make sure the "Folder | Empty website folder before copy" and "Links | Clear link information before scan" options are not set.

Secondly, add a rule for the expression .* with the Exclude and Crawl Content options set. This will instruct WebCopy to scan all HTML files, but not to download any resources or keep any of the HTML.

Next, add a rule to download the media. What expression you use depends on the contents of your site, but I used (\.png|\.jpg) to download images. For this rule you need to set the Include option. This rule overrides the first rule and states that any matching URLs should be fully downloaded.
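As a quick sketch of what those two expressions match, here they are tested with Python's re module. (This only exercises the regular expressions themselves; WebCopy's exact matching behaviour is assumed to be a substring-style regex search, and the URLs are made up for illustration.)

```python
import re

# The crawl-only rule matches every URL.
crawl_only = re.compile(r".*")

# The include rule matches URLs containing an image extension.
media = re.compile(r"(\.png|\.jpg)")

urls = [
    "https://example.com/index.html",        # crawled but not kept
    "https://example.com/images/photo.jpg",  # downloaded
    "https://example.com/logo.png",          # downloaded
]

for url in urls:
    downloaded = bool(media.search(url))
    print(url, "-> download" if downloaded else "-> crawl only")
```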

Now run the job. (Perhaps this should be tested on a subset of your site first to avoid having to download everything again only for it to fail).

Once complete, save the project so we can at least resume from this point.

Now, change the second rule from Include to Exclude and completely disable (or delete) the first rule.

If the job is then run again, WebCopy will start downloading all the HTML, CSS, and other files that it ignored before. It knows about the media files that were previously downloaded, and so when it remaps the files, it will use the local filenames of the existing media.

With regard to your "Always download latest version" question, again this relies on WebCopy already having the metadata for a given URL, and on the website returning last-modified or ETag headers. So again, if WebCopy crashed and the project file wasn't saved, as far as it is concerned it doesn't know anything about any local files - it will ignore them and download fresh copies. I shall log something to reconsider this behaviour.

I hope this helps and I do apologise for the poor behaviour on the part of WebCopy. I do have plans to fix this issue (it is logged as #326), but I was leaving this until after #61 (multi-threaded crawling) was implemented.

Richard Moss
WebCopy / Re: Is there a way to copy a select range of pages in a larger group?
« Last post by Richard Moss on December 09, 2018, 08:00:27 PM »

Welcome to the forums and thanks for the question. Unfortunately, there isn't a way to resume a failed scan. Although you can only copy changed files if supported by the web server and WebCopy is configured correctly, this assumes a successful initial crawl.

Work is ongoing in a separate branch to implement pause/resume support (#165), as part of which I planned to autosave state as the crawl progressed to allow recovery from crash scenarios. This was originally planned for 1.7 but I'm quite behind on getting that build shipped so it may have to be deferred.

If the software crashed due to memory errors do you have a reference number to help pin down the issue?

Richard Moss
WebCopy / Re: Copying only ONE base URL
« Last post by Richard Moss on December 09, 2018, 07:52:00 PM »

      Welcome to the forums and thanks for the question. Sorry it has taken so long to get around to answering it.

      You can use the Rules feature to do this:

      • From the main window, click Rules
      • Click Add
      • Enter /listings into the Expression field
      • Uncheck Exclude and check Include
      • Click Advanced and then check Stop processing more rules
      • Follow the above steps to add another rule with the expression /pittsburgh
      • Click Add
      • Enter .* into the Expression field
      • Click OK

      The first and second rules make sure that any URL containing "listings" or "pittsburgh" is downloaded, and also stop checking further rules once a match is found. The final rule specifies that everything else will be excluded.
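A rough Python sketch of that first-match evaluation may make the interaction clearer. (This is purely illustrative: the should_download helper and the default-allow behaviour are assumptions for the sketch, not WebCopy's published internals.)

```python
import re

# Each rule: (expression, include?, stop processing more rules?)
rules = [
    (re.compile(r"/listings"),   True,  True),
    (re.compile(r"/pittsburgh"), True,  True),
    (re.compile(r".*"),          False, False),
]

def should_download(url):
    """Return True if the URL should be downloaded under the rules above."""
    decision = True  # assumed default when no rule matches
    for expression, include, stop in rules:
        if expression.search(url):
            decision = include
            if stop:
                break  # "Stop processing more rules" was set
    return decision

print(should_download("https://example.com/listings/123"))    # included by rule 1
print(should_download("https://example.com/pittsburgh/abc"))  # included by rule 2
print(should_download("https://example.com/about"))           # excluded by rule 3
```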

      Hope this helps.

      Richard Moss
WebCopy / Remapping of links
« Last post by epiktek on December 09, 2018, 02:56:14 PM »

I'm archiving a site that's 50GB+ with external media (hosted on AWS). Since this job was so large, there were errors partway through which made the process stop. What I didn't realize was that links would not get remapped until the very END of the job. Cyotek did a great job of downloading everything but since the operation was interrupted partway through, a lot of the links were not remapped.

What are my options now?
1) Can I exclude all external media and start the process to download only the HTML/ASP files and have the process complete to get the links remapped? My concern is that links to the external media files will not get remapped due to the fact that I excluded WebCopy from downloading external media.
2) Can I force WebCopy to only update the HTML/ASP files? If I set "Always download latest version", will it re-download all the media files?
3) Delete all the HTML/ASP files (keeping the large media files), UNCHECK "Always download latest version" (so it won't force a re-download of the media files), and let WebCopy run through to completion so it remaps the links?

Any suggestions?

WebCopy / Is there a way to copy a select range of pages in a larger group?
« Last post by dumbnoob on December 06, 2018, 03:33:25 AM »
Thank you for this amazing program! Your work is very much appreciated.

As a complete idiot, I was able to run the program and get almost everything I was hoping for. Unfortunately, after a few days, I ran out of memory just shy of completion. There are about 10,000 pages left and they all have the same name format:  page-65000.html, page-65001.html, page-65002.html...

Note: I've already copied page-20000.html to page-64999.html

Is there a way I can direct WebCopy to just copy the pages that I need and not have to start over?

I apologize if this has already been answered elsewhere in the forum. I have looked and found a lot of helpful info, but nothing that addresses this exact possibility.

Thanks again for WebCopy and for any help you can provide!

WebCopy / Copying only ONE base URL
« Last post by magnao on December 04, 2018, 10:33:44 PM »
I'm trying to add a rule to make WebCopy only scan links on a page that start with a specific URL: for instance, if I'm having it search the URL "", and I want it to only scan and download pages that have the URL "", how do I do this? This would be assuming that the "listings" URL provides links to every single "pittsburgh" link on the website.

Any help or advice would be greatly appreciated!
WebCopy / Re: Tumblr
« Last post by Richard Moss on November 20, 2018, 07:16:06 PM »

Welcome to the forums and thanks for the question. The current nightly builds of WebCopy 1.7 "should" be able to handle the login via the new "Use web browser to login" option, which will display a browser window for you to authenticate with - it simply isn't possible for WebCopy to handle reCAPTCHA or multi-factor authentication otherwise.

Aside from that, I'm not aware of any explicit issues with copying Tumblr sites, assuming the site isn't JavaScript-based.

Richard Moss

Welcome to the forums and thanks for the question. Create a rule which contains the value about; any URI containing about will then be ignored.

Richard Moss
WebCopy / Re: How to create a new rule to exclude specific directory?
« Last post by Richard Moss on November 20, 2018, 06:33:19 AM »

To exclude everything in a given directory, create a rule which contains the directory name surrounded by forward slashes. For example, if the site had a directory named forums you wanted to exclude, you would create an exclude rule with the expression /forums/.

For bonus points you could use the rule /forums(/|$|\?). This slightly more complicated expression will match the folder itself, a URI that ends with the bare folder name, or the folder name followed by a query string - without matching any document names that merely start with "forums".
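To see what that pattern does and does not match, here it is tested with Python's re module. (Again, the URLs are invented for illustration, and WebCopy's matching is assumed to behave like a regex search over the URI.)

```python
import re

pattern = re.compile(r"/forums(/|$|\?)")

matching = [
    "https://example.com/forums/",        # folder with trailing slash
    "https://example.com/forums",         # bare folder name at end of URI
    "https://example.com/forums?page=2",  # folder followed by a query string
]
non_matching = [
    "https://example.com/forumsearch",    # "forums" is only a prefix here
]

for url in matching + non_matching:
    print(url, "->", bool(pattern.search(url)))
```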

Hope this helps.

Richard Moss
WebCopy / Tumblr
« Last post by designationme on November 17, 2018, 10:49:34 PM »

First post. 

There is a Tumblr site for motivational images.   I am looking to grab some images and want to know if there is a way to get through to Tumblr.

I know there is a lot of discussion about how to log in to sites, but Tumblr has a login, a separate page for the password, and I think a captcha...

Any information is appreciated. I would like to know if it's not possible too, just so I can stop banging my head against the wall.   

Thank you