Difficult site to map, on multiple roots

Started by micnolmad, December 22, 2024, 11:08:59 AM


micnolmad

Hello guys

I want to save the cartoons from supercartoons so my children can get a Sunday playlist each week, a bit like when I was a child and binge-watching didn't exist.

The main site is https://www.supercartoons.net

The shows are listed under /serie/tom-and-jerry/ or /serie/looney-tunes/, but when you click an episode the URL changes to /cartoon/EPISODE.

/cartoon/ on its own cannot be browsed; it returns a 404. You would need every episode name to request each of those URLs directly.

The actual mp4 files are located at https://ww.supercartoons.net/SERIE/EPISODE.mp4
Notice the missing w (ww instead of www).

Here SERIE can be tom-and-jerry or looney-tunes, and EPISODE is the filename without the extension.

Example:
https://ww.supercartoons.net/tom-and-jerry/the-million-dollar-cat.mp4
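To make the URL pattern above concrete, here is a minimal Python sketch that builds the direct video URL from a series name and an episode name. The function name `mp4_url` and the constant `MEDIA_ROOT` are made up for illustration; the only thing taken from the site is the pattern and the example above.

```python
from urllib.parse import urljoin

# Host that serves the video files: "ww", not "www".
MEDIA_ROOT = "https://ww.supercartoons.net/"


def mp4_url(serie: str, episode: str) -> str:
    """Build the direct video URL from a series slug and an episode slug.

    Leading/trailing slashes are stripped so "/tom-and-jerry/" and
    "tom-and-jerry" give the same result.
    """
    path = f"{serie.strip('/')}/{episode.strip('/')}.mp4"
    return urljoin(MEDIA_ROOT, path)


print(mp4_url("/tom-and-jerry/", "the-million-dollar-cat"))
```

This only helps once the episode names are known, which is exactly the part the crawl has to supply.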

The link to the mp4 file only appears on the episode page itself, i.e.
https://www.supercartoons.net/cartoon/the-million-dollar-cat/
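As a rough sketch of what any tool has to do with an episode page, the Python below pulls ww.supercartoons.net mp4 links out of a page's HTML. It assumes the mp4 URL appears literally in the page source, which I have not verified; the `sample` markup is invented for the example, and `find_mp4_links` is a hypothetical helper name.

```python
import re

# Matches https://ww.supercartoons.net/SERIE/EPISODE.mp4
# (assumption: slugs contain only letters, digits, "_" and "-").
MP4_LINK = re.compile(r"https://ww\.supercartoons\.net/[\w-]+/[\w-]+\.mp4")


def find_mp4_links(html: str) -> list[str]:
    """Return the distinct mp4 URLs found in the HTML, in page order."""
    return list(dict.fromkeys(MP4_LINK.findall(html)))


# Invented markup, only to exercise the function.
sample = (
    '<video><source src="https://ww.supercartoons.net/'
    'tom-and-jerry/the-million-dollar-cat.mp4" type="video/mp4"></video>'
)
print(find_mp4_links(sample))
```

If the page builds the link with JavaScript instead of including it in the HTML, a plain text search like this will find nothing, and a crawler that does not run scripts will miss it too.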

BUT when I start the scan on either the root or a serie page (e.g. tom and jerry), no matter what depth I set, or none at all, I never get the mp4 links, and it never follows the next pages or goes into the episode pages.

I know I have only been using this Cyotek program for a few hours. Can anyone help, please?
I would love to share my settings (cwb), but how?

scylla

1. Adjust Link Following Rules
Go to Project > Rules and make sure the crawler is allowed to follow links to the /cartoon/ pages.
Add rules that include URLs matching these patterns:

https://www.supercartoons.net/cartoon/*

and

https://ww.supercartoons.net/*/*.mp4

2. Increase Crawl Depth
Under Project > Scan Settings, set Maximum Link Depth to a high value (e.g., 10).
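The two patterns in step 1 are written with * wildcards. If the rule field expects regular expressions instead (I believe WebCopy rules are regular expressions, but treat that as an assumption and check the documentation), they would look roughly like the ones below. The Python is only there to test the expressions against the URLs from the first post; `should_follow` is a made-up helper, not part of any tool.

```python
import re

# Regex equivalents of the wildcard patterns from step 1.
EPISODE_PAGE = re.compile(r"^https://www\.supercartoons\.net/cartoon/.*$")
VIDEO_FILE = re.compile(r"^https://ww\.supercartoons\.net/[^/]+/[^/]+\.mp4$")


def should_follow(url: str) -> bool:
    """True if the URL is an episode page or a direct mp4 file."""
    return bool(EPISODE_PAGE.match(url) or VIDEO_FILE.match(url))


for url in (
    "https://www.supercartoons.net/cartoon/the-million-dollar-cat/",
    "https://ww.supercartoons.net/tom-and-jerry/the-million-dollar-cat.mp4",
    "https://www.supercartoons.net/serie/tom-and-jerry/",
):
    print(url, should_follow(url))
```

Note that the /serie/ listing page does not match either pattern, so it needs to stay allowed by whatever other rules the project has; otherwise the crawler never reaches the episode links.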