index-wcopy.html file behavior changed

Started by daler, September 09, 2020, 10:57:55 AM


daler

BACKGROUND
I have been using WebCopy to back up my blog website for several months. This month I needed to re-image my computer, at which time I reinstalled WebCopy and ran a new backup.

I am comparing a new WebCopy backup to an older one and I wonder if I have a problem.

In the old backup (before re-imaging) the "index-wcopy.html" got created, but was empty ... and the "index.htm" was 36 KB. The "index.htm" appears to be able to access all the rest of the copied site correctly, and everything is fine. The TOTAL size of the old backed up site folder was about 100 MB.

NOW, "index-wcopy.html" is HUGE (~670 MB) --- and I'm not clear what it does, or what it means, or what it adds to my backup. The "index.htm" file is a small 39 KB, and links to the rest of the backup files just fine, it seems to me. In this new world, the TOTAL size of the backed up site folder is about 760 MB.

I have not added NEARLY that much to the blog site in the two months between these two backups. Even so, in the old setup, the "index-wcopy.html" file, though created in the backup, was empty and apparently did nothing. Now it is HUGE and I'm not clear what it adds other than taking up space. 

QUESTIONS:
So ... do I need the new, large "index-wcopy.html" file? 
Is there a setting in the Project to get backups back down to the smaller size --- maybe to NOT populate "index-wcopy.html" with so much stuff --- or maybe not populate it at all?

Please HELP!  :o    THANKS much!!! - Dale R. :)

daler

I'm pleased to report I found a tiny setup difference that restored backups to the older, smaller version.
This change makes "index-wcopy.html" empty and small and me happy.

It is the way I entered the website path!
https://www.musicminister.net/  gets me a small, fast, good working backup. I guess that's how I had set up the old backup.

BUT ... leaving out the "www." (so ... https://musicminister.net/ ) was getting me a backup with a HUGE "index-wcopy.html" file.

I'd love to know the reason, but as I tell people I fix things for, "Getting it to work will cost you $1. Making me figure out what made it quit working will cost $100." (I can uncheck a box and restore service, but might not be able to discover how the box checked itself in the first place ... funny. Hah.)

SO ... this plea for help can be closed unless it would delight someone to explain why the "www." in the path makes so much difference. It would be WORTH $100, but no need to watch your mailbox.  ;D

Many thanks to the makers of WebCopy! God bless you lots. - Dale R.

Richard Moss

Hello,

The index-wcopy.html is created by the Sitemap add-in and it is supposed to create a structured overview of the links found in the scanned site. It isn't directly related to the website content so you can delete it if required.

Alternatively, you can tell WebCopy not to generate it via the Sitemap tab in the Project Properties dialog.
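WebCopy's actual Sitemap add-in code isn't shown here, so the following is only a minimal sketch, assuming a sitemap page that nests each URL under its host name and then under each path segment. The `build_tree` and `render` helpers and the sample URLs are hypothetical. It illustrates why such a page grows with the number of distinct URLs a scan discovers: every new path adds another nested list item.

```python
import html
from urllib.parse import urlsplit


def build_tree(urls):
    """Nest each URL under its host name, then under each path segment (hypothetical helper)."""
    tree = {}
    for url in urls:
        parts = urlsplit(url)
        node = tree.setdefault(parts.hostname, {})
        # Query strings and fragments are ignored in this sketch.
        for segment in parts.path.split("/"):
            if segment:
                node = node.setdefault(segment, {})
    return tree


def render(tree, depth=0):
    """Render the nested dict as an indented HTML unordered list."""
    pad = "  " * depth
    lines = [pad + "<ul>"]
    for name in sorted(tree):
        lines.append(f"{pad}  <li>{html.escape(name)}")
        if tree[name]:
            lines.append(render(tree[name], depth + 2))
        lines.append(f"{pad}  </li>")
    lines.append(pad + "</ul>")
    return "\n".join(lines)


urls = [
    "https://www.musicminister.net/",
    "https://www.musicminister.net/2020/08/post-a/",
    "https://www.musicminister.net/2020/09/post-b/",
]
print(render(build_tree(urls)))
```

With three URLs the output is a handful of lines; a scan that turns up thousands of extra URLs (for example, many query-string variants, if a real sitemap kept them) would produce a correspondingly larger page.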

You didn't mention what version of WebCopy you're using, but the 1.8.1 nightlies included some tweaks to how WebCopy tries to calculate site structure (and a fix for a bug that could kill the process). It is possible this accounts for the difference in size, but I'd need to look at the URL pattern of your site to be sure. However, I don't see why there is a difference between the naked and non-naked versions of the domain. Again, something I'll do some digging into when I get a moment.
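Why the two forms behave differently in WebCopy isn't established in this thread, so this is only a general illustration, not a description of WebCopy's internals: the naked and www addresses have different host names, and a tool that compares hosts literally treats them as two separate sites unless it deliberately normalizes the `www.` prefix. The `host_of`, `strip_www`, and `same_site` helpers are hypothetical.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lower-cased host name of a URL."""
    return urlsplit(url).hostname


def strip_www(host):
    """Drop a leading 'www.' label, if present (hypothetical normalization)."""
    return host[4:] if host.startswith("www.") else host


def same_site(url_a, url_b, ignore_www=False):
    """Compare two URLs by host; optionally treat www and the bare domain as one host."""
    a, b = host_of(url_a), host_of(url_b)
    if ignore_www:
        a, b = strip_www(a), strip_www(b)
    return a == b


naked = "https://musicminister.net/"
www = "https://www.musicminister.net/"
print(host_of(naked), host_of(www))            # musicminister.net www.musicminister.net
print(same_site(naked, www))                   # False
print(same_site(naked, www, ignore_www=True))  # True
```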

One question of my own - this looks like a WordPress site and you mentioned it is your own blog. Doesn't WordPress include its own export functionality that will likely be superior to using WebCopy to scrape the public HTML?

Regards;
Richard Moss
Read "Before You Post" before posting (https://forums.cyotek.com/cyotek-webcopy/before-you-post/). Do not send me private messages. Do not expect instant replies.

All responses are hand crafted. No AI involved. Possibly no I either.