WebCopy hangs at random places

Started by middlesky, October 24, 2020, 06:23:55 PM


middlesky

Hi there, thank you for developing this product.

I have had problems downloading the following JavaScript-heavy site (using version 1.8.1 with IE parsing):
https://interactivebrokers.github.io/tws-api/

I use default settings except with "Use web browser" enabled.

It works fine for a while but then hangs at random places. If I kill WebCopy and restart the copy at the page where it hung, it works again for a while, but it never runs to full completion.

When WebCopy is rerun, it doesn't skip files that already exist; instead, it makes a new copy of each file with "-1", "-2", ... "-n" appended to the file name.

As I keep rerunning WebCopy from its hang point, I find that some files have already been scanned 7 times, yet WebCopy always hangs before completing; I cannot tell whether I have ever obtained a complete copy of the website.

I tried unchecking "Always download latest version" under Project Properties -> Advanced, but it didn't change any of this behavior.

According to your blog, you have had some difficulties with threading; could that be the cause?

Also, there seems to be no way to save the progress of a copy and resume it without re-scanning pages?

Thank you for your dedication and help to users.

Kirby

Richard Moss

Hello,

Welcome to the forums and thanks for the message. I wasn't able to reproduce any hangs when testing with the website you referenced, but I did identify a potential issue with how WebCopy interacts with the embedded IE window and have put a speculative fix in place for that. Version 1.8.1.726 or above will include this fix.

Was there any particular reason you enabled the use of IE while scanning? Although I can see this site uses JavaScript for the menu interactions, those are interactive elements, so using IE over WebCopy's native crawler shouldn't have any impact.

Currently it isn't possible to resume a cancelled or crashed scan, as the working state is lost. Now that 1.8.1 is finally out the door, my next plan is to merge in a large rewrite of the crawler logic, which forms the basis of 1.9. After that, in 1.10, I hope to finally address this by adding proper resume support. WebCopy doesn't have issues with threading per se; rather, the crawl engine runs on a single thread. Crawling on multiple threads is much faster, which is probably what I noted on the blog: I have a testing version of WebCopy that does multi-threaded crawling, but it needs more work before I release it into the wild.
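To illustrate why multi-threaded crawling helps (this is only a generic sketch, not WebCopy's actual implementation, and the link graph here is a made-up stand-in for real HTTP fetches): a pool of workers pulls URLs from a thread-safe queue, and a lock-guarded "visited" set ensures each page is scanned exactly once.

```python
import queue
import threading

# Hypothetical in-memory link graph standing in for real HTTP fetch/parse;
# each "page" maps to the pages it links to.
LINK_GRAPH = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/c"],
    "/c": [],
}

def crawl(start, worker_count=4):
    """Crawl the graph with a pool of worker threads."""
    pending = queue.Queue()   # URLs waiting to be scanned
    visited = set()           # URLs already scanned
    lock = threading.Lock()   # guards the visited set
    pending.put(start)

    def worker():
        while True:
            url = pending.get()
            if url is None:           # sentinel: shut this worker down
                pending.task_done()
                return
            with lock:
                seen = url in visited
                visited.add(url)
            if not seen:
                # A real crawler would fetch the page over HTTP here,
                # parse it, and enqueue the links it finds.
                for link in LINK_GRAPH.get(url, []):
                    pending.put(link)
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    pending.join()                    # wait until every queued URL is processed
    for _ in threads:
        pending.put(None)             # one sentinel per worker
    for t in threads:
        t.join()
    return visited
```

Because each worker spends most of its time waiting on the network in a real crawl, several workers can overlap those waits, which is where the speed-up comes from; the shared visited set is also exactly the piece of state that would need persisting to support resume.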

Regards,
Richard Moss

Read "Before You Post" before posting (https://forums.cyotek.com/cyotek-webcopy/before-you-post/). Do not send me private messages. Do not expect instant replies.

All responses are hand crafted. No AI involved. Possibly no I either.