Recent Posts

Pages: [1] 2 3 ... 10
General Discussion / MOVED: error correction?
« Last post by Richard Moss on Yesterday at 07:01:15 PM »
General Discussion / MOVED: %3A = : et %2F = / not possible
« Last post by Richard Moss on Yesterday at 07:01:01 PM »
WebCopy / error correction?
« Last post by little1406 on May 11, 2021, 12:05:05 AM »
In my first use of WebCopy, a number of files showed "error" and something like "server did not respond". That's possibly due to my third-world-class Internet connection here in Central Texas. Is there a way I can start WebCopy up again and have it go back to those particular files and try again?
WebCopy / %3A = : et %2F = / not possible
« Last post by flojul on May 09, 2021, 07:24:20 PM »
I find your software great; however, when it copies the site it does not download the GIF.
It does not recognize
Do you have a solution, please?
Thank you so much.
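For readers unfamiliar with the notation in the thread title: %3A and %2F are the percent-encoded forms of ":" and "/". The short sketch below only illustrates that mapping using Python's standard library; the example URL is made up, and nothing here reflects how WebCopy itself handles encoded links.

```python
from urllib.parse import quote, unquote

# Hypothetical percent-encoded link, similar in shape to what the title describes.
encoded = "https%3A%2F%2Fexample.com%2Fimages%2Flogo.gif"

# unquote turns %3A back into ":" and %2F back into "/".
decoded = unquote(encoded)
print(decoded)

# quote with safe="" encodes ":" and "/" again, giving the original string.
print(quote(decoded, safe=""))
```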
WebCopy / Re: Critical speed decrease for big web sites (pull request)
« Last post by Richard Moss on May 03, 2021, 08:28:27 AM »

Profiling a test website with 1 million pages via the CLI didn't show any major issues, and speed was consistent throughout. While I found nothing major, I did tweak some functionality to improve performance slightly based on the results.

That meant my original assumption was wrong (which I'm actually happy about!), but while the CLI profile was running I started looking through the UI code and found a facepalm-inducing possible issue. When I virtualised the reporting lists I wrote the lookup code in a very inefficient way, and profiling this with just 10 thousand pages indicated it was a major issue.

Reworking that bad code yielded around a 30% performance increase on the 10 thousand pages test. As that code no longer operates as O(n), performance should stay constant rather than getting worse and worse over time as you reported.
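To illustrate the general kind of change described above, here is a minimal sketch in Python. It is not WebCopy's actual code; the class names, the idea of keying rows by URL, and the use of a dictionary are all assumptions chosen only to show why a scanning lookup slows down as a list grows while an indexed lookup does not.

```python
from typing import Dict, List


class ScanningRowList:
    """Hypothetical append-only row list that finds a row by scanning: O(n) per lookup."""

    def __init__(self) -> None:
        self._keys: List[str] = []

    def add(self, key: str) -> int:
        self._keys.append(key)
        return len(self._keys) - 1

    def index_of(self, key: str) -> int:
        # list.index walks the list from the start, so the cost grows with its size.
        return self._keys.index(key)


class IndexedRowList:
    """Same interface, plus a dict from key to row index: O(1) average lookup."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._index: Dict[str, int] = {}

    def add(self, key: str) -> int:
        position = len(self._keys)
        self._keys.append(key)
        # setdefault keeps the first position for a repeated key,
        # matching what list.index would return.
        self._index.setdefault(key, position)
        return position

    def index_of(self, key: str) -> int:
        # A single hash lookup, independent of how many rows exist.
        return self._index[key]
```

Both classes return the same row positions; the difference is that the second keeps lookup cost flat as rows are continuously appended, at the price of the extra dictionary held in memory.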

The original virtual list implementation was for fixed data sets and wasn't designed to work with one that was continuously extended, so there are still more improvements to be made (although they won't match the scale of this first fix).

A build with at least the lookup fix in place should be available by the end of the day (build 789 or above), but I will also do some more work on the other improvements I noted I could make.

Lesson learned: profile the UI more often. (I usually profile just the CLI as it is simpler to automate, but I've now extended the GUI to support the same parameters as the CLI so this can be automated as well.)

Many thanks for finding this issue and helping to improve WebCopy for other users.

Richard Moss

WebCopy / Re: Problem with URL rewriting
« Last post by matthew.zimmerman on April 26, 2021, 02:18:03 PM »
Hi Richard,

Thanks for looking into this. Please let me know if there is anything I can do to assist.

WebCopy / Re: Problem with URL rewriting
« Last post by Richard Moss on April 26, 2021, 05:29:35 AM »

Thanks for the information, and more importantly the details to reproduce. I believe I have reproduced the issue from this and will next do some debugging to try and find a cause and fix.

Richard Moss
WebCopy / Re: Multilingual interface
« Last post by Richard Moss on April 26, 2021, 04:39:54 AM »

The build process for our products creates language.txt and language.dat for each defined language. The text file isn't directly used; it was included only so that users could suggest corrections to the auto-generated text for all languages other than English, and to allow for future extensibility.

The .dat file is a binary version of the .txt file and this is the file that is actually used. Future versions will likely drop this in favour of po files or something similar.

Currently the vast majority of the strings in WebCopy aren't available for localisation, so they won't be listed in the .txt file; for the most part it comprises only shared strings.

Richard Moss
WebCopy / Re: Critical speed decrease for big web sites (pull request)
« Last post by Richard Moss on April 26, 2021, 04:32:10 AM »

Thanks for the information. While I have mulled over releasing the source code to the crawler component in the past, it isn't something that will be happening at the moment. I've logged issue #399 to look into this further. My initial assumption, given your description, is that the slowdown comes from querying the link map, but I won't know until I do some profiling once I have set up a somewhat larger dataset than the one usually used for testing.

Richard Moss
ImageBox / Re: Starting coordinate points
« Last post by Deer2014 on April 25, 2021, 11:43:05 AM »
Thanks, Richard.