Good afternoon! WebCopy is a very good product and I understand the need for it. The issue is that I have a blog and recently noticed a high load on it. I wrote to my hosting provider, and it turned out that a certain IP was downloading my site (http://pyatilistnik.org/) using WebCopy. I don't like that this creates an unnecessary load that disturbs me and my visitors. I banned that IP, but I wondered: is it possible to restrict the site from such copying in the first place, especially when it happens without my permission?
Hello,
Apologies for the delay in responding.
By default WebCopy uses a custom user agent (for example, CyotekWebCopy/1.9 CyotekHTTP/6.0, although bear in mind the version numbers change every so often), so you could block anything containing the "webcopy" string. This will only deter people who don't change the settings, though. While WebCopy currently doesn't look at robots.txt, it is something I intend to add to the product in a later version, so you could add a rule to robots.txt to block the WebCopy user agent. When I do implement this functionality it will be checked using the default agent, not anything the user tries to override, and I won't be adding an option for users to disable it.
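For reference, here is a minimal sketch of both ideas. The robots.txt rule assumes the agent token is "CyotekWebCopy" (taken from the example above), and the second snippet assumes an Apache server with mod_rewrite enabled; adjust for your own setup:

# robots.txt - ask WebCopy (once it honours robots.txt) not to crawl anything
User-agent: CyotekWebCopy
Disallow: /

# .htaccess - reject any request whose user agent contains "webcopy" (case-insensitive)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} webcopy [NC]
RewriteRule .* - [F,L]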
Another option, if supported by your blog software or hosting provider, would be to soft-ban addresses that make requests quickly enough that it is clearly automated.
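If your host runs nginx, the soft-ban idea can be sketched with its standard rate-limiting directives (the zone name and limits below are only illustrative):

# in the http {} block: track each client IP, allowing an average of 2 requests/second
limit_req_zone $binary_remote_addr zone=perip:10m rate=2r/s;

server {
    location / {
        # permit short bursts of up to 10 requests, then reject the excess with a 503
        limit_req zone=perip burst=10 nodelay;
    }
}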
Sorry I can't offer any better ideas - I've been meaning to add robots.txt support for some time, but it's another issue that gets deferred again and again.
Regards,
Richard Moss
Quote from: Richard Moss on August 24, 2021, 07:30:59 PM
Could you add a feature that makes requests at random intervals chosen from a range? An example would be 1-10 seconds.
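For illustration only (this is not WebCopy's code, and the names here are hypothetical), the randomised delay being requested is essentially this Python sketch:

import random
import time

def fetch_politely(urls, fetch, min_delay=1.0, max_delay=10.0):
    # Fetch each URL, then sleep a random 1-10 seconds before the next request.
    for url in urls:
        fetch(url)
        time.sleep(random.uniform(min_delay, max_delay))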
Thank you very much. I will try adding the rule to my robots.txt file, and I will wait for your improvements.