« Last post by joem789 on February 16, 2019, 01:01:36 AM »

I am a member of an informational website. It has what seems like a simple username/password login form. The program does pop open the page where I put in the login info, and it logs in fine. I click Copy and the program goes to work, but there seems to be no shortage of errors throughout the crawl. When I try to open the offline website, I cannot get past the login page, so I am not sure what is preventing it from succeeding. I do know that there are plenty of script errors (typical IE).

Seems like a very promising program, once the bugs are ironed out.
« Last post by Richard Moss on February 04, 2019, 06:13:05 PM »

Thanks for the follow up, glad to know you found a solution. I suppose I should start thinking more seriously about replacing IE with Chromium, as I expect this issue will pop up more and more frequently.

If you have defined forms to post, these will be applied after the cookies are copied from the web browser, so I can see how this would break if the post failed and wiped the cookies - I'll add this to the documentation.

Thanks again for the follow up!

Richard Moss
« Last post by Abdel1712 on February 04, 2019, 01:38:51 PM »
Thank you for your reply.
Well, your response actually made everything clear for my first inquiry too.
I hope you can update the software soon.
I don't know if it's against the rules here, but if you have any recommendation about how to download a website that uses JavaScript, please don't hesitate to tell me.

Thank you.
« Last post by anne on February 04, 2019, 08:30:59 AM »

Thanks for your reply.
I just realised that I could not log into the front end of my website with Internet Explorer at all. I never use this browser, so I did not notice. I managed to fix that problem thanks to this post:

Then it took me a while to realise that I can't have a form defined and "Log in using web browser" checked at the same time. With both, the download did not work. With the form deleted, it worked fine.

Thank you!
« Last post by Richard Moss on February 03, 2019, 06:43:39 PM »

In regards to setting up the form definition, did you set it up manually, or did you use the Capture tool and select only fields which were user-enterable, e.g. username/email address and password?

I'm surprised the login page didn't seem to work in the embedded browser - that is a normal Internet Explorer window and I'm not doing anything special that should cause things to break. Of course, some websites are making a point of not supporting Internet Explorer anymore, but I'm loath to embed Chromium due to the massive dependency size.

Richard Moss
« Last post by Richard Moss on February 03, 2019, 06:40:11 PM »

Thanks for the follow up. The website you referenced looks like it uses JavaScript to construct the exam list. Unfortunately, as per the limitations help topic, WebCopy is not able to execute JavaScript at this time. While I have been mulling over this issue (given the number of complaints I get about it!) and have ideas for adding this in the future, it's not likely to be in the product any time soon.
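To illustrate the limitation in general terms, here is a minimal Python sketch (not WebCopy code) of a crawler that only reads the static markup. The page content, the `/exams/` paths and the `LinkCollector` and `static_links` names are all made up for the example. The links that the script would create at runtime never appear in the HTML that is downloaded, so a crawler that does not execute JavaScript cannot discover them.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags present in the static markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html):
    """Return the links visible to a crawler that does not run scripts."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: one ordinary link, plus a list that a script
# would only fill in once a browser executes it.
PAGE = """
<html><body>
<a href="/about.html">About</a>
<ul id="exams"></ul>
<script>
  var list = document.getElementById("exams");
  ["exam1", "exam2"].forEach(function (name) {
    var link = document.createElement("a");
    link.href = "/exams/" + name + ".html";
    link.textContent = name;
    list.appendChild(link);
  });
</script>
</body></html>
"""

print(static_links(PAGE))  # ['/about.html']
```

Only the ordinary link is found; the two exam pages are invisible without a JavaScript engine.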

I haven't yet had a chance to delve into your previous reply regarding redirects.

Richard Moss
« Last post by Richard Moss on February 03, 2019, 10:51:56 AM »

Welcome to the forums and thanks for the question.

The "Above Root" setting is a legacy of the oldest versions of WebCopy and really should be removed. After all, if you wanted to copy all pages, then you'd simply enter the domain rather than a nested page. At some point this year I'll be rewriting the crawl engine using more modern components and I'll be removing some of these obsolete settings and reworking the engine to actually make sense behind the scenes. There is a "Download All Resources" option (set by default) which allows you to automatically download non-HTML resources regardless of their location.

I started looking into your issue and realised that when I introduced said "Download All Resources" option it was conflicting rather badly with "above the root" pages and downloaded them in full. This has now been fixed and will be available as a nightly build this evening.

With this fix in place, you should now find your scenario works "out of the box" - if you are copying from /first/second/third/index.html and the "Download All Resources" option is set then it will include any non-HTML resources found elsewhere on the site, e.g. linked files in /photos.
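The scoping rule described above can be sketched roughly as follows. This is only an illustration in Python, not WebCopy's actual implementation: the function names, the `example.com` URLs and the extension-based guess at what counts as HTML are all assumptions made for the example.

```python
import posixpath
from urllib.parse import urlsplit

# Assumption: guess "HTML page" from the file extension. A real crawler
# would more likely look at the Content-Type of the response.
HTML_EXTENSIONS = {"", ".html", ".htm"}


def looks_like_html(path):
    return posixpath.splitext(path)[1].lower() in HTML_EXTENSIONS


def in_scope(url, start_url, download_all_resources=True):
    """Decide whether a URL should be downloaded for a given start page."""
    start = urlsplit(start_url)
    target = urlsplit(url)

    # Never leave the site being copied.
    if target.netloc != start.netloc:
        return False

    # Directory of the start page, e.g. /first/second/third/
    root = posixpath.dirname(start.path).rstrip("/") + "/"
    if (target.path or "/").startswith(root):
        return True

    # Above the root: take non-HTML resources only, and only if enabled.
    return download_all_resources and not looks_like_html(target.path)


START = "https://example.com/first/second/third/index.html"
print(in_scope("https://example.com/first/second/third/page2.html", START))
print(in_scope("https://example.com/photos/cat.jpg", START))
print(in_scope("https://example.com/first/index.html", START))
```

Under these assumptions the first two URLs are in scope (a page below the start folder, and a photo elsewhere on the site), while the third, an HTML page above the start folder, is not.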

Thanks again for finding this bug!

Richard Moss
« Last post by alixira on February 03, 2019, 10:06:30 AM »
I'm using your software to turn simple WordPress websites into static HTML. Everything works fine; however, I need to change some URLs.
I tried to understand how, but regular expressions are very difficult for me.  :'(
I need to know how to:
1) remove the word index.html in each link
Example: in the online website: after download -> /name-page/index.html how to remove index.html?

2) leave unchanged or change the domain site name
Example: in the online website: after download -> /name-page/index.html -  I need: or

3) also in the links, rename some directory names
Example: rename /wp-includes/ to /includes/ or to

I hope I have been clear enough,
and I thank you very much in advance for the help you will give me.
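
For illustration only, items 1 and 3 above could be expressed as regular expressions along these lines. This is a hedged Python sketch that post-processes HTML text after the download; it does not show WebCopy's own rule settings, and the `tidy_links` name and sample markup are made up for the example. Item 2 is left out because the target domain is not given in the post.

```python
import re


def tidy_links(html):
    """Apply two illustrative link rewrites to a piece of HTML text."""
    # 1) Drop "index.html" at the end of an href, keeping the folder slash.
    #    The lookahead requires a closing quote, "#" or "?" right after it.
    html = re.sub(r'(href="[^"]*/)index\.html(?=["#?])', r"\1", html)

    # 3) Rename the /wp-includes/ directory to /includes/ wherever it appears.
    html = re.sub(r"/wp-includes/", "/includes/", html)
    return html


sample = (
    '<a href="/name-page/index.html">Page</a> '
    '<script src="/wp-includes/js/app.js"></script>'
)
print(tidy_links(sample))
# <a href="/name-page/">Page</a> <script src="/includes/js/app.js"></script>
```

Two things to keep in mind: links ending in a bare folder only resolve when the copy is served by a web server (opening the files directly from disk will not load index.html automatically), and renaming a directory in the links also means renaming the folder itself on disk.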
« Last post by ShadowWizard on February 01, 2019, 10:12:57 PM »
So, I am trying to copy part of a website, including photos. However, some of the photos are located above the URL I am starting from. When I tell it to crawl "Above root" in the advanced options, it crawls the whole site (I assume there are links that take it back). How do I tell it to only go forward on URL links, but to get photos from anywhere?
No, I cannot post the site I am pulling from, for confidentiality reasons.
Example.  Page I want to pull from:
Photos are stored in:
This means any photos that are on will not download.  I need them to.
However it should NOT browse to even if there is a link to it on
However it SHOULD go to (Assuming there is a link on to it of course)
« Last post by anne on February 01, 2019, 01:28:31 PM »

Thank you for this great tool.
I downloaded my Joomla! website with the version and it worked great. However, there is a part of the website you can only access with a login. These pages were not included in the first download. I tried to set up Forms with Passwords, but with these, the copying stops right after starting. In the Results section, it says the login URL's Source is Unknown and the website's URL is Skipped. The Errors tab, however, is empty.

I've tried to test the form, but then I get the error message that either the security token did not match or my session had expired. Sometimes it also just opens the homepage without even going to the login page.

I've also tried it with "Log in using web browser" checked. Then it asks me to log in using the pop-up browser after I start the copying. However, if I click on the "Login" button after entering the login information, nothing happens.

Any ideas what I should be doing differently?

Thank you so much!
