Copying Site content that uses username and password (ERROR)

Started by Abdel1712, January 19, 2019, 01:31:47 PM

Abdel1712

Hello,

Thank you for this great tool!
My question is as follows.

Background
I'm trying to download a site that hosts a platform of online tests (questions and answers). Normally I just log in and navigate to any page I want.
Only the initial login is required.

Problem
I've set up WebCopy 1.7 to download a portion of this site by pasting in a narrow URL for the page I want. However, I get the error shown in the attached picture.

Note: I have used the capture form for the username and password information to log in.


Question

Do you have any idea what might be going wrong?

Thanks in advance
[attachment id=1 msg=792]

Richard Moss

Hello,

Welcome to the forums and thanks for the helpful post.

Unfortunately that looks like a definite bug with WebCopy. Can you copy and paste the JSON from the error report window into a reply so that I can try and determine what the cause is for a fix?

Thanks;
Richard Moss

Abdel1712

The error has gone now, but it still didn't work. I don't know what is going on.
Check this out!

Richard Moss

Hello,

Thanks for the reply. There's only one Form definition in your screenshot, compared to two in the first, so I assume that means one of those forms was the problem - that's something I can look into anyway.

With regard to the skipped items, the list shows they are "External". This is evident from the host you're trying to copy starting with "m" while those URLs start with "c". You would need to add the "c..." host as an Additional Host or, if "c" is a subdomain of "m", try changing the Crawl Mode to "Sub domains".
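
To illustrate the kind of host check described above, here is a minimal Python sketch. It is not WebCopy's actual code: the function name, its parameters, and the example.com hosts are all hypothetical stand-ins for the "m" and "c" hosts, the Additional Host list, and the "Sub domains" crawl mode.

```python
from urllib.parse import urlsplit


def classify_host(start_url, candidate_url, additional_hosts=(), include_subdomains=False):
    """Return 'internal' or 'external' for candidate_url relative to start_url.

    Hypothetical helper for illustration only; not WebCopy's implementation.
    """
    start_host = (urlsplit(start_url).hostname or "").lower()
    host = (urlsplit(candidate_url).hostname or "").lower()

    # Same host as the page being copied.
    if host == start_host:
        return "internal"
    # Host explicitly allowed, like an "Additional Host" entry.
    if host in {h.lower() for h in additional_hosts}:
        return "internal"
    # Host is a true subdomain of the start host (e.g. c.m.example.com under m.example.com).
    if include_subdomains and host.endswith("." + start_host):
        return "internal"
    return "external"


# "c" is a sibling of "m", not a subdomain, so it stays external
# unless it is listed as an additional host.
print(classify_host("http://m.example.com/", "http://c.example.com/page"))
print(classify_host("http://m.example.com/", "http://c.example.com/page",
                    additional_hosts=["c.example.com"]))
```

Under these assumptions, a sibling host such as c.example.com is only treated as internal when it is listed explicitly, while c.m.example.com would be picked up by the subdomain option.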

Alternatively, if the files on "c" are not pages but images or other resources, then ignore the above and just select the "Download all resources" option.

Hope this helps.

Regards;
Richard Moss

Abdel1712

Thanks Richard! I have tried to follow your suggestions regarding the subdomain and it partly worked.
The first picture shows that some of the pages are turning green, and those worked perfectly.
The second picture shows problems with some pages. FYI, these pages are related to the previous ones (pic 1); for example, there is a web page full of titles in the form of hyperlinks that are supposed to lead to the courses.
pic 1:
[attachment id=0 msg=796]

pic 2:
[attachment id=1 msg=796]
For details please check your PM!
Thanks.

Abdel1712

One more thing: can you help me with the skipped URLs? How can I download them, given that the copy process couldn't?
The log shows: status = skipped, reason = Redirect.

Thanks

Abdel1712

{URGENT}
Does anyone know how to deal with the situation where WebCopy shows pages as skipped because of redirects? (See the attached picture below.)
Thanks!

Richard Moss

Hello,

Have you tried changing the redirect configuration options in the project properties? By default only internal redirects will be followed.

Regards;
Richard Moss

Abdel1712

Yes, I have tried all the available redirect options, and none of them fixed the problem. I also tried increasing the maximum redirect chain length.
Are there any other suggestions, or is this just beyond the software's capabilities?

Thanks.

PS: If you have a solution in mind and need more information about my target website or my login details to identify the problem, don't hesitate to send me a private message.
I will be more than happy to provide additional information in order to improve the software.

Richard Moss

Hello,

Just to clarify - you are aware that WebCopy will always flag pages that do a redirect as skipped? While it doesn't download the file, it will always check the response for the Location header and extract the URL that the redirect points to from that. It then decides if the new URL should be processed based on your redirect settings.
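
As a rough illustration of that flow, here is a minimal Python sketch. It is an assumption-laden stand-in, not WebCopy's code: the function names and the `policy` values ("none", "internal", "all") are hypothetical, and the example URLs are made up.

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def redirect_target(request_url, status, headers):
    """Return the absolute redirect destination, or None if this is not a redirect."""
    if status not in REDIRECT_STATUSES:
        return None
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location")
    if not location:
        return None
    # Location may be relative; resolve it against the requesting URL.
    return urljoin(request_url, location)


def should_follow(request_url, target_url, policy="internal"):
    """Decide whether to queue the destination. `policy` is a hypothetical setting."""
    if policy == "none":
        return False
    if policy == "all":
        return True
    # "internal": follow only when the destination stays on the same host.
    return urlsplit(request_url).hostname == urlsplit(target_url).hostname


target = redirect_target("http://m.example.com/exam", 301, {"Location": "/exam/"})
print(target, should_follow("http://m.example.com/exam", target))
```

In this sketch the redirecting URL itself is never saved; only the resolved destination is considered for processing, which matches the "skipped" status shown for the original item.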

Therefore, is the problem that WebCopy is skipping the destination files, or is it simply that the UI is confusing and you think it isn't doing something it is?

For example, if you see in your log that an item was skipped as it was a redirect, context click the item and choose Properties. The dialog should state the redirect destination - and there's a properties button next to it. Clicking that should then display the properties of the destination which will easily let you see if it was processed or not.

A different status, "too many redirects", is displayed if redirect processing was deliberately stopped because of the chain length, so that is unlikely to be your issue.
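
For reference, a chain-length limit of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `follow_chain` and `max_chain_length` are hypothetical names, and a plain dictionary stands in for real HTTP responses.

```python
def follow_chain(start_url, redirects, max_chain_length=5):
    """Follow redirects until a final URL is reached or the limit is hit.

    `redirects` maps a URL to its redirect destination and stands in for
    real HTTP responses. Returns a (status, last_url) tuple.
    """
    url = start_url
    hops = 0
    while url in redirects:
        if hops >= max_chain_length:
            # The chain is cut deliberately; this is distinct from a normal skip.
            return ("too many redirects", url)
        url = redirects[url]
        hops += 1
    return ("ok", url)


# A short chain resolves normally; a loop is cut off by the limit.
print(follow_chain("/a", {"/a": "/b", "/b": "/c"}))
print(follow_chain("/a", {"/a": "/b", "/b": "/a"}))
```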

Hope this helps, if not please let me know.

Regards;
Richard Moss

Abdel1712

First, let me apologize, because I may have misunderstood you due to my limited experience in this technical domain.
I will try to follow your example and attach pictures accordingly (if I miss something, let me know).

1- I open Properties for one of the skipped pages (picture N01).
2- I go to the destination folder saved on my PC, where you can see that all skipped pages have been processed (picture N02).
3- I open one of the skipped pages in a browser (picture N03). As you can see, it says "Moved Permanently" and there is a "click HERE" link that takes you back to the initial web page link that was skipped.

Honestly, I don't have an opinion on this issue at all; maybe the user interface of this web page is too complex and confusing to be handled.

- One more thing: I have also tried narrowing the copy process by pasting one of the skipped pages into the URL field and clicking "Copy" to see whether that would work, but the result was the same as before.

Any suggestions are welcome, Mr. Richard.

Thanks.


Abdel1712

The website I want to copy is similar in structure and content to this one: http://gre.kmf.com/exam

So maybe it can help you figure this out, because even the website mentioned above didn't work.

Thanks.


Richard Moss

Hello,

Thanks for the follow-up. The website you referenced looks like it uses JavaScript to construct the exam list. Unfortunately, as per the limitations help topic, WebCopy is not able to execute JavaScript at this time. While I have been mulling over this issue (given the number of complaints I get about it!) and have ideas for adding this in the future, it's not likely to be in the product any time soon.
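
To show why this matters for any crawler that reads HTML without running scripts, here is a small Python sketch. It is a general illustration, not WebCopy's parser, and both sample pages are invented for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags found in the static markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Invented sample: the link is present in the markup itself.
STATIC_PAGE = '<ul><li><a href="/exam/1">Test 1</a></li></ul>'

# Invented sample: the link only exists after a browser runs the script.
SCRIPTED_PAGE = """<ul id="exams"></ul>
<script>
  var a = document.createElement("a");
  a.href = "/exam/1";
  document.getElementById("exams").appendChild(a);
</script>"""

print(extract_links(STATIC_PAGE))
print(extract_links(SCRIPTED_PAGE))
```

The second page yields no links because the anchor element is only created when the script runs, so a crawler working from the raw HTML has nothing to follow.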

I haven't yet had a chance to delve into your previous reply regarding redirects.

Regards;
Richard Moss

Abdel1712

Thank you for your reply.
Your response actually cleared up my first inquiry too.
I hope you can update the software soon.
I don't know if it's against the rules here, but if you have any recommendation on how to download a website that uses JavaScript, please don't hesitate to tell me.

Thank you.