website with login and cookie

Started by flozirkus, September 20, 2015, 11:38:30 AM

flozirkus

Hi,

I'm trying to copy a website that uses a login with a session cookie.
Setting up the login form in WebCopy seems to work, but it only takes effect for one specific page.
Normally the login sets a cookie in which all the session information is stored. If I log in outside of WebCopy in a standard browser, take the session ID from there and add it to the URL, that works as well, but only for the starting page, not for the crawled links.
Do you have any idea how to solve this, or what I am doing wrong?

Thanks and greetings
Florian

Richard Moss

Hello,

Thanks for the message. I'm afraid I don't quite understand your problem, can you provide more details?

If you use WebCopy to log in to a website at the start of a copy and a cookie is returned, this cookie will be sent to all future copy requests during that particular copy. If other requests set cookies, then these too should be honoured.
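To illustrate the general idea (this is not WebCopy's actual code, just a minimal Python sketch), a crawler can remember every cookie a response sets and send them all back on later requests. The `CookieStore` class and the cookie values here are made up for the example.

```python
from http.cookies import SimpleCookie


class CookieStore:
    """Tiny in-memory store: remembers cookies set by earlier responses."""

    def __init__(self):
        self._cookies = {}

    def update_from_response(self, set_cookie_headers):
        # Each entry is the value of one Set-Cookie response header.
        for header in set_cookie_headers:
            parsed = SimpleCookie()
            parsed.load(header)
            for name, morsel in parsed.items():
                self._cookies[name] = morsel.value

    def cookie_header(self):
        # Value for the Cookie header of every later request.
        return "; ".join(f"{name}={value}" for name, value in self._cookies.items())


store = CookieStore()
# The login response sets the session cookie (hypothetical value).
store.update_from_response(["sessionid=abc123; Path=/; HttpOnly"])
# A later response sets another cookie, which is honoured as well.
store.update_from_response(["lang=en; Path=/"])
header = store.cookie_header()
```

Every request made after the login would carry `header` as its Cookie value, which is what keeps the session alive across the crawl.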

The downloaded website more than likely won't make use of cookies at all as it is a static copy, not the original dynamic site.

Regards;
Richard Moss

flozirkus

Hi Richard,

thanks for the fast reply.
Maybe the reason for the problem is that I do not know how to do it:
"If you use WebCopy to log in to a website at the start of a copy ..."
I looked for a how-to but did not find anything.

I would be happy to provide more information about the site and even login-data but not in a public forum.

Looking forward to your answer
Best regards
Florian

Richard Moss

Hello,

To be honest, I'd prefer you didn't share credentials at all, publicly or privately. A secret is not a secret if it's shared, even with the best of intentions! A public forum is definitely not the place for them, even via PMs.

If you need to authenticate yourself with a web site before initiating a copy, then you need to use the Forms and Passwords section. The easiest way will probably be to use the new Capture tool that was added to recent builds.


  • Create a new WebCopy project and set the URL of the website to copy (or open an existing project of course, as long as the URL is set)
  • Click the Capture button on the main window
  • Use the embedded browser to navigate to the page hosting the log in form
  • Select the form (hopefully only one will be listed, if more than one you'll have to judge which is correct)
  • Check each of the parameters you want to supply values for - variations of user name / email and password
  • Click Create Form to save the data and close the Capture tool
  • Now, double click the new form that will appear and in the Form Data field, enter the real user name and password
  • Optionally, click the button to the right of the URI field - this will attempt to post the form so you can see if it works or not. It's a little unfriendly to use right now, but hopefully it will let you know if the attempt was successful. Close the Test URL window when you have finished
  • Click OK to save the modified details

Now, when you start a copy, WebCopy will post the form which will log you in if everything is set up correctly, then continue with the job of copying your website.
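For anyone curious what "post the form, then continue copying" amounts to at the HTTP level, here is a rough Python sketch. It is not how WebCopy is implemented; the URL, the field names `username` and `password`, and the helper `build_login_request` are all assumptions for illustration.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_login_request(form_url, fields):
    """Encode the form fields and wrap them in a POST request."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(form_url, data=body, method="POST")


# One cookie jar for the whole copy, so the session cookie returned by
# the login is reused for every page fetched afterwards.
jar = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Hypothetical form URL and field names; use the ones the real form has.
login = build_login_request(
    "https://example.com/login",
    {"username": "user", "password": "secret"},
)

# Nothing is sent in this sketch. A real run would call opener.open(login)
# once, then opener.open(page_url) for each page to copy.
```

The important detail is that the same `opener` (and therefore the same cookie jar) is used for the login and for every later page request.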

Hope this helps!

Regards;
Richard Moss

flozirkus

Hi Richard,

thanks for the extensive reply and the good explanation.
Unfortunately it works only partly.
The page I define the form/password for is captured correctly, as if logged in. But each captured page linked from that page is captured as logged out.
Do you have any idea why this happens or what I can do?

Another way would be to do the WebCopy login for each page, as the form exists on each page, but only if there is a simple way.

Best regards
Florian

Richard Moss

Hello,

The most obvious things to check would be:

  • Did the login really work?
  • Did you define a rule to prevent log outs?

If, as you state, you are sure the login is succeeding, then you need to ensure that WebCopy doesn't access any URL that would subsequently log you back out.
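As a rough illustration of what such a rule does, here is a small Python sketch that skips URLs which look like logout links. The pattern is only a guess at common logout URL names and would need adjusting to the actual site; `should_crawl` is a made-up helper, not a WebCopy function.

```python
import re

# Assumed pattern: matches "logout", "log-out", "log_off", "signout", etc.
# It may need tightening if ordinary page URLs happen to contain these words.
LOGOUT_PATTERN = re.compile(r"log[-_]?(out|off)|sign[-_]?out", re.IGNORECASE)


def should_crawl(url):
    """Return False for URLs that would probably end the session."""
    return LOGOUT_PATTERN.search(url) is None


# Hypothetical URLs for illustration.
urls = [
    "https://example.com/members/page1",
    "https://example.com/account/logout",
    "https://example.com/index.php?action=LogOff",
]
allowed = [url for url in urls if should_crawl(url)]
```

Only the first URL ends up in `allowed`; the other two are excluded so the crawler never requests them.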

flozirkus

Hi Richard,

the second point came to my mind as well, and I created a rule to prevent WebCopy from following the logout link.
As for whether the login really works, I can only say that the URL tester shows the page as it appears after a successful login.
My last idea was that several connections are used to download the pages but only one of them is really logged in. I could not test this as I did not find such a setting, but this was finally the key to success with a similar tool.
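To show what I mean by that (purely a sketch of the suspected effect in Python, not a claim about how WebCopy handles its connections): if each connection kept its own cookie store, only the one that performed the login would send the session cookie. `FakeResponse`, `cookie_sent`, the URLs and the cookie value are invented for the example.

```python
import urllib.request
from email.message import Message
from http.cookiejar import CookieJar


class FakeResponse:
    """Stand-in for an HTTP response; CookieJar only needs info()."""

    def __init__(self, set_cookie):
        self._headers = Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        return self._headers


def cookie_sent(jar, url):
    """Return the Cookie header this jar would attach to a request for url."""
    request = urllib.request.Request(url)
    jar.add_cookie_header(request)
    return request.get_header("Cookie")


# Hypothetical login exchange: the response sets the session cookie.
login_request = urllib.request.Request("https://example.com/login")
login_response = FakeResponse("sessionid=abc123; Path=/")

# Connection 1 logged in, so its jar holds the session cookie.
shared_jar = CookieJar()
shared_jar.extract_cookies(login_response, login_request)

# Connection 2 has its own empty jar and never saw the login response.
separate_jar = CookieJar()

page = "https://example.com/members/page2"
with_shared = cookie_sent(shared_jar, page)      # session cookie is sent
with_separate = cookie_sent(separate_jar, page)  # no cookie, so "logged out"
```

With the other tool, the fix amounted to making all connections use the logged-in session, which corresponds to every worker using `shared_jar` here.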

Thanks for your support so far. If you want to dig deeper into this issue, I'm happy to answer any questions.

Best regards
Florian

Richard Moss

Hello,

Thanks for the follow-up. Peculiar, it sounds very similar to another user's issue. Unfortunately, as I can't reproduce it, it's somewhat difficult to fix!

Do you know if the site you are trying to copy uses subdomains?

flozirkus

Hi,

Sure, without being able to try it yourself it's hard to do anything.
No, the site is not using subdomains.

Greetings
Florian