
Basecamp Classic

Started by MarkBaillie, April 13, 2016, 01:24:38 PM


MarkBaillie

I'm looking to take a backup of my basecamp classic projects but am having a terrible time getting WebCopy to log in. I think I've done everything I should with the forms but think Basecamp must have some other authorisation token that's screwing things up.

On hitting my basecamphq.com site I get redirected to /login, the form for which should submit to https://launchpad.37signals.com/authentication. However, when WebCopy submits, that returns a 404. If I watch a login via Firebug or Fiddler, that page responds fine.

Has anyone tried WebCopy on Basecamp before?

Richard, rather than give out my credentials, I could make you a Basecamp user if you want to test?

P.s. Awesome bit of software, so thank you.





Richard Moss

Hello,

Welcome to the forums; you're the latest in a long list of users with WebCopy login problems! Not a record I particularly enjoy, in fairness. The 404 you mentioned could be a 405 in disguise, I suppose. When you watched the traffic with Fiddler/Firebug, did you notice which method was being used? Or perhaps additional headers need to be passed which identify your particular site.

I'm glad you like the software, despite its little flaws and foibles!

If you are able to create a read-only user which can't do anything, that may be useful, although it may be a few days before I can do any in-depth debugging. But if you can check what is being sent with regard to methods, headers and anything else that doesn't get set when WebCopy tries to log in, that could be a great help.

Thanks;
Richard Moss

MarkBaillie

Hi Richard,

Thank you very much for your reply. Don't take it badly; I think 37signals are doing something tricky with Basecamp's login - I'm sure it's their fault!

1. The login form starts at xyz.basecamphq.com/login. This sets a cookie of _basecamp_session_v2. The form action is a POST to launchpad.37signals.com/authenticate, and it has an INPUT HIDDEN of authenticity_token which may be a part of this.

2. launchpad.37signals.com/authenticate responds with a 302 and a Location of xyz.basecamphq.com/login/authenticate with a "sig" querystring.

3. xyz.basecamphq.com/login/authenticate responds with a 302 and a Location of xyz.basecamphq.com/clients, indicating a successful login.

I'm afraid I can't see any tricky HTTP Headers or anything that would help explain it.
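For anyone following along, the three-step flow above can be sketched in Python. This is purely illustrative (WebCopy itself is a .NET application); the credential field names and the token-parsing regex are my assumptions, not confirmed Basecamp details. The session object is expected to behave like a requests-style session that persists cookies and follows redirects.

```python
import re

TOKEN_RE = re.compile(r'name="authenticity_token"\s+value="([^"]+)"')

def extract_token(html: str) -> str:
    """Pull the hidden authenticity_token out of the login form HTML."""
    match = TOKEN_RE.search(html)
    if match is None:
        raise ValueError("no authenticity_token field found")
    return match.group(1)

def basecamp_login(session, subdomain: str, username: str, password: str):
    # 1. GET the login page; this sets the _basecamp_session_v2 cookie
    #    and contains the hidden authenticity_token field.
    page = session.get(f"https://{subdomain}.basecamphq.com/login")
    token = extract_token(page.text)

    # 2. POST credentials plus the token to launchpad; the server
    #    replies with a 302 to xyz.basecamphq.com/login/authenticate?sig=...
    # 3. Following that redirect (and the next one, to /clients)
    #    completes the login.
    return session.post(
        "https://launchpad.37signals.com/authenticate",
        data={
            "authenticity_token": token,
            "username": username,  # assumed field name
            "password": password,  # assumed field name
        },
    )
```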

I'll create a sample basecamp project and user and PM you the details. Your help is greatly appreciated.

Regards,
Mark

Richard Moss

Thank you for the test login, that was extremely helpful. I identified two separate issues that were preventing a successful login. As you said, the Basecamp login is tricky!


  • In order to get dynamic values like the authentication token, WebCopy needs to GET the login page and pull out all the form fields, then merge in anything custom you have defined, before POSTing the page. The only problem is that the GET URI and POST URI were completely different - hence the 404.
  • The second problem was that the login process is actually two stages. Unlike most logins, which perform the login on the POST, Basecamp requires a little more: in order to complete the login, you need to GET the URI returned by the POST's 302. As I didn't consider this, WebCopy always discarded the response and didn't do anything further with it, except to abort the copy if the response wasn't a 302.

Fixing the second issue was easy enough; I now simply do a GET on the URI returned by the Location header after a successful POST. Currently I'm completely ignoring the response; it's enough to have the data sent to the second URL to complete the login.
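The corrected sequence can be roughly sketched as follows (in Python, for illustration only; WebCopy is .NET and these names are mine, not its actual internals). The session is any requests-style object, and parse_fields stands in for whatever extracts the form's input fields from the HTML:

```python
from urllib.parse import urljoin

def form_login(session, get_uri, post_uri, custom_fields, parse_fields):
    # Fetch the login form from the GET URI (which may differ from the
    # POST URI, as it does on Basecamp) and collect its fields.
    page = session.get(get_uri)
    fields = parse_fields(page.text)   # pull out all the form's values
    fields.update(custom_fields)       # merge in user-defined values

    # POST the merged fields without auto-following redirects, so the
    # 302 can be inspected rather than silently consumed.
    response = session.post(post_uri, data=fields, allow_redirects=False)
    if response.status_code != 302:
        raise RuntimeError("login POST did not redirect; aborting copy")

    # Stage two: GET the Location target to complete the login. The
    # response body itself can be ignored.
    session.get(urljoin(post_uri, response.headers["Location"]))
```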

Fixing the first issue was just as easy, but I'm not entirely happy with how I've fixed it, as now I've made the UI more complicated again, and I already think WebCopy is a bit of an unfriendly beast at the best of times. There is now a second URI field for each form definition, which allows you to set the GET URI. This second field only needs to be populated if the POST and GET URIs are different. I've updated the Capture Form tool to support this automatically, which might help - definitely something I need to look into before making a stable release. I also updated the Test URL tool to include this second field.



The screenshot above shows the form editor for your form (with dummy values for the credentials and domain, of course!). The top field is your POST URI; the bottom field is the GET URI. In your existing WebCopy project, just edit your form and set the second field to "login" (these fields can be either absolute, or relative to the WebCopy project URI).

You will also need to add a rule for the /logout URI, otherwise after scanning a few URIs you'll be automatically logged out. You may also need to set the Do not prompt for passwords option (Forms and Passwords | Passwords), as one of the URIs I hit while testing issued a 401 challenge.

I haven't thoroughly tested this change; I stopped once the scan clearly showed the authentication was successful and I had full access. I also haven't written any new tests to cover these changes, although it doesn't look like any regressions have been introduced, as all existing tests pass, so it should be fine to use.

I don't know when I will release a stable build with this fix as I want to have another think on the UI front, but the next stable WebCopy update is overdue so I probably can't tarry for too much longer. However, the nightly build is available now so if you feel brave you can always grab that - feedback appreciated!

Thanks again for providing the initial bug report and the test login which allowed me to find and fix it!

Regards;
Richard Moss

MarkBaillie

Thank you, thank you, thank you. I've tried the nightly and it's doing its best to pull down the site and fill my hard drive with 8 years of projects.

Interestingly, it's stopping with errors on a 200 status code: "Input string was not in a correct format". I can't see an issue causing it, and, you know - 200?

Is there a setting I can put in to get it to just jump over any errors? I don't need my crawl to be 100% perfect.

Cheers,
Mark

Richard Moss

Hmm. That's a .NET exception. It's expecting a number and got something completely different.

If you right-click the URL in the log and select Properties, there should be an Error Text field with a Report link next to it (note that this will only work for the current session, not if you close and reopen the project). Clicking that should send the exception details to us, which will give me a chance to find and fix the bug.

If the test login is active, I can always try running that again on the off chance one of the URLs it has access to will expose the bug.

By default it should ignore most errors; I'm not sure why it's halting at the 200 - I'd never get anything done if it stopped at every bump in the road :)

MarkBaillie

Hi Richard,

I reported one of these oddities yesterday, and the test login should still be up. If you want, I can add you to the project where the error keeps getting thrown; there's nothing in there I'd be worried about you seeing, and your level of login can't do any harm.

Thanks,
Mark

Richard Moss

Hello,

For the relative URI /projects/1094207-non-chargeable/todo_lists/21348636, can you check the output folder where WebCopy was placing these files (note that any of the segments in that URI may have (nnn) sequences in them, as WebCopy tries to avoid existing files/folders) and confirm whether part of the path includes the { or } characters?

If the folder names do include these characters, can you also confirm whether the root folder that you specified (i.e. not the bits that WebCopy auto-generates) contains them? I've managed to reproduce the bug from the stack trace in the exception report you logged, but if this is the cause, I'm unable to figure out how that character got into the path unless it was already there.
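As an aside for anyone hitting the same message: "Input string was not in a correct format" is what .NET's String.Format throws when a stray brace from data ends up inside the format string itself rather than in the arguments. Python's str.format has the same trap, which makes a compact illustration; the function names here are mine, purely for demonstration:

```python
def describe(path: str) -> str:
    # BUG: data is interpolated into the format string itself, so a
    # lone "{" in the path makes format() choke.
    return ("Saving to " + path).format()

def describe_safe(path: str) -> str:
    # FIX: keep the data in the arguments, not the format string;
    # braces in arguments are passed through untouched.
    return "Saving to {0}".format(path)

try:
    describe("out/to-dos/{bad")
except ValueError as exc:
    print("format blew up:", exc)

print(describe_safe("out/to-dos/{bad"))
```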

A new nightly will be available shortly after this post with a fix for this issue, so if you can give that a whirl too, that would be great.

Thanks;
Richard Moss

MarkBaillie

Hi Richard,

Once again you were right: the file path had a "}" in it. I should have thought of something like that; we're seeing much the same issue with WebDeploy from Visual Studio to IIS. Your nightly sorted it out.

Can I ask about Copy > File Options?
Basecamp seems happy to return pages with no extensions, which then screws things up if those pages have child pages, e.g.

  • /to-dos becomes /to-dos
  • /to-dos/123 becomes /to-dos (1)/123
  • /to-dos/456 becomes /to-dos (1)/456
This means the links in /to-dos are pointing to the wrong place. I've tried "Remap file extensions by content type" set to "Only if no extension present", which makes this /to-dos.html.
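The collision described above comes down to an extension-less page wanting the same name as the folder needed for its children, so the crawler has to rename one of them. A minimal sketch of this (assumed behaviour for illustration, not WebCopy's actual code):

```python
def local_path(root: str, url_path: str, add_html: bool) -> str:
    """Map a URL path to a local file path, optionally adding .html
    when the final segment has no extension."""
    name = url_path.strip("/")
    if add_html and "." not in name.rsplit("/", 1)[-1]:
        name += ".html"
    return root + "/" + name

# Without remapping, the page and the folder fight over one name:
print(local_path("out", "/to-dos", add_html=False))      # out/to-dos (file)
print(local_path("out", "/to-dos/123", add_html=False))  # out/to-dos/123 (needs an out/to-dos folder)

# With "Only if no extension present" remapping, the names no longer clash:
print(local_path("out", "/to-dos", add_html=True))       # out/to-dos.html
```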

I was then expecting "Remap references within downloaded files" to change any href of "/to-dos" to become "/to-dos.html", but they hadn't changed. Now, I've not finished a scan yet, but does it go back and remap these at the end, or am I missing something? Effectively I don't care how the files end up named, as long as they link okay.

Cheers,
Mark

Richard Moss

Yes, WebCopy only remaps URIs once copying has completed; that way I don't have to keep track of each page's "completion" status. It is something I may change in the future, but only once I've had the time to do some fairly major architectural changes to try to reduce memory requirements and improve performance.
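The "remap everything at the end" approach can be sketched like this (illustrative only, not WebCopy's actual implementation): during the crawl only the URL-to-local-name mapping is recorded, and hrefs are rewritten in a single pass once every page is on disk, so no per-page completion tracking is needed.

```python
import re

def remap_all(pages: dict) -> dict:
    """pages maps original URL -> downloaded HTML; returns the HTML with
    every known href rewritten to its local file name."""
    # Build the URL -> local-name mapping once, after the crawl.
    local_names = {url: url.strip("/").replace("/", "_") + ".html"
                   for url in pages}

    def rewrite(html: str) -> str:
        # Replace each href whose target we downloaded; leave unknown
        # (external) links untouched.
        return re.sub(
            r'href="([^"]+)"',
            lambda m: 'href="%s"' % local_names.get(m.group(1), m.group(1)),
            html,
        )

    return {url: rewrite(html) for url, html in pages.items()}
```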

The current options for choosing how extension-less URLs are handled are frankly rubbish, and it's been on the to-do list for a long time to improve them.

With that said, I'm not aware of any actual issues with remapping aside from the limited options (cyotek.com uses extension-less URLs and that scans fine), but that doesn't mean there aren't any unknown issues, so please let me know if you come across anything else. It's often frustrating for users like yourself to hit these breaking problems, but it all leads to a better product in the end. In theory!

Thanks;
Richard Moss

MarkBaillie

Sorry, I didn't want you to think there was a problem with it; it's just that the Basecamp site is a *massive* crawl and I didn't want to let it run without knowing that it would rewrite the URLs okay.

I'm giving it a shot today; I'll let you know if there are any problems.

Thanks again for all your help with this.