Website crawl failure: path and/or file name too long

Started by basso_l, March 13, 2015, 02:01:30 PM


basso_l

While trying to download one of our websites, we got a "Website copy canceled due to crawl failure" error, caused by a URL longer than 260 characters. Given that URLs longer than 260 characters are not uncommon on websites with many levels, is there a way to overcome this limit? If not, is there a way to prevent Cyotek WebCopy from stopping and instruct it to skip such long URLs?
Thanks.

Best regards,
Luca

Richard Moss

Hello,

I don't suppose you can give me examples of both the local path the site is to be downloaded to, and the URL that failed? (Masking the characters to make them private is fine, I just need a complete path and URI so I can break up the segments)

Also, what version of WebCopy are you using? This bug was actually fixed quite some time ago, and is also covered by tests so if it breaks, I should immediately know.

The way the existing fix works is that it defines what it *thinks* the file name should be, based on the URI and your project settings. If the result is longer than or equal to MAX_PATH, it removes the file name part and tries to create a unique short version.
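
For readers following along, the idea can be sketched roughly as below. This is an illustration only, not WebCopy's actual code: the function name `shorten_if_needed`, the `file1`, `file2`, ... naming scheme and the `existing` parameter are all assumptions made for the example.

```python
import ntpath  # Windows path rules, usable on any platform

MAX_PATH = 260  # limit quoted in the error message later in this thread


def shorten_if_needed(full_path, existing=()):
    """Return full_path unchanged if it fits, else swap in a short unique file name.

    `existing` is a hypothetical collection of paths already used by the crawl.
    """
    if len(full_path) < MAX_PATH:
        return full_path

    # Drop the long file name but keep its extension.
    directory, file_name = ntpath.split(full_path)
    _, ext = ntpath.splitext(file_name)

    # Windows file names are case-insensitive, so compare in lower case.
    taken = {p.lower() for p in existing}
    counter = 1
    while True:
        candidate = ntpath.join(directory, f"file{counter}{ext}")
        if candidate.lower() not in taken:
            return candidate
        counter += 1
```

Note that this sketch only replaces the file name. If the directory part alone is already too long, the result still will not fit.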

Oddly enough I was actually looking at implementing AlphaFS to allow super long paths, but parked it for now as there are other, more important things to resolve.

If you can provide the two values hopefully this should pinpoint the flaw in the existing solution so I can fix it for the next update.

Thanks for the bug report!

Regards;
Richard Moss
Read "Before You Post" before posting (https://forums.cyotek.com/cyotek-webcopy/before-you-post/). Do not send me private messages. Do not expect instant replies.

All responses are hand crafted. No AI involved. Possibly no I either.

basso_l

Hi Richard,
my local path the site is to be downloaded to is:
C:\Users\basso_l\Documents\Cyotek\www.allapari.regione.emilia-romagna.it

and the URL that failed is:
http://www.allapari.regione.emilia-romagna.it/temi/conciliazione-tra-vita-e-lavoro-1/allegati_conciliazione/ReviewoftheImplementationoftheBPfAWomenandtheEconomyReconciliationofWorkFamilyMainfindings.pdf/at_download/file/Review of the Implementation of the BPfA Women and the Economy Reconciliation of Work & Family -Main findings.pdf

The error we've got was:
200, The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.
(translated from the Italian error message)

WebCopy version in use is 1.0.9.1.

Please let me know if you need any more clarifications.

Regards,
Luca

Richard Moss

Wow... that is a long URL.

Thanks for the information. As it turns out, I finished fixing the form bug at the weekend and I've been looking at this one - I'd already determined the existing code had some holes in and was rewriting that particular section along with a bucket of proper tests - I'll run your path and URL through it too to make sure it works.

Thanks again for the info!

Regards;
Richard Moss

Richard Moss

As a quick follow-up, the details you provided above, when run through the updated code, now map to:

C:\Users\basso_l\Documents\Cyotek\www.allapari.regione.emilia-romagna.it\temi\conciliazione-tra-vita-e-lavoro-1\allegati_conciliazione\ReviewoftheImplementationoftheBPfAWomenandtheEconomyReconciliationofWorkFamilyMainfindings.pdf\at_download\file\Review of.pdf

Due to the length of the directory, the routine jumps up the tree until it finds a spot where the path is < 248 characters, and then tries to fit in the filename - in your example it's a very long filename and most of that context is lost, being reduced to Review of.pdf.
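
A rough sketch of that two-step routine, for illustration only: this is not the WebCopy source, the name `fit_path` is made up for the example, and it ignores the uniqueness handling mentioned elsewhere in the thread. The two limits are the ones quoted in the error message.

```python
import ntpath  # Windows path rules, usable on any platform

MAX_PATH = 260       # the full path must be shorter than this
MAX_DIRECTORY = 248  # the directory name must be shorter than this


def fit_path(directory, file_name):
    """Climb the directory tree until it fits, then trim the file name to fit."""
    # Step 1: jump up the tree while the directory itself is too long.
    while len(directory) >= MAX_DIRECTORY:
        parent = ntpath.dirname(directory)
        if parent == directory:  # reached the root, nothing left to drop
            raise ValueError("root directory is already too long")
        directory = parent

    # Step 2: work out how many characters are left for the file name stem,
    # allowing for one separator and the extension.
    stem, ext = ntpath.splitext(file_name)
    room = (MAX_PATH - 1) - (len(directory) + 1 + len(ext))
    if room < 1:
        raise ValueError("no room left for a file name")

    # Trim the stem; avoid a trailing space or dot, which Windows dislikes.
    short_stem = stem[:room].rstrip(" .") or "_"
    return ntpath.join(directory, short_stem + ext)
```

Trimming the stem from the right keeps the start of the original name, which is consistent with the long file name above ending up as Review of.pdf.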

While not entirely ideal, there's probably not much I can do. I'm not going to try to shrink the directory names themselves at this point in time; it's enough to cross the bug off the list knowing your file will be downloaded.

I might add some form of warning if the base path before crawling is already quite long - C:\Users\basso_l\Documents\Cyotek\www.allapari.regione.emilia-romagna.it\ is a lot of characters before you even start traversing the site.

Regards;
Richard Moss

basso_l

To conclude, what will your strategy be in the case of very long URLs? Will you issue a warning and let WebCopy continue its crawling? That would be a reasonable solution to the problem, though not the ideal one.
Thanks a lot,

Luca

Richard Moss

The warning would likely be something that is displayed when you change the project's URI or root folder and it determines that the combination of the two means space will be constrained. I certainly won't prevent that combination from being used, as even a path 248 characters long (the maximum you can use for a directory name) still leaves plenty of space for filenames, as WebCopy automatically creates unique file names where they conflict.
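
Such a check could look something like the sketch below. It is purely hypothetical, since no warning exists in WebCopy at the time of writing: the names `space_left` and `should_warn` and the threshold of 100 characters are assumptions, and it presumes the site is saved into a folder named after the host, as in the path quoted earlier.

```python
import ntpath  # Windows path rules, usable on any platform
from urllib.parse import urlsplit

MAX_PATH = 260    # the full path must be shorter than this
WARN_BELOW = 100  # hypothetical threshold, chosen only for this example


def space_left(root_folder, uri):
    """Characters left for the site-relative part of a path, before crawling."""
    host = urlsplit(uri).hostname or ""
    base = ntpath.join(root_folder, host)
    # Reserve one character for the separator that follows the base path.
    return (MAX_PATH - 1) - (len(base) + 1)


def should_warn(root_folder, uri):
    """True if the root folder and host together leave little room for files."""
    return space_left(root_folder, uri) < WARN_BELOW
```

The check runs once, when the project's URI or root folder changes, so it never interrupts a crawl in progress.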

Edit: If you mean warnings while the crawl is in progress, I'll have to think on that one - there isn't really a warning system built in as such. I certainly wouldn't interrupt the copy to display a message box, that would not be a welcome feature I suspect :)

Richard Moss

Hello,

WebCopy 1.0.10.0 has just gone live which ought to fix this one - there's a whole bunch of fixes, including a couple specifically related to the path shortening.

No warning yet as I forgot about it until writing this reply.

Let me know if you still encounter any issues.

Regards;
Richard Moss