Cyotek Forums

Products => WebCopy => Topic started by: basso_l on March 13, 2015, 02:01:30 PM

Title: Website crawl failure: path and/or namefile too long
Post by: basso_l on March 13, 2015, 02:01:30 PM
Trying to download one of our websites, we got a "Website copy canceled due to crawl failure" error, caused by a URL longer than 260 characters. Given that URLs longer than 260 characters are not uncommon on deeply nested websites, is there a way to overcome this limit? If not, is there a way to prevent Cyotek WebCopy from stopping, and instruct it to skip such long URLs instead?
Thanks.

Best regards,
Luca
Title: Re: Website crawl failure: path and/or namefile too long
Post by: Richard Moss on March 13, 2015, 04:42:40 PM
Hello,

I don't suppose you can give me examples of both the local path the site is to be downloaded to, and the URL that failed? (Masking the characters to keep them private is fine; I just need a complete path and URI so I can break up the segments.)

Also, what version of WebCopy are you using? This bug was actually fixed quite some time ago and is covered by tests, so if it breaks I should know immediately.

The way the existing fix works is that it determines what it *thinks* the file name should be, based on the URI and your project settings. If that's longer than or equal to MAX_PATH, it removes the file name part and tries to create a unique short version.
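For illustration, a minimal Python sketch of that older approach (WebCopy itself is a .NET application, and all names here are invented, not its actual code):

```python
import hashlib
import os

MAX_PATH = 260  # classic Windows limit for a fully qualified file name


def shorten_if_needed(directory, file_name):
    """If the combined path would be MAX_PATH or longer, replace the file
    name with a short unique stand-in derived from the original name.
    Illustrative sketch only; not WebCopy's actual code."""
    full_path = os.path.join(directory, file_name)
    if len(full_path) < MAX_PATH:
        return full_path
    ext = os.path.splitext(file_name)[1]
    # Derive a short, deterministic stem from the original file name
    stem = hashlib.md5(file_name.encode("utf-8")).hexdigest()[:8]
    # Note: if `directory` alone is already near the limit, even the
    # shortened name can exceed MAX_PATH -- the gap this thread exposes.
    return os.path.join(directory, stem + ext)
```

Note the weakness visible in the final comment: shortening only the file name is not enough when the directory portion itself is already close to the limit, which is consistent with the failure reported above.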

Oddly enough, I was actually looking at implementing AlphaFS to allow very long paths, but I've parked it for now as there are other, more important things to resolve.

If you can provide the two values hopefully this should pinpoint the flaw in the existing solution so I can fix it for the next update.

Thanks for the bug report!

Regards;
Richard Moss
Title: Re: Website crawl failure: path and/or namefile too long
Post by: basso_l on March 18, 2015, 01:20:02 PM
Hi Richard,
my local path the site is to be downloaded to is:
C:\Users\basso_l\Documents\Cyotek\www.allapari.regione.emilia-romagna.it

and the URL that failed is:
http://www.allapari.regione.emilia-romagna.it/temi/conciliazione-tra-vita-e-lavoro-1/allegati_conciliazione/ReviewoftheImplementationoftheBPfAWomenandtheEconomyReconciliationofWorkFamilyMainfindings.pdf/at_download/file/Review of the Implementation of the BPfA Women and the Economy Reconciliation of Work & Family -Main findings.pdf

The error we've got was:
200,Percorso e/o nome di file specificato troppo lungo. Il nome di file completo deve contenere meno di 260 caratteri, mentre il nome di directory deve contenere meno di 248 caratteri.
(200, The specified path and/or file name is too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.)

WebCopy version in use is 1.0.9.1.

Please let me know if you need any more clarifications.

Regards,
Luca
Title: Re: Website crawl failure: path and/or namefile too long
Post by: Richard Moss on March 18, 2015, 04:31:04 PM
Wow... that is a long URL.

Thanks for the information. As it turns out, I finished fixing the form bug at the weekend and have been looking at this one. I'd already determined that the existing code had some holes in it, so I was rewriting that particular section along with a bucket of proper tests; I'll run your path and URL through it too to make sure it works.

Thanks again for the info!

Regards;
Richard Moss
Title: Re: Website crawl failure: path and/or namefile too long
Post by: Richard Moss on March 18, 2015, 05:52:50 PM
As a quick follow up: the details you provided above, when run through the updated code, now map to

C:\Users\basso_l\Documents\Cyotek\www.allapari.regione.emilia-romagna.it\temi\conciliazione-tra-vita-e-lavoro-1\allegati_conciliazione\ReviewoftheImplementationoftheBPfAWomenandtheEconomyReconciliationofWorkFamilyMainfindings.pdf\at_download\file\Review of.pdf

Due to the length of the directory, the routine jumps up the tree until it finds a spot where the path is < 248 characters, and then tries to fit in the filename. In your example the filename is very long, so most of that context is lost and it is reduced to Review of.pdf.
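A rough sketch of that updated behaviour (Python for illustration; the function and constant names are invented, not WebCopy's actual routine):

```python
import os

MAX_PATH = 260       # fully qualified file name limit
MAX_DIRECTORY = 248  # directory name limit


def fit_local_path(directory, file_name):
    """Walk up the directory tree until the directory part is shorter
    than MAX_DIRECTORY, then truncate the file name's stem (keeping the
    extension) so the whole path fits under MAX_PATH. A sketch of the
    behaviour described above, not WebCopy's actual code."""
    while len(directory) >= MAX_DIRECTORY:
        directory = os.path.dirname(directory)  # jump one level up
    stem, ext = os.path.splitext(file_name)
    # Space left for the stem: total minus directory, separator, extension
    budget = max(1, MAX_PATH - 1 - len(directory) - 1 - len(ext))
    return os.path.join(directory, stem[:budget] + ext)
```

With a directory near the 248-character ceiling, the remaining budget for the stem is only a handful of characters, which is why a long filename collapses to something as terse as Review of.pdf.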

While not entirely ideal, there's probably not much more I can do. I'm not going to try to shrink directory names themselves at this point in time; it's enough to cross the bug off the list knowing your file will be downloaded.

I might add some form of warning if the base path before crawling is already quite long - C:\Users\basso_l\Documents\Cyotek\www.allapari.regione.emilia-romagna.it\ is a lot of characters before you even start traversing the site.

Regards;
Richard Moss
Title: Re: Website crawl failure: path and/or namefile too long
Post by: basso_l on March 19, 2015, 11:41:03 AM
To conclude, what will your strategy be in the case of very long URLs? Will you issue a warning and let WebCopy continue its crawl? That would be a reasonable solution to the problem, though not the ideal one.
Thanks a lot,

Luca
Title: Re: Website crawl failure: path and/or namefile too long
Post by: Richard Moss on March 19, 2015, 04:31:15 PM
The warning would likely be something that is displayed when you change the project's URI or root folder and the combination of the two means that space will be constrained. I certainly won't prevent users from continuing, as even if they specified a path 248 characters long (the maximum you can use for a directory name), that still leaves space for filenames, and WebCopy automatically creates unique file names where they conflict.
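The unique-file-name behaviour mentioned here can be sketched as follows (hypothetical Python illustrating the general counter-suffix technique, not WebCopy's code):

```python
import os


def unique_file_name(file_name, existing_names):
    """Return file_name unchanged, or a numbered variant ("name (1).pdf",
    "name (2).pdf", ...) if it clashes with a name already written.
    A sketch of the general technique, not WebCopy's actual code."""
    stem, ext = os.path.splitext(file_name)
    candidate = file_name
    counter = 1
    while candidate in existing_names:
        candidate = f"{stem} ({counter}){ext}"
        counter += 1
    return candidate
```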

Edit: If you mean warnings while the crawl is in progress, I'll have to think on that one - there isn't really a warning system built in as such. I certainly wouldn't interrupt the copy to display a message box, that would not be a welcome feature I suspect :)
Title: Re: Website crawl failure: path and/or namefile too long
Post by: Richard Moss on March 22, 2015, 07:27:40 AM
Hello,

WebCopy 1.0.10.0 has just gone live which ought to fix this one - there's a whole bunch of fixes (http://www.cyotek.com/cyotek-webcopy/revision-history#1-0-10-0), including a couple specifically related to the path shortening.

No warning yet as I forgot about it until writing this reply.

Let me know if you still encounter any issues.

Regards;
Richard Moss