Cyotek Forums

Products => WebCopy => Topic started by: lswang on January 18, 2016, 06:47:57 AM

Title: Encoding of URL's is invalid
Post by: lswang on January 18, 2016, 06:47:57 AM
Hi Richard,

Good morning.
I have downloaded the target website, and it is now much quicker to access.
But I am still facing some issues, which seem to be caused when the local links between storage folders are generated:
1. If the slug of a WP page is in English, the generated link and the local storage folder open without any problem.
(https://lh3.googleusercontent.com/SumVIciQE8rN0wuEE_Pexm2jnNXL2gSqrYC4cwaOM8Q4Jvo1H-srJEuFy7bGBRHfDiD4BMpByGYSs6Pt07bjWc8vf5m59UwBcJsgIXFqyUukoVKiQFQ5oZs7Y0A74PHl8txPZwQewOVZ7Nk6eeAwF2GdPxXw2f4GbQf33XrTD0jSLuoJQ7pp0OLx91H68T8WvvQxYIEueXBNTgKT25AeYoFqo8tNCRzcEjgnEnvmRUBREXQkQcNMfhpOmBBgLMKupkr2fuubodrMRddYfbq2xgJC1qe7PQMA8jPd4ruQP7RcKzxbxrQj5EPwlT9xrsPYpv8-PY00LBOEfbD1EbYKEkJIbRXhwrzyB5YUeitffJ-aO7Xibwp3Q7YwA2kfM2a8U9fCpdxh_hKGebOYDmPJa2sbML2krHFj7X3ct1J6MpoKOu_q5vbvUJPVLHnOxKXmz6EuMil2P-vKkvLlCJgziqL_-S54PIVWCr-plW_HXeds0rA2O-E9gg_UObfc1UD-c7ozXxWnCJZ2-AedNEHsZfO0UL-IJWUZlDGOa5k7GZsDyyaAxuufks7197dImNottOSO=w551-h479-no)

2. But if the page slug contains no English characters, only Chinese or other non-English characters, the page can't be opened locally afterwards:
(https://lh3.googleusercontent.com/vkGzsOnpj4U_q-h-nmegGxxqxJFohQL0kZnIeWXUpOZ8-q3LD8a1kJ1UT65CbwZ9LhrsaCSW6UTUH8xar3GM59gnIU-sah4fbWbSnmxJOOK7PutvL2E37RuVuxQgL-HuHUiWpjtUCac__64mJqb9CuXILLO-LJQYdPaw9QV9XnM0gqpGq5mbmcHDabXbf17OIc7X5QwM_yH5elDHWiPPGOlwc1R5fcZImYIVT9iJKCuuVLI4uoPwn5_K1mOJ5Rh-9xQAT--coQxSDFKRDMUKjr-i3XlbIaxCsuTmi77uey-JtjDrNnoFir8Pb7FQMGIKSoboVuHVvxpeAYbBAvZ2I60G5UUBbJ944_gn-QovMW0Gz9DF5ICHkkQE_7pqS9fmEcaX-44fahAhKm8UHsnH6gV3vmCxTBhNa-o3BUVvMH2QTGwSdqZ0VzQ5O5UvNL0NGbD1mw9OR4kK7qerFl0bCZYgA70JnsX4S1mnGz5HLF3ULdQlfFBLantekPanihQXEPUZi-6Rcq3e4UU8xhDtkv90NprhLj8yTnGO_dPSP7kYSouMAl4pTXOAgfYmsoD93M0O=w958-h608-no)

3. I think the cause might be a Unicode issue. If you search the folder carefully, you will find that the page has actually been downloaded, but it is stored in a folder with a complicated name:
(https://lh3.googleusercontent.com/0BvKiGTDTjSV-FrGjyDMPk7ELPP_ml_hMgQ9AlQTItcSb4Vc7rxT4kurU-eIuSAelNt2oNzJlafAVnyuyw4haCR9wCOIO8UEq_RfHZk-keP5yoPp-5ZBHmxjEdg8OeMQbDZR2fyKfJcxXR6ch8uulGsOiwYEQZAJ8fSYMzD2A6Y3tMRfaq_QK700wnhE2UVvn8TuWjB04gsWy7Z7GCTZbX8O481lNx7-PdJrTUxFoTOJexQuZ7qYnxGgT6deaUEF36hORG6-ChCrpXIy2G3kSIP0PWPgThhl4HYH1KPeAWUN5Aq_YheizZtAVlkP7TdW19VpWYYrhbyIXMBYgbocv9loDezNz9B35gxAu8BVut8SPHofhkCC7AoVED71qMoQIGYBjQ2RWxO2NtQQF82Qyg8avEiubM--xNQeM_cX5byTL_l28n6muPjM6xNSWzkgm6uq2L-Dg406vT5F0C5tTcxwKIVoL9mgXp3KjLNAyAXh7fhIawWSTRAqWlmPayKkW1IBOM6IAE1gCCs7dE2d4Q6OMrjd_SIzS9QDsx-BL7lJHc8bfBg9Ye_tZsVAeEluJESO=w1115-h955-no)
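For anyone following along, here is a rough Python sketch of one way a "complicated" folder name like this can arise. It is only an illustration, not a description of what WebCopy does internally: the slug "中文" is a made-up example, and the assumption is that the UTF-8 percent-encoded slug gets decoded with a single-byte encoding (Latin-1 here) somewhere along the way.

```python
from urllib.parse import quote, unquote

# Hypothetical non-English slug; not taken from the actual site.
slug = "中文"

# Browsers and WordPress percent-encode the UTF-8 bytes of the slug.
encoded = quote(slug)

# Decoding the percent-escapes as UTF-8 recovers the original slug.
decoded_ok = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as Latin-1 (an assumed failure mode) yields
# one garbage character per byte instead of one per Chinese character.
decoded_bad = unquote(encoded, encoding="latin-1")

# The damage is reversible as long as no bytes were dropped.
repaired = decoded_bad.encode("latin-1").decode("utf-8")

if __name__ == "__main__":
    print(encoded)
    print(decoded_ok == slug, decoded_bad == slug, repaired == slug)
```

If the link in the saved HTML and the folder name on disk end up decoded in different ways, the link no longer points at the folder, which would match the symptom described in points 2 and 3.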

Could you help check whether I can fix this problem on my side? Thank you.

Kind regards,
Louis
Title: Re: Encoding of URL's is invalid
Post by: Richard Moss on January 18, 2016, 04:28:19 PM
Hello,

I knew it was too good to be true ;) That is a bug - I'll take a look and see if I can get it fixed for the next update. Slightly odd, as I do actually have tests for Unicode URLs, but I shall do some more digging.

I doubt there's much you can do to resolve the issue - save a lot of manual search and replace.

Regards;
Richard Moss
Title: Re: Encoding of URL's is invalid
Post by: Richard Moss on January 25, 2016, 07:02:07 PM
I'm not able to reproduce this locally; my test pages that include Unicode characters in their names are processed correctly.

Assuming you saved your WebCopy project so all the link information is stored in it, can you send me the project file for analysis? (You can clear the rules and forms / passwords before sending)

Could you also send one of the HTML files that contains the "bad" characters so I can check its encoding? It seems that, for whatever reason, the pages aren't being saved as UTF-8 as the response headers indicate they should be.
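If you want to check this yourself before sending anything, something along these lines should do. It is a minimal Python sketch, not part of WebCopy: the helper names are made up for illustration, and it only tests whether the saved bytes decode as UTF-8 and what charset a Content-Type header value declares.

```python
from email.message import Message
from pathlib import Path


def charset_from_content_type(value):
    """Return the charset declared in a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def file_is_valid_utf8(path):
    """Check a saved file on disk; the path is whatever file you pick."""
    return is_valid_utf8(Path(path).read_bytes())
```

For example, `charset_from_content_type("text/html; charset=UTF-8")` gives the charset the server declared, and `file_is_valid_utf8(...)` on one of the affected HTML files tells you whether the saved bytes actually match it. Note that a pure ASCII file also passes the UTF-8 check, so the test is only meaningful for pages that contain non-English text.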

Also, as this is a different issue from your original post, I'm going to split this topic in two so there's one thread per issue.

Thanks;
Richard Moss