Author Topic: Absolute Paths Not Being Rewritten

Offline shikage

  • Newbie
  • *
  • Posts: 2
  • Karma: +0/-0
Absolute Paths Not Being Rewritten
« on: March 08, 2019, 12:34:36 AM »
Greetings. I've done a cursory search of the site, but nothing recent seems to cover the issue I am encountering. I am attempting to save a site for offline viewing, but most links on this site use absolute paths for their hrefs, such as /mypage/childpage1, and these are not being rewritten when the site is saved. What should be a link to file:///C:/savedsites/mysite/mypage/childpage1 is instead a link to file:///mypage/childpage1, which does not work. Is there a way to address this within the project settings that I'm missing?
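To illustrate why an unrewritten link of this kind breaks offline, here is a minimal Python sketch using the standard library's URL resolution. The saved page location (index.html under C:/savedsites/mysite) is an assumption for illustration, and this only mimics how a browser resolves the href, not anything WebCopy does.

```python
from urllib.parse import urljoin

# Hypothetical location of a saved page on disk (assumed for illustration).
saved_page = "file:///C:/savedsites/mysite/index.html"

# The href as it appears in the original site's markup.
href = "/mypage/childpage1"

# A leading "/" means "start from the root", so the resolver discards
# the C:/savedsites/mysite prefix instead of appending to it.
resolved = urljoin(saved_page, href)
print(resolved)
```

The leading slash replaces the whole path of the saved page, which is why the link no longer points inside the saved copy.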

Offline Richard Moss

  • Cyotek Team
  • Administrator
  • Sr. Member
  • *****
  • Posts: 310
  • Karma: +17/-0
    • cyotek.com
Re: Absolute Paths Not Being Rewritten
« Reply #1 on: March 09, 2019, 11:56:16 AM »
Hello,

WebCopy should process these links correctly and convert them into the appropriate relative path for the offline copy. Are you able to share the address of a page that is affected by the issue so that I can run some tests? (If you don't want to share it publicly, you can send a message or an email.)
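As a rough sketch of what converting a root-relative link into a relative path can look like, the Python below computes the path from the directory of the page to the link target. The function name to_relative and the example.com URLs are hypothetical, and this is not WebCopy's actual implementation; it also ignores query strings and fragments.

```python
import posixpath
from urllib.parse import urlsplit

def to_relative(page_url: str, href: str) -> str:
    """Return href expressed relative to the directory containing page_url."""
    # Directory of the page that holds the link; fall back to the site root.
    page_dir = posixpath.dirname(urlsplit(page_url).path) or "/"
    # Path part of the link target (query and fragment are dropped here).
    target = urlsplit(href).path
    return posixpath.relpath(target, start=page_dir)

# A link to a sibling page only needs the file name.
print(to_relative("https://example.com/mypage/index.html", "/mypage/childpage1"))
# → childpage1

# A link from a deeper page has to climb back up first.
print(to_relative("https://example.com/courses/a/lesson1", "/mypage/childpage1"))
# → ../../mypage/childpage1
```

Because the result no longer starts at the root, it keeps working wherever the saved folder ends up on disk.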

Thanks;
Richard Moss

Offline shikage

Re: Absolute Paths Not Being Rewritten
« Reply #2 on: March 09, 2019, 05:35:18 PM »
It's an LMS site. I have enrolled in a number of their courses and want to save these web-based courses offline so I can read through them more easily on the bus and while traveling. You can find them at https://courses.nihongoshark.com/

Unfortunately, when pulling the content without logging in, the links all work and get rewritten, at least for the limited content it has access to; most of the pages just ask you to log in to actually view the content. When I follow the steps to log in using the web browser, I am able to authenticate and it pulls down all of the pages, but the tiles that link to courses have incorrect URLs, and if I open a course directly, none of the side nav links work.

EDIT: I tried copying again, and while the main page isn't working, if I go to one of the courses I can navigate the course and read the content, which is really my primary goal. So I think I am fine.
« Last Edit: March 09, 2019, 06:26:32 PM by shikage »

Offline RichardDavies

  • Newbie
  • *
  • Posts: 1
  • Karma: +0/-0
Re: Absolute Paths Not Being Rewritten
« Reply #3 on: May 01, 2019, 10:00:36 PM »
There's definitely a problem with it not rewriting absolute URLs. I've experienced this issue on two different websites, for example http://www.csszengarden.com/

Offline Richard Moss

Re: Absolute Paths Not Being Rewritten
« Reply #4 on: May 02, 2019, 06:53:05 PM »
Hello both,

Thanks for the follow-up. I'm just starting work on 1.8 now and I'll definitely run some tests on CSS Zen Garden (wow, can't believe that is still around; I remember buying the book a long time ago). It should still remap absolute URLs, but if there's a bug and I can find it, I'll get it fixed as soon as possible.

Regards;
Richard Moss

Offline Richard Moss

Re: Absolute Paths Not Being Rewritten
« Reply #5 on: May 03, 2019, 08:01:37 PM »
Hello,

I just tested absolute paths and can confirm they aren't being remapped. I should be used to writing stupid bugs by now, but even so this is pretty silly :-[. I'll probably push a new 1.7 release in the next day or two with a fix for this, rather than waiting for 1.8.

Regards;
Richard Moss

Offline Richard Moss

Re: Absolute Paths Not Being Rewritten
« Reply #6 on: May 04, 2019, 04:21:45 PM »
Hello,

In contradiction to my previous post, I haven't been able to reproduce any issues with absolute paths at all. When I ran a test yesterday, I misread the results. The demo site includes the URLs as title attributes as well, and naturally WebCopy doesn't look at title attributes, so when I opened the file and saw an absolute path I immediately assumed the worst instead of spending half a second more to look harder.

In addition, today I downloaded the CSS Zen Garden website. Out of the 6000+ HTML files generated by this, only one has an absolute reference left behind - this is for the RSS link on the main page, although I wasn't able to reproduce this with a smaller scan (the full scan took long enough that I'm not willing to do it again!). Definitely something to investigate, but I'm unable to reliably reproduce the problem in order to fix it.

The only concrete way I know of to cause this is by cancelling a crawl - if a crawl is cancelled (either manually or automatically), the stage that remaps local files will be skipped. However, this means that none of the links are remapped, not just absolute ones.

If you're able to specify an exact set of circumstances where absolute links aren't being remapped, that would be of great help.
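One way to narrow down such circumstances might be to scan the saved HTML for root-relative links that were left behind. The Python sketch below only inspects href and src attributes, so a path that merely appears in a title attribute is not reported. The class and function names are hypothetical, and which attributes WebCopy actually remaps is an assumption here.

```python
from html.parser import HTMLParser

class RootRelativeLinkFinder(HTMLParser):
    """Collects href/src values that still start at the site root."""

    # Assumed set of link-carrying attributes; title is deliberately excluded.
    LINK_ATTRS = {"href", "src"}

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Skip protocol-relative URLs such as //cdn.example.com/x.js.
            if (name in self.LINK_ATTRS and value
                    and value.startswith("/") and not value.startswith("//")):
                self.found.append((tag, name, value))

def find_unmapped(html: str):
    """Return (tag, attribute, value) tuples for root-relative links in html."""
    finder = RootRelativeLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.found

sample = (
    '<a href="/mypage/childpage1" title="/mypage/childpage1">one</a>'
    '<a href="childpage2" title="/mypage/childpage2">two</a>'
)
print(find_unmapped(sample))
# → [('a', 'href', '/mypage/childpage1')]
```

Running find_unmapped over each saved file would give a list of pages and links to include in a report.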

Regards;
Richard Moss