Hello,
The issue appears to be that the image isn't actually referenced in the raw HTML, but is populated by JavaScript. As I've noted many times, WebCopy doesn't execute JavaScript.
So, using an example from your screenshot, the following is part of the raw HTML downloaded by the browser or by tools like WebCopy:
<a tabindex="-1" href="#" class="pushed" data-caption="1" data-deep="gallery-742680_77759" data-lbox="ilightbox_gallery-742680_77759" data-options="width:2500,height:1866,thumbnail: 'https://vicenteromeroredondo.com/wp-content/uploads/2023/01/130x97-cm.jpg'" data-album='[{"title":"","caption":"","width":"2500","height":"1866","thumbnail":"https://vicenteromeroredondo.com/wp-content/uploads/2023/01/130x97-cm.jpg","url":"https://vicenteromeroredondo.com/wp-content/uploads/2023/01/130x97-cm.jpg"}]' data-lb-index="0">
The href attribute is "#", essentially pointing back to the parent page, so WebCopy ignores it.
Once the JavaScript has run, the href is populated:
<a tabindex="-1" href="https://vicenteromeroredondo.com/wp-content/uploads/2023/01/130x97-cm.jpg" class="pushed" data-caption="1" data-deep="gallery-742680_77759" data-lbox="ilightbox_gallery-742680_77759" data-options="width:2500,height:1866,thumbnail: 'https://vicenteromeroredondo.com/wp-content/uploads/2023/01/130x97-cm.jpg'" data-album='[{"title":"","caption":"","width":"2500","height":"1866","thumbnail":"https://vicenteromeroredondo.com/wp-content/uploads/2023/01/130x97-cm.jpg","url":"https://vicenteromeroredondo.com/wp-content/uploads/2023/01/130x97-cm.jpg"}]' data-lb-index="0" data-lbox-init="true">
Unfortunately, while WebCopy can read data from custom attributes (such as data-album above), it wasn't really designed to extract bits out of them. However, by combining a couple of features, we can at least extract the images, although the a tags won't be updated with the true URL.
Firstly, you need to tell WebCopy where to find the extra URLs:
- Project Properties | Advanced | Custom Attributes
- Value: //a/@data-album
(Documentation link: https://docs.cyotek.com/cyowcopy/current/customattributes.html)
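To illustrate what that custom attribute setting picks up, here is a minimal Python sketch that collects the data-album value from every a tag, roughly what the //a/@data-album expression selects. This is not how WebCopy works internally; the class and function names are made up for the example, and the sample markup is a trimmed copy of the tag above.

```python
from html.parser import HTMLParser


class DataAlbumCollector(HTMLParser):
    """Collects the data-album attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "data-album" and value is not None:
                    self.values.append(value)


def collect_data_album(html):
    """Return the raw data-album strings found in the given HTML."""
    parser = DataAlbumCollector()
    parser.feed(html)
    parser.close()
    return parser.values


# Trimmed version of the tag from the raw HTML (most attributes removed for brevity)
SAMPLE_HTML = (
    '<a tabindex="-1" href="#" class="pushed" '
    'data-album=\'[{"title":"","url":"https://vicenteromeroredondo.com/'
    'wp-content/uploads/2023/01/130x97-cm.jpg"}]\' data-lb-index="0">'
)

for value in collect_data_album(SAMPLE_HTML):
    print(value)
```

Note that each value found this way is the whole block of JSON rather than a URL.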
As the blocks of JSON extracted by this method aren't valid URLs, we need to use URL Transforms to discard the bulk of the JSON and keep just one attribute. I went with url in this case (again, unfortunately, WebCopy wasn't designed to pull multiple URLs out of a single value, except in some very specific places).
- Project Properties | Advanced | URL Transforms
- Add a new transform
- Expression: \[{(.*?)"url":"(.*?)"}\]
- Replacement: $2
(Documentation link: https://docs.cyotek.com/cyowcopy/current/uritransforms.html)
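If you want to check what the expression does before adding it to a project, the following Python sketch applies the same pattern to the data-album value from the example tag. It is only an approximation: WebCopy evaluates the transform itself, Python writes the replacement as \2 rather than $2, and the transform function name is made up for the example.

```python
import re

# Same expression as the URL transform; group 2 captures the value of "url"
ALBUM_PATTERN = re.compile(r'\[{(.*?)"url":"(.*?)"}\]')


def transform(value):
    """Replace a matching JSON block with its url value; other input is unchanged."""
    return ALBUM_PATTERN.sub(r"\2", value)


# The data-album value from the example tag
album = (
    '[{"title":"","caption":"","width":"2500","height":"1866",'
    '"thumbnail":"https://vicenteromeroredondo.com/wp-content/uploads/2023/01/130x97-cm.jpg",'
    '"url":"https://vicenteromeroredondo.com/wp-content/uploads/2023/01/130x97-cm.jpg"}]'
)

print(transform(album))
# prints https://vicenteromeroredondo.com/wp-content/uploads/2023/01/130x97-cm.jpg
```

In this sketch, values that don't match the expression, such as ordinary URLs, pass through untouched. An album holding more than one entry would not reduce to a single clean URL, which fits the limitation mentioned above.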
With the above in place, "https://vicenteromeroredondo.com/wp-content/uploads/2023/01/130x97-cm.jpg" (and more!) is generated as a URL for WebCopy to scan.
Regards,
Richard Moss