URLs are not added to list of documents to be scanned #951

luzidl · 2024-12-17T10:00:05Z

The following issue is seriously affecting our use of the tool at the moment. When adding URLs from websites, the tool skips URLs that partially share the same structure as a URL that was already added. See the example below:

For example, I added the URL

https://www.bkw.ch/de/energie/energiebeschaffung-fuer-geschaeftskundinnen-und-geschaeftskunden/energy-relax

then added the URL

https://www.bkw.ch/de/energie/energiebeschaffung-fuer-geschaeftskundinnen-und-geschaeftskunden/energy-relax-in-tranchen

but the latter is never added to the list of documents to be scanned. It seems that nothing starting with the first portion of the URL (i.e. this portion: https://www.bkw.ch/de/energie/energiebeschaffung-fuer-geschaeftskundinnen-und-geschaeftskunden/) gets added either!
I have also seen this with websites that have a page counter at the end, e.g. page=1 ... n.
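One way this symptom could arise, offered purely as a hypothesis (it is not taken from the tool's source code), is a duplicate check that matches on a URL prefix or ignores the query string instead of comparing whole URLs. The Python sketch below uses hypothetical function names to contrast those two faulty checks with an exact comparison. With the prefix check, the energy-relax-in-tranchen URL is rejected because it starts with the energy-relax URL; with the query-stripping check, a page=2 URL collides with the page=1 URL.

```python
from urllib.parse import urlsplit

# Hypothetical duplicate checks for illustration only; none of these
# functions come from the tool's actual code.

def is_duplicate_prefix(new_url: str, existing: list[str]) -> bool:
    """Faulty variant: any URL that starts with a known URL counts as a duplicate."""
    return any(new_url.startswith(url) for url in existing)

def is_duplicate_no_query(new_url: str, existing: list[str]) -> bool:
    """Faulty variant: the query string is ignored, so ?page=2 matches ?page=1."""
    def without_query(u: str) -> tuple[str, str, str]:
        parts = urlsplit(u)
        return (parts.scheme, parts.netloc, parts.path)
    return any(without_query(new_url) == without_query(url) for url in existing)

def is_duplicate_exact(new_url: str, existing: list[str]) -> bool:
    """Expected behaviour: only an identical URL counts as a duplicate."""
    return new_url in existing
```

Under this assumption, switching the check to whole-URL equality (optionally after a deliberate normalization step) would let both the longer sibling URL and every page=N URL through.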

Can you please fix this bug as soon as possible?

Thanks in advance!

jexp · 2024-12-18T09:46:22Z

For the time being, you can also save the webpages as PDFs and upload them as a workaround.

We're looking into it.

Sometimes the text of a webpage is not added because it is generated by JavaScript and is not present in the actual HTML, but that doesn't seem to be the case here.
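To check whether a page's text is actually present in the static HTML (as opposed to being rendered by JavaScript), one can fetch the raw HTML and search it for a phrase that is visible in the browser. The sketch below is a hypothetical helper built only on the Python standard library (html.parser and urllib.request); the function names are illustrative. It drops script and style contents and runs no JavaScript, so it approximates what a non-rendering scraper would see, not necessarily what the tool itself extracts.

```python
import urllib.request
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects text nodes from static HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def static_text(html: str) -> str:
    """Return the text present in the HTML source itself (no JavaScript is executed)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

def phrase_in_static_html(html: str, phrase: str) -> bool:
    """Case-insensitive check whether a phrase appears in the static page text."""
    return phrase.lower() in static_text(html).lower()

def fetch_html(url: str) -> str:
    """Download the raw HTML of a page, decoding with the charset the server declares."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For example, if phrase_in_static_html(fetch_html(url), "some phrase you can see in the browser") returns False, the text is probably generated client-side; if it returns True, JavaScript rendering is unlikely to explain why the URL was not added.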

kartikpersistent added the bug label (Something isn't working) on Dec 17, 2024
