Hello there,
I've noticed that even the latest version of ODD (v3.1.0.1) does not properly encode URLs in the output file.
Let me detail the case:
- First, let's run ODD on a website (randomly found on the internet) whose paths contain special characters:
$ ./OpenDirectoryDownloader -u "https://gregoirelorieux.net/paysagescomposes/villes/Melle/" --output-file test
[...]
Finshed indexing
[...]
Saving URL list to file..
Saved URL list to file: /tmp/test.txt
- Then let's look at the first entries in the output file:
$ head test.txt
https://gregoirelorieux.net/paysagescomposes/villes/Melle/#3 21 jan/Melle/contrebasse-echantillons/cb-arco-1.aif
[...]
- If we try to download the first file with wget (or other download managers), it fails because the URL contains unencoded characters: "#" and spaces.
$ wget -v "https://gregoirelorieux.net/paysagescomposes/villes/Melle/#3 21 jan/Melle/contrebasse-echantillons/cb-arco-1.aif"
--2024-10-29 23:22:12-- https://gregoirelorieux.net/paysagescomposes/villes/Melle/
Resolving gregoirelorieux.net (gregoirelorieux.net)... 213.186.33.87
Connecting to gregoirelorieux.net (gregoirelorieux.net)|213.186.33.87|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 844 [text/html]
Saving to: ‘index.html’
index.html 100%[===============================================================================>] 844 --.-KB/s in 0s
2024-10-29 23:22:13 (550 MB/s) - ‘index.html’ saved [844/844]
Here, the downloaded file:
- is not the requested one:
https://gregoirelorieux.net/paysagescomposes/villes/Melle/#3 21 jan/Melle/contrebasse-echantillons/cb-arco-1.aif
- but comes from this automatically truncated link:
https://gregoirelorieux.net/paysagescomposes/villes/Melle/
wget treats everything from the first "#" onward as a URL fragment, which is never sent to the server, so only the part before it is requested.
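To illustrate the split, here is a small Python sketch using the standard library's `urlsplit`. It is only meant to show how a generic URL parser reads the unencoded link; it is not wget's actual code, and the variable names are mine.

```python
from urllib.parse import urlsplit

# The unencoded URL exactly as it appears in the ODD output file.
url = (
    "https://gregoirelorieux.net/paysagescomposes/villes/Melle/"
    "#3 21 jan/Melle/contrebasse-echantillons/cb-arco-1.aif"
)

parts = urlsplit(url)

# Everything after the first "#" is parsed as the fragment,
# so the path that a client would request stops at ".../Melle/".
requested_path = parts.path
dropped_fragment = parts.fragment

print("path:    ", requested_path)
print("fragment:", dropped_fragment)
```

The path ends at the directory, which matches the index.html that wget saved above.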
The correctly encoded link in the ODD output file should be:
https://gregoirelorieux.net/paysagescomposes/villes/Melle/%233%2021%20jan/Melle/contrebasse-echantillons/cb-arco-1.aif
Instead of:
https://gregoirelorieux.net/paysagescomposes/villes/Melle/#3 21 jan/Melle/contrebasse-echantillons/cb-arco-1.aif
Can you fix it?
The encodeURIComponent function (or an equivalent), applied to each path segment rather than to the whole URL, should help.
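As a rough illustration of the expected behavior, here is a minimal Python sketch. It is not ODD's code: the helper name `encode_path_segments` is hypothetical, and it assumes the tool knows the base URL and the raw directory and file names separately. `urllib.parse.quote` with `safe=""` stands in for encodeURIComponent here; the two differ slightly on a few punctuation characters, but both encode "#" and spaces.

```python
from typing import Iterable
from urllib.parse import quote


def encode_path_segments(base_url: str, raw_segments: Iterable[str]) -> str:
    """Append raw path segments to base_url, percent-encoding each one.

    Hypothetical helper for illustration only. Each segment is encoded
    on its own so that the "/" separators between segments are kept.
    """
    encoded = "/".join(quote(segment, safe="") for segment in raw_segments)
    return base_url.rstrip("/") + "/" + encoded


base = "https://gregoirelorieux.net/paysagescomposes/villes/Melle/"
segments = ["#3 21 jan", "Melle", "contrebasse-echantillons", "cb-arco-1.aif"]

encoded_url = encode_path_segments(base, segments)
print(encoded_url)
```

Encoding segment by segment matters: encoding the full URL in one call would also escape the ":" and "/" characters and break the link.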
Cheers!