Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove mastodon.archive.org from excluded urls #976

Merged
merged 3 commits into from
Dec 20, 2022
Merged

Conversation

anishsarangi
Copy link
Collaborator

Make changes in isArchiveUrl function to include mastodon.archive.org as a supported url in the extension.

@vbanos
Copy link
Collaborator

vbanos commented Dec 14, 2022

@anishsarangi I see that we block archive.org and we allow mastodon.archive.org.

Allowing/Blocking a URL is happening in SPN, it shouldn't happen in the browser extension.
You should allow all URLs and the SPN will decide to capture it or not.
(allow/block has so many rules, e.g. we also also blog.archive.org and help.archive.org).

So, please drop isArchiveUrl altogether. Thanks.

@anishsarangi
Copy link
Collaborator Author

@vbanos We are showing based "URL not supported" on the UI based on isArchiveUrl, isNotExcludedUrl and isValidUrl. How can we show it without storing urls? Is there any api to check from SPN ?

@cgorringe
Copy link
Member

Would if we did return (hostname === 'archive.org') || (hostname === 'web.archive.org')? I think that should allow for mastodon., blog., and help. to be archivable. (I haven't tested this.)

Alternately, could simply return false to accept all archive domains to test it rather than remove all the calls to isArchiveUrl().

These are just suggestions. I do think there is some benefit in having the UI display "URL not supported" in known cases. I'm not sure what is the best solution.

@cgorringe
Copy link
Member

I'd recommend keeping it for web.archive.org.
When I tested it set to return false, the Wayback Count badge had some issues while viewing archived pages.

@vbanos
Copy link
Collaborator

vbanos commented Dec 20, 2022

@anishsarangi please check again the SPN public API error codes.
https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit#heading=h.2bu3y5mtzecu
If a URL is blocked the API will instantly return a specific error code.

{"message":"This URL is in the Save Page Now service block list and cannot be captured. Please email us at \"info@archive.org\" if you would like to discuss this more.","status":"error","status_ext":"error:blocked-url"}

@cgorringe we may need to capture archive.org static pages, e.g. https://archive.org/about
You can't just block archive.org.

Again, the logic to allow/block a URL is too complicated. The SPN2 back-end is taking care of that.
You shouldn't do any blocking in the extension besides some basic URL validation.

@cgorringe
Copy link
Member

@vbanos My recommendation for blocking web.archive.org does not include archive.org so those static pages can be saved.

Like I said, I tested it. There are other issues with Wayback Count to resolve if we don't block web.archive.org. I still recommend blocking this. (as Anish has just updated.)

@anishsarangi
Copy link
Collaborator Author

@vbanos I think what @cgorringe suggests makes a valid point. I have just updated the code. Please have a look at it so that we can merge the PR.

@vbanos
Copy link
Collaborator

vbanos commented Dec 20, 2022

OK, I've checked it and I think that its OK.

@anishsarangi
Copy link
Collaborator Author

Okay i am merging this PR.

@anishsarangi anishsarangi merged commit 87637fe into master Dec 20, 2022
@anishsarangi anishsarangi deleted the mastodon-issue branch December 20, 2022 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants