Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Link checking - internal anchor references #1630

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

kingdonb
Copy link
Member

@kingdonb kingdonb commented Aug 21, 2023

This PR implements:

I'm testing the new Dockerized version of the link-checker-gpt, that helped us fix all the broken links in #1573

It should be possible to merge this if we are happy with it, independent of the release for Flux v2.1 (it can certainly wait until after 👍)

Assuming this allows me to test the new workflow, kingdonb/link-checker@v1-beta – it should pass because this PR doesn't change anything, and any bad links that pre-exist on the site don't count against the PR.

Assuming that works then, I'll push a bad link as test to prove this catches it, there is one known race condition I would like to resolve so this is a Draft but it should be very close to release quality, (I hope I've read all the relevant GHA docs) that race condition is solved 🤞

@kingdonb kingdonb force-pushed the add-link-checker branch 4 times, most recently from 4b1da07 to 027b3b1 Compare August 21, 2023 18:25
@kingdonb
Copy link
Member Author

kingdonb commented Aug 21, 2023

Alright, I've resolved all known issues and published a v0.1.0-beta.1 tag (v1-beta) of the GitHub Action, Dockerized that I believe we could merge here and enable as a check on the fluxcd/website repo today, of course it needs some reviews first.

Here come the test commit (it should fail) and the revert (where, if the race timing is right, you should be able to see the next build waits for deploy, based on the checks from its own PR.)

I've taken some care to make sure this only looks at its own deploy preview, or a newer one. It should not give false positives, it should wait until the deploy preview is ready (and yes, even if it's already been ready, if the deploy preview is currently executing and hasn't passed yet, it will wait another 60s. Since this subsequent commit needs to be allowed to advance the deploy preview to the current state...)

I'm not aware of any remaining issues with the workflow that would prevent me from tagging this as v1.0.0, it checks all the boxes except for looking pretty, I had a few more features I was thinking of adding (non-essential) but since nobody else has tested it yet I thought that would be a priority.

Since it requires access to a GITHUB_TOKEN to access the API (gh cli does), this could potentially warrant a bit of more detailed review and scrutiny. It's not doing anything nefarious with it, this is eventually going to enable the check to comment and nag us when there are latent bad links, but it doesn't do any such thing today. Only reading checks.

@kingdonb
Copy link
Member Author

kingdonb commented Aug 21, 2023

Oops, this type of bad link is not caught:

Screenshot 2023-08-21 at 2 37 08 PM

(this can be forgiven though, since it's not a link at all, but maybe another enhancement for future versions)

❌ The test at 0848a73 did not fail as expected, but fingers crossed for 6ffc756

@kingdonb
Copy link
Member Author

kingdonb commented Aug 21, 2023

The commit failed like it should, however there are a couple of pre-existing bad links that get swept into the report (Edit: I understand why this happens now, it's a feature... kingdonb/link-checker-gpt#20).

Since we're fixing almost all of the bad links in #1573 and it's kind of a feature anyway, I don't foresee this being a blocker to merging.

@kingdonb kingdonb marked this pull request as ready for review August 21, 2023 19:31
kingdonb pushed a commit to kingdonb/link-checker-gpt that referenced this pull request Aug 21, 2023
Signed-off-by: Kingdon Barrett <kingdon@weave.works>
@kingdonb kingdonb changed the title Test link-checker-gpt Link checking - internal anchor references Aug 21, 2023
Kingdon Barrett added 3 commits September 7, 2023 12:35
This PR implements:

* https://github.com/kingdonb/link-checker-gpt/tree/v1-beta#integration-as-a-ci-check

I'm testing the new Dockerized version of the link-checker-gpt, that
helped us fix all the broken links in #1573

It should be possible to merge this if we are happy with it, independent
of the release for Flux v2.1 (it can certainly wait until after 👍)

I'm going to push a broken link as a test.

There has been some thought put into making a good workflow in a way
such that we don't want to 86 it right away... minor annoyances should
all be squashed prior to their annoying anyone

* add the workflow to .github/workflows

* fix the bug (upstream)

* provide GH_TOKEN to link checker
* (fix github token)

Signed-off-by: Kingdon Barrett <kingdon@weave.works>
Add a bad link id anchor that doesn't go anywhere (but isn't a 404)

Signed-off-by: Kingdon Barrett <kingdon@weave.works>
This reverts commit 6ffc756.

Signed-off-by: Kingdon Barrett <kingdon@weave.works>
@kingdonb
Copy link
Member Author

It is possible to repair the damage, assuming that we've broken links, but it really only matters if we are serving up 404s.

(Is it even possible to get the list of all 404s we've ever served up out of Netlify? That would be the easiest way to address the issue, by adding a redirect for each link that is verified to still be in use. We don't need to preserve hard links that nobody ever linked to, if we've cleaned up any references from fluxcd.io that we might have once made before by ourselves! But if anybody else linked to the Flux docs at a specific anchor, we definitely want those links to go somewhere still relevant.)

We can't very well delete any anchors (or pages with anchors on them) if we don't have a system to address this. I'm not confident in the work I've put here in #1630 but I think it's worth another iteration or two, and it showed promise at being able to solve at least 1/3 of the issue - the internal consistency part.

#1658 found at least one case where it did not work

I will come back to this shortly, in the mean time if others are interested: https://github.com/kingdonb/link-checker-gpt

To be clear there's nothing GPT about this link checker, it's doing reference checking with absolutely no AI inferencing or LLMs involved. The code was written by ChatGPT, but the tests are going to have to be written by humans I'm afraid 🤣

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant