Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack CI times out for Windows #6128

Closed
andreasabel opened this issue Sep 23, 2022 · 7 comments · Fixed by #6216
Closed

Stack CI times out for Windows #6128

andreasabel opened this issue Sep 23, 2022 · 7 comments · Fixed by #6216
Assignees
Labels
infra: github workflows Issues related to GitHub workflows and actions (not in changelog) platform: windows Any issue specific to Microsoft Windows release blocker Issues blocking the next Agda release stack Concerning building with stack
Milestone

Comments

@andreasabel
Copy link
Member

Lifted from #6116 (comment)
Breakage occurred before merging changes to stack-9.2.4.yaml in course of #6116: https://github.com/agda/agda/actions/runs/3083630490/jobs/4984952651

@andreasabel andreasabel added platform: windows Any issue specific to Microsoft Windows infra: github workflows Issues related to GitHub workflows and actions (not in changelog) labels Sep 23, 2022
@andreasabel
Copy link
Member Author

I am afraid I have no clue why stack install suddenly stalls under the windows-2022 virtual environment, and what upstream change caused this breakage. I currently have no idea how to fix this. One could randomly downgrade stack/ghc/windows or wait for stack 2.9.1 to become available.

@andreasabel
Copy link
Member Author

andreasabel commented Oct 23, 2022

I managed to reproduce this behavior on my Windows machine.
Looks like our Setup script hangs.
We need to investigate this before the release!

2022-10-23 10:11:37.457678: [info] Agda       > copy/register
2022-10-23 10:11:37.457678: [debug] Run process within agda\: 
agda\.stack-work\dist\22605e11\setup\setup.EXE --verbose=1 --builddir=.stack-work\dist\22605e

This is the last message. In the task manager, I can see setup running, and if I kill it, stack terminates as well. So, it is definitely waiting for setup to finish.

@andreasabel andreasabel added release blocker Issues blocking the next Agda release and removed infra: github workflows Issues related to GitHub workflows and actions (not in changelog) labels Oct 23, 2022
@andreasabel andreasabel added this to the 2.6.3 milestone Oct 23, 2022
@nad
Copy link
Contributor

nad commented Oct 23, 2022

Is this a regression? In that case you could use bisection.

@andreasabel
Copy link
Member Author

According to the CI runs on master, https://github.com/agda/agda/actions/workflows/stack.yml?page=3&query=branch%3Amaster, the first commit that broke it was your (@nad) commit https://github.com/agda/agda/actions/runs/3083630490, see workflow run https://github.com/agda/agda/actions/runs/3083630490. But this commit just changed documentation. So I think bisection cannot help here.

Concerning CI run on branches, your run https://github.com/agda/agda/actions/runs/3083625202/jobs/4984786370 failed as well. This is

which you did not describe, so I added the description

(Andreas: Goal of this PR: speed up generation of .agdai files during setup.)

Bingo! This looks like a hit.
But:

  • Why is there no CI run for merging this into master?
  • You did not seem to check CI results before merging this, why?

andreasabel added a commit that referenced this issue Oct 23, 2022
Regression introduced by #6110.

The output of the created Agda process was never read,
resulting in a blocked pipe when the buffer was full.

Using the simpler `readCreateProcess` takes care of
the correct pipe handling for us.
andreasabel added a commit that referenced this issue Oct 23, 2022
Regression introduced by #6110.

The output of the created Agda process was never read,
resulting in a blocked pipe when the buffer was full.

Using the simpler `readCreateProcess` takes care of
the correct pipe handling for us.
@nad
Copy link
Contributor

nad commented Oct 23, 2022

which you did not describe, so I added the description

(Andreas: Goal of this PR: speed up generation of .agdai files during setup.)

There was a description: "Fixed #5225" (and issue #5225 explains what this was about). I don't see the point of duplicating information, I created the pull request only to run the CI tests, not to ask for feedback.

  • Why is there no CI run for merging this into master?

This commit was not merged into the master branch. I usually don't merge, because merging adds noise to the history. Furthermore there is no guarantee that the CI tests will succeed after a merge.

  • You did not seem to check CI results before merging this, why?

In this case it seems as if there were two CI problems:

  • Something related to ICU on MacOS. I usually ignore such problems.
  • The Windows build seems to have been cancelled. I guess I thought that there was some resource problem related to the CI infrastructure, but if I had looked closer I would have seen that the build stalled at the point in the build that I had modified. If we had had fewer false negatives I might have investigated this more carefully.

@andreasabel
Copy link
Member Author

There was a description: "Fixed #5225" (and issue #5225 explains what this was about). I don't see the point of duplicating information, I created the pull request only to run the CI tests, not to ask for feedback.

"Fixed #nnn" isn't a sufficient description. An issue #nnn only describes the problem. A PR should describe the solution. In particular what solution has been taken, and sometimes also why this particular solution was chosen.

Communicating more in a joint enterprise is seldom to the disadvantage. Usually, there isn't enough communication.

As a rule, each change through Agda should go through a PR, for communication, documentation, and also after-the-fact discussion (as many changes introduce regressions).

This commit was not merged into the master branch. I usually don't merge, because merging adds noise to the history. Furthermore there is no guarantee that the CI tests will succeed after a merge.

I didn't mean "merge" in the sense of "merge commit". I strongly prefer "rebase and merge" or "squash and merge". "Merging a branch" more generally means that the commits from a branch enter master.
Not seeing this PR in a CI run on master confused me greatly when looking for the source of the CI failure. As you can see, the first broken CI on master was your documentation patch. If CI starts failing on master, I want to see the first failure on the commits that caused it, not on some other commits. Yes, this means extra CI runs, unfortunately.

* The Windows build seems to have been cancelled. I guess I thought that there was some resource problem related to the CI infrastructure, but if I had looked closer I would have seen that the build stalled at the point in the build that I had modified. If we had had fewer false negatives I might have investigated this more carefully.

Well, yes, a broken macOS CI occurring in parallel did not help here. But the Windows CI kept being broken even after the macOS CI was fixed, and your input would have helped finding the culprit easier. Since I saw it break on an innocuous commit first, I could only suspect upstream and tried to solve it by upgrading the virtual environment, stack, etc., all to no avail.

Usually, I'd say "You owe me a beer", but then, you don't drink.

I wonder what we can learn from this. I am considering this:

  • All additions to master have to go through a PR.
  • Successful CI is mandatory to merge a PR into master.

This is standard in many open-source project. cabal is way stricter, additionally, 2 approving reviews are needed for every PR to get merged. I don't think I want to go that far, since it slows down the development process considerably, and many PRs are actually uncontroversial.

andreasabel added a commit that referenced this issue Oct 23, 2022
Regression introduced by #6110.

The output of the created Agda process was never read,
resulting in a blocked pipe when the buffer was full.

Using the simpler `readCreateProcess` takes care of
the correct pipe handling for us.
@nad
Copy link
Contributor

nad commented Oct 23, 2022

"Fixed #nnn" isn't a sufficient description. An issue #nnn only describes the problem. A PR should describe the solution. In particular what solution has been taken, and sometimes also why this particular solution was chosen.

I don't fully agree with you, but I suggest that we discuss this offline.

I wonder what we can learn from this. I am considering this:

  • All additions to master have to go through a PR.
  • Successful CI is mandatory to merge a PR into master.

I think the most important thing is to avoid false negatives, there have been lots of those recently.

@andreasabel andreasabel linked a pull request Oct 23, 2022 that will close this issue
andreasabel added a commit that referenced this issue Oct 23, 2022
Regression introduced by #6110.

The output of the created Agda process was never read,
resulting in a blocked pipe when the buffer was full.

Using the simpler `readCreateProcess` takes care of
the correct pipe handling for us.
andreasabel added a commit that referenced this issue Oct 23, 2022
Regression introduced by #6110.

The output of the created Agda process was never read,
resulting in a blocked pipe when the buffer was full.

Using the simpler `readCreateProcess` takes care of
the correct pipe handling for us.
@andreasabel andreasabel added the infra: github workflows Issues related to GitHub workflows and actions (not in changelog) label Oct 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
infra: github workflows Issues related to GitHub workflows and actions (not in changelog) platform: windows Any issue specific to Microsoft Windows release blocker Issues blocking the next Agda release stack Concerning building with stack
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants