
Fix nix installation #6400

Merged: 1 commit into master from fix-nix-install, Jun 18, 2020
Conversation

cocreature (Contributor)

Nix now requires `-L`, so I've gone ahead and normalized everything to
use `-sfL`, which we were already using in one place.

changelog_begin
changelog_end
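
For context, here is the fixed command with the standard curl flag semantics spelled out; the redirect rationale is my inference, since the PR only says Nix now requires `-L`:

```
# The normalized install command from this PR:
#   -s  silent: no progress meter cluttering the output
#   -f  fail: exit non-zero on an HTTP error instead of piping an error page into bash
#   -L  location: follow redirects (presumably the install URL now redirects,
#       so without -L curl returns the redirect response rather than the script)
bash <(curl -sfL https://nixos.org/nix/install)
```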

Pull Request Checklist

  • Read and understand the contribution guidelines
  • Include appropriate tests
  • Set a descriptive title and thorough description
  • Add a reference to the issue this PR will solve, if appropriate
  • Include changelog additions in one or more commit message bodies between the CHANGELOG_BEGIN and CHANGELOG_END tags (see the sketch after this list)
  • Normal production system change, include purpose of change in description
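
For illustration, a hypothetical commit message following the changelog convention from the checklist; the entry text here is invented (this PR's actual changelog block is empty):

```
Fix nix installation

Nix now requires -L, so normalize all curl invocations to -sfL.

CHANGELOG_BEGIN
- [dev-env] The documented Nix install command now follows redirects.
CHANGELOG_END
```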

NOTE: CI is not automatically run on pull requests from non-members, for
security reasons. The reviewer will have to comment with /AzurePipelines run
to trigger the build.

README.md Outdated

```
@@ -40,7 +40,7 @@ Our builds require various development dependencies (e.g. Java, Bazel, Python),

 On Linux and Mac `dev-env` can be installed with:

-1. Install Nix by running: `bash <(curl https://nixos.org/nix/install)`
+1. Install Nix by running: `bash <(curl -sfL https://nixos.org/nix/install)`
```

How do you feel about adding -S to this so it shows an error on failure?
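
For reference, a quick sketch of what `-S` adds on top of `-s` (standard curl behavior; the URL is a deliberately unresolvable placeholder):

```
# -s silences everything, so a -f failure produces no output at all:
curl -sf https://example.invalid/install    # exits non-zero, prints nothing
# -S re-enables error messages while keeping the progress meter silent:
curl -sSf https://example.invalid/install   # exits non-zero, prints
                                            # "curl: (6) Could not resolve host: example.invalid"
```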

cocreature (Contributor, Author):

changed.

cocreature merged commit 2c1d4cb into master on Jun 18, 2020
cocreature deleted the fix-nix-install branch on June 18, 2020 08:34
garyverhaegen-da added a commit that referenced this pull request Jun 18, 2020
It looks like some nix update has broken our current Terraform setup.
The Google provider plugin has changed its reported version to 0.0.0;
poking at my local nix store seems to indicate we actually get 3.15, but
:shrug:.

This PR also reverts the infra part of #6400 so we get back to master ==
reality.

CHANGELOG_BEGIN
CHANGELOG_END
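
A sketch of how one might reproduce the version discrepancy described above; the store-path pattern is an assumption, not something shown in the PR:

```
# What Terraform reports for an initialized working directory
# (this is where the bogus 0.0.0 showed up):
terraform version
# What the local nix store actually contains (name pattern assumed):
ls /nix/store | grep -i terraform-provider-google
```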
garyverhaegen-da added a commit that referenced this pull request Jun 18, 2020
This PR duplicates the Linux CI cluster. This is the first in a
three-PR plan to implement #6400 safely while people are working.

I usually do cluster updates over the weekend because they require
shutting down the entire CI system for about two hours. This is
unfortunately not practical while people are working, and timezones make
it difficult for me to find a time where people are not working during
the week.

So instead the plan is as follows:

1. Create a duplicate of our CI cluster (this PR).
2. Wait for the new cluster to be operational (~90-120 minutes in my
   experience; see the sketch after this list).
3. In the Azure Pipelines config screen, disable all the nodes of the
   "old" cluster, so all new jobs get assigned to the temp cluster. Wait
   for all jobs to finish on the old cluster.
4. Update the old cluster. Wait for it to be deployed. (Second PR.)
5. In Azure, disable temp nodes, wait for jobs to drain.
6. Delete temp nodes (third PR).
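
For step 2, one way to watch the temp group come up, sketched with gcloud; the group name matches the diff below, but the region and project values are placeholders, not from the PR:

```
# List instances in the temp managed instance group along with their
# current status/actions (region and project are placeholders):
gcloud compute instance-groups managed list-instances vsts-agent-linux-temp \
  --region us-east1 --project my-ci-project
```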

Reviewing this PR is best done by verifying you can reproduce the
following shell session:

```
$ diff vsts_agent_linux.tf vsts_agent_linux_temp.tf
4,7c4,5
< resource "secret_resource" "vsts-token" {}
<
< data "template_file" "vsts-agent-linux-startup" {
<   template = "${file("${path.module}/vsts_agent_linux_startup.sh")}"
---
> data "template_file" "vsts-agent-linux-startup-temp" {
>   template = "${file("${path.module}/vsts_agent_linux_startup_temp.sh")}"
16c14
< resource "google_compute_region_instance_group_manager" "vsts-agent-linux" {
---
> resource "google_compute_region_instance_group_manager" "vsts-agent-linux-temp" {
18,19c16,17
<   name               = "vsts-agent-linux"
<   base_instance_name = "vsts-agent-linux"
---
>   name               = "vsts-agent-linux-temp"
>   base_instance_name = "vsts-agent-linux-temp"
24,25c22,23
<     name              = "vsts-agent-linux"
<     instance_template = "${google_compute_instance_template.vsts-agent-linux.self_link}"
---
>     name              = "vsts-agent-linux-temp"
>     instance_template = "${google_compute_instance_template.vsts-agent-linux-temp.self_link}"
36,37c34,35
< resource "google_compute_instance_template" "vsts-agent-linux" {
<   name_prefix  = "vsts-agent-linux-"
---
> resource "google_compute_instance_template" "vsts-agent-linux-temp" {
>   name_prefix  = "vsts-agent-linux-temp-"
52c50
<     startup-script = "${data.template_file.vsts-agent-linux-startup.rendered}"
---
>     startup-script = "${data.template_file.vsts-agent-linux-startup-temp.rendered}"
$ diff vsts_agent_linux_startup.sh vsts_agent_linux_startup_temp.sh
149c149
< su --command "sh <(curl https://nixos.org/nix/install) --daemon" --login vsts
---
> su --command "sh <(curl -sSfL https://nixos.org/nix/install) --daemon" --login vsts
$
```

and reviewing that diff, rather than looking at the added files in their
entirety. The name changes are benign and needed for Terraform to
appropriately keep track of which node belongs to the old vs. the temp
group. The only change that matters is that the new group has the `-sSfL`
flag, so the new nodes will actually boot up. (Hopefully.)

CHANGELOG_BEGIN
CHANGELOG_END
garyverhaegen-da added a commit that referenced this pull request Jun 18, 2020
See #6400; split out as a separate PR so master == reality and we can
track when this is done. @nycnewman please merge this once the change
is deployed.

Note: it has to be deployed before the next restart; nodes will _not_ be
able to boot with the current configuration.

CHANGELOG_BEGIN
CHANGELOG_END
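
To make the boot failure concrete, here are the before and after startup lines from the diff in the cluster-duplication commit; the failure mode (a redirect body handed to `sh`) is my inference:

```
# Old: without -L, curl hands the redirect response (not the script) to sh,
# so the startup script fails and the node never registers:
su --command "sh <(curl https://nixos.org/nix/install) --daemon" --login vsts
# New: follows the redirect and surfaces any download failure as an error:
su --command "sh <(curl -sSfL https://nixos.org/nix/install) --daemon" --login vsts
```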