
Fix nix installation #6400

Merged: 1 commit into master from fix-nix-install, Jun 18, 2020
Conversation

cocreature (Contributor)

Nix now requires `-L`, so I've gone ahead and normalized everything to
use `-sfL`, which we were already using in one place.

changelog_begin
changelog_end
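
For context, here is the fixed command with the standard curl flag semantics spelled out; the redirect rationale is my inference, since the PR only says Nix now requires `-L`:

```
# The normalized install command from this PR:
#   -s  silent: no progress meter cluttering the output
#   -f  fail: exit non-zero on an HTTP error instead of piping an error page into bash
#   -L  location: follow redirects (presumably the install URL now redirects,
#       so without -L curl returns the redirect response rather than the script)
bash <(curl -sfL https://nixos.org/nix/install)
```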

Pull Request Checklist

  • Read and understand the contribution guidelines
  • Include appropriate tests
  • Set a descriptive title and thorough description
  • Add a reference to the issue this PR will solve, if appropriate
  • Include changelog additions in one or more commit message bodies between the CHANGELOG_BEGIN and CHANGELOG_END tags (see the sketch after this list)
  • Normal production system change, include purpose of change in description
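
For illustration, a hypothetical commit message following the changelog convention from the checklist; the entry text here is invented (this PR's actual changelog block is empty):

```
Fix nix installation

Nix now requires -L, so normalize all curl invocations to -sfL.

CHANGELOG_BEGIN
- [dev-env] The documented Nix install command now follows redirects.
CHANGELOG_END
```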

NOTE: CI is not automatically run on pull requests from non-members, for
security reasons. The reviewer will have to comment with /AzurePipelines run
to trigger the build.

README.md Outdated

```
@@ -40,7 +40,7 @@ Our builds require various development dependencies (e.g. Java, Bazel, Python),

 On Linux and Mac `dev-env` can be installed with:

-1. Install Nix by running: `bash <(curl https://nixos.org/nix/install)`
+1. Install Nix by running: `bash <(curl -sfL https://nixos.org/nix/install)`
```

How do you feel about adding -S to this so it shows an error on failure?
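
For reference, a quick sketch of what `-S` adds on top of `-s` (standard curl behavior; the URL is a deliberately unresolvable placeholder):

```
# -s silences everything, so a -f failure produces no output at all:
curl -sf https://example.invalid/install    # exits non-zero, prints nothing
# -S re-enables error messages while keeping the progress meter silent:
curl -sSf https://example.invalid/install   # exits non-zero, prints
                                            # "curl: (6) Could not resolve host: example.invalid"
```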

cocreature (Contributor, Author):

changed.

cocreature merged commit 2c1d4cb into master on Jun 18, 2020
cocreature deleted the fix-nix-install branch on June 18, 2020 08:34
garyverhaegen-da added a commit that referenced this pull request Jun 18, 2020
It looks like some nix update has broken our current Terraform setup.
The Google provider plugin has changed its reported version to 0.0.0;
poking at my local nix store seems to indicate we actually get 3.15, but
:shrug:.

This PR also reverts the infra part of #6400 so we get back to master ==
reality.

CHANGELOG_BEGIN
CHANGELOG_END
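
A sketch of how one might reproduce the version discrepancy described above; the store-path pattern is an assumption, not something shown in the PR:

```
# What Terraform reports for an initialized working directory
# (this is where the bogus 0.0.0 showed up):
terraform version
# What the local nix store actually contains (name pattern assumed):
ls /nix/store | grep -i terraform-provider-google
```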
garyverhaegen-da added a commit that referenced this pull request Jun 18, 2020
This PR duplicates the Linux CI cluster. This is the first in a
three-PR plan to implement #6400 safely while people are working.

I usually do cluster updates over the weekend because they require
shutting down the entire CI system for about two hours. This is
unfortunately not practical while people are working, and timezones make
it difficult for me to find a time where people are not working during
the week.

So instead the plan is as follows:

1. Create a duplicate of our CI cluster (this PR).
2. Wait for the new cluster to be operational (~90-120 minutes in my
   experience; see the sketch after this list).
3. In the Azure Pipelines config screen, disable all the nodes of the
   "old" cluster, so all new jobs get assigned to the temp cluster. Wait
   for all jobs to finish on the old cluster.
4. Update the old cluster. Wait for it to be deployed. (Second PR.)
5. In Azure, disable temp nodes, wait for jobs to drain.
6. Delete temp nodes (third PR).
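
For step 2, one way to watch the temp group come up, sketched with gcloud; the group name matches the diff below, but the region and project values are placeholders, not from the PR:

```
# List instances in the temp managed instance group along with their
# current status/actions (region and project are placeholders):
gcloud compute instance-groups managed list-instances vsts-agent-linux-temp \
  --region us-east1 --project my-ci-project
```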

Reviewing this PR is best done by verifying you can reproduce the
following shell session:

```
$ diff vsts_agent_linux.tf vsts_agent_linux_temp.tf
4,7c4,5
< resource "secret_resource" "vsts-token" {}
<
< data "template_file" "vsts-agent-linux-startup" {
<   template = "${file("${path.module}/vsts_agent_linux_startup.sh")}"
---
> data "template_file" "vsts-agent-linux-startup-temp" {
>   template = "${file("${path.module}/vsts_agent_linux_startup_temp.sh")}"
16c14
< resource "google_compute_region_instance_group_manager" "vsts-agent-linux" {
---
> resource "google_compute_region_instance_group_manager" "vsts-agent-linux-temp" {
18,19c16,17
<   name               = "vsts-agent-linux"
<   base_instance_name = "vsts-agent-linux"
---
>   name               = "vsts-agent-linux-temp"
>   base_instance_name = "vsts-agent-linux-temp"
24,25c22,23
<     name              = "vsts-agent-linux"
<     instance_template = "${google_compute_instance_template.vsts-agent-linux.self_link}"
---
>     name              = "vsts-agent-linux-temp"
>     instance_template = "${google_compute_instance_template.vsts-agent-linux-temp.self_link}"
36,37c34,35
< resource "google_compute_instance_template" "vsts-agent-linux" {
<   name_prefix  = "vsts-agent-linux-"
---
> resource "google_compute_instance_template" "vsts-agent-linux-temp" {
>   name_prefix  = "vsts-agent-linux-temp-"
52c50
<     startup-script = "${data.template_file.vsts-agent-linux-startup.rendered}"
---
>     startup-script = "${data.template_file.vsts-agent-linux-startup-temp.rendered}"
$ diff vsts_agent_linux_startup.sh vsts_agent_linux_startup_temp.sh
149c149
< su --command "sh <(curl https://nixos.org/nix/install) --daemon" --login vsts
---
> su --command "sh <(curl -sSfL https://nixos.org/nix/install) --daemon" --login vsts
$
```

and reviewing that diff, rather than looking at the added files in their
entirety. The name changes are benign and needed for Terraform to
appropriately keep track of which node belongs to the old vs. the temp
group. The only change that matters is that the new group has the `-sSfL`
flag, so the new nodes will actually boot up. (Hopefully.)

CHANGELOG_BEGIN
CHANGELOG_END
garyverhaegen-da added a commit that referenced this pull request Jun 18, 2020
See #6400; split out as a separate PR so master == reality and we can
track when this is done. @nycnewman please merge this once the change
is deployed.

Note: it has to be deployed before the next restart; nodes will _not_ be
able to boot with the current configuration.

CHANGELOG_BEGIN
CHANGELOG_END
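
To make the boot failure concrete, here are the before and after startup lines from the diff in the cluster-duplication commit; the failure mode (a redirect body handed to `sh`) is my inference:

```
# Old: without -L, curl hands the redirect response (not the script) to sh,
# so the startup script fails and the node never registers:
su --command "sh <(curl https://nixos.org/nix/install) --daemon" --login vsts
# New: follows the redirect and surfaces any download failure as an error:
su --command "sh <(curl -sSfL https://nixos.org/nix/install) --daemon" --login vsts
```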