Terraform not honouring OS IPv4 settings, using IPv6 dst to call *.googleapis.com #6782

Open
mhanline opened this issue Jul 12, 2020 · 19 comments
Labels
persistent-bug Hard to diagnose or long lived bugs for which resolutions are more like feature work than bug work service/terraform size/m

@mhanline

mhanline commented Jul 12, 2020

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

tf version
Terraform v0.12.28
+ provider.google v3.29.0
+ provider.google-beta v3.29.0

Affected Resource(s)

All resources, not specific to any one.

Terraform Configuration Files

While this happens intermittently and it's not specific to this config, it seems to happen with longer Terraform runs. You may need to apply / destroy 1-2 times before seeing this issue.

gist link to config

Debug Output

I see this output sporadically, and not on the same API call. Note the DST IP is an IPv6 address, but Cloud Shell does not enable IPv6 in the OS:
Link to gist

Console output when issue occurs (Note the IPv6 address is being used):

Error: Error when reading or editing Project Service [project-id]/trafficdirector.googleapis.com: Get "https://cloudresourcemanager.googleapis.com/v1/projects/[project-id]?alt=json&prettyPrint=false": dial tcp [2404:6800:4003:c00::5f]:443: connect: cannot assign requested address
Error: Error retrieving available container cluster versions: Get "https://container.googleapis.com/v1beta1/projects/[project-id]/locations/asia-east1-c/serverConfig?alt=json&prettyPrint=false": dial tcp [2404:6800:4003:c04::5f]:443: connect: cannot assign requested address
Error: Error when reading or editing Project Service [project-id]/trafficdirector.googleapis.com: Get "https://cloudresourcemanager.googleapis.com/v1/projects/[project-id]?alt=json&prettyPrint=false": dial tcp [2404:6800:4003:c03::5f]:443: connect: cannot assign requested address
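When triaging logs like these, it can help to check mechanically whether the destination after "dial tcp" is an IPv6 literal. The following is a minimal Go sketch under that assumption; the helper name isIPv6DialAddr is hypothetical and is not part of Terraform or the Google provider.

```go
package main

import (
	"fmt"
	"net"
)

// isIPv6DialAddr reports whether a "host:port" dial address, such as the one
// printed after "dial tcp" in the errors above, has an IPv6 literal as host.
// Hostnames and malformed addresses yield false.
func isIPv6DialAddr(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.To4() == nil
}

func main() {
	for _, addr := range []string{
		"[2404:6800:4003:c00::5f]:443", // destination from the first error above
		"192.0.2.10:443",               // documentation-range IPv4 address, for comparison
	} {
		fmt.Printf("%-32s ipv6=%v\n", addr, isIPv6DialAddr(addr))
	}
}
```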

Expected Behavior

Terraform / Google provider should respect the OS network settings and use IPv4 addresses to call out to *.googleapis.com.

Actual Behavior

tf apply / tf destroy does not always successfully complete, and will return the errors above.

Steps to Reproduce

  1. Open Google Cloud Shell (no IPv6 stack)
  2. Run tf apply or tf destroy on the linked config
  3. Most of the time it will succeed, but about every second attempt it reports the above errors

Note, if I statically configure /etc/hosts to resolve to a specific IPv4 address - say 199.36.153.8, the above errors never occur.

Important Factoids

Authenticating using application default credentials, built into Cloud Shell.

Confirm IPv6 is not enabled on the OS:

myusername@cloudshell:~$ sudo sysctl -n net.ipv6.conf.all.disable_ipv6 && sysctl -n net.ipv6.conf.default.disable_ipv6
1
1
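The same check can be made from Go by reading the files under /proc/sys that back these sysctl keys. This is a Linux-only sketch; the function name parseDisableFlag is hypothetical, and the program only reports the kernel setting, which is not necessarily what the Go resolver consults when choosing an address.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseDisableFlag interprets the contents of a disable_ipv6 sysctl file:
// "1" means IPv6 is disabled, "0" means it is enabled.
func parseDisableFlag(raw string) (bool, error) {
	switch strings.TrimSpace(raw) {
	case "1":
		return true, nil
	case "0":
		return false, nil
	default:
		return false, fmt.Errorf("unexpected disable_ipv6 value %q", raw)
	}
}

func main() {
	// These paths mirror the two sysctl keys queried above.
	for _, path := range []string{
		"/proc/sys/net/ipv6/conf/all/disable_ipv6",
		"/proc/sys/net/ipv6/conf/default/disable_ipv6",
	} {
		raw, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, "cannot read", path+":", err)
			continue
		}
		disabled, err := parseDisableFlag(string(raw))
		if err != nil {
			fmt.Fprintln(os.Stderr, path+":", err)
			continue
		}
		fmt.Printf("%s: ipv6 disabled = %v\n", path, disabled)
	}
}
```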

References

Similar issue 1 (with Go)
Similar issue 2
Workaround solution

  • b/160321706
@danawillow
Contributor

Here's what I know so far:

Based on golang/go#25321 and hashicorp/terraform-provider-vsphere#636, something that could fix it would be to compile with CGO enabled.
The build script that I assume our release pipeline uses explicitly disables CGO. This was introduced in hashicorp/terraform#7107 because it ensures the compiled binaries are statically linked (hashicorp/terraform#6714).
If I'm reading https://blog.madewithdrew.com/post/statically-linking-c-to-go/ right, then there should be a way to resolve this without having to explicitly disable CGO. It's also possible that things are different now than they were 4 years ago when the previous issues were brought up.

@megan07, is that indeed the build script that's used for the providers? If you don't mind, could you ask around to see if anyone at HashiCorp has any ideas on this? In the meantime, marking it upstream since I think it'll be good to have open as a reference for people that run into this, but I don't expect there being much we can do on the provider end.

@shermanyin

I'm running into a similar issue intermittently as well in GCP Cloud Shell.

$ ~/bin/terraform --version
Terraform v0.13.5
+ provider registry.terraform.io/hashicorp/google v3.49.0
+ provider registry.terraform.io/hashicorp/google-beta v3.49.0
+ provider registry.terraform.io/hashicorp/http v2.0.0
+ provider registry.terraform.io/hashicorp/null v3.0.0
+ provider registry.terraform.io/hashicorp/random v3.0.0
+ provider registry.terraform.io/hashicorp/template v2.2.0
+ provider registry.terraform.io/hashicorp/time v0.6.0

In my particular case, my script uploads a file to a Windows Server. I first get this error:

module.dc.null_resource.upload-scripts: Still creating... [4m50s elapsed]
Error: timeout - last error: unknown error Post "https://35.236.28.181:5986/wsman": dial tcp 35.236.28.181:5986: i/o timeout

I checked that the firewalls are open to the IP of the Cloud Shell instance. When I run terraform apply again, I hit these "cannot assign requested address" errors while refreshing state. For example, on my first run I got:

module.cac.module.cac-regional[0].random_shuffle.zone: Refreshing state... [id=-]

Error: Error when reading or editing ComputeNetwork "projects/[project-id]/global/networks/vpc-cas": Get "https://compute.googleapis.com/compute/v1/projects/[project-id]/global/networks/vpc-cas?alt=json": dial tcp [2607:f8b0:400e:c09::5f]:443: connect: cannot assign requested address

Then immediately I run terraform apply again, and it would fail in a different place.

google_compute_router_nat.nat: Refreshing state... [id=[project-id]/us-west2/router/nat]

Error: Error when reading or editing Storage Bucket "pcoip-scripts-7d731c": Get "https://storage.googleapis.com/storage/v1/b/pcoip-scripts-7d731c?alt=json&prettyPrint=false": dial tcp [2607:f8b0:400e:c07::80]:443: connect: cannot assign requested address

Finally, on the third attempt it let me type "yes" to apply the changes, but it failed again, timing out while trying to upload the files. We run this same script a few times a week, and most of the time there are no issues.

$ sysctl  net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1
$ sysctl  net.ipv6.conf.default.disable_ipv6
net.ipv6.conf.default.disable_ipv6 = 1

@bharathkkb

@c2thorn @rileykarson some of the team members have been running into this lately. Any possibilities for a fix?

/cc @daniel-cit

@ocervell

ocervell commented May 6, 2021

Same here. Could you give steps to resolve this?

@ferrarimarco

Hi there :)

We experienced this as well in a relatively long terraform apply (5-6 mins), running from Cloud Shell. Thanks for your support!

Error: Error creating service account: Post "https://iam.googleapis.com/v1/projects/[REDACTED_PROJECT_ID]/serviceAccounts?alt=json&prettyPrint=false": dial tcp [REDACTED_IP_V6_ADDRESS]:443: connect: cannot assign requested address

/cc @jbrook

@isimluk

isimluk commented Jul 6, 2021

Quick and dirty plug:

# Workaround https://github.com/hashicorp/terraform-provider-google/issues/6782
    sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 net.ipv6.conf.lo.disable_ipv6=1 > /dev/null
    export APIS="googleapis.com www.googleapis.com storage.googleapis.com iam.googleapis.com container.googleapis.com cloudresourcemanager.googleapis.com"
    for name in $APIS
    do
      ipv4=$(getent ahostsv4 "$name" | head -n 1 | awk '{ print $1 }')
      grep -q "$name" /etc/hosts || ([ -n "$ipv4" ] && sudo sh -c "echo '$ipv4 $name' >> /etc/hosts")
    done
# Workaround end

@lirlia

lirlia commented Sep 14, 2022

To list all GCP API endpoints:

gcloud services list --available --filter="name:googleapis.com" --format "csv[no-heading](ID)" --format "value(NAME)"

@rhyas

rhyas commented Oct 12, 2022

I can't believe this is still an issue 2+ years after the bug was opened.

@Jubblin

Jubblin commented Nov 4, 2022

I have exactly the same issue when executing from a Mac.

@sean9999

sean9999 commented Nov 9, 2022

+1

@rubber-ant

Any update on this?

@jlenuffgsoi

2023: this is still an issue. I am also encountering this problem.

@liamstevens

Another confirmation that this is still occurring. Very painful.

@kevin-dimichel

Error: Error retrieving available secret manager secret versions: Get "https://secretmanager.googleapis.com/v1/projects//secrets/<SECRET_NAME>/versions/latest?alt=json": Post "https://oauth2.googleapis.com/token": dial tcp [2607:f8b0:400f:807::200a]:443: connect: no route to host

While on a different system than the OP, I recently encountered a similar issue (above) on my macOS system. For me, the resolution was changing the DNS setting in the Wi-Fi network settings from my ISP's router to a public DNS server (like 1.1.1.1). After this change, terraform plan and terraform apply were successful. Maybe this will help other users too.

@rpjeff

rpjeff commented Nov 14, 2023

The workaround suggested by @kevin-dimichel (changing DNS to 1.0.0.1 and 1.1.1.1) fixed this for me.

@melinath melinath added the persistent-bug Hard to diagnose or long lived bugs for which resolutions are more like feature work than bug work label Apr 5, 2024
@pspot2

pspot2 commented Apr 15, 2024

Can confirm this with Google CloudShell.

@melinath
Collaborator

melinath commented Apr 16, 2024

I've been looking into this and it looks like it should be possible for us to resolve on the provider side. We should be able to use nettest.SupportsIPv6 to detect whether the current environment supports IPv6 and then force the transport layer to use IPv4 if not. Something like adding the following after this line:

client.Transport = headerTransport

client.Transport.DialContext = func(ctx context.Context, network string, addr string) (net.Conn, error) {
	d := &net.Dialer{}
	if !nettest.SupportsIPv6() {
		return d.DialContext(ctx, "tcp4", addr)
	}
	return d.DialContext(ctx, network, addr)
}

However, I can't actually reproduce this bug on cloud shell, so I can't tell if the fix actually works. If anyone has a configuration that consistently and quickly causes this error in cloud shell, that would be extremely helpful!

EDIT: apparently the override isn't quite that simple, continuing to dig, but still - reproducible cases would be great. Alternative fix would be to force setting the GODEBUG=netdns=cgo when initializing the config, but that is definitely hackier than I would prefer (and may also not work.)
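One reason the snippet does not work as posted is that http.Client.Transport has the interface type http.RoundTripper, which has no DialContext field; that field lives on the concrete *http.Transport. Below is a minimal, standard-library-only sketch of the same idea, set on the base transport before anything wraps it. The names hostSupportsIPv6, dialNetwork, and newTransport are hypothetical, the loopback probe is an assumed stand-in for nettest.SupportsIPv6, and how the provider would wire this under its own header transport is not shown.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// hostSupportsIPv6 probes for a usable IPv6 stack by trying to listen on the
// IPv6 loopback address. Assumed stand-in for nettest.SupportsIPv6 so that
// the sketch needs only the standard library.
func hostSupportsIPv6() bool {
	ln, err := net.Listen("tcp6", "[::1]:0")
	if err != nil {
		return false
	}
	ln.Close()
	return true
}

// dialNetwork returns the network name to dial with: the requested one when
// IPv6 is usable, otherwise "tcp4" so that only IPv4 addresses are tried.
func dialNetwork(requested string, ipv6OK bool) string {
	if !ipv6OK && requested == "tcp" {
		return "tcp4"
	}
	return requested
}

// newTransport clones the default transport and overrides DialContext so
// that connections are forced onto IPv4 when ipv6OK is false.
func newTransport(ipv6OK bool) *http.Transport {
	dialer := &net.Dialer{Timeout: 30 * time.Second, KeepAlive: 30 * time.Second}
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.DialContext = func(ctx context.Context, network, addr string) (net.Conn, error) {
		return dialer.DialContext(ctx, dialNetwork(network, ipv6OK), addr)
	}
	return t
}

func main() {
	ipv6OK := hostSupportsIPv6()
	client := &http.Client{Transport: newTransport(ipv6OK)}
	_ = client // a wrapping RoundTripper would take client.Transport as its base
	fmt.Printf("IPv6 usable: %v; new connections dial %q\n", ipv6OK, dialNetwork("tcp", ipv6OK))
}
```

Whether this avoids the error in Cloud Shell is untested here, in line with the reproduction difficulty noted above.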

yaqs/47302089738551296

@der-ali

der-ali commented May 23, 2024

I am facing a similar issue with api.cloudflare.com.

@nhairs

nhairs commented Jun 28, 2024

I had a similar issue and resolution to kevin-dimichel (above):

% terraform init

Initializing the backend...

Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 5.0"...

│ Error: Failed to query available provider packages

│ Could not retrieve the list of available versions for provider hashicorp/aws: could not
│ query provider registry for registry.terraform.io/hashicorp/aws: the request failed after
│ 2 attempts, please try again later: Get
│ "https://registry.terraform.io/v1/providers/hashicorp/aws/versions": dial tcp
│ [2600:9000:2212:ee00:16:1aa3:1440:93a1]:443: connect: network is unreachable

Version Info:

  • Terraform: 1.8.5
  • Debian 12.5
  • Linux 6.1.0-21-amd64

I resolved this by overriding the DNS servers for both IPv4/6 with the Quad9 servers.
