add Equinix Metal metadata service #680
Hi @displague, thanks for the contribution. Are you looking for a review of this yet? I noticed this is still in draft state and not all tests pass. @mitechie told me you were looking for reviewers, but I just want to make sure it's not premature.
@TheRealFalcon I'm looking for some direction, actually. I've provided examples of the available metadata. I'm not sure how best to take advantage of what is offered. What I include in the PR is a lowest-common-denominator implementation based on what fits from the Alibaba provider. I've not attempted to take advantage of any EquinixMetal-specific features to match them to available cloud-init features.
To make this actually work on distros that use the cloud-init generator (both CentOS 8 and current Ubuntu do), you'll also need to change tools/ds-identify.
Hopefully you can follow how to make those changes. You'll also want to add tests to tests/unittests/test_ds_identify.py.
Finally, you should add documentation under doc/rtd/topics/datasources/.
@smoser Thanks for the hints. I've left some TODOs where I'm not sure how this provider should behave. For example, some but not all models will return a Product or Vendor ID in DMI that reports 'Packet' (or some variation). This is not common, and even if it were, this value would eventually report 'Equinix Metal' (or some variation) in newer models and reprovisioned devices. I cannot rely on DMI to detect that an instance is running in Equinix Metal. The only factor that I am confident of is that the metadata service, which is available at metadata.platformequinix.com, will be accessible (on a public IPv4 address, different between facilities). I can also be certain of the iqn pattern within that metadata. A user may choose to provision Equinix Metal without a public address; in these cases I do not have a way to identify that the instance is running on Equinix Metal. I'd really like to take better advantage of the additional metadata provided outside of the
When an EquinixMetal DS is well-known to cloud-init, and for distributions that include a sufficient version of cloud-init, the Equinix Metal pre-installed cloud.cfg will offer an EquinixMetal DS rather than EC2. (This can be seen in the tinkerbell/osie link included in the PR description.)
    },
    'EquinixMetal': {
        'ds': 'EquinixMetal',
        'mocks': [],  # TODO how do I mock metadata responses?
You don't, at least not for ds-identify. It only identifies via locally available data.
I could add DS support for the models that do report "Packet" back somewhere in the DMI, but I would have to survey the available models to identify any variations in the fields and values.
Should I (can I?) remove DS support and only rely on metadata service detection?
I'm not sure what you mean by "DS support".
Maybe some background would help you to understand what ds-identify does here.
We want vendors to be able to make OS images (like https://cloud-images.ubuntu.com/) that "just work" wherever you try to run them, and an OS with cloud-init installed but not on a cloud would not do anything differently.
datasource_list can configure which datasources will be searched, but ideally there would be no need for such a thing: cloud-init would just "do the right thing".
Previously, cloud-init (in Python) would walk through each datasource in datasource_list and try to get data. That meant that boot was always impacted (cloud-init always ran). On EC2, that meant running DHCP and checking to see if the metadata service was there. That is obviously less than ideal: it was slow, and it meant that if you booted such an image elsewhere and there happened to be an http://169.254.169.254/latest/user-data, cloud-init would execute that code.
Now, with ds-identify, if it determines that the system is not on a cloud platform, cloud-init does not run at all. From systemd's perspective, cloud-init.target is not even enabled. But in order to do that, we only look at local data. We want those checks to be very fast, and thus far, they are. When ds-identify finds that it is on Equinix, as told to it by DMI data, it knows that it will find an Equinix metadata service (or, if someone is lying to it, failure is somewhat expected; as an example, if a CPU identifies itself as x86_64 but didn't implement some of the interfaces, you'd expect that a program might fail).
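The older "walk the list" behaviour can be modelled roughly as below. This is a simplified sketch, not cloud-init's actual code: each probe is a hypothetical stand-in for a datasource's get_data() call, and the point is that every probe tried before a match (possibly including network I/O) adds to boot time.

```python
from typing import Callable, Iterable, Optional, Tuple

# A probe pairs a datasource name with a callable that returns True if that
# datasource found its data (hypothetical stand-in for get_data()).
Probe = Tuple[str, Callable[[], bool]]


def first_working_datasource(probes: Iterable[Probe]) -> Optional[str]:
    """Try each datasource in datasource_list order and return the first
    one whose probe succeeds, or None if none do."""
    for name, probe in probes:
        if probe():
            return name
    return None


if __name__ == "__main__":
    calls = []

    def make_probe(name: str, found: bool) -> Probe:
        def probe() -> bool:
            calls.append(name)  # record which probes actually ran
            return found
        return name, probe

    result = first_working_datasource(
        [make_probe("NoCloud", False), make_probe("Ec2", True), make_probe("None", True)]
    )
    print(result, calls)  # Ec2 ['NoCloud', 'Ec2']
```

Note that "None" is never probed once Ec2 succeeds, but on a machine where nothing matches, every entry runs.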
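A purely local DMI check of the kind described above might look like the Python sketch below. The real ds-identify is a shell script; the field list and the "packet"/"equinix" markers here are assumptions for illustration, since which strings Equinix Metal hardware reports (if any) is exactly what this thread says still needs surveying.

```python
from pathlib import Path
from typing import Dict, Iterable

# DMI fields exposed under /sys/class/dmi/id on Linux (subset chosen for illustration).
DMI_FIELDS = ("sys_vendor", "product_name", "board_vendor", "chassis_asset_tag")

# Hypothetical markers; not a confirmed list of what Equinix Metal reports.
MARKERS = ("packet", "equinix")


def read_dmi(root: str = "/sys/class/dmi/id",
             fields: Iterable[str] = DMI_FIELDS) -> Dict[str, str]:
    """Read selected DMI fields; missing or unreadable files are skipped."""
    values: Dict[str, str] = {}
    for field in fields:
        try:
            values[field] = Path(root, field).read_text().strip()
        except OSError:  # includes FileNotFoundError and PermissionError
            continue
    return values


def dmi_suggests_equinix(dmi: Dict[str, str]) -> bool:
    """Local-only check: True if any DMI value contains a known marker."""
    return any(marker in value.lower() for value in dmi.values() for marker in MARKERS)


if __name__ == "__main__":
    print(dmi_suggests_equinix(read_dmi()))
```

No network access is involved, which is what keeps this class of check fast enough to gate whether cloud-init runs at all.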
Are there examples of providers that have no local representation (no guarantee of identifiable DMI ids)?
The presence of packet.net (or equinix) in metadata.platformequinix.com/2009-04-04/iqn is the only approach that will work for a majority of our infrastructure (that I am aware of). Very few devices report Packet somewhere in their DMI (that I am aware of).
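Checking the iqn value could be sketched as below. It assumes the value follows the RFC 3720 "iqn.yyyy-mm.reversed.domain[:suffix]" form with a packet.net (or Equinix) naming authority; the sample iqn is hypothetical. Note that obtaining the value requires a network fetch from the metadata service, which is exactly what ds-identify avoids.

```python
import re
from typing import Optional

# RFC 3720 iSCSI qualified name: "iqn." + yyyy-mm + "." + reversed domain,
# optionally followed by ":" and a unique string.
IQN_RE = re.compile(r"^iqn\.\d{4}-\d{2}\.([^:]+)(?::.*)?$")


def iqn_domain(iqn: str) -> Optional[str]:
    """Return the naming-authority domain in normal order, e.g. 'packet.net'."""
    match = IQN_RE.match(iqn.strip().lower())
    if not match:
        return None
    return ".".join(reversed(match.group(1).split(".")))


def iqn_suggests_equinix(iqn: str) -> bool:
    """True if the iqn's naming authority looks like packet.net or Equinix."""
    domain = iqn_domain(iqn)
    if domain is None:
        return False
    return domain == "packet.net" or domain.endswith(".packet.net") or "equinix" in domain


if __name__ == "__main__":
    # Hypothetical iqn value, not taken from a real device.
    print(iqn_suggests_equinix("iqn.2021-03.net.packet:device.abc123"))  # True
```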
@smoser Is there a possible alternative to DMI data? We can likely go that route for our own machines, but it should not be an expectation for making use of Tinkerbell. We can control kernel boot args very easily; can we have cloud-init also check there?
oh wait this would apply at runtime not install-time 🤦♂️ so we don't have as much control over kernel args :/
@smoser Tinkerbell has an EM-descendant metadata service and deployment architecture, but in that environment users bring their own hardware, and DMI stamping may not be possible.
Related issue: tinkerbell/cluster-api-provider-tinkerbell#6
I'm leaving Tinkerbell concerns out of this PR, but I was hopeful that we could leverage this PR in some way in support of https://tinkerbell.org/ environments later.
> Are there examples of providers that have no local representation (no guarantee of identifiable DMI ids)?
MAAS. It sounds like MAAS is something very similar to what you're working on.
The way MAAS works is:
- The network-booted environment sends a kernel command line containing 'ci.ds=MAAS'. ds-identify generically reads the ci.ds kernel parameter as declaring the datasource to use.
- The installed system declares datasource_list to have only MAAS in it. During the install, MAAS writes a cloud-init config file to the target system that declares 'datasource_list' to contain just MAAS, and ds-identify respects that.
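The first mechanism amounts to reading one key from the kernel command line. A minimal Python sketch of that idea follows (ds-identify itself is shell, and its exact parsing rules are not reproduced here; in particular, "last occurrence wins" is an assumption of this sketch).

```python
from typing import Optional


def datasource_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the value of ci.ds= from a kernel command line, or None.

    In this sketch the last occurrence wins (an assumption, not a statement
    of how ds-identify resolves duplicates).
    """
    found = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == "ci.ds" and value:
            found = value
    return found


if __name__ == "__main__":
    # On a real system you would read /proc/cmdline instead of a literal.
    print(datasource_from_cmdline("BOOT_IMAGE=/vmlinuz ro ci.ds=MAAS quiet"))  # MAAS
```

Because the signal is local (the command line or an installed config file), this fits ds-identify's local-data-only constraint without requiring DMI stamping.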
> Well... perhaps we can just ignore it and wait until all nodes report

Remember that, generally speaking, this change mostly affects new instances, which then perhaps are provisioned on new nodes that have the new values. I.e., upgrade isn't terribly important here. An old node that didn't realize it was on Equinix will just continue to not know that after a cloud-init upgrade.
Yes please. Such a change might end up causing you to do your own DS rather than riding on Amazon, but that is fine.
@smoser Is this to say that local datasource detection is the only option?
Polling "arbitrary" network sources for information without definitive data suggesting its presence is something cloud-init has not done since sometime in 2018. So the options are:
Again, I'd like to point out that the MAAS datasource relies on 'b', and for a platform that is doing an install, that just requires putting down that local indication during the install.
Support may be in more device models than I originally thought. Which fields are populated may not be entirely consistent, but there may be some common ground. I'll survey the platform plans to confirm. For example, on a c3.small.x86:
The t1.small.x86 has no packet or equinix strings in dmidecode or /sys/class/dmi files.
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon. (If the pull request is closed, please do feel free to reopen it if you wish to continue working on it.)
Welp. I'll reopen this when the internal discoveries are available and some form of hardware-based ds-identify detection is available.
It is acceptable... it's just less than ideal. There are other examples of it; see #827.
Hey there. This branch was reopened because we saw over on https://bugs.launchpad.net/cloud-init/+bug/1745920 that you're looking for work on this to be continued. Since it has been closed for a while, I'm guessing there's still work to do on your end before this PR would be considered ready to review or merge. Is that correct?
Something that Equinix Metal support should add is the This would be consumed as:
@TheRealFalcon I think a first step would be to add support for an EquinixMetal datasource, even though it will not be a detectable environment (ds= will need to be supplied).
@displague, I'm not entirely sure of the context here, but in order for
Signed-off-by: Marques Johansson <mjohansson@equinix.com>
A relative of this code is used to deploy cloud-config data on Equinix Metal nodes: https://github.com/tinkerbell/osie/blob/71da59e10bf66cc352366615060171939412d397/docker/scripts/osie.sh#L363-L369 This change makes the EM datasource match that timeout setting. Signed-off-by: Marques Johansson <mjohansson@equinix.com>
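The timeout setting being matched here pairs a per-request timeout with an overall wait budget (the Ec2 config in this thread uses timeout: 60 and max_wait: 120). The sketch below models those two knobs as cloud-init documents them for Ec2 (timeout per HTTP request, max_wait as total time spent looking); it is an illustration, not cloud-init's own retry code, and http_fetch is a hypothetical helper.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional


def wait_for_metadata(
    fetch: Callable[[float], Optional[bytes]],
    timeout: float = 60.0,
    max_wait: float = 120.0,
    sleep_between: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch(per_request_timeout) until it returns data or max_wait elapses."""
    deadline = clock() + max_wait
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return None
        # Never let a single attempt run past the overall deadline.
        data = fetch(min(timeout, remaining))
        if data is not None:
            return data
        remaining = deadline - clock()
        if remaining <= 0:
            return None
        sleep(min(sleep_between, remaining))


def http_fetch(url: str) -> Callable[[float], Optional[bytes]]:
    """Hypothetical helper: wrap urlopen so failures become None."""
    def fetch(per_request_timeout: float) -> Optional[bytes]:
        try:
            with urllib.request.urlopen(url, timeout=per_request_timeout) as resp:
                return resp.read()
        except (OSError, http.client.HTTPException):
            return None
    return fetch

# Example (network access required):
# wait_for_metadata(http_fetch("https://metadata.platformequinix.com/metadata"),
#                   timeout=60, max_wait=120)
```

Injecting clock and sleep keeps the retry logic testable without real waiting.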
@TheRealFalcon "customdata" is one field that is returned in the /metadata response, it is not a separate API path.
I expected that this field would be available using the current EC2 2009-04-04 support (pointed at the metadata URL above). EM devices today include configuration like this:

    datasource_list: [Ec2]
    datasource:
      Ec2:
        timeout: 60
        max_wait: 120
        metadata_urls: [ 'https://metadata.platformequinix.com/metadata' ]
        dsmode: net

However, when I tried to use
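Reading customdata directly from the /metadata response, as a dedicated datasource might, can be sketched as follows. This assumes the response is a JSON object with a top-level "customdata" key holding an object; that shape is an assumption based on the description above, and the sample document is made up.

```python
import json
import urllib.request
from typing import Any, Dict

# URL taken from the config in this thread; the JSON shape is an assumption.
METADATA_URL = "https://metadata.platformequinix.com/metadata"


def extract_customdata(metadata_json: str) -> Dict[str, Any]:
    """Return the top-level 'customdata' object from a /metadata document, or {}."""
    doc = json.loads(metadata_json)
    if not isinstance(doc, dict):
        raise ValueError("expected the /metadata response to be a JSON object")
    custom = doc.get("customdata") or {}
    if not isinstance(custom, dict):
        raise ValueError("expected 'customdata' to be a JSON object")
    return custom


def fetch_customdata(url: str = METADATA_URL, timeout: float = 60.0) -> Dict[str, Any]:
    """Fetch /metadata and return its customdata field (network access required)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_customdata(resp.read().decode("utf-8"))


if __name__ == "__main__":
    sample = '{"id": "example", "customdata": {"role": "worker"}}'  # made-up sample
    print(extract_customdata(sample))  # {'role': 'worker'}
```

Keeping the parsing separate from the HTTP call means the field handling can be unit-tested without a metadata service.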
Is everything else at that address getting attached to your metadata as you'd expect? Does the cloud-init.log tell you anything about attempts or failures in fetching the correct URL? EC2 checks a
https://gist.github.com/displague/b9f360125e0b8c04d3c53a0f8b3b68b2
(In this example I don't have customdata values set, but I get the same error when there are keys in the customdata.)
I see what's happening. The
I presume if
I may want to pursue getting that field added to the metadata response while this PR remains unpackaged.
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag mitechie to reopen it.)
Proposed Commit Message
The Equinix Metal (formerly Packet.com) metadata service offers data-rich metadata with some compatibility with the AWS metadata service, though not as much compatibility as some other providers. I based this branch on the AliYun provider, and some things are still definitely wrong for Equinix Metal use.
Equinix Metal does have other metadata values that could be pieced together here. By opening this draft PR, I'm hoping to sync the cloud-init expectations with the features of the available metadata service.
Additional Context
The Equinix Metal metadata service is the basis for the Tinkerbell metadata service, so support for Tinkerbell will benefit from this integration.
Equinix Metal sample metadata: (c3.small.x86 / "Layer3" network configuration):
https://gist.github.com/displague/dff585c6ef57510c37f475b6c68e6427
Tinkerbell's metadata service (2009-04-04 AWS compatibility):
https://github.com/tinkerbell/hegel/blob/a3138d417536903dcdedd674d634e13ebc19fc79/http_handlers.go#L23-L45
Equinix Metal devices create this cloud.cfg on OS disks following image-based provisioning: https://github.com/tinkerbell/osie/blob/71da59e10bf66cc352366615060171939412d397/docker/scripts/osie.sh#L363-L369
Test Steps
Checklist: