Add AzureDisk support for vmss nodes #59716
Conversation
/assign @khenidak
Some functions, like GetNextDiskLun, duplicate too much code from the original vmav code. Functions like AttachDisk and DetachDisk should depend on #59693, which removes the duplicated code.
func (ss *scaleSet) AttachDisk(isManagedDisk bool, diskName, diskURI string, nodeName types.NodeName, lun int32, cachingMode compute.CachingTypes) error {
	ssName, instanceID, vm, err := ss.getVmssVM(string(nodeName))
	if err != nil {
		if err == ErrorNotVmssInstance {
This is weird: in the code logic it uses vmss.AttachDisk, while the error says it should not use vmss.AttachDisk and should fall back to vmas.AttachDisk? Is there a real case for such an issue?
If master nodes are managed by an availability set (as), and pods are running on the master nodes, then there will be such a problem.
This is fine. It also allows us to support users who have mixed clusters (avsets + scale sets). I do, however, strongly believe that we should use controller common as the abstraction layer.
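To make the fallback concrete, here is a minimal, self-contained sketch of the pattern under discussion. The error and type names only mirror the excerpt above; everything else (field names, the master-node heuristic) is an assumption for illustration, not the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrNotVmssInstance mirrors the ErrorNotVmssInstance sentinel in the excerpt
// above (the name and shape here are assumptions for this sketch).
var ErrNotVmssInstance = errors.New("not a vmss instance")

// diskAttacher stands in for the per-VM-type disk abstraction.
type diskAttacher interface {
	AttachDisk(nodeName, diskName string) error
}

type availabilitySet struct{}

func (as *availabilitySet) AttachDisk(nodeName, diskName string) error {
	fmt.Printf("attaching %s via the availability-set path for %s\n", diskName, nodeName)
	return nil
}

type scaleSet struct {
	// availabilitySet is the fallback for nodes not managed by VMSS,
	// e.g. master nodes in an availability set within a mixed cluster.
	availabilitySet diskAttacher
}

// getVmssVM pretends that every "master-*" node is not a VMSS instance.
func (ss *scaleSet) getVmssVM(nodeName string) error {
	if strings.HasPrefix(nodeName, "master") {
		return ErrNotVmssInstance
	}
	return nil
}

func (ss *scaleSet) AttachDisk(nodeName, diskName string) error {
	if err := ss.getVmssVM(nodeName); err != nil {
		if err == ErrNotVmssInstance {
			// Fall back instead of failing: the node lives in an availability set.
			return ss.availabilitySet.AttachDisk(nodeName, diskName)
		}
		return err
	}
	fmt.Printf("attaching %s via the vmss path for %s\n", diskName, nodeName)
	return nil
}

func main() {
	ss := &scaleSet{availabilitySet: &availabilitySet{}}
	_ = ss.AttachDisk("master-0", "disk1")          // availability-set path
	_ = ss.AttachDisk("vmss-agent-000001", "disk1") // vmss path
}
```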
// GetDiskLun finds the lun on the host that the vhd is attached to, given a vhd's diskName and diskURI
func (ss *scaleSet) GetDiskLun(diskName, diskURI string, nodeName types.NodeName) (int32, error) {
	_, _, vm, err := ss.getVmssVM(string(nodeName))
This line is the only difference from the original GetNextDiskLun; I would encourage combining the two GetNextDiskLun implementations.
See #59716 (comment)
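As a rough sketch of the suggested refactoring, the LUN bookkeeping could be extracted into a helper that only takes the VM's data disks, so the vmss and availability-set implementations would differ only in how they fetch the VM. The types below are simplified stand-ins for illustration, not the actual SDK types:

```go
package main

import "fmt"

// dataDisk is a simplified stand-in for the SDK's DataDisk type, keeping only
// the fields needed for LUN bookkeeping (an assumption for this sketch).
type dataDisk struct {
	Name string
	Lun  int32
}

// getNextDiskLun holds the LUN search shared by both implementations; callers
// would only differ in how they obtain the disks slice.
func getNextDiskLun(disks []dataDisk, maxLuns int32) (int32, error) {
	used := make(map[int32]bool, len(disks))
	for _, d := range disks {
		used[d.Lun] = true
	}
	for lun := int32(0); lun < maxLuns; lun++ {
		if !used[lun] {
			return lun, nil
		}
	}
	return -1, fmt.Errorf("all %d luns are used", maxLuns)
}

func main() {
	disks := []dataDisk{{Name: "data-0", Lun: 0}, {Name: "data-1", Lun: 1}}
	lun, err := getNextDiskLun(disks, 64)
	fmt.Println(lun, err) // 2 <nil>
}
```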
func (ss *scaleSet) DetachDiskByName(diskName, diskURI string, nodeName types.NodeName) error {
	ssName, instanceID, vm, err := ss.getVmssVM(string(nodeName))
	if err != nil {
		if err == ErrorNotVmssInstance {
same as above
// GetNextDiskLun searches all vhd attachment on the host and find unused lun
// return -1 if all luns are used
func (ss *scaleSet) GetNextDiskLun(nodeName types.NodeName) (int32, error) {
	_, _, vm, err := ss.getVmssVM(string(nodeName))
same as above
	attached[diskName] = false
}

_, _, vm, err := ss.getVmssVM(string(nodeName))
same as above
Please note that VirtualMachineScaleSetVM and VirtualMachine are different data structures (not interfaces). Although the disk operations follow the same logic as the original VirtualMachine code, some duplication is still required for now. Update: VirtualMachineScaleSetVM and VirtualMachine now use different API versions. They should use the same API version in the future, and we can merge the functions together then.
@rootfs this commit 0ca2690 removes the instance-id truncating code:
Do you know why you would truncate the instance id (e.g. from …)? I have checked the code and it looks OK without truncating the instance id, but I still want to confirm with you, thanks.
Controller common is the abstraction, and this is where we should decide whether a node is vmss or avset. Let us not default to VMSS for now.
if resp != nil {
	// HTTP 4xx or 5xx suggests we should retry
	if 399 < resp.StatusCode && resp.StatusCode < 600 {
That is not correct: status codes such as 403 are terminal in all cases. They will occur if the service principal expired, or if the principal/MSI/EMSI doesn't have the proper permissions.
If the principal expired, we should surely retry the API call. Isn't this expected?
Retrying will exhaust the API quota quicker, so we need to be more frugal.
The logic here is the same as shouldRetryAPIRequest, just with a different parameter (http.Response instead of autorest.Response). It doesn't change the existing retry logic.
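For illustration only, a stricter predicate along the lines of the concern above might look like the sketch below; it illustrates the reviewer's point about terminal 4xx codes and is not the retry policy this PR actually uses:

```go
package main

import (
	"fmt"
	"net/http"
)

// shouldRetry sketches the reviewer's point: 5xx responses are usually worth
// retrying, but some 4xx codes (403 from an expired service principal or a
// principal/MSI without permissions, 400, 404) are terminal, and retrying
// them only burns API quota.
func shouldRetry(resp *http.Response) bool {
	if resp == nil {
		return false
	}
	switch {
	case resp.StatusCode >= 500 && resp.StatusCode < 600:
		return true // server-side errors: retry
	case resp.StatusCode == http.StatusTooManyRequests:
		return true // throttled: retry (ideally with backoff)
	default:
		return false // 4xx such as 400/403/404 are terminal
	}
}

func main() {
	fmt.Println(shouldRetry(&http.Response{StatusCode: 503})) // true
	fmt.Println(shouldRetry(&http.Response{StatusCode: 403})) // false
	fmt.Println(shouldRetry(&http.Response{StatusCode: 429})) // true
}
```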
// AttachDisk attaches a vhd to vm
// the vhd must exist, can be identified by diskName, diskURI, and lun.
func (ss *scaleSet) AttachDisk(isManagedDisk bool, diskName, diskURI string, nodeName types.NodeName, lun int32, cachingMode compute.CachingTypes) error {
We have to find a way to tell whether a VM is part of an availability set or a scale set. We cannot try, fail, and then retry. This information is either part of the VM, the config, or a node label.
This is solved by availabilitySetNodesCache, which holds a list of VMs not managed by vmss.
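A minimal sketch of the idea behind availabilitySetNodesCache: keep a cached set of node names that are not managed by VMSS, so the right code path can be chosen up front instead of try-and-fail. Field and method names here are illustrative assumptions, not the PR's exact API.

```go
package main

import (
	"fmt"
	"sync"
)

// availabilitySetNodes caches the node names that are NOT managed by VMSS.
type availabilitySetNodes struct {
	mu    sync.RWMutex
	nodes map[string]bool
}

// refresh replaces the cached set, e.g. after re-listing VMs from the API.
func (c *availabilitySetNodes) refresh(asNodeNames []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.nodes = make(map[string]bool, len(asNodeNames))
	for _, n := range asNodeNames {
		c.nodes[n] = true
	}
}

func (c *availabilitySetNodes) isAvailabilitySetNode(nodeName string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.nodes[nodeName]
}

func main() {
	cache := &availabilitySetNodes{}
	cache.refresh([]string{"k8s-master-0", "k8s-master-1"})

	for _, node := range []string{"k8s-master-0", "vmss-agent-000001"} {
		if cache.isAvailabilitySetNode(node) {
			fmt.Println(node, "-> availability-set disk path")
		} else {
			fmt.Println(node, "-> vmss disk path")
		}
	}
}
```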
	computerName := strings.ToLower(*vm.OsProfile.ComputerName)
	localCache[computerName] = ssName
}
break?
Why break? This for loop is meant to collect all VMs.
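A self-contained sketch of what this loop is doing, and why an early break would be wrong: every VM of every scale set has to end up in the cache. The simplified types below are assumptions for the example, not the PR's actual data structures.

```go
package main

import (
	"fmt"
	"strings"
)

// vmssVM is a simplified stand-in for VirtualMachineScaleSetVM, keeping only
// the computer name used as the cache key.
type vmssVM struct {
	ComputerName string
}

// buildNodeNameToScaleSetCache mirrors the loop discussed above: every VM of
// every scale set is recorded, which is why there is no early break — the
// cache must map all node names, not just the first match.
func buildNodeNameToScaleSetCache(scaleSets map[string][]vmssVM) map[string]string {
	localCache := make(map[string]string)
	for ssName, vms := range scaleSets {
		for _, vm := range vms {
			computerName := strings.ToLower(vm.ComputerName)
			localCache[computerName] = ssName
		}
	}
	return localCache
}

func main() {
	cache := buildNodeNameToScaleSetCache(map[string][]vmssVM{
		"vmss-a": {{ComputerName: "VMSS-A000000"}, {ComputerName: "VMSS-A000001"}},
		"vmss-b": {{ComputerName: "VMSS-B000000"}},
	})
	fmt.Println(cache["vmss-a000001"]) // vmss-a
}
```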
If vmType is set to vmss, we default to vmss because most nodes are expected to be running on vmss in that case. Updated the comments: #59716 (comment). We could merge them together in the future once vmss and vm use the same compute API version, but currently they should still be separated. Opened #59736 to track this issue.
This is because the last part of a VMSS instance name is a number, which may be the same across multiple VMSS within the same cluster.
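A toy example of the collision being described; the "ssname_index" naming used below is only an assumption for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// The numeric suffix of a VMSS instance is not unique across scale sets, so
// it cannot serve as a key on its own; the full name (or computer name) must
// be used instead.
func main() {
	instances := []string{"vmss-agents-a_0", "vmss-agents-b_0"}
	bySuffix := map[string]string{}
	for _, name := range instances {
		parts := strings.Split(name, "_")
		suffix := parts[len(parts)-1]
		if prev, ok := bySuffix[suffix]; ok {
			fmt.Printf("suffix %q collides between %s and %s; key by the full name instead\n", suffix, prev, name)
			continue
		}
		bySuffix[suffix] = name
	}
}
```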
The vmss cache PR has been merged. Made another rebase.
/LGTM Let's merge :-)
/approve
@@ -0,0 +1,182 @@
/*
Copyright 2017 The Kubernetes Authors.
2018
ack
// vmType is Virtual Machine Scale Set (vmss).
ss, ok := c.cloud.vmSet.(*scaleSet)
if !ok {
	return fmt.Errorf("error of converting vmSet (%q) to scaleSet", c.cloud.vmSet)
also dump VMType for diagnostics.
ack
}
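As a hedged sketch of the diagnostics being asked for, the type-assertion error could also report the configured vmType. The helper below is illustrative only, with stand-in types, and is not the PR's actual code:

```go
package main

import "fmt"

// VMSet stands in for the cloud provider's per-VM-type interface.
type VMSet interface{ Name() string }

type availabilitySet struct{}

func (a *availabilitySet) Name() string { return "availabilitySet" }

type scaleSet struct{}

func (s *scaleSet) Name() string { return "scaleSet" }

// asScaleSet reports the configured vmType alongside the failed assertion, so
// a mismatch between configuration and implementation is obvious from the error.
func asScaleSet(vmSet VMSet, vmType string) (*scaleSet, error) {
	ss, ok := vmSet.(*scaleSet)
	if !ok {
		return nil, fmt.Errorf("error converting vmSet (%T) to scaleSet, vmType=%q", vmSet, vmType)
	}
	return ss, nil
}

func main() {
	if _, err := asScaleSet(&availabilitySet{}, "vmss"); err != nil {
		fmt.Println(err) // error converting vmSet (*main.availabilitySet) to scaleSet, vmType="vmss"
	}
	if ss, err := asScaleSet(&scaleSet{}, "vmss"); err == nil {
		fmt.Println("got", ss.Name())
	}
}
```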
// AttachDisk attaches a vhd to vm. The vhd must exist, can be identified by diskName, diskURI, and lun.
func (c *controllerCommon) AttachDisk(isManagedDisk bool, diskName, diskURI string, nodeName types.NodeName, lun int32, cachingMode compute.CachingTypes) error {
add comments on how this works. TBH, this flow looks quite cryptic to me
ack, will do
Added in the new commit
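For readers of this thread, here is a minimal sketch of the overall flow being documented, with controller common delegating through the VMSet abstraction. The interface and signatures are simplified assumptions, not the actual controllerCommon API:

```go
package main

import "fmt"

// VMSet stands in for the provider's per-VM-type abstraction; both the
// availability-set and scale-set implementations satisfy it.
type VMSet interface {
	AttachDisk(nodeName, diskName, diskURI string, lun int32) error
}

type controllerCommon struct {
	// vmSet is chosen once at startup from the configured vmType
	// ("standard" -> availability sets, "vmss" -> scale sets).
	vmSet VMSet
}

// AttachDisk sketches the flow the review comment asks to document:
//  1. the caller has already chosen a free LUN (via GetNextDiskLun),
//  2. controllerCommon delegates to the configured VMSet implementation,
//  3. that implementation updates the VM's data-disk list and waits for the
//     Azure API call to finish.
func (c *controllerCommon) AttachDisk(nodeName, diskName, diskURI string, lun int32) error {
	return c.vmSet.AttachDisk(nodeName, diskName, diskURI, lun)
}

// fakeVMSet lets the sketch run without talking to Azure.
type fakeVMSet struct{}

func (fakeVMSet) AttachDisk(nodeName, diskName, diskURI string, lun int32) error {
	fmt.Printf("attach %s (lun %d) to %s\n", diskName, lun, nodeName)
	return nil
}

func main() {
	c := &controllerCommon{vmSet: fakeVMSet{}}
	_ = c.AttachDisk("vmss-agent-000001", "pvc-disk-1", "https://example/pvc-disk-1.vhd", 2)
}
```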
"time" | ||
|
||
"github.com/Azure/azure-sdk-for-go/arm/compute" | ||
"github.com/Azure/azure-sdk-for-go/arm/disk" | ||
"github.com/Azure/azure-sdk-for-go/arm/network" | ||
"github.com/Azure/azure-sdk-for-go/arm/storage" | ||
computepreview "github.com/Azure/azure-sdk-for-go/services/compute/mgmt/2017-12-01/compute" |
Any reason to rename it? The rename causes lots of diffs.
There is already an existing package named compute: "github.com/Azure/azure-sdk-for-go/arm/compute".
Let's clear the tests and merge. Thanks a lot for this (and the rest of the VMSS work); it's been a long time coming!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: brendandburns, feiskyer, khenidak. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these OWNERS files:
Approvers can indicate their approval by writing …
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue (batch tested with PRs 59489, 59716). If you want to cherry-pick this change to another branch, please follow the instructions here.
What this PR does / why we need it:
This PR adds AzureDisk support for vmss nodes. Changes include …
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #43287
Special notes for your reviewer:
Depends on #59652 (the first two commits are from #59652).
Release note: