-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Storage capacity #248
Storage capacity #248
Conversation
3f7ea83
to
986fbad
Compare
I can split out the locking change into a separate PR, if desired. It's already provided as a separate commit. |
986fbad
to
4ba92c3
Compare
/test pull-kubernetes-csi-csi-driver-host-path-distributed-on-kubernetes-master |
/assign @verult |
e2fd59b
to
2317581
Compare
name: csi-data-dir | ||
|
||
- name: hostpath | ||
image: k8s.gcr.io/sig-storage/hostpathplugin:v1.6.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally I had "canary" here with a TODO. I think we can switch directly to using the next release number, then we avoid some churn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool! Largely looks good to me
pkg/hostpath/hostpath.go
Outdated
volumes: map[string]hostPathVolume{}, | ||
snapshots: map[string]hostPathSnapshot{}, | ||
} | ||
hp.discoverExistingSnapshots() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious: does the order of operations matter between discovering snapshots and volumes? Wondering why you moved snapshots before volumes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it matters and I didn't have any particular reason for changing it. I can revert the order, just to be safe.
// gRPC calls involving any of the fields below must be serialized | ||
// by locking this mutex before starting. Internal helper | ||
// functions assume that the mutex has been locked. | ||
mutex sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of locking/unlocking in top level GRPC functions only where needed (and having to inspect the helper code to identify where locking should take place), what do you think about moving these 3 fields into a separate struct and define read/write methods that lock/unlock at the top of the method body?
e.g.
type hostPathCache { // TODO need better name
mutex sync.Mutex
volumes map[string]hostPathVolume
snapshots map[string]hostPathSnapshot
}
func (c *hostPathCache) GetVolume(volName string) {
c.Lock()
defer c.Unlock()
...
}
...
type hostPath struct {
...
cache *hostPathCache
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And is it worth having a RWMutex instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A production-quality driver would definitely want to be more intelligent about its locking. But this is just a demo driver where the main priority is correctness, not performance when executing many operations in parallel.
Therefore I prefer to keep the locking as simple as possible, i.e. lock in the gRPC functions after inspecting parameters and before using local state. The rule for helper functions is simple: helpers don't lock and a gRPC function must take the lock before calling them. Only a few read-only functions may (but don't have to) be treated as exceptions.
What you propose for example may have a correctness issues: consider CreateVolume
function checks for a volume with GetVolume
, then proceeds to create it. In parallel, the function is called again for the same volume and also proceeds to create it. Both complete and we leak one volume because only one of them gets stored and returned to the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah thanks for the catch, sgtm
pkg/hostpath/controllerserver.go
Outdated
|
||
vol, err := createHostpathVolume(volumeID, req.GetName(), capacity, requestedAccessType, false /* ephemeral */) | ||
kind := req.GetParameters()[storageKind] | ||
vol, err := hp.createVolume(volumeID, req.GetName(), capacity, requestedAccessType, false /* ephemeral */, kind) | ||
if err != nil { | ||
return nil, status.Errorf(codes.Internal, "failed to create volume %v: %v", volumeID, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create volume RPC errors codes never gets returned here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks!
I think all of these cases where we wrap some error can simply be replaced with returning the error or wrapping it. The result will be that instead of the non-descriptive codes.Internal
the caller will get a codes.Unknown
, which has the same effect.
There's also an opportunity to remove duplicate code by letting the getVolume/SnapshotBy...
functions return a NotFound
error themselves.
pkg/hostpath/hostpath.go
Outdated
@@ -343,6 +375,10 @@ func (hp *hostPath) deleteVolume(volID string) error { | |||
if err := os.RemoveAll(path); err != nil && !os.IsNotExist(err) { | |||
return err | |||
} | |||
glog.V(4).Infof("deleting hostpath volume: %s = %+v", volID, vol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we log this before the RemoveAll
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I put it after the work is done to mirror the "adding hostpath volume" message, i.e. once we get that message, we know that hp.volumes
is really being changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a nit, but maybe we could move this after all the work is done (i.e. after delete()
) and log "added/deleted hostpath volume"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's better. Changed.
// adds the volume to the list. | ||
// | ||
// It returns the volume path or err if one occurs. That error is suitable as result of a gRPC call. | ||
func (hp *hostPath) createVolume(volID, name string, cap int64, volAccessType accessType, ephemeral bool, kind string) (hpv *hostPathVolume, finalErr error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return invalid arg error if kind is specified but capacity is disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, added.
7f6697a
to
86b2ad1
Compare
/test pull-kubernetes-csi-csi-driver-host-path-distributed-on-kubernetes-master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
(I don't think I have approval permissions though)
pkg/hostpath/nodeserver.go
Outdated
@@ -175,9 +176,9 @@ func (hp *hostPath) NodePublishVolume(ctx context.Context, req *csi.NodePublishV | |||
} | |||
} | |||
|
|||
volume := hostPathVolumes[req.GetVolumeId()] | |||
volume := hp.volumes[req.GetVolumeId()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the requested volume ID not exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good question. This change came from a global seach/replace, I haven't really thought about it.
There was indeed a check that the volume exists earlier and we even have a copy of that volume in vol
. So the code is correct, just a bit unusual. I think this is better:
- volume := hp.volumes[req.GetVolumeId()]
- volume.NodeID = hp.nodeID
- hp.volumes[req.GetVolumeId()] = volume
+ vol.NodeID = hp.nodeID
+ hp.updateVolume(req.GetVolumeId(), vol)
return &csi.NodePublishVolumeResponse{}, nil
pkg/hostpath/hostpath.go
Outdated
@@ -343,6 +375,10 @@ func (hp *hostPath) deleteVolume(volID string) error { | |||
if err := os.RemoveAll(path); err != nil && !os.IsNotExist(err) { | |||
return err | |||
} | |||
glog.V(4).Infof("deleting hostpath volume: %s = %+v", volID, vol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a nit, but maybe we could move this after all the work is done (i.e. after delete()
) and log "added/deleted hostpath volume"
@@ -428,7 +428,7 @@ func (hp *hostPath) loadFromSnapshot(size int64, snapshotId, destPath string, mo | |||
out, err := executor.Command(cmd[0], cmd[1:]...).CombinedOutput() | |||
glog.V(4).Infof("Command Finish: %v", string(out)) | |||
if err != nil { | |||
return status.Errorf(codes.Internal, "failed pre-populate data from snapshot %v: %v: %s", snapshotId, err, out) | |||
return fmt.Errorf("failed pre-populate data from snapshot %v: %w: %s", snapshotId, err, out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one more instance of codes.Internal
in this function above. Should that be changed too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had kept it for consistency with the two error returns around it which both return a status error. But thinking about it again, I agree that it is better to also change it. Done.
The code used to share state via global variables (bad practice!) without proper locking (even worse). The different servers were implemented as individual structs, which meant that parameters had to be duplicated. A simpler approach is to have one struct with clearly defined read-only and read/write members, with coarse locking at the outer gRPC layer.
This fakes capacity by pretending to have linear storage for different kinds and subtracting the size of existing volumes from that. A test deployment with some example storage classes, an example app with a generic ephemeral inline volume, and storage capacity tracking enabled is provided for use on a Kubernetes cluster where these alpha features are enabled. When that feature is not enabled in the cluster, also the driver deployment is done without storage capacity enabled. This then serves as a test that distributed provisioning works. prow.sh can be used to test this new deployment, for example with: CSI_PROW_SANITY_POD=csi-hostpath-socat-0 \ CSI_PROW_SANITY_CONTAINER=socat \ CSI_PROW_DEPLOYMENT=kubernetes-distributed \ CSI_PROW_KUBERNETES_VERSION=1.19.0 \ ./.prow.sh
Wrapping errors with an "Internal" status error was at best useless (it does not tell the caller anything and the default "Unknown" error has the same effect) and at worst hid the actual status (when wrapping createVolume). Therefore most of those wrappers get removed. The handling of "not found" also was inconsistent. Some places checked the map directly, others made assumptions about the error returned by the helper functions. Now the helper functions themselves generate a gRPC status error which can be returned directly.
The parameter becomes invalid when the driver is not configured to have different storage kinds.
86b2ad1
to
65e48f2
Compare
/lgtm |
/assign @jsafrane For approval. |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jsafrane, pohly, verult The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
What this PR does / why we need it:
We need a way to demonstrate and test both distributed provisioning and storage capacity tracking.
This PR also fixes the long-standing issue of not properly protecting global variables against concurrent access.
Which issue(s) this PR fixes:
Fixes #72
Special notes for your reviewer:
This needs to be complemented by special Prow jobs that use the new deployment: kubernetes/test-infra#20848
Does this PR introduce a user-facing change?: