Skip to content

Figure out and implement custom handling for MatchInterPodAffinity predicate #257

Open
@MaciekPytel

Description

As part of working on CA performance we run a large scale-up test with some additional logging including call count and total duration spent in each predicate. The results are as follows:

E0824 09:27:54.815425       8 predicates.go:192] Predicate statistics for MaxAzureDiskVolumeCount: called 59985 times, total time 25.328538ms, mean duration 422ns
E0824 09:27:54.815489       8 predicates.go:192] Predicate statistics for MatchInterPodAffinity: called 59985 times, total time 1m48.73252767s, mean duration 1.812661ms
E0824 09:27:54.815497       8 predicates.go:192] Predicate statistics for CheckNodeCondition: called 59985 times, total time 49.05121ms, mean duration 817ns
E0824 09:27:54.815502       8 predicates.go:192] Predicate statistics for MaxEBSVolumeCount: called 59985 times, total time 24.906838ms, mean duration 415ns
E0824 09:27:54.815508       8 predicates.go:192] Predicate statistics for GeneralPredicates: called 59985 times, total time 114.434325ms, mean duration 1.907µs
E0824 09:27:54.815534       8 predicates.go:192] Predicate statistics for NoDiskConflict: called 59985 times, total time 26.067526ms, mean duration 434ns
E0824 09:27:54.815553       8 predicates.go:192] Predicate statistics for NoVolumeNodeConflict: called 59985 times, total time 38.554035ms, mean duration 642ns
E0824 09:27:54.815559       8 predicates.go:192] Predicate statistics for CheckNodeDiskPressure: called 59985 times, total time 19.062642ms, mean duration 317ns
E0824 09:27:54.815564       8 predicates.go:192] Predicate statistics for PodToleratesNodeTaints: called 59985 times, total time 22.448605ms, mean duration 374ns
E0824 09:27:54.815568       8 predicates.go:192] Predicate statistics for MaxGCEPDVolumeCount: called 59985 times, total time 61.944698ms, mean duration 1.032µs
E0824 09:27:54.815572       8 predicates.go:192] Predicate statistics for NoVolumeZoneConflict: called 59985 times, total time 64.231254ms, mean duration 1.07µs 
E0824 09:27:54.815578       8 predicates.go:192] Predicate statistics for PodFitsResources: called 357952 times, total time 514.838808ms, mean duration 1.438µs
E0824 09:27:54.815584       8 predicates.go:192] Predicate statistics for ready: called 59985 times, total time 50.931691ms, mean duration 849ns
E0824 09:27:54.815588       8 predicates.go:192] Predicate statistics for CheckNodeMemoryPressure: called 59985 times, total time 285.070472ms, mean duration 4.752µs

It turns out that MatchInterPodAffinity predicate is 3 orders of magnitude slower compared to other predicates. This is likely because contrary to scheduler we don't do any precomputation for it and we don't maintain predicateMeta object.

After a quick glance at predicate code it makes sense - it needs to iterate over all existing pods to check if any of them has pod antiaffinity on the pod we're running predicates for. This brings up another problem - how does it get all pods and nodes? We only provide NodeInfo for a single node, the rest comes out of informer. However, that means it reflects the real state of the cluster, not our simulated state. If we've already placed a pod with zone-level antiaffinity on a simulated node it won't prevent adding pods to other simulated nodes in the same zone.

Bottom line is that using zone-level antiaffinity can cause CA to "overshoot" creating some nodes for pods that won't be able to schedule on them anyway. Fortunately, this is a pretty unlikely edge case and we will scale-down the unnecessary nodes without any problem.

Metadata

Assignees

Labels

area/cluster-autoscalerarea/core-autoscalerDenotes an issue that is related to the core autoscaler and is not specific to any provider.kind/bugCategorizes issue or PR as related to a bug.kind/featureCategorizes issue or PR as related to a new feature.lifecycle/frozenIndicates that an issue or PR should not be auto-closed due to staleness.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions