Open
Labels: area/cluster-autoscaler, kind/feature
Description
In scale-down, there's a timeout on initiating eviction of a pod. It's controlled by the --max-pod-eviction-time flag, which defaults to 2 minutes.
In some scenarios this is too aggressive. Recreating a pod protected by a PDB can take much longer than that, especially if things like a termination grace period, startup probe, or readiness probe are configured.
The user can just increase that timeout with the flag, but that's not perfect either. Eviction can also fail due to other issues, like misconfiguration of the PDB or the workload, and in those cases a longer timeout only delays the failure.
Ideally, there would be a smarter mechanism for evicting the pods. Possible improvements could include:
- having an overall node drain timeout instead of a timeout for a single pod eviction
- differentiating between 429 and 500 errors from eviction API, retrying only on 429
- making the timeout dynamic based on the workload's termination grace period, readiness probe's initialDelaySeconds, and possibly other configurations
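To make the three ideas above more concrete, here is a rough sketch in Go. It is not Cluster Autoscaler code. The names podTimings, evictionTimeout, shouldRetryEviction, evictFunc and drainNode are hypothetical, the eviction call is abstracted as a function returning an HTTP status code, and pods are evicted sequentially for simplicity. The formula for the dynamic timeout (base plus grace period plus initial delay) is only one possible choice.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// podTimings is a hypothetical, simplified stand-in for the pod spec fields
// mentioned above. It is not a Cluster Autoscaler or client-go type.
type podTimings struct {
	TerminationGracePeriod time.Duration
	ReadinessInitialDelay  time.Duration // readiness probe's initialDelaySeconds
}

// evictionTimeout derives a per-pod timeout from a base value plus the time
// the workload may need to terminate and become ready again.
// Assumption: a plain sum; a real implementation might weigh these differently.
func evictionTimeout(base time.Duration, p podTimings) time.Duration {
	return base + p.TerminationGracePeriod + p.ReadinessInitialDelay
}

// shouldRetryEviction retries only on 429 (Too Many Requests), which the
// eviction API returns when a PDB currently disallows the disruption.
// Other errors, such as 500, are treated as non-retryable.
func shouldRetryEviction(statusCode int) bool {
	return statusCode == http.StatusTooManyRequests
}

// evictFunc is a hypothetical abstraction of one eviction API call that
// returns the HTTP status code of the response.
type evictFunc func(pod string) int

// drainNode evicts pods one by one under a single node-level deadline
// instead of a separate timeout per pod.
func drainNode(pods []string, evict evictFunc, drainTimeout, retryInterval time.Duration) error {
	deadline := time.Now().Add(drainTimeout)
	for _, pod := range pods {
		for {
			code := evict(pod)
			if code >= 200 && code < 300 {
				break // eviction accepted, move on to the next pod
			}
			if !shouldRetryEviction(code) {
				return fmt.Errorf("evicting %s: non-retryable status %d", pod, code)
			}
			if time.Now().Add(retryInterval).After(deadline) {
				return fmt.Errorf("evicting %s: node drain timeout of %s exceeded", pod, drainTimeout)
			}
			time.Sleep(retryInterval)
		}
	}
	return nil
}

func main() {
	timings := podTimings{
		TerminationGracePeriod: 30 * time.Second,
		ReadinessInitialDelay:  60 * time.Second,
	}
	fmt.Println("per-pod timeout:", evictionTimeout(2*time.Minute, timings))

	// Fake eviction that is blocked by a PDB twice before succeeding.
	attempts := 0
	evict := func(pod string) int {
		attempts++
		if attempts <= 2 {
			return http.StatusTooManyRequests
		}
		return http.StatusCreated
	}
	err := drainNode([]string{"pod-a"}, evict, time.Second, 10*time.Millisecond)
	fmt.Println("drain error:", err, "attempts:", attempts)
}
```

With this shape, a misconfigured workload that makes the eviction API return 500 fails fast, while a PDB that is temporarily blocking eviction is retried until the node-level deadline runs out.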
/kind feature
/area cluster-autoscaler