OCI Cluster autoscaler not using secondary nodepool, when OCI is Out of Host Capacity #8700

@tobmad

Which component are you using?:
Cluster-autoscaler

What version of the component are you using?:
1.32.2

Component version:

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Server Version: v1.32.1

What environment is this in?:

OCI

What did you expect to happen?:
When a nodepool is marked as status: Unhealthy because OCI is Out of Host Capacity, Cluster Autoscaler should stop trying to schedule Pending pods on that nodepool and switch to a different nodepool with lower priority.
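
To make the expectation concrete, here is a minimal sketch of the selection behaviour described above. It is not Cluster Autoscaler code: `NodeGroup`, `pick_node_group`, and the pool names and priority values are hypothetical and only illustrate "skip Unhealthy nodepools, then take the highest remaining priority".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class NodeGroup:
    """Hypothetical stand-in for a nodepool as the autoscaler sees it."""
    name: str
    priority: int   # assumption: a higher value means "preferred"
    healthy: bool   # False once the pool is marked Unhealthy


def pick_node_group(groups: Sequence[NodeGroup]) -> Optional[NodeGroup]:
    """Return the highest-priority healthy group, or None if none is usable."""
    candidates = [g for g in groups if g.healthy]
    return max(candidates, key=lambda g: g.priority, default=None)


if __name__ == "__main__":
    pools = [
        NodeGroup("primary", priority=50, healthy=False),   # out of host capacity
        NodeGroup("secondary", priority=10, healthy=True),
    ]
    chosen = pick_node_group(pools)
    print(chosen.name if chosen else "no usable nodepool")
```

With the primary pool marked Unhealthy, this sketch falls back to the secondary pool, which is the behaviour I expected from the autoscaler.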

What happened instead?:
Cluster Autoscaler keeps trying to place the Pending pod on a template for an upcoming node that never gets created, as seen in the logs:

Pod can be moved to template-node-for--upcoming-0

The same nodepool is set to Unhealthy in the cluster-autoscaler-status ConfigMap, because OCI is out of capacity and Cluster Autoscaler is unable to remove the upcoming node, since OCI has not yet given upcoming nodes an instance ID:

Found 1 instances with errorCode OutOfResource.InternalError in nodeGroup
Deleting 1 from node group because of create errors
Error while trying to delete nodes from: Node doesn't have an instance id so it can't be deleted.

How to reproduce it (as minimally and precisely as possible):
Create two nodepools with different instance types in an OKE cluster, where one nodepool is out of capacity on OCI. Configure Cluster Autoscaler with a different priority for each nodepool, giving the out-of-capacity nodepool the higher priority. Then create new pods to be scheduled on these nodepools and wait for Cluster Autoscaler to mark the nodepool as Unhealthy. It does not scale up the secondary nodepool.
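
One way to express such priorities is the priority expander, enabled with `--expander=priority` and configured through a ConfigMap named `cluster-autoscaler-priority-expander`. The fragment below is only an illustrative sketch: the namespace, the priority values, and the two regular expressions are hypothetical placeholders, and the patterns would need to match however the node groups are actually identified in the cluster.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system        # assumption: the namespace the autoscaler runs in
data:
  priorities: |-
    50:                         # higher number = higher priority
      - .*primary-pool.*        # hypothetical pattern for the out-of-capacity nodepool
    10:
      - .*secondary-pool.*      # hypothetical pattern for the fallback nodepool
```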

Anything else we need to know?:
Not sure if it changes the behaviour, but the nodepool is trying to scale from 0 nodes.
