
Kubernetes Auto-Scaling using External Custom Metrics

#k8s#hpa#autoscaling#prometheus


The Kubernetes Horizontal Pod Autoscaler (HPA) typically scales workloads based on CPU and memory usage. Depending on its purpose, however, an application may not be well suited to scaling on resource usage alone. This guide demonstrates how to scale your Kubernetes workloads on a custom metric such as requests per second (RPS).

Why Custom Metrics?

While CPU and memory metrics work well for many applications, some workloads benefit from scaling on application-specific metrics:

  • Web applications: Scale on requests per second or response time
  • Message queue consumers: Scale on queue depth
  • API services: Scale on active connections or request latency

Architecture Overview

In this guide, we'll use:

  • Nginx Ingress Controller to expose metrics
  • Prometheus-to-sd sidecar to export metrics to Stackdriver
  • Stackdriver adapter to make metrics available to Kubernetes HPA
  • Horizontal Pod Autoscaler to scale based on the custom metric

Prerequisites

Before starting, ensure you have:

  • A Kubernetes cluster (GKE recommended)
  • kubectl command-line tool configured
  • Helm 3 installed (the install commands below use Helm 3 syntax, so Tiller is not required)

Step 1: Deploy Stackdriver Adapter

The Stackdriver adapter enables Kubernetes to use custom metrics from Google Cloud Monitoring (formerly Stackdriver):

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter.yaml

Verify the adapter is running:

kubectl get pods -n custom-metrics
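You can also confirm that the adapter has registered the metrics APIs with the Kubernetes API server. These APIService names are what the adapter manifest registers (a quick sanity check, not an exhaustive one):

```shell
# Both should report Available=True once the adapter is healthy
kubectl get apiservice v1beta1.custom.metrics.k8s.io
kubectl get apiservice v1beta1.external.metrics.k8s.io
```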

Step 2: Configure Nginx Ingress Controller

Create a values file for the Nginx Ingress Controller that includes the Prometheus-to-sd sidecar container. This sidecar exports metrics from Nginx to Stackdriver:

# nginx-values.yaml
controller:
  extraContainers:
    - name: prometheus-to-sd
      image: gcr.io/google-containers/prometheus-to-sd:v0.9.0
      command:
        - /monitor
        - --stackdriver-prefix=custom.googleapis.com
        - --source=nginx-ingress-controller:http://localhost:10254/metrics
        - --pod-id=$(POD_NAME)
        - --namespace-id=$(POD_NAMESPACE)
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace

  metrics:
    enabled: true
    service:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "10254"

Step 3: Deploy Nginx Ingress Controller

Install the Nginx Ingress Controller using Helm with the custom values:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

helm install nginx-ingress ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --values nginx-values.yaml

Verify the deployment:

kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx
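Note that the controller only counts requests that actually pass through it, so your application needs an Ingress resource routed by this controller. A minimal sketch, assuming a hypothetical Service named your-app listening on port 80:

```yaml
# sample-ingress.yaml -- your-app is a placeholder; use your own Service name
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: your-app-ingress
  namespace: default
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: your-app
            port:
              number: 80
```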

Step 4: Configure Horizontal Pod Autoscaler

Now create an HPA that scales based on requests per second. With a target of type AverageValue, the HPA divides the total external metric by the current replica count, so this example adds pods when average RPS per replica exceeds 1000:

# hpa-custom-metrics.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: your-app-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: custom.googleapis.com|nginx-ingress-controller|nginx_ingress_controller_requests
        selector:
          matchLabels:
            resource.type: k8s_pod
            resource.labels.namespace_name: ingress-nginx
      target:
        type: AverageValue
        averageValue: "1000"
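The AverageValue target makes the desired replica count easy to reason about: the HPA (roughly, ignoring its tolerance band and stabilization windows) divides the total external metric by the target average and rounds up. A sketch of the arithmetic:

```python
import math

def desired_replicas(metric_total: float, target_average: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Approximate HPA replica calculation for an External metric with an
    AverageValue target (ignores tolerance and stabilization windows)."""
    desired = math.ceil(metric_total / target_average)
    # Clamp to the HPA's configured bounds
    return max(min_replicas, min(max_replicas, desired))

# 3500 total RPS against a 1000 RPS-per-pod target
print(desired_replicas(3500, 1000, min_replicas=2, max_replicas=10))  # 4
```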

Apply the HPA:

kubectl apply -f hpa-custom-metrics.yaml

Step 5: Verify the Setup

Check that the HPA can read the custom metrics:

kubectl get hpa app-hpa
kubectl describe hpa app-hpa

You should see output showing the current metric value and target:

NAME      REFERENCE                         TARGETS     MINPODS   MAXPODS   REPLICAS
app-hpa   Deployment/your-app-deployment    850/1000    2         10        2
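To watch the autoscaler react, generate sustained traffic above the target and observe the replica count. A sketch using hey as the load generator (any HTTP benchmarking tool works; the service name below assumes the nginx-ingress release name used earlier):

```shell
# Look up the controller's external IP (name depends on your Helm release)
INGRESS_IP=$(kubectl get svc -n ingress-nginx nginx-ingress-ingress-nginx-controller \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# 30 workers at 50 requests/second each ~= 1500 RPS, above the 1000 target
hey -z 5m -c 30 -q 50 "http://${INGRESS_IP}/"

# In another terminal, watch the replica count climb
kubectl get hpa app-hpa --watch
```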

Monitoring and Troubleshooting

View Available Custom Metrics

List the metrics available in your cluster. The HPA above uses the external metrics API, but the adapter serves both:

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | jq .

Check Nginx Metrics

Access the Nginx Ingress Controller metrics endpoint. The deployment name depends on your Helm release name; with the nginx-ingress release used above, the chart generates nginx-ingress-ingress-nginx-controller:

kubectl port-forward -n ingress-nginx deployment/nginx-ingress-ingress-nginx-controller 10254:10254
curl http://localhost:10254/metrics
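Among the exported series you should see the request counter used by the HPA above; filtering makes it easier to find:

```shell
# Counter of handled requests, labelled by ingress, status code, etc.
curl -s http://localhost:10254/metrics | grep nginx_ingress_controller_requests
```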

Debug HPA Issues

If the HPA shows "unknown" for metrics:

  1. Verify Stackdriver adapter is running
  2. Check that metrics are being exported to Stackdriver
  3. Ensure the metric name matches exactly in your HPA configuration
  4. Review adapter logs: kubectl logs -n custom-metrics deployment/custom-metrics-stackdriver-adapter
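If those checks pass, you can also query the external metrics API directly for the exact metric the HPA reads. The | separators must be URL-encoded as %7C, and the namespace in the path is the one the HPA lives in (default here):

```shell
kubectl get --raw \
  "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com%7Cnginx-ingress-controller%7Cnginx_ingress_controller_requests" | jq .
```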

Best Practices

  1. Set appropriate thresholds: Test your application to determine realistic scaling thresholds
  2. Use stabilization windows: Configure behavior in HPA to prevent flapping
  3. Monitor costs: Scaling on custom metrics can increase GCP monitoring costs
  4. Combine metrics: Consider using multiple metrics (CPU + RPS) for more robust scaling
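Points 2 and 4 can live in the same HPA. A sketch of the extra fields in autoscaling/v2 syntax, to be merged into the spec: of the HPA above:

```yaml
# Fragment of an HPA spec: stabilization windows plus combined CPU + RPS metrics
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # smooth out short traffic spikes
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: External
    external:
      metric:
        name: custom.googleapis.com|nginx-ingress-controller|nginx_ingress_controller_requests
      target:
        type: AverageValue
        averageValue: "1000"
```

When multiple metrics are listed, the HPA computes a desired replica count for each and uses the largest.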

Conclusion

Custom metrics-based auto-scaling provides more precise control over your Kubernetes deployments. By scaling on application-specific metrics like requests per second, you can ensure your services maintain performance under varying loads while optimizing resource usage.

This approach is particularly valuable for:

  • High-traffic web applications
  • API services with variable load patterns
  • Applications where CPU/memory doesn't correlate with actual load

Remember to monitor your applications and adjust thresholds as needed to find the optimal scaling configuration for your workload.