KEDA — LearnwithVishnu

⚡KEDA

Event-driven autoscaling: Kafka lag, scale to zero, batch jobs, 60+ scalers


⚡ What is KEDA?


HPA vs KEDA — the key gap

Metric	HPA	KEDA
CPU/memory	✅ Native	✅ Native cpu and memory scalers
Kafka lag	❌	✅ Native Kafka trigger
SQS/Service Bus	❌	✅ Native cloud triggers
Custom Prometheus	⚠️ Needs adapter	✅ Direct query
Scale to zero	❌ Min 1 replica	✅ Min 0 replicas
ScaledJob (batch)	❌	✅ Native
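For comparison with the table, KEDA's CPU support is a dedicated scaler, not a Prometheus query. A minimal trigger sketch, assuming a recent KEDA 2.x release (where `metricType` sits on the trigger rather than inside `metadata`) and containers that declare CPU requests:

```yaml
triggers:
- type: cpu
  metricType: Utilization   # percentage of each container's CPU request
  metadata:
    value: "60"             # target 60% average utilisation across pods
```

The cpu and memory scalers cannot scale a workload to zero on their own; pair them with an event trigger such as Kafka when you need minReplicaCount: 0.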

Real Scenario: Telecom Alarm Processing

A telecom alarm notification service processed network events from Kafka. Before KEDA: 3 replicas always running. After adding a KEDA Kafka trigger: 0 replicas at 2am (no alarms), scaling automatically to 8+ during morning alarm storms when hundreds of network elements report simultaneously. Consumer lag never exceeds 50 messages. Cost: 65% reduction on that service alone. The business benefit: alarms are processed faster during peaks (more pods), and nothing is wasted during quiet periods (zero pods).

Install KEDA + concept overview

📄 ScaledObject


Kafka + Azure Service Bus + Prometheus triggers

🔨 ScaledJob — Batch Processing


ScaledJob vs ScaledObject

ScaledObject: wraps a Deployment. Scales a long-running consumer up/down based on queue depth. The pods keep running, processing messages.

ScaledJob: creates Kubernetes Jobs. Each job processes a fixed batch and exits. Use for: report generation, data migration, one-shot tasks where each unit of work should be isolated.

ScaledJob + TriggerAuthentication with IRSA
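A sketch of that combination for an SQS queue. All names are hypothetical (queue URL, account ID, region, image, service account). It assumes an EKS cluster where the KEDA operator's service account carries an IRSA-annotated IAM role that may read the queue's attributes, and a KEDA release (2.13 or later) that supports the `aws` pod identity provider; older releases used `aws-eks` instead. Each Job is expected to pull one message, process it and exit.

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: sqs-irsa-auth                    # hypothetical name
  namespace: production
spec:
  podIdentity:
    provider: aws                        # IRSA credentials instead of static keys (assumes KEDA 2.13+)
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: report-generator                 # hypothetical
  namespace: production
spec:
  jobTargetRef:
    backoffLimit: 2                      # retry a failed Job at most twice
    template:
      spec:
        serviceAccountName: report-worker   # hypothetical; needs its own IRSA role to receive/delete messages
        restartPolicy: Never
        containers:
        - name: worker
          image: registry.example.com/report-worker:1.0   # hypothetical image: handles one message, then exits
  pollingInterval: 30                    # check the queue every 30 s
  maxReplicaCount: 20                    # never more than 20 Jobs at once
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: sqs-irsa-auth
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/report-requests   # hypothetical
      queueLength: "1"                   # target roughly one Job per waiting message
      awsRegion: eu-west-1               # hypothetical
```

Unlike a ScaledObject, there is no HPA and no cooldownPeriod here: KEDA's operator creates the Jobs itself on each polling cycle.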

⚡ How KEDA Works


KEDA fills the gap that HPA cannot fill

Out of the box, HPA scales on CPU and memory only; anything else requires a custom or external metrics adapter. Most real workloads need to scale on business metrics: queue depth, message count, custom Prometheus metrics. KEDA extends HPA to support 60+ event sources natively.

External trigger (Queue, Kafka, Cron, Prometheus)
    ↓
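KEDA Operator: watches your ScaledObject and creates/manages an HPA on your behalf
    ↓
KEDA Metrics Adapter: reads the trigger and exposes it to the HPA as an external metric
    ↓
HPA scales the Deployment/StatefulSet between 1 and N replicas (ScaledJobs bypass the HPA; the operator creates Jobs directly)
    ↓
Scale to ZERO when no messages: the operator itself handles 0 ↔ 1 (HPA minimum is 1; KEDA goes to 0)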

Feature	HPA	KEDA
Scale metric	CPU, Memory only	60+ sources: Kafka, SQS, RabbitMQ, Prometheus, Cron, HTTP...
Scale to zero	No — min 1 replica	Yes — 0 replicas when no events
ScaledJob	No	Yes — K8s Job per message, not long-running Deployment

helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace

📦 ScaledObject Examples


Kafka trigger: scale consumers on consumer-group lag

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: payment-processor-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: payment-processor
  minReplicaCount: 0           # scale to ZERO when queue empty
  maxReplicaCount: 50
  pollingInterval: 15
  cooldownPeriod: 300
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka-broker:9092
      consumerGroup: payment-processors
      topic: payment-events
      lagThreshold: "100"      # target ~100 unprocessed messages per replica (capped at the topic's partition count by default)
    authenticationRef:
      name: kafka-trigger-auth
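The ScaledObject above references `kafka-trigger-auth`, which has to exist in the same namespace. A sketch, assuming the brokers on `kafka-broker:9092` use SASL/SCRAM-SHA-512 without TLS; the Secret name and credential values are hypothetical:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: kafka-credentials          # hypothetical
  namespace: production
type: Opaque
stringData:
  sasl: scram_sha512               # SASL mechanism name understood by the Kafka scaler
  username: keda-scaler            # hypothetical
  password: change-me              # hypothetical; source it from your secret manager
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-trigger-auth         # must match authenticationRef.name in the ScaledObject
  namespace: production            # same namespace as the ScaledObject
spec:
  secretTargetRef:                 # map each scaler auth parameter to a Secret key
  - parameter: sasl
    name: kafka-credentials
    key: sasl
  - parameter: username
    name: kafka-credentials
    key: username
  - parameter: password
    name: kafka-credentials
    key: password
```

If the listener uses TLS, add a `tls` parameter (and `ca` if needed) mapped the same way.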

Prometheus trigger

triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring:9090
    threshold: "50"
    query: sum(active_checkout_sessions{app="checkout"})
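The threshold is a per-replica target, not an on/off switch. As a simplified sketch (ignoring the HPA's default 10% tolerance and its scale-down stabilization window), KEDA exposes the trigger value to the HPA as an average-value external metric, so while the trigger is active the replica count works out to roughly the following. The metric values 430 and 2350 are hypothetical.

```latex
\text{desiredReplicas} = \min\!\left(\text{maxReplicaCount},\ \left\lceil \frac{\text{metric value}}{\text{threshold}} \right\rceil\right)

% Prometheus example: query returns 430 active sessions, threshold = 50
\left\lceil \tfrac{430}{50} \right\rceil = \lceil 8.6 \rceil = 9 \text{ replicas}

% Kafka example: total lag = 2350, lagThreshold = 100
\left\lceil \tfrac{2350}{100} \right\rceil = \lceil 23.5 \rceil = 24 \text{ replicas}
```

For Kafka, the result is additionally capped at the topic's partition count unless `allowIdleConsumers` is enabled. The step down to 0 is not part of this formula; the KEDA operator handles it after cooldownPeriod.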

Cron trigger — scale on schedule

triggers:
- type: cron
  metadata:
    timezone: Asia/Dubai
    start: "0 8 * * 1-5"      # scale up at 8am weekdays
    end: "0 18 * * 1-5"       # scale down at 6pm weekdays
    desiredReplicas: "10"
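Triggers can be combined. When a ScaledObject lists several, the HPA computes a replica count for each and applies the largest, so a cron trigger can hold a business-hours baseline while Kafka lag drives bursts. A sketch of an alternative version of the Kafka ScaledObject above, reusing its names; the schedule is illustrative:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: payment-processor-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: payment-processor
  minReplicaCount: 0               # nights and weekends can still reach 0 if lag is 0
  maxReplicaCount: 50
  triggers:
  - type: cron                     # baseline: 10 warm replicas 08:00-18:00 on weekdays
    metadata:
      timezone: Asia/Dubai
      start: "0 8 * * 1-5"
      end: "0 18 * * 1-5"
      desiredReplicas: "10"
  - type: kafka                    # bursts: more replicas when lag outgrows the baseline
    metadata:
      bootstrapServers: kafka-broker:9092
      consumerGroup: payment-processors
      topic: payment-events
      lagThreshold: "100"
    authenticationRef:
      name: kafka-trigger-auth
```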

🎯 Interview Questions


KEDA · ENGINEER

What is KEDA and why is it better than HPA for event-driven workloads?

KEDA is Kubernetes Event-Driven Autoscaling. It extends the native Horizontal Pod Autoscaler to support scaling based on external event sources — Kafka consumer lag, message queue depth, HTTP request rate, custom Prometheus metrics. The fundamental limitation of HPA: it scales based only on CPU and memory. If you have a Kafka consumer processing messages, its CPU usage might be low even when the queue has 100,000 messages backed up — HPA would not scale it up. KEDA directly queries the Kafka consumer group lag and scales based on that real signal. The other critical KEDA capability: scale to zero. HPA has a minimum of 1 replica. KEDA can scale to 0. For event-driven workloads that are idle most of the time — batch jobs, off-hours processors, weekend report generators — scale-to-zero saves significant cost. Real scenario: at HPE, a telecom alarm notification service ran 24/7 at 3 replicas even though 90% of alarms arrive during business hours. After KEDA with Kafka trigger: 0 replicas at night, scales to 5+ during alarm storms, automatically. Monthly compute cost for that service dropped by 65%.

KEDA · ARCHITECT

How do you use KEDA for cost optimisation in a Kubernetes environment?

Scale-to-zero is the primary cost optimisation. Identify services with predictable idle periods: batch processors, report generators, dev/staging environments overnight, event consumers in low-traffic periods. For each, replace the fixed replica count with a KEDA ScaledObject. The pattern: minReplicaCount: 0, a low activation threshold (activationLagThreshold for Kafka) so the workload wakes quickly when work arrives, and a cooldownPeriod long enough to avoid flapping between 0 and 1. For dev/staging: give each non-production workload a cron trigger that keeps it running from 8am to 7pm; outside that window it falls back to 0 (the cron scaler acts per ScaledObject, not per namespace). Zero pods translate into zero node cost only when nodes autoscale too. Combined with Cluster Autoscaler: KEDA scales pods to zero, CA sees empty nodes and terminates them; when KEDA scales pods back up, CA adds nodes. The result is elasticity across the whole stack. Additional pattern: the KEDA HTTP add-on scales web services to zero during off-hours based on HTTP traffic. When the first request arrives, KEDA scales from 0 to 1 in seconds while the add-on holds the request in a small queue to absorb the cold start. At HPE we estimated 40% monthly compute savings by implementing KEDA across 8 non-critical services that had fixed replicas during predictable idle periods.
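The scale-to-zero pattern above, sketched for a hypothetical `report-builder` Deployment consuming a Kafka topic (topic, consumer group and numbers are illustrative). `activationLagThreshold` decides when KEDA wakes the workload from 0; `lagThreshold` then drives scaling from 1 to N.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: report-builder-scaler        # hypothetical
  namespace: production
spec:
  scaleTargetRef:
    name: report-builder             # hypothetical Deployment
  minReplicaCount: 0                 # idle periods cost nothing
  maxReplicaCount: 10
  pollingInterval: 15                # lag is checked every 15 s, so wake-up waits up to ~15 s plus pod start
  cooldownPeriod: 600                # wait 10 min after the last activity before dropping to 0
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka-broker:9092
      consumerGroup: report-builders # hypothetical
      topic: report-requests         # hypothetical
      lagThreshold: "100"            # 1 to N: about 100 pending messages per replica
      activationLagThreshold: "0"    # 0 to 1: wake as soon as lag > 0 (also the default)
    authenticationRef:
      name: kafka-trigger-auth
```

Raising activationLagThreshold lets small trickles accumulate before paying for a pod; shortening cooldownPeriod saves more but makes 0 ↔ 1 flapping more likely.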

KEDA · ENGINEER

What is KEDA and what problem does it solve that standard HPA cannot?

KEDA (Kubernetes Event-Driven Autoscaling) extends Kubernetes HPA to scale workloads based on external event sources. Standard HPA scales on CPU and memory: that works well for web services but fails for message queue consumers, batch processors, and scheduled workloads. KEDA supports 60+ scalers: Kafka (scale on consumer group lag), Azure Service Bus (scale on queue depth), AWS SQS, RabbitMQ, Redis, Prometheus (any custom metric), Cron (schedule-based). The unique capability: scale to ZERO. Standard HPA minimum is 1 replica. KEDA scales a Deployment to 0 when no events exist, eliminating its compute cost entirely. When events arrive, KEDA scales from 0 to N automatically. ScaledJob: instead of a long-running consumer Deployment, KEDA creates Kubernetes Jobs (typically one per message or per batch), giving truly serverless-style processing on Kubernetes. Real use case: payment event processor with Kafka. Queue empty at night = 0 pods, zero cost. Queue fills at 9am = KEDA scales to 20 consumers (the topic needs at least 20 partitions, because the Kafka scaler does not exceed the partition count by default). Each consumer processes 100 messages per minute. KEDA re-checks consumer group lag every pollingInterval (15 seconds in the ScaledObject example; the default is 30) and adjusts the replica count.

KEDA · ENGINEER

What is the difference between ScaledObject and ScaledJob?

ScaledObject manages scaling of a long-running Deployment or StatefulSet. The pods run continuously, consuming messages from the queue as they arrive, and KEDA adjusts the number of replicas based on queue depth. Best for: consumers that maintain connection pools, stateful processing, low-latency requirements. ScaledJob creates a new Kubernetes Job for each unit of work; each Job processes one message (or a batch) and terminates. Best for: stateless processing where each message is independent; variable processing time (some messages take 10 seconds, some 10 minutes: with a Deployment, slow messages block others, while with ScaledJob each gets its own pod); GPU-intensive or memory-intensive workloads where you want complete isolation per task, such as image processing, video transcoding, and ML inference jobs. When a message arrives in the queue and no pods exist, KEDA detects it through the ScaledJob's trigger and creates a Job. The Job pod starts, processes the message, and terminates, so cost is zero when idle. Parallel processing: 1000 items in the queue can trigger 1000 simultaneous Jobs (capped at maxReplicaCount), processing all in parallel.

KEDA · PRODUCTION

KEDA is not scaling despite messages in the queue. How do you troubleshoot?

Step 1: check ScaledObject status. kubectl describe scaledobject payment-scaler -n production. The Conditions section shows Ready (False, with an error message, when KEDA cannot read the trigger), Active (True when events exist and KEDA is scaling, False when there are none) and Fallback.
Step 2: check KEDA operator logs. kubectl logs -n keda deploy/keda-operator shows authentication failures or connection errors.
Step 3: check the HPA KEDA created. kubectl get hpa -n production, then kubectl describe hpa keda-hpa-payment-scaler -n production shows the current metric value and desired replicas. If desired replicas is correct but pods do not start, the issue is Kubernetes scheduling, not KEDA.
Step 4: check the TriggerAuthentication. If credentials were rotated (Kafka password or Service Bus connection string changed), the trigger starts failing and nothing on the Deployment itself shows an error. Verify the referenced secret exists and holds the current values.
Step 5: test the trigger source directly. For Kafka, kafka-consumer-groups.sh --bootstrap-server kafka:9092 --group my-group --describe shows the actual lag. If lag is 0, there are no messages and KEDA is correctly at 0. If lag is high but KEDA shows 0, suspect a connectivity or auth issue. If replicas stop rising at the topic's partition count, that is the Kafka scaler's default cap, not a fault.
Step 6: check activation and polling settings. KEDA scales from 0 to 1 only when the metric exceeds the activation threshold (activationLagThreshold for Kafka), evaluated every pollingInterval. cooldownPeriod does not delay scale-up: it only controls how long KEDA waits after the last activity before scaling back to 0, while scale-down from N towards 1 follows the HPA's own stabilization window.
