⚡KEDA
⚡ What is KEDA?
›HPA vs KEDA — the key gap
| Metric | HPA | KEDA |
|---|---|---|
| CPU/memory | ✅ Native | ✅ Native cpu and memory triggers |
| Kafka lag | ❌ | ✅ Native Kafka trigger |
| SQS/Service Bus | ❌ | ✅ Native cloud triggers |
| Custom Prometheus | ⚠️ Needs adapter | ✅ Direct query |
| Scale to zero | ❌ Min 1 replica (by default) | ✅ Min 0 replicas |
| ScaledJob (batch) | ❌ | ✅ Native |
Real Scenario: Telecom Alarm Processing
A telecom alarm notification service processed network events from Kafka. Before KEDA: 3 replicas always running. After adding a KEDA Kafka trigger: 0 replicas at 2am (no alarms), auto-scaling to 8+ during morning alarm storms when hundreds of network elements report simultaneously. Consumer lag never exceeded 50 messages. Cost: 65% reduction on that service alone. The business benefit: alarms are processed faster during peak (more pods), with less waste during quiet periods (zero pods).
Install KEDA + concept overview
📄 ScaledObject
›Kafka + Azure Service Bus + Prometheus triggers
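As a rough illustration of the Azure Service Bus case, the sketch below scales a consumer Deployment on queue depth. The resource names (order-consumer, orders, servicebus-secret) are hypothetical placeholders, and the connection-string approach is only one of several authentication options; treat it as a starting point rather than a reference configuration.

```yaml
# Assumption: a Secret named servicebus-secret holds the connection string.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: servicebus-trigger-auth     # hypothetical name
  namespace: production
spec:
  secretTargetRef:
    - parameter: connection         # connection string consumed by the scaler
      name: servicebus-secret       # hypothetical Secret
      key: connection-string
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-consumer-scaler       # hypothetical name
  namespace: production
spec:
  scaleTargetRef:
    name: order-consumer            # hypothetical Deployment
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders           # hypothetical queue
        messageCount: "5"           # target messages per replica
      authenticationRef:
        name: servicebus-trigger-auth
```

The Kafka and Prometheus variants follow the same shape: only the trigger type and its metadata change.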
🔨 ScaledJob — Batch Processing
›ScaledJob vs ScaledObject
ScaledObject: wraps a Deployment. Scales a long-running consumer up/down based on queue depth. The pods keep running, processing messages.
ScaledJob: creates Kubernetes Jobs. Each job processes a fixed batch and exits. Use for: report generation, data migration, one-shot tasks where each unit of work should be isolated.
ScaledJob + TriggerAuthentication with IRSA
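A minimal sketch of this combination, assuming an SQS queue as the work source. The queue URL, account ID, image, and resource names are hypothetical. The podIdentity provider value depends on the KEDA release (newer releases use aws, older ones used aws-eks), and with this setup the IAM role attached via IRSA needs permission to read the queue attributes.

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: sqs-irsa-auth               # hypothetical name
  namespace: production
spec:
  podIdentity:
    provider: aws                   # assumption: recent KEDA; older releases used aws-eks
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: report-generator            # hypothetical name
  namespace: production
spec:
  jobTargetRef:
    backoffLimit: 2                 # retries per Job before it is marked failed
    template:
      spec:
        serviceAccountName: report-generator   # hypothetical, IRSA-annotated
        restartPolicy: Never
        containers:
          - name: worker
            image: registry.example.com/report-generator:1.0.0   # hypothetical image
  pollingInterval: 30               # seconds between queue checks
  maxReplicaCount: 20               # cap on concurrently running Jobs
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: sqs-irsa-auth
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/report-requests   # hypothetical
        queueLength: "1"            # roughly one Job per pending message
        awsRegion: eu-west-1
```

Each Job pod is expected to take its work from the queue, process it, and exit; KEDA only decides how many Jobs to create.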
⚡ How KEDA Works
›KEDA fills the gap that HPA cannot fill
Out of the box, standard HPA scales on CPU and memory only; scaling on anything else means installing and configuring a separate metrics adapter. Most real workloads need to scale on business metrics: queue depth, message count, custom Prometheus metrics. KEDA extends HPA to support 60+ event sources natively.
External trigger (Queue, Kafka, Cron, Prometheus)
↓
KEDA Metrics Adapter — reads trigger, converts to K8s metrics
↓
KEDA Operator — creates/manages HPA on your behalf
↓
HPA scales Deployment/StatefulSet between 1 and N replicas (for a ScaledJob, the KEDA Operator creates Jobs directly)
↓
KEDA Operator scales to ZERO when no messages (HPA minimum is 1, KEDA goes to 0)
| Feature | HPA | KEDA |
|---|---|---|
| Scale metric | CPU, memory built in; others need a metrics adapter | 60+ sources: Kafka, SQS, RabbitMQ, Prometheus, Cron, HTTP... |
| Scale to zero | No — min 1 replica | Yes — 0 replicas when no events |
| ScaledJob | No | Yes — K8s Job per message, not long-running Deployment |
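To make the operator step in the flow concrete, here is an approximation of the HPA that KEDA would manage for the payment-processor-scaler ScaledObject used in the Kafka example on this page. You do not write this object yourself. The external metric name shown is illustrative and may differ between KEDA versions.

```yaml
# Approximate shape of the HPA that KEDA creates and owns (do not apply by hand).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keda-hpa-payment-processor-scaler   # keda-hpa-<ScaledObject name>
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-processor
  minReplicas: 1                  # HPA floor; KEDA handles the step between 0 and 1
  maxReplicas: 50                 # from maxReplicaCount
  metrics:
    - type: External
      external:
        metric:
          name: s0-kafka-payment-events     # illustrative; generated by KEDA
          selector:
            matchLabels:
              scaledobject.keda.sh/name: payment-processor-scaler
        target:
          type: AverageValue
          averageValue: "100"     # from lagThreshold
```

The HPA reads this external metric from the KEDA metrics adapter, which is why no separate Prometheus or custom-metrics adapter is required.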
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
📦 ScaledObject Examples
›Kafka trigger — scale consumers with queue depth
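The Kafka ScaledObject in this section refers to a TriggerAuthentication named kafka-trigger-auth that the page does not define. A possible definition is sketched here, assuming SASL/SCRAM over TLS; the Secret name, username, and password are hypothetical placeholders, and your cluster may use a different mechanism.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: kafka-credentials         # hypothetical Secret
  namespace: production
type: Opaque
stringData:
  sasl: scram_sha512              # assumption: broker uses SCRAM-SHA-512
  username: keda-reader           # hypothetical user with read access to group offsets
  password: change-me             # placeholder
  tls: enable
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-trigger-auth        # name referenced by authenticationRef
  namespace: production
spec:
  secretTargetRef:
    - parameter: sasl
      name: kafka-credentials
      key: sasl
    - parameter: username
      name: kafka-credentials
      key: username
    - parameter: password
      name: kafka-credentials
      key: password
    - parameter: tls
      name: kafka-credentials
      key: tls
```

The TriggerAuthentication must live in the same namespace as the ScaledObject that references it.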
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: payment-processor-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: payment-processor       # Deployment to scale
  minReplicaCount: 0              # scale to ZERO when queue empty
  maxReplicaCount: 50
  pollingInterval: 15             # seconds between trigger checks
  cooldownPeriod: 300             # seconds after last activity before scaling to 0
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka-broker:9092
        consumerGroup: payment-processors
        topic: payment-events
        lagThreshold: "100"       # 1 replica per 100 unprocessed messages
      authenticationRef:
        name: kafka-trigger-auth
Prometheus trigger
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      metricName: active_checkout_sessions
      threshold: "50"
      query: sum(active_checkout_sessions{app="checkout"})
Cron trigger: scale on schedule
triggers:
  - type: cron
    metadata:
      timezone: Asia/Dubai
      start: "0 8 * * 1-5"        # scale up at 8am weekdays
      end: "0 18 * * 1-5"         # scale down at 6pm weekdays
      desiredReplicas: "10"
🎯 Interview Questions
›KEDA · ENGINEER
What is KEDA and why is it better than HPA for event-driven workloads?
KEDA stands for Kubernetes Event-Driven Autoscaling. It extends the native Horizontal Pod Autoscaler so that workloads can scale on external event sources: Kafka consumer lag, message queue depth, HTTP request rate, custom Prometheus metrics.
The fundamental limitation of HPA is that, out of the box, it scales only on CPU and memory; anything else requires running a separate metrics adapter. A Kafka consumer's CPU usage might be low even when 100,000 messages are backed up, so a CPU-based HPA would not scale it up. KEDA queries the consumer group lag directly and scales on that signal.
The other critical capability is scale to zero. HPA has a minimum of 1 replica by default; KEDA can scale to 0. For event-driven workloads that are idle most of the time (batch jobs, off-hours processors, weekend report generators), scale-to-zero saves significant cost.
Real scenario: at HPE, a telecom alarm notification service ran 24/7 at 3 replicas even though 90% of alarms arrived during business hours. After adding KEDA with a Kafka trigger, it ran 0 replicas at night and scaled to 5+ automatically during alarm storms. Monthly compute cost for that service dropped by 65%.
KEDA · ARCHITECT
How do you use KEDA for cost optimisation in a Kubernetes environment?
Scale-to-zero is the primary cost optimisation. Identify services with predictable idle periods: batch processors, report generators, dev/staging environments overnight, event consumers in low-traffic periods. For each one, replace the fixed replica count with a KEDA ScaledObject. The pattern: minReplicaCount: 0, a low activationLagThreshold so the workload wakes up as soon as work arrives, and a cooldownPeriod long enough to avoid flapping.
For dev/staging: a cron trigger on each workload's ScaledObject scales non-production Deployments to zero at 7pm and restores them at 8am. Zero pods means zero node cost only when node autoscaling is in place: KEDA scales pods to zero, Cluster Autoscaler sees empty nodes and terminates them, and when KEDA scales pods back up, Cluster Autoscaler adds nodes again. The result is full infrastructure elasticity.
Additional pattern: the KEDA HTTP add-on. It scales web services to zero during off-hours based on HTTP traffic. When the first request arrives, the add-on holds it in a small queue to absorb the cold start while KEDA scales from 0 to 1 in seconds.
At HPE, we estimated 40% monthly compute savings from implementing KEDA across 8 non-critical services that had fixed replicas during predictable idle periods.
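The dev/staging office-hours pattern from this answer might look like the sketch below, applied once per workload. The namespace, Deployment name, and replica count are hypothetical; the timezone simply reuses the one from the cron example earlier on the page.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: staging-api-office-hours   # hypothetical name
  namespace: staging               # hypothetical non-production namespace
spec:
  scaleTargetRef:
    name: staging-api              # hypothetical Deployment
  minReplicaCount: 0               # outside the cron window the workload runs 0 pods
  maxReplicaCount: 2
  triggers:
    - type: cron
      metadata:
        timezone: Asia/Dubai
        start: "0 8 * * 1-5"       # restore at 8am on weekdays
        end: "0 19 * * 1-5"        # scale to zero at 7pm on weekdays
        desiredReplicas: "2"
```

With only a cron trigger defined, the workload stays at zero over the weekend as well, which is usually the intent for non-production environments.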
KEDA · ENGINEER
What is KEDA and what problem does it solve that standard HPA cannot?
KEDA (Kubernetes Event-Driven Autoscaling) extends the Kubernetes HPA to scale workloads on external event sources. Standard HPA scales on CPU and memory out of the box, which works well for web services but not for message queue consumers, batch processors, or scheduled workloads. KEDA supports 60+ scalers: Kafka (consumer group lag), Azure Service Bus (queue depth), AWS SQS, RabbitMQ, Redis, Prometheus (any custom metric), Cron (schedule-based).
The unique capability is scale to ZERO. The standard HPA minimum is 1 replica by default. KEDA scales a Deployment to 0 when no events exist, removing that compute cost entirely, and scales from 0 to N automatically when events arrive.
ScaledJob: instead of a long-running consumer Deployment, KEDA creates Kubernetes Jobs for pending messages (one Job per message in the simplest setup), which gives serverless-style processing on Kubernetes.
Real use case: payment event processor with Kafka. Queue empty at night = 0 pods, zero cost. Queue fills at 9am = KEDA scales to 20 consumers, provided the topic has at least 20 partitions, since by default the Kafka scaler does not scale past the partition count. Each consumer processes 100 messages per minute. KEDA re-evaluates consumer group lag every 15 seconds and adjusts the replica count.
KEDA · ENGINEER
What is the difference between ScaledObject and ScaledJob?
ScaledObject manages scaling of a long-running Deployment or StatefulSet. The pods run continuously, consuming messages from the queue as they arrive, and KEDA adjusts the number of replicas based on queue depth. Best for: consumers that maintain connection pools, stateful processing, low-latency requirements.
ScaledJob creates a new Kubernetes Job for each unit of work. Each Job processes one message (or a batch) and terminates. Best for: stateless processing where each message is independent; variable processing time (some messages take 10 seconds, some 10 minutes: with a Deployment slow messages block others, with ScaledJob each gets its own pod); GPU-intensive or memory-intensive workloads where you want complete isolation per task; image processing, video transcoding, ML inference jobs.
When a message arrives in the queue and no pods exist, KEDA detects it on the next poll and creates a Job. The Job pod starts, processes the message, and terminates. Cost is zero when idle. Parallel processing: 1000 items in the queue can trigger 1000 simultaneous Jobs (up to maxReplicaCount), processing all of them in parallel.
KEDA · PRODUCTION
KEDA is not scaling despite messages in the queue. How do you troubleshoot?
Step 1: check ScaledObject status. kubectl describe scaledobject payment-scaler -n production. The Conditions section shows Ready and Active: Active=True means a trigger is firing and KEDA is scaling, Active=False means KEDA sees no events, and Ready=False comes with a message describing the error.
Step 2: check KEDA operator logs. kubectl logs -n keda deploy/keda-operator. These show authentication failures or connection errors.
Step 3: check the HPA that KEDA created. kubectl get hpa -n production. kubectl describe hpa keda-hpa-payment-scaler shows the current metric value and desired replicas. If desired replicas is correct but pods do not start, the issue is K8s scheduling, not KEDA.
Step 4: check TriggerAuthentication. If credentials were rotated (Kafka password, Service Bus connection string changed), the trigger fails without any visible change on the workload. Verify that the secret exists and is correct.
Step 5: test the trigger source directly. For Kafka: kafka-consumer-groups.sh --bootstrap-server kafka:9092 --group my-group --describe shows the actual lag. If lag is 0: no messages, KEDA is correctly at 0. If lag is high but KEDA shows 0: connectivity or auth issue.
Step 6: check activation and timing settings. cooldownPeriod only delays scaling down to 0 after the last trigger activity; it does not delay scaling up. Scale-up from 0 happens on the next poll, so a long pollingInterval or an activationLagThreshold above the current lag keeps the workload at 0. Lower pollingInterval for testing.