AWS Complete Guide — LearnwithVishnu

AWS

Amazon Web Services: core services, VPC, IAM, EKS, cost optimisation

🌍 AWS Global Infrastructure

What is a Region and Availability Zone?

AWS has 33+ Regions globally. Each Region is an independent geographic area. Each Region contains multiple Availability Zones (AZs), with a minimum of three and up to six in the largest Regions such as us-east-1. An AZ is one or more physical data centres with independent power, cooling, and connectivity, connected to the other AZs in the same Region via private high-bandwidth, low-latency links.

Why this matters for architecture: Deploying across 2+ AZs, behind a load balancer with health checks, gives you resilience to a single-AZ failure. If one AZ goes down (fire, power failure), your application keeps running in the others. Deploying across 2+ Regions protects against regional failures and lets you serve users with lower latency.
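
The benefit of spreading across AZs can be put into rough numbers. The sketch below is illustrative arithmetic only: it assumes AZ failures are independent and uses a made-up per-AZ availability of 99%, which is not an AWS figure. The function name combined_availability is hypothetical.

```python
def combined_availability(per_az: float, az_count: int) -> float:
    """Probability that at least one AZ is up, assuming independent failures."""
    if not 0.0 <= per_az <= 1.0:
        raise ValueError("per_az must be a probability between 0 and 1")
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    # All AZs must fail at once for the application to be down.
    return 1.0 - (1.0 - per_az) ** az_count


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "AZ(s):", f"{combined_availability(0.99, n):.6f}")
```

Real AZ failures are not perfectly independent, so treat the result as an upper bound on what multi-AZ alone can deliver.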

Region	Code	Common use
Mumbai	ap-south-1	Indian companies, low latency for India
N. Virginia	us-east-1	Common default, among the lowest-priced, new services usually launch here first
Singapore	ap-southeast-1	Southeast Asia
Ireland	eu-west-1	Europe, EU data residency for GDPR workloads

Regions and AZ commands

🔐 IAM — Identity and Access Management

The Most Important AWS Concept

IAM controls who can do what to which AWS resources. Every API call to AWS is authenticated (who are you?) and authorised (are you allowed to do this?). Misconfigured IAM is one of the most common causes of AWS security incidents.

IAM Concepts

Concept	What it is	When to use
User	Long-term identity with permanent credentials	Only when roles are not possible. Always enforce MFA.
Role	Temporary credentials assumed by services or users	EC2, Lambda, EKS pods, cross-account — everything
Group	Collection of users	Organise humans by team. Attach policies to group.
Policy	JSON document: what actions on what resources	Attach to user, group, or role
IRSA	IAM role for Kubernetes service account (EKS)	Give pods AWS access without stored keys

Policy Evaluation Logic

By default: everything is DENIED. An explicit DENY always overrides an ALLOW (even from another policy). Only an explicit ALLOW grants access. For cross-account: BOTH the resource policy and the identity policy must allow the action.
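
The evaluation order above can be sketched in a few lines. This is a simplified model, not the real IAM engine: it covers identity policies in a single account only, ignores conditions, SCPs, and permission boundaries, and matches wildcards case-sensitively (real IAM action names are case-insensitive). The function name is_allowed and the sample ARNs are hypothetical.

```python
from fnmatch import fnmatchcase


def is_allowed(statements: list[dict], action: str, resource: str) -> bool:
    """Default deny; explicit Deny wins; otherwise an explicit Allow grants access."""
    matched_allow = False
    for stmt in statements:
        action_hit = any(fnmatchcase(action, pat) for pat in stmt["Action"])
        resource_hit = any(fnmatchcase(resource, pat) for pat in stmt["Resource"])
        if not (action_hit and resource_hit):
            continue
        if stmt["Effect"] == "Deny":
            return False  # an explicit Deny overrides any Allow
        if stmt["Effect"] == "Allow":
            matched_allow = True
    return matched_allow  # stays False when nothing matched (implicit deny)


if __name__ == "__main__":
    policy = [
        {"Effect": "Allow", "Action": ["s3:*"], "Resource": ["arn:aws:s3:::reports/*"]},
        {"Effect": "Deny", "Action": ["s3:DeleteObject"], "Resource": ["arn:aws:s3:::reports/*"]},
    ]
    print(is_allowed(policy, "s3:GetObject", "arn:aws:s3:::reports/q1.csv"))
    print(is_allowed(policy, "s3:DeleteObject", "arn:aws:s3:::reports/q1.csv"))
```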

IAM policies, roles, and IRSA

🌐 VPC — Virtual Private Cloud

What is a VPC?

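A VPC is your private network inside AWS. Most compute and database resources (EC2, RDS, EKS nodes) run inside a VPC, while services such as S3 and DynamoDB sit outside it and are reached over public endpoints or VPC endpoints. You control: IP address ranges, subnets, routing, and firewalls. Nothing in the VPC is reachable from the internet unless you explicitly allow it.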

Key Components

Component	Purpose
Subnet	Subdivision of VPC in one AZ. Public = has internet route. Private = no direct internet.
Internet Gateway	Allows public subnets to reach internet. One per VPC.
NAT Gateway	Allows private subnets to initiate internet connections (for updates etc) without being reachable from internet.
Security Group	Stateful firewall at instance level. Allow rules only.
NACL	Stateless firewall at subnet level. Allow and deny rules.
VPC Peering	Connect two VPCs privately (no internet). Non-transitive.
VPC Endpoint	Access AWS services (S3, DynamoDB) without internet traffic. Saves NAT cost.

Production VPC design

💻 EC2 — Compute

Choosing the Right Instance Type

The instance family tells you what it is optimised for. The size (small/medium/large/xlarge) tells you how much CPU and memory you get. Start with general purpose (for example m6i), then right-size based on metrics after about 2 weeks in production.
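
An instance type name such as m6i.xlarge packs family, generation, extra attributes, and size into one string. The sketch below splits a name into those parts. The helper parse_instance_type and the small FAMILY_PURPOSE table are illustrative assumptions, cover only a few common families, and will not handle every unusual name.

```python
import re
from typing import NamedTuple

# Partial, illustrative mapping of family letter to what it is optimised for.
FAMILY_PURPOSE = {
    "m": "general purpose",
    "c": "compute optimised",
    "r": "memory optimised",
    "t": "burstable general purpose",
}


class InstanceType(NamedTuple):
    family: str
    generation: int
    attributes: str
    size: str


_PATTERN = re.compile(r"^([a-z]+)(\d+)([a-z-]*)\.([a-z0-9]+)$")


def parse_instance_type(name: str) -> InstanceType:
    """Split e.g. 'm6i.xlarge' into family 'm', generation 6, attributes 'i', size 'xlarge'."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised instance type: {name!r}")
    family, generation, attributes, size = match.groups()
    return InstanceType(family, int(generation), attributes, size)


if __name__ == "__main__":
    parsed = parse_instance_type("m6i.xlarge")
    print(parsed, FAMILY_PURPOSE.get(parsed.family, "unknown"))
```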

EC2 launch + Spot + SSM

🪣 S3 — Object Storage

S3 Fundamentals

S3 is object storage: you store objects (files) identified by a key (which looks like a path). It is not a filesystem. Capacity is virtually unlimited, with 11 nines of durability (99.999999999%). Used for: backups, static websites, data lakes, application artifacts, logs, container image layers (ECR stores them in S3), Terraform state.
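
Because keys are flat strings, "folders" in S3 are only a listing convention based on a prefix and a delimiter. The sketch below imitates that behaviour in plain Python over an in-memory list of keys. It makes no AWS calls; list_prefix and the sample keys are hypothetical.

```python
def list_prefix(keys: list[str], prefix: str = "", delimiter: str = "/") -> tuple[list[str], list[str]]:
    """Return (objects directly under prefix, common prefixes that look like subfolders)."""
    objects: list[str] = []
    common_prefixes: set[str] = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter is rolled up into one "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


if __name__ == "__main__":
    keys = ["logs/2024/app.log", "logs/2025/app.log", "logs/readme.txt", "backup.tar"]
    print(list_prefix(keys, "logs/"))
    print(list_prefix(keys))
```

There is no directory object behind logs/2024/; deleting the last key with that prefix makes the "folder" disappear.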

S3 vs EBS vs EFS

	S3	EBS	EFS
Type	Object storage	Block storage	File system
Access	HTTP API	One EC2 at a time	Multiple instances
Use for	Backups, static files, data lake	OS disk, databases	Shared content across instances
Latency	ms (network)	Low: single-digit ms, sub-ms on io2 Block Express (network-attached)	ms (network)

S3 best practices + security

☸️ EKS — Kubernetes on AWS

What is EKS?

EKS is AWS's managed Kubernetes service. AWS manages the control plane (API server, etcd, scheduler), so you never touch master nodes. You provide the worker capacity: EC2 instances that you manage, or Fargate, where AWS runs the nodes for you. The same kubectl commands work on EKS as on any Kubernetes cluster.

EKS vs ECS vs Fargate

	EKS	ECS	Fargate
What it is	Managed Kubernetes	AWS-native container orchestration	Serverless compute engine for ECS and EKS (no node management)
Learning curve	High (K8s knowledge needed)	Medium (AWS-specific)	Low
Use when	Team knows K8s, multi-cloud, complex	AWS-only, simpler needs	No node management wanted

EKS setup + storage + load balancers

💰 Cost Optimisation

Where AWS Cost Goes — and How to Reduce It

Cost Driver	Savings Strategy	Typical Saving
EC2 (on-demand)	Spot for stateless, Savings Plans, Graviton instances	40-70%
RDS	Reserved instances (1-year), right-size, stop dev instances overnight	30-50%
NAT Gateway	VPC endpoints for S3/DynamoDB traffic	20-40%
S3	Lifecycle policies to Glacier, S3 Intelligent-Tiering	50-80%
Data Transfer	CloudFront for CDN, same-region replication, compress payloads	30-60%

Cost optimisation commands

☸️ EKS — Deep Dive for Production

EKS architecture vs AKS — key differences to know

	AWS EKS	Azure AKS
Control plane cost	$0.10/hr per cluster (about $73/month) on standard support	Free tier available; the Standard tier with an uptime SLA is billed per cluster hour
Pod identity	IAM Roles for Service Accounts (IRSA)	Workload Identity (Azure AD federation)
Node types	Managed Node Groups, Self-managed, Fargate	Node Pools (system + user)
Networking	VPC CNI — pods get VPC IPs	Azure CNI — pods get VNet IPs
Load Balancer	AWS Load Balancer Controller creates ALB/NLB	AGIC creates Application Gateway
Storage	EBS CSI driver, EFS CSI driver	Azure Disk CSI, Azure Files CSI

IRSA — IAM Roles for Service Accounts

IRSA is the AWS equivalent of Azure Workload Identity. It links a Kubernetes ServiceAccount to an IAM Role, allowing pods to access AWS services (S3, DynamoDB, Secrets Manager) without any stored credentials.

# 1. Create IAM OIDC provider for the EKS cluster
eksctl utils associate-iam-oidc-provider --cluster myeks --approve

# 2. Create IAM role with trust policy for the ServiceAccount
eksctl create iamserviceaccount \
  --name payment-service-sa \
  --namespace production \
  --cluster myeks \
  --attach-policy-arn arn:aws:iam::123456789:policy/PaymentServicePolicy \
  --approve

# 3. Pod uses the ServiceAccount — gets AWS credentials automatically
# No access keys stored anywhere
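
The trust policy that step 2 creates can be sketched as a plain JSON document. The helper below builds one in Python to show which fields tie the role to a single ServiceAccount. The function irsa_trust_policy, the 12-digit account ID, and the OIDC issuer URL are placeholders, and the exact document eksctl generates may differ in detail.

```python
import json


def irsa_trust_policy(account_id: str, oidc_issuer: str, namespace: str, service_account: str) -> dict:
    """Build a trust policy that lets one Kubernetes ServiceAccount assume the role."""
    issuer = oidc_issuer.removeprefix("https://")  # IAM refers to the issuer without the scheme
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only this namespace and ServiceAccount may assume the role.
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = irsa_trust_policy(
        account_id="123456789012",  # placeholder account ID
        oidc_issuer="https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",  # placeholder issuer
        namespace="production",
        service_account="payment-service-sa",
    )
    print(json.dumps(policy, indent=2))
```

A wrong namespace or ServiceAccount name in the sub condition is the usual reason a pod fails to assume the role.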

Fargate for EKS — serverless nodes

Fargate eliminates node management — each pod runs on a dedicated micro-VM. You define Fargate profiles: which namespaces/labels use Fargate vs managed node groups. Best for: burst workloads, batch jobs, dev/test environments where you don't want to manage nodes. Not suitable for: DaemonSets (cannot run on Fargate), privileged pods, GPU workloads.

EKS add-ons — managed cluster components

Add-on	What it does
CoreDNS	DNS for service discovery inside cluster
kube-proxy	Network rules on each node
VPC CNI	Pod networking with VPC IPs
EBS CSI Driver	Dynamic persistent volume provisioning with EBS
AWS Load Balancer Controller	Creates ALB for Ingress, NLB for Service type LoadBalancer

⚡ AWS Lambda and Serverless Architecture

Lambda — the key concepts

Lambda runs your code in response to events without you managing any servers. You deploy a function (Python, Node.js, Java, Go, etc.), configure what triggers it, and AWS scales it from 0 to thousands of instances automatically.

Concept	Explanation
Trigger	What invokes the function: API Gateway (HTTP), S3 event (file upload), SQS message, EventBridge schedule (cron), DynamoDB stream
Execution environment	Isolated sandbox, 128 MB-10 GB memory, up to 15 minutes runtime, ephemeral /tmp storage (512 MB by default, configurable up to 10 GB)
Cold start	First invocation after idle period: container initialised = 100ms-3s latency. Mitigate with Provisioned Concurrency.
Concurrency	Each concurrent request gets its own execution environment. Default limit: 1000 concurrent executions per account per Region (a soft limit that can be raised).
Pricing	Pay per invocation ($0.20 per million) + duration (per GB-second). First 1 million requests/month free.
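
The pricing row can be turned into a rough monthly estimate. In the sketch below, the request price and free request allowance come from the table above. The per GB-second rate is an assumption (the commonly quoted x86 rate in us-east-1) and the free compute allowance is ignored, so check current pricing before relying on the numbers. The function name monthly_lambda_cost is hypothetical.

```python
REQUEST_PRICE_PER_MILLION = 0.20  # from the pricing row above
FREE_REQUESTS_PER_MONTH = 1_000_000  # from the pricing row above
GB_SECOND_PRICE = 0.0000166667  # assumption: x86 rate in us-east-1


def monthly_lambda_cost(invocations: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Estimate monthly cost in USD from request count, average duration, and memory size."""
    billable_requests = max(invocations - FREE_REQUESTS_PER_MONTH, 0)
    request_cost = billable_requests / 1_000_000 * REQUEST_PRICE_PER_MILLION
    # Duration is billed in GB-seconds: seconds of runtime times configured memory in GB.
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * GB_SECOND_PRICE


if __name__ == "__main__":
    cost = monthly_lambda_cost(invocations=5_000_000, avg_duration_ms=200, memory_mb=512)
    print(f"Estimated monthly cost: ${cost:.2f}")
```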

Lambda in DevOps — common uses

Automated remediation — CloudWatch alarm triggers Lambda which restarts an ECS service or scales up capacity
CI/CD webhook processor — API Gateway receives GitHub webhook, Lambda triggers CodePipeline
Scheduled maintenance — EventBridge cron triggers Lambda to stop dev environments at night
Log processing — S3 event triggers Lambda to process and forward logs to Elasticsearch
Slack/Teams bot — API Gateway + Lambda handles slash commands from your ops chat

🔄 AWS CI/CD — CodePipeline and GitHub Actions

AWS native CI/CD stack

Service	Role	Equivalent
CodeCommit	Managed Git repository	GitHub, Azure Repos
CodeBuild	Managed build service — run tests, build Docker images	Jenkins, GitHub Actions runner
CodeDeploy	Deployment service — rolling, canary, blue/green deployments to EC2, ECS, Lambda	Octopus Deploy, Spinnaker
CodePipeline	Orchestrates the full CI/CD workflow — source → build → test → deploy	Azure DevOps, Jenkins Pipeline
ECR	Private Docker image registry	ACR, Docker Hub

GitHub Actions to EKS — OIDC authentication (no stored credentials)

name: Deploy to EKS
on:
  push:
    branches: [main]

permissions:
  id-token: write   # REQUIRED for OIDC
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3

    - name: Configure AWS credentials (OIDC — no secrets stored)
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: arn:aws:iam::123456789:role/GitHubActionsEKSRole
        aws-region: us-east-1

    - name: Login to ECR
      run: aws ecr get-login-password | docker login --username AWS --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com

    - name: Build and push
      run: |
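        # Tag with the full ECR repository URI so the push finds the image
        docker build -t 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:${{ github.sha }} .
        docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:${{ github.sha }}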

    - name: Deploy to EKS
      run: |
        aws eks update-kubeconfig --name myeks --region us-east-1
        helm upgrade --install myapp ./charts/myapp \
          --set image.tag=${{ github.sha }} --atomic --wait

🛡️ AWS High Availability and Disaster Recovery

HA design principles on AWS

Pattern	What it means	AWS implementation
Multi-AZ	Run across multiple Availability Zones in one region	RDS Multi-AZ, ALB across AZs, EKS nodes in multiple AZs
Multi-Region Active-Passive	Primary region active, secondary on standby. Failover on disaster.	Route53 health checks + failover routing, RDS read replica in secondary region
Multi-Region Active-Active	Both regions serve traffic simultaneously	Route53 latency routing, DynamoDB Global Tables, S3 Cross-Region Replication

RTO and RPO — the two DR metrics

RTO (Recovery Time Objective) — how long can the business be down? "We must be back online within 4 hours." Drives: how fast your failover automation must work.
RPO (Recovery Point Objective) — how much data can we lose? "We cannot lose more than 15 minutes of transactions." Drives: how frequently you must backup/replicate.
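
The two objectives can be checked against a plan with simple comparisons. The sketch below uses the example targets quoted above (4 hours RTO, 15 minutes RPO). The DrPlan fields and the sample plan values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DrPlan:
    replication_interval_min: float  # worst-case gap between recoverable copies of the data
    failover_time_min: float  # measured time to bring the service back after a disaster


def meets_objectives(plan: DrPlan, rpo_min: float, rto_min: float) -> dict[str, bool]:
    """RPO is met if data loss stays within the limit; RTO if recovery is fast enough."""
    return {
        "rpo_met": plan.replication_interval_min <= rpo_min,
        "rto_met": plan.failover_time_min <= rto_min,
    }


if __name__ == "__main__":
    hourly_backups = DrPlan(replication_interval_min=60, failover_time_min=90)
    print(meets_objectives(hourly_backups, rpo_min=15, rto_min=240))
```

Here hourly backups recover in time but can lose up to an hour of data, so the RPO target forces more frequent replication.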

DR strategies by cost and RTO

Strategy	RTO	Cost	How
Backup & Restore	Hours	Low	S3 backups, restore from scratch when disaster strikes
Pilot Light	30-60 min	Medium	Core DB running in secondary region, scale out compute on failover
Warm Standby	Minutes	High	Scaled-down running copy in secondary, scale up on failover
Active-Active	Seconds	2x	Full capacity in both regions, instant failover via DNS

🎯 Interview Questions

AWS · ARCHITECT

Design a production-grade VPC architecture for a 3-tier application on AWS.

Three-tier VPC with public, private-app, and private-data subnets across 3 AZs. VPC CIDR 10.0.0.0/16 giving 65,536 IPs. Public subnets host: Application Load Balancer (inbound 443 from 0.0.0.0/0), NAT Gateways (one per AZ for HA), Bastion host (if needed). Private app subnets host EKS nodes and EC2 — they route outbound traffic through NAT Gateway. Private data subnets host RDS, ElastiCache — NO internet route at all, fully isolated. Security groups implement least-privilege: ALB-SG allows inbound 443 from anywhere. App-SG allows inbound only from ALB-SG. DB-SG allows inbound only from App-SG. VPC Flow Logs enabled for security audit. VPC endpoints for S3 and DynamoDB avoid NAT Gateway costs for AWS service traffic. At scale: inter-AZ data transfer costs money — keep app pods and their RDS AZ aligned.
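
The subnet layout in this answer can be planned with Python's ipaddress module. The sketch below carves 10.0.0.0/16 into /20 blocks and hands one to each tier in each AZ. The /20 size, the tier names, and the AZ labels are assumptions for illustration; the answer above does not fix a subnet size.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, tiers: list[str], az_names: list[str], new_prefix: int = 20) -> dict:
    """Assign one subnet per (tier, AZ) pair from consecutive blocks of the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    needed = len(tiers) * len(az_names)
    if needed > len(blocks):
        raise ValueError(f"need {needed} subnets but only {len(blocks)} fit in {vpc_cidr}")
    pairs = [(tier, az) for tier in tiers for az in az_names]
    return dict(zip(pairs, blocks))


if __name__ == "__main__":
    plan = plan_subnets(
        "10.0.0.0/16",
        tiers=["public", "private-app", "private-data"],
        az_names=["a", "b", "c"],
    )
    for (tier, az), subnet in plan.items():
        # AWS reserves 5 addresses in every subnet.
        print(f"{tier:13} AZ-{az}  {subnet}  usable={subnet.num_addresses - 5}")
```

Nine /20 subnets use just over half of the /16, which leaves room for more tiers or AZs later.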

AWS · ENGINEER

What is the difference between Security Groups and NACLs in AWS?

Security Groups are stateful — if you allow inbound traffic, return traffic is automatically allowed. They operate at the instance/ENI level. You can only create ALLOW rules. Changes take effect immediately. NACLs (Network ACLs) are stateless — you must explicitly allow both inbound and outbound traffic for a connection to work. They operate at the subnet level and apply to all instances in the subnet. Rules are evaluated in order by rule number — first match wins. You can create both ALLOW and DENY rules. Use case for NACLs: blocking a specific IP range at the subnet level (DDoS mitigation), quick emergency block. Use case for Security Groups: fine-grained instance-level control. Best practice: Security Groups for normal operations. NACLs as an additional layer for subnet-level blocking only.
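
The "first match wins" behaviour of NACLs can be sketched as a small rule evaluator. This is a simplified inbound-only model that ignores protocol and leaves out the separate outbound rules a stateless NACL also needs. The rule dictionary format, nacl_decision, and the documentation-range IP addresses are hypothetical.

```python
import ipaddress


def nacl_decision(rules: list[dict], source_ip: str, port: int) -> str:
    """Evaluate rules in ascending rule-number order; the first matching rule decides."""
    addr = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r["number"]):
        low, high = rule["ports"]
        if addr in ipaddress.ip_network(rule["cidr"]) and low <= port <= high:
            return rule["action"]
    return "deny"  # the implicit final rule denies anything unmatched


if __name__ == "__main__":
    rules = [
        {"number": 100, "action": "allow", "cidr": "0.0.0.0/0", "ports": (443, 443)},
        {"number": 90, "action": "deny", "cidr": "203.0.113.0/24", "ports": (0, 65535)},
    ]
    print(nacl_decision(rules, "203.0.113.7", 443))
    print(nacl_decision(rules, "198.51.100.10", 443))
```

The deny rule has the lower number, so it is checked before the broad allow. A Security Group cannot express this, since it has allow rules only.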

AWS · ARCHITECT

Explain IRSA — IAM Roles for Service Accounts — and why it matters for EKS security.

Before IRSA: to give a pod AWS access, you stored access keys as Kubernetes secrets or gave the EC2 node role broad permissions (all pods on that node get all permissions). Both are security risks. IRSA uses OIDC federation: EKS cluster has an OIDC endpoint. You associate this with your AWS account. Create an IAM role with a trust policy allowing only the specific Kubernetes service account in a specific namespace. Pod uses that service account. At runtime, the pod automatically gets a temporary credential via the OIDC token. No keys stored anywhere. If the pod is compromised, credentials expire in 1 hour. Blast radius is limited to exactly the permissions in that role. Implementation: eksctl utils associate-iam-oidc-provider, then eksctl create iamserviceaccount. This is the production standard for EKS and what every AWS interview expects you to know for containerised workloads.

AWS · PRODUCTION

S3 bucket was accidentally made public. What do you do in the next 60 seconds?

First 60 seconds: block public access immediately at the bucket level with aws s3api put-public-access-block and at the account level with aws s3control put-public-access-block, with all four flags set to true in both. This stops new exposure right away. Next 5 minutes: check what was in the bucket using aws s3api list-objects-v2. Check CloudTrail for GetObject API calls in the last hour to understand what was potentially accessed (this requires S3 data events to be enabled; otherwise fall back to S3 server access logs). Check the bucket policy and ACL that allowed public access. Next 30 minutes: file a security incident. Notify your security team. If the bucket contained PII, initiate your GDPR/data breach notification process (72 hours under GDPR). Fix the root cause: identify which Terraform/IaC change omitted or disabled the public access block settings (block_public_acls and the other three). Prevention for future: AWS Config rule s3-bucket-public-read-prohibited to flag a public bucket as soon as its configuration changes. SCP at organization level that prevents Block Public Access from being turned off. In Terraform: always include aws_s3_bucket_public_access_block resource with all four booleans set to true.

AWS · ENGINEER

What is the difference between EBS, EFS, and S3 storage on AWS?

EBS (Elastic Block Store): block storage, attached to one EC2 instance at a time (ReadWriteOnce). Like a hard drive. Use for OS volumes, databases (RDS uses EBS), single-instance app data. High IOPS, low latency. Types: gp3 (general), io2 (high performance databases), st1 (throughput — Kafka), sc1 (cold, infrequent access). EFS (Elastic File System): network file system, multiple EC2 instances can mount simultaneously (ReadWriteMany). NFS protocol. Use for shared content (web tier accessing same files), EKS pods needing shared storage across nodes. More expensive than EBS. S3: object storage, HTTP API (not mountable as filesystem natively). Unlimited scale. Use for: backups, static content, data lake, application artifacts, logs. Cannot run a database on S3. Differences in EKS context: EBS for databases in StatefulSets, EFS for shared config or content across pods, S3 for application data and backups via SDK.

AWS · PRODUCTION

How do you troubleshoot an EC2 instance that is unreachable via SSH?

Systematic approach. First: check EC2 console — is the instance state Running? Check system status checks and instance status checks. System check failure means AWS hardware issue — stop and start the instance (this migrates to new hardware). Instance check failure means OS-level issue. Second: check Security Group — does it allow inbound 22 from your IP? Connections silently drop without SG rule. Third: check NACLs — is there a DENY rule on port 22? Fourth: check instance system log — EC2 Console → Get system log — shows Linux boot messages and any panic/crash. Fifth: if no SSH key access, use AWS Systems Manager Session Manager — no SSH needed, works through Systems Manager agent. Sixth: for EBS-backed instances, detach the root volume, attach to a working instance as secondary volume, fix the issue (bad sshd_config, full disk), reattach. Production rule: disable SSH entirely and use SSM Session Manager — no inbound ports, full audit trail.

AWS · ARCHITECT

How does AWS Auto Scaling work with ALB for a production application?

Full flow: ALB receives traffic, distributes to target group. Target group contains EC2 instances or EKS pods. Auto Scaling Group manages the EC2 instances. When CPU/memory/custom metric crosses threshold, ASG launches new instances, registers them with the target group, ALB starts sending traffic once health check passes. Scale-down: after cooldown period (300 seconds default), underutilised instances are terminated, deregistered from target group first so in-flight requests complete. Key settings: min/desired/max capacity, health check grace period (give new instances time to start), cooldown (prevent rapid scale-up/down flapping), instance warm-up (how long before new instance counted in metrics). ALB health check vs EC2 health check: ALB health check tests HTTP endpoint. If it fails, ALB removes instance from rotation but ASG does not know. Configure ASG to use ELB health checks to replace unhealthy instances automatically.

AWS · ENGINEER

What is IRSA in EKS and how does it compare to Azure Workload Identity?

IRSA (IAM Roles for Service Accounts) is EKS's mechanism for giving pods access to AWS services without storing credentials. It works through OIDC federation: the EKS cluster has an OIDC issuer URL. An IAM Role is created with a trust policy that trusts tokens from that OIDC issuer for a specific Kubernetes ServiceAccount. The pod uses that ServiceAccount and gets temporary AWS credentials automatically via the AWS SDK credential chain. The trust policy specifies: "I trust tokens from cluster X for ServiceAccount Y in namespace Z." When the payment pod calls S3, boto3 sees the ServiceAccount token in the pod filesystem, exchanges it for temporary AWS credentials, and makes the S3 call. No access keys stored anywhere. Comparison with Azure Workload Identity: identical concept, different implementation. Both use OIDC federation between Kubernetes ServiceAccounts and the cloud identity system (AWS IAM vs Azure AD). Both eliminate stored credentials. The practical difference: IRSA configuration is done per-cluster with eksctl or Terraform. Azure Workload Identity requires the federated credential to be set up on the Managed Identity resource. Both are the current production standard for their respective platforms — never use static access keys in pods.

AWS · ENGINEER

What is the difference between ALB and NLB in AWS? When do you use each?

ALB (Application Load Balancer) operates at Layer 7 (HTTP/HTTPS). It understands the content of the request: URL path routing (/api → backend, /static → S3), host-based routing (api.example.com → API service, app.example.com → frontend), gRPC, WebSocket, SSL termination, content-based routing, and WAF integration. Use ALB for: web applications, microservices with path-based routing, HTTP API endpoints, WebSocket applications. NLB (Network Load Balancer) operates at Layer 4 (TCP/UDP/TLS). It routes based on IP and port only, no content inspection. Handles millions of requests per second with ultra-low latency (microseconds vs milliseconds for ALB). Preserves the source IP of the client (ALB changes source IP to the ALB IP). Use NLB for: TCP/UDP applications (gaming servers, IoT), high-frequency trading where microsecond latency matters, applications requiring source IP preservation, EKS Services with type LoadBalancer where you need a static IP (NLB supports static Elastic IPs, ALB does not). In EKS: use the AWS Load Balancer Controller (LBC). Annotate Service type LoadBalancer with service.beta.kubernetes.io/aws-load-balancer-type: external to get an NLB, or use Ingress with IngressClass alb to get an ALB for HTTP routing.

AWS · ARCHITECT

How do you design a highly available three-tier application on AWS?

Three-tier (presentation, application, data) deployed across two AZs minimum. Presentation tier: static assets in S3 with CloudFront CDN in front. Dynamic frontend in ECS/EKS. ALB distributes across AZs. Auto Scaling Group maintains minimum 2 instances across AZs. Application tier: EKS with node groups spanning two AZs. HPA scales pods. Cluster Autoscaler adds nodes. ALB routes to the EKS service. Security groups allow only the ALB to reach the application tier — no direct internet access. Data tier: RDS with Multi-AZ deployment — primary in AZ-1, standby in AZ-2. Automatic failover if primary fails (1-2 minutes). Read replicas for read-heavy workloads. ElastiCache (Redis) in cluster mode across AZs for session storage and caching. All data tier resources in private subnets — no public access. Supporting services: Route53 for DNS with health checks and automatic failover. CloudFront for global CDN and DDoS protection (absorbs layer 7 attacks at edge). WAF attached to CloudFront and ALB for OWASP Top 10 protection. KMS encrypts all data at rest. Secrets Manager for database credentials (rotate automatically). CloudWatch for monitoring, CloudTrail for audit. For disaster recovery: S3 Cross-Region Replication for static assets, RDS cross-region read replica that can be promoted, infrastructure as Terraform code so you can rebuild in a new region from code.

AWS · PRODUCTION

EKS pods cannot access S3. Walk through your troubleshooting steps.

Step 1: check the error. From inside the pod (assuming the AWS CLI is in the image): kubectl exec -it pod-name -- aws s3 ls s3://my-bucket. The error message narrows the search. "Unable to locate credentials" means no IAM role is reaching the pod. "Access Denied" means the role lacks permissions or a policy denies the call. "NoSuchBucket" means wrong bucket name or region. Step 2: verify IRSA setup. Run kubectl describe serviceaccount my-sa -n namespace and check for the annotation eks.amazonaws.com/role-arn. If missing, the ServiceAccount is not linked to an IAM role. Step 3: verify the pod is using the correct ServiceAccount. Run kubectl describe pod my-pod and check serviceAccountName. If it says "default", the deployment is not using the annotated ServiceAccount. Step 4: verify the IAM role trust policy. Run aws iam get-role --role-name MyEKSRole and check that the trust policy allows the EKS cluster OIDC issuer and the specific ServiceAccount. Common mistake: the trust policy names the right cluster but the wrong namespace or ServiceAccount name. Step 5: verify the IAM role has the right permissions. aws iam simulate-principal-policy checks whether the role can perform s3:GetObject on the bucket ARN without actually doing it. Step 6: check the bucket policy. The S3 bucket might have a policy that denies access regardless of the IAM role. Also check if the bucket is in a different region and you need the --region flag. Step 7: token expiry. The projected ServiceAccount token is valid for 24 hours by default, and the kubelet rotates it automatically before it expires. If credentials fail only on long-running pods, the usual cause is an old AWS SDK that does not re-read the rotated token file. Upgrade the SDK, and restart the pod as a stopgap.
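
Step 1 amounts to a lookup from error text to the next thing to check. The sketch below encodes that mapping so it can be used in a runbook script. The function diagnose and the wording of the hints are illustrative, and it only recognises the three errors named above.

```python
# Ordered (substring, hint) pairs; substrings are matched case-insensitively.
LIKELY_CAUSES = [
    ("unable to locate credentials", "No IAM role reaches the pod: check the ServiceAccount annotation and serviceAccountName."),
    ("access denied", "Credentials work but are not permitted: check the role policy, trust policy, and bucket policy."),
    ("nosuchbucket", "Bucket not found: check the bucket name and region."),
]


def diagnose(error_message: str) -> str:
    """Map an AWS CLI error message to the most likely cause from the steps above."""
    lowered = error_message.lower()
    for needle, hint in LIKELY_CAUSES:
        if needle in lowered:
            return hint
    return "Unrecognised error: read the full message and check CloudTrail."


if __name__ == "__main__":
    print(diagnose("Unable to locate credentials. You can configure credentials by running \"aws configure\"."))
    print(diagnose("An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied"))
```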

🗺️ Roadmap

Week 1

Foundations

Create AWS free account

Understand regions and AZs

IAM users, groups, policies

Launch first EC2, SSH in

Week 2

Networking

Create VPC from scratch (not default)

Public + private subnets

Security groups + NACLs

NAT Gateway

Week 3-4

Core Services

S3 with versioning + encryption

RDS Multi-AZ

ALB + Auto Scaling Group

CloudWatch alarms

Month 2

DevOps on AWS

EKS cluster with eksctl

IRSA for pod permissions

Terraform for all infrastructure

AWS Solutions Architect exam prep
