Ansible Complete Guide — LearnwithVishnu

🤖Ansible

Levels: Beginner · Engineer · Production · Architect

Agentless automation: configure servers, deploy apps, enforce compliance

What you will learn: What Ansible is → Inventory (static, dynamic, group_vars) → Playbooks (tasks, handlers, templates) → Roles (reusable automation) → Variables and precedence → Ansible Vault (secrets) → All CLI commands → AAP/Tower (enterprise) → Rolling deployments → Idempotency → CI/CD integration → 10 senior interview Q&As

🤖 What is Ansible?

Ansible is an agentless IT automation tool by Red Hat. It connects to servers over SSH and runs tasks defined in YAML files called playbooks. No agent needs to be installed on target servers — this is its biggest advantage over Chef and Puppet.

Ansible vs Chef vs Puppet

Feature	Ansible	Puppet	Chef
Agent required	No — agentless (SSH only)	Yes — Puppet agent	Yes — Chef client
Language	YAML — readable by everyone	Puppet DSL (Ruby-based)	Ruby DSL (complex)
Model	Push — control node pushes	Pull — agents pull config	Pull — agents pull config
Learning curve	Low — write YAML in hours	High — weeks to master	High — weeks to master
Setup time	Minutes — pip install ansible	Days — install agents everywhere	Days — install agents everywhere
Best for	Ad-hoc automation, CI/CD integration	Continuous compliance enforcement	Complex enterprise config management

Where Ansible fits in the DevOps toolchain

Tool	What it does	Analogy
Terraform	Provisions servers, networks, databases in cloud	Builder — creates the house
Ansible	Configures what is inside the servers	Interior designer — furnishes the house
Jenkins/GitHub Actions	Deploys application code	Moving company — brings your stuff in

Install Ansible and test connectivity
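A minimal sketch of a connectivity check written as a playbook. It assumes Ansible is already installed on the control node and that an inventory with reachable hosts exists; the file name ping.yml is hypothetical.

```yaml
# ping.yml (hypothetical file name): connectivity check for every inventory host.
# Assumes Ansible is installed on the control node (for example with
# "pip install ansible") and that SSH access to the hosts is already set up.
- name: Test connectivity to all hosts
  hosts: all
  gather_facts: false   # skip fact gathering; only the connection is tested
  tasks:
    - name: Check SSH login and a usable Python on the target
      ansible.builtin.ping:
```

Assuming an inventory file named inventory.ini, this could be run with ansible-playbook -i inventory.ini ping.yml. The ad-hoc equivalent is ansible all -m ping.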

📋 Inventory — Static, Dynamic, Group Vars

The inventory tells Ansible which servers to manage and how to group them. Groups let you target subsets: run tasks on all webservers, or just databases, or just production servers.

Static inventory — INI and YAML formats
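A sketch of a static inventory in YAML format (the INI format expresses the same groups with [section] headers). Host names, IP addresses, group names and the deploy user are placeholders, not values from a real environment.

```yaml
# inventory.yml (hypothetical): static inventory in YAML format.
# Host names, addresses and the "deploy" user are placeholders.
all:
  children:
    webservers:
      hosts:
        web-01:
          ansible_host: 10.0.1.10
        web-02:
          ansible_host: 10.0.1.11
    databases:
      hosts:
        db-01:
          ansible_host: 10.0.2.10
    production:
      # a group made of other groups
      children:
        webservers:
        databases:
  vars:
    ansible_user: deploy   # applies to every host unless overridden
```

With this file, ansible webservers -i inventory.yml -m ping would target only web-01 and web-02.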

Group and host variables

group_vars and host_vars
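As an illustration, variables for a whole group live in a file named after the group, and per-host overrides live in a file named after the host. All variable names and values below are made up for the example.

```yaml
# group_vars/webservers.yml: applies to every host in the "webservers" group.
# Variable names and values are illustrative.
nginx_worker_processes: 4
app_port: 8080
log_level: INFO

# host_vars/web-01.yml would be a separate file next to the inventory.
# Values there apply to web-01 only and override the group values above:
#
#   log_level: DEBUG
```

Ansible loads both directories automatically when they sit next to the inventory file or the playbook, so no include statement is needed.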

Dynamic Inventory — for cloud environments

Static inventory files are unmanageable for cloud environments where VMs are created and destroyed regularly. Dynamic inventory queries cloud APIs at runtime.

Dynamic inventory — AWS EC2 and Azure
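A sketch of a dynamic inventory configuration for AWS EC2 only (Azure uses a different plugin and is not shown). It assumes the amazon.aws collection and its Python dependencies are installed and that AWS credentials are available in the environment; the region and tag names are placeholders.

```yaml
# aws_ec2.yml: the file name must end in aws_ec2.yml or aws_ec2.yaml
# for the plugin to accept it. Region and tag names are placeholders.
plugin: amazon.aws.aws_ec2
regions:
  - eu-west-1
filters:
  # only running instances tagged Environment=production
  instance-state-name: running
  tag:Environment: production
keyed_groups:
  # builds groups such as role_webserver from the instance tag "Role"
  - key: tags.Role
    prefix: role
hostnames:
  - private-ip-address
```

Running ansible-inventory -i aws_ec2.yml --graph should print the groups the plugin built, which is a quick way to check the filters before pointing a playbook at it.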

📝 Playbooks — Tasks, Handlers, Templates

A playbook is a YAML file that defines tasks to run on target servers. Each task uses an Ansible module (copy, template, service, package, command, etc.). Tasks run in order. If a task fails on a host, Ansible stops running the play on that host and continues with the remaining hosts.

Complete playbook — deploy application
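One possible shape for a deployment playbook, as a sketch rather than a reference implementation. The myapp service, the paths, the archive name and the version number are all hypothetical.

```yaml
# deploy.yml (hypothetical): deploy an application to the webservers group.
# Paths, the "myapp" service name and variable values are placeholders.
- name: Deploy application
  hosts: webservers
  become: true
  vars:
    app_version: "1.4.2"
    app_dir: /opt/myapp
  tasks:
    - name: Ensure application and config directories exist
      ansible.builtin.file:
        path: "{{ item }}"
        state: directory
        mode: "0755"
      loop:
        - "{{ app_dir }}"
        - /etc/myapp

    - name: Copy the release archive from the control node and unpack it
      ansible.builtin.unarchive:
        src: "files/myapp-{{ app_version }}.tar.gz"
        dest: "{{ app_dir }}"

    - name: Render the application config from a template
      ansible.builtin.template:
        src: templates/app.conf.j2
        dest: /etc/myapp/app.conf
        mode: "0644"

    - name: Ensure the service is enabled and running
      ansible.builtin.service:
        name: myapp
        state: started
        enabled: true
```

Each task names a desired state rather than a command, so a second run against the same servers should report no changes.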

Handlers — run once at the end

Handlers are tasks triggered by notify. A handler runs only if a notifying task reported a change, and it runs once at the end of the play no matter how many tasks notified it. Use for: restart service, reload config, clear cache.

Handlers example
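A small sketch of the notify mechanism: two tasks notify the same handler, and the handler still runs at most once. Template file names are placeholders.

```yaml
- name: Configure nginx
  hosts: webservers
  become: true
  tasks:
    - name: Deploy main nginx config
      ansible.builtin.template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: Restart nginx

    - name: Deploy site config
      ansible.builtin.template:
        src: templates/site.conf.j2
        dest: /etc/nginx/conf.d/site.conf
      notify: Restart nginx

  handlers:
    # Runs once after all tasks, and only if at least one
    # notifying task reported "changed".
    - name: Restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
```

If neither config file changed, the handler is skipped entirely, so an unchanged server is never restarted.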

Jinja2 Templates — dynamic config files

Jinja2 template and usage
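A sketch of a template and the task that renders it. The template body is shown as comments because it is Jinja2 rather than YAML. The file names, the variables, and the databases group with ansible_host set on each host are assumptions for the example.

```yaml
# Hypothetical template file templates/app.conf.j2:
#
#   listen_port = {{ app_port }}
#   log_level   = {{ log_level | default('INFO') }}
#   {% for host in groups['databases'] %}
#   db_host     = {{ hostvars[host]['ansible_host'] }}
#   {% endfor %}
#
# Play that renders the template on each target host.
# Assumes /etc/myapp already exists on the target.
- name: Render application config
  hosts: webservers
  become: true
  vars:
    app_port: 8080
  tasks:
    - name: Write /etc/myapp/app.conf from the template
      ansible.builtin.template:
        src: templates/app.conf.j2
        dest: /etc/myapp/app.conf
        owner: root
        group: root
        mode: "0644"
```

The template module only rewrites the file when the rendered content differs from what is on disk, which keeps the task idempotent.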

📦 Roles — Reusable Automation

A role is a reusable, structured collection of tasks, vars, templates, and handlers. Instead of one giant playbook, roles make automation modular and shareable across projects and teams. This is what separates junior from senior Ansible usage.

Role directory structure

Role structure and usage
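The conventional role layout and one way to apply a role from a playbook, as a sketch. The role name nginx and the overridden variable are examples only.

```yaml
# Conventional role layout (the directory names are fixed by Ansible;
# "nginx" is an example role name):
#
#   roles/nginx/
#     defaults/main.yml    low-precedence default variables
#     vars/main.yml        high-precedence role variables
#     tasks/main.yml       entry point for the role's tasks
#     handlers/main.yml    handlers the tasks can notify
#     templates/           Jinja2 templates
#     files/               static files
#     meta/main.yml        role metadata and dependencies
#
# site.yml: apply the role to a group
- name: Configure web servers
  hosts: webservers
  become: true
  roles:
    - role: nginx
      vars:
        nginx_worker_processes: 8   # hypothetical variable overriding a role default
```

An empty skeleton with this layout can be generated with ansible-galaxy role init nginx.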

Complete role example — nginx with TLS

nginx role — tasks/main.yml
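A sketch of what the tasks file of such a role might contain. The variable names (nginx_server_name, nginx_tls_cert_src, nginx_tls_key_src), the template name, the Debian-style certificate paths and the Reload nginx handler in handlers/main.yml are all assumptions for illustration.

```yaml
# roles/nginx/tasks/main.yml (sketch). Variable names are hypothetical and
# would normally get defaults in roles/nginx/defaults/main.yml.
# Certificate paths follow the Debian layout; adjust for other distributions.
- name: Install nginx
  ansible.builtin.package:
    name: nginx
    state: present

- name: Copy TLS certificate
  ansible.builtin.copy:
    src: "{{ nginx_tls_cert_src }}"
    dest: "/etc/ssl/certs/{{ nginx_server_name }}.crt"
    mode: "0644"
  notify: Reload nginx

- name: Copy TLS private key
  ansible.builtin.copy:
    src: "{{ nginx_tls_key_src }}"
    dest: "/etc/ssl/private/{{ nginx_server_name }}.key"
    mode: "0600"
  no_log: true   # keep key material out of task output
  notify: Reload nginx

- name: Render the TLS virtual host
  ansible.builtin.template:
    src: vhost-tls.conf.j2
    dest: "/etc/nginx/conf.d/{{ nginx_server_name }}.conf"
    mode: "0644"
  notify: Reload nginx

- name: Ensure nginx is enabled and running
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true
```

Because everything environment-specific is a variable, the same tasks file can serve dev and prod with different group_vars.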

🔧 Variables, Precedence & Vault

Variable precedence — most important concept

Ansible has 22 variable precedence levels. For interviews, know these 6 in order:

Priority	Source	Example
Lowest	Role defaults	role/defaults/main.yml
2	Inventory group vars	[webservers:vars] section in inventory.ini
3	Group vars	group_vars/webservers.yml
4	Host vars	host_vars/web-01.yml
5	Playbook vars	vars: section in playbook
Highest	Extra vars (-e flag)	ansible-playbook deploy.yml -e "env=prod"

Variable precedence examples
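A small experiment that makes the ordering visible, assuming the same variable is defined at several levels. The file names in the comments and the log_level values are illustrative.

```yaml
# The same variable defined at several levels; the highest-precedence source wins.
#
#   roles/app/defaults/main.yml    ->  log_level: INFO     (lowest)
#   group_vars/webservers.yml      ->  log_level: WARNING
#   host_vars/web-01.yml           ->  log_level: ERROR
#   vars: in the play below        ->  log_level: NOTICE
#   -e "log_level=DEBUG"           ->  wins over all of the above
- name: Show which value wins
  hosts: webservers
  gather_facts: false
  vars:
    log_level: NOTICE
  tasks:
    - name: Print the effective value for this host
      ansible.builtin.debug:
        var: log_level
```

Run without -e, every host should print NOTICE, because play vars outrank group_vars and host_vars. Adding -e "log_level=DEBUG" should change the output to DEBUG on every host.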

Ansible Vault — encrypt secrets

Vault encrypts sensitive data (passwords, API keys) so it can be safely stored in Git. The encrypted file is AES256 ciphertext, useless without the vault password.

Ansible Vault — encrypt and use secrets
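One common layout, sketched here, keeps a readable vars file that points at values stored in a separate encrypted file. The directory names, the vault_ prefix convention and the placeholder password are assumptions for the example.

```yaml
# group_vars/production/vars.yml (plain text, safe to read in code review)
db_user: app
db_password: "{{ vault_db_password }}"   # points at the encrypted value

# group_vars/production/vault.yml would hold the real value and be encrypted with
#   ansible-vault encrypt group_vars/production/vault.yml
# Before encryption it contains (placeholder value):
#
#   vault_db_password: "change-me"
#
# After encryption the file starts with the header
#   $ANSIBLE_VAULT;1.1;AES256
# followed by hex-encoded ciphertext.
```

At run time the password is supplied with --ask-vault-pass or --vault-password-file, and ansible-vault edit opens the encrypted file for changes without leaving a decrypted copy in the repository.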

🖥️ CLI Commands — Complete Reference

ansible and ansible-playbook commands

🏢 Ansible Automation Platform (AAP / Tower)

AAP (Ansible Automation Platform) is enterprise Ansible. It adds: Web UI, REST API, RBAC, job scheduling, audit logs, and credential management on top of standard Ansible. At HPE/Vodafone scale you need AAP — CLI Ansible is unmanageable for teams.

What AAP adds over CLI Ansible

Before AAP (CLI)	After AAP
Manual inventory.ini files	Dynamic inventory synced from AWS/Azure
SSH keys on engineer laptops	Credentials stored in AAP vault
No audit trail	Full job history: who, what, when, output
No access control	RBAC: Dev team cannot run prod playbooks
Cron jobs on control node	Schedules in AAP with Slack notifications
Manual playbook updates	Projects sync from Git (for example on every job launch)

AAP Core Objects

Object	Purpose
Organization	Top-level tenant grouping (Telecom Org, Healthcare Org)
Inventory	Server lists — static or dynamic. Scoped to an org.
Credential	SSH keys, vault passwords, cloud creds — stored encrypted
Project	Link to a Git repo containing playbooks. Can update to the latest revision on launch.
Job Template	Defines: which playbook + inventory + credentials + vars. The "run button"
Workflow Template	Chain multiple Job Templates: backup → deploy → verify → notify
Schedule	Run Job Templates on cron schedule without Jenkins

AAP RBAC — role levels

Role	Permissions
Admin	Full control — create, edit, delete, execute
Execute	Can run Job Templates — cannot edit them
Use	Can reference Credential/Inventory — cannot view secret values
Update	Can sync Projects and Inventories
Read	View only — can see job history

AAP REST API — automate from CI/CD
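A sketch of launching a Job Template over the REST API from a CI job, written as a playbook that runs on the CI runner. The variables aap_host, aap_token and job_template_id are hypothetical and would be supplied by the pipeline. The /api/v2/ path prefix is an assumption that can differ between AAP versions, and extra_vars are only honored if the template allows variables to be prompted on launch.

```yaml
# Launch a Job Template through the controller REST API from a CI job.
# aap_host, aap_token and job_template_id are hypothetical variables
# supplied by the pipeline; the /api/v2/ prefix may differ by AAP version.
- name: Trigger an AAP job from CI
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Launch the job template
      ansible.builtin.uri:
        url: "https://{{ aap_host }}/api/v2/job_templates/{{ job_template_id }}/launch/"
        method: POST
        headers:
          Authorization: "Bearer {{ aap_token }}"
        body_format: json
        body:
          extra_vars:
            app_version: "{{ app_version | default('latest') }}"
        status_code: 201
      register: launch

    - name: Show the id of the job that was started
      ansible.builtin.debug:
        msg: "Started job {{ launch.json.job }}"
```

The token should come from the CI system's secret store rather than from the repository.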

🚀 Production Patterns — CI/CD, Rolling, Idempotency

Idempotency — the most important Ansible concept

An idempotent playbook produces the same result whether run once or 100 times. Running a properly written Ansible playbook against an already-configured server should result in "OK" (no changes) not "Changed" or "Failed". This is the difference between professional and amateur Ansible usage.

Idempotent patterns
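A few patterns that tend to keep a playbook idempotent, sketched with placeholder paths. The init-db.sh script and its marker file are hypothetical, and the script is assumed to create the marker itself.

```yaml
- name: Idempotent patterns
  hosts: webservers
  become: true
  tasks:
    # Not idempotent: appends the line again on every run
    # - name: Add setting (avoid)
    #   ansible.builtin.shell: echo "vm.swappiness=10" >> /etc/sysctl.conf

    - name: Ensure the setting is present exactly once
      ansible.builtin.lineinfile:
        path: /etc/sysctl.conf
        regexp: '^vm\.swappiness='
        line: vm.swappiness=10

    - name: Run a one-time script only if its marker file is missing
      ansible.builtin.command:
        cmd: /opt/myapp/bin/init-db.sh
        creates: /var/lib/myapp/.db-initialized

    - name: Read-only command that should never report "changed"
      ansible.builtin.command:
        cmd: nginx -v
      register: nginx_version
      changed_when: false
```

On a second run against the same host, the first task should report ok, the second should be skipped because the marker exists, and the third reports ok because of changed_when: false.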

Rolling deployment with serial

Rolling deployment — zero downtime
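A sketch of a rolling update that touches one host at a time. The lb-drain and lb-enable commands, the myapp service and the /health endpoint on port 8080 are hypothetical stand-ins for whatever the real load balancer and application provide.

```yaml
- name: Rolling deployment
  hosts: webservers
  become: true
  serial: 1                 # one host at a time; a percentage such as "25%" also works
  max_fail_percentage: 0    # stop the rollout as soon as any host fails
  tasks:
    - name: Take the host out of the load balancer (placeholder command)
      ansible.builtin.command:
        cmd: "/usr/local/bin/lb-drain {{ inventory_hostname }}"
      delegate_to: localhost
      become: false
      changed_when: true

    - name: Restart the application (placeholder for the real deployment steps)
      ansible.builtin.service:
        name: myapp
        state: restarted

    - name: Wait until the application answers its health check
      ansible.builtin.uri:
        url: "http://{{ ansible_host | default(inventory_hostname) }}:8080/health"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 6
      delegate_to: localhost
      become: false

    - name: Put the host back into the load balancer (placeholder command)
      ansible.builtin.command:
        cmd: "/usr/local/bin/lb-enable {{ inventory_hostname }}"
      delegate_to: localhost
      become: false
      changed_when: true
```

If the health check never succeeds, the play fails on that host before it is re-enabled, and the remaining hosts are left untouched on the old version.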

Ansible in Jenkins CI/CD pipeline

Jenkins + Ansible pipeline

🔍 Troubleshooting — Common Issues

Error	Cause	Fix
SSH connection refused	Wrong IP or port, sshd not running, firewall blocking port 22	Check ansible_host and ansible_port
Permission denied (publickey)	SSH key not added to authorized_keys	Copy SSH key: ssh-copy-id user@host
Python not found	Old server without Python, or wrong path	Set ansible_python_interpreter=/usr/bin/python3
sudo password required	become: yes but no sudo password configured	Use --ask-become-pass or configure NOPASSWD in sudoers
Task not idempotent	Using shell/command module instead of dedicated module	Use package/service/file modules instead of shell
Variable undefined	Variable not set in inventory or vars files	Check variable precedence, use default() filter

Debugging commands
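Besides command-line flags such as -vvv and --check --diff, a few tasks can help when tracking down the errors in the table above. This is a sketch; app_port and db_host are hypothetical variable names.

```yaml
- name: Debugging helpers
  hosts: all
  gather_facts: true
  tasks:
    - name: Show which Python interpreter and OS Ansible detected
      ansible.builtin.debug:
        msg: "{{ ansible_facts['python']['executable'] }} on {{ ansible_facts['distribution'] }} {{ ansible_facts['distribution_version'] }}"

    - name: Print a variable, falling back to a default if it is undefined
      ansible.builtin.debug:
        msg: "app_port is {{ app_port | default('undefined, using fallback 8080') }}"

    - name: Fail early with a clear message when a required variable is missing
      ansible.builtin.assert:
        that:
          - db_host is defined
        fail_msg: "db_host is not set; check group_vars and host_vars"
```

Placing the assert near the top of a play turns a confusing mid-run undefined-variable error into an immediate, readable failure.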

🎯 Interview Questions — Senior Level

ANSIBLE · ENGINEER

What is Ansible and how is it different from Chef and Puppet?

Ansible is agentless — it connects to servers over SSH and requires only Python on the target server. No agent daemon to install, maintain, or upgrade. Chef and Puppet require an agent running on every managed server — agent upgrades, agent authentication, agent failures become their own operational problem. Ansible uses YAML playbooks which any developer can read. Chef uses Ruby DSL which requires programming knowledge. Ansible is push-based — control node pushes tasks when you run ansible-playbook. Puppet and Chef are pull-based — agents periodically check for updates. The pull model is better for continuous compliance; the push model is better for on-demand deployments and CI/CD integration. At HPE: we chose Ansible specifically because the infrastructure team could write and understand playbooks without needing Ruby knowledge, and because we needed CI/CD integration that push-based Ansible makes natural.

ANSIBLE · ENGINEER

Explain Ansible variable precedence. Which wins?

Ansible has 22 precedence levels. For interviews, the 6 most important in order from lowest to highest: role defaults (role/defaults/main.yml), which anyone can override; group variables set in the inventory file; group_vars (files in group_vars/ folder); host_vars (files in host_vars/ folder); playbook vars (vars: section); extra vars (-e flag), which always win and cannot be overridden. Practical implication: role defaults are the safety net. group_vars/production.yml overrides them for production. host_vars/critical-server.yml can further override for one specific server. And in an emergency, -e "log_level=DEBUG" overrides everything without touching any files. The most common mistake: setting variables in role vars/main.yml (high precedence) instead of defaults/main.yml (low precedence). Then nobody can override them from group_vars, which breaks multi-environment playbooks.

ANSIBLE · ARCHITECT

How do you design Ansible roles to support both on-premise and cloud environments without duplicating code?

The key is parameterization and abstraction through variables. Design roles to be environment-agnostic by default, environment-specific through variable overrides. Example: my nginx role defines nginx_worker_processes in defaults/main.yml as 4. For cloud VMs with 8 cores, group_vars/cloud_webservers.yml sets it to 8. For on-prem servers with 16 cores, group_vars/onprem_webservers.yml sets it to 16. The role code never changes — only the variables differ. For genuinely different behavior (systemd vs init.d, different package managers), use when conditionals on ansible_os_family and ansible_distribution_major_version. For cloud-specific tasks (register with cloud load balancer, fetch secrets from Key Vault), use delegate_to: localhost to run cloud API calls from the control node. The role structure stays identical — cloud tasks are just enabled or disabled via variables like cloud_provider: azure or cloud_provider: none.

ANSIBLE · PRODUCTION

Your Ansible playbook runs successfully against dev but fails against production. What do you investigate?

Systematic approach: look for differences between dev and prod that could cause failures. First, run with -vvv to see exact SSH and task output. Most common causes in order of frequency: 1) Variable values: prod group_vars has different values (db_host, app_port, credentials). Verify with ansible prod-servers -m debug -a "var=hostvars[inventory_hostname]". 2) Ansible Vault: prod uses a different vault password. Verify vault decryption works: ansible-playbook site.yml --check --vault-password-file prod_vault.pass. 3) Network/firewall: target port not open, or package repository not reachable from the prod network. Test with ansible prod-server -m uri -a "url=https://registry.example.com". 4) Permissions: prod has stricter sudo rules or SELinux enforcing. Check with ansible prod-server -m shell -a "getenforce". 5) OS version differences: prod is RHEL 8, dev is RHEL 9, and some modules behave differently. Finally, use --check --diff to preview exactly what would change on prod without making changes.

ANSIBLE · ARCHITECT

What is Ansible Automation Platform and when would you choose it over CLI Ansible?

AAP is enterprise Ansible with Web UI, RBAC, scheduling, audit logs, and centralized credential management. You need AAP when you have: more than 3 engineers running Ansible (SSH keys on laptops = security risk), any compliance requirement (PCI-DSS, SOC2 require audit trails of every change — CLI Ansible has none), production environments that need approval gates (AAP Workflow Templates support approval steps), and 24x7 operations (AAP schedules nightly compliance runs without a Jenkins dependency). Key AAP RBAC use case: Dev team gets Execute permission on dev Job Templates only. Ops team gets Execute on all. Nobody gets SSH key access to servers directly — all access goes through AAP with full logging. At Vodafone scale with 400+ servers across dev/staging/prod, CLI Ansible was a security and audit nightmare. AAP replaced it: every playbook run recorded, every credential centralized, every dev action approved by ops.

ANSIBLE · ENGINEER

What is idempotency in Ansible and why does it matter?

Idempotency means running a playbook once or 100 times produces the same result — the system ends up in the desired state either way, with no side effects from repeated runs. Why it matters: CI/CD pipelines run playbooks on every deployment. If a playbook is not idempotent, running it twice might install duplicate packages, create duplicate users, append duplicate config lines, or fail because a resource already exists. Ansible built-in modules (package, file, service, user, template, lineinfile) are idempotent. The shell and command modules are NOT idempotent by default — they run every time. If you must use shell, use creates or removes flags: shell: create_database.sh creates=/var/lib/db — this skips the command if the file already exists. The measure of a good playbook: run it against an already-configured server — all tasks should show "ok" (unchanged), zero "changed". If any task shows "changed" every time, it is not idempotent.

ANSIBLE · PRODUCTION

Production server configuration drifted from your Ansible playbooks. How do you detect and remediate this?

Configuration drift in Ansible is detected by running playbooks in check mode against production: ansible-playbook site.yml --check --diff -i inventory/prod. This shows every difference between current state and desired state without making any changes. The --diff flag shows exact file content changes. Anything showing "changed" in check mode = drift. Common drift sources: manual emergency fixes during incidents that were never formalized into playbooks, security patches applied manually, and configuration changes made directly on servers by application teams. Remediation decision: if the drift was an intentional improvement, update the playbook first, then apply. If the drift was incorrect, run ansible-playbook site.yml -i inventory/prod to revert to desired state. Prevention: run check mode as a nightly Jenkins job. Any drift detected = Slack alert to the team. At HPE: nightly drift detection on 50+ servers. Alert fires maybe twice per month, usually from manual emergency changes. Having the alert meant we always caught and formalized the change within 24 hours.

ANSIBLE · ENGINEER

What is the difference between include_tasks and import_tasks in Ansible?

import_tasks (static): The tasks file is read and included at parse time, before playbook execution starts. It is as if the tasks were written directly in the playbook. Result: you can use --list-tasks to see all tasks before running, and tags applied to the import apply to all imported tasks. Limitation: the file path cannot depend on host facts or inventory variables, because those are not known at parse time. include_tasks (dynamic): The tasks file is loaded at runtime when that point in the playbook is reached. You can use variables in the file path: include_tasks: "{{ ansible_os_family }}_tasks.yml" loads a different file based on OS. Tags on the include_tasks do NOT automatically apply to included tasks (use apply for that). Limitation: --list-tasks does not show the included tasks before running. Rule of thumb: use import_tasks for static includes where you always know what to include. Use include_tasks for conditional inclusion based on variables, or when you need to loop over multiple task files.
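The difference can be sketched in a single play. The file names common.yml and the per-OS task files (for example Debian_tasks.yml or RedHat_tasks.yml) are hypothetical.

```yaml
- name: Static import versus dynamic include
  hosts: all
  tasks:
    # Static: resolved at parse time, so --list-tasks shows the imported
    # tasks and the tag is inherited by every task in common.yml.
    - name: Import common tasks
      ansible.builtin.import_tasks: common.yml
      tags: common

    # Dynamic: the file name is resolved per host at runtime from a gathered
    # fact. "apply" passes the tag down to the included tasks; the tag on the
    # include itself lets --tags os_specific select the include statement.
    - name: Include OS-specific tasks
      ansible.builtin.include_tasks:
        file: "{{ ansible_os_family }}_tasks.yml"
        apply:
          tags: os_specific
      tags: os_specific
```

Running ansible-playbook with --list-tasks against this play should list the tasks from common.yml but show only the include statement for the OS-specific file.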

ANSIBLE · ARCHITECT

How do you handle secrets in Ansible across a team of 20 engineers?

Three-layer secret management strategy. Layer 1: Ansible Vault for playbook secrets (database passwords, API keys in vars/secrets.yml). Vault password stored in a secrets store or team password manager (HashiCorp Vault or 1Password), never in Git. Each environment has a separate vault password. Layer 2: SSH keys managed in AAP credential store, so engineers never see or hold SSH keys. AAP injects them at job execution time. Complete audit: who ran which job against which server and when. Layer 3: For production secrets that rotate regularly (DB passwords, API tokens), fetch them from HashiCorp Vault at playbook runtime, for example with the community.hashi_vault.hashi_vault lookup plugin or AAP's external secret management credentials, and never hardcode them even in vault files. Rotation: when a secret rotates, update it in HashiCorp Vault and all playbooks pick it up automatically on the next run without any code changes. At HPE: I implemented this three-layer approach. Result: no engineer has direct SSH access to production servers, every secret access is audited, and we passed SOC2 audit without any findings related to credential management.

ANSIBLE · PRODUCTION

A runaway Ansible playbook is running on production and making unintended changes. How do you stop it?

Immediate stop: press Ctrl+C in the terminal if you are watching it. This aborts ansible-playbook on the control node, but a module that has already started on a target host may still run to completion. If it is running in Jenkins/AAP: cancel the job immediately in the UI. For SSH-based playbooks you can also kill the SSH sessions to the target hosts: pkill -f "ssh.*production-server" from the control node, which interrupts the current task on the matching hosts. Assessment: review the playbook output (or the AAP job log) to see which tasks already ran on which hosts and understand the blast radius. Ansible stores no rollback information. If tasks already ran (files changed, services restarted, packages installed), you must manually reverse them or re-run an earlier version of the playbook. Prevention: always run --check --diff in CI before any prod apply. Use serial to limit blast radius. For high-risk plays, add a manual approval step in AAP workflow before the actual execution stage. At HPE: we had a runaway playbook that restarted all telecom services simultaneously instead of serially. The fix took 2 hours. After this we added serial: 1 to all service-restart playbooks and mandatory --check in CI.

🗺️ Learning Roadmap

Week 1

Foundations

Install Ansible: pip install ansible

Create inventory with 3 hosts

Run: ansible all -m ping

Write first playbook: install nginx

Week 2

Core Concepts

Roles — create and reuse

Variables and precedence

Ansible Vault for secrets

Handlers and templates

Week 3-4

Production Patterns

Dynamic inventory (AWS/Azure)

Rolling deployments with serial

Idempotency — write it right

Ansible in CI/CD pipeline

Month 2

Enterprise — AAP

AAP installation and setup

RBAC — teams and job templates

Workflow templates

REST API integration
