Inventory Guide — Configuring hosts.yml
The inventory file is one of only two files you need to edit before running the Diskover Ansible playbook. It tells Ansible three things: which machines to connect to, how to authenticate, and what role each machine plays in the Diskover architecture.
This guide covers the inventory file in depth — every field, every option, and complete examples for common deployment scenarios.
Where Is the Inventory File?
The inventory file lives at:
diskover-ansible/inventory/hosts.yml
The docs/example_docs/ directory in the repository contains several ready-made inventory files you can copy and customize:
| Example File | Use Case |
|---|---|
| hosts.single-host.yml | All components on one machine |
| | Distributed deployment with multiple workers and ES nodes |
| | SSH key authentication instead of passwords |
| | Environments requiring an HTTP proxy for internet access |
To use an example, copy it over the default inventory:
cp docs/example_docs/hosts.single-host.yml inventory/hosts.yml
Then edit it with your actual IPs, hostnames, and credentials.
Inventory Structure
Every Diskover inventory follows the same YAML structure. Here's the skeleton with annotations:
all:  # Top-level group (required by Ansible)
    vars:  # Variables that apply to ALL hosts
        ansible_connection: ssh  # How Ansible connects to targets
        ansible_user: <username>  # SSH username
        ansible_ssh_pass: "<pw>"  # SSH password (or use key-based auth)
        become: true  # Use sudo on target machines
        ansible_become_pass: "<pw>"  # Sudo password
    children:
        diskover:  # Parent group for all Diskover components
            children:
                web:  # Web UI host group
                    hosts:
                        <ip>:
                            hostname: <host short name>
                rabbitmq:  # Message broker host group
                    hosts:
                        <ip>:
                            hostname: <host short name>
                worker:  # Scanner/worker host group
                    hosts:
                        <ip>:
                            hostname: <host short name>
                elasticsearch:  # Search backend host group
                    hosts:
                        <ip>:
                            hostname: <host short name>
The nesting must be exactly all → children → diskover → children → <group>. The playbook expects this hierarchy.
Connection Variables
These variables are defined under all.vars and apply to every host in the inventory.
ansible_connection
What it does: Tells Ansible how to connect to the target machines.
| Value | When to Use |
|---|---|
| ssh | The standard choice. Ansible connects to the target over SSH from the control machine |
| local | Use this when running Ansible directly on the target machine itself (Ansible won't use SSH; it runs commands locally) |
Example — running Ansible on the same machine as Diskover:
all:
    vars:
        ansible_connection: local
ansible_user
What it does: The username Ansible uses to SSH into each target machine.
This user must exist on every target machine and must have sudo privileges (since become: true is set).
ansible_user: diskover
ansible_ssh_pass
What it does: The SSH password for ansible_user.
ansible_ssh_pass: "your-ssh-password"
Important: If your password contains special characters (like !, $, #, or @), wrap it in double quotes. YAML can misinterpret unquoted special characters.
Security note: Storing passwords in plain text in the inventory file is acceptable for initial setup and lab environments. For production, consider using SSH key authentication (see below) or Ansible Vault to encrypt sensitive values.
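One common way to follow the Ansible Vault suggestion is to keep the real passwords in a separate, vault-encrypted vars file and have the inventory only reference them. The sketch below is an illustration under assumptions, not the repository's own layout: the variable names vault_ssh_pass and vault_become_pass are hypothetical, and so is the idea that they live in an encrypted vars file that Ansible loads alongside this inventory.

```yaml
# inventory/hosts.yml (excerpt)
# Assumption: vault_ssh_pass and vault_become_pass are hypothetical names,
# defined in a vars file that was encrypted with ansible-vault.
all:
    vars:
        ansible_connection: ssh
        ansible_user: diskover
        # Resolved at run time from the encrypted vars file
        ansible_ssh_pass: "{{ vault_ssh_pass }}"
        become: true
        ansible_become_pass: "{{ vault_become_pass }}"
```

With this arrangement the playbook needs the vault password at run time, for example by adding --ask-vault-pass to the ansible-playbook command.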
ansible_become_pass
What it does: The password Ansible uses for sudo on the target machines.
ansible_become_pass: "your-sudo-password"
This is often the same as ansible_ssh_pass, but not always — it depends on how sudo is configured on your target machines.
If your SSH user has passwordless sudo configured (e.g., via a NOPASSWD entry in /etc/sudoers), you can omit this variable entirely.
become
What it does: Enables privilege escalation. When set to true, Ansible uses sudo to run tasks as root on the target machines.
become: true
This must be true for the Diskover playbook — the roles install system packages, configure systemd services, and modify system files, all of which require root access.
SSH Key Authentication
Instead of using passwords, you can configure Ansible to authenticate with an SSH private key. This is the preferred approach for cloud deployments (AWS, Azure, GCP) and environments where password-based SSH is disabled.
Replace ansible_ssh_pass with ansible_ssh_private_key_file:
all:
    vars:
        ansible_connection: ssh
        ansible_user: diskover
        ansible_ssh_private_key_file: /home/admin/.ssh/diskover_deploy.pem
        become: true
        ansible_become_pass: "your-sudo-password"
| Variable | Description |
|---|---|
| ansible_ssh_private_key_file | Absolute path to the SSH private key on the control machine (the machine running Ansible) |
Note: You still need ansible_become_pass for sudo unless the target user has passwordless sudo configured. If passwordless sudo is enabled, remove the ansible_become_pass line entirely.
Alternative — passing the key on the command line:
Instead of adding the key path to the inventory, you can pass it when running the playbook:
ansible-playbook -i inventory/hosts.yml install_diskover.yml --private-key /path/to/key.pem
This is useful when you don't want to store the key path in the inventory file, or when different team members use different key file locations.
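When individual machines need a different login user or key, Ansible also accepts connection variables on a single host entry, where they take precedence over the values under all.vars. The fragment below is a minimal sketch of that pattern; the ec2-user account and the web_deploy.pem path are hypothetical and do not come from the Diskover examples.

```yaml
# Fragment placed under all.children.diskover.children
web:
    hosts:
        10.0.1.10:
            hostname: diskover-web.example.com
            # Host-level values override all.vars for this host only
            # (hypothetical user and key path, for illustration)
            ansible_user: ec2-user
            ansible_ssh_private_key_file: /home/admin/.ssh/web_deploy.pem
```

Hosts without their own overrides continue to use the shared settings from all.vars.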
Host Groups
The four host groups under diskover.children map directly to the Diskover architecture. Each group tells the playbook which roles to apply to which machines.
web
Components installed: Diskover Web UI, Diskover Admin, Nginx, PHP 8.4, Kibana, Python 3.x, Diskover MCP (if enabled), plus all Diskover scanners (diskover-scanner-*), plugins (diskover-plugin-*), and file actions (diskover-file-actions-*)
This is the machine that hosts the browser-based interface. Users access Diskover through this host. The web host also receives all available scanner, plugin, and file action packages — these are installed alongside the web stack so that the Diskover Admin UI can manage and configure them.
web:
    hosts:
        10.0.1.10:
            hostname: diskover-web.example.com
rabbitmq
Components installed: RabbitMQ message broker
RabbitMQ handles all task messaging between the web interface and the workers. It routes file action tasks to the correct task worker.
rabbitmq:
    hosts:
        10.0.1.20:
            hostname: diskover-mq.example.com
worker
Components installed: Python 3.x, Diskoverd (scanner daemon), Celery (task queue), plus all Diskover scanners (diskover-scanner-*), plugins (diskover-plugin-*), and file actions (diskover-file-actions-*). NFS and CIFS/SMB utilities are also installed for network filesystem scanning.
Workers do the heavy lifting — scanning filesystems, creating Elasticsearch indices, and executing file actions. All available scanner, plugin, and file action packages are installed on every worker so that each worker can handle any scan or file action task assigned to it. You can have multiple workers for more scanning throughput.
worker:
    hosts:
        10.0.1.30:
            hostname: diskover-worker-1.example.com
        10.0.1.31:
            hostname: diskover-worker-2.example.com
elasticsearch
Components installed: Elasticsearch 8.x
Elasticsearch stores all indexed file metadata and serves search queries. You can run a single node or a multi-node cluster.
elasticsearch:
    hosts:
        10.0.1.40:
            hostname: diskover-es-1.example.com
        10.0.1.41:
            hostname: diskover-es-2.example.com
        10.0.1.42:
            hostname: diskover-es-3.example.com
The hostname Variable
Every host entry requires a hostname variable set to the machine's FQDN (fully qualified domain name) or short hostname.
10.0.1.10:
    hostname: diskover-web.example.com
This value is used by the Elasticsearch role to set the ES node.name and cluster.initial_master_nodes in the Elasticsearch configuration. It does not change the system hostname on the target machine — the OS hostname remains whatever it was before the playbook ran.
Complete Examples
Example 1: Single-Host Deployment
All four component groups running on one machine. Ideal for evaluations, demos, and small environments.
all:
    vars:
        ansible_connection: ssh
        ansible_user: diskover
        ansible_ssh_pass: "changeme"
        become: true
        ansible_become_pass: "changeme"
    children:
        diskover:
            children:
                web:
                    hosts:
                        10.0.1.50:
                            hostname: diskover.example.com
                rabbitmq:
                    hosts:
                        10.0.1.50:
                            hostname: diskover.example.com
                worker:
                    hosts:
                        10.0.1.50:
                            hostname: diskover.example.com
                elasticsearch:
                    hosts:
                        10.0.1.50:
                            hostname: diskover.example.com
Key points:
All four host groups point to the same IP (10.0.1.50)
The hostname value is the same for all groups since it's one machine
All communication between components stays on the one machine, over localhost
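If the playbook is run on the Diskover machine itself, the single-host layout can be combined with the local connection type described under ansible_connection. The variant below is a sketch under that assumption: it drops the SSH user and password because no SSH session is opened, and keeps the sudo password because the roles still need root.

```yaml
all:
    vars:
        # Run tasks directly on this machine instead of over SSH
        ansible_connection: local
        become: true
        ansible_become_pass: "changeme"
    children:
        diskover:
            children:
                web:
                    hosts:
                        10.0.1.50:
                            hostname: diskover.example.com
                rabbitmq:
                    hosts:
                        10.0.1.50:
                            hostname: diskover.example.com
                worker:
                    hosts:
                        10.0.1.50:
                            hostname: diskover.example.com
                elasticsearch:
                    hosts:
                        10.0.1.50:
                            hostname: diskover.example.com
```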
Example 2: Multi-Host Scaled Deployment
A production-grade deployment with separate machines for each component group, plus multiple workers and a 3-node Elasticsearch cluster for horizontal scaling.
all:
    vars:
        ansible_connection: ssh
        ansible_user: diskover
        ansible_ssh_pass: "changeme"
        become: true
        ansible_become_pass: "changeme"
    children:
        diskover:
            children:
                web:
                    hosts:
                        10.0.1.10:
                            hostname: diskover-web.example.com
                rabbitmq:
                    hosts:
                        10.0.1.20:
                            hostname: diskover-mq.example.com
                worker:
                    hosts:
                        10.0.1.30:
                            hostname: diskover-worker-1.example.com
                        10.0.1.31:
                            hostname: diskover-worker-2.example.com
                elasticsearch:
                    hosts:
                        10.0.1.40:
                            hostname: diskover-es-1.example.com
                        10.0.1.41:
                            hostname: diskover-es-2.example.com
                        10.0.1.42:
                            hostname: diskover-es-3.example.com
Key points:
Each component group has its own dedicated machine(s)
Two worker nodes for more scanning throughput
Three Elasticsearch nodes for a production cluster (provides data redundancy and better search performance)
All machines must be able to reach each other on the required ports (see Network Port Requirements in the Getting Started guide)
Example 3: SSH Key Authentication
For cloud deployments (AWS, Azure, GCP) or environments where password-based SSH is disabled.
all:
    vars:
        ansible_connection: ssh
        ansible_user: diskover
        ansible_ssh_private_key_file: /home/admin/.ssh/diskover_deploy.pem
        become: true
        ansible_become_pass: "changeme"
    children:
        diskover:
            children:
                web:
                    hosts:
                        10.0.1.10:
                            hostname: diskover-web.example.com
                rabbitmq:
                    hosts:
                        10.0.1.20:
                            hostname: diskover-mq.example.com
                worker:
                    hosts:
                        10.0.1.30:
                            hostname: diskover-worker.example.com
                elasticsearch:
                    hosts:
                        10.0.1.40:
                            hostname: diskover-es.example.com
Key points:
ansible_ssh_private_key_file replaces ansible_ssh_pass
The key path must be an absolute path on the control machine
ansible_become_pass is still needed for sudo (unless passwordless sudo is configured)
The PEM key must have proper permissions:
chmod 400 /home/admin/.ssh/diskover_deploy.pem
Example 4: Proxy Environment
For environments where target hosts require an HTTP proxy to reach the internet (for package downloads from JFrog Artifactory, pip packages, etc.).
all:
    vars:
        ansible_connection: ssh
        ansible_user: diskover
        ansible_ssh_pass: "changeme"
        become: true
        ansible_become_pass: "changeme"
        proxy_env:
            http_proxy: "http://proxy.example.com:8080"
            https_proxy: "http://proxy.example.com:8080"
            no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16"
    children:
        diskover:
            children:
                web:
                    hosts:
                        10.0.1.10:
                            hostname: diskover-web.example.com
                rabbitmq:
                    hosts:
                        10.0.1.20:
                            hostname: diskover-mq.example.com
                worker:
                    hosts:
                        10.0.1.30:
                            hostname: diskover-worker.example.com
                elasticsearch:
                    hosts:
                        10.0.1.40:
                            hostname: diskover-es.example.com
Key points:
The proxy_env block defines proxy settings that are injected into the environment of every play
no_proxy should include your internal network ranges so that inter-host communication doesn't go through the proxy
You must also use the proxy-aware playbook: docs/example_docs/install_diskover.proxy.yml instead of the standard install_diskover.yml
Running with the proxy playbook:
ansible-playbook -i inventory/hosts.yml docs/example_docs/install_diskover.proxy.yml
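Support for CIDR ranges in no_proxy varies between tools, so some deployments also list each Diskover host explicitly. The fragment below is a sketch of that variation using the addresses and domain from the example above; whether it is needed depends on the tools in your environment, and it is not a requirement stated by the playbook.

```yaml
all:
    vars:
        proxy_env:
            http_proxy: "http://proxy.example.com:8080"
            https_proxy: "http://proxy.example.com:8080"
            # CIDR ranges plus explicit host IPs and a domain suffix, for
            # tools that match no_proxy entries only as literal hosts or suffixes
            no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,10.0.1.10,10.0.1.20,10.0.1.30,10.0.1.40,.example.com"
```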
Testing Your Inventory
Before running the full playbook, validate that Ansible can parse your inventory and reach your hosts.
Validate the Inventory Structure
ansible-inventory -i inventory/hosts.yml --list
This prints the parsed inventory as JSON. Verify that your hosts appear under the correct groups.
Test Connectivity to All Hosts
ansible all -i inventory/hosts.yml -m ping
Expected output for each host:
10.0.1.50 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
Test Connectivity to a Specific Group
ansible web -i inventory/hosts.yml -m ping
ansible elasticsearch -i inventory/hosts.yml -m ping
Test sudo Access
ansible all -i inventory/hosts.yml -m command -a "whoami" --become
Expected output: root for each host (confirming that sudo is working).
Common Mistakes
| Mistake | What Happens | Fix |
|---|---|---|
| Wrong YAML indentation | Ansible can't parse the inventory | Use consistent 4-space indentation. YAML is indentation-sensitive |
| Special characters in passwords without quotes | YAML misinterprets the value | Wrap passwords containing !, $, #, or @ in double quotes |
| IP address typo | Ansible can't reach the host | Verify the IP by SSH-ing manually: ssh <username>@<ip> |
| Missing hostname | The Elasticsearch role will fail when configuring the node name | Every host entry must include a hostname variable |
| Incorrect group nesting | Roles don't get applied to the right hosts | Follow the exact structure: all → children → diskover → children → <group> |
| Using Ansible 2.17+ | Some modules break with deprecation errors | Downgrade to an Ansible release earlier than 2.17 |
Next Steps
With your inventory configured, the next step is to set up your deployment variables. See the Variables Reference guide for a complete walkthrough of all.yml.