Getting Started with Ansible Deployment
This guide walks you through deploying Diskover for the first time using Ansible. By the end, you'll have a fully functional Diskover environment — Elasticsearch storing your index data, the web UI accessible in your browser, and at least one worker ready to scan your filesystems.
We'll cover everything from installing Ansible on your laptop or workstation to verifying that all services are running after the playbook completes.
Before You Begin
What You'll Need
Before starting the deployment, gather the following:
Credentials from Diskover Data:
| Credential | Purpose | How to Obtain |
|---|---|---|
| JFrog Artifactory username & password | Download Diskover packages during installation | Provided by Diskover Data. Contact Diskover Support if you don't have them |
| Diskover license file | Activate the platform after installation | Provided by Diskover Data |
Infrastructure you'll prepare:
| Requirement | Detail |
|---|---|
| One or more target machines | RHEL/Rocky Linux 8 or 9, or Ubuntu/Debian 22 or 24 |
| SSH access | You must be able to SSH from your workstation (the control machine) into every target machine |
| sudo privileges | The SSH user on each target machine must have sudo access |
| Internet access on target machines | Required for online installations (to download packages from JFrog Artifactory). Not required for offline/air-gapped installs |
| A control machine | Your workstation (macOS, Linux, or Windows with WSL), where you'll run the Ansible playbook from |
Note: The control machine is the machine you run Ansible from. It does not need to be one of the target machines where Diskover is installed (though it can be, using ansible_connection: local).
Decide Your Topology
Before configuring anything, decide how you want to distribute the Diskover components:
| Topology | When to Use | What It Looks Like |
|---|---|---|
| Single-host | Evaluations, demos, small environments | All four component groups (web, worker, elasticsearch, rabbitmq) on one machine. All host groups in |
| Multi-host | Production environments, large-scale data | Each component group on its own dedicated machine (or machines; you can have multiple workers and multiple Elasticsearch nodes). |
If you're deploying Diskover for the first time and just want to get it running, start with a single-host deployment. You can always redistribute components to dedicated hosts later by updating the inventory and re-running the playbook.
Network Port Requirements
Ensure the following ports are open between your hosts. For a single-host deployment, all traffic is local so firewall rules are less of a concern (the playbook also disables firewalld on all target hosts).
| Port | Service | Direction | Notes |
|---|---|---|---|
| 22 | SSH | Control machine → all target hosts | Required for Ansible to connect |
| 8000 | Diskover Web UI (Nginx) | Users → web host | HTTP access. Redirects to 443 when SSL is enabled |
| 443 | Diskover Web UI (Nginx) | Users → web host | HTTPS access. Only used when |
| 8000 | Diskover Admin | Users → web host | Admin interface |
| 5601 | Kibana | Users → web host | Dashboard UI (only with Elasticsearch, not OpenSearch) |
| 9200 | Elasticsearch HTTP | Web + worker hosts → ES host(s) | Search and indexing API |
| 9300 | Elasticsearch transport | ES node → ES node | Internal cluster communication. Only needed for multi-node ES clusters |
| 5672 | RabbitMQ (AMQP) | Web + worker hosts → RabbitMQ host | File action routing |
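Before deploying to multiple hosts, it can help to confirm that these ports are actually reachable from the machine that needs them. The following is a minimal sketch, not part of the Diskover tooling: the function names `check_port` and `report_port` are made up for illustration, and it assumes `bash` and the coreutils `timeout` command are available on the machine you run it from. A port will also show as unreachable if nothing is listening on it yet, so this is most useful after the services are installed.

```shell
# check_port HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Uses bash's /dev/tcp pseudo-device; bash is invoked explicitly so the
# function also works when the calling shell is plain sh.
check_port() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# report_port HOST PORT: print a one-line result for the endpoint.
report_port() {
    if check_port "$1" "$2"; then
        echo "$1:$2 open"
    else
        echo "$1:$2 unreachable"
    fi
}

# Example invocations (the addresses are placeholders for your own hosts):
#   report_port 10.0.1.40 9200   # Elasticsearch HTTP, run from a web or worker host
#   report_port 10.0.1.20 5672   # RabbitMQ AMQP, run from a web or worker host
```

Run each check from the side of the connection listed in the Direction column, since a port that is open from your workstation may still be blocked between two target hosts.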
Step 1: Install Ansible on Your Control Machine
Ansible runs on your workstation or a dedicated management server — not on the target machines. Install it using the method appropriate for your operating system.
Linux (RHEL / Rocky / Fedora)
sudo dnf install epel-release
sudo dnf install ansible sshpass
macOS
brew install sshpass
pip3 install ansible-core==2.16.5
Ubuntu / Debian
sudo apt update
sudo apt install ansible sshpass
Verify the Installation
After installing, confirm you have the right version:
ansible --version
You should see version 2.16.x in the output. This matters because Ansible 2.17 and later introduce breaking changes that affect some modules used by the Diskover playbook. Earlier 2.16.x releases work as well.
If you see a newer version, install the specific version:
pip3 install ansible-core==2.16.5
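The version check can also be scripted, which is handy if several people run the playbook from different workstations. This is a sketch under assumptions: the function names are hypothetical, and it assumes the first line of `ansible --version` has the form `ansible [core 2.16.5]`, as printed by recent ansible-core releases.

```shell
# is_supported_ansible VERSION: succeed only for the 2.16 series.
is_supported_ansible() {
    case "$1" in
        2.16|2.16.*) return 0 ;;
        *) return 1 ;;
    esac
}

# installed_ansible_core: print the installed ansible-core version, e.g. "2.16.5".
# Prints nothing if ansible is missing or the first line has another format.
installed_ansible_core() {
    ansible --version 2>/dev/null | sed -n '1s/.*core \([0-9][0-9.]*\).*/\1/p'
}

# Example:
#   is_supported_ansible "$(installed_ansible_core)" \
#       || echo "Install ansible-core 2.16.x before running the playbook"
```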
Step 2: Get the Diskover Ansible Repository
Obtain the diskover-ansible repository from Diskover Data. This contains the playbook, roles, inventory templates, and documentation.
Once you have it, navigate to the repository root:
cd /path/to/diskover-ansible
All commands in this guide (and the other guides in this section) should be run from this directory.
Step 3: Configure Your Inventory (hosts.yml)
The inventory file tells Ansible which machines to connect to and what role each one plays. Edit the file at inventory/hosts.yml.
For a complete deep-dive into every inventory option, see the Inventory Guide. This section covers just what you need for your first deployment.
Single-Host Deployment
If all components will run on one machine, point every host group to the same IP address:
all:
  vars:
    ansible_connection: ssh
    ansible_user: diskover
    ansible_ssh_pass: "your-ssh-password"
    become: true
    ansible_become_pass: "your-sudo-password"
  children:
    diskover:
      children:
        web:
          hosts:
            10.0.1.50:
              hostname: diskover.example.com
        rabbitmq:
          hosts:
            10.0.1.50:
              hostname: diskover.example.com
        worker:
          hosts:
            10.0.1.50:
              hostname: diskover.example.com
        elasticsearch:
          hosts:
            10.0.1.50:
              hostname: diskover.example.com
Multi-Host Deployment
For separate machines, assign different IPs to each host group:
all:
  vars:
    ansible_connection: ssh
    ansible_user: diskover
    ansible_ssh_pass: "your-ssh-password"
    become: true
    ansible_become_pass: "your-sudo-password"
  children:
    diskover:
      children:
        web:
          hosts:
            10.0.1.10:
              hostname: diskover-web.example.com
        rabbitmq:
          hosts:
            10.0.1.20:
              hostname: diskover-mq.example.com
        worker:
          hosts:
            10.0.1.30:
              hostname: diskover-worker.example.com
        elasticsearch:
          hosts:
            10.0.1.40:
              hostname: diskover-es1.example.com
            10.0.1.50:
              hostname: diskover-es2.example.com
            10.0.1.60:
              hostname: diskover-es3.example.com
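YAML indentation mistakes in the inventory are easy to make and hard to spot by eye. One way to check how Ansible actually parsed the file is to print the group tree with `ansible-inventory`, or list the hosts in a single group. The wrapper names below are hypothetical; the underlying commands and flags are standard Ansible CLI options.

```shell
# show_inventory [FILE]: print the group/host tree Ansible derives from the inventory.
show_inventory() {
    ansible-inventory -i "${1:-inventory/hosts.yml}" --graph
}

# list_group_hosts GROUP [FILE]: list the hosts Ansible would target for one group.
list_group_hosts() {
    ansible "$1" -i "${2:-inventory/hosts.yml}" --list-hosts
}

# Example:
#   show_inventory
#   list_group_hosts elasticsearch
```

If the web, rabbitmq, worker, and elasticsearch groups do not all appear under the diskover group, the indentation in hosts.yml is the first thing to recheck.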
What Each Field Means
| Field | Description |
|---|---|
| ansible_connection | Tells Ansible to connect via SSH. Use |
| ansible_user | The SSH username Ansible uses to log into each target machine |
| ansible_ssh_pass | The SSH password. Wrap in quotes if it contains special characters |
| become | Tells Ansible to use sudo for privilege escalation on the target machines |
| ansible_become_pass | The sudo password. Wrap in quotes if it contains special characters |
| hostname | The FQDN or short hostname of each target machine. Used by the Elasticsearch role for the ES node name configuration |
Test Connectivity
Before running the playbook, verify that Ansible can reach all your target hosts:
ansible all -i inventory/hosts.yml -m ping
A successful response looks like:
10.0.1.50 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
If you get connection errors, check SSH access manually (ssh user@10.0.1.50) and review the Troubleshooting guide.
Step 4: Configure Your Variables (all.yml)
The variables file controls what Diskover version to install, your credentials for downloading packages, and how each component is configured. Edit the file at inventory/group_vars/all.yml.
Every variable in all.yml plays a role in the deployment — the file ships with sensible defaults, but you should review and set each one for your environment. The Variables Reference guide explains every variable in detail. Here, we'll walk through them by category.
Important: Do not remove or leave blank any variable that the file includes. Every variable is used by at least one Ansible role during the playbook run.
Diskover Application Settings
These are the variables you'll most likely need to change from the defaults:
# The Diskover version to install; must match a version in JFrog Artifactory
diskover_version: 2.5.0

# JFrog Artifactory credentials (provided by Diskover Data)
jfrog_user: your-jfrog-username
jfrog_pass: your-jfrog-api-key

# Auto-configure Diskover Admin on first install
# Set to true for fresh installs, false for upgrades
config_diskover_admin_api: true
RabbitMQ Settings
# Credentials for the RabbitMQ vhost. Change the password from the default!
rabbitmq_user: "diskover"
rabbitmq_pass: "your-secure-rabbitmq-password"
Important: The default password (darkdata) is for lab use only. Always change this for production deployments.
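If you need a replacement value for rabbitmq_pass, one option is to generate a random string on the control machine. This is only a sketch: the function name `gen_password` is made up, and restricting the output to letters and digits is a choice made here to avoid quoting or URL-encoding surprises, not a documented Diskover requirement.

```shell
# gen_password: print 32 random alphanumeric characters read from /dev/urandom.
# LC_ALL=C keeps tr from rejecting raw random bytes on systems with UTF-8 locales.
gen_password() {
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32
}

# Example: paste the result into rabbitmq_pass in inventory/group_vars/all.yml
#   gen_password; echo
```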
Elasticsearch Settings
Review these and adjust based on your Elasticsearch host's hardware:
| Variable | Default | How to Set It |
|---|---|---|
| | | Set to roughly half of the ES host's RAM in GB (max 31). Example: 16 GB RAM host → |
| | | Leave as |
| | | Leave as default (all interfaces) for multi-host. Use |
| | | Change only if running multiple ES clusters on the same network |
| | | Leave as |
| | | Leave as |
| | | Change if index data should live on a separate volume |
| | | Change if you want logs in a different location |
| | | Set to |
| | | Set to |
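The first row's sizing rule (roughly half of the host's RAM in GB, capped at 31) is simple arithmetic, sketched below. The function name `es_heap_gb` is hypothetical, and the sketch only computes a number; you still enter the result by hand in all.yml under whichever variable the table's first row refers to.

```shell
# es_heap_gb RAM_GB: print half of RAM_GB (integer division), capped at 31.
es_heap_gb() {
    half=$(( $1 / 2 ))
    if [ "$half" -gt 31 ]; then
        half=31
    fi
    echo "$half"
}

es_heap_gb 16     # prints 8
es_heap_gb 128    # prints 31
```

Integer division rounds down, which errs on the side of leaving more memory for the operating system's file cache.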
SSL/TLS Settings
Leave these at their defaults unless you want HTTPS for the web UI:
| Variable | Default | When to Set |
|---|---|---|
| | | Set to |
| | (empty) | Set to your FQDN when SSL is enabled (e.g., |
| | (empty) | Absolute path to your SSL certificate on the control machine |
| | (empty) | Absolute path to your SSL private key on the control machine |
| | | Set to |
Installation Mode & Diskover MCP Connector
| Variable | Default | When to Change |
|---|---|---|
| | | Set to |
| | | Set to the path where offline tarballs are staged (only when |
| | | Set to |
Step 5: Run the Playbook
With your inventory and variables configured, you're ready to deploy. From the diskover-ansible repository root, run:
time ansible-playbook -i inventory/hosts.yml install_diskover.yml
The time prefix is optional; it shows how long the deployment took when it finishes.
For AWS or cloud deployments using a PEM key instead of a password:
time ansible-playbook -i inventory/hosts.yml install_diskover.yml --private-key /path/to/your-key.pem
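Before the real run, you may want a dry pass that changes nothing on the targets. The sketch below wraps two standard `ansible-playbook` options: `--syntax-check` parses the playbook without executing it, and `--list-hosts` prints which hosts each play would target. The wrapper name `preflight` is made up for this example.

```shell
# preflight [INVENTORY] [PLAYBOOK]: parse the playbook, then list the hosts it
# would target. Neither step connects to or modifies the target machines.
preflight() {
    inv="${1:-inventory/hosts.yml}"
    play="${2:-install_diskover.yml}"
    ansible-playbook -i "$inv" "$play" --syntax-check &&
        ansible-playbook -i "$inv" "$play" --list-hosts
}

# Example:
#   preflight
```

If you use a PEM key, note that SSH refuses private keys that are readable by other users, so restricting the file with chmod 600 beforehand avoids a common connection failure.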
What Happens During the Run
The playbook executes roles in this order:
Offline repo setup (if offline_install: true) or JFrog repo setup (if online)
Firewalld & SELinux: Disables firewalld and SELinux on all target hosts
Elasticsearch: Installs and configures Elasticsearch on the elasticsearch hosts
Web stack: Installs Nginx, PHP 8.4, Python3, Diskover Admin, Diskover Web UI, and Kibana on the web hosts
Worker stack: Installs Python3, Diskoverd, and Celery on the worker hosts
RabbitMQ: Installs the message broker on the rabbitmq hosts
Diskover MCP: Installs the MCP server on the web hosts (if deploy_mcp_server: true)
Diskover Admin API: Configures Admin with Elasticsearch and RabbitMQ connection details (if config_diskover_admin_api: true)
The playbook typically takes 5–15 minutes depending on internet speed and the number of hosts.
Reading the Output
As the playbook runs, you'll see tasks scroll by with statuses like ok, changed, and skipping. The important thing to watch for is the PLAY RECAP at the very end.
Step 6: Verify the Deployment
Check the PLAY RECAP
A successful deployment shows failed=0 and unreachable=0 for every host:
PLAY RECAP *********************************************************************
10.0.1.50 : ok=71 changed=31 unreachable=0 failed=0 skipped=5 rescued=0 ignored=0
If any host shows failed=1 or higher, check the error message above the PLAY RECAP and consult the Troubleshooting guide.
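With many hosts, scanning the recap by eye gets tedious. A small filter can flag any line where unreachable or failed is non-zero. This is a sketch: `recap_ok` is a hypothetical helper, `run.log` is a file name chosen for the example, and the pattern could in principle also match similar text in task output, so treat a failure as a prompt to read the recap rather than as a verdict.

```shell
# recap_ok: read playbook output on stdin; succeed only if no line reports a
# non-zero unreachable= or failed= count.
recap_ok() {
    ! grep -Eq '(unreachable|failed)=[1-9]'
}

# Example: keep a copy of the output while it scrolls by, then check it.
#   ansible-playbook -i inventory/hosts.yml install_diskover.yml | tee run.log
#   recap_ok < run.log && echo "deployment clean" || echo "check the PLAY RECAP"
```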
Verify Diskover Admin Configuration
If config_diskover_admin_api: true was set (the default for first-time installs), the playbook has automatically configured Elasticsearch and RabbitMQ connection details in Diskover Admin. Verify by navigating to:
HTTP: http://<web-host-ip>:8000/login.php
HTTPS (if SSL enabled): https://<ssl_domain>/login.php
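If you prefer to probe the login page from a terminal first, `curl` can report just the HTTP status code. The helper name `http_status` is hypothetical and the sketch does not assert which code Diskover returns; a 2xx or 3xx response suggests that Nginx is answering, while 000 means no response was received at all.

```shell
# http_status URL: print the HTTP status code for URL ("000" if no response).
# -s silences progress output, -o discards the body, -m caps the wait at 10 seconds.
http_status() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# Example (substitute your web host's address):
#   http_status "http://10.0.1.50:8000/login.php"; echo
```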
If config_diskover_admin_api was set to false, you'll need to complete the Admin Wizard manually. Logging into Diskover for the very first time automatically lands you in the Admin wizard.
Screenshots of the Elasticsearch & RabbitMQ setup config
Install Your License
A valid Diskover license must be installed before the worker (diskoverd) will start. Navigate to Diskover Admin, go through the Admin wizard and upload your license file.
Start Diskoverd
Once the license is installed and the RabbitMQ connection info is configured, start the scanner daemon on each worker host:
systemctl start diskoverd
Watch the startup logs to confirm it's running:
journalctl -fu diskoverd
You should see output like:
_ _ _
| (_) | |
__| |_ ___| | _______ _____ _ __
/ _` | / __| |/ / _ \ \ / / _ \ '__| /)___(\
| (_| | \__ \ < (_) \ V / __/ | (='.'=)
\__,_|_|___/_|\_\___/ \_/ \___|_| (\")_(\")
diskoverd v2.5.0 task worker daemon
Verify the service status:
systemctl status diskoverd
You should see active (running).
Start Celery
Celery is the task queue that handles file actions (copy, move, delete, tag, etc.) submitted through the Diskover Web UI. It runs on the same worker host(s) as Diskoverd and communicates with RabbitMQ to pick up and execute file action tasks.
Start the Celery service on each worker host:
systemctl start celery
Verify it's running:
systemctl status celery
You should see active (running).
To check the Celery logs for any errors:
tail -f /opt/diskover/diskover_celery/log/celery.log
Note: Diskoverd handles scanning and indexing, while Celery handles file actions. Both services run on the worker host(s) and both need to be running for full Diskover functionality. If you only need scanning (no file actions), Diskoverd alone is sufficient — but for most deployments you'll want both services started.
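Since both services need to be running, a single check that covers them together can save a few keystrokes on each worker. The sketch below uses `systemctl is-active --quiet`, which exits with status 0 only when the unit is active. The function name `worker_status` is made up; the unit names diskoverd and celery are the ones used in this guide.

```shell
# worker_status [UNIT...]: print whether each systemd unit is active.
# Defaults to the two worker services when no unit names are given.
worker_status() {
    [ "$#" -gt 0 ] || set -- diskoverd celery
    for unit in "$@"; do
        if systemctl is-active --quiet "$unit"; then
            echo "$unit: active"
        else
            echo "$unit: not active"
        fi
    done
}

# Example (run on a worker host):
#   worker_status
```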
Access the Web UI
Open your browser and navigate to:
HTTP: http://<web-host-ip>
HTTPS (if SSL enabled): https://<ssl_domain>
Confirm the Worker is Online
From the Diskover Web UI, navigate to the Task Panel and check the Workers tab. Your worker should appear in the list with an online status.
Step 7: Run Your First Scan
With Diskover fully deployed and the worker online, you can now create your first index. From the Diskover Web UI:
Navigate to the Task Panel
Create a new scan task, specifying the filesystem path you want to index
Submit the task and watch it appear in the task queue
Once complete, the indexed data will be searchable from the main Diskover dashboard
Quick Reference: Service Management
After deployment, you may need to start, stop, or restart Diskover services. Here are the key commands:
| Service | Start | Stop | Status | Logs |
|---|---|---|---|---|
| Elasticsearch | | | | |
| Diskoverd | | | | |
| Celery | | | | |
| RabbitMQ | | | | |
| Nginx | | | | |
| PHP-FPM | | | | |
| Diskover Admin | | | | |
| Kibana | | | | |
What's Next
Now that you have a working Diskover deployment, explore the other guides in this section:
Inventory Guide — Learn about advanced inventory configurations like multiple workers, Elasticsearch clusters, SSH key authentication, and proxy support
Variables Reference: Understand every tunable parameter in all.yml
Running Playbooks: Learn how to target specific hosts, perform upgrades, and configure SSL
Troubleshooting — When something goes wrong, this is your go-to reference
Support
If you run into issues that aren't covered in the Troubleshooting guide:
Support portal: https://support.diskoverdata.com
Knowledge base: https://support.diskoverdata.com/hc/en-us
When submitting a ticket, include the ansible.log file from the playbook directory, the PLAY RECAP output, and the target OS version (cat /etc/os-release).
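The OS version lives on the target machines rather than on the control machine, so one way to collect it for every host at once is an Ansible ad-hoc command. This is a sketch: `collect_os_release` is a hypothetical wrapper around the standard `command` module, and it reuses the same inventory and credentials as the playbook.

```shell
# collect_os_release [INVENTORY]: print /etc/os-release from every target host.
collect_os_release() {
    ansible all -i "${1:-inventory/hosts.yml}" -m command -a 'cat /etc/os-release'
}

# Example: save the output next to ansible.log before opening a ticket.
#   collect_os_release > os-release.txt
```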