Getting Started with Ansible Deployment

This guide walks you through deploying Diskover for the first time using Ansible. By the end, you'll have a fully functional Diskover environment — Elasticsearch storing your index data, the web UI accessible in your browser, and at least one worker ready to scan your filesystems.

We'll cover everything from installing Ansible on your laptop or workstation to verifying that all services are running after the playbook completes.

Before You Begin

What You'll Need

Before starting the deployment, gather the following:

Credentials from Diskover Data:

Credential	Purpose	How to Obtain
JFrog Artifactory username & password	Download Diskover packages during installation	Provided by Diskover Data. Contact Diskover Support if you don't have them
Diskover license file	Activate the platform after installation	Provided by Diskover Data

Infrastructure you'll prepare:

Requirement	Detail
One or more target machines	RHEL/Rocky Linux 8 or 9, or Ubuntu/Debian 22 or 24
SSH access	You must be able to SSH from your workstation (the control machine) into every target machine
sudo privileges	The SSH user on each target machine must have sudo access
Internet access on target machines	Required for online installations (to download packages from JFrog Artifactory). Not required for offline/air-gapped installs
A control machine	Your workstation — macOS, Linux, or Windows (WSL) — where you'll run the Ansible playbook from

Note: The control machine is the machine you run Ansible from. It does not need to be one of the target machines where Diskover is installed (though it can be, using ansible_connection: local).

Decide Your Topology

Before configuring anything, decide how you want to distribute the Diskover components:

Topology	When to Use	What It Looks Like
Single-host	Evaluations, demos, small environments	All four component groups (web, worker, elasticsearch, rabbitmq) on one machine. All host groups in `hosts.yml` point to the same IP address.
Multi-host	Production environments, large-scale data	Each component group on its own dedicated machine (or machines — you can have multiple workers and multiple Elasticsearch nodes).

If you're deploying Diskover for the first time and just want to get it running, start with a single-host deployment. You can always redistribute components to dedicated hosts later by updating the inventory and re-running the playbook.

Network Port Requirements

Ensure the following ports are open between your hosts. For a single-host deployment, all traffic is local so firewall rules are less of a concern (the playbook also disables firewalld on all target hosts).

Port	Service	Direction	Notes
22	SSH	Control machine → all target hosts	Required for Ansible to connect
80	Diskover Web UI (Nginx)	Users → web host	HTTP access. Redirects to 443 when SSL is enabled
443	Diskover Web UI (Nginx)	Users → web host	HTTPS access. Only used when `ssl_enabled: true`
8000	Diskover Admin	Users → web host	Admin interface
5601	Kibana	Users → web host	Dashboard UI (only with Elasticsearch, not OpenSearch)
9200	Elasticsearch HTTP	Web + worker hosts → ES host(s)	Search and indexing API
9300	Elasticsearch transport	ES node → ES node	Internal cluster communication. Only needed for multi-node ES clusters
5672	RabbitMQ (AMQP)	Web + worker hosts → RabbitMQ host	File action routing
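
Once services are running (or, for port 22, before the first run), it can help to confirm that a port in the table is reachable from the host that initiates the connection. The following is a minimal sketch rather than part of the Diskover tooling: the function names `check_port` and `report_port` are hypothetical, and it assumes bash and the GNU coreutils `timeout` command are available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the Diskover tooling.

# check_port HOST PORT: succeed if a TCP connection opens within 3 seconds.
check_port() {
    local host="$1" port="$2"
    # /dev/tcp is a bash feature; timeout bounds the wait on filtered ports.
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# report_port HOST PORT: print a one-line result for the given endpoint.
report_port() {
    if check_port "$1" "$2"; then
        echo "$1:$2 open"
    else
        echo "$1:$2 closed or filtered"
    fi
}
```

For example, running `report_port 10.0.1.40 9200` on a worker host checks the Elasticsearch HTTP port, where 10.0.1.40 stands in for your Elasticsearch host.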

Step 1: Install Ansible on Your Control Machine

Ansible runs on your workstation or a dedicated management server — not on the target machines. Install it using the method appropriate for your operating system.

Linux (RHEL / Rocky / Fedora)

sudo dnf install epel-release
sudo dnf install ansible sshpass

macOS

brew install sshpass
pip3 install ansible-core==2.16.5

Ubuntu / Debian

sudo apt update
sudo apt install ansible sshpass

Verify the Installation

After installing, confirm you have the right version:

ansible --version

You should see version 2.16.x in the output. This matters because Ansible 2.17 and later introduce breaking changes in some modules used by the Diskover playbook. Earlier 2.16.x patch releases work as well.

If you see a newer version, install the pinned version with pip:

pip3 install ansible-core==2.16.5
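
The version check can also be scripted. This is a small sketch under the assumption that the first line of `ansible --version` has the form `ansible [core 2.16.5]`; the function names are hypothetical.

```shell
# Hypothetical helpers for checking the installed ansible-core version.

# extract_core_version: read `ansible --version` output on stdin and print
# the version from a first line of the form "ansible [core 2.16.5]".
extract_core_version() {
    sed -n '1s/.*\[core \([0-9][0-9.]*\)].*/\1/p'
}

# is_supported_core VERSION: succeed only for the 2.16 series.
is_supported_core() {
    case "$1" in
        2.16|2.16.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible use: `ver=$(ansible --version | extract_core_version); is_supported_core "$ver" || echo "Unsupported ansible-core: $ver"`.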

Step 2: Get the Diskover Ansible Repository

Obtain the diskover-ansible repository from Diskover Data. This contains the playbook, roles, inventory templates, and documentation.

Once you have it, navigate to the repository root:

cd /path/to/diskover-ansible

All commands in this guide (and the other guides in this section) should be run from this directory.

Step 3: Configure Your Inventory (`hosts.yml`)

The inventory file tells Ansible which machines to connect to and what role each one plays. Edit the file at inventory/hosts.yml.

For a complete deep-dive into every inventory option, see the Inventory Guide. This section covers just what you need for your first deployment.

Single-Host Deployment

If all components will run on one machine, point every host group to the same IP address:

all:
    vars:
        ansible_connection: ssh
        ansible_user: diskover
        ansible_ssh_pass: "your-ssh-password"
        become: true
        ansible_become_pass: "your-sudo-password"
    children:
        diskover:
            children:
                web:
                    hosts:
                        10.0.1.50:
                            hostname: diskover.example.com
                rabbitmq:
                    hosts:
                        10.0.1.50:
                            hostname: diskover.example.com
                worker:
                    hosts:
                        10.0.1.50:
                            hostname: diskover.example.com
                elasticsearch:
                    hosts:
                        10.0.1.50:
                            hostname: diskover.example.com

Multi-Host Deployment

For separate machines, assign different IPs to each host group:

all:
    vars:
        ansible_connection: ssh
        ansible_user: diskover
        ansible_ssh_pass: "your-ssh-password"
        become: true
        ansible_become_pass: "your-sudo-password"
    children:
        diskover:
            children:
                web:
                    hosts:
                        10.0.1.10:
                            hostname: diskover-web.example.com
                rabbitmq:
                    hosts:
                        10.0.1.20:
                            hostname: diskover-mq.example.com
                worker:
                    hosts:
                        10.0.1.30:
                            hostname: diskover-worker.example.com
                elasticsearch:
                    hosts:
                        10.0.1.40:
                            hostname: diskover-es1.example.com
                        10.0.1.50:
                            hostname: diskover-es2.example.com
                        10.0.1.60:
                            hostname: diskover-es3.example.com

What Each Field Means

Field	Description
`ansible_connection: ssh`	Tells Ansible to connect via SSH. Use `local` if running Ansible directly on the target machine
`ansible_user`	The SSH username Ansible uses to log into each target machine
`ansible_ssh_pass`	The SSH password. Wrap in quotes if it contains special characters
`become: true`	Tells Ansible to use sudo for privilege escalation on the target machines
`ansible_become_pass`	The sudo password. Wrap in quotes if it contains special characters
`hostname`	The FQDN or short hostname of each target machine. Used by the Elasticsearch role for the ES node name configuration
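
YAML does not allow tab characters for indentation, and a stray tab is an easy mistake to make when editing a deeply nested inventory. The sketch below is a hypothetical pre-flight check (the function name `has_tab_indent` is not part of the Diskover repository) that flags tab-indented lines before Ansible ever parses the file.

```shell
# Hypothetical pre-flight check, not part of the Diskover repository.

# has_tab_indent FILE: succeed if any line is indented with a tab character.
has_tab_indent() {
    local tab
    tab="$(printf '\t')"
    grep -q "^[[:space:]]*${tab}" "$1"
}
```

For example: `has_tab_indent inventory/hosts.yml && echo "hosts.yml contains tab indentation"`. To see how Ansible itself interprets the groups, `ansible-inventory -i inventory/hosts.yml --graph` prints the parsed group tree.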

Test Connectivity

Before running the playbook, verify that Ansible can reach all your target hosts:

ansible all -i inventory/hosts.yml -m ping

A successful response looks like:

10.0.1.50 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

If you get connection errors, check SSH access manually (ssh user@10.0.1.50) and review the Troubleshooting guide.
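
With many hosts, the ping output gets long. As a sketch, assuming Ansible's usual ad-hoc result lines (`host | SUCCESS => {`, `host | UNREACHABLE! => {`, `host | FAILED! => {`), a hypothetical filter such as `failed_hosts` can reduce the output to the hosts that need attention.

```shell
# failed_hosts: read `ansible -m ping` output on stdin and print each host
# whose result line is marked UNREACHABLE! or FAILED!.
failed_hosts() {
    awk -F' [|] ' '/ [|] (UNREACHABLE|FAILED)!/ { print $1 }'
}
```

Usage: `ansible all -i inventory/hosts.yml -m ping | failed_hosts`. No output means every host answered.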

Step 4: Configure Your Variables (`all.yml`)

The variables file controls what Diskover version to install, your credentials for downloading packages, and how each component is configured. Edit the file at inventory/group_vars/all.yml.

Every variable in all.yml plays a role in the deployment — the file ships with sensible defaults, but you should review and set each one for your environment. The Variables Reference guide explains every variable in detail. Here, we'll walk through them by category.

Important: Do not remove or leave blank any variable that the file includes. Every variable is used by at least one Ansible role during the playbook run.

Diskover Application Settings

These are the variables you'll most likely need to change from the defaults:

# The Diskover version to install — must match a version in JFrog Artifactory
diskover_version: 2.5.0

# JFrog Artifactory credentials (provided by Diskover Data)
jfrog_user: your-jfrog-username
jfrog_pass: your-jfrog-api-key

# Auto-configure Diskover Admin on first install
# Set to true for fresh installs, false for upgrades
config_diskover_admin_api: true

RabbitMQ Settings

# Credentials for the RabbitMQ vhost — change the password from the default!
rabbitmq_user: "diskover"
rabbitmq_pass: "your-secure-rabbitmq-password"

Important: The default password (darkdata) is for lab use only. Always change this for production deployments.
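
One way to produce a replacement password is to generate it on the control machine. This is only a sketch: `gen_password` is a hypothetical helper, and it sticks to alphanumeric characters so the value needs no special quoting in `all.yml`.

```shell
# gen_password [LENGTH]: print a random alphanumeric string (default 24 chars).
gen_password() {
    local length="${1:-24}"
    # LC_ALL=C keeps tr byte-oriented so it accepts raw bytes from /dev/urandom.
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
    echo
}
```

Run `gen_password 32` and paste the result into `rabbitmq_pass`.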

Elasticsearch Settings

Review these and adjust based on your Elasticsearch host's hardware:

Variable	Default	How to Set It
`es_heap_size`	`4`	Set to roughly half of the ES host's RAM in GB (max 31). Example: 16 GB RAM host → `es_heap_size: 8`
`es_memory_lock`	`true`	Leave as `true` for production — prevents swapping
`es_network_host`	`0.0.0.0`	Leave as default (all interfaces) for multi-host. Use `127.0.0.1` only for single-host
`es_cluster_name`	`diskover`	Change only if running multiple ES clusters on the same network
`es_security_enabled`	`true`	Leave as `true` for production
`es_security_enrollment_enabled`	`true`	Leave as `true` for standard deployments
`es_data_dir`	`/var/lib/elasticsearch`	Change if index data should live on a separate volume
`es_log_dir`	`/var/log/elasticsearch`	Change if you want logs in a different location
`es_restart_on_change`	`true`	Set to `false` if you prefer manual restarts during maintenance windows
`es_start_service`	`true`	Set to `false` if you need to perform manual steps before ES starts

SSL/TLS Settings

Leave these at their defaults unless you want HTTPS for the web UI:

Variable	Default	When to Set
`ssl_enabled`	`false`	Set to `true` to enable HTTPS
`ssl_domain`	(empty)	Set to your FQDN when SSL is enabled (e.g., `diskover.example.com`)
`ssl_cert_source`	(empty)	Absolute path to your SSL certificate on the control machine
`ssl_key_source`	(empty)	Absolute path to your SSL private key on the control machine
`ssl_force_reconfigure`	`false`	Set to `true` when changing SSL settings on an existing deployment
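
Because `ssl_cert_source` and `ssl_key_source` must be absolute paths on the control machine, a quick check before the run can catch a relative or mistyped path. This is a sketch with a hypothetical helper name; it only checks the path, not the certificate contents.

```shell
# check_ssl_file PATH: succeed if PATH is absolute and points to a readable file.
check_ssl_file() {
    case "$1" in
        /*) ;;
        *) echo "not an absolute path: $1" >&2; return 1 ;;
    esac
    if [ ! -f "$1" ] || [ ! -r "$1" ]; then
        echo "missing or unreadable: $1" >&2
        return 1
    fi
}
```

Call it once per file, passing the same values you set in `all.yml`.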

Installation Mode & Diskover MCP Connector

Variable	Default	When to Change
`offline_install`	`false`	Set to `true` for air-gapped environments with no internet access
`offline_rpms_location`	`/unix/path/`	Set to the path where offline tarballs are staged (only when `offline_install: true`). The offline tarballs are provided by the Diskover team
`deploy_mcp_server`	`false`	Set to `true` to deploy the Diskover MCP connector

Step 5: Run the Playbook

With your inventory and variables configured, you're ready to deploy. From the diskover-ansible repository root, run:

time ansible-playbook -i inventory/hosts.yml install_diskover.yml

The time prefix is optional — it shows how long the deployment took when it finishes.

For AWS or cloud deployments using a PEM key instead of a password:

time ansible-playbook -i inventory/hosts.yml install_diskover.yml --private-key /path/to/your-key.pem
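
Before a full run, `ansible-playbook --syntax-check` parses the playbook without executing any task, which can catch YAML mistakes early. The wrapper below is a sketch, not something shipped with the repository: the `deploy` function and the `ANSIBLE_PLAYBOOK` override variable are hypothetical.

```shell
# ANSIBLE_PLAYBOOK can be overridden for testing; it defaults to the real binary.
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"

# deploy [INVENTORY] [PLAYBOOK]: syntax-check first, then run the playbook.
deploy() {
    local inventory="${1:-inventory/hosts.yml}"
    local playbook="${2:-install_diskover.yml}"
    # Stop here if the playbook does not parse; nothing has been changed yet.
    "$ANSIBLE_PLAYBOOK" -i "$inventory" "$playbook" --syntax-check || return 1
    "$ANSIBLE_PLAYBOOK" -i "$inventory" "$playbook"
}
```

Calling `deploy` with no arguments uses the inventory and playbook paths from this guide.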

What Happens During the Run

The playbook executes roles in this order:

1. Offline repo setup (if offline_install: true) or JFrog repo setup (if online)
2. Firewalld & SELinux: Disables firewalld and SELinux on all target hosts
3. Elasticsearch: Installs and configures Elasticsearch on the elasticsearch hosts
4. Web stack: Installs Nginx, PHP 8.4, Python3, Diskover Admin, Diskover Web UI, and Kibana on the web hosts
5. Worker stack: Installs Python3, Diskoverd, and Celery on the worker hosts
6. RabbitMQ: Installs the message broker on the rabbitmq hosts
7. Diskover MCP: Installs the MCP server on the web hosts (if deploy_mcp_server: true)
8. Diskover Admin API: Configures Admin with Elasticsearch and RabbitMQ connection details (if config_diskover_admin_api: true)

The playbook typically takes 5–15 minutes depending on internet speed and the number of hosts.

Reading the Output

As the playbook runs, you'll see tasks scroll by with statuses like ok, changed, and skipping. The important thing to watch for is the PLAY RECAP at the very end.

Step 6: Verify the Deployment

Check the PLAY RECAP

A successful deployment shows failed=0 and unreachable=0 for every host:

PLAY RECAP *********************************************************************
10.0.1.50   : ok=71   changed=31   unreachable=0    failed=0    skipped=5    rescued=0    ignored=0

If any host shows failed=1 or higher, check the error message above the PLAY RECAP and consult the Troubleshooting guide.
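
In a multi-host deployment the recap has one line per host, so scanning it by eye gets error-prone. The sketch below assumes the playbook output was captured to a file (for example with `| tee deploy.log`, a hypothetical file name) and prints only the hosts with a non-zero `failed` or `unreachable` count. The function name `recap_problems` is also hypothetical.

```shell
# recap_problems: read playbook output on stdin and print each host whose
# PLAY RECAP line reports unreachable > 0 or failed > 0.
recap_problems() {
    awk '/unreachable=[0-9]+/ && /failed=[0-9]+/ {
        u = 0; f = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^unreachable=/) { split($i, a, "="); u = a[2] + 0 }
            if ($i ~ /^failed=/)      { split($i, a, "="); f = a[2] + 0 }
        }
        if (u > 0 || f > 0) print $1
    }'
}
```

Usage: `recap_problems < deploy.log`. No output means every host finished cleanly.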

Verify Diskover Admin Configuration

If config_diskover_admin_api: true was set (the default for first-time installs), the playbook has automatically configured Elasticsearch and RabbitMQ connection details in Diskover Admin. Verify by navigating to:

HTTP: http://<web-host-ip>:8000/login.php
HTTPS (if SSL enabled): https://<ssl_domain>/login.php

If config_diskover_admin_api was set to false, you'll need to complete the Admin wizard manually. Logging in to Diskover for the very first time automatically takes you to the Admin wizard.

Screenshots of the Elasticsearch & RabbitMQ setup config

Install Your License

A valid Diskover license must be installed before the worker (diskoverd) will start. Navigate to Diskover Admin, go through the Admin wizard, and upload your license file.

Start Diskoverd

Once the license is installed and the RabbitMQ connection details are configured, start the scanner daemon on each worker host:

systemctl start diskoverd

Watch the startup logs to confirm it's running:

journalctl -fu diskoverd

You should see output like:

             _ _     _
            | (_)   | |
          __| |_ ___| | _______   _____ _ __
         / _` | / __| |/ / _ \ \ / / _ \ '__| /)___(\
        | (_| | \__ \   < (_) \ V /  __/ |    (='.'=)
         \__,_|_|___/_|\_\___/ \_/ \___|_|   (\")_(\")

             diskoverd v2.5.0 task worker daemon

Verify the service status:

systemctl status diskoverd

You should see active (running).

Start Celery

Celery is the task queue that handles file actions (copy, move, delete, tag, etc.) submitted through the Diskover Web UI. It runs on the same worker host(s) as Diskoverd and communicates with RabbitMQ to pick up and execute file action tasks.

Start the Celery service on each worker host:

systemctl start celery

Verify it's running:

systemctl status celery

You should see active (running).

To check the Celery logs for any errors:

tail -f /opt/diskover/diskover_celery/log/celery.log

Note: Diskoverd handles scanning and indexing, while Celery handles file actions. Both services run on the worker host(s) and both need to be running for full Diskover functionality. If you only need scanning (no file actions), Diskoverd alone is sufficient — but for most deployments you'll want both services started.
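
To confirm both worker services in one step, `systemctl is-active` can be looped over the unit names. This is a sketch: `inactive_services` and the `SYSTEMCTL` override variable are hypothetical, and the unit names are the ones used in this guide.

```shell
# SYSTEMCTL can be overridden for testing; it defaults to the real systemctl.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# inactive_services NAME...: print each named unit that is not active.
inactive_services() {
    local svc
    for svc in "$@"; do
        "$SYSTEMCTL" is-active --quiet "$svc" || echo "$svc"
    done
}
```

On a worker host, `inactive_services diskoverd celery` prints nothing when both services are running.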

Access the Web UI

Open your browser and navigate to:

HTTP: http://<web-host-ip>
HTTPS (if SSL enabled): https://<ssl_domain>
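
Before opening a browser, the same URLs can be probed from the command line with curl. The helper names below are hypothetical, and treating any 2xx or 3xx response as "reachable" is an assumption made for this sketch.

```shell
# http_status URL: print the HTTP status code returned by URL ("000" if the
# connection fails). -k skips certificate verification, which suits a quick
# check against a self-signed certificate but not production monitoring.
http_status() {
    curl -k -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# is_reachable_status CODE: succeed for 2xx and 3xx responses.
is_reachable_status() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: `is_reachable_status "$(http_status http://10.0.1.50)" && echo "web UI responds"`, where 10.0.1.50 is the web host from the single-host example inventory.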

Confirm the Worker is Online

From the Diskover Web UI, navigate to the Task Panel and check the Workers tab. Your worker should appear in the list with an online status.

Step 7: Run Your First Scan

With Diskover fully deployed and the worker online, you can now create your first index. From the Diskover Web UI:

1. Navigate to the Task Panel
2. Create a new scan task, specifying the filesystem path you want to index
3. Submit the task and watch it appear in the task queue
4. Once complete, the indexed data will be searchable from the main Diskover dashboard

Quick Reference: Service Management

After deployment, you may need to start, stop, or restart Diskover services. Here are the key commands:

Service	Start	Stop	Status	Logs
Elasticsearch	`systemctl start elasticsearch`	`systemctl stop elasticsearch`	`systemctl status elasticsearch`	`/var/log/elasticsearch/`
Diskoverd	`systemctl start diskoverd`	`systemctl stop diskoverd`	`systemctl status diskoverd`	`journalctl -fu diskoverd` `/var/log/diskover/`
Celery	`systemctl start celery`	`systemctl stop celery`	`systemctl status celery`	`/opt/diskover/diskover_celery/log/`
RabbitMQ	`systemctl start rabbitmq-server`	`systemctl stop rabbitmq-server`	`systemctl status rabbitmq-server`	`/var/log/rabbitmq/`
Nginx	`systemctl start nginx`	`systemctl stop nginx`	`systemctl status nginx`	`/var/log/nginx/error.log`
PHP-FPM	`systemctl start php-fpm`	`systemctl stop php-fpm`	`systemctl status php-fpm`	`/var/opt/remi/php84/log/php-fpm/`
Diskover Admin	`systemctl start diskover-admin`	`systemctl stop diskover-admin`	`systemctl status diskover-admin`	`journalctl -fu diskover-admin` `/var/log/diskover/`
Kibana	`systemctl start kibana`	`systemctl stop kibana`	`systemctl status kibana`	`/var/log/kibana/`

What's Next

Now that you have a working Diskover deployment, explore the other guides in this section:

Inventory Guide — Learn about advanced inventory configurations like multiple workers, Elasticsearch clusters, SSH key authentication, and proxy support
Variables Reference — Understand every tunable parameter in all.yml
Running Playbooks — Learn how to target specific hosts, perform upgrades, and configure SSL
Troubleshooting — When something goes wrong, this is your go-to reference

Support

If you run into issues that aren't covered in the Troubleshooting guide:

Support portal: https://support.diskoverdata.com
Knowledge base: https://support.diskoverdata.com/hc/en-us

When submitting a ticket, include the ansible.log file from the playbook directory, the PLAY RECAP output, and the target OS version (cat /etc/os-release).
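
If you captured the playbook output to a file, the PLAY RECAP section can be pulled out for the ticket with a one-line sed filter. This is a sketch: `extract_recap` and the `deploy.log` file name are hypothetical.

```shell
# extract_recap FILE: print everything from the first "PLAY RECAP" line onward.
extract_recap() {
    sed -n '/^PLAY RECAP/,$p' "$1"
}
```

Usage: `extract_recap deploy.log > play_recap.txt`.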
