Offline / Air-Gapped Installation
This guide covers deploying Diskover using Ansible in environments with no internet access. Instead of downloading packages from JFrog Artifactory, the playbook installs from pre-built RPM tarballs that you stage on your Ansible control machine ahead of time.
Use this guide when your target hosts cannot reach the internet — common in government, defense, financial, and other security-sensitive environments.
For standard online installations, see the Getting Started guide.
Supported Platforms
Offline installation is currently supported on:
| Platform | Versions |
|---|---|
| RHEL | 8, 9 |
| Rocky Linux | 8, 9 |
Offline artifacts are RPM-based. Debian/Ubuntu offline installation is not currently supported.
Prerequisites
Control Machine (where you run Ansible)
| Requirement | Detail |
|---|---|
| OS | macOS, Linux, or Windows (WSL) |
| Ansible | Version 2.16.x |
| Python | 3.11 or later |
| sshpass | Required when using password-based SSH authentication |
| SSH access | Must be able to SSH into all target hosts |
The control machine needs internet access only for installing Ansible itself. If the control machine is also air-gapped, Ansible and its dependencies must be pre-installed through your organization's internal package management.
Target Machines (where Diskover is installed)
| Requirement | Detail |
|---|---|
| OS | RHEL or Rocky Linux 8 or 9 (see Supported Platforms) |
| SSH | SSH server running and accessible from the control machine |
| sudo | The SSH user must have sudo privileges |
| Internet access | Not required (that is the purpose of this guide) |
Offline Artifacts
The offline artifacts are pre-built RPM tarballs hosted by Diskover Data in JFrog Artifactory. Contact Diskover Support to request the artifacts for your target OS version and Diskover version.
You will receive the following tarballs:
| Tarball | Contents | Used By |
|---|---|---|
| elasticsearch.tgz | Elasticsearch RPMs + Java dependencies | Elasticsearch hosts |
| core-diskover.tgz | Diskover Web, Diskover Admin, Nginx, PHP 8.4, Kibana RPMs | Web hosts |
| python.tgz | Python3.x RPMs | Web + worker hosts |
| rabbitmq.tgz | RabbitMQ + Erlang RPMs | RabbitMQ hosts |
| nodejs.tgz | Node.js RPMs | Web hosts (for MCP Connector, if enabled) |
| admin-pips.tgz | Python wheel files for Diskover Admin | Web hosts |
| worker-pips.tgz | Python wheel files for Diskoverd | Worker hosts |
| npm-cache.tgz | Pre-cached npm dependencies for Diskover MCP Connector | Web hosts (if MCP enabled) |
Each RPM tarball contains the service RPMs, all dependency packages, and a repodata/ directory that allows it to function as a local yum repository. The pip and npm tarballs contain wheel files and cached npm packages rather than RPMs.
Important: Offline artifacts are built per OS version. RHEL 8 artifacts cannot be used on RHEL 9 hosts (and vice versa). Make sure you request the correct artifacts for your target OS.
Step-by-Step Installation
Step 1: Install Ansible on Your Control Machine
If your control machine has internet access, install Ansible normally:
Linux (RHEL/Rocky/Fedora):
sudo dnf install ansible sshpass
macOS:
brew install sshpass
pip3 install ansible-core==2.16.5
Verify the installation:
ansible --version
Confirm the version is 2.16.x (recommended). Older versions of Ansible will also work, but versions 2.17+ have breaking changes with some modules and should be avoided.
Step 2: Stage the Offline Artifacts
Place all of the offline tarballs in a single directory on your Ansible control machine. The playbook will automatically copy the relevant tarballs to each target host during execution — you do not need to manually transfer them to every machine.
For example, place them at /home/admin/diskover-offline/:
/home/admin/diskover-offline/
├── core-diskover.tgz
├── elasticsearch.tgz
├── python.tgz
├── rabbitmq.tgz
├── nodejs.tgz
├── admin-pips.tgz
├── worker-pips.tgz
└── npm-cache.tgz
This directory path is what you'll set as offline_rpms_location in all.yml.
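Before running the playbook, it can help to confirm that nothing is missing from the staging directory. The following is a minimal sketch and is not part of the Diskover playbook: check_staged_tarballs is a hypothetical helper, and the filenames it looks for are taken from the example listing above, so adjust them if the tarballs you received are named differently.

```shell
# Hypothetical helper (not shipped with Diskover): report which of the
# expected tarballs are missing from a staging directory.
# The names below are assumed from the example listing in this guide.
check_staged_tarballs() {
    local dir="$1" missing=0 name
    for name in core-diskover elasticsearch python rabbitmq \
                nodejs admin-pips worker-pips npm-cache; do
        # ${dir%/} drops a trailing slash so both "dir" and "dir/" work
        if [ ! -f "${dir%/}/${name}.tgz" ]; then
            echo "missing: ${name}.tgz"
            missing=1
        fi
    done
    # Non-zero exit status when at least one tarball is absent
    return "$missing"
}
```

For example, `check_staged_tarballs /home/admin/diskover-offline/` prints one line per missing file and returns a non-zero status if any are absent. If MCP is not enabled in your deployment, the nodejs and npm-cache entries may not apply.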
Tip for large environments: The tarballs can be several hundred MB each. If you have many target hosts and a slow network between the control machine and targets, it may be faster to stage the tarballs directly on each target host (at the same path) before running Ansible. This avoids the control machine having to copy them to every host.
Step 3: Configure Your Inventory
Edit inventory/hosts.yml exactly as you would for an online installation. The inventory format is the same — see the Inventory Guide for details.
Example single-host deployment:
all:
  vars:
    ansible_connection: ssh
    ansible_user: diskover
    ansible_ssh_pass: "your-ssh-password"
    become: true
    ansible_become_pass: "your-sudo-password"
  children:
    diskover:
      children:
        web:
          hosts:
            10.0.1.50:
              hostname: diskover.example.com
        rabbitmq:
          hosts:
            10.0.1.50:
              hostname: diskover.example.com
        worker:
          hosts:
            10.0.1.50:
              hostname: diskover.example.com
        elasticsearch:
          hosts:
            10.0.1.50:
              hostname: diskover.example.com
Step 4: Configure Your Variables
Edit inventory/group_vars/all.yml with the offline-specific settings:
# ═══════════════════════════════════════════════════════════════
# Enable offline mode
# ═══════════════════════════════════════════════════════════════
offline_install: true

# Path on the control machine where tarballs are staged
offline_rpms_location: /home/admin/diskover-offline/

# ═══════════════════════════════════════════════════════════════
# Diskover version: must match the artifact version
# ═══════════════════════════════════════════════════════════════
diskover_version: 2.5.0

# JFrog credentials are NOT needed for offline installs
# You can leave these blank or as-is
jfrog_user: ""
jfrog_pass: ""

# ═══════════════════════════════════════════════════════════════
# First install: auto-configure Diskover Admin
# ═══════════════════════════════════════════════════════════════
config_diskover_admin_api: true

# ═══════════════════════════════════════════════════════════════
# RabbitMQ: change from default for production!
# ═══════════════════════════════════════════════════════════════
rabbitmq_user: "diskover"
rabbitmq_pass: "your-rabbitmq-password"

# ═══════════════════════════════════════════════════════════════
# Elasticsearch: adjust heap size for your host's RAM
# ═══════════════════════════════════════════════════════════════
es_heap_size: 4
See the Variables Reference guide for a complete explanation of every variable.
Step 5: Run the Playbook
time ansible-playbook -i inventory/hosts.yml install_diskover.yml
The playbook detects offline_install: true and automatically:
Copies the relevant tarballs from offline_rpms_location to each target host
Extracts them and creates local yum repositories
Installs packages from the local repos instead of JFrog Artifactory
Installs Python pip packages from the local wheel directories
Cleans up temporary files after each role completes
Step 6: Verify the Deployment
1. Check the PLAY RECAP:
PLAY RECAP *********************************************************************
10.0.1.50 : ok=71 changed=31 unreachable=0 failed=0 skipped=5 rescued=0 ignored=0
Verify failed=0 for every host.
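With many hosts, reading the recap by eye is error-prone. The sketch below assumes you saved the playbook's plain-text output to a file (for example by piping the run through tee; the file name is up to you) and that the recap lines look like the sample above. recap_problems is a hypothetical helper, not a Diskover or Ansible tool.

```shell
# Hypothetical helper: list hosts whose PLAY RECAP line reports a non-zero
# failed= or unreachable= count. Assumes uncolored output in the format
# shown in the sample recap above.
recap_problems() {
    awk '
        /^PLAY RECAP/ { in_recap = 1; next }
        in_recap && /failed=/ {
            failed = 0; unreachable = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^failed=/)      { split($i, kv, "="); failed = kv[2] }
                if ($i ~ /^unreachable=/) { split($i, kv, "="); unreachable = kv[2] }
            }
            # $1 is the host name or address at the start of the recap line
            if (failed + 0 > 0 || unreachable + 0 > 0) { print $1; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Calling `recap_problems playbook-output.txt` prints nothing and returns 0 when every host is clean; otherwise it prints the affected hosts and returns 1.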
2. Verify Diskover Admin configuration:
Navigate to http://<web-host-ip>:8000/diskover_admin/config/ (or https://<ssl_domain>/diskover_admin/config/ if SSL is enabled) and confirm Elasticsearch and RabbitMQ connection details are configured.
3. Install your license:
The first time you log into Diskover Admin, you will automatically be taken to the Admin Wizard. The wizard will guide you through entering your license key along with other initial configuration settings. Follow the wizard steps to complete your license activation.
4. Start Diskoverd:
Once the license is installed and RabbitMQ connection info configured, start the scanner daemon on each worker host:
systemctl start diskoverd
Watch the startup logs to confirm it's running:
journalctl -fu diskoverd
Verify it shows active (running):
systemctl status diskoverd
5. Start Celery:
Celery is the task queue that handles file actions (copy, move, delete, tag, etc.) submitted through the Diskover Web UI. It runs on the same worker host(s) as Diskoverd and communicates with RabbitMQ to pick up and execute file action tasks.
Start the Celery service on each worker host:
systemctl start celery
Verify it's running:
systemctl status celery
To check the Celery logs for any errors:
tail -f /opt/diskover/diskover_celery/log/celery.log
Note: Diskoverd handles scanning and indexing, while Celery handles file actions. Both services run on the worker host(s) and both need to be running for full Diskover functionality. If you only need scanning (no file actions), Diskoverd alone is sufficient — but for most deployments you'll want both services started.
6. Access the web UI:
Navigate to http://<web-host-ip> (or https://<ssl_domain> if SSL is enabled) and verify the login page loads.
How Offline Installation Works
Understanding the mechanics helps with troubleshooting. Here's what happens under the hood.
1. Tarball Transfer and Extraction
The offline_install role copies each tarball from the control machine to /tmp/rpms/ on the target host and extracts it:
/tmp/rpms/
├── elasticsearch/   ← extracted from elasticsearch.tgz
│   ├── *.rpm
│   └── repodata/
├── diskover/        ← extracted from core-diskover.tgz
│   ├── *.rpm
│   └── repodata/
├── python/          ← extracted from python.tgz
│   ├── *.rpm
│   └── repodata/
├── rabbitmq/        ← extracted from rabbitmq.tgz
│   ├── *.rpm
│   └── repodata/
├── nodejs/          ← extracted from nodejs.tgz
│   ├── *.rpm
│   └── repodata/
├── admin-pips/      ← extracted from admin-pips.tgz
├── worker-pips/     ← extracted from worker-pips.tgz
└── npm-cache/       ← extracted from npm-cache.tgz
2. Local Yum Repository Creation
For each extracted RPM directory, Ansible creates a local yum repository file in /etc/yum.repos.d/:
# Example: /etc/yum.repos.d/offline-elasticsearch.repo
[offline-elasticsearch]
baseurl = file:///tmp/rpms/elasticsearch/
enabled = 1
gpgcheck = 0
name = Local Elasticsearch repository
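If you ever need to recreate one of these repo files by hand while troubleshooting, the pattern is easy to reproduce. The sketch below is an illustration only: offline_repo_stanza is a hypothetical helper, and it assumes the other repositories follow the same offline-<name> naming as the Elasticsearch example, which this guide does not confirm.

```shell
# Hypothetical helper: print a repo stanza in the same shape as the
# Elasticsearch example above. The offline-<name> naming for directories
# other than elasticsearch is an assumption.
offline_repo_stanza() {
    local name="$1"
    cat <<EOF
[offline-${name}]
baseurl = file:///tmp/rpms/${name}/
enabled = 1
gpgcheck = 0
name = Local ${name} repository
EOF
}
```

For instance, `offline_repo_stanza rabbitmq` prints a stanza pointing at /tmp/rpms/rabbitmq/; redirect the output to a file under /etc/yum.repos.d/ (with root privileges) only if you have confirmed the extracted directory actually exists.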
3. Package Installation
Each Ansible role detects that offline_install: true is set and installs packages from the local repository instead of the online one. Here's how they differ:
Online install:
ansible.builtin.dnf:
  name: elasticsearch
  enablerepo: elasticsearch
In an online install, dnf pulls the Elasticsearch package from the remote JFrog Artifactory repository (configured as the elasticsearch repo). The target host reaches out to JFrog over the internet, downloads the RPM, and installs it. All other system repositories (like baseos, appstream, etc.) remain active as normal.
Offline install:
ansible.builtin.dnf:
  name: elasticsearch
  enablerepo: offline-elasticsearch
  disablerepo: "*"
In an offline install, two key things change. First, disablerepo: "*" tells dnf to ignore every configured repository for this operation, which prevents dnf from trying to reach any remote servers (an attempt that would fail in an air-gapped environment and cause the playbook to hang or error out). Second, enablerepo: offline-elasticsearch re-enables only the local repository that was created in the previous step (the one pointing to file:///tmp/rpms/elasticsearch/). As a result, dnf installs the package entirely from the local RPM files on disk, with no internet access required.
Python pip packages follow the same offline pattern, using the --no-index and --find-links flags:
pip install --no-index --find-links=/tmp/rpms/admin-pips/ <package>
The --no-index flag tells pip not to contact PyPI (the online Python package index), and --find-links points it to the local directory of pre-downloaded wheel files instead.
4. Cleanup
After each role completes its installation, the extracted files under /tmp/rpms/ and the temporary repo files are removed to free disk space.
Upgrading an Offline Deployment
To upgrade Diskover in an offline environment:
Obtain new artifacts: Request the new version's offline tarballs from Diskover Data
Stage the new tarballs: Place them on the control machine at offline_rpms_location (replacing the previous version's tarballs)
Update all.yml:
diskover_version: 2.5.1 # New version
config_diskover_admin_api: false # Preserve existing config!
Run the playbook:
time ansible-playbook -i inventory/hosts.yml install_diskover.yml
Troubleshooting Offline Installations
"No package matching '...' found available"
The local yum repository may not be set up correctly, or the tarball wasn't extracted properly.
Diagnosis:
# Check if the tarball contains repodata
tar tzf elasticsearch.tgz | grep repodata

# Check if the local repo file exists on the target
cat /etc/yum.repos.d/offline-*.repo

# Check if extracted files exist on the target
ls /tmp/rpms/elasticsearch/
Fix:
Verify the tarballs exist at offline_rpms_location on the control machine
Ensure the tarballs are not corrupt (re-download if necessary)
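Both checks can be combined into one pass over an RPM tarball. This is a rough sketch under stated assumptions: check_rpm_tarball is a hypothetical helper, it treats a tarball as readable if tar can list it, and it looks for repodata/repomd.xml, the standard index file inside a yum repodata directory. It is meant for the RPM tarballs only, since the pip and npm tarballs carry no repodata.

```shell
# Hypothetical helper: confirm an RPM tarball can be listed and that it
# carries yum repository metadata (repodata/repomd.xml).
check_rpm_tarball() {
    local tarball="$1" listing
    # tar -tzf exits non-zero when the archive is truncated or not gzip data
    if ! listing=$(tar -tzf "$tarball" 2>/dev/null); then
        echo "corrupt or unreadable: $tarball"
        return 1
    fi
    case "$listing" in
        *repodata/repomd.xml*)
            echo "ok: $tarball"
            ;;
        *)
            echo "no repodata/repomd.xml in: $tarball"
            return 2
            ;;
    esac
}
```

Run it against each RPM tarball in the staging directory, for example `check_rpm_tarball /home/admin/diskover-offline/elasticsearch.tgz`. A return status of 1 suggests a damaged download, while 2 suggests the archive is intact but was not built as a repository.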
Tarball Copy Fails or Times Out
The tarballs can be large (several hundred MB). If the Ansible copy module times out transferring them:
Fix:
Stage the tarballs manually on each target host before running Ansible. Copy them to any path and set offline_rpms_location to that path in all.yml
Increase the connection timeout in ansible.cfg:
timeout = 120
Wrong OS Version
Offline artifacts are built per OS version. Using RHEL 8 artifacts on a RHEL 9 host (or vice versa) will cause package dependency failures.
Fix:
Verify the target OS: cat /etc/os-release on the target host
Request the correct artifacts from Diskover Support
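Since artifacts are matched on the major OS version, it may be convenient to extract just that number rather than reading the whole file. The sketch below relies on /etc/os-release being a shell-sourceable file with a VERSION_ID field, which is how the format is defined; os_major_version is a hypothetical helper name.

```shell
# Hypothetical helper: print the major OS version from an os-release file.
# Defaults to /etc/os-release; pass another path to inspect a copied file.
os_major_version() {
    # Source the file in a subshell so its variables do not leak into the
    # caller, then strip everything after the first dot (9.4 -> 9).
    ( . "${1:-/etc/os-release}" && printf '%s\n' "${VERSION_ID%%.*}" )
}
```

Running `os_major_version` on a target host prints a single number that you can compare with the OS version your artifacts were built for.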
Pip Install Fails with "No matching distribution found"
The Python wheel files in admin-pips.tgz or worker-pips.tgz don't match the target host's Python version or architecture.
Fix:
Check Python version on the target: python3 --version
Ensure the pip tarballs were built for the same Python version and OS as the target
Contact Diskover Support for compatible artifacts
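To see which Python versions the bundled wheels target, you can read the tags out of the wheel filenames. This sketch assumes you have extracted admin-pips.tgz or worker-pips.tgz to a local directory and that the files follow the standard wheel naming scheme, in which the third field from the end is the Python tag (for example cp311 or py3). wheel_python_tags is a hypothetical helper.

```shell
# Hypothetical helper: list the distinct Python tags found in the wheel
# filenames of a directory. Standard wheel names end in
# <python tag>-<abi tag>-<platform tag>.whl
wheel_python_tags() {
    local dir="$1" f base
    for f in "${dir%/}"/*.whl; do
        [ -e "$f" ] || continue   # directory contains no wheels
        base=${f##*/}             # drop the directory part
        base=${base%.whl}         # drop the extension
        printf '%s\n' "$base" | awk -F- '{ print $(NF-2) }'
    done | sort -u
}
```

A tag such as cp311 means the wheel was built for CPython 3.11, so it should match the output of python3 --version on the target; a py3 tag indicates a pure-Python wheel that is not tied to a specific minor version.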
Support
If you encounter issues with offline deployments:
Support portal: https://support.diskoverdata.com
Knowledge base: https://support.diskoverdata.com/hc/en-us
When submitting a ticket, include:
The ansible.log file
PLAY RECAP output
Target OS version: cat /etc/os-release
The artifact filenames and sizes
Any error messages from /tmp/rpms/ or /etc/yum.repos.d/