Manual Ubuntu/Debian Worker Deployment
Component: Diskover Worker (Diskoverd + Celery)
Deployment Method: Manual (APT packages)
Platform: Ubuntu 22.04/24.04 or Debian 11/12
Overview
This guide walks through manually deploying a Diskover worker node on Ubuntu or Debian. A worker runs two services — diskoverd (the scanner daemon) and Celery (the task queue for file actions) — and connects back to your existing Diskover web and Elasticsearch infrastructure.
Use this guide when:
- Ansible automation is not available in your environment
- You are adding a new worker to an existing Diskover deployment
- You need to understand the individual steps that Ansible automates
By the end of this guide, your worker will appear in the Diskover Task Panel and be ready to run scans.
Prerequisites
Before starting, confirm the following are in place:
Existing Diskover Infrastructure:
Component | Requirement |
|---|---|
Diskover Web + Admin | Running and accessible (HTTP or HTTPS) |
Elasticsearch | Running and accessible on port 9200 |
RabbitMQ | Running and accessible on port 5672 |
Worker Host Requirements:
Requirement | Detail |
|---|---|
OS | Ubuntu 22.04/24.04 or Debian 11/12 |
Python | 3.12+ (or 3.11 if system Python is older) |
CPU | 2+ cores recommended (more cores = faster scanning) |
Memory | 4 GB minimum, 8 GB+ recommended |
Disk | Sufficient for OS + Diskover packages (~2 GB) |
Network | Can reach the Web host, Elasticsearch, and RabbitMQ |
Access | Root or sudo privileges |
Credentials:
Credential | Purpose | How to Obtain |
|---|---|---|
JFrog Artifactory username | Download Diskover packages | Provided by Diskover Data |
JFrog Artifactory token | Authenticate to package repository | Provided by Diskover Data |
Diskover license | Activate the platform | Must already be installed on Diskover Admin |
Network Connectivity:
Destination | Port | Purpose |
|---|---|---|
Diskover Web host | 8000 (HTTP) or 443 (HTTPS) | API communication |
Elasticsearch host(s) | 9200 | Search and indexing |
RabbitMQ host | 5672 | Celery task messaging |
artifactory.diskoverdata.com | 443 | Package downloads (online installs only) |
Pre-Deployment Checklist
- Worker host OS is installed and up to date (apt update && apt upgrade)
- Root or sudo access confirmed on the worker host
- Network connectivity verified to Web host, Elasticsearch, and RabbitMQ
- JFrog credentials obtained from Diskover Data
- Diskover license already installed on Diskover Admin
- SSL certificate available (if Diskover Web uses HTTPS)
- Elasticsearch CA certificate available (if Elasticsearch security is enabled)
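The network connectivity item in the checklist can be spot-checked from the worker host before you install anything. The following is a minimal sketch and not part of the Diskover tooling: check_port and report_port are hypothetical helper names, and the sketch assumes bash and the coreutils timeout command are available on the worker.

```shell
# Hypothetical helper: returns 0 if a TCP connection to host ($1) on
# port ($2) succeeds within 3 seconds, non-zero otherwise.
# Uses bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
check_port() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Hypothetical helper: print a one-line result for a host/port pair.
report_port() {
    if check_port "$1" "$2"; then
        echo "OK      $1:$2"
    else
        echo "FAILED  $1:$2"
    fi
}
```

For example, run report_port your-web-host 443, report_port your-es-host 9200, and report_port your-rabbitmq-host 5672, substituting your real hostnames. A successful connection only shows that the port is reachable; it does not validate credentials or certificates.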
Step 1: Configure the APT Repository
Set up the Diskover package repository so you can install packages via apt.
Create the repository source file:
cat <<'EOF' > /etc/apt/sources.list.d/diskover.list
deb [trusted=yes] https://artifactory.diskoverdata.com/artifactory/diskover-debian-prod stable main
EOF
Create the authentication file with your JFrog credentials:
mkdir -p /etc/apt/auth.conf.d
cat <<EOF > /etc/apt/auth.conf.d/diskover.conf
machine artifactory.diskoverdata.com
login your-jfrog-username
password your-jfrog-token
EOF
chmod 600 /etc/apt/auth.conf.d/diskover.conf
Note: The auth file is set to mode 600 so only root can read your credentials.
Update the package cache and verify the Diskover packages are visible:
apt update
apt-cache pkgnames | sort | grep diskover
You should see a list of available Diskover packages including diskoverd, diskover-scanner-*, diskover-plugin-*, and diskover-file-actions-*.
Step 2: Set Up Python Virtual Environment
Diskover runs inside a Python virtual environment to isolate its dependencies from the system Python.
Determine your system Python version:
python3 -V
If your system Python is 3.12 or newer, use it directly. If it's older than 3.12, install Python 3.11:
apt install python3.11 python3.11-venv python3.11-dev
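The version decision above can be written as a small shell function, which is convenient if you repeat this step on several hosts. This is only a sketch: pick_python is a hypothetical helper name, and it encodes nothing beyond the 3.12 threshold stated in this step.

```shell
# Hypothetical helper: given a version string such as "3.10.12", print the
# interpreter this guide would use: "python3" for 3.12 or newer,
# otherwise "python3.11".
pick_python() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text between the first and second dots
    if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 12 ]; }; then
        echo python3
    else
        echo python3.11
    fi
}
```

To apply it to the running system, pass in the second field of the python3 -V output, for example: pick_python "$(python3 -V | awk '{print $2}')".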
Install the required system packages:
apt install python3-venv gcc python3-dev
Note: If you installed Python 3.11 separately, replace python3-dev with python3.11-dev.
Create the virtual environment:
python3 -m venv /opt/python-venv-diskover
If using Python 3.11 specifically:
python3.11 -m venv /opt/python-venv-diskover
Upgrade pip, setuptools, and wheel inside the venv:
/opt/python-venv-diskover/bin/pip install --upgrade pip setuptools wheel
Add the virtual environment to the system PATH so commands like celery are available:
cat <<'EOF' > /etc/profile.d/diskover-venv.sh
# Add Diskover venv to PATH for CLI usage
export PATH="/opt/python-venv-diskover/bin:$PATH"
EOF
Source it for your current session:
source /etc/profile.d/diskover-venv.sh
Step 3: Install Diskover Packages
Install diskoverd (the worker daemon) along with all scanners, plugins, and file actions:
apt install diskoverd=<version> \
  'diskover-scanner-*' \
  'diskover-plugin-*' \
  'diskover-file-actions-*' \
  diskover-ingester-parquet
Replace <version> with your target Diskover version (e.g., 2.5.0). The scanner, plugin, and file-action packages are installed by wildcard so that every available package is pulled in. The patterns are quoted so that the shell passes them to apt unexpanded.
Install NFS and CIFS utilities so the worker can access network file systems:
apt install nfs-common nfs-kernel-server cifs-utils smbclient
Enable and start the NFS server:
systemctl enable nfs-server
systemctl start nfs-server
Install the Python dependencies into the virtual environment:
/opt/python-venv-diskover/bin/pip install -r /opt/diskover/requirements.txt
Step 4: Configure the Worker (config.yaml)
Create the diskoverd configuration directory and file. This tells the worker how to reach the Diskover API.
mkdir -p /root/.config/diskoverd
Create the config file:
If Diskover Web uses HTTPS (SSL enabled):
cat <<EOF > /root/.config/diskoverd/config.yaml
appName: diskoverd
apiurl: https://your-diskover-domain/api.php
apiuser:
apipass:
EOF
If Diskover Web uses HTTP (no SSL):
cat <<EOF > /root/.config/diskoverd/config.yaml
appName: diskoverd
apiurl: http://your-web-host-ip:8000/api.php
apiuser:
apipass:
EOF
Replace your-diskover-domain or your-web-host-ip with the actual hostname or IP of your Diskover Web host.
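A typo in apiurl is easy to make and only shows up later, when the worker fails to register. As a rough sanity check, the sketch below reads apiurl back out of the file and confirms that it has the shape used in this step. check_apiurl is a hypothetical helper name, and the check covers only the URL's form (scheme and the /api.php suffix), not whether the host is reachable.

```shell
# Hypothetical helper: print the apiurl value from a diskoverd config.yaml
# ($1) and fail unless it starts with http:// or https:// and ends in
# /api.php, as in the examples in this step.
check_apiurl() {
    url=$(sed -n 's/^apiurl:[[:space:]]*//p' "$1" | head -n 1)
    case "$url" in
        http://*/api.php|https://*/api.php)
            echo "$url"
            ;;
        *)
            echo "unexpected apiurl: '$url'" >&2
            return 1
            ;;
    esac
}
```

On the worker this would be run as check_apiurl /root/.config/diskoverd/config.yaml.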
Step 5: Configure the Encryption Key
Diskoverd needs the same encryption key used by Diskover Admin. This key is stored in the diskover-admin systemd service file on the web host.
On the web host, retrieve the key:
grep -oP 'DISKOVER_ENCRYPT_SECRET_KEY=\K.*' /usr/lib/systemd/system/diskover-admin.service
Copy the output value. On the worker host, add the key to the diskoverd service file. Open the file:
vi /etc/systemd/system/diskoverd.service
Add the following line in the [Service] section, below the User=root line:
Environment="DISKOVER_ENCRYPT_SECRET_KEY=<key-value-from-web-host>"
Reload systemd to pick up the change:
systemctl daemon-reload
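One detail to watch when copying the key: if the Environment= line on the web host wraps the assignment in double quotes, the grep command in this step prints the closing quote as part of the value. The sketch below strips it. extract_encrypt_key is a hypothetical helper name, and the assumption that the web host's unit file uses an Environment= line like the one shown for the worker is not confirmed by this guide.

```shell
# Hypothetical helper: print the DISKOVER_ENCRYPT_SECRET_KEY value from a
# systemd unit file ($1), without surrounding double quotes.
# Assumes a line of the form
#   Environment="DISKOVER_ENCRYPT_SECRET_KEY=<value>"
# (the quotes are optional).
extract_encrypt_key() {
    sed -n 's/^Environment="\{0,1\}DISKOVER_ENCRYPT_SECRET_KEY=//p' "$1" \
        | sed 's/"[[:space:]]*$//' \
        | head -n 1
}
```

On the web host this would be run as extract_encrypt_key /usr/lib/systemd/system/diskover-admin.service.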
Step 6: Configure Celery
Celery handles asynchronous file actions. Its configuration comes bundled with the diskoverd package — you just need to put the files in the right places.
Copy the Celery environment configuration file:
cp /opt/diskover/diskover_celery/etc/celery.conf /etc/celery.conf
Create the Celery systemd service file:
cat <<'EOF' > /etc/systemd/system/celery.service
[Unit]
Description=Diskover Celery Service
After=network.target
[Service]
Type=forking
User=root
Group=root
EnvironmentFile=/etc/celery.conf
ExecStart=/bin/sh -c '. /etc/celery.conf && cd ${INSTALL_DIRECTORY} && ${CELERY_BIN} -A ${CELERY_APP} multi start ${CELERYD_NODES} \
--pidfile=${CELERYD_PID_FILE} --logfile=${CELERYD_LOG_FILE} \
--loglevel="${CELERYD_LOG_LEVEL}" $CELERYD_OPTS'
ExecStop=/bin/sh -c '. /etc/celery.conf && cd ${INSTALL_DIRECTORY} && ${CELERY_BIN} multi stopwait $CELERYD_NODES \
--pidfile=${CELERYD_PID_FILE} --logfile=${CELERYD_LOG_FILE} \
--loglevel="${CELERYD_LOG_LEVEL}"'
ExecReload=/bin/sh -c '. /etc/celery.conf && cd ${INSTALL_DIRECTORY} && ${CELERY_BIN} -A $CELERY_APP multi restart $CELERYD_NODES \
--pidfile=${CELERYD_PID_FILE} --logfile=${CELERYD_LOG_FILE} \
--loglevel="${CELERYD_LOG_LEVEL}" $CELERYD_OPTS'
Restart=always
[Install]
WantedBy=multi-user.target
EOF
Create the required log and runtime directories:
mkdir -p /var/log/celery /var/run/celery
chmod 777 /var/log/celery /var/run/celery
Verify the Celery binary path matches what's configured in /etc/celery.conf:
which celery
If the path differs from the CELERY_BIN value in /etc/celery.conf, update the conf file:
vi /etc/celery.conf
Set CELERY_BIN to match the output of which celery (typically /opt/python-venv-diskover/bin/celery).
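The comparison between CELERY_BIN and the installed binary can also be scripted. The sketch below is an illustration only: celery_bin_from_conf and check_celery_bin are hypothetical helper names, and it assumes /etc/celery.conf consists of shell-compatible KEY="value" lines, which the `. /etc/celery.conf` call in the unit file suggests.

```shell
# Hypothetical helper: print the CELERY_BIN value from a celery.conf-style
# file ($1). The file is sourced in a subshell so its variables do not
# leak into the calling shell.
celery_bin_from_conf() (
    . "$1" >/dev/null 2>&1
    printf '%s\n' "$CELERY_BIN"
)

# Hypothetical helper: compare the configured path in the conf file ($1)
# with an expected path ($2) and report the result.
check_celery_bin() {
    configured=$(celery_bin_from_conf "$1")
    if [ "$configured" = "$2" ]; then
        echo "match: $configured"
    else
        echo "mismatch: conf has '$configured', expected '$2'"
        return 1
    fi
}
```

A typical call would be check_celery_bin /etc/celery.conf "$(which celery)". Pass an absolute path for the conf file, since sourcing a bare filename searches PATH rather than the current directory.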
Enable the service and reload systemd:
chmod 644 /etc/systemd/system/celery.service
systemctl daemon-reload
systemctl enable celery
Step 7: Elasticsearch Certificate (if ES Security is Enabled)
If Elasticsearch has security enabled (the default for Elasticsearch 8.x), the worker needs the Elasticsearch CA certificate to communicate over HTTPS.
On the Elasticsearch host, locate the certificate:
/etc/elasticsearch/certs/http_ca.crt
Copy this file to the worker host and place it in the system CA trust store:
mkdir -p /usr/local/share/ca-certificates
cp /path/to/http_ca.crt /usr/local/share/ca-certificates/http_ca.crt
update-ca-certificates
Note: Replace /path/to/http_ca.crt with wherever you staged the certificate file on the worker.
Step 8: SSL Certificate for Diskover Web (if HTTPS is Enabled)
If Diskover Web is served over HTTPS, the worker's Python environment needs to trust the SSL certificate so API requests succeed.
Locate the Python certifi CA bundle:
/opt/python-venv-diskover/bin/python3 -c "import certifi; print(certifi.where())"
Back up the original bundle (only needs to be done once):
cp $(/opt/python-venv-diskover/bin/python3 -c "import certifi; print(certifi.where())") \
  $(/opt/python-venv-diskover/bin/python3 -c "import certifi; print(certifi.where())").bak
Append your Diskover SSL certificate to the bundle:
cat /path/to/diskover-ssl.crt >> $(/opt/python-venv-diskover/bin/python3 -c "import certifi; print(certifi.where())")
Replace /path/to/diskover-ssl.crt with the path to your Diskover Web SSL .crt file.
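Because the append uses >>, running it a second time adds a duplicate copy of the certificate, and upgrading the certifi package replaces the bundle, so the certificate then has to be appended again. The sketch below makes the append safe to repeat. append_cert_once is a hypothetical helper name, and it assumes a PEM file containing a single certificate.

```shell
# Hypothetical helper: append a PEM certificate ($1) to a CA bundle ($2)
# only if the certificate's first base64 line is not already in the bundle.
append_cert_once() {
    cert=$1
    bundle=$2
    # First line of the certificate body (skips the BEGIN/END marker lines).
    first_line=$(grep -v -- '-----' "$cert" | head -n 1)
    if [ -n "$first_line" ] && grep -qF -- "$first_line" "$bundle"; then
        echo "already present"
    else
        cat "$cert" >> "$bundle"
        echo "appended"
    fi
}
```

It would be called with the certificate path and the bundle path printed by certifi.where(), for example append_cert_once /path/to/diskover-ssl.crt "$(/opt/python-venv-diskover/bin/python3 -c "import certifi; print(certifi.where())")".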
Step 9: Configure Python Path in Diskover Admin
The worker needs to know which Python binary to use. This is configured through the Diskover Admin web interface.
Navigate to Diskover Admin → DiskoverD
Update the Python Command setting to:
/opt/python-venv-diskover/bin/python3
Save the configuration
Step 10: Start Services
Start the worker and Celery services:
systemctl start diskoverd celery
Verify both services are running:
systemctl status diskoverd celery
Both should show active (running).
Check the diskoverd startup logs:
journalctl -fu diskoverd
You should see the worker connecting to the Diskover API and registering itself.
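If you want a scriptable pass/fail result rather than reading the status output, systemctl is-active --quiet returns exit code 0 only when a unit is active. The sketch below wraps that in a loop. check_services is a hypothetical helper name, not something shipped with Diskover.

```shell
# Hypothetical helper: report whether each named systemd unit is active.
# Returns non-zero if any of them is not.
check_services() {
    status=0
    for svc in "$@"; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
            status=1
        fi
    done
    return "$status"
}
```

For this guide the call would be check_services diskoverd celery.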
Verification
Check | Command / Action | Expected Result |
|---|---|---|
Diskoverd service | systemctl status diskoverd | active (running) |
Celery service | systemctl status celery | active (running) |
Worker registration | Diskover Web → Task Panel → Workers tab | Worker listed as online |
API connectivity | journalctl -fu diskoverd | No connection errors in logs |
Test scan | Run a scan from Task Panel targeting this worker | Scan completes without errors |
Service Management
Action | Command |
|---|---|
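Start diskoverd | systemctl start diskoverd |
Stop diskoverd | systemctl stop diskoverd |
Restart diskoverd | systemctl restart diskoverd |
View diskoverd logs | journalctl -fu diskoverd |
Start Celery | systemctl start celery |
Stop Celery | systemctl stop celery |
Restart Celery | systemctl restart celery |
View Celery logs | Located in /var/log/celery/ |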
Troubleshooting
Issue | Cause | Solution |
|---|---|---|
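apt update cannot authenticate to the repository | Incorrect JFrog credentials | Verify username and token in /etc/apt/auth.conf.d/diskover.conf (see Step 1) |
diskoverd fails to start | Missing or incorrect config.yaml | Check /root/.config/diskoverd/config.yaml (see Step 4) |
diskoverd starts but worker not visible in Task Panel | API URL is wrong or unreachable | Confirm the URL in /root/.config/diskoverd/config.yaml |
Celery fails to start | CELERY_BIN in /etc/celery.conf does not match the installed binary | Run which celery and update CELERY_BIN in /etc/celery.conf (see Step 6) |
Python pip install fails | Missing gcc or python-dev | Install build dependencies: apt install gcc python3-dev |
SSL errors when connecting to API | Python doesn't trust the certificate | Append the SSL cert to the certifi CA bundle (see Step 8) |
Elasticsearch connection errors | Missing ES CA certificate | Copy http_ca.crt from the Elasticsearch host and run update-ca-certificates (see Step 7) |
Encryption key mismatch | Missing DISKOVER_ENCRYPT_SECRET_KEY in diskoverd.service | Sync the key from the web host's diskover-admin.service (see Step 5) |
Permission denied on Celery directories | Log/run directories not created | Run mkdir -p /var/log/celery /var/run/celery && chmod 777 /var/log/celery /var/run/celery |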
Log File Locations:
Component | Log Location |
|---|---|
Diskoverd | journalctl -u diskoverd |
Celery | /var/log/celery/ |
APT package manager | /var/log/apt/ |
File and Directory Reference
Path | Purpose |
|---|---|
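/opt/diskover/ | Main Diskover installation directory |
/opt/python-venv-diskover/ | Python virtual environment |
/opt/diskover/requirements.txt | Python package dependencies |
/opt/diskover/diskover_celery/ | Celery application directory |
/root/.config/diskoverd/config.yaml | Diskoverd configuration |
/etc/systemd/system/diskoverd.service | Diskoverd systemd unit |
/etc/systemd/system/celery.service | Celery systemd unit |
/etc/celery.conf | Celery environment configuration |
/etc/apt/sources.list.d/diskover.list | APT repository source |
/etc/apt/auth.conf.d/diskover.conf | APT repository credentials |
/etc/profile.d/diskover-venv.sh | PATH configuration for venv |
/var/log/celery/ | Celery log files |
/var/run/celery/ | Celery runtime files |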