Troubleshooting — Diskoverd Task-Worker

Overview

Diskoverd is the Diskover worker daemon. It runs on each indexer host and is responsible for:

Receiving and executing crawl tasks — accepts jobs from the task queue and launches diskover crawl processes
Worker coordination — registers the worker with the platform, reports health and capacity, and manages concurrent task execution
Volume mounting — optionally mounts NFS/CIFS volumes before crawling and unmounts them after
Scheduling — runs cron-based crawl schedules assigned to this worker

Diskoverd is a long-running Python process. It is typically started as a system service and kept running continuously alongside the Celery worker.

Service Management

Diskoverd is commonly run as a systemd service. If the service file has been installed, manage the service with systemctl:

RHEL / CentOS / Rocky Linux and Ubuntu

# Start
sudo systemctl start diskoverd

# Stop
sudo systemctl stop diskoverd

# Restart
sudo systemctl restart diskoverd

# Status
sudo systemctl status diskoverd

# Enable on boot
sudo systemctl enable diskoverd

Running Manually

Diskoverd can also be started directly for testing or troubleshooting:

cd /opt/diskover
python3 diskoverd.py

# With a custom worker name
python3 diskoverd.py -n my-worker-name

# Verbose logging to stdout
python3 diskoverd.py --verbose

# Skip NFS/CIFS mount capability checks (Linux workers without mount support)
python3 diskoverd.py --assumemountingenabled

Worker Identity

Each Diskoverd instance registers with the platform under a worker name. This name must be unique across all workers and is used to route tasks and identify the worker in the UI.

The worker name is resolved in this order, first match wins:

-n <name> command-line option
DISKOVERD_WORKERNAME environment variable
Default: <hostname>_<5-char-unique-id> (automatically generated)
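
The precedence above can be sketched in Python as follows. This is an illustration of the lookup order only, not Diskoverd's actual code: the function name resolve_worker_name and the alphabet used for the 5-character id are assumptions.

```python
import os
import secrets
import socket
import string


def resolve_worker_name(cli_name=None, environ=None):
    """Return a worker name using the documented precedence (illustrative only)."""
    environ = os.environ if environ is None else environ

    # 1. An explicit -n <name> on the command line wins.
    if cli_name:
        return cli_name

    # 2. Otherwise fall back to the DISKOVERD_WORKERNAME environment variable.
    env_name = environ.get("DISKOVERD_WORKERNAME")
    if env_name:
        return env_name

    # 3. Otherwise generate <hostname>_<5-char id>.
    #    Assumption: the id is lowercase letters and digits.
    alphabet = string.ascii_lowercase + string.digits
    unique_id = "".join(secrets.choice(alphabet) for _ in range(5))
    return f"{socket.gethostname()}_{unique_id}"
```

Because the third branch produces a new id each time it runs, a worker that relies on the default is not guaranteed to keep the same name across restarts.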

The worker name determines which Celery queues are consumed: index.<workername> and fileactions.<workername>. If the worker name changes between restarts, the old queues become orphaned in RabbitMQ.

Best practice: Set a stable, explicit worker name via -n or DISKOVERD_WORKERNAME so the queue names remain consistent across restarts.
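
To make the effect of a name change concrete, the sketch below derives the two queue names from a worker name and lists the queues that would be left behind after a rename. The helper names queue_names and orphaned_queues are hypothetical; only the index.<workername> and fileactions.<workername> patterns come from this page.

```python
def queue_names(worker_name):
    """Queues consumed by a worker, following the naming pattern described above."""
    return [f"index.{worker_name}", f"fileactions.{worker_name}"]


def orphaned_queues(old_name, new_name):
    """Queues that no worker consumes once old_name is replaced by new_name."""
    return sorted(set(queue_names(old_name)) - set(queue_names(new_name)))


# A stable name leaves nothing behind; a changed name orphans both queues.
print(orphaned_queues("indexer01", "indexer01"))
print(orphaned_queues("indexer01_a1b2c", "indexer01_x9y8z"))
```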

Configuration

Worker configuration is managed through the Diskover Admin UI and stored in the shared SQLite database. No manual config file editing is required for most settings. A change to any of the settings below requires a restart of the diskoverd service.

Navigate to Diskover Admin > Configuration > DiskoverD to adjust:

Setting	Default	Description
Work Threads	4	Maximum number of tasks running concurrently on this worker. Each task spawns a subprocess.
Python Command	`python3`	Path to the Python interpreter used to launch crawl subprocesses
Diskover Path	`/opt/diskover/`	Path to the Diskover core installation
Timezone	`America/Vancouver`	Local timezone for scheduled tasks
NFS Enabled	On	Whether this worker can mount NFS volumes
CIFS Enabled	On	Whether this worker can mount CIFS/SMB volumes
Secret Key	`darkdata`	JWT secret used for inter-service communication. Change in production.
SSL Cert Verify	Off	Verify SSL certificates when calling external APIs

Logging Configuration

Log settings are under Diskover Admin > Configuration > DiskoverD > Logging:

Setting	Default	Description
Log Level	`INFO`	`DEBUG` for verbose output, `WARNING` to reduce noise
Log Directory	`/var/log/diskover/`	Directory where log files are written
Log Rotation	Sized (100 MB, 1 backup)	Rotate by file size or on a schedule

Environment Variables

Any DiskoverD configuration field can be overridden via environment variable using the DISKOVERD_ prefix:

DISKOVERD_WORKERNAME=my-worker       # Override worker name
DISKOVERD_WORK_THREADS=8             # Override thread count
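
A minimal sketch of how prefix-based overrides of this kind are commonly applied. It is not Diskoverd's implementation: the function apply_env_overrides, the lowercase field keys, and the type-casting rules are assumptions made for illustration. Only the DISKOVERD_ prefix and the two variables shown above come from this page.

```python
import os


def apply_env_overrides(config, environ=None, prefix="DISKOVERD_"):
    """Return a copy of config with values replaced by matching environment variables."""
    environ = os.environ if environ is None else environ
    merged = dict(config)  # leave the stored configuration untouched
    for key, default in config.items():
        raw = environ.get(prefix + key.upper())
        if raw is None:
            continue  # no override set for this field
        # Environment values are strings, so cast to the type of the stored value.
        if isinstance(default, bool):
            merged[key] = raw.strip().lower() in {"1", "true", "yes", "on"}
        elif isinstance(default, int):
            merged[key] = int(raw)
        else:
            merged[key] = raw
    return merged


stored = {"workername": "indexer01", "work_threads": 4}
env = {"DISKOVERD_WORKERNAME": "my-worker", "DISKOVERD_WORK_THREADS": "8"}
print(apply_env_overrides(stored, env))
```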

For volume encryption, set the encryption key via environment variable:

DISKOVER_ENCRYPT_SECRET_KEY=<key>    # Required for encrypted volume mounting

Log Locations

Log	Path
Main daemon log	`/var/log/diskover/diskoverd.log`
Subprocess log (crawl child processes)	`/var/log/diskover/diskoverd_subproc.log`
systemd journal	`journalctl -u diskoverd`

# Tail the main daemon log
sudo tail -f /var/log/diskover/diskoverd.log

# Tail the subprocess log (individual crawl output)
sudo tail -f /var/log/diskover/diskoverd_subproc.log

# Follow the service's systemd journal
journalctl -f -u diskoverd

Common Operations

Check Worker Status in the UI

Active workers and their health are visible in Diskover UI > Task Panel > Workers. A worker is considered offline if no heartbeat has been received for more than 10 minutes.
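
The offline rule can be expressed as a simple timestamp comparison. The sketch below assumes the last heartbeat is available as a timezone-aware datetime; the names OFFLINE_AFTER and is_offline are hypothetical and not taken from the Diskover code base.

```python
from datetime import datetime, timedelta, timezone

# Threshold stated above: offline after more than 10 minutes without a heartbeat.
OFFLINE_AFTER = timedelta(minutes=10)


def is_offline(last_heartbeat, now=None):
    """Return True if the last heartbeat is older than the offline threshold."""
    now = now or datetime.now(timezone.utc)
    return now - last_heartbeat > OFFLINE_AFTER


checked_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_offline(checked_at - timedelta(minutes=11), checked_at))
print(is_offline(checked_at - timedelta(minutes=5), checked_at))
```

When a worker shows as offline in the UI but the service is running, comparing the time of its last log entry against this threshold is a quick way to tell a stalled daemon from a connectivity problem.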

Check Worker is Registered

# Should list the workers registered with the system
curl -XGET https://diskover-web-URL/diskover_admin/api/config/workers
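
The same check can be scripted. The sketch below uses only the Python standard library and the endpoint path shown in the curl command. It assumes the endpoint returns JSON and is reachable without extra authentication headers, neither of which is confirmed here; the helper names and the example host are placeholders.

```python
import json
import urllib.request


def workers_url(base_url):
    """Build the worker-list endpoint URL from the Diskover web base URL."""
    return base_url.rstrip("/") + "/diskover_admin/api/config/workers"


def fetch_workers(base_url, timeout=10):
    """Fetch the registered workers (assumes a JSON response body)."""
    with urllib.request.urlopen(workers_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholder host; replace with the real Diskover web URL before calling fetch_workers().
print(workers_url("https://diskover.example.com/"))
```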

Change Worker Concurrency

Increase or decrease Work Threads in Diskover Admin > Configuration > DiskoverD. Changes take effect after restarting the diskoverd service on the affected worker.

A good starting point is one thread per physical CPU core. Increase the value for I/O-bound workloads (network storage); decrease it if the indexer is CPU- or memory-constrained.

Run a Manual Crawl (Bypass Scheduler)

For ad-hoc crawls without going through the task queue, diskover can be invoked directly:

python3 /opt/diskover/diskover.py -i diskover-<indexname> /path/to/crawl

This bypasses diskoverd entirely and is useful for testing or one-off re-scans.

View Scheduled Tasks

Tasks assigned to this worker are visible in Diskover UI > Task Panel. The schedule, last run time, and status are shown for each task.