Troubleshooting — Celery

Overview

Celery is the distributed task queue that powers Diskover's background processing. It runs as a worker service on each indexer machine and is responsible for executing two types of jobs:

Index tasks — file system crawls submitted by diskoverd
File action tasks — operations on indexed files (copy, export, permissions fix, etc.) triggered from the Diskover web UI

Workers are identified by hostname. Each worker registers two queues with RabbitMQ — index.<hostname> and fileactions.<hostname> — and only processes tasks routed to its own queues. Task results are stored in Elasticsearch.
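As a quick illustration of the naming scheme, the sketch below prints the two queue names a worker should register for a given host. The helper name `expected_queues` is hypothetical, and it assumes the worker uses the same value the `hostname` command returns (short name versus FQDN may differ on your system).

```shell
# Print the two queue names a worker on the given host is expected to register.
# Falls back to this machine's hostname when no argument is given.
expected_queues() {
    host="${1:-$(hostname)}"
    printf 'index.%s\nfileactions.%s\n' "$host" "$host"
}

# Usage:
#   expected_queues             # this machine
#   expected_queues indexer01   # "indexer01" is a made-up host name
```

The output can be compared by eye with the queue list reported by RabbitMQ.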

Service Management

The Celery worker is managed via a systemd service, which internally uses celery multi to manage worker processes.

# Start
sudo systemctl start celery

# Stop
sudo systemctl stop celery

# Restart
sudo systemctl restart celery

# Status
sudo systemctl status celery

# Enable on boot
sudo systemctl enable celery

These commands apply to both RHEL/Rocky and Ubuntu. The service reads its configuration from /etc/celery.conf.

Configuration

/etc/celery.conf

This is the primary configuration file, sourced by the systemd service at startup. Key fields to review and update for each installation:

# Node name(s) — typically one per machine, named after the host
CELERYD_NODES="worker01"
# Multiple workers on one machine:
# CELERYD_NODES="w1 w2 w3"

# Path to the celery binary (update if your virtualenv lives elsewhere)
CELERY_BIN="/opt/python-venv-diskover/bin/celery"

# Celery app module — do not change
CELERY_APP="diskover_celery.worker"

# Path to the Diskover installation
INSTALL_DIRECTORY="/opt/diskover"

# Log and PID paths (use %n for node name substitution)
CELERYD_PID_FILE='${INSTALL_DIRECTORY}/diskover_celery/run/%n.pid'
CELERYD_LOG_FILE='${INSTALL_DIRECTORY}/diskover_celery/log/%n%I.log'
CELERYD_LOG_LEVEL="INFO"

# RabbitMQ connection (for reference — actual broker config is in Diskover Admin)
RABBIT_HOST="localhost"
RABBIT_USER="diskover"
RABBIT_PASS="darkdata"

Note: After editing /etc/celery.conf, restart the celery service for changes to take effect.
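Before restarting, it can help to confirm that the values in the file point at things that exist. The following is a minimal sketch and not part of Diskover: `check_celery_conf` is a hypothetical helper that sources the file in a subshell and checks three of the fields shown above.

```shell
# Sanity-check a celery.conf-style file without starting the service.
# The function body runs in a subshell, so sourced variables do not leak out.
check_celery_conf() (
    conf="${1:-/etc/celery.conf}"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; exit 1; }
    . "$conf"
    status=0
    [ -n "$CELERYD_NODES" ] || { echo "CELERYD_NODES is empty" >&2; status=1; }
    [ -x "$CELERY_BIN" ] || { echo "CELERY_BIN not executable: $CELERY_BIN" >&2; status=1; }
    [ -d "$INSTALL_DIRECTORY" ] || { echo "INSTALL_DIRECTORY missing: $INSTALL_DIRECTORY" >&2; status=1; }
    exit "$status"
)

# Usage (as a user who can read the file):
#   check_celery_conf /etc/celery.conf && echo "config looks sane"
```

A zero exit status only means the paths exist; it says nothing about whether the broker credentials are valid.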

Diskover Admin — Task Queue Settings

The broker URL and worker behaviour are also configurable in Diskover Admin > Configuration > System > Message Queue:

Setting	Default	Description
Host	`localhost`	RabbitMQ hostname
User	`diskover`	RabbitMQ username
Password	`darkdata`	RabbitMQ password
Use SSL	Off	Enable TLS for broker connection
Worker Prefetch Multiplier	1	Messages prefetched per worker process (1 = each process reserves one task at a time)
Tasks Acks Late	On	ACK tasks after completion, not on receipt (safer for long tasks)
Cancel Long Running Tasks on Connection Loss	On	Terminate in-progress tasks if broker disconnects

Queue and Exchange Routing

Diskover uses two direct exchanges, each with per-hostname queues:

Exchange	Queue pattern	Used for
`index`	`index.<hostname>`	Crawl/indexing tasks (priority 0–10)
`fileactions`	`fileactions.<hostname>`	File operation tasks

Tasks are routed to the specific worker that scanned the files — this is determined by the session worker selection in the Diskover web UI.

Log Locations

Log	Path
Worker log	`/opt/diskover/diskover_celery/log/<nodename>.log`
Worker PID file	`/opt/diskover/diskover_celery/run/<nodename>.pid`

# Tail the worker log (replace worker01 with your node name)
tail -f /opt/diskover/diskover_celery/log/worker01.log

Log level is set by CELERYD_LOG_LEVEL in /etc/celery.conf. Use DEBUG to see individual task dispatch and result events.
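Switching the level is a one-line edit to the config file. The sketch below shows one way to script it; `set_celery_log_level` is a hypothetical helper, and it assumes the file contains a single `CELERYD_LOG_LEVEL=` line as in the example configuration above. A `.bak` copy of the file is left next to the original.

```shell
# Rewrite the CELERYD_LOG_LEVEL line in a celery.conf-style file.
# Rejects anything that is not a standard Python logging level name.
set_celery_log_level() {
    level="$1"
    conf="${2:-/etc/celery.conf}"
    case "$level" in
        DEBUG|INFO|WARNING|ERROR|CRITICAL) ;;
        *) echo "unknown log level: $level" >&2; return 1 ;;
    esac
    sed -i.bak "s/^CELERYD_LOG_LEVEL=.*/CELERYD_LOG_LEVEL=\"$level\"/" "$conf"
}

# Usage (from a root shell, then restart the service):
#   set_celery_log_level DEBUG
#   systemctl restart celery
```

Remember to switch back to INFO afterwards, since DEBUG output grows the log quickly.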

Common Operations

Check Which Workers Are Active

From the RabbitMQ perspective:

sudo rabbitmqctl list_consumers

Using Celery's inspect command (run from /opt/diskover):

/opt/python-venv-diskover/bin/celery -A diskover_celery.worker inspect active

Check Registered Queues on a Worker

/opt/python-venv-diskover/bin/celery -A diskover_celery.worker inspect active_queues

Check Queue Depth

sudo rabbitmqctl list_queues name messages consumers

A non-zero messages count with 0 consumers means tasks are queued but nothing is consuming that queue, which usually indicates the worker on that hostname is down.
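On a broker with many queues, that condition can be filtered out of the listing automatically. This is a small sketch: `orphaned_queues` is a hypothetical helper name, and it assumes the column order `name messages consumers` used in the command above.

```shell
# Read `rabbitmqctl list_queues name messages consumers` output on stdin and
# print every queue that holds messages but has no consumer attached.
# Header and informational lines are skipped because their 2nd/3rd fields are not integers.
orphaned_queues() {
    awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 > 0 && $3 == 0 {
        print $1 " (" $2 " waiting)"
    }'
}

# Usage:
#   sudo rabbitmqctl list_queues name messages consumers | orphaned_queues
```

No output means every queue with pending messages has at least one consumer.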

Manually Restart a Single Worker Node

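# Restart using celery multi directly (reads /etc/celery.conf)
source /etc/celery.conf
# The PID and log paths are single-quoted in the file, so expand ${INSTALL_DIRECTORY} here
eval "CELERYD_PID_FILE=\"$CELERYD_PID_FILE\" CELERYD_LOG_FILE=\"$CELERYD_LOG_FILE\""
$CELERY_BIN -A $CELERY_APP multi restart $CELERYD_NODES \
    --pidfile="$CELERYD_PID_FILE" \
    --logfile="$CELERYD_LOG_FILE" \
    --loglevel="$CELERYD_LOG_LEVEL" \
    $CELERYD_OPTS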

Or simply use systemd:

sudo systemctl restart celery

Troubleshooting

Worker Not Picking Up Tasks

Confirm the worker is running and consuming the right queues:

sudo systemctl status celery
sudo rabbitmqctl list_consumers

Check the worker registered the correct hostname-based queues. The queue name must match the hostname of the machine:

hostname
# Compare with queue names shown in:
sudo rabbitmqctl list_queues name consumers

If the worker registered under a different hostname than expected, set the node name explicitly in /etc/celery.conf:

CELERYD_NODES="expected-hostname"
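The hostname comparison described above can also be scripted. The sketch below is illustrative only: `check_host_queues` is a hypothetical helper that reads the `list_queues name consumers` output and reports on the two queues expected for one host.

```shell
# Read `rabbitmqctl list_queues name consumers` output on stdin and report
# whether index.<host> and fileactions.<host> exist and have a consumer.
# Exit status is 0 only when both queues are present and consumed.
check_host_queues() {
    awk -v host="$1" '
        BEGIN { want["index." host] = 1; want["fileactions." host] = 1 }
        ($1 in want) && $2 ~ /^[0-9]+$/ { seen[$1] = $2 + 0 }
        END {
            rc = 0
            n = split("index." host " fileactions." host, order, " ")
            for (i = 1; i <= n; i++) {
                q = order[i]
                if (!(q in seen))      { print q ": missing";     rc = 1 }
                else if (seen[q] == 0) { print q ": no consumer"; rc = 1 }
                else                   { print q ": ok" }
            }
            exit rc
        }'
}

# Usage:
#   sudo rabbitmqctl list_queues name consumers | check_host_queues "$(hostname)"
```

A "missing" result suggests the worker registered under a different name; "no consumer" suggests the queue exists but the worker is not attached to it.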

Worker Starts Then Immediately Stops

The worker usually exits at startup because it could not connect to RabbitMQ or the Diskover Admin API, or because the license check failed.

# Check last 100 lines of the worker log
tail -100 /opt/diskover/diskover_celery/log/worker01.log
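To narrow the log down to failures, the output can be filtered by level. This sketch assumes Celery's default log line format (`[timestamp: LEVEL/ProcessName] message`); if the Diskover worker uses a custom format, the pattern needs adjusting. `recent_errors` is a hypothetical helper name.

```shell
# Print the most recent ERROR/CRITICAL lines and traceback headers from a worker log.
# $1 = log file, $2 = maximum number of lines to show (default 20).
recent_errors() {
    logfile="$1"
    count="${2:-20}"
    grep -E ': (ERROR|CRITICAL)/|^Traceback ' "$logfile" | tail -n "$count"
}

# Usage:
#   recent_errors /opt/diskover/diskover_celery/log/worker01.log
```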

Common causes:

Broker unreachable: verify RabbitMQ is running and port 5672 is accessible from this machine. Also verify that the RabbitMQ host set in Diskover Admin > Configuration > System > Message Queue > Host is correct.
Admin API unreachable: the worker fetches its configuration from the Diskover Admin API on startup; if that fails, it exits. Verify diskover-admin is running and the API URL is correct in /root/.config/diskoverd/config.yaml.
License failure: the license check runs on startup; an expired or missing license will cause the worker to exit.
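The broker reachability check from the list above can be done without extra tools. The sketch below relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command, both assumed to be available on the indexer; `port_open` is a hypothetical helper name.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# /dev/tcp is a bash feature, so the probe is run through bash explicitly.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage (replace localhost with the broker host configured in Diskover Admin):
#   port_open localhost 5672 && echo "broker port reachable" || echo "broker port NOT reachable"
```

A successful connection only shows that something is listening on the port; authentication problems still have to be diagnosed from the worker log.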

Tasks Stuck in PENDING State

Tasks are submitted but never transition to STARTED or SUCCESS.

Check there is a live consumer for the target queue:
```
sudo rabbitmqctl list_consumers
```

Check that the worker responds and is not fully occupied by long-running tasks:

/opt/python-venv-diskover/bin/celery -A diskover_celery.worker inspect active

Check the task result backend. Results are stored in Elasticsearch; if it is unreachable, results cannot be written and tasks may appear stuck:
```
curl -s http://localhost:9200/_cluster/health
```
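The health response is JSON with a `status` field of `green`, `yellow`, or `red`. To pull out just that value without a JSON parser, a sketch such as the following can be used (`es_status` is a hypothetical helper name; the URL is the one from the command above):

```shell
# Extract the value of the "status" field from a cluster health response on stdin.
es_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Usage:
#   curl -s http://localhost:9200/_cluster/health | es_status
```

An empty result means the request failed or returned something other than a health document. A `red` status means at least one primary shard is unassigned, which is the case most likely to block result writes.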

Stale PID File Prevents Worker from Starting

If the worker crashed without cleaning up, the PID file may block restart:

# Check for stale PID files
ls /opt/diskover/diskover_celery/run/

# Remove the stale PID file once no worker process is running (replace worker01 with your node name)
rm /opt/diskover/diskover_celery/run/worker01.pid

# Then restart
sudo systemctl restart celery
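Deleting a PID file while its process is still alive can lead to duplicate workers, so it is worth checking first. The sketch below is one cautious way to do it on Linux; `remove_if_stale` is a hypothetical helper, and it assumes the PID file contains a single numeric process ID.

```shell
# Remove a PID file only when the process it names is no longer running.
# Returns 0 if the file was removed or did not exist, 1 if the process is alive,
# 2 if the file does not contain a plain number.
remove_if_stale() {
    pidfile="$1"
    [ -f "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) echo "unexpected content in $pidfile" >&2; return 2 ;;
    esac
    # kill -0 fails for processes owned by other users, so also check /proc.
    if kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; then
        echo "process $pid is still running; leaving $pidfile in place" >&2
        return 1
    fi
    rm -f "$pidfile"
}

# Usage (as a user allowed to delete the file):
#   remove_if_stale /opt/diskover/diskover_celery/run/worker01.pid
```

If the function reports that the process is still running, stop the service first rather than removing the file.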

Task Failed — Reading the Error

Task errors are stored in Elasticsearch. They are also visible in the Flower UI (if running) under the Tasks tab. In the Diskover web UI, failed file actions will show an error message returned by the task.

To inspect the raw result from the Elasticsearch backend:

curl -s "http://localhost:9200/celery/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"term": {"status": "FAILURE"}}, "size": 10, "sort": [{"date_done": "desc"}]}'

Official Documentation