Troubleshooting — Celery
Overview
Celery is the distributed task queue that powers Diskover's background processing. It runs as a worker service on each indexer machine and is responsible for executing two types of jobs:
Index tasks — file system crawls submitted by diskoverd
File action tasks — operations on indexed files (copy, export, permissions fix, etc.) triggered from the Diskover web UI
Workers are identified by hostname. Each worker registers two queues with RabbitMQ — index.<hostname> and fileactions.<hostname> — and only processes tasks routed to its own queues. Task results are stored in Elasticsearch.
Service Management
The Celery worker is managed via a systemd service, which internally uses celery multi to manage worker processes.
# Start
sudo systemctl start celery
# Stop
sudo systemctl stop celery
# Restart
sudo systemctl restart celery
# Status
sudo systemctl status celery
# Enable on boot
sudo systemctl enable celery
These commands apply to both RHEL/Rocky and Ubuntu. The service reads its configuration from /etc/celery.conf.
Configuration
/etc/celery.conf
This is the primary configuration file, sourced by the systemd service at startup. Key fields to review and update for each installation:
# Node name(s) — typically one per machine, named after the host
CELERYD_NODES="worker01"
# Multiple workers on one machine:
# CELERYD_NODES="w1 w2 w3"
# Path to the celery binary (update if using a virtualenv)
CELERY_BIN="/opt/python-venv-diskover/bin/celery"
# Celery app module — do not change
CELERY_APP="diskover_celery.worker"
# Path to the Diskover installation
INSTALL_DIRECTORY="/opt/diskover"
# Log and PID paths (use %n for node name substitution)
CELERYD_PID_FILE='${INSTALL_DIRECTORY}/diskover_celery/run/%n.pid'
CELERYD_LOG_FILE='${INSTALL_DIRECTORY}/diskover_celery/log/%n%I.log'
CELERYD_LOG_LEVEL="INFO"
# RabbitMQ connection (for reference — actual broker config is in Diskover Admin)
RABBIT_HOST="localhost"
RABBIT_USER="diskover"
RABBIT_PASS="darkdata"
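A quick sanity check before restarting can catch typos in this file. The function below is a minimal sketch; check_celery_conf is a hypothetical name, not a Diskover or Celery command. It sources the file in a subshell and reports any of the key fields above that are empty or point at a missing binary or directory. It only checks that the values exist, not that they are the right ones for your installation.

```shell
# Hypothetical helper (not part of Diskover or Celery): sanity-check the key
# fields of a celery.conf-style file. Prints one line per problem found and
# returns 0 if everything looks usable, 1 otherwise.
check_celery_conf() {
    local conf="${1:-/etc/celery.conf}"
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf"
        return 1
    fi
    # Source the file in a subshell so its variables do not leak into the caller.
    (
        . "$conf"
        problems=0
        if [ -z "${CELERYD_NODES:-}" ]; then
            echo "CELERYD_NODES is empty"; problems=1
        fi
        if [ ! -x "${CELERY_BIN:-}" ]; then
            echo "CELERY_BIN is not executable: ${CELERY_BIN:-<unset>}"; problems=1
        fi
        if [ -z "${CELERY_APP:-}" ]; then
            echo "CELERY_APP is empty"; problems=1
        fi
        if [ ! -d "${INSTALL_DIRECTORY:-}" ]; then
            echo "INSTALL_DIRECTORY does not exist: ${INSTALL_DIRECTORY:-<unset>}"; problems=1
        fi
        exit "$problems"
    )
}

# Usage:
#   check_celery_conf /etc/celery.conf && sudo systemctl restart celery
```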
Note: After editing /etc/celery.conf, restart the celery service for changes to take effect.
Diskover Admin — Task Queue Settings
The broker URL and worker behaviour are also configurable in Diskover Admin > Configuration > System > Message Queue:
Setting | Default | Description |
|---|---|---|
Host | | RabbitMQ hostname |
User | | RabbitMQ username |
Password | | RabbitMQ password |
Use SSL | Off | Enable TLS for broker connection |
Worker Prefetch Multiplier | 1 | Tasks each worker process reserves from the queue at a time (1 = one at a time) |
Tasks Acks Late | On | ACK tasks after completion, not on receipt (safer for long tasks) |
Cancel Long Running Tasks on Connection Loss | On | Terminate in-progress tasks if broker disconnects |
Queue and Exchange Routing
Diskover uses two direct exchanges, each with per-hostname queues:
Exchange | Queue pattern | Used for |
|---|---|---|
| | index.<hostname> | Crawl/indexing tasks (priority 0–10) |
| | fileactions.<hostname> | File operation tasks |
Tasks are routed to the specific worker that scanned the files — this is determined by the session worker selection in the Diskover web UI.
Log Locations
Log | Path |
|---|---|
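Worker log | /opt/diskover/diskover_celery/log/<node name>.log |
Worker PID file | /opt/diskover/diskover_celery/run/<node name>.pid |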
# Tail the worker log (replace worker01 with your node name)
tail -f /opt/diskover/diskover_celery/log/worker01.log
Log level is set by CELERYD_LOG_LEVEL in /etc/celery.conf. Use DEBUG to see individual task dispatch and result events.
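When CELERYD_NODES lists more than one worker, each node writes its own log. As a minimal sketch (celery_log_paths is a hypothetical helper), the main-process log paths can be derived from /etc/celery.conf by substituting the node name for %n and dropping %I, which celery multi expands to an empty string for the main process and to a suffix such as -1 for each pool process. It assumes the conf layout shown in the Configuration section, including the single-quoted ${INSTALL_DIRECTORY} reference.

```shell
# Hypothetical helper: print the main-process log path for every node listed in
# a celery.conf-style file. %n is replaced by the node name and %I by nothing,
# matching what celery multi does for the main worker process (pool processes
# log to suffixed files such as worker01-1.log).
celery_log_paths() {
    local conf="${1:-/etc/celery.conf}"
    (
        . "$conf"
        for node in ${CELERYD_NODES:-}; do
            # The single quotes in celery.conf keep ${INSTALL_DIRECTORY} literal, so expand it here.
            printf '%s\n' "${CELERYD_LOG_FILE:-}" |
                sed -e "s|[\$]{INSTALL_DIRECTORY}|${INSTALL_DIRECTORY:-}|g" \
                    -e "s|%n|$node|g" \
                    -e 's|%I||g'
        done
    )
}

# Usage: follow every node's main log at once
#   tail -f $(celery_log_paths /etc/celery.conf)
```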
Common Operations
Check Which Workers Are Active
From the RabbitMQ perspective:
sudo rabbitmqctl list_consumers
Using Celery's inspect command (run from /opt/diskover):
/opt/python-venv-diskover/bin/celery -A diskover_celery.worker inspect active
Check Registered Queues on a Worker
/opt/python-venv-diskover/bin/celery -A diskover_celery.worker inspect active_queues
Check Queue Depth
sudo rabbitmqctl list_queues name messages consumers
A non-zero messages count with 0 consumers means nothing is consuming that queue, which usually means the worker for that hostname is down.
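To scan every queue for that condition at once, the list_queues output can be filtered. This is a sketch, assuming the column order requested above; find_orphaned_queues is a hypothetical name, and the awk filter simply skips banner and header lines whose count columns are not plain integers.

```shell
# Hypothetical filter: read "name messages consumers" rows (tab- or
# space-separated) on stdin and print every queue that has waiting
# messages but no consumers. Non-numeric header/banner lines are ignored.
find_orphaned_queues() {
    awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 > 0 && $3 == 0 { print $1 }'
}

# Usage on the RabbitMQ host:
#   sudo rabbitmqctl list_queues -q name messages consumers | find_orphaned_queues
```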
Manually Restart a Single Worker Node
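# Restart using celery multi directly (reads /etc/celery.conf)
source /etc/celery.conf
# The PID and log paths in celery.conf are single-quoted, so ${INSTALL_DIRECTORY} stays literal after source; expand it first
CELERYD_PID_FILE=$(eval echo "$CELERYD_PID_FILE")
CELERYD_LOG_FILE=$(eval echo "$CELERYD_LOG_FILE")
$CELERY_BIN -A $CELERY_APP multi restart $CELERYD_NODES \
--pidfile=$CELERYD_PID_FILE \
--logfile=$CELERYD_LOG_FILE \
--loglevel=$CELERYD_LOG_LEVEL \
$CELERYD_OPTS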
Or simply use systemd:
sudo systemctl restart celery
Troubleshooting
Worker Not Picking Up Tasks
Confirm the worker is running and consuming the right queues:
sudo systemctl status celery
sudo rabbitmqctl list_consumers
Check the worker registered the correct hostname-based queues. The queue name must match the hostname of the machine:
hostname
# Compare with queue names shown in:
sudo rabbitmqctl list_queues name consumers
If the worker registered under a different hostname than expected, set the node name explicitly in /etc/celery.conf:
CELERYD_NODES="expected-hostname"
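The hostname comparison can also be scripted. In this minimal sketch, check_host_queues is a hypothetical helper that takes a hostname plus the output of rabbitmqctl list_queues name consumers on stdin, and reports whether index.<hostname> and fileactions.<hostname> exist and have at least one consumer. It assumes the queue naming described in the Overview and exact string matching, so a short hostname will not match a queue registered under a fully qualified one.

```shell
# Hypothetical check: given a hostname and "name consumers" rows on stdin,
# report the state of that host's index.<host> and fileactions.<host> queues.
# Returns 0 only if both queues exist and each has at least one consumer.
check_host_queues() {
    local host="$1" rows status=0 q consumers
    rows=$(cat)
    for q in "index.$host" "fileactions.$host"; do
        # Take the consumer count from the first row whose name matches exactly.
        consumers=$(printf '%s\n' "$rows" | awk -v q="$q" '$1 == q { print $2; exit }')
        if [ -z "$consumers" ]; then
            echo "$q: missing"; status=1
        elif [ "$consumers" -eq 0 ]; then
            echo "$q: no consumers"; status=1
        else
            echo "$q: $consumers consumer(s)"
        fi
    done
    return "$status"
}

# Usage (pass the indexer's hostname if RabbitMQ runs on another machine):
#   sudo rabbitmqctl list_queues -q name consumers | check_host_queues "$(hostname)"
```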
Worker Starts Then Immediately Stops
The worker exited during startup, usually because it could not connect to RabbitMQ or the Diskover Admin API, or because the license check failed.
# Check last 100 lines of the worker log
tail -100 /opt/diskover/diskover_celery/log/worker01.log
Common causes:
Broker unreachable: verify RabbitMQ is running and port 5672 is reachable from this machine, and confirm the host set in Diskover Admin > Configuration > System > Message Queue > Host is correct.
Admin API unreachable: the worker fetches its configuration from the Diskover Admin API on startup and exits if that fails. Verify diskover-admin is running and the API URL is correct in /root/.config/diskoverd/config.yaml.
License failure: the license check runs on startup; an expired or missing license will cause the worker to exit.
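For the broker check, a plain TCP connection attempt from the indexer is often enough to separate network problems from configuration problems. This sketch uses bash's /dev/tcp redirection, so it assumes bash and the coreutils timeout command are installed; check_broker_port is a hypothetical name. An open port shows that something is listening, not that the RabbitMQ credentials are valid.

```shell
# Hypothetical check: try a TCP connection to host:port (default 5672, the
# AMQP port) with a 3-second limit. Prints the result and returns 0 only if
# the connection was accepted.
check_broker_port() {
    local host="${1:-localhost}" port="${2:-5672}"
    # /dev/tcp is a bash feature, so run the connection attempt under bash explicitly.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo "$host:$port reachable"
        return 0
    fi
    echo "$host:$port NOT reachable"
    return 1
}

# Usage (use the host configured under Message Queue > Host):
#   check_broker_port rabbitmq-host 5672
```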
Tasks Stuck in PENDING State
Tasks are submitted but never transition to STARTED or SUCCESS.
Check there is a live consumer for the target queue:
sudo rabbitmqctl list_consumers
Check that the worker responds and is not already occupied with long-running tasks:
/opt/python-venv-diskover/bin/celery -A diskover_celery.worker inspect active
Check the task result backend. Results are stored in Elasticsearch; if ES is unreachable, results cannot be written and tasks may appear stuck:
curl -s http://localhost:9200/_cluster/health
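To use the result in a script, the status field can be extracted from that response. This is a minimal sketch that assumes the compact one-line JSON the endpoint returns by default and avoids depending on jq; es_health_status is a hypothetical name. A red status means at least one primary shard is unassigned, so writes to the affected index can fail.

```shell
# Hypothetical helper: read an Elasticsearch _cluster/health JSON body on
# stdin and print its "status" value (green, yellow or red).
# Prints nothing if no status field is found.
es_health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Usage:
#   curl -s http://localhost:9200/_cluster/health | es_health_status
```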
Stale PID File Prevents Worker from Starting
If the worker crashed without cleaning up, the PID file may block restart:
# Check for stale PID files
ls /opt/diskover/diskover_celery/run/
# Remove stale PID file (replace worker01 with your node name)
rm /opt/diskover/diskover_celery/run/worker01.pid
# Then restart
sudo systemctl restart celery
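Before removing a PID file, it is worth confirming that the process it names is really gone, since deleting the file of a live worker leaves that process running but untracked. The sketch below (pidfile_is_stale is a hypothetical name) checks /proc, so it is Linux-only, and it cannot detect the case where the old PID has been reused by an unrelated process.

```shell
# Hypothetical check: decide whether a Celery PID file is stale.
# Returns 0 if the file is empty or its process no longer exists (stale),
# 1 if the process is still running, 2 if the file does not exist.
pidfile_is_stale() {
    local pidfile="$1" pid
    [ -f "$pidfile" ] || return 2
    # Keep only the digits, dropping the trailing newline and any stray characters.
    pid=$(tr -cd '0-9' < "$pidfile")
    if [ -n "$pid" ] && [ -d "/proc/$pid" ]; then
        echo "$pidfile: process $pid is running, do not remove"
        return 1
    fi
    echo "$pidfile: stale"
    return 0
}

# Usage (replace worker01 with your node name):
#   pidfile_is_stale /opt/diskover/diskover_celery/run/worker01.pid && \
#       rm /opt/diskover/diskover_celery/run/worker01.pid
```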
Task Failed — Reading the Error
Task errors are stored in Elasticsearch. They are also visible in the Flower UI (if running) under the Tasks tab. In the Diskover web UI, failed file actions will show an error message returned by the task.
To inspect the raw result from the Elasticsearch backend:
curl -s "http://localhost:9200/celery/_search?pretty" \
-H 'Content-Type: application/json' \
-d '{"query": {"term": {"status": "FAILURE"}}, "size": 10, "sort": [{"date_done": "desc"}]}'