Horizontal Scaling — Task Workers
As your Diskover deployment grows — more data to scan, more file actions being submitted through the Web UI, more users running concurrent tasks — a single worker host can become a bottleneck. This guide walks through how to horizontally scale your Task Workers by adding additional worker hosts to an existing Ansible-managed deployment.
By the end of this guide, you'll have multiple worker hosts running Diskoverd and Celery in parallel, all pulling tasks from the same Diskover Web API and writing to the same Elasticsearch cluster.
What You'll Learn
What horizontal worker scaling means and when to do it
The architecture of a multi-worker Diskover deployment
How to prepare a new worker host
How to add the new host to your Ansible inventory
How to install Diskoverd and Celery on just the new worker using --limit
How to verify the new worker joined the deployment
How to remove a worker when scaling back down
What Is Horizontal Scaling?
Horizontal scaling means adding more machines to handle workload, rather than making a single machine more powerful (that's vertical scaling). For Diskover, each worker host runs two services:
Diskoverd — handles filesystem scanning and indexing. Each scanner task runs on a worker.
Celery — handles file actions (Live View, Rclone, Export, Download, etc) submitted through the Diskover Web UI.
The two services get their work from different places:
Diskoverd pulls scan tasks from the Diskover Web API on the web host. Each scan task is picked up by a single worker and runs entirely on that worker — Diskover does not split a single scan across multiple workers. However, with multiple workers available, different scan tasks can run on different workers at the same time, so your overall scan throughput increases as you add workers.
Celery consumes file action tasks (Live View, Rclone, Export, Download, etc.) from RabbitMQ. File actions are routed by RabbitMQ to the specific Celery worker that originally scanned the index the action is operating on. For example, if worker-1 scanned an index for /data1, file actions against that index will be handled by worker-1's Celery.
When you add a second (or third, or fourth) worker host, each new worker registers with the Diskover Web API for scan work and subscribes to RabbitMQ for file action work. As long as each worker can reach the Web API, RabbitMQ, and Elasticsearch, it will be available to pick up tasks.
Why Horizontally Scale?
Reason | What It Helps With |
|---|---|
More concurrent scan tasks | A single scan task always runs on one worker, but multiple workers means multiple scan tasks can run at the same time (e.g. one worker scanning |
File action queue backlog | File actions are routed back to the worker that scanned the index. Spreading your scans across multiple workers means file actions against different indexes are handled by different workers in parallel |
Workload isolation | Dedicate specific workers to scanning and others to file actions, or to different storage systems |
High availability | If one worker host goes offline for maintenance, others continue processing work |
Geographic distribution | Place workers close to the storage they scan (e.g. one worker per data center or NAS) to reduce network traversal |
When NOT to Scale Horizontally
Not every performance issue is solved by adding workers. Consider these alternatives first:
Elasticsearch is the bottleneck: If scans are slow because ES can't keep up with indexing, add ES nodes or increase es_heap_size on existing ones. More workers will just create more pressure on an already-overloaded cluster.
Network bandwidth to storage is saturated: Adding workers won't help if your NAS or filer is already maxed out on read throughput. Check your storage metrics first.
Architecture — Multi-Worker Deployment
In a typical multi-worker deployment, all workers write indexed data to the same Elasticsearch cluster. Diskoverd on each worker pulls scan tasks from the Diskover Web API, and Celery on each worker subscribes to RabbitMQ for file action tasks.
Key points:
Diskoverd on each worker polls the Diskover Web API for scan tasks. Each scan task runs start-to-finish on one worker.
Celery on each worker subscribes to RabbitMQ and consumes file action tasks (Live View, Rclone, Export, Download, etc.).
Every worker writes to the same Elasticsearch cluster so all indexed data lives in one place regardless of which worker did the scan.
Workers do not communicate with each other directly — all coordination happens through the Diskover Web API (for scans) and RabbitMQ (for file actions).
Prerequisites
Before adding a new worker, verify the following:
Existing Diskover deployment is healthy — all services running, Web UI accessible, existing workers processing tasks
New worker host is provisioned with a supported OS (RHEL/Rocky 8, 9, or 10; CentOS 10; Ubuntu/Debian 22 or 24)
SSH access from the Ansible control machine to the new worker host (same user, same auth method as your existing workers)
Network connectivity from the new worker to: Diskover Web API on the web host (port 8000 by default, or 443 when SSL is enabled), RabbitMQ host on port 5672 (AMQP), and Elasticsearch host(s) on port 9200 (HTTP/HTTPS)
Firewall rules open for the above ports
DNS or /etc/hosts configured so the new worker can resolve RabbitMQ and Elasticsearch hostnames (if using names instead of IPs)
Hardware sized appropriately: workers should have enough CPU cores and memory to handle your scan concurrency settings. See the Ansible Overview guide for sizing recommendations.
Step 1: Prepare the New Worker Host
On the new worker host, perform basic OS setup. Most of this is handled automatically by the playbook, but it's worth verifying before you start:
Confirm the OS version:
cat /etc/os-release
Verify SSH access from your Ansible control machine:
ssh diskover@10.0.1.31
Verify sudo works:
sudo whoami
Verify network connectivity to the Diskover Web API, RabbitMQ, and Elasticsearch from the new worker:
nc -zv WEB_HOST 8000
nc -zv RABBITMQ_HOST 5672
nc -zv ELASTICSEARCH_HOST 9200
If any of these fail, resolve the issue before running the playbook.
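The three checks above can also be run in a loop so that a blocked port is reported by name. The following is a minimal sketch, assuming nc is installed on the new worker. The names check_endpoints and PROBE are illustrative and not part of Diskover or the playbook, and WEB_HOST, RABBITMQ_HOST, and ELASTICSEARCH_HOST remain placeholders for your own hosts.

```shell
# Hypothetical helper: probe each "host:port" argument and report failures.
# PROBE can be overridden; by default it uses nc with a 3-second timeout.
PROBE="${PROBE:-nc -z -w 3}"

check_endpoints() {
    failed=0
    for endpoint in "$@"; do
        host="${endpoint%:*}"    # text before the last colon
        port="${endpoint##*:}"   # text after the last colon
        if $PROBE "$host" "$port" >/dev/null 2>&1; then
            echo "OK   $host:$port"
        else
            echo "FAIL $host:$port"
            failed=1
        fi
    done
    return "$failed"
}

# Example (placeholders as in the commands above):
# check_endpoints WEB_HOST:8000 RABBITMQ_HOST:5672 ELASTICSEARCH_HOST:9200
```

The function returns non-zero if any endpoint fails, so it can gate a wrapper script that goes on to run the playbook.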
Step 2: Add the New Worker to Your Inventory
Edit inventory/hosts.yml on your Ansible control machine and add the new worker under the existing worker host group.
Before (single worker):
all:
  vars:
    ansible_connection: ssh
    ansible_user: diskover
    ansible_ssh_pass: "changeme"
    become: true
    ansible_become_pass: "changeme"
  children:
    diskover:
      children:
        web:
          hosts:
            10.0.1.10:
              hostname: diskover-web.example.com
        rabbitmq:
          hosts:
            10.0.1.20:
              hostname: diskover-mq.example.com
        worker:
          hosts:
            10.0.1.30:
              hostname: diskover-worker-1.example.com
        elasticsearch:
          hosts:
            10.0.1.40:
              hostname: diskover-es-1.example.com
After (added second worker at 10.0.1.31):
worker:
  hosts:
    10.0.1.30:
      hostname: diskover-worker-1.example.com
    10.0.1.31:
      hostname: diskover-worker-2.example.com
That's the only change needed in the inventory. No changes are needed in all.yml — the existing RabbitMQ and Elasticsearch connection variables will be reused for the new worker.
Reminder: The hostname field is only used by the Elasticsearch role for node.name and cluster.initial_master_nodes. For worker hosts, it's optional metadata and does not change the system hostname. See the Inventory Guide for more detail.
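Before running the playbook, it can help to confirm that Ansible sees the new host in the worker group and can log in to it. The sketch below uses the standard ansible-inventory --graph and ansible -m ping commands. The function name and the two override variables are illustrative assumptions, and the inventory path and IP are the ones used in this guide.

```shell
# Hypothetical wrapper; both command names can be overridden.
ANSIBLE_INVENTORY="${ANSIBLE_INVENTORY:-ansible-inventory}"
ANSIBLE="${ANSIBLE:-ansible}"

verify_inventory_host() {
    inventory="$1"   # e.g. inventory/hosts.yml
    host="$2"        # e.g. 10.0.1.31
    # --graph prints the group tree; the new host should appear under "worker".
    if ! $ANSIBLE_INVENTORY -i "$inventory" --graph worker | grep -qF -- "$host"; then
        echo "$host is not in the worker group of $inventory"
        return 1
    fi
    # The ping module checks SSH login and Python on the target; it is not ICMP.
    $ANSIBLE -i "$inventory" "$host" -m ping
}

# Example:
# verify_inventory_host inventory/hosts.yml 10.0.1.31
```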
Step 3: Run the Playbook with --limit
You don't need to re-run the full playbook against every host. Instead, use the --limit flag to target only the new worker host. This is faster, safer, and avoids any possibility of disturbing your existing deployment.
Run the playbook against the new worker only:
time ansible-playbook -i inventory/hosts.yml install_diskover.yml --limit 10.0.1.31
This will:
Connect to only the new worker host
Configure the package repository (JFrog or offline)
Disable firewalld and SELinux
Install Python 3.x and dependencies
Install Diskoverd (scanner daemon)
Install Celery (file actions task queue)
Install Diskover tools (scanners, plugins, file actions)
Enable the services so they start on boot (you start them manually in Step 4)
The web, elasticsearch, and rabbitmq host groups are skipped entirely because they're not in the --limit set.
Verifying the PLAY RECAP
When the playbook finishes, the PLAY RECAP should show only your new worker host with failed=0:
PLAY RECAP *********************************************************************
10.0.1.31 : ok=45 changed=30 unreachable=0 failed=0
If you see any failures, consult the Troubleshooting guide before proceeding.
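If you capture the playbook output to a file, the recap check can be scripted. This is a rough sketch that assumes the recap lines look like the example above, as printed to the terminal. The function name recap_ok is made up for illustration.

```shell
# Hypothetical helper: read PLAY RECAP host lines on stdin and fail if any
# host reports a non-zero "failed" or "unreachable" count.
recap_ok() {
    awk '
        / : ok=/ {
            hosts++
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                if ((kv[1] == "failed" || kv[1] == "unreachable") && kv[2] + 0 > 0) {
                    print "problem on " $1 ": " $i
                    bad = 1
                }
            }
        }
        # Also fail when no recap line was found at all.
        END { if (bad || hosts == 0) exit 1 }
    '
}

# Example, assuming the run was saved with "| tee run.out":
# recap_ok < run.out && echo "recap clean"
```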
Step 4: Start the Services and Verify the New Worker
Once the playbook completes, the Diskoverd and Celery packages are installed but the services are not started automatically by Ansible. You need to start them manually on the new worker before it can pick up tasks.
1. SSH into the new worker:
ssh diskover@10.0.1.31
2. Start Diskoverd:
systemctl start diskoverd
Watch the startup logs to confirm it started cleanly and registered with the Diskover Web API:
journalctl -fu diskoverd
You should see the diskoverd banner and the version number, with no connection errors. Press Ctrl+C to stop following the log.
Verify the service status:
systemctl status diskoverd
You should see active (running).
3. Start Celery:
systemctl start celery
Verify the service status:
systemctl status celery
You should see active (running).
Tail the Celery log to confirm it connected to RabbitMQ and is ready to accept file action tasks:
tail -f /opt/diskover/diskover_celery/log/celery.log
4. Check the Diskover Web UI:
Open Diskover Web in your browser and navigate to the Task Panel > Workers view. Your new worker should appear in the list alongside the existing ones and show as online.
5. Submit a test task:
From the Diskover Web UI, kick off a small scan or file action. Scan tasks are assigned to workers in the Diskover Web UI, so over the next few tasks you should see the new worker picking up work.
Note: Both Diskoverd and Celery are configured to start automatically on boot (via systemd), so after this initial start you won't need to start them again unless the host is rebooted and they fail to come up — or you stopped them manually.
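The status checks in this step can be wrapped in a small polling loop, which is handy when starting services on several new workers. The sketch below relies only on systemctl is-active. The function name, the SYSTEMCTL override, and the default of ten one-second attempts are arbitrary choices for illustration.

```shell
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Poll "systemctl is-active" until the unit is active or the attempts run out.
wait_for_active() {
    unit="$1"
    attempts="${2:-10}"   # number of checks, one second apart
    while [ "$attempts" -gt 0 ]; do
        if $SYSTEMCTL is-active --quiet "$unit"; then
            echo "$unit is active"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    echo "$unit did not become active"
    return 1
}

# Example:
# wait_for_active diskoverd && wait_for_active celery
```

An active unit only means the process is running. The log checks and the Workers view described above are still what confirm the worker registered.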
Scaling to More Than Two Workers
The process is identical for adding a third, fourth, or Nth worker. Repeat Steps 1-4 for each new host:
worker:
  hosts:
    10.0.1.30:
      hostname: diskover-worker-1.example.com
    10.0.1.31:
      hostname: diskover-worker-2.example.com
    10.0.1.32:
      hostname: diskover-worker-3.example.com
    10.0.1.33:
      hostname: diskover-worker-4.example.com
And run the playbook with --limit targeting all new hosts at once (comma-separated):
time ansible-playbook -i inventory/hosts.yml install_diskover.yml --limit 10.0.1.32,10.0.1.33
Or use the group-level limit to target all workers (including existing ones — which will be a no-op if they're already up to date):
time ansible-playbook -i inventory/hosts.yml install_diskover.yml --limit worker
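When the list of new hosts gets long, the comma-separated --limit value can be built from arguments, and ansible-playbook's --list-hosts flag can preview which hosts a limit matches without running any tasks. A small sketch follows; join_limit is a hypothetical helper name.

```shell
# Hypothetical helper: join host arguments into a comma-separated --limit value.
join_limit() {
    limit=""
    for host in "$@"; do
        limit="${limit:+$limit,}$host"   # add a comma only between entries
    done
    echo "$limit"
}

# Example: preview the targeted hosts, then rerun without --list-hosts.
# ansible-playbook -i inventory/hosts.yml install_diskover.yml \
#     --limit "$(join_limit 10.0.1.32 10.0.1.33)" --list-hosts
```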
Task Distribution and Worker Load
Diskover uses two different mechanisms to distribute work across workers:
Scan tasks (Diskoverd to Diskover Web API):
Each Diskoverd instance registers with the Diskover Web API and polls for scan tasks assigned to it.
A single scan task always runs on one worker end-to-end — it is not split across multiple workers.
You can see and manage per-worker scan assignments in Diskover Web under the Task Panel view.
File action tasks (Celery to RabbitMQ):
File actions are routed to the specific Celery worker that originally scanned the index the action is operating on. If worker-1 scanned an index for /data1, file actions against that index will be handled by worker-1's Celery, not a random worker in the pool.
This means file action load distribution is a direct consequence of how you distribute scans. If most of your scans run on one worker, most of the file actions against those indexes will also land on that worker.
General guidance:
Concurrency settings live in Diskover Admin — Each worker's concurrency is controlled in Diskover Admin at Diskoverd > worker-name > Work Threads. If a worker is actively running tasks while other tasks remain in a "waiting" state, the worker has likely hit its Work Threads limit. Increase the Work Threads value, then restart the Diskover Task Worker service on that host for the change to take effect.
No external load balancer is needed — Workers coordinate with the Diskover Web API for scan tasks, and Celery on each worker consumes file action tasks from RabbitMQ. Neither path requires a separate load balancer in front of it.
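To make the link between scan placement and file action load concrete, here is a toy model. It is not how Diskover is implemented. It only assumes the routing rule described above, and the input format and index names are invented for the example.

```shell
# Toy model only. Input lines on stdin:
#   "scan <index> <worker>"  records which worker scanned an index
#   "action <index>"         is a file action against that index
# Output: number of file actions landing on each worker.
file_action_load() {
    awk '
        $1 == "scan" { owner[$2] = $3 }
        $1 == "action" && ($2 in owner) { load[owner[$2]]++ }
        END { for (w in load) print w, load[w] }
    ' | sort
}

# Example:
# printf 'scan idx_data1 worker-1\naction idx_data1\n' | file_action_load
```

In this model, moving an index's scan to another worker moves every later file action against that index along with it.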
Removing a Worker (Scaling Down)
If you need to remove a worker — decommissioning a host, rightsizing, or moving work to a different machine — follow these steps:
1. Stop the services on the worker:
ssh diskover@10.0.1.33
systemctl stop diskoverd
systemctl stop celery
2. Disable the services so they don't come back on reboot:
systemctl disable diskoverd
systemctl disable celery
3. Confirm the worker has drained — in Diskover Admin, verify the worker no longer shows as online and any in-flight tasks have completed or been reassigned.
4. Remove the host from your Ansible inventory:
Edit inventory/hosts.yml and remove the entry for that host:
worker:
  hosts:
    10.0.1.30:
      hostname: diskover-worker-1.example.com
    10.0.1.31:
      hostname: diskover-worker-2.example.com
    # Removed 10.0.1.33
5. (Optional) Clean up the host — if you're repurposing the machine, you can uninstall Diskover packages manually. The playbook does not include an uninstall routine, so these steps need to be run directly on the worker host.
RHEL/Rocky/CentOS:
dnf remove 'diskover*' -y
rm -rf /opt/diskover
rm -f /root/.config/diskoverd/config.yaml
Debian/Ubuntu:
apt-get remove --purge 'diskover*' -y
apt-get autoremove -y
rm -rf /opt/diskover
rm -f /root/.config/diskoverd/config.yaml
Note: These steps remove Diskover components only. Python 3.x, supporting system packages (e.g. from common/python-pip), and OS-level changes like disabled firewalld/SELinux are left in place. If you need a completely clean host, a fresh OS reinstall is the safest option.
Important: Always stop and drain a worker before removing it from the inventory. Yanking a host while it has in-flight tasks can cause those tasks to fail or not get picked up at all.
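Steps 1 and 2 above can be combined into one function that stops at the first error. This is a sketch to run on the worker being removed with sufficient privileges. The function name and the SYSTEMCTL override are illustrative.

```shell
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Stop and then disable both worker services, stopping at the first error.
retire_worker_services() {
    for unit in diskoverd celery; do
        $SYSTEMCTL stop "$unit" || return 1
        $SYSTEMCTL disable "$unit" || return 1
        echo "$unit stopped and disabled"
    done
}

# Example:
# retire_worker_services
```

It does not check for in-flight tasks, so the drain check in step 3 still applies.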
Upgrading Workers
When you upgrade your Diskover deployment, all worker hosts are upgraded together by default (they all receive the new version of Diskoverd, Celery, and the Diskover tools). See the Running Playbooks — Upgrades guide for the full procedure.
If you want to stage an upgrade (upgrade one worker first, verify, then roll out to the rest), use --limit to target specific workers:
time ansible-playbook -i inventory/hosts.yml install_diskover.yml --limit 10.0.1.31
Verify the upgraded worker is processing tasks correctly before continuing. Then upgrade the rest:
time ansible-playbook -i inventory/hosts.yml install_diskover.yml --limit 10.0.1.30,10.0.1.32
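The staged approach can also be written as a loop that upgrades one worker per playbook run and halts on the first failure. The sketch below reuses the playbook command from this guide. The function name and the PLAYBOOK_CMD override are assumptions, and the manual verification between hosts is not automated here.

```shell
PLAYBOOK_CMD="${PLAYBOOK_CMD:-ansible-playbook}"

# Upgrade workers one at a time; stop at the first failure so the
# remaining workers stay on the old version.
rolling_upgrade() {
    for host in "$@"; do
        echo "upgrading $host"
        if ! $PLAYBOOK_CMD -i inventory/hosts.yml install_diskover.yml --limit "$host"; then
            echo "upgrade failed on $host, stopping"
            return 1
        fi
    done
}

# Example:
# rolling_upgrade 10.0.1.31 10.0.1.30 10.0.1.32
```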
Troubleshooting
New worker not showing up in Diskover Web
Check 1: The worker can reach the Diskover Web API, RabbitMQ, and Elasticsearch.
ssh diskover@NEW_WORKER_IP
nc -zv WEB_HOST 8000
nc -zv RABBITMQ_HOST 5672
nc -zv ELASTICSEARCH_HOST 9200
Check 2: Diskoverd and Celery services are running.
systemctl status diskoverd
systemctl status celery
Check 3: The Diskoverd config file points to the correct Diskover Web API host.
cat /root/.config/diskoverd/config.yaml
Check 4: There are no connection errors in the service logs.
journalctl -fu diskoverd
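For Check 4, a filter can narrow a long log down to lines that look like connection problems. The patterns below are guesses at common wording, not Diskover's actual log messages, and find_connection_errors is a hypothetical name.

```shell
# Hypothetical helper: read log text on stdin and print lines that look
# like connection problems, or a short notice when nothing matches.
find_connection_errors() {
    grep -i -E 'error|refused|timed out|unreachable' || echo "no matching lines"
}

# Example: scan the last 200 diskoverd journal lines.
# journalctl -u diskoverd --no-pager -n 200 | find_connection_errors
```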
Playbook fails when running against the new worker
Consult the Troubleshooting guide. Common issues:
SSH authentication differences between the new host and existing hosts
Firewall blocking the new worker's outbound traffic to the Diskover Web API, RabbitMQ, or Elasticsearch
Package repository access differences (e.g. new host is in a network segment without JFrog access)
Support
If you run into issues that aren't covered here or in the Troubleshooting guide:
Support portal: https://support.diskoverdata.com
Knowledge base: https://support.diskoverdata.com/hc/en-us
When submitting a ticket for a scaling issue, include:
The ansible.log file from the playbook run
The PLAY RECAP output
The inventory/hosts.yml file (with credentials redacted)
The output of systemctl status diskoverd and systemctl status celery from the affected worker
Service logs from the new worker (systemctl start diskoverd ; journalctl -u diskoverd)