SpectraLogic / RioBroker
License: PRO+ (Professional Edition or higher)
Module Type: Alternate Scanner
Author: Diskover Data, Inc.
Overview
The Diskover SpectraLogic / RioBroker Scanner indexes object storage managed by Spectra Logic's RioBroker middleware directly into Diskover. If your organization relies on Spectra Logic BlackPearl systems for tape-based archival or tiered object storage, this scanner gives you full visibility into that data — searchable, analyzable, and manageable through the Diskover Web UI, just like any other indexed storage.
The scanner connects to the RioBroker REST API, enumerates your buckets, and converts the object hierarchy into standard Diskover documents in Elasticsearch. Once indexed, your archived data appears alongside all your other storage in Diskover's file tree, search, and analytics views.
Why Use This Scanner?
Archive and Tape Storage Indexing — See everything stored across your BlackPearl tape libraries without needing to access tape systems directly. Search, tag, and analyze archived data as easily as active storage.
Storage Capacity Analysis — Understand how storage is distributed across RioBroker-managed buckets. Identify large files, analyze growth trends, and plan capacity with real data.
Multi-Tier Storage Management — Combine this scanner with other Diskover scanners to create a unified view of data across active, nearline, and deep archive tiers. One search interface for all your storage.
Data Migration Planning — Build a complete inventory of objects before migrating from BlackPearl to another platform. Understand file types, sizes, and distribution patterns to plan effectively.
Compliance and Data Governance — Maintain searchable indexes of compliance-related files in archival storage. Respond to audits and governance requests efficiently using Diskover's tagging and search capabilities.
Sample metadata from a Spectra / RioBroker scan:
For each file and folder in a Spectra scan, the scanner adds extra metadata fields whenever those fields exist in the RioBroker API response for that object.
Understanding Spectra Logic RioBroker
Spectra Logic's RioBroker is a middleware layer that provides RESTful API access to BlackPearl object storage systems. It sits between your applications and the underlying BlackPearl hardware (tape libraries, disk storage), abstracting the complexity of managing data across different storage tiers.
Here are a few key concepts that will help you configure the scanner:
Key Concepts
Concept | What It Means |
|---|---|
RioBroker | The REST API middleware service that manages access to BlackPearl storage. The scanner communicates exclusively with this API. |
Broker | A logical grouping in RioBroker that maps to a storage configuration. Each broker is associated with an agent and a bucket. |
Agent | A RioBroker component that defines how data flows to and from a bucket. The scanner expects a 1:1 relationship between brokers and agents. |
Bucket | A storage container on the BlackPearl system. This is the unit you scan — each scan targets one bucket. |
Virtual Path | The path where scanned bucket contents appear in Diskover's file tree. You choose this when running a scan, so multiple buckets can be organized under a common root. |
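To make the broker, agent, and bucket relationship concrete, here is a minimal Python sketch of the 1:1 check described above. The input shape (a list of dicts with hypothetical name, bucket, and agents keys) is an assumption for illustration. The real RioBroker API response may be structured differently, and this is not the scanner's own code.

```python
from __future__ import annotations

from typing import Any


def map_brokers_to_buckets(brokers: list[dict[str, Any]]) -> dict[str, str]:
    """Return {broker name: bucket name}, enforcing exactly one agent per broker.

    Hypothetical input shape: each broker dict has "name", "bucket", and an
    "agents" list. Adjust the keys to match real RioBroker responses.
    """
    mapping: dict[str, str] = {}
    for broker in brokers:
        agents = broker.get("agents", [])
        if len(agents) != 1:
            # The documented rule: one broker, one agent, one bucket.
            raise ValueError(
                f"Each broker should have exactly one agent; "
                f"{broker['name']!r} has {len(agents)}"
            )
        mapping[broker["name"]] = broker["bucket"]
    return mapping


brokers = [
    {"name": "broker1", "bucket": "archive-bucket", "agents": [{"name": "agent1"}]},
    {"name": "broker2", "bucket": "backup-bucket", "agents": [{"name": "agent2"}]},
]
print(map_brokers_to_buckets(brokers))
# {'broker1': 'archive-bucket', 'broker2': 'backup-bucket'}
```

A broker with zero or several agents raises the same kind of error the scanner reports in the Troubleshooting table, which makes the check useful as a quick pre-flight test.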
How the Scanner Works
When you run a scan, the scanner:
1. Integrates as an Alternate Scanner with the core Diskover scanner
2. Authenticates with the RioBroker API and obtains a JWT
3. Enumerates objects and directories within your specified bucket
4. Converts each object into a Diskover-compatible document with standard file metadata and Spectra-specific fields
5. Hands documents to Diskover's core indexing pipeline for Elasticsearch upload
6. Automatically refreshes the authentication token during long-running scans
The result is a fully searchable Diskover index that reflects the contents of your BlackPearl storage, complete with file sizes, directory structures, custom metadata fields, and automatic tagging.
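The automatic token refresh can be sketched as a background asyncio task that re-authenticates on a fixed interval while the scan runs. This is an illustrative sketch, not the scanner's implementation: fetch_token stands in for the POST to /api/tokens, and refresh_interval plays the role of the token refresh setting (50 minutes by default, per the configuration table).

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


class TokenRefresher:
    """Hypothetical sketch: fetch a token once, then refresh it on a timer."""

    def __init__(self, fetch_token: Callable[[], Awaitable[str]], refresh_interval: float) -> None:
        self._fetch_token = fetch_token            # stand-in for POST /api/tokens
        self._refresh_interval = refresh_interval  # seconds between refreshes
        self.token: str | None = None
        self._task: asyncio.Task | None = None

    async def start(self) -> None:
        # Authenticate before any scan requests are sent.
        self.token = await self._fetch_token()
        self._task = asyncio.create_task(self._refresh_loop())

    async def _refresh_loop(self) -> None:
        while True:
            await asyncio.sleep(self._refresh_interval)
            # Replace the token; requests made after this point read the new value.
            self.token = await self._fetch_token()

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass


async def demo() -> list[str | None]:
    counter = 0

    async def fake_fetch() -> str:
        nonlocal counter
        counter += 1
        return f"token-{counter}"

    refresher = TokenRefresher(fake_fetch, refresh_interval=0.01)
    await refresher.start()
    seen = [refresher.token]
    await asyncio.sleep(0.035)  # pretend this is a long-running scan
    seen.append(refresher.token)
    await refresher.stop()
    return seen


print(asyncio.run(demo()))
```

Refreshing on a timer, rather than waiting for a 401, keeps a long walk from failing partway through; the interval only has to stay shorter than the token's lifetime.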
Requirements
System Requirements
Component | Requirement |
|---|---|
Python | 3.11 or higher |
Operating System | Linux (recommended) or Windows |
Diskover | Core installation with alternate scanner support |
Network | HTTPS access to the RioBroker API endpoint (default port 5050) |
External Service Requirements
Service | Requirement |
|---|---|
Spectra RioBroker | Version 4.0.1 or higher |
RioBroker Credentials | Valid username and password with API access |
Python Dependencies
Package | Version | Purpose |
|---|---|---|
|  | Latest | Date parsing and manipulation |
|  | 1.0.0 | Async-compatible rate limiting |
|  | Latest | YAML configuration parsing |
|  | Latest | Async HTTP client for RioBroker API communication |
|  | Latest | High-performance event loop (Linux only; see the note in Step 2) |
Installation
Step 1: Install the Scanner Package
Linux (RPM):
dnf install diskover-scanner-spectra
Windows:
The scanner files are included with the Diskover Windows installation. No separate installation step is required.
Install locations:
Platform | Path |
|---|---|
| Linux | /opt/diskover/scanners/scandir_spectra |
| Windows | C:\Program Files\Diskover\scanners\scandir_spectra |
Step 2: Install Python Dependencies
Navigate to the scanner directory and install the required packages:
Linux:
cd /opt/diskover/scanners/scandir_spectra
python3 -m pip install -r etc/requirements.txt
Windows:
cd "C:\Program Files\Diskover\scanners\scandir_spectra"
python -m pip install -r etc\requirements.txt
Note: On Linux, the uvloop package is installed automatically for improved async performance. On Windows, the scanner uses a native high-resolution event loop instead, so no additional steps are needed.
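The platform-dependent loop choice in the note can be sketched as follows. This is an illustration under stated assumptions, not the scanner's code: it prefers uvloop on Linux when it is importable (uvloop.run needs uvloop 0.18 or newer) and otherwise falls back to asyncio.run. It does not reproduce the scanner's Windows-specific high-resolution loop.

```python
import asyncio
import sys
from typing import Any, Coroutine


def run_with_best_loop(main: Coroutine[Any, Any, Any]) -> Any:
    """Run a coroutine on uvloop (Linux, if installed) or asyncio's default loop."""
    if sys.platform.startswith("linux"):
        try:
            import uvloop  # installed automatically on Linux per the note above
        except ImportError:
            pass  # fall through to the standard loop
        else:
            return uvloop.run(main)
    # Elsewhere, use asyncio's default (Proactor on Windows since Python 3.8).
    return asyncio.run(main)


async def which_loop() -> str:
    # Report the module that implements the running loop.
    return type(asyncio.get_running_loop()).__module__


print(run_with_best_loop(which_loop()))
```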
Step 3: Verify RioBroker Connectivity
Before running a scan, confirm that the scanner can reach the RioBroker API. Test the connection with a direct API call:
curl -k -X 'POST' \
'https://YOUR_RIOBROKER_HOST:5050/api/tokens' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"username": "spectra",
"password": "spectra"
}'
A successful response returns a JWT token, confirming API access. If this fails, check your network connectivity, firewall rules (port 5050 must be open for HTTPS), and credentials.
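For scripted checks, the same request can be made from Python with only the standard library. This sketch assumes the endpoint and JSON body shown in the curl command. Because the name of the response field that carries the JWT is not documented here, it returns the parsed JSON as-is. The jwt_expiry helper reads the standard exp claim without verifying the signature, which is enough to see how long a token lives.

```python
from __future__ import annotations

import base64
import json
import ssl
import urllib.request


def request_token(host: str, username: str, password: str, port: int = 5050) -> dict:
    """POST credentials to /api/tokens, mirroring the curl check above.

    Certificate checks are disabled to match `curl -k`; keep them enabled
    when the RioBroker certificate is trusted.
    """
    body = json.dumps({"username": username, "password": password}).encode()
    req = urllib.request.Request(
        f"https://{host}:{port}/api/tokens",
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
        return json.loads(resp.read())


def jwt_expiry(token: str) -> int | None:
    """Return the JWT's `exp` claim (Unix seconds), or None if it has none."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims.get("exp")
```

For example, request_token("riobroker.example.com", "spectra", "spectra") (hypothetical host) returns the response body; passing the token string from it to jwt_expiry gives its expiry as a Unix timestamp, if the token carries one. If that lifetime is shorter than the scanner's token refresh interval, lower the interval.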
Step 4: Test the Scanner Connection
The scanner includes a test utility that authenticates with RioBroker and lists all available brokers and buckets:
Linux:
cd /opt/diskover/scanners/scandir_spectra
python3 test.py
Windows:
cd "C:\Program Files\Diskover\scanners\scandir_spectra"
python test.py
Important: The test.py script uses hardcoded credentials. Before running it, edit the rio_broker_user, rio_broker_password, rio_broker_api_domain, and rio_broker_api_port variables at the top of test.py to match your environment.
Expected output shows each broker and its associated bucket:
2024-01-15 10:30:45 - spectra_scanner - INFO - Broker: {'name': 'broker1', ...} Bucket: archive-bucket
2024-01-15 10:30:46 - spectra_scanner - INFO - Broker: {'name': 'broker2', ...} Bucket: backup-bucket
Configuration
The Spectra scanner is configured through the Diskover Admin UI. Navigate to Diskover > Alternate Scanners > Spectra to access the configuration panel.
Configuration via Diskover Admin
1. Navigate to Diskover Admin > Diskover > Alternate Scanners > Spectra.
2. Enter your RioBroker connection details (hostname/IP, port, credentials).
3. Adjust performance tuning parameters as needed.
4. Save the configuration.
Sample Configuration in Diskover Admin:
This is the beginning of a sample configuration. The Spectra scanner has many more configuration options, covered in detail below.
Configuration Parameters
Parameter | Type | Default | Description |
|---|---|---|---|
|  | string |  | Username for RioBroker API authentication |
|  | string |  | Password for RioBroker API authentication |
|  | string |  | Hostname or IP address of the RioBroker server |
|  | integer | 5050 | TCP port for the RioBroker HTTPS API |
|  | integer |  | Number of items to request per API page (pagination size) |
|  | integer | 3000 | Seconds between automatic token refreshes (default is 50 minutes) |
| concurrent_request_limit | integer |  | Maximum concurrent requests to the RioBroker API |
| request_limit_per_minute | integer |  | Maximum API requests per minute (rate limiting) |
| walk_workers | integer |  | Number of async workers for directory traversal |
| log_below_depth | integer |  | Directory depth threshold for logging. Set to 999 to log all directory traversal activity. |
Warning: Increasing concurrent_request_limit and request_limit_per_minute beyond their defaults may destabilize the RioBroker server. Test changes incrementally in non-production environments and monitor RioBroker server health.
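The two limits in the warning can be pictured with a small asyncio sketch. It is not the scanner's code (the scanner lists a dedicated async rate-limiting package among its dependencies); this version uses only standard-library primitives. A semaphore caps in-flight requests (the concurrent_request_limit role), and a lock spaces request starts 60 / request_limit_per_minute seconds apart (the request_limit_per_minute role). The numbers in the demo are illustrative.

```python
from __future__ import annotations

import asyncio
import time


class RequestThrottle:
    """Hypothetical sketch: cap in-flight requests and space request starts."""

    def __init__(self, concurrent_request_limit: int, request_limit_per_minute: int) -> None:
        self._slots = asyncio.Semaphore(concurrent_request_limit)
        self._spacing = 60.0 / request_limit_per_minute  # seconds between starts
        self._gate = asyncio.Lock()
        self._next_start = 0.0

    async def __aenter__(self) -> RequestThrottle:
        await self._slots.acquire()
        try:
            async with self._gate:
                now = time.monotonic()
                delay = self._next_start - now
                if delay > 0:
                    await asyncio.sleep(delay)
                self._next_start = max(now, self._next_start) + self._spacing
        except BaseException:
            self._slots.release()  # don't leak a slot if cancelled while waiting
            raise
        return self

    async def __aexit__(self, *exc_info: object) -> bool:
        self._slots.release()
        return False


async def demo(n_requests: int = 20) -> tuple[int, int]:
    throttle = RequestThrottle(concurrent_request_limit=3, request_limit_per_minute=6000)
    in_flight = peak = done = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak, done
        async with throttle:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for a RioBroker API call
            in_flight -= 1
            done += 1

    await asyncio.gather(*(fake_request() for _ in range(n_requests)))
    return peak, done


print(asyncio.run(demo()))
```

Evenly spaced starts are stricter than a rolling per-minute window, which would allow short bursts. Either way, lowering both values is the lever to pull when RioBroker struggles.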
Configuration Examples
Example 1: Standard Configuration
A typical production setup connecting to a dedicated RioBroker server:
Parameter | Value |
|---|---|
Example 2: Conservative / Shared-Server Configuration
When the RioBroker server is shared with other applications or under heavy load, use lower concurrency settings:
Parameter | Value |
|---|---|
Usage
The Spectra scanner integrates with Diskover using the standard --altscanner flag. It replaces Diskover's default filesystem walker with one that reads from RioBroker-managed buckets.
Command Syntax
Linux:
cd /opt/diskover
python3 diskover.py --altscanner scandir_spectra <bucket_name>/<path>
Windows:
cd "C:\Program Files\Diskover"
python diskover.py --altscanner scandir_spectra <bucket_name>/<path>
Path Format Reference
The scan path uses the format <bucket_name>/<optional_subpath>. The bucket name is the first component of the path, and the scanner automatically maps it into Diskover's file tree.
Path Format | Description | Example |
|---|---|---|
<bucket_name> | Scan the entire bucket | archive-bucket |
<bucket_name>/<path> | Scan a specific subdirectory within the bucket | archive-bucket/2024/reports |
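The bucket/path split can be sketched as a small helper. The function name is hypothetical. Normalizing the remainder to a leading slash matches the "bucket=graphics, path=/" form that appears in the example CLI output later in this article.

```python
from __future__ import annotations


def parse_scan_path(scan_path: str) -> tuple[str, str]:
    """Split '<bucket_name>/<optional_subpath>' into (bucket, path)."""
    parts = [p for p in scan_path.strip("/").split("/") if p]  # drop empty segments
    if not parts:
        raise ValueError("scan path must start with a bucket name")
    bucket, rest = parts[0], parts[1:]
    return bucket, "/" + "/".join(rest)


print(parse_scan_path("graphics"))                     # ('graphics', '/')
print(parse_scan_path("archive-bucket/2024/reports"))  # ('archive-bucket', '/2024/reports')
```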
Usage Examples
Basic Single Bucket Scan
Index an entire bucket with default settings:
Linux:
cd /opt/diskover
python3 diskover.py --altscanner scandir_spectra archive-bucket
Windows:
cd "C:\Program Files\Diskover"
python diskover.py --altscanner scandir_spectra archive-bucket
Scan a Specific Subdirectory
Index only a specific path within a bucket:
python3 diskover.py --altscanner scandir_spectra archive-bucket/2024/reports
Custom Index Name
Specify a custom Elasticsearch index name:
python3 diskover.py --altscanner scandir_spectra -i diskover-spectra-archive archive-bucket
Verbose / Debug Logging
Enable verbose logging for troubleshooting:
python3 diskover.py --altscanner scandir_spectra --loglevel DEBUG archive-bucket
Example output from a Spectra CLI Scan:
python3 /opt/diskover/diskover.py --altscanner spectra -i diskover-spectra-graphics-archive graphics
diskover - INFO - Using alternate scanner spectra
scanners.scandir_spectra.scandir_spectra - INFO - Initializing Spectra scanner
scanners.scandir_spectra.scandir_spectra - INFO - Checking Spectra path: bucket=graphics, path=/
scanners.scandir_spectra.scandir_spectra - INFO - Scanning Spectra directory: graphics
scanners.scandir_spectra.spectra_scanner.spectra_rio_broker.client - INFO - Graph request limit per minute: 6000, concurrency limit: 10
scanners.scandir_spectra.spectra_scanner.spectra - INFO - Initializing Spectra client with 8 workers
diskover - INFO - configuration Default
diskover - INFO - Creating index diskover-spectra-graphics-archive...
diskover_elasticsearch - INFO - Tuning index settings for crawl
diskover - INFO - No plugins loaded
diskover - INFO - maxwalkthreads set to 8
diskover - INFO - maxthreads set to 8
diskover - INFO - maxthreaddepth set to 999
diskover - INFO - indexthreads set to 16
diskover - INFO - Enqueuing dir tree graphics
scanners.scandir_spectra.scandir_spectra - INFO - Scanning Spectra directory: graphics
scanners.scandir_spectra.scandir_spectra - INFO - Scanning Spectra directory: graphics/GFX_Main
diskover - INFO - 1 paths still scanning ['graphics/GFX_Main'], memory usage 98.36 MB)
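When several buckets each need their own index, the commands above can be scripted. This sketch only assembles diskover.py arguments already shown in this article (--altscanner scandir_spectra, -i, and the bucket path) and runs the scans one at a time. The index names and the dry_run switch are illustrative, and the working directory assumes the Linux install location.

```python
from __future__ import annotations

import shlex
import subprocess
import sys

DISKOVER_DIR = "/opt/diskover"  # Linux install location used in this article


def build_scan_command(bucket_path: str, index_name: str | None = None) -> list[str]:
    """Assemble the diskover.py argv for one Spectra scan (flags as documented above)."""
    # Uses the interpreter running this script in place of a literal "python3".
    cmd = [sys.executable, "diskover.py", "--altscanner", "scandir_spectra"]
    if index_name:
        cmd += ["-i", index_name]
    cmd.append(bucket_path)
    return cmd


def scan_buckets(targets: dict[str, str], dry_run: bool = True) -> list[int]:
    """Scan each bucket path into its index, one at a time; return exit codes."""
    codes: list[int] = []
    for bucket_path, index_name in targets.items():
        cmd = build_scan_command(bucket_path, index_name)
        print(shlex.join(cmd))
        if dry_run:
            codes.append(0)  # only print the command
            continue
        codes.append(subprocess.run(cmd, cwd=DISKOVER_DIR).returncode)
    return codes


# Hypothetical bucket-to-index plan; pass dry_run=False to actually run the scans.
scan_buckets({
    "archive-bucket": "diskover-spectra-archive",
    "backup-bucket": "diskover-spectra-backup",
})
```

Running the scans sequentially keeps the load on RioBroker predictable. For recurring scans, an Index Task is the built-in alternative.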
Integration with Index Tasks
The Spectra scanner can be configured as part of a Diskover Index Task for scheduled or automated scanning.
Field | Value |
|---|---|
Alternate Scanner | scandir_spectra |
When configuring an Index Task in the Diskover Admin UI, set the Alternate Scanner field to scandir_spectra and specify the bucket path as the scan target.
Example of a Diskover Index Task Scan:
Performance Tips
Start conservative: Begin with the default settings and increase concurrency only after confirming the RioBroker server handles the load well.
Tune walk_workers: This controls how fast the scanner traverses directories. Increase it for large, deep directory trees; decrease it if the RioBroker server is under load.
Adjust concurrent_request_limit carefully: Higher concurrency speeds up scanning but can destabilize the RioBroker server. Monitor server health when increasing this value.
Use log_below_depth: Set it to a low number in production to reduce log noise, and to 999 during troubleshooting to see all directory traversal activity.
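The roles of walk_workers and log_below_depth can be illustrated with a toy traversal: a fixed pool of asyncio workers pulls directories from a shared queue, and a directory is logged only while its depth is below the threshold. The in-memory FAKE_TREE and the list_dirs function are hypothetical stand-ins for RioBroker listing calls, and the "log only below the threshold" rule is an assumption about how the setting behaves.

```python
from __future__ import annotations

import asyncio
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s - %(message)s")
log = logging.getLogger("spectra_walk_sketch")

# Hypothetical bucket layout standing in for RioBroker directory listings.
FAKE_TREE = {
    "/": ["2023", "2024"],
    "/2023": ["q1"],
    "/2023/q1": [],
    "/2024": ["reports"],
    "/2024/reports": [],
}


async def list_dirs(path: str) -> list[str]:
    await asyncio.sleep(0)  # yield control, as a real API call would
    return FAKE_TREE.get(path, [])


async def walk(root: str, walk_workers: int, log_below_depth: int) -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    visited: list[str] = []
    queue.put_nowait((root, 0))

    async def worker() -> None:
        while True:
            path, depth = await queue.get()
            try:
                if depth < log_below_depth:
                    log.info("Scanning directory %s (depth %d)", path, depth)
                visited.append(path)
                for name in await list_dirs(path):
                    # Children are queued before task_done, so join() can't finish early.
                    queue.put_nowait((f"{path.rstrip('/')}/{name}", depth + 1))
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(walk_workers)]
    await queue.join()  # wait until every queued directory is processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return visited


print(sorted(asyncio.run(walk("/", walk_workers=8, log_below_depth=2))))
# ['/', '/2023', '/2023/q1', '/2024', '/2024/reports']
```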
Metadata Fields / Elasticsearch Mappings
The Spectra scanner adds several custom metadata fields to each indexed document beyond standard Diskover file metadata. These fields are automatically mapped in Elasticsearch when the scanner runs.
Field Mappings
Field Path | ES Type | Description |
|---|---|---|
bucket | keyword | Name of the RioBroker bucket containing the file |
spectra_owner | keyword | Owner of the file on the Spectra storage system |
spectra_group | keyword | Group associated with the file on the Spectra storage system |
spectra_mode | integer | POSIX-style file permission mode |
creation_date | date | Original creation date of the file in the Spectra system |
| — | Virtual path of the file within the bucket (added via metadata, not mapped separately) |
Elasticsearch Mapping Definition
{
"mappings": {
"properties": {
"bucket": {
"type": "keyword"
},
"spectra_owner": {
"type": "keyword"
},
"spectra_group": {
"type": "keyword"
},
"spectra_mode": {
"type": "integer"
},
"creation_date": {
"type": "date"
}
}
}
}
Automatic Tagging
The scanner automatically tags every indexed file with:
bucket:<bucket_name> identifies which RioBroker bucket the file belongs to.
storage:spectra identifies the file as originating from Spectra storage.
These tags appear in the Diskover Web UI and can be used in searches and filters.
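Putting the mapping and the tags together, the Spectra-specific part of each document might be assembled as shown below. The output keys come from the mapping definition and tag list above. The input keys (owner, group, mode, created), the octal-string handling of the mode, the epoch-seconds creation time, and the use of a tags list field are assumptions for illustration, not the scanner's actual conversion code. Fields missing from the API response are left out, as described in the overview.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any


def spectra_fields(bucket: str, obj: dict[str, Any]) -> dict[str, Any]:
    """Build the Spectra-specific part of a Diskover document (illustrative sketch)."""
    doc: dict[str, Any] = {
        "bucket": bucket,
        "tags": [f"bucket:{bucket}", "storage:spectra"],
    }
    # Hypothetical input keys mapped onto the documented output fields.
    optional = {
        "spectra_owner": obj.get("owner"),
        "spectra_group": obj.get("group"),
        "spectra_mode": obj.get("mode"),
        "creation_date": obj.get("created"),
    }
    for key, value in optional.items():
        if value is None:
            continue  # skip fields the API did not return
        if key == "spectra_mode" and isinstance(value, str):
            value = int(value, 8)  # e.g. "0644" -> 420 for the integer mapping
        if key == "creation_date" and isinstance(value, (int, float)):
            value = datetime.fromtimestamp(value, tz=timezone.utc).isoformat()
        doc[key] = value
    return doc


print(spectra_fields("archive-bucket", {"owner": "alice", "mode": "0644", "created": 0}))
```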
Searching in Diskover
Once a scan completes, the Spectra-specific metadata fields and tags are available for searching in the Diskover Web UI. Here are some common search examples.
Search Query Examples
Query | Description |
|---|---|
|  | Find all files in a specific bucket |
|  | Find all files indexed from Spectra storage |
|  | Find files tagged with a specific bucket name |
|  | Find PDF files in a specific bucket |
|  | Find files larger than 1 GB across all Spectra buckets |
|  | Find files owned by a specific user on the Spectra system |
|  | Find files created in a specific date range on the Spectra system |
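The same searches can also be expressed as Elasticsearch query bodies, for example when querying an index directly. In this sketch the Spectra fields come from the mapping above, while tags, extension, and size are assumed to be Diskover's standard keyword and numeric fields. 1 GB is taken as 1 GiB (1024^3 bytes), and the date bounds are ISO 8601 strings. The sketch builds request bodies only and does not reproduce Diskover's own search syntax.

```python
from __future__ import annotations

import json
from typing import Any

GIB = 1024 ** 3


def spectra_query(bucket: str | None = None, extension: str | None = None,
                  min_size: int | None = None, owner: str | None = None,
                  created_from: str | None = None,
                  created_to: str | None = None) -> dict[str, Any]:
    """Compose an Elasticsearch bool/filter query over Spectra documents (sketch)."""
    filters: list[dict[str, Any]] = [{"term": {"tags": "storage:spectra"}}]
    if bucket:
        filters.append({"term": {"bucket": bucket}})
    if extension:
        filters.append({"term": {"extension": extension}})
    if min_size is not None:
        filters.append({"range": {"size": {"gt": min_size}}})
    if owner:
        filters.append({"term": {"spectra_owner": owner}})
    if created_from or created_to:
        bounds: dict[str, str] = {}
        if created_from:
            bounds["gte"] = created_from
        if created_to:
            bounds["lte"] = created_to
        filters.append({"range": {"creation_date": bounds}})
    return {"query": {"bool": {"filter": filters}}}


# Files larger than 1 GiB in one bucket.
print(json.dumps(spectra_query(bucket="archive-bucket", min_size=GIB), indent=2))
```

Filter clauses are used because these searches are yes/no matches that need no relevance scoring.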
Combining with Standard Diskover Search
Spectra metadata fields work alongside all standard Diskover search capabilities. You can combine them with path filters, file type filters, size ranges, and tags to build powerful queries across your entire indexed storage landscape.
Here we can see some sample metadata in Diskover from an object scanned by the Spectra scanner:
Troubleshooting
Common Issues
Issue | Cause | Solution |
|---|---|---|
Connection refused or timeout on startup | Scanner cannot reach the RioBroker API | Verify |
Authentication failed (401/403) | Invalid credentials or expired account | Verify username and password in the Diskover Admin configuration. Test credentials manually with |
"Each broker should have exactly one agent" error | Broker-agent relationship doesn't match the scanner's expectation | The scanner expects a 1:1 broker-to-agent mapping. Use |
Scan slows dramatically or RioBroker becomes unresponsive | Concurrency settings are too aggressive | Reduce |
Token refresh failures during long scans | Token expires before refresh interval | Reduce |
Incorrect data in Diskover UI after scanning | Stale browser cache | Clear browser cookies and cache, or open Diskover in an incognito/private window. Hard refresh with |
Debug Logging
Enable debug logging using the standard Diskover log level flag:
python3 /opt/diskover/diskover.py --altscanner scandir_spectra --loglevel DEBUG archive-bucket
Log File Locations:
Linux: /var/log/diskover/diskover.log
Windows: Check the Diskover service logs or the configured log location
Verifying Connectivity
Test RioBroker API access:
curl -k -X POST 'https://RIOBROKER_HOST:5050/api/tokens' \
-H 'Content-Type: application/json' \
-d '{"username": "YOUR_USER", "password": "YOUR_PASS"}'
Test scanner connectivity with the built-in test utility:
cd /opt/diskover/scanners/scandir_spectra
python3 test.py
Support
Last Updated: April 2026
Diskover Data, Inc.