License: PRO+ (Professional Edition or higher)
Module Type: Alternate Scanner
Author: Diskover Data, Inc.
Overview / Use Cases
The Offline Media Scanner lets you build a persistent, searchable catalog of removable and offline storage — tapes, external drives, removable disks — without needing every volume physically mounted at the same time. Each media volume is scanned and indexed under a virtual root directory, so your team can search across hundreds or thousands of offline volumes right from the Diskover Web UI.
Think of it as a card catalog for your physical media library. Mount a tape, scan it, swap in the next one, scan again. Diskover stitches together a unified view of everything under a single virtual path like /OFFLINE_MEDIA, with each volume neatly filed under its own label.
Who Benefits and How
Media & Entertainment Archivists — Search across thousands of archived tapes to locate project files, raw footage, or audio assets without pulling tapes from the vault. The virtual path tells you exactly which tape to retrieve.
IT Administrators & Backup Operators — Catalog collections of external USB or eSATA backup drives by serial number or asset tag. Quickly answer questions like "which drive has last quarter's database exports?" without plugging in every drive.
Compliance & Disaster Recovery Teams — Verify that critical files exist somewhere in your offline archive set. Run gap analysis across all backup volumes from a single search, supporting audit readiness and recovery planning.
Requirements
System Requirements
| Component | Requirement |
|---|---|
| Python | 3.9 or higher |
| Diskover | Core installation with alternate scanner support |
| Storage | Sufficient disk space for the SQLite cache database (typically 1–5 MB per million files indexed) |
| Media Access | A mount point (Linux/macOS) or drive letter (Windows) for the target offline media |
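As a rough sizing aid, the "1–5 MB per million files" guideline can be turned into a quick estimate. This is only a sketch: the function name and the example file count are illustrative assumptions, not part of Diskover, and real cache sizes depend on the data being scanned.

```python
def estimate_cache_size_mb(file_count, mb_per_million=(1.0, 5.0)):
    """Return a (low, high) estimate in MB for the SQLite cache.

    Based on the "typically 1-5 MB per million files indexed" guideline.
    """
    millions = file_count / 1_000_000
    low, high = mb_per_million
    return (millions * low, millions * high)


# Hypothetical example: a media library holding 250 million files in total.
low, high = estimate_cache_size_mb(250_000_000)
print(f"Estimated cache size: {low:.0f}-{high:.0f} MB")
```

For the hypothetical 250 million files this gives a range of 250 to 1250 MB, which is the figure to compare against free space on the volume holding the cache directory.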
Python Dependencies
All required packages are included with the standard Diskover installation. No additional Python dependencies are needed.
Installation
Step 1: Install Scanner Package
Linux:
dnf install diskover-scanner-offline_media
Windows:
The scanner files are included with the Diskover Windows installation. No separate installation step is required.
Install locations:
Linux:
/opt/diskover/scanners/scandir_offline_media/
Windows:
C:\Program Files\Diskover\scanners\scandir_offline_media\
Step 2: Create Cache Directory
The scanner stores directory listing data in a SQLite cache database. You need to create the cache directory before the first scan.
Linux:
sudo mkdir -p /opt/diskover/__dircache_offline_media__
sudo chown diskover:diskover /opt/diskover/__dircache_offline_media__
sudo chmod 750 /opt/diskover/__dircache_offline_media__
Windows (Command Prompt as Administrator):
mkdir "C:\Program Files\Diskover\__dircache_offline_media__"
Step 3: Verify Installation
Confirm the scanner module loads correctly:
Linux:
cd /opt/diskover
python3 -c "from scanners.scandir_offline_media import scandir_offline_media; print('Scanner module loaded successfully')"
Windows:
cd "C:\Program Files\Diskover"
python -c "from scanners.scandir_offline_media import scandir_offline_media; print('Scanner module loaded successfully')"
Step 4: Perform a Test Scan
Run a quick test scan on a mounted media volume to confirm everything works end-to-end:
Linux:
cd /opt/diskover
export MEDIA_LABEL=TEST_MEDIA_001
python3 diskover.py -i diskover-offline-test --altscanner scandir_offline_media /mnt/test_media
Windows:
cd "C:\Program Files\Diskover"
set MEDIA_LABEL=TEST_MEDIA_001
python diskover.py -i diskover-offline-test --altscanner scandir_offline_media E:\
If the scan completes without errors and you can see results in Diskover's search UI, the installation is working correctly.
Configuration
Configuration is managed through the Diskover Admin UI under Settings > Alternate Scanners > Offline Media.
Configuration Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| verbose | bool | | Enable verbose logging of cache hits and misses. Useful for debugging and monitoring cache effectiveness. Logs are written to the diskover.scandir_dircache logger. |
| root_path | string | /OFFLINE_MEDIA | The virtual root directory path under which all media volumes are organized. Each indexed volume appears as a subdirectory named after its MEDIA_LABEL. |
| cachedir | string | | Directory for the SQLite cache database. Can be an absolute or a relative path. |
| dirlist_expire | integer | 0 | Cache entry expiration time in seconds. Set to 0 to never expire. |
| load_db_mem | bool | | Load the entire SQLite database into memory at startup for faster lookups. Warning: Can cause database corruption if the scan crashes or is interrupted. Recommended to leave as false. |
Environment Variable Requirements
The scanner requires one mandatory environment variable set before each scan:
| Environment Variable | Required | Description |
|---|---|---|
| MEDIA_LABEL | Yes | A unique identifier for the media volume being scanned. This becomes the subdirectory name under root_path. |
Environment Variable Overrides
You can override certain configuration parameters at runtime using environment variables:
| Environment Variable | Overrides Parameter | Description |
|---|---|---|
| diskover_offlinemedia_root_path | root_path | Override the virtual root directory path without changing the saved configuration. |
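The override behaviour can be pictured with a short sketch. It assumes a simple precedence (environment variable first, then the saved configuration, then the documented default of /OFFLINE_MEDIA); the scanner's actual resolution logic is not shown in this article, and the function name is hypothetical. The variable name diskover_offlinemedia_root_path is taken from the runtime example in the Advanced Usage section.

```python
import os

DEFAULT_ROOT_PATH = "/OFFLINE_MEDIA"
OVERRIDE_ENV_VAR = "diskover_offlinemedia_root_path"


def resolve_root_path(saved_config, environ=None):
    """Pick the virtual root: environment override, then saved config, then default.

    The precedence order is an assumption made for illustration.
    """
    environ = os.environ if environ is None else environ
    override = environ.get(OVERRIDE_ENV_VAR)
    if override:
        return override
    return saved_config.get("root_path", DEFAULT_ROOT_PATH)


# With the override set, the saved root_path is ignored for this run only.
print(resolve_root_path({"root_path": "/OFFLINE_MEDIA"}, {OVERRIDE_ENV_VAR: "/TAPE_ARCHIVE"}))
# Without it, the saved configuration applies.
print(resolve_root_path({"root_path": "/OFFLINE_MEDIA"}, {}))
```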
Configuration via Diskover Admin
Navigate to Settings > Alternate Scanners > Offline Media
Adjust the settings as needed:
Verbose: Enable for troubleshooting cache behavior
Root Path: Set the virtual root directory (default /OFFLINE_MEDIA)
Cache Directory: Specify a custom cache location if needed
Directory List Expire: Set expiration time in seconds (0 for never)
Load DB Memory: Enable for performance (use with caution)
Save the configuration
Sample Configuration in Diskover Admin:
Here is the beginning of our sample configuration. There are many other configurations for the Offline Media Scanner, covered in detail below.
Configuration Examples
Tape Archive with Custom Root Path
If you want your tape library to appear under a different virtual root than the default:
verbose: false
root_path: /TAPE_ARCHIVE
cachedir: /opt/diskover/__dircache_offline_media__/
dirlist_expire: 0
load_db_mem: false
High-Performance External Drive Indexing
For scenarios where you're rapidly indexing many drives and want maximum speed (with the understanding that a crash could corrupt the cache):
verbose: false
root_path: /OFFLINE_MEDIA
cachedir: /opt/diskover/__dircache_offline_media__/
dirlist_expire: 3600
load_db_mem: true
Virtual Path Structure
The scanner creates a virtual directory hierarchy that organizes all your offline media under one roof:
/OFFLINE_MEDIA/ ← root_path (configurable)
├── TAPE_001/ ← MEDIA_LABEL from first scan
│ ├── projects/
│ │ ├── project_a/
│ │ └── project_b/
│ └── archive/
├── TAPE_002/ ← MEDIA_LABEL from second scan
│ └── data/
└── EXTERNAL_DRIVE_SN12345/ ← MEDIA_LABEL from third scan
└── backups/
Physical paths are transformed automatically. For example:
Physical:
/mnt/tape/projects/project_a/file.txt
Virtual:
/OFFLINE_MEDIA/TAPE_001/projects/project_a/file.txt
This means users searching in Diskover see the virtual path, which tells them both the file location within the volume and which media volume (TAPE_001) contains it.
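The physical-to-virtual mapping can be sketched in a few lines. This is an illustration of the transformation described above, not the scanner's own code: the function name is hypothetical and the sketch handles POSIX-style paths only.

```python
import posixpath


def to_virtual_path(physical_path, mount_point, media_label, root_path="/OFFLINE_MEDIA"):
    """Map a path on mounted media to its virtual path in the index."""
    # Path of the file relative to the mount point, e.g. "projects/project_a/file.txt".
    relative = posixpath.relpath(physical_path, mount_point)
    if relative == ".":
        # The mount point itself maps to the volume's virtual directory.
        return posixpath.join(root_path, media_label)
    if relative == ".." or relative.startswith("../"):
        raise ValueError(f"{physical_path} is not under {mount_point}")
    return posixpath.join(root_path, media_label, relative)


print(to_virtual_path("/mnt/tape/projects/project_a/file.txt", "/mnt/tape", "TAPE_001"))
```

Run against the example above, this prints /OFFLINE_MEDIA/TAPE_001/projects/project_a/file.txt. The mount point is replaced by the root path plus the media label, and everything below it is preserved.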
Usage / Execution
The Offline Media Scanner is a standard alternate scanner that integrates with diskover.py via the --altscanner flag.
Setting the Media Label
Before every scan, set the MEDIA_LABEL environment variable to uniquely identify the volume:
Linux/macOS:
export MEDIA_LABEL=YOUR_MEDIA_IDENTIFIER
Windows (Command Prompt):
set MEDIA_LABEL=YOUR_MEDIA_IDENTIFIER
Windows (PowerShell):
$env:MEDIA_LABEL = "YOUR_MEDIA_IDENTIFIER"
First Scan — Creating a New Index
When scanning your first media volume, create a new index:
Linux:
cd /opt/diskover
export MEDIA_LABEL=TAPE_2024_0001
python3 diskover.py -i diskover-offline-media --altscanner scandir_offline_media /mnt/tape
Windows:
cd "C:\Program Files\Diskover"
set MEDIA_LABEL=TAPE_2024_0001
python diskover.py -i diskover-offline-media --altscanner scandir_offline_media E:\
Subsequent Scans — Appending to an Existing Index
This is critical: When adding more media volumes to an existing index, you must use the -a (add to index) flag. Without -a, Diskover will overwrite the existing index and you'll lose all previously indexed media.
Swap your media, set a new label, and run with -a:
Linux:
cd /opt/diskover
export MEDIA_LABEL=TAPE_2024_0002
python3 diskover.py -i diskover-offline-media -a --altscanner scandir_offline_media /mnt/tape
Windows:
cd "C:\Program Files\Diskover"
set MEDIA_LABEL=TAPE_2024_0002
python diskover.py -i diskover-offline-media -a --altscanner scandir_offline_media E:\
Repeat this process for each additional volume — mount, set label, scan with -a.
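The mount, label, scan cycle lends itself to a small wrapper. The sketch below is a hypothetical helper, not something shipped with Diskover. It builds the same command lines shown above, adds -a for every volume after the first, and passes MEDIA_LABEL only to the child process. The /opt/diskover working directory is the Linux install location used throughout this article.

```python
import os
import subprocess


def build_scan_command(index_name, media_path, append):
    """Build the diskover.py argument list for one offline media scan."""
    cmd = ["python3", "diskover.py", "-i", index_name]
    if append:
        # -a adds to the existing index instead of overwriting it.
        cmd.append("-a")
    cmd += ["--altscanner", "scandir_offline_media", media_path]
    return cmd


def scan_volume(media_label, index_name, media_path, append, diskover_dir="/opt/diskover"):
    """Run one scan with MEDIA_LABEL set only for the child process."""
    env = dict(os.environ, MEDIA_LABEL=media_label)
    cmd = build_scan_command(index_name, media_path, append)
    return subprocess.run(cmd, cwd=diskover_dir, env=env, check=True)


# First volume creates the index; every later volume must append.
print(build_scan_command("diskover-offline-media", "/mnt/tape", append=False))
print(build_scan_command("diskover-offline-media", "/mnt/tape", append=True))
```

Calling scan_volume("TAPE_2024_0002", "diskover-offline-media", "/mnt/tape", append=True) would then correspond to the second Linux example above. Keeping the append decision as an explicit argument makes it harder to overwrite an index by accident.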
Path Format Reference
| Path Format | Description | Example |
|---|---|---|
| Linux mount point | Standard mount path | /mnt/tape |
| Windows drive letter | Drive with trailing backslash | E:\ |
| Windows UNC path | Network share | |
Advanced Usage Examples
Custom Index Name:
export MEDIA_LABEL=ARCHIVE_VOL_42
python3 diskover.py -i diskover-tape-library-2024 --altscanner scandir_offline_media /mnt/tape
Verbose/Debug Logging:
export MEDIA_LABEL=DEBUG_TAPE_001
python3 diskover.py -i diskover-offline-test --altscanner scandir_offline_media --loglevel DEBUG /mnt/tape
Custom Virtual Root Path at Runtime:
export diskover_offlinemedia_root_path="/TAPE_ARCHIVE"
export MEDIA_LABEL=TAPE_001
python3 diskover.py -i diskover-tapes --altscanner scandir_offline_media /mnt/tape
Inline Environment Variable (Linux — useful for cron jobs):
MEDIA_LABEL=TAPE_001 python3 /opt/diskover/diskover.py -i diskover-offline-media --altscanner scandir_offline_media /mnt/tape
Integration with Index Tasks
When configuring Offline Media scans through Diskover's Index Task system:
| Field | Value |
|---|---|
| Alternate Scanner | scandir_offline_media |
Note: When using Index Tasks, you still need to ensure the MEDIA_LABEL environment variable is set in the execution environment before the task runs. For automated workflows, consider setting this in the task runner's environment or in a wrapper script.
Performance Tips
Leave load_db_mem as false unless you're sure your scans won't be interrupted. The performance gain is modest for most workloads, and a crash with this enabled can corrupt the cache database.
The cache is most helpful for re-scanning the same volume. For one-time scans of media you won't re-scan, the cache still works but provides less benefit.
Use dirlist_expire: 0 (the default) for offline media, since the content of archived volumes typically doesn't change between scans.
Choose meaningful, unique media labels. Tape barcodes and drive serial numbers work well. Duplicate labels will cause confusing overlapping data in the index.
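Because duplicate labels are easy to introduce when cataloging by hand, a planned batch of labels can be checked before scanning. The helper below is a hypothetical sketch, not a Diskover feature. It compares labels case-insensitively, which is a deliberately conservative assumption, since this article does not say whether labels are case-sensitive in the index.

```python
from collections import Counter


def find_duplicate_labels(labels):
    """Return the labels that appear more than once, compared case-insensitively."""
    counts = Counter(label.strip().upper() for label in labels)
    return sorted(label for label, count in counts.items() if count > 1)


# Hypothetical batch of labels about to be scanned.
planned = ["TAPE_2024_0001", "TAPE_2024_0002", "tape_2024_0001", "EXTERNAL_DRIVE_SN12345"]
print(find_duplicate_labels(planned))
```

An empty result means every label in the batch is unique. The check covers only the list it is given, so labels already present in the index would need to be included in that list as well.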
Troubleshooting
Common Issues
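| Issue | Cause | Solution |
|---|---|---|
| ConfigurationError: The "MEDIA_LABEL" environment variable is not set! | The MEDIA_LABEL environment variable was not set before the scan. | Set the variable with export MEDIA_LABEL=YOUR_MEDIA_IDENTIFIER (Linux/macOS) or set MEDIA_LABEL=YOUR_MEDIA_IDENTIFIER (Windows) before running the scan. |
| Second scan overwrites the index, losing previously indexed media | The -a flag was omitted. | Always use -a when adding volumes to an existing index. |
| | The mount point path is incorrect, the media isn't mounted, or the diskover user lacks read permissions. | Verify the media is mounted, the path is correct, and the diskover user has read permissions. |
| Duplicate or overlapping files in search results | The same MEDIA_LABEL was used for more than one volume. | Use unique labels for every scan. Tape barcodes, drive serial numbers, or UUIDs work well. If duplicates exist, consider reindexing with unique labels. |
| Virtual paths or statistics are incorrect after scan completes | Elasticsearch connectivity issues during the post-scan index update, or a corrupted cache database. | Check Elasticsearch health. |
| Windows paths appear incorrectly in the index | Backslash escaping issues or missing trailing backslash on drive letters. | Use a trailing backslash on drive letters (E:\). |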
Debug Logging
Enable detailed logging to diagnose issues:
python3 /opt/diskover/diskover.py --altscanner scandir_offline_media --loglevel DEBUG /mnt/tape
To specifically monitor cache behavior, enable the verbose configuration parameter, which logs cache hits and misses to the diskover.scandir_dircache logger.
Log File Locations
Linux:
/var/log/diskover/diskover.log
Windows: Check Diskover service logs or the configured log output location.
Common Error Messages
ConfigurationError: The "MEDIA_LABEL" environment variable is not set!
This means the scanner could not find the MEDIA_LABEL in the current environment. This is especially common when running scans from cron, systemd, or other automated schedulers that don't inherit your shell's environment variables.
Resolution: Set the variable inline with the command:
# Cron example: MEDIA_LABEL must prefix the python3 command (not cd) so the scanner sees it
0 2 * * * cd /opt/diskover && MEDIA_LABEL=TAPE_001 python3 diskover.py -i diskover-offline -a --altscanner scandir_offline_media /mnt/tape
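For wrapper scripts launched by cron or systemd, a preflight check can fail fast with a clear message before diskover.py is started. This is a minimal sketch: the function name is hypothetical, and it only inspects the environment it is given rather than calling any Diskover API.

```python
import os


def require_media_label(environ=None):
    """Return MEDIA_LABEL, or raise if it is missing or blank."""
    environ = os.environ if environ is None else environ
    label = environ.get("MEDIA_LABEL", "").strip()
    if not label:
        raise RuntimeError("MEDIA_LABEL is not set; export it or set it inline before scanning")
    return label


# Demonstration with an explicit mapping; call require_media_label() with no
# arguments in a real wrapper to check the process environment.
print(require_media_label({"MEDIA_LABEL": "TAPE_001"}))
```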
Error creating dir cache directory
The scanner cannot create or access the cache directory.
Resolution: Verify the cache directory exists and has correct ownership and permissions:
ls -la /opt/diskover/__dircache_offline_media__
# Should be owned by the diskover user with 750 permissions
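The same check can be scripted on Linux. The sketch below is a hypothetical helper that reports a missing directory, a mode other than 750, or a lack of access for the current user. It does not look up the owner, so it is most meaningful when run as the diskover user.

```python
import os
import stat


def check_cache_dir(path, expected_mode=0o750):
    """Return a list of problems found with the cache directory (empty if none)."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    # Compare only the permission bits, e.g. 0o750.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != expected_mode:
        problems.append(f"mode is {oct(mode)}, expected {oct(expected_mode)}")
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        problems.append("current user cannot read, write, and traverse the directory")
    return problems


print(check_cache_dir("/opt/diskover/__dircache_offline_media__"))
```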
SQLite database corruption warnings
If load_db_mem was enabled and a scan was interrupted, the cache database may be corrupt.
Resolution: Delete the cache database files in the cache directory and re-scan. The cache will be rebuilt automatically:
rm -f /opt/diskover/__dircache_offline_media__/*
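Before deleting anything, it can be worth confirming that a cache file really is damaged. The sketch below uses SQLite's built-in PRAGMA integrity_check through Python's standard sqlite3 module. The function name is hypothetical and the demo runs against a throwaway database, because this article does not give the name of the cache file inside the cache directory.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def sqlite_is_healthy(db_path):
    """Return True if SQLite's integrity check reports 'ok' for the file."""
    # Open read-only so a missing file is not created by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.Error:
        # Missing, unreadable, or non-database files end up here.
        return False
    return rows == [("ok",)]


# Demo on a throwaway database; point the function at the real cache file instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_db = os.path.join(tmp, "demo.db")
    conn = sqlite3.connect(demo_db)
    conn.execute("CREATE TABLE t (x)")
    conn.commit()
    conn.close()
    print(sqlite_is_healthy(demo_db))
```

A result of False is a reasonable signal to delete the cache files and re-scan as described above. Run the check only while no scan is writing to the cache.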
Support
Last Updated: April 2026