Atempo
License: PRO+ (Professional Edition or higher)
Module Type: Alternate Scanner
Author: Diskover Data, Inc.
Overview
The Diskover Atempo Scanner connects Diskover to Atempo ADA (Advanced Digital Archive) — an enterprise archiving platform used for long-term data preservation, compliance, and hierarchical storage management. Atempo ADA moves data to tape libraries, object storage, and cloud destinations while maintaining a rich metadata catalog. The Diskover Atempo Scanner reads that catalog and indexes it into Diskover, giving you a searchable, unified view of both your active filesystems and your archived data — all from one interface.
Once indexed, archived files appear in Diskover search results just like any other file. You can filter by archive name, retention policy, tape media label, storage type, and more — without touching Atempo directly.
Use Cases
| Audience | Use Case |
|---|---|
| Compliance & Legal Teams | Rapidly locate archived records for audits, legal discovery, and regulatory requirements (GDPR, HIPAA, SOX). Verify retention policy adherence across all archives without manual browsing. |
| Storage Administrators | Get a unified view of active and archived data to support capacity planning, archive growth tracking, and storage tier optimization. |
| Data Migration Teams | Generate a complete, searchable inventory of archived content before migrating between archive systems. Use indexed metadata to scope migration projects and validate completeness. |
| End Users & Data Stewards | Locate archived files by name, original location, or date, without knowing which archive or tape holds the data. Use the indexed metadata to initiate retrieval through Atempo's native tools. |
Understanding Atempo ADA
Atempo ADA is a hierarchical storage management (HSM) and archiving platform that moves data from primary storage to lower-cost tiers (tape, object storage, cloud) while preserving the original file metadata in a PostgreSQL catalog database. Files are organized into archives — logical containers that group related data under a common name and policy.
The Diskover Atempo Scanner supports two modes for accessing archive data:
| Mode | Data Source | Performance | Best For |
|---|---|---|---|
| Database Mode (default) | Direct PostgreSQL access to Atempo's `MIRIAEXPLOIT` catalog database | Higher: direct DB queries with O(1) folder lookups via Redis | Production environments, large archives |
| API Mode | Atempo Web Services REST API | Lower: HTTP overhead per request | Environments where direct DB access isn't possible |
Note: Database mode is the default and recommended mode. API mode is available as an alternative when direct database access is not permitted. Both modes produce the same indexed output in Diskover. Switching between modes requires a configuration file change — contact Diskover Support for guidance.
Key Atempo Concepts
| Concept | Description |
|---|---|
| Archive | A named container in Atempo ADA that groups files under a common backup/archive policy |
| Archive Mode | An integer flag on each archive indicating whether it is active (`a_mode = 1`) |
| MIRIAEXPLOIT | The name of Atempo's PostgreSQL catalog database; it contains all archive, object, and instance metadata |
| Storage Manager | The backend storage destination for an archive (tape library, disk, object storage, cloud) |
| Media | The physical or logical storage media (e.g., LTO tape cartridges) where archived data is written |
| Retention | A named policy that defines how long archived data must be kept |
| HSM Checksum | A checksum calculated by Atempo's HSM layer for data integrity verification |
For detailed Atempo ADA documentation, see the Atempo ADA Product Page.
Requirements
System Requirements
| Component | Requirement |
|---|---|
| Python | 3.9 or higher |
| Diskover | Core installation with alternate scanner support |
| Network (Database Mode) | TCP access to Atempo's PostgreSQL server (default port 5433) |
| Network (API Mode) | HTTPS access to Atempo Web Services endpoint (default port 443) |
| Redis | Running Redis server accessible from the Diskover host (Database mode only) |
Python Dependencies
| Package | Version | Purpose |
|---|---|---|
| `sqlalchemy` | 1.4.x | ORM for PostgreSQL database access (Database mode) |
| `psycopg2` | 2.9.x | PostgreSQL database driver |
| `redis` | 4.3.x | Redis client for folder path caching (Database mode) |
External Service Requirements
| Service | Requirement |
|---|---|
| Atempo ADA | Version 5.x or higher |
| PostgreSQL | Atempo's `MIRIAEXPLOIT` catalog database |
| Redis | A running Redis server reachable from the Diskover host (Database mode only) |
| Atempo Web Services | REST endpoint configured and accessible (API mode only) |
Important: The scanner requires read-only access to Atempo's PostgreSQL catalog. Work with your Atempo administrator to obtain a database username and password with `SELECT` privileges on the `MIRIAEXPLOIT` database. No write access is required or recommended.
Installation
Step 1: Install Scanner Package
Linux:
dnf install diskover-scanner-atempo
Windows:
The scanner files are included with the Diskover Windows installation. No separate installation step is required.
Install locations:
Linux: /opt/diskover/scanners/scandir_atempo/
Windows: C:\Program Files\Diskover\scanners\scandir_atempo\
Step 2: Install Python Dependencies
cd /opt/diskover/scanners/scandir_atempo
python3 -m pip install -r requirements.txt
Verify each package installed correctly:
python3 -c "import sqlalchemy; print(f'SQLAlchemy: {sqlalchemy.__version__}')"
python3 -c "import psycopg2; print(f'psycopg2: {psycopg2.__version__}')"
python3 -c "import redis; print(f'Redis: {redis.__version__}')"
Step 3: Verify Redis Server
The scanner uses Redis to cache folder paths for fast traversal. Redis must be running before you start a scan.
redis-cli ping
Expected output: PONG
If Redis is not running:
sudo systemctl start redis
sudo systemctl enable redis   # optional: start Redis on boot
Tip: The Atempo scanner uses Redis to build its folder path cache at scan startup. If you are running Redis for other purposes, consider dedicating a separate Redis database number (e.g., `redis_db: 1`) in the scanner configuration to avoid conflicts.
Step 4: Create the Configuration Directory
The scanner uses the confuse library to locate its configuration file. Create the appropriate directory for your platform:
# Linux
mkdir -p ~/.config/scandir_atempo/
# macOS
mkdir -p ~/Library/Application\ Support/scandir_atempo/
# Windows (run in Command Prompt)
mkdir %APPDATA%\scandir_atempo\
Step 5: Copy and Edit the Configuration File
Copy the sample configuration to the appropriate location and edit it with your environment details:
# Linux
cp /opt/diskover/scanners/scandir_atempo/config.yaml ~/.config/scandir_atempo/config.yaml
nano ~/.config/scandir_atempo/config.yaml
# macOS
cp /opt/diskover/scanners/scandir_atempo/config.yaml ~/Library/Application\ Support/scandir_atempo/config.yaml
On Windows, copy config.yaml to %APPDATA%\scandir_atempo\.
See the Configuration section below for full parameter details.
Step 6: Verify Database Connectivity
Confirm that Diskover can reach the Atempo PostgreSQL catalog:
psql -h <db_host> -p 5433 -U postgres -d MIRIAEXPLOIT -c "SELECT count(*) FROM archive;"
A successful response returns a row count. If this command fails, check your network access and PostgreSQL credentials before proceeding.
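If `psql` is not installed on the Diskover host, a plain TCP check can at least separate network problems from credential problems. The following is a minimal sketch using only the Python standard library; the function name `can_connect` is hypothetical and is not part of the scanner. It only confirms that the port accepts connections, not that the credentials or database name are valid.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (replace with your own db_host and db_port):
#   can_connect("192.168.1.50", 5433)
```

A `False` result points to firewall, routing, or listener issues rather than to the PostgreSQL username or password.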
Step 7: Verify Installation
Run a quick module check to confirm all dependencies are in place:
python3 -c "from scanners.scandir_atempo import scandir_atempo_db; print('Atempo scanner loaded successfully')"
Configuration
Configuration has traditionally been managed through a YAML file located in a platform-specific directory. As of version 1.6.0, you can adjust settings via Settings > Alternate Scanners > Atempo in the Diskover Admin UI.
Configuration File Locations
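| Platform | Location |
|---|---|
| Linux | `~/.config/scandir_atempo/config.yaml` |
| macOS | `~/Library/Application Support/scandir_atempo/config.yaml` |
| Windows | `%APPDATA%\scandir_atempo\config.yaml` |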
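To check which location applies on a given host, the per-platform paths from the installation steps can be expressed as a small helper. This is a sketch only: `config_path` is a hypothetical name, the `AppData/Roaming` fallback is an assumption for hosts where `APPDATA` is unset, and the scanner's actual lookup is performed by the confuse library, which may consult additional locations.

```python
import os
import sys
from pathlib import Path


def config_path(platform: str = sys.platform) -> Path:
    """Return the per-user config.yaml location described in this guide."""
    home = Path.home()
    if platform.startswith("win"):
        # Assumption: fall back to the usual roaming profile if APPDATA is unset.
        base = Path(os.environ.get("APPDATA", str(home / "AppData" / "Roaming")))
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        base = home / ".config"
    return base / "scandir_atempo" / "config.yaml"


print(config_path())
```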
Configuration Parameters
Database Mode Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `db_user` | string | | PostgreSQL username for the Atempo catalog |
| `db_password` | string | (empty) | PostgreSQL password |
| `db_host` | string | | Hostname or IP of the PostgreSQL server |
| `db_port` | integer | | PostgreSQL port. Atempo typically uses 5433, not the standard 5432 |
| `db_name` | string | | Atempo catalog database name, commonly `MIRIAEXPLOIT` |
| `db_pool_size` | integer | | SQLAlchemy connection pool size |
| `db_max_overflow` | integer | | Maximum overflow connections beyond the pool size |
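For orientation, the database parameters above correspond to the pieces of a standard SQLAlchemy PostgreSQL URL. The sketch below assembles such a URL with the standard library only; `build_db_url` is a hypothetical helper, and how the scanner builds its own connection internally is not documented here. URL-encoding the password matters when it contains characters such as `@` or `/`.

```python
from urllib.parse import quote_plus


def build_db_url(cfg: dict) -> str:
    """Assemble a postgresql+psycopg2 URL from the db_* settings."""
    user = quote_plus(cfg["db_user"])
    password = quote_plus(cfg.get("db_password") or "")
    # Omit the colon entirely when no password is configured.
    auth = f"{user}:{password}" if password else user
    return (
        f"postgresql+psycopg2://{auth}@{cfg['db_host']}:"
        f"{cfg['db_port']}/{cfg['db_name']}"
    )


example = {
    "db_user": "postgres",
    "db_password": "p@ss/word",  # illustrative value only
    "db_host": "192.168.1.50",
    "db_port": 5433,
    "db_name": "MIRIAEXPLOIT",
}
print(build_db_url(example))
```

In SQLAlchemy, `pool_size` and `max_overflow` are keyword arguments to `create_engine`, which is presumably where `db_pool_size` and `db_max_overflow` end up.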
Redis Parameters (Database Mode)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `redis_host` | string | | Hostname or IP of the Redis server |
| `redis_port` | integer | | Redis server port |
| `redis_db` | integer | | Redis database number to use for caching |
API Mode Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | string | (required) | Full URL to the Atempo ADA Web Services endpoint |
| `verify_cert` | boolean | | Enable or disable SSL certificate verification |
| `debug` | boolean | | Enable verbose debug logging for API requests |
Archive Selection
| Parameter | Type | Default | Description |
|---|---|---|---|
| `archives` | list or null | | List of archive names to scan. Set to `null` to scan all active archives |
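The selection rule can be illustrated in a few lines: `null` means every active archive, while a list restricts the scan to exact, case-sensitive name matches. This is a sketch of the documented behavior, not the scanner's code; `select_archives` is a hypothetical name.

```python
from typing import Iterable, List, Optional


def select_archives(active: Iterable[str], configured: Optional[List[str]]) -> List[str]:
    """Return the archives to scan, given the active ones and the config value."""
    active_list = list(active)
    if configured is None:
        # archives: null -> scan everything that is active
        return active_list
    wanted = set(configured)
    # Exact string comparison, so names are case-sensitive.
    return [name for name in active_list if name in wanted]


active = ["backup_tauto", "backup_partage", "Backup_DataISO"]
print(select_archives(active, ["backup_partage", "backup_dataiso"]))
```

In this example `backup_dataiso` selects nothing because its case differs from `Backup_DataISO`, which is a common reason for a scan that completes without indexing the expected files.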
FieldsToPull — Custom Metadata Configuration
The FieldsToPull section controls which Atempo metadata fields are extracted and how they appear in Elasticsearch. Each entry maps an Atempo attribute to a field name in the indexed document.
| Parameter | Type | Description |
|---|---|---|
| `ESField` | string | The field name as it will appear in Elasticsearch and Diskover search |
| `AtempoField` | string | The corresponding attribute name in the Atempo data model |
You can add, remove, or rename entries in this section to match the metadata your team needs. The field names listed below represent the default configuration.
Complete Configuration Example
# Diskover Atempo Scanner Configuration
appName: diskover_scandir_atempo
# --- API Mode Settings (used only when running in API mode) ---
base_url: "https://atempo-server:443/meta/03FE9A23CDE96C384873658E81E7ACDE/7e16637c1e7507617e7c16650f6e/ADA/WS/"
verify_cert: false
debug: false
# --- Database Mode Settings (default mode) ---
db_user: postgres
db_password: your_secure_password
db_host: 192.168.1.50
db_port: 5433
db_name: MIRIAEXPLOIT
db_pool_size: 10
db_max_overflow: 20
# --- Redis Settings ---
redis_host: 127.0.0.1
redis_port: 6379
redis_db: 0
# --- Archives to Scan ---
# Set to null to scan all active archives
# Or specify a list of archive names (case-sensitive)
archives:
  - backup_tauto
  - backup_partage
  - backup_tauto_One_to_One
  - Backup_DataISO
# --- Metadata Fields to Extract ---
FieldsToPull:
  - ESField: 'Archive Name'
    AtempoField: archive_name
  - ESField: 'Archive Date'
    AtempoField: archive_date
  - ESField: 'Archive Mode'
    AtempoField: archive_mode
  - ESField: 'Original location'
    AtempoField: original_location
  - ESField: 'Digest Type'
    AtempoField: digest_type
  - ESField: 'Storage name'
    AtempoField: storage_manager_name
  - ESField: 'Storage Type'
    AtempoField: storage_manager_type
  - ESField: 'Media List'
    AtempoField: media
  - ESField: 'HSM Checksum'
    AtempoField: hsm_checksum
  - ESField: 'Retention'
    AtempoField: retention
  - ESField: 'Archive Policy'
    AtempoField: archive_policy
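Each FieldsToPull entry above acts as a rename rule from an Atempo attribute to an indexed field name. The sketch below shows that mapping applied to one record; the function `build_atempo_data` and the shape of `record` are assumptions for illustration, and the scanner's real extraction logic may differ (for example in how it treats missing attributes).

```python
FIELDS_TO_PULL = [
    {"ESField": "Archive Name", "AtempoField": "archive_name"},
    {"ESField": "Retention", "AtempoField": "retention"},
    {"ESField": "Media List", "AtempoField": "media"},
]


def build_atempo_data(record: dict, fields: list) -> dict:
    """Rename Atempo attributes to their configured ESField names."""
    return {
        entry["ESField"]: record[entry["AtempoField"]]
        for entry in fields
        # Assumption: attributes absent from the record are simply skipped.
        if entry["AtempoField"] in record
    }


record = {"archive_name": "backup_partage", "retention": "7_years", "unrelated": 1}
print(build_atempo_data(record, FIELDS_TO_PULL))
```

Removing an entry from the list drops that field from the indexed document, and changing `ESField` renames it in search.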
Usage
The Atempo scanner uses the standard Diskover --altscanner flag. All scans target the virtual /atempo root path, which the scanner maps to your configured Atempo archives.
Basic Scan
Linux:
cd /opt/diskover
python3 diskover.py --altscanner scandir_atempo /atempo
Windows:
cd "C:\Program Files\Diskover"
python diskover.py --altscanner scandir_atempo /atempo
Scan with a Custom Index Name
Use -i to write results to a named index — useful for scheduled scans or maintaining separate indexes for different archive sets:
python3 diskover.py -i diskover-atempo-2025-q2 --altscanner scandir_atempo /atempo
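For scheduled scans, an index name in the style of the example can be derived from the scan date. This is a small sketch, assuming a quarterly naming scheme like the one shown; `quarterly_index_name` and the `diskover-atempo` prefix are illustrative choices, not a Diskover convention.

```python
from datetime import date


def quarterly_index_name(day: date, prefix: str = "diskover-atempo") -> str:
    """Build an index name such as diskover-atempo-2025-q2 for the given date."""
    quarter = (day.month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, and so on
    return f"{prefix}-{day.year}-q{quarter}"


print(quarterly_index_name(date(2025, 5, 14)))
```

The resulting string can be passed to the `-i` option by whatever wrapper script launches the scan.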
Scan a Specific Archive
Target a single archive by including its name in the path:
python3 diskover.py --altscanner scandir_atempo /atempo/backup_partage
Enable Debug Logging
python3 diskover.py --altscanner scandir_atempo --loglevel DEBUG /atempo
Parallel Crawling
Use --threads to increase scan throughput on large archives:
python3 diskover.py --altscanner scandir_atempo --threads 8 /atempo
Override Configuration Directory at Runtime
If you need to use an alternate configuration file location:
export SCANDIRATEMPODIR=/path/to/custom/config
python3 diskover.py --altscanner scandir_atempo /atempo
Path Format Reference
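| Path | What It Scans |
|---|---|
| `/atempo` | All archives listed in the `archives` configuration parameter |
| `/atempo/<archive_name>` | A single named archive, e.g., `/atempo/backup_partage` |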
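The two path forms can be told apart by looking at the first component after the virtual `/atempo` root. The sketch below illustrates that reading of the path format; `archive_from_path` is a hypothetical helper, not a function exposed by the scanner.

```python
from typing import Optional


def archive_from_path(path: str) -> Optional[str]:
    """Return the archive named in an /atempo path, or None for the root."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "atempo":
        raise ValueError(f"not an Atempo scan path: {path!r}")
    # /atempo            -> None (all configured archives)
    # /atempo/<archive>  -> that archive only
    return parts[1] if len(parts) > 1 else None


print(archive_from_path("/atempo/backup_partage"))
```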
Configuring as an Index Task
You can schedule Atempo scans to run automatically using Diskover's Index Tasks feature.
| Field | Value |
|---|---|
| Task Name | A descriptive name of your choice |
| Alternate Scanner | `scandir_atempo` |
| Top Path | `/atempo` |
| Schedule | Set your preferred cron schedule |
Note: At scan startup, the scanner rebuilds its Redis folder cache for all configured archives. Startup time scales with the number of directories in your archives. For very large environments, plan scan schedules accordingly.
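To see why startup time grows with the directory count, consider a simplified model of the folder cache: every folder's full path is resolved once up front, after which each lookup is a single key access. In this sketch a plain dictionary stands in for Redis, and the `(folder_id, parent_id, name)` row shape is an assumption; the actual catalog schema and cache layout are not described in this guide.

```python
from typing import Dict, Iterable, Optional, Tuple

# (folder_id, parent_id, name); parent_id None marks an archive root.
FolderRow = Tuple[int, Optional[int], str]


def build_path_cache(rows: Iterable[FolderRow], root: str = "/atempo") -> Dict[int, str]:
    """Resolve every folder's full path once so later lookups are O(1)."""
    by_id = {fid: (parent, name) for fid, parent, name in rows}
    cache: Dict[int, str] = {}

    def resolve(fid: int) -> str:
        if fid not in cache:
            parent, name = by_id[fid]
            base = root if parent is None else resolve(parent)
            cache[fid] = f"{base}/{name}"
        return cache[fid]

    # One pass over all folders: this is the part that scales with directory count.
    for fid in by_id:
        resolve(fid)
    return cache


rows = [(1, None, "backup_partage"), (2, 1, "documents"), (3, 2, "reports")]
cache = build_path_cache(rows)
print(cache[3])
```

The up-front pass touches every directory once, which is the cost paid at startup in exchange for constant-time lookups during the crawl.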
Metadata Fields
The Atempo scanner indexes custom metadata alongside standard file fields. All Atempo-specific fields are nested under the atempo_data object in each indexed document.
Field Reference
| Field Path | ES Type | Description |
|---|---|---|
| `atempo_data.Archive Name` | keyword | Name of the Atempo archive containing this file |
| `atempo_data.Archive Date` | keyword | Timestamp when the file was archived |
| `atempo_data.Archive Mode` | keyword | Atempo archive mode identifier |
| `atempo_data.Archive Policy` | keyword | Name of the archive policy applied to this file |
| `atempo_data.Original location` | keyword | The original file path on the source system before archiving |
| `atempo_data.Storage name` | keyword | Name of the Atempo storage manager (e.g., tape library name) |
| `atempo_data.Storage Type` | keyword | Storage type identifier (tape, disk, cloud) |
| `atempo_data.Media List` | keyword | Comma-separated list of media labels (e.g., LTO tape cartridge IDs) where this file's data is stored |
| `atempo_data.Retention` | keyword | Retention policy name applied to this file |
| `atempo_data.Digest Type` | keyword | Checksum algorithm used for data integrity |
| `atempo_data.HSM Checksum` | keyword | Checksum value calculated by the HSM layer |
Elasticsearch Mapping
{
  "mappings": {
    "properties": {
      "atempo_data": {
        "type": "keyword"
      }
    }
  }
}
Example Indexed Document
{
  "name": "financial_report_2024.pdf",
  "path": "/atempo/backup_partage/documents/reports/financial_report_2024.pdf",
  "extension": "pdf",
  "size": 2457600,
  "mtime": "2024-06-15T14:30:00Z",
  "atempo_data": {
    "Archive Name": "backup_partage",
    "Archive Date": "2024-06-20T02:00:00Z",
    "Archive Mode": 1,
    "Original location": "\\\\fileserver\\shared\\documents\\reports\\financial_report_2024.pdf",
    "Storage name": "TapeLibrary01",
    "Storage Type": 1,
    "Media List": "LTO001234,LTO001235",
    "HSM Checksum": "a1b2c3d4e5f6...",
    "Retention": "7_years",
    "Archive Policy": "compliance_archive",
    "Digest Type": "SHA256"
  }
}
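When post-processing documents like the one above, for example to plan a tape recall, the comma-separated `Media List` value usually needs to be split into individual labels. The following sketch assumes the document has already been loaded as a Python dictionary; `media_labels` is a hypothetical helper.

```python
def media_labels(doc: dict) -> list:
    """Split the comma-separated Media List into individual media labels."""
    raw = doc.get("atempo_data", {}).get("Media List", "")
    # Ignore empty fragments so a missing or blank value yields an empty list.
    return [label.strip() for label in raw.split(",") if label.strip()]


doc = {"atempo_data": {"Media List": "LTO001234,LTO001235"}}
print(media_labels(doc))
```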
Searching in Diskover
Once indexed, Atempo archive metadata is fully searchable using Diskover's standard search syntax. All Atempo fields are stored under the atempo_data object.
Search Query Examples
Search by Archive
| Query | What It Finds |
|---|---|
| | All files in the |
| | All files in archives whose names start with |
Search by Retention and Policy
| Query | What It Finds |
|---|---|
| | Files with a 7-year retention policy applied |
| | Files archived under the |
| | Files whose retention policy name contains "compliance" |
Search by Storage Location
| Query | What It Finds |
|---|---|
| | Files stored on a specific tape library |
| | Files stored on tape-type storage |
| | Files whose data resides on tape cartridge LTO001234 |
Search by Original Location
| Query | What It Finds |
|---|---|
| | Files originally stored on a server named "fileserver" |
| | Files archived from a specific share path |
Combined Searches
| Query | What It Finds |
|---|---|
| | PDFs in a specific archive |
| | Files matching both a retention policy and archive policy |
| | Files from a specific archive on a specific tape |
| | Files larger than 1 GB stored on a specific tape library |
Tip: Archive field names that contain spaces (like `Archive Name`) require escaping the space with a backslash in search queries, e.g., `atempo_data.Archive\ Name:backup_partage`. Alternatively, use wildcard patterns if exact escaping is tricky in your environment.
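If queries are generated by a script, the escaping rule from the tip can be applied automatically. The sketch below only escapes spaces in the field name, as described above; `atempo_query` is a hypothetical helper, and values containing spaces or other reserved characters would need their own handling.

```python
def atempo_query(field: str, value: str) -> str:
    """Build an atempo_data query, escaping spaces in the field name."""
    escaped = field.replace(" ", "\\ ")
    return f"atempo_data.{escaped}:{value}"


print(atempo_query("Archive Name", "backup_partage"))
```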
Troubleshooting
Common Issues
| Issue | Likely Cause | Resolution |
|---|---|---|
| Scanner fails to start with database connection error | Incorrect database connection settings | Verify connectivity with the `psql` command shown under Connectivity Verification Commands |
| Scanner fails during Redis initialization | Redis not running or unreachable | Run `redis-cli ping` and start Redis if it does not answer `PONG` |
| Scan completes but no files indexed | Archive names don't match configuration, or archives have no files | Check archive names are exact and case-sensitive. Run with `--loglevel DEBUG` for detailed output |
| Specific archives missing from results | Archive not active in Atempo (`a_mode` is not `1`) | Query the catalog for active archives, as shown under Connectivity Verification Commands |
| Configuration file not found error | Config directory doesn't exist or file not copied | Create the configuration directory for your platform and copy `config.yaml` into it (Installation Steps 4 and 5) |
| SSL certificate errors (API mode) | Self-signed certificate on Atempo Web Services | Set `verify_cert: false` in the configuration |
| Slow startup on large archives | Redis cache rebuild takes time on first scan | This is expected behavior. The scanner indexes all folder paths to Redis before scanning begins. Startup time scales with the number of directories in your archives. |
| API mode returns limited archives | API mode is a secondary access method | Consider switching to Database mode for full archive access and better performance. Contact Diskover Support for guidance. |
Debug Logging
Run with --loglevel DEBUG to get detailed output on what the scanner is doing at each step:
Linux:
python3 /opt/diskover/diskover.py --altscanner scandir_atempo --loglevel DEBUG /atempo 2>&1 | tee /tmp/atempo_debug.log
Windows:
python "C:\Program Files\Diskover\diskover.py" --altscanner scandir_atempo --loglevel DEBUG /atempo
Log File Locations
| Platform | Location |
|---|---|
| Linux | |
| Windows | Check Diskover service logs or your configured log output location |
Connectivity Verification Commands
Test PostgreSQL connection:
psql -h <db_host> -p 5433 -U postgres -d MIRIAEXPLOIT -c "SELECT count(*) FROM archive WHERE a_mode = 1;"
Test Redis connection:
redis-cli -h <redis_host> -p 6379 ping
Check available active archives:
psql -h <db_host> -p 5433 -U postgres -d MIRIAEXPLOIT \
  -c "SELECT a_name, a_mode FROM archive WHERE a_mode = 1 ORDER BY a_name;"
Verify a specific archive has indexed content:
psql -h <db_host> -p 5433 -U postgres -d MIRIAEXPLOIT \
-c "SELECT count(*) FROM instance i
JOIN object o ON i.o_node = o.o_node
JOIN archive a ON o.a_node = a.a_node
WHERE a.a_name = 'backup_partage';"
Support
Last Updated: March 2026
Diskover Data, Inc.