Checksums Post-Index Plugin
License: PRO+ (Professional Edition or higher)
Plugin Type: Post-Index Plugin
Author: Diskover Data, Inc.
Overview
The Checksums plugin generates and stores hash values for files in your Diskover indices after a scan completes. Depending on the configured algorithm, these are cryptographic hashes (MD5, SHA1, SHA256) or a fast non-cryptographic hash (xxhash). The hashes serve as digital fingerprints that enable data integrity verification, migration validation, and duplicate detection.
Think of checksums as unique identifiers for your file contents. Two files with identical contents will always produce the same hash, while even a single byte change results in a completely different value. This makes checksums invaluable for tracking changes, validating data transfers, finding duplicates, and proving file authenticity.
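The fingerprint behaviour described above can be illustrated with Python's standard hashlib module. This is a minimal sketch, not the plugin's own code; the sample byte strings and the helper name sha256_hex are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical file contents used only for this illustration.
original = b"quarterly-report contents"
copy = b"quarterly-report contents"
altered = b"quarterly-report contentz"  # only the last byte differs

# Identical content gives an identical hash.
print(sha256_hex(original) == sha256_hex(copy))
# A single changed byte gives a different hash.
print(sha256_hex(original) == sha256_hex(altered))
```

The same comparison works with any of the algorithms the plugin supports; only the digest length and speed differ.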
Why Use This Plugin?
Verify data integrity — Confirm files haven't been corrupted or modified by comparing hashes over time
Validate migrations — Compare checksums before and after data moves to ensure nothing was lost or corrupted
Enable duplicate detection — Generate the hash values that the Dupesfinder plugin uses to identify duplicate files across your storage
Sample data from a Checksum execution:
Here we can see the entire Checksums array containing the FHash, MD5 and SHA256 - additionally, we can see that the MD5 and SHA256 checksums have been placed in their own columns as well!
Use Cases
Data Integrity Verification
Ensuring your files remain unchanged over time is critical for long-term storage, archival systems, and compliance requirements. The Checksums plugin lets you establish a baseline hash for each file, then verify those hashes later to confirm nothing has been corrupted or tampered with.
Typical workflow:
Run an initial Diskover scan of your storage
Execute the Checksums plugin to generate hash values for all files
Periodically re-scan and re-run checksums
Compare hash values to identify any files that have changed unexpectedly
This approach is particularly valuable for cold storage and archive systems where files should never change, or for compliance scenarios where you need to demonstrate file integrity over time.
Migration Validation
When moving data between storage systems, network locations, or cloud platforms, checksums provide definitive proof that every file arrived intact. Rather than simply comparing file counts and sizes, you can verify the actual content of each file matches the original.
Typical workflow:
Index your source storage location with Diskover
Run the Checksums plugin with CSV export enabled to create a hash manifest:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_checksums/diskover_checksums.py -c -u diskover-source-index
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_checksums\diskover_checksums.py" -c -u diskover-source-index
Perform your data migration
Index the destination storage location with Diskover
Run the Checksums plugin on the destination index:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_checksums/diskover_checksums.py -c -u diskover-destination-index
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_checksums\diskover_checksums.py" -c -u diskover-destination-index
Compare the CSV exports to verify all files transferred correctly
Any mismatched hashes indicate files that were corrupted during transfer and need to be re-copied.
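The final comparison step could be scripted along these lines. This is a sketch under stated assumptions rather than a Diskover tool: it assumes the export has a File column and a single hash column (named Hash here as a placeholder; check the header of your own export), and that source and destination paths differ only by a root prefix. The function names and sample rows are hypothetical.

```python
import csv
import io


def load_manifest(csv_text, root, path_col="File", hash_col="Hash"):
    """Map each file path (made relative to `root`) to its hash value."""
    prefix = root.rstrip("/") + "/"
    manifest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        path = row[path_col]
        if path.startswith(prefix):
            path = path[len(prefix):]
        manifest[path] = row[hash_col]
    return manifest


def compare_manifests(source, dest):
    """Return (missing, mismatched) lists of relative paths, sorted."""
    missing = sorted(p for p in source if p not in dest)
    mismatched = sorted(p for p in source if p in dest and source[p] != dest[p])
    return missing, mismatched


# Made-up sample data standing in for the two CSV exports.
source_csv = "File,Hash\n/mnt/old/a.txt,aaa\n/mnt/old/b.txt,bbb\n/mnt/old/c.txt,ccc\n"
dest_csv = "File,Hash\n/mnt/new/a.txt,aaa\n/mnt/new/b.txt,XXX\n"

missing, mismatched = compare_manifests(
    load_manifest(source_csv, "/mnt/old"),
    load_manifest(dest_csv, "/mnt/new"),
)
print("missing:", missing)
print("mismatched:", mismatched)
```

For real exports, read each file's text with open(path, newline="") and pass the actual hash column name via hash_col.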
Duplicate Detection Preparation
The Checksums plugin generates the hash values that the Dupesfinder plugin uses to identify duplicate files across your storage. Running checksums first enriches your index with the metadata needed for accurate duplicate detection later.
Why this matters: Without hash values, duplicate detection would rely solely on filename and size matching, which can produce false positives (different files with the same name and size) and miss true duplicates (identical files with different names). Hash-based detection compares actual content: two files with the same content hash are, for practical purposes, identical. Collisions are theoretically possible, particularly with weak or non-cryptographic algorithms, but are rare in practice.
Typical workflow:
Run the Checksums plugin to generate hash values:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_checksums/diskover_checksums.py -u diskover-myindex
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_checksums\diskover_checksums.py" -u diskover-myindex
Run the Dupesfinder plugin to identify files with matching hashes:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_dupesfinder/diskover_dupesfinder.py -U diskover-myindex
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_dupesfinder\diskover_dupesfinder.py" -U diskover-myindex
This two-step approach gives you complete control over when hashing occurs and which files to include, while keeping the duplicate detection logic separate and flexible.
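Conceptually, hash-based duplicate detection amounts to grouping files by their content hash. The sketch below shows that idea only; it is not how Dupesfinder is implemented, and the function name and sample hashes are hypothetical.

```python
from collections import defaultdict


def group_duplicates(files):
    """Group (path, content_hash) pairs and keep hashes seen more than once."""
    by_hash = defaultdict(list)
    for path, content_hash in files:
        by_hash[content_hash].append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}


# Made-up sample: two files share a hash despite having different names.
sample = [
    ("/mnt/data/report.pdf", "ef46db3751d8e999"),
    ("/mnt/data/archive/report-final.pdf", "ef46db3751d8e999"),
    ("/mnt/data/notes.txt", "0123456789abcdef"),
]
print(group_duplicates(sample))
```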
Installation
DNF Installation (Linux RPM)
On Linux systems using DNF package management, this plugin can be installed via RPM:
sudo dnf install diskover-plugin-postindex-checksums
Note: Ensure your system is configured with the Diskover RPM repository before running the install command.
Prerequisites
| Component | Requirement |
|---|---|
| Python | 3.9 or higher |
| Diskover | Core installation with Elasticsearch |
| Elasticsearch | 7.x or 8.x (as supported by Diskover) |
| Storage Access | Read access to the indexed storage paths |
| Memory | Minimum 2GB RAM (4GB+ recommended for large datasets) |
Python Dependencies
| Package | Purpose |
|---|---|
| xxhash | Fast non-cryptographic hashing (required for xxhash mode) |
Installation Steps
Ensure the plugin file is in your Diskover plugins directory:
Linux:
/opt/diskover/plugins_postindex/diskover_checksums/diskover_checksums.py
Windows:
C:\Program Files\Diskover\plugins_postindex\diskover_checksums\diskover_checksums.py
Install the required Python dependency:
Linux:
python3 -m pip install xxhash
Windows (PowerShell):
python -m pip install xxhash
Verify the installation:
Linux:
python3 -c "import xxhash; print('xxhash version:', xxhash.VERSION)"
Windows (PowerShell):
python -c "import xxhash; print('xxhash version:', xxhash.VERSION)"
Verify Elasticsearch connectivity:
Linux:
python3 -c "from diskover_elasticsearch import es_connection_cached; print(es_connection_cached().info())"
Windows (PowerShell):
python -c "from diskover_elasticsearch import es_connection_cached; print(es_connection_cached().info())"
Configuration
Configuration is managed through the Diskover Admin Panel. Navigate to Plugins → Post Index → Checksums to access the settings.
This is the beginning of our sample configuration, which applies a SHA256 hash to the files. The Checksums plugin has many other configuration options, covered in detail below.
Configuration Parameters
| Option | Type | Default | Description |
|---|---|---|---|
| maxthreads | int | 0 | Number of hashing threads. 0 = auto-detect based on CPU cores |
| fast_hash_only | boolean | false | Use fast hash only (filename+size MD5), skip reading file contents |
| hash_mode | string | xxhash | Algorithm: xxhash, md5, sha1, or sha256 |
| blocksize | int | 65536 | Block size in bytes for reading files |
| cache_dir | string | (see below) | Directory for SQLite cache database |
| cache_expire_time | int | 0 | Cache expiry in seconds (0 = never expire) |
| min_size | int | 1024 | Minimum file size to hash (bytes) |
| max_size | int | 1073741824 | Maximum file size to hash (bytes, default 1GB) |
| extensions | list | [] | File extensions to include (empty = all) |
| exclude_extensions | list | [] | File extensions to exclude |
| exclude_files | list | [] | Filenames to exclude |
| exclude_dirs | list | [] | Directory paths to exclude (a trailing /* makes the match recursive, as in the Directory exclusions example) |
| hardlinks | boolean | false | Include hardlinks (nlink > 1) |
| other_query | string | "" | Additional Elasticsearch query to filter files |
| restore_times | boolean | false | Restore atime/mtime after hashing |
| replace_paths | object | {} | Path replacement for NFS/mounted shares |
| use_disk_mtime | boolean | false | Use disk mtime instead of index mtime for cache comparison |
| csvdir | string | /tmp | Directory for CSV output files |
Default cache directory:
Linux:
/opt/diskover/plugins_postindex/__diskover_hash_cache__/
Windows:
C:\Program Files\Diskover\plugins_postindex\__diskover_hash_cache__\
Hash Algorithm Selection
Choosing the right hash algorithm depends on your use case. Here's how they compare:
| Algorithm | Speed | Security Level | Best For | Hash Length |
|---|---|---|---|---|
| xxhash | Fastest | Non-cryptographic | Duplicate detection, general integrity checks | 16 chars |
| md5 | Fast | Weak (collisions possible) | Legacy system compatibility | 32 chars |
| sha1 | Medium | Deprecated for security | Legacy compatibility | 40 chars |
| sha256 | Slowest | Strong (cryptographic) | Compliance, archival, security requirements | 64 chars |
Recommendations:
For duplicate detection: Use xxhash for maximum performance. Since you're comparing files within your own storage, cryptographic security isn't necessary.
For data integrity verification: Use xxhash for routine checks, or sha256 if you need cryptographic assurance.
For compliance requirements (SOX, HIPAA, GDPR): Use sha256 to meet regulatory standards for file integrity monitoring.
For migration validation: Use sha256 when you need definitive proof of data integrity, or xxhash for faster validation of large datasets.
Fast Hash Mode
The fast hash mode generates an MD5 of the filename combined with file size, without reading the file contents:
fhash = md5(filename + filesize)
This provides extremely fast fingerprinting suitable for quick duplicate detection based on name and size, but is not suitable for integrity verification since it doesn't read actual file content.
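The formula above can be sketched in Python as follows. How the plugin actually joins and encodes the filename and size is not specified here, so the plain string concatenation and UTF-8 encoding below are assumptions, and digests from this sketch may not match the plugin's fhash values.

```python
import hashlib


def fast_hash(filename: str, size_bytes: int) -> str:
    """MD5 over filename + size; the file contents are never read."""
    # Assumption: name and size are concatenated as text and UTF-8 encoded.
    return hashlib.md5(f"{filename}{size_bytes}".encode("utf-8")).hexdigest()


print(fast_hash("report.pdf", 2048))
# Same name, different size: the fingerprint changes.
print(fast_hash("report.pdf", 2048) == fast_hash("report.pdf", 4096))
```

Because only the name and size feed the hash, two files with different contents but the same name and size get the same value, which is why this mode is unsuitable for integrity checks.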
Example Configuration: Production Environment
maxthreads: 8
fast_hash_only: false
hash_mode: xxhash
blocksize: 1048576  # 1MB for large files
cache_dir: /opt/diskover/plugins_postindex/__diskover_hash_cache__/
cache_expire_time: 0
min_size: 1024
max_size: 10737418240  # 10GB
extensions: []
exclude_extensions:
  - tmp
  - log
  - bak
exclude_files:
  - .DS_Store
  - Thumbs.db
exclude_dirs:
  - /mnt/data/temp/*
  - /mnt/data/cache/*
hardlinks: false
restore_times: false
csvdir: /var/log/diskover/checksums
Path Replacement Configuration
For environments where indexed paths differ from accessible paths (common with NFS mounts or when the indexing server accesses storage differently than the checksum worker), configure path replacement:
replace_paths:
  enable: true
  from_path: /mnt/nfs/production
  to_path: /data/production
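The effect of this setting can be pictured as a prefix substitution applied to each indexed path before the file is opened. The sketch below is an illustration only; the function name is hypothetical and the plugin's actual matching rules (for example, how it treats trailing slashes) may differ.

```python
def replace_path(path: str, from_path: str, to_path: str) -> str:
    """Swap a leading from_path for to_path; leave other paths untouched."""
    prefix = from_path.rstrip("/")
    # Match whole path components only, so /mnt/nfs/production2 is not rewritten.
    if path == prefix or path.startswith(prefix + "/"):
        return to_path.rstrip("/") + path[len(prefix):]
    return path


print(replace_path("/mnt/nfs/production/projects/a.mov",
                   "/mnt/nfs/production", "/data/production"))
```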
Filtering Options
You can control which files get hashed using several filtering mechanisms:
Size-based filtering:
min_size: 1024  # Skip files smaller than 1KB
max_size: 10737418240  # Skip files larger than 10GB
Extension-based filtering:
# Only hash specific file types
extensions:
  - pdf
  - docx
  - xlsx

# Or exclude specific file types
exclude_extensions:
  - tmp
  - log
  - bak
Directory exclusions:
exclude_dirs:
  - /mnt/data/temp  # Exact match
  - /mnt/data/cache/*  # Recursive (all subdirectories)
  - /mnt/data/.snapshot/*  # NetApp snapshots
Custom Elasticsearch queries:
# Only hash files modified in the last 30 days
other_query: "mtime:[now-30d TO now]"

# Only hash files with specific tags
other_query: "tags:important"

# Combine multiple conditions
other_query: "owner:dataadmin AND mtime:[now-7d TO now]"
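To make the interaction of the size, extension, and directory filters concrete, here is a rough sketch of how such rules could be evaluated for one file. It is not the plugin's code. The function names are hypothetical, and two behaviours are assumptions: that an exact exclude_dirs entry matches only the file's immediate parent directory, and that the /* form also covers the named directory itself.

```python
import posixpath


def dir_excluded(parent_dir, exclude_dirs):
    """True if parent_dir equals an exact entry or falls under a '/*' entry."""
    for pattern in exclude_dirs:
        if pattern.endswith("/*"):
            base = pattern[:-2]
            # Assumption: the recursive form covers the directory and all below it.
            if parent_dir == base or parent_dir.startswith(base + "/"):
                return True
        elif parent_dir == pattern:
            return True
    return False


def should_hash(path, size, cfg):
    """Apply size, extension, and directory filters to a single file."""
    if size < cfg["min_size"] or size > cfg["max_size"]:
        return False
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    if cfg["extensions"] and ext not in cfg["extensions"]:
        return False
    if ext in cfg["exclude_extensions"]:
        return False
    return not dir_excluded(posixpath.dirname(path), cfg["exclude_dirs"])


cfg = {
    "min_size": 1024,
    "max_size": 10737418240,
    "extensions": [],
    "exclude_extensions": ["tmp", "log", "bak"],
    "exclude_dirs": ["/mnt/data/temp", "/mnt/data/cache/*"],
}
print(should_hash("/mnt/data/docs/report.pdf", 2048, cfg))
print(should_hash("/mnt/data/cache/a/b.pdf", 2048, cfg))
```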
Running the Plugin
Basic Usage
Run the plugin from the command line, specifying the index to process:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_checksums/diskover_checksums.py diskover-indexname
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_checksums\diskover_checksums.py" diskover-indexname
Command Line Options
| Option | Description |
|---|---|
| -a | Use a specific named configuration |
| -c | Export hash results to CSV file |
| -u | Enable SQLite hash cache |
| -f | Clear the hash cache before processing |
| | Retrieve hashes from an existing index |
| --useindexauto | Auto-find previous index for hash reuse |
| -r | Remove all hash fields from index |
| -l | Auto-find latest index by top path |
| -e | Skip files that already have hash values |
| -m | Override hash algorithm (xxhash/md5/sha1/sha256) |
| | Use fast hash mode (overrides config) |
| -v | Enable verbose logging |
| -V | Enable very verbose (debug) logging |
| | Print version and exit |
Example: Basic Checksums with Caching
Enable caching to avoid re-hashing unchanged files on subsequent runs:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_checksums/diskover_checksums.py -u -v diskover-myindex
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_checksums\diskover_checksums.py" -u -v diskover-myindex
Example: SHA256 for Compliance with CSV Export
Generate SHA256 hashes (for compliance requirements) and export results to a CSV file:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_checksums/diskover_checksums.py -m sha256 -c -u diskover-myindex
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_checksums\diskover_checksums.py" -m sha256 -c -u diskover-myindex
Example: Skip Already-Hashed Files
When running incrementally, skip files that already have hash values:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_checksums/diskover_checksums.py -e -u diskover-myindex
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_checksums\diskover_checksums.py" -e -u diskover-myindex
Example: Process Multiple Indices
Hash files across multiple indices in a single run:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_checksums/diskover_checksums.py -u diskover-index1 diskover-index2 diskover-index3
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_checksums\diskover_checksums.py" -u diskover-index1 diskover-index2 diskover-index3
Example: Reuse Hashes from Previous Index
When you have a new index of the same storage location, reuse hashes from the previous index for unchanged files:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_checksums/diskover_checksums.py --useindexauto -u diskover-newindex
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_checksums\diskover_checksums.py" --useindexauto -u diskover-newindex
Example: Remove All Hashes from an Index
If you need to clear hash data and start fresh:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_checksums/diskover_checksums.py -r diskover-myindex
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_checksums\diskover_checksums.py" -r diskover-myindex
Setting Up Automated Checksums
To run checksums automatically, use Diskover's built-in task scheduling features.
Option 1: Custom Task
Create a Custom Task in Diskover Admin to run checksums on a defined schedule.
Navigate to Task Panel → Custom Tasks in Diskover Admin
Create a new Custom Task with the appropriate configuration
Configure the schedule (daily, weekly, etc.)
Save and enable the task
Here we can see the Run Command and args needed for the Custom Task. Note that the {indexname} variable cannot be used in this case, because a Custom Task does not create an index; use the -l (toppath) CLI option instead and pass in your top path.
Option 2: Post-Crawl Command (Index Task)
Run checksums automatically after each index completes by adding it as a Post-Crawl Command. This ensures your index is always enriched with hash metadata immediately after scanning.
Navigate to Task Panel → Index Tasks in Diskover Admin
Edit the Index Task you want to trigger checksums from
Add the Post-Crawl Command configuration:
Linux Example:
| Field | Value |
|---|---|
| Post-Crawl Command | |
| Post-Crawl Command Args | |
Windows Example:
| Field | Value |
|---|---|
| Post-Crawl Command | |
| Post-Crawl Command Args | |
Available Index Task Tokens:
{indexname}— The name of the index that was just created
Important:
The Post-Crawl Command field should contain ONLY the executable (e.g., python3, python)
All script paths, flags, and arguments go in the Post-Crawl Command Args field
On your system, replace the ConfigurationName above with a named configuration that you created at Diskover Admin → Plugins → Post-Index → Checksums. If you are using the Default configuration rather than a custom one, the -a flag and the ConfigurationName are not required.
Reviewing the Output
Log Output
A successful run displays progress information and final statistics:
INFO - Starting diskover checksums for indices ['diskover-myindex'] ...
INFO - Using hash mode xxhash
INFO - Started 8 file hash threads
INFO - Found 15247 docs
INFO - Queuing files from index diskover-myindex...
INFO - STATS (files hashed 5000 (32.8%), files in queue 150, elapsed 0:02:34, perf 32.5 files/s, memory usage 245MB)
INFO - Done file checksuming for index diskover-myindex
INFO - *** Elapsed time 0:07:48 ***
INFO - *** Total files: 15247 ***
INFO - *** Files hashed: 15247 (0.0% reduction of total files) ***
Key metrics to watch:
Files hashed — Total number of files processed
Perf (files/s) — Processing speed, useful for estimating completion time
Memory usage — Monitor this if processing large datasets
Reduction percentage — Shows how many files were skipped due to caching or filters
CSV Export
When using the -c flag, CSV files are saved to the configured csvdir with the naming pattern:
diskover-checksums_<index>_<hashmode>_<YYYY_MM_DD_HH_MM_SS>.csv
Example: diskover-checksums_diskover-prod_xxhash_2025_01_15_14_30_00.csv
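If you need to locate or generate matching filenames in your own scripts, the naming pattern can be reproduced as below. This is a convenience sketch, not plugin code; the function name is hypothetical, and whether the plugin stamps local time or UTC is not stated here.

```python
from datetime import datetime


def csv_filename(index: str, hash_mode: str, when: datetime) -> str:
    """Build a name following diskover-checksums_<index>_<hashmode>_<timestamp>.csv."""
    stamp = when.strftime("%Y_%m_%d_%H_%M_%S")
    return f"diskover-checksums_{index}_{hash_mode}_{stamp}.csv"


# Reproduces the example filename shown above.
print(csv_filename("diskover-prod", "xxhash", datetime(2025, 1, 15, 14, 30, 0)))
```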
CSV columns (full hash mode):
| Column | Description |
|---|---|
| File | Full file path |
| Fhash(Fast Hash) | MD5 of filename+size |
| Hash(algorithm) | Full file content hash |
| Size(bytes) | File size |
| Mtime(utc) | Modified time in UTC |
| Index | Elasticsearch index name |
| Docid | Elasticsearch document ID |
CSV columns (fast hash only mode):
| Column | Description |
|---|---|
| File | Full file path |
| Fhash(Fast Hash) | MD5 of filename+size |
| Size(bytes) | File size |
| Mtime(utc) | Modified time in UTC |
| Index | Elasticsearch index name |
| Docid | Elasticsearch document ID |
Cache Database
When caching is enabled (-u), a SQLite database stores hash values to avoid re-hashing unchanged files. The cache uses:
Key: MD5 hash of the file path
Value: Hash value and file mtime
On subsequent runs, if a file's mtime matches the cached entry, the stored hash is reused instead of re-reading the file. This dramatically speeds up repeated runs on the same storage.
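The lookup logic described above can be sketched with Python's built-in sqlite3 module. The table layout, column names, and function names below are hypothetical and chosen only for illustration; the plugin's real cache schema is not documented here.

```python
import hashlib
import sqlite3


def open_cache(db_path=":memory:"):
    """Open (or create) a cache database with a hypothetical schema."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, hash TEXT, mtime REAL)"
    )
    return conn


def cache_key(path):
    """Cache key: MD5 of the file path, as described above."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()


def get_or_hash(conn, path, mtime, hash_func):
    """Return (hash, was_cache_hit); re-hash only when the mtime differs."""
    key = cache_key(path)
    row = conn.execute(
        "SELECT hash, mtime FROM cache WHERE key = ?", (key,)
    ).fetchone()
    if row is not None and row[1] == mtime:
        return row[0], True  # cache hit: stored hash reused, file not read
    value = hash_func(path)  # cache miss: compute and store
    conn.execute(
        "INSERT OR REPLACE INTO cache (key, hash, mtime) VALUES (?, ?, ?)",
        (key, value, mtime),
    )
    conn.commit()
    return value, False
```

Here hash_func stands in for whatever routine reads and hashes the file; on a hit it is never called, which is where the time saving comes from.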
Sample output of initial execution with Cache enabled :
2026-04-09 19:38:02,635 - diskover.plugin.checksums - INFO - Done file checksuming for index diskover-build-dir-checksums
2026-04-09 19:38:02,635 - diskover.plugin.checksums - INFO - *** Total files: 578 ***
2026-04-09 19:38:02,635 - diskover.plugin.checksums - INFO - *** Files hashed: 578 (0.0% reduction of total files) ***
2026-04-09 19:38:02,636 - diskover.cache - INFO - CACHE HITS: 0, MISSES: 578, HIT RATIO: 0.0% (/opt/diskover/plugins_postindex/__diskover_hash_cache__/)
2026-04-09 19:38:02,636 - diskover.cache - INFO - Closing cache DB /opt/diskover/plugins_postindex/__diskover_hash_cache__/cache_database.db...
2026-04-09 19:38:02,643 - diskover.cache - INFO - Cache DB /opt/diskover/plugins_postindex/__diskover_hash_cache__/cache_database.db closed
Here we can see that there was a 0.0% hit ratio on the cache and all 578 files were hashed.
Sample output of second execution with Cache enabled :
2026-04-09 19:38:08,453 - diskover.plugin.checksums - INFO - Done file checksuming for index diskover-build-dir-checksums
2026-04-09 19:38:08,453 - diskover.plugin.checksums - INFO - *** Total files: 578 ***
2026-04-09 19:38:08,453 - diskover.plugin.checksums - INFO - *** Files hashed: 0 (0.0% reduction of total files) ***
2026-04-09 19:38:08,453 - diskover.cache - INFO - CACHE HITS: 578, MISSES: 0, HIT RATIO: 100.0% (/opt/diskover/plugins_postindex/__diskover_hash_cache__/)
2026-04-09 19:38:08,454 - diskover.cache - INFO - Closing cache DB /opt/diskover/plugins_postindex/__diskover_hash_cache__/cache_database.db...
2026-04-09 19:38:08,454 - diskover.cache - INFO - Cache DB /opt/diskover/plugins_postindex/__diskover_hash_cache__/cache_database.db closed
Here we can see a 100% hit ratio on the Cache (as no files were modified between executions) and that 0 files were actually hashed!
Searching in Diskover
After running the Checksums plugin, hash values are stored in Elasticsearch and become searchable through the Diskover web interface. This enables powerful queries for data integrity verification and duplicate investigation.
Available Hash Fields
The plugin creates the following searchable fields:
| Field | Description |
|---|---|
| | Fast hash (MD5 of filename+size) |
| hash.xxhash | xxhash content hash |
| hash.md5 | MD5 content hash |
| | SHA1 content hash |
| hash.sha256 | SHA256 content hash |
Find Files by Specific Hash Value
Search for a specific hash to find all files with that exact content:
hash.xxhash: ef46db3751d8e999
hash.sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
hash.md5: d41d8cd98f00b204e9800998ecf8427e
Find All Files with Hashes
hash: *
Find Files with a Specific Hash Type
hash.sha256: *
hash.xxhash: *
Find Files Without Hashes
Useful for identifying files that weren't processed (perhaps due to size filters):
type:file AND NOT hash: *
Combine Hash Searches with Other Criteria
Find hashed PDF files larger than 1MB:
hash.xxhash: * AND extension:pdf AND size:>1048576
Find hashed files in a specific directory:
hash: * AND parent_path:/mnt/data/important/*
Find files hashed with one algorithm but not another:
hash.md5: * AND NOT hash.sha256: *
Here we can see all 578 files found have an MD5 checksum but not a SHA1!
Integration with Dupesfinder
The Checksums plugin is designed to work together with the Dupesfinder plugin for comprehensive duplicate detection. Running Checksums first enriches your index with hash metadata that Dupesfinder then uses to identify files with identical content.
Why Two Plugins?
Separating checksum generation from duplicate detection provides flexibility:
Run checksums once, find duplicates multiple times — Hash values persist in the index, so you can run duplicate detection whenever needed without re-hashing
Control resource usage — Hashing is I/O intensive; run it during off-peak hours, then run lightweight duplicate detection anytime
Different scopes — Hash a single index, then find duplicates across multiple indices
Incremental updates — Add hashes to new files without re-processing existing ones
Complete Duplicate Detection Workflow
Step 1: Generate checksums for your index
Linux:
python3 /opt/diskover/plugins_postindex/diskover_checksums/diskover_checksums.py -u -v diskover-myindex
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_checksums\diskover_checksums.py" -u -v diskover-myindex
Step 2: Run Dupesfinder to identify duplicates
Linux:
python3 /opt/diskover/plugins_postindex/diskover_dupesfinder/diskover_dupesfinder.py -U diskover-myindex
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_dupesfinder\diskover_dupesfinder.py" -U diskover-myindex
Automating the Workflow
You can chain both plugins as Post-Crawl Commands on an Index Task to automatically generate checksums and find duplicates after each scan:
Linux Example:
| Field | Value |
|---|---|
| Post-Crawl Command | |
| Post-Crawl Command Args | |
Windows Example:
| Field | Value |
|---|---|
| Post-Crawl Command | |
| Post-Crawl Command Args | |
Performance Tuning
Thread Configuration
The maxthreads setting controls how many files are hashed in parallel:
# Auto-detect based on CPU cores (recommended for most environments)
maxthreads: 0

# Fixed thread count for controlled resource usage
maxthreads: 8
Recommendations:
For local storage: Use auto-detect (0) or match your CPU core count
For network storage (NFS, SMB): Start with 4-8 threads and adjust based on I/O saturation
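As a rough picture of what the thread setting controls, the sketch below resolves a maxthreads value and fans work out to a thread pool. It is an illustration, not the plugin's implementation: the function names are hypothetical, and mapping 0 to os.cpu_count() is an assumption about how auto-detection might work.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_threads(maxthreads: int) -> int:
    """Return the worker count: a fixed value, or the CPU count when 0."""
    if maxthreads > 0:
        return maxthreads
    # Assumption: auto-detect falls back to the number of CPU cores.
    return os.cpu_count() or 1


def hash_in_parallel(paths, hash_func, maxthreads=0):
    """Apply hash_func to every path using a pool of worker threads."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=resolve_threads(maxthreads)) as pool:
        return dict(zip(paths, pool.map(hash_func, paths)))
```

Threads help here because hashing is dominated by file reads; on network storage, more workers mainly means more concurrent I/O requests, which is why a lower fixed count is suggested.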
Block Size Optimization
The blocksize setting controls how much data is read at a time when hashing:
# Default (64KB) - good for mixed file sizes
blocksize: 65536

# Large files optimization (1MB)
blocksize: 1048576

# NFS optimization - match your rsize mount option
blocksize: 131072  # 128KB
Recommendations:
For large files (video, archives): Increase to 1MB (1048576)
For NFS storage: Match your mount's rsize option for optimal read performance
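Block-wise hashing generally follows the pattern below: the file is read in blocksize chunks and each chunk is fed to the hasher, so memory use stays flat regardless of file size. This is a generic sketch using hashlib, not the plugin's code; hash_file is a hypothetical name and the temporary file exists only for the demonstration.

```python
import hashlib
import os
import tempfile


def hash_file(path, algorithm="sha256", blocksize=65536):
    """Hash a file by reading it in fixed-size blocks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # read() returns b"" at end of file, which stops the iteration.
        for block in iter(lambda: fh.read(blocksize), b""):
            hasher.update(block)
    return hasher.hexdigest()


# Demo: the block size affects read behaviour, never the resulting digest.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 200_000)
demo_path = tmp.name
digest_small = hash_file(demo_path, blocksize=65536)
digest_large = hash_file(demo_path, blocksize=1048576)
os.remove(demo_path)
print(digest_small == digest_large)
```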
Algorithm Performance
Approximate hashing speeds on typical server hardware (single thread):
| Algorithm | Speed (MB/s) | 1GB File Time |
|---|---|---|
| xxhash | 5000+ | ~0.2s |
| MD5 | 400-600 | ~2s |
| SHA1 | 300-500 | ~2.5s |
| SHA256 | 200-400 | ~3.5s |
If performance is critical and cryptographic security isn't required, xxhash provides dramatically faster processing.
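The 1GB timings in the table follow from dividing file size by throughput. The sketch below redoes that arithmetic using the midpoint of each range (and the lower bound for xxhash) as an assumed representative speed; real throughput depends on hardware and storage, so treat the results as rough estimates.

```python
def seconds_for(size_mb: float, throughput_mb_s: float) -> float:
    """Estimated hashing time: size divided by throughput."""
    return size_mb / throughput_mb_s


ONE_GB_MB = 1024

# Assumed representative speeds in MB/s, taken from the ranges in the table.
speeds = {"xxhash": 5000, "md5": 500, "sha1": 400, "sha256": 300}

estimates = {
    name: round(seconds_for(ONE_GB_MB, speed), 2) for name, speed in speeds.items()
}
for name, secs in estimates.items():
    print(f"{name}: ~{secs}s per 1GB")
```

These single-thread figures only matter when the CPU is the bottleneck; on slow or network storage, read speed usually dominates and the gap between algorithms narrows.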
Troubleshooting
No Hashes Generated
Symptom: Plugin runs but no hash fields appear in documents.
What to check:
Verify file size filters match your files (default min_size is 1024 bytes and max_size is 1GB)
Check that the Diskover service user has read access to the files
Review extension filters if you've configured them
Run with -v or -V to see which files are being processed and why others might be skipped
Diagnostic query to count eligible files:
curl -X GET "localhost:9200/diskover-myindex/_count" -H 'Content-Type: application/json' -d'
{
"query": {
"query_string": {
"query": "type:file AND size:>=1024 AND size:<=1073741824"
}
}
}'
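If your size limits differ from the defaults, the request body for the same count query can be generated instead of edited by hand. The helper below is a hypothetical convenience that only builds the JSON body; it does not contact Elasticsearch.

```python
import json


def eligible_files_query(min_size: int, max_size: int) -> str:
    """Build the JSON body for a _count request matching the size filters."""
    query = f"type:file AND size:>={min_size} AND size:<={max_size}"
    return json.dumps({"query": {"query_string": {"query": query}}})


# With the default limits this reproduces the query used in the curl command.
print(eligible_files_query(1024, 1073741824))
```

The printed string can be passed to curl with -d in place of the inline body.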
xxhash Module Not Found
Symptom: Error "Missing xxhash Python module"
Solution: Install the xxhash package:
Linux:
python3 -m pip install xxhash
Windows (PowerShell):
python -m pip install xxhash
Verification:
python3 -c "import xxhash; print('xxhash version:', xxhash.VERSION)"
Permission Denied Errors
Symptom: Warnings about unable to open or stat files.
What to check:
Ensure the Diskover service user has read access to the storage paths
For NFS mounts, verify export options include read permissions
Check if replace_paths configuration is needed for your environment
Test file access:
sudo -u diskover cat /path/to/problem/file > /dev/null && echo "OK"
Cache Not Working
Symptom: Files are re-hashed on every run despite using -u.
What to check:
Verify the cache directory exists and is writable
Check if file mtimes are changing between runs
Consider enabling use_disk_mtime if index mtimes differ from actual file mtimes
Try flushing the cache with -f and rebuilding
Debug cache behavior:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_checksums/diskover_checksums.py -V -u diskover-myindex 2>&1 | grep -E "CACHE (HIT|MISS)"
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_checksums\diskover_checksums.py" -V -u diskover-myindex 2>&1 | Select-String "CACHE (HIT|MISS)"
Slow Performance
What to try:
Use xxhash instead of SHA256 if cryptographic security isn't required
Increase blocksize to 1MB (1048576) for large files or network storage
Reduce maxthreads for network storage to avoid I/O saturation
Enable caching (-u) to skip unchanged files on subsequent runs
Use extension or size filters to focus on the files that matter
Elasticsearch Connection Issues
Symptom: Error connecting to Elasticsearch.
Diagnostic steps:
# Test ES connectivity
curl -X GET "localhost:9200/_cluster/health"

# Test from Python
python3 -c "from diskover_elasticsearch import es_connection_cached; print(es_connection_cached().info())"
What to check:
Verify Elasticsearch is running
Check ES host/port configuration in Diskover settings
Verify authentication credentials if security is enabled
Check network connectivity and firewall rules
Support
Last Updated: April 2026