Index Differential
License: PRO+ (Professional Edition or higher)
Plugin Type: Post-Index Plugin
Author: Diskover Data, Inc.
Overview
The Index Differential plugin compares two Diskover indices to identify file differences between them. Whether you're validating a data migration, verifying backup integrity, or tracking changes over time, this plugin helps you quickly identify what's different between two points in time or two storage locations.
The plugin generates detailed CSV reports showing exactly which files differ and how, and can optionally tag documents directly in Diskover so you can search and filter differences using the standard search interface.
Use Cases
Data Migration Validation
After migrating data from one storage system to another, use Index Differential to verify that all files transferred correctly. Compare the source and destination indices using checksum verification to ensure data integrity.
Backup and DR Verification
Compare your production index against a backup or disaster recovery site index to ensure data completeness. This is particularly valuable for validating that critical data is properly replicated and available for recovery scenarios.
Change Detection Over Time
Identify what changed between two scan dates by comparing indices from different time periods. This helps answer questions like "what files were added, removed, or modified in the last week?" and supports compliance auditing or operational monitoring.
Installation
Prerequisites
| Component | Requirement |
|---|---|
| Diskover | Licensed installation (Professional Edition or higher) |
| Python | 3.9 or higher |
| Elasticsearch | 7.x or 8.x (as supported by your Diskover installation) |
Installation Steps
The Index Differential plugin is included with Diskover Professional Edition and higher. Verify the plugin is present in your installation:
Linux:
ls -la /opt/diskover/plugins_postindex/diskover_indexdiff/
Windows:
dir "C:\Program Files\Diskover\plugins_postindex\diskover_indexdiff\"
You should see the following files:
diskover_indexdiff.py: The main plugin script
README.md: Plugin documentation
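The same presence check can be scripted, for example as part of a deployment health check. The following is a minimal sketch, assuming the default Linux install path shown above; the helper name `missing_plugin_files` is hypothetical and not part of Diskover.

```python
from pathlib import Path

# Files this guide says should be present in the plugin directory.
EXPECTED_FILES = ("diskover_indexdiff.py", "README.md")


def missing_plugin_files(plugin_dir):
    """Return the expected plugin files that are absent from plugin_dir."""
    base = Path(plugin_dir)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


# Default Linux location from this guide; adjust the path for Windows installs.
missing = missing_plugin_files("/opt/diskover/plugins_postindex/diskover_indexdiff/")
print("missing files:", missing or "none")
```

An empty result means both files were found; any names returned point to an incomplete installation.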
Verify Installation
Confirm the plugin is working by checking its version:
Linux:
cd /opt/diskover/plugins_postindex/diskover_indexdiff/
python3 diskover_indexdiff.py --version
Windows:
cd "C:\Program Files\Diskover\plugins_postindex\diskover_indexdiff"
python diskover_indexdiff.py --version
Configuration
Configuration is managed through the Diskover Admin Panel under Plugins > Post Index > Index Diff.
Sample Configuration in Diskover Admin:
This is the beginning of a sample configuration. The Index Diff plugin has many other configuration settings, which are covered in detail below.
Configuration Parameters
| Parameter | Default | Description |
|---|---|---|
| hash_skip_empty | true | Skip comparing files that have no hash value in either index. Set to |
| | | Skip files that exist in only one index (new files) when doing hash comparison. Set to |
| csvdir | /tmp | Directory where CSV output files are saved. Ensure this location has adequate disk space and appropriate permissions. |
Secondary Elasticsearch Configuration (Advanced)
For environments that need to compare indices across different Diskover systems (such as comparing a production cluster with a DR site), an optional secondary Elasticsearch connection can be configured. This is an uncommon scenario but available when needed.
When configured, use the --es2 command-line option to direct the plugin to retrieve the second index from the secondary Elasticsearch host.
Execution / Usage Guide
Command-Line Options
| Option | Description |
|---|---|
| -i | First index name (required) |
| -I | Second index name for comparison |
| -d | Root directory path to compare (required) |
| -D | Alternate root directory for second index (use when mount points differ) |
| -c | Enable hash comparison; modes: |
| -s | Compare file sizes |
| -m | Compare modification times |
| | Compare hardlink counts |
| -q, --esquery | Filter files with Elasticsearch query string |
| --es2 | Use secondary ES host (configured in Admin Panel) for index2 |
| --tagindex | Apply diff tags to documents in the first index |
| | Skip CSV output (useful with |
| --filelistonly | Export file list from single index without comparison |
| --comparecsvs | Compare two previously exported CSV files |
Manual Execution Examples
Example 1: Migration Validation with Checksum Verification
After migrating data from /mnt/source to /mnt/dest, verify all files transferred correctly:
Linux:
cd /opt/diskover/plugins_postindex/diskover_indexdiff/
python3 diskover_indexdiff.py \
-i diskover-source-2024.12 \
-I diskover-dest-2024.12 \
-d /mnt/source \
-D /mnt/dest \
-c md5 \
-s
Windows:
cd "C:\Program Files\Diskover\plugins_postindex\diskover_indexdiff"
python diskover_indexdiff.py -i diskover-source-2024.12 -I diskover-dest-2024.12 -d D:\source -D E:\dest -c md5 -s
This compares files by path (adjusted for different roots), verifies MD5 checksums match, and checks file sizes.
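To make "adjusted for different roots" concrete, here is a small illustrative sketch of the idea: a path under the first root is rewritten onto the second root before the two file lists are matched. This is an assumption about the general technique, not the plugin's actual implementation, and `remap_root` is a hypothetical helper name.

```python
def remap_root(path, root1, root2):
    """Rewrite a path under root1 so it can be matched against root2."""
    root1 = root1.rstrip("/")
    root2 = root2.rstrip("/")
    if path == root1:
        return root2
    if path.startswith(root1 + "/"):
        # Keep everything after the first root and graft it onto the second.
        return root2 + path[len(root1):]
    # Paths outside root1 are returned unchanged.
    return path


print(remap_root("/mnt/source/projects/a.txt", "/mnt/source", "/mnt/dest"))
# prints /mnt/dest/projects/a.txt
```

Matching on the `root1 + "/"` prefix, rather than on `root1` alone, avoids wrongly remapping a sibling directory such as /mnt/source2.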
Example 2: Change Detection Between Scans
Identify what changed between two weekly scans:
Linux:
cd /opt/diskover/plugins_postindex/diskover_indexdiff/
python3 diskover_indexdiff.py \
-i diskover-data-2024.12.01 \
-I diskover-data-2024.12.08 \
-d /data/project \
-s -m \
--tagindex
Windows:
cd "C:\Program Files\Diskover\plugins_postindex\diskover_indexdiff"
python diskover_indexdiff.py -i diskover-data-2024.12.01 -I diskover-data-2024.12.08 -d D:\data\project -s -m --tagindex
This identifies files that were added, removed, or modified (by size or mtime) between the two scan dates and applies tags to the first index for easy searching.
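The added/removed/modified classification can be pictured as a comparison of two path-keyed mappings. The sketch below is illustrative only and assumes each scan has been reduced to a dictionary of path to (size, mtime); it does not reflect how the plugin queries Elasticsearch, and the function name and sample data are hypothetical.

```python
def classify_changes(old, new):
    """Compare two {path: (size, mtime)} mappings from two scans.

    Returns (added, removed, modified) as sorted lists of paths.
    """
    old_paths, new_paths = set(old), set(new)
    added = sorted(new_paths - old_paths)      # only in the later scan
    removed = sorted(old_paths - new_paths)    # only in the earlier scan
    modified = sorted(p for p in old_paths & new_paths if old[p] != new[p])
    return added, removed, modified


# Hypothetical data standing in for two weekly scans.
week1 = {
    "/data/project/a.txt": (100, "2024-12-01T08:00:00"),
    "/data/project/b.txt": (200, "2024-12-01T08:00:00"),
}
week2 = {
    "/data/project/a.txt": (150, "2024-12-05T09:30:00"),
    "/data/project/c.txt": (300, "2024-12-06T10:00:00"),
}
print(classify_changes(week1, week2))
```

Here a.txt is reported as modified because its size and mtime both changed, b.txt as removed, and c.txt as added.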
Example 3: Filtered Comparison
Compare only specific file types:
Linux:
cd /opt/diskover/plugins_postindex/diskover_indexdiff/
python3 diskover_indexdiff.py \
-i diskover-media-2024 \
-I diskover-archive-2024 \
-d /media \
-q "extension:(mov OR mp4 OR mxf)" \
-c xxhash \
--tagindex
Windows:
cd "C:\Program Files\Diskover\plugins_postindex\diskover_indexdiff"
python diskover_indexdiff.py -i diskover-media-2024 -I diskover-archive-2024 -d D:\media -q "extension:(mov OR mp4 OR mxf)" -c xxhash --tagindex
This compares only video files between the two indices using xxhash checksums.
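When these commands are launched from another script, building the argument list programmatically avoids shell-quoting problems with query strings. The following is a sketch that uses only the flags appearing in the examples above; `build_indexdiff_args` is a hypothetical helper, not something shipped with Diskover.

```python
def build_indexdiff_args(index1, index2, rootdir, rootdir2=None,
                         query=None, hash_mode=None, compare_size=False,
                         compare_mtime=False, tag_index=False):
    """Assemble an argument list using only flags shown in the examples above."""
    args = ["python3", "diskover_indexdiff.py",
            "-i", index1, "-I", index2, "-d", rootdir]
    if rootdir2:
        args += ["-D", rootdir2]
    if query:
        args += ["-q", query]
    if hash_mode:
        args += ["-c", hash_mode]
    if compare_size:
        args.append("-s")
    if compare_mtime:
        args.append("-m")
    if tag_index:
        args.append("--tagindex")
    return args


# Example 3 from this guide, expressed as a list (no shell quoting needed).
args = build_indexdiff_args(
    "diskover-media-2024", "diskover-archive-2024", "/media",
    query="extension:(mov OR mp4 OR mxf)", hash_mode="xxhash", tag_index=True,
)
print(args)
# To execute it, something like the following could be used:
#   import subprocess
#   subprocess.run(args, cwd="/opt/diskover/plugins_postindex/diskover_indexdiff/", check=True)
```

Passing a list to `subprocess.run` keeps the query string as a single argument, so the parentheses and spaces need no escaping.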
Automated Execution
Index Differential can be scheduled to run automatically using Diskover's built-in task scheduling.
Option 1: Custom Task
Create a Custom Task in the Diskover Admin Panel to run the comparison on a schedule.
Sample Custom Task Configuration:
The Custom Task requires a Run Command and its arguments. Note that the {indexname} variable cannot be used here, because a Custom Task does not create an index; instead, use the -l (toppath) CLI option and pass in your top path.
Option 2: Post-Crawl Command
Configure Index Differential to run automatically after an index completes by adding it as a Post-Crawl Command in your Index Task configuration.
Sample Post-Crawl Command configuration for Index Diff executing with an Index Task:
On your system, replace ConfigurationName with the name of a configuration that you created at Diskover Admin → Plugins → Post-Index → Index Diff. If you are using the Default configuration rather than a custom one, the -a flag and the ConfigurationName are not required.
Linux Example:
| Field | Value |
|---|---|
| Post-Crawl Command | |
| Post-Crawl Command Args | |
Windows Example:
| Field | Value |
|---|---|
| Post-Crawl Command | |
| Post-Crawl Command Args | |
Available Index Task Tokens:
{indexname}: The name of the index that was just created
Important:
The Post-Crawl Command field should contain ONLY the executable (e.g., python3, python)
All script paths, flags, and arguments go in the Post-Crawl Command Args field
Adjust the -I (second index) and -d (root directory) values to match your comparison requirements
Reviewing the Output
CSV Reports
By default, Index Differential generates a timestamped CSV file containing all identified differences.
Output Location: The CSV files are saved to the directory specified in the csvdir configuration parameter (default: /tmp).
Filename Format: diskover_filediffs_<INDEX1>_<INDEX2>_<TIMESTAMP>.csv
Example: diskover_filediffs_diskover-prod-2024.12_diskover-backup-2024.12_2024_12_15_14_30_00.csv
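If a downstream script needs to predict or match report names, the filename format can be reproduced as below. This is a sketch that infers the timestamp layout (year, month, day, hour, minute, second separated by underscores) from the example filename; whether the plugin uses local time or UTC is not stated here, and `filediffs_csv_name` is a hypothetical helper.

```python
from datetime import datetime


def filediffs_csv_name(index1, index2, when):
    """Build a report filename in the documented format."""
    # Timestamp layout inferred from the example: 2024_12_15_14_30_00
    stamp = when.strftime("%Y_%m_%d_%H_%M_%S")
    return f"diskover_filediffs_{index1}_{index2}_{stamp}.csv"


print(filediffs_csv_name(
    "diskover-prod-2024.12", "diskover-backup-2024.12",
    datetime(2024, 12, 15, 14, 30, 0),
))
# prints diskover_filediffs_diskover-prod-2024.12_diskover-backup-2024.12_2024_12_15_14_30_00.csv
```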
CSV Columns
| Column | Description |
|---|---|
| | Difference marker indicating the type of difference |
| | Full file path |
| | File size in bytes |
| | Modification time (ISO 8601 format) |
| | Change time (ISO 8601 format) |
| | Access time (ISO 8601 format) |
| | Checksum value (column header includes hash mode used) |
| | Number of hardlinks |
Understanding Difference Markers
| Marker | Meaning |
|---|---|
| < | File exists only in index1 (first index) |
| | File exists only in index2 (second index) |
| <!=size | File size differs (showing file from index1) |
| | File size differs (showing file from index2) |
| | Modification time differs (showing file from index1) |
| | Modification time differs (showing file from index2) |
| | Checksum differs (showing file from index1) |
| | Checksum differs (showing file from index2) |
| | Hardlink count differs (showing file from index1) |
| | Hardlink count differs (showing file from index2) |
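A quick way to summarize a report is to count rows per difference marker. The sketch below assumes the marker is the first CSV column (as listed in the CSV Columns table) and that the file begins with a header row; both are assumptions to verify against a real report, and `count_markers` is a hypothetical helper.

```python
import csv
from collections import Counter


def count_markers(csv_path, has_header=True):
    """Tally the rows of a diff report by the value in the first column."""
    counts = Counter()
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row (assumed to exist)
        for row in reader:
            if row:  # ignore blank lines
                counts[row[0]] += 1
    return counts
```

For example, `count_markers("/tmp/report.csv").most_common()` would list the markers from most to least frequent, where /tmp/report.csv stands in for an actual report path.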
Successful Execution
A successful run will display output similar to:
INFO: Starting diskover indexdiff ...
INFO: Searching for all file docs in diskover-prod-2024.12 for path /data...
INFO: Found 150000 file docs in diskover-prod-2024.12
INFO: Searching for all file docs in diskover-backup-2024.12 for path /data...
INFO: Found 149985 file docs in diskover-backup-2024.12
< /data/projects/new_file.txt,1024,2024-12-15T10:30:00+00:00,...
<!=size /data/reports/quarterly.xlsx,2048,2024-12-14T09:00:00+00:00,...
INFO: done
INFO: creating csv /tmp/diskover_filediffs_diskover-prod-2024.12_diskover-backup-2024.12_2024_12_15_14_30_00.csv...
INFO: done
INFO: all done
Sample CLI execution:
#python3 /opt/diskover/plugins_postindex/diskover_indexdiff/diskover_indexdiff.py -i diskover-new-index -I diskover-old-index -d /opt/diskover -D /opt/diskover -s -m
2026-04-09 23:08:37,590 - diskover.plugin.indexdiff - INFO - Starting diskover indexdiff ...
2026-04-09 23:08:37,590 - diskover.plugin.indexdiff - INFO - Using alternate configuration: Documentation Example
2026-04-09 23:08:37,590 - diskover.plugin.indexdiff - INFO - Starting diskover indexdiff ...
2026-04-09 23:08:37,592 - diskover.plugin.indexdiff - INFO - getting files from es...
2026-04-09 23:08:37,593 - diskover.plugin.indexdiff - INFO - Searching for all file docs in diskover-new-index for path /opt/diskover...
2026-04-09 23:08:37,594 - diskover.plugin.indexdiff - INFO - Searching for all file docs in diskover-old-index for path /opt/diskover...
2026-04-09 23:08:37,716 - diskover.plugin.indexdiff - INFO - Found 1134 file docs in diskover-new-index
2026-04-09 23:08:37,726 - diskover.plugin.indexdiff - INFO - Found 1076 file docs in diskover-old-index
..... (diff list of files)
2026-04-09 23:08:37,740 - diskover.plugin.indexdiff - INFO - done
2026-04-09 23:08:37,741 - diskover.plugin.indexdiff - INFO - creating csv /tmp/diskover_filediffs_diskover-new-index_diskover-old-index_2026_04_09_23_08_37.csv...
2026-04-09 23:08:37,741 - diskover.plugin.indexdiff - INFO - done
2026-04-09 23:08:37,741 - diskover.plugin.indexdiff - INFO - all done
Searching in Diskover
When you run Index Differential with the --tagindex option, the plugin applies descriptive tags to files in the first index. These tags make it easy to search for and filter differences directly in the Diskover web interface.
Tag Patterns
| Tag Pattern | Description |
|---|---|
| diff_newfile_* | File is new (exists only in the first index) |
| diff_hash_* | Checksum differs between indices |
| diff_size_* | File size differs between indices |
| diff_mtime_* | Modification time differs between indices |
| | Hardlink count differs between indices |
Example Searches
Find all files with any difference:
tags:diff_*
Find files that are new (not in the second index):
tags:diff_newfile_*
Find files with checksum/hash differences:
tags:diff_hash_*
Find files with size differences:
tags:diff_size_*
Find files with modification time differences:
tags:diff_mtime_*
Find differences from a specific comparison (by second index name):
tags:*diskover-backup-2024*
Combine with other search criteria:
Find large files (over 1GB) with hash differences:
tags:diff_hash_* AND size:>1073741824
Find video files with any differences:
tags:diff_* AND extension:(mp4 OR mov OR mxf)
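Queries like these can also be composed programmatically, which helps keep byte thresholds correct (1 GB here means 1024³ = 1,073,741,824 bytes). The helper below is a hypothetical sketch that only assembles the query strings shown above; it does not call any Diskover API.

```python
GIB = 1024 ** 3  # 1,073,741,824 bytes, the threshold used in the example above


def diff_query(kind=None, min_size=None, extensions=None):
    """Compose a search string from the tag patterns shown in this section.

    kind: None for any difference, or one of "newfile", "hash", "size", "mtime".
    """
    parts = ["tags:diff_*" if kind is None else f"tags:diff_{kind}_*"]
    if min_size is not None:
        parts.append(f"size:>{min_size}")
    if extensions:
        parts.append("extension:(" + " OR ".join(extensions) + ")")
    return " AND ".join(parts)


print(diff_query("hash", min_size=GIB))
# prints tags:diff_hash_* AND size:>1073741824
print(diff_query(extensions=["mp4", "mov", "mxf"]))
# prints tags:diff_* AND extension:(mp4 OR mov OR mxf)
```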
Troubleshooting
Index Not Found Error
Symptom: ERROR: diskover-indexname no such index!
Resolution:
Verify the index name is spelled correctly (index names are case-sensitive)
Confirm the index exists in Elasticsearch
For --es2 comparisons, ensure the secondary host is properly configured and reachable
Hash Comparison Shows No Results
Symptom: Using -c md5 but no hash differences are reported even though files differ.
Resolution:
Ensure the Checksums plugin was run on both indices before comparison
Check that hash_skip_empty is set appropriately in the configuration (default: true skips files without hashes)
Verify the hash mode matches what was used during checksum generation (e.g., md5, sha256, xxhash)
CSV Output Permission Error
Symptom: Permission denied when writing CSV file.
Resolution:
Ensure the csvdir directory exists and is writable by the user running Diskover
Check available disk space in the output directory
Update the csvdir configuration to a writable location
Tags Not Applied to Documents
Symptom: Using --tagindex but no tags appear on documents.
Resolution:
Verify the plugin reported differences were found in the output
Check that the first index (-i) is writable
Refresh the index in Diskover or wait for automatic refresh
Search for tags using tags:diff_* to confirm
Memory Issues with Large Indices
Symptom: Script runs slowly or encounters memory errors with very large indices.
Resolution:
Use --esquery to filter the comparison to a subset of files
For very large indices, use --filelistonly to export file lists and compare offline with --comparecsvs
Consider comparing specific directories rather than entire index roots
Support
Last Updated: April 2026