First Index Time
License: PRO+ (Professional Edition or higher)
Plugin Type: Index Plugin
Author: Diskover Data, Inc.
Overview
The First Index Time plugin records the timestamp when files first appear in Diskover's index—commonly called the "arrival date" or "first index time." This metadata is invaluable for tracking when data enters your storage environment, independent of filesystem timestamps that may change during migrations, backups, or file modifications.
Unlike filesystem timestamps (ctime, mtime, atime), the first index time represents when Diskover first discovered and indexed a file. This provides an objective arrival date that remains consistent regardless of how the file was created or subsequently modified.
Key Capabilities
Persistent Tracking: Records the first-seen timestamp in a SQLite cache that persists across re-indexes
Inode Validation: Uses inode numbers to detect file replacements (same path, different file)
Directory Filtering: Include or exclude specific directories using exact paths or regex patterns
UTC Timestamps: Stores all timestamps in ISO 8601 format with UTC timezone for consistency
Use Cases
For Storage Administrators:
Use Case | Description |
|---|---|
Backup Automation | Identify files that arrived within a specific window (e.g., "last 7 days") to drive incremental backup workflows |
Storage Growth Analytics | Build dashboards and reports showing data arrival patterns over time to understand growth trends |
Capacity Planning | Analyze historical data ingest rates to forecast future storage needs |
Data Ingest Monitoring | Track daily, weekly, or monthly data arrival volumes across storage tiers |
For Data Governance and Compliance:
Use Case | Description |
|---|---|
Retention Policy Enforcement | Identify files based on when they entered storage for retention schedule compliance |
Data Age Verification | Prove when data arrived for regulatory audits and legal discovery |
Lifecycle Management | Trigger automated workflows based on how long data has been in the system |
Compliance Reporting | Generate reports showing data arrival patterns for compliance documentation |
For Business Workflows:
Use Case | Description |
|---|---|
Project Timeline Analysis | Determine when project deliverables first arrived in shared storage |
SLA Verification | Prove when vendor deliverables were received |
Content Freshness | Identify recently ingested content for review workflows |
Indexed Field
This plugin adds a single field to indexed documents:
Field | Elasticsearch Type | Description |
|---|---|---|
firstindextime | date | UTC timestamp (ISO 8601 format) when the file was first indexed |
Understanding First Index Time
What is First Index Time?
First index time is the timestamp recorded when Diskover first encounters and indexes a file. This is fundamentally different from filesystem timestamps:
Timestamp | Source | What It Represents | Changes When |
|---|---|---|---|
ctime | Filesystem | Last metadata change | Permissions, ownership, or content changes |
mtime | Filesystem | Last content modification | File content is written |
atime | Filesystem | Last access | File is read (may be disabled) |
firstindextime | Diskover | First appearance in index | Never changes once recorded |
Why Filesystem Timestamps Are Insufficient
Filesystem timestamps are often unreliable for tracking when data actually arrived in your storage:
File Migrations: Moving files between storage systems often resets or alters timestamps
Backup Restorations: Restored files may carry original timestamps from years ago
Archive Extraction: Extracted files inherit timestamps from when they were originally created
Copy Operations: Many copy tools preserve source timestamps rather than recording transfer time
Application Behavior: Some applications intentionally set timestamps to specific values
The first index time provides a reliable "arrival date" that reflects when data actually entered your managed storage environment.
How Inode Validation Works
The plugin uses inode numbers to ensure data integrity and detect file replacements:
Initial Index: Plugin records the file path, inode number, and current timestamp
Re-Index: Plugin looks up the cached entry by file path
Validation: If the stored inode matches the current inode, the cached timestamp is returned
Replacement Detection: If the inode differs (file was replaced), a new timestamp is recorded
This approach correctly handles scenarios where a file is deleted and replaced with a new file at the same path—the new file receives a new first index time, reflecting that it's genuinely a different file.
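The four steps above can be sketched in Python with the standard-library sqlite3 module. This is a minimal illustration, not the plugin's actual code: the table name `first_seen`, its columns, and the helper names `open_cache` and `first_index_time` are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def open_cache(db_path=":memory:"):
    """Open a cache database using a hypothetical single-table schema."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS first_seen ("
        "path TEXT PRIMARY KEY, "
        "inode INTEGER NOT NULL, "
        "first_index_time TEXT NOT NULL)"
    )
    return conn


def first_index_time(conn, path, inode, now=None):
    """Return the cached timestamp, or record a new one for a new or replaced file."""
    if now is None:
        now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    row = conn.execute(
        "SELECT inode, first_index_time FROM first_seen WHERE path = ?", (path,)
    ).fetchone()
    if row is not None and row[0] == inode:
        return row[1]  # same path and inode: keep the original arrival time
    # Unknown path, or same path with a different inode: store the current time.
    conn.execute(
        "INSERT OR REPLACE INTO first_seen (path, inode, first_index_time) "
        "VALUES (?, ?, ?)",
        (path, inode, now),
    )
    conn.commit()
    return now


conn = open_cache()
# Initial index, re-index of the same file, then a replacement at the same path.
print(first_index_time(conn, "/mnt/data/report.pdf", 1001, now="2025-03-15T14:30:00"))
print(first_index_time(conn, "/mnt/data/report.pdf", 1001, now="2025-04-01T08:00:00"))
print(first_index_time(conn, "/mnt/data/report.pdf", 2002, now="2025-05-01T08:00:00"))
```

Keying the lookup on the path and comparing the inode afterwards is what lets a replaced file be told apart from an unchanged one.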
Requirements
System Requirements
Python 3.9 or higher
Diskover PRO+ license or higher
Write access to cache directory for SQLite database
Sufficient disk space for cache (minimal—approximately 100 bytes per unique file)
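As a rough sizing aid, the per-file figure above can be turned into a quick estimate. The function name is hypothetical, and the 100-byte value is only the approximation quoted in this list, so treat the result as an order of magnitude.

```python
BYTES_PER_ENTRY = 100  # approximate per-file figure quoted in the requirements


def estimate_cache_bytes(file_count, bytes_per_entry=BYTES_PER_ENTRY):
    """Estimate cache size in bytes for a given number of unique files."""
    return file_count * bytes_per_entry


# 10 million unique files come to roughly 1 GB of cache.
print(estimate_cache_bytes(10_000_000) / 1e9, "GB")
```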
Python Dependencies
This plugin has no external Python dependencies beyond the Diskover core libraries. All required functionality is provided by Python's standard library.
Installation
Step 1: Configure the Plugin
Navigate to Diskover Admin → Plugins → Index Plugins
Locate First Index Time in the plugin list
Enable the plugin and configure parameters as needed
Save the configuration
Step 2: Enable in Index Task Configuration
Navigate to Diskover → Configurations → select your configuration (e.g., Default)
Scroll to the bottom and locate Index Plugins Enablement
Enable the First Index Time plugin
Save the configuration
The plugin will now run automatically during scans using this configuration.
Step 3: Verify Cache Directory
Ensure the cache directory exists and is writable by the Diskover service account:
Linux:
# Check default location
ls -la /opt/diskover/__diskover_firstindextime_plugin_cache__/
# Create if needed
mkdir -p /opt/diskover/__diskover_firstindextime_plugin_cache__
chown diskover:diskover /opt/diskover/__diskover_firstindextime_plugin_cache__
chmod 755 /opt/diskover/__diskover_firstindextime_plugin_cache__
Windows:
# Check default location
dir "C:\Program Files\Diskover\__diskover_firstindextime_plugin_cache__"
# Create if needed (run as Administrator)
mkdir "C:\Program Files\Diskover\__diskover_firstindextime_plugin_cache__"
Configuration
Configuration Parameters
Parameter | Type | Default | Description |
|---|---|---|---|
verbose | bool | | Enable verbose logging for debugging and troubleshooting |
cachedir | string | | Directory for SQLite cache database storage |
cache_expiretime | int | | Cache entry TTL in seconds. |
include_dirs | list | | Directories to include. Empty list = all directories. Supports regex patterns |
exclude_dirs | list | | Directories to exclude. Supports regex patterns |
Configuration Examples
Standard Configuration (Recommended)
Track all files with permanent cache retention:
{
"verbose": false,
"cachedir": "/opt/diskover/__diskover_firstindextime_plugin_cache__/",
"cache_expiretime": 0,
"include_dirs": [],
"exclude_dirs": []
}
Project-Specific Tracking
Track only files in specific project directories:
{
"verbose": false,
"cachedir": "/opt/diskover/__diskover_firstindextime_plugin_cache__/",
"cache_expiretime": 0,
"include_dirs": [
"/mnt/data/projects",
"/mnt/data/deliverables"
],
"exclude_dirs": []
}
Exclude Temporary Directories
Exclude scratch and temporary directories from tracking:
{
"verbose": false,
"cachedir": "/opt/diskover/__diskover_firstindextime_plugin_cache__/",
"cache_expiretime": 0,
"include_dirs": [],
"exclude_dirs": [
"/mnt/data/temp",
"/mnt/data/scratch",
".*\\.tmp$",
".*cache.*"
]
}
Understanding Directory Filters
The include_dirs and exclude_dirs parameters support multiple matching modes:
Pattern Type | Example | Matches |
|---|---|---|
Exact directory name | | Directories named "temp" at any level |
Full path | | Only that specific path |
Regex pattern | | Any path containing "archive" |
Wildcard prefix | | Directories ending with "2024" |
Note: Directory exclusion takes precedence over inclusion. If a path matches both lists, it is excluded.
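The filtering rules can be illustrated with a short Python sketch. The plugin's real matching logic is not shown in this document, so the functions `_matches` and `is_tracked` are hypothetical and the treatment of a full path as a prefix for its subdirectories is an assumption. What the sketch does reflect from the text is that an empty include list means all directories and that exclusion wins over inclusion.

```python
import re
from pathlib import PurePosixPath


def _matches(path, pattern):
    """Hypothetical matcher: exact path, directory name at any level, or regex."""
    if path == pattern or pattern in PurePosixPath(path).parts:
        return True
    try:
        return re.search(pattern, path) is not None
    except re.error:
        return False  # treat an invalid regex as a non-match


def is_tracked(path, include_dirs, exclude_dirs):
    """Apply exclusion first, then inclusion (an empty include list means everything)."""
    if any(_matches(path, p) for p in exclude_dirs):
        return False
    if not include_dirs:
        return True
    return any(_matches(path, p) for p in include_dirs)


include = ["/mnt/data/projects"]
exclude = [".*\\.tmp$", ".*cache.*"]
for candidate in (
    "/mnt/data/projects/alpha",
    "/mnt/data/projects/cache",
    "/mnt/data/other",
):
    print(candidate, is_tracked(candidate, include, exclude))
```

The second path matches both lists and is therefore excluded, which mirrors the precedence rule in the note above.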
Indexed Fields / Elasticsearch Mappings
Field Mapping
Field Path | ES Type | Format | Description |
|---|---|---|---|
firstindextime | date | ISO 8601 | UTC timestamp when file was first indexed |
Field Characteristics
Date type: Enables date range queries and date histogram aggregations
ISO 8601 format: Example: 2025-03-15T14:30:00
UTC timezone: All timestamps are stored in UTC for consistency
Immutable: Once recorded, the value never changes for that file (unless the inode changes)
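Because the field is a date type, it can feed a date histogram aggregation. The sketch below only builds a request body as a Python dictionary; it assumes Elasticsearch 7.2 or later (for `calendar_interval`) and that documents carry a numeric `size` field as in the example document. The function name and the aggregation names `arrivals` and `total_bytes` are hypothetical, and no index name or client call is implied.

```python
import json


def arrival_histogram_body(interval="month"):
    """Build a search body that buckets documents by first index time."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "query": {"exists": {"field": "firstindextime"}},
        "aggs": {
            "arrivals": {
                "date_histogram": {
                    "field": "firstindextime",
                    "calendar_interval": interval,
                },
                # Sum file sizes per bucket to show ingested volume, not just counts.
                "aggs": {"total_bytes": {"sum": {"field": "size"}}},
            }
        },
    }


print(json.dumps(arrival_histogram_body("quarter"), indent=2))
```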
Example Document
A file indexed with the First Index Time plugin:
{
"name": "quarterly_report.pdf",
"extension": "pdf",
"size": 2048576,
"mtime": "2025-01-10T09:15:22",
"firstindextime": "2025-03-15T14:30:00"
}
Note how firstindextime differs from mtime. The file was last modified in January but first appeared in Diskover's index in March—this could indicate the file was copied or migrated from another location.
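The gap between the two timestamps can be computed directly. This is a small sketch using only the standard library; it assumes both values are UTC and lack an explicit offset, as in the example document, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(value):
    """Parse an ISO 8601 timestamp without an offset and treat it as UTC."""
    return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)


def arrival_lag_days(doc):
    """Whole days between last modification and first appearance in the index."""
    return (parse_utc(doc["firstindextime"]) - parse_utc(doc["mtime"])).days


doc = {"mtime": "2025-01-10T09:15:22", "firstindextime": "2025-03-15T14:30:00"}
print(arrival_lag_days(doc))  # 64
```

A large positive lag like this one is a hint that the file existed elsewhere before it reached the indexed storage.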
Searching in Diskover
The firstindextime field supports the full range of Lucene date query syntax. Use these searches in the Diskover web interface search bar.
Basic Searches
Query | Description |
|---|---|
| Find all files that have first index time recorded |
| Files first indexed on a specific date |
Relative Date Searches
Diskover supports relative date math using now as the anchor point:
Query | Description |
|---|---|
| Files that arrived in the last 24 hours |
firstindextime:[now-7d TO now] | Files that arrived in the last 7 days |
| Files that arrived in the last 30 days |
| Files that arrived in the last year |
| Files that arrived today (rounds to start of day) |
| Files that arrived this week |
| Files that arrived this month |
Date Range Searches
Query | Description |
|---|---|
firstindextime:[2025-01-01 TO 2025-03-31] | Files first indexed in Q1 2025 |
| Files first indexed in 2025 |
| Files first indexed more than 1 year ago |
| Files first indexed before 2025 |
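When queries are generated by scripts, small helpers keep the range syntax consistent. The sketch below follows the query forms used in the Backup Automation and Storage Growth Analytics examples on this page; the function names are hypothetical and the set of accepted units is an assumption limited to common date math units.

```python
from datetime import date

FIELD = "firstindextime"
UNITS = {"m", "h", "d", "w", "M", "y"}  # minutes, hours, days, weeks, months, years


def arrived_within(amount, unit="d"):
    """Build a relative range query such as 'arrived in the last 7 days'."""
    if unit not in UNITS:
        raise ValueError(f"unsupported date math unit: {unit!r}")
    return f"{FIELD}:[now-{amount}{unit} TO now]"


def arrived_between(start, end):
    """Build an absolute range query between two calendar dates (inclusive)."""
    return f"{FIELD}:[{start.isoformat()} TO {end.isoformat()}]"


print(arrived_within(7))
print(arrived_between(date(2025, 1, 1), date(2025, 3, 31)))
```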
Date Math Reference
Relative dates use Elasticsearch date math inside Lucene query syntax:
Unit | Meaning | Example |
|---|---|---|
d | Days | now-7d |
w | Weeks | |
M | Months | |
y | Years | |
h | Hours | |
m | Minutes | |
/d | Round to day | |
/w | Round to week | |
/M | Round to month | |
Combined Searches
Combine first index time with other fields for powerful queries:
Query | Description |
|---|---|
| Large files (≥1GB) that arrived in the last week |
| PDF files first indexed in 2025 |
| Files in project folders that arrived last month |
| Old arrivals that haven't been modified in a year |
Backup Automation Example
To identify files for incremental backup (arrived in the last 7 days in a specific folder):
parent_path:/mnt/data/production/* AND firstindextime:[now-7d TO now]
Storage Growth Analytics Example
To analyze data arrival by quarter for capacity planning:
firstindextime:[2025-01-01 TO 2025-03-31] # Q1 arrivals
firstindextime:[2025-04-01 TO 2025-06-30] # Q2 arrivals
firstindextime:[2025-07-01 TO 2025-09-30] # Q3 arrivals
firstindextime:[2025-10-01 TO 2025-12-31] # Q4 arrivals
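For recurring reports, the quarterly ranges can be generated rather than typed by hand. This is an illustrative sketch with a hypothetical function name; it uses the standard-library calendar module to find each quarter's last day.

```python
import calendar
from datetime import date


def quarter_queries(year, field="firstindextime"):
    """Return one range query per calendar quarter of the given year."""
    queries = {}
    for quarter in range(1, 5):
        first_month = 3 * (quarter - 1) + 1
        last_month = first_month + 2
        last_day = calendar.monthrange(year, last_month)[1]
        start = date(year, first_month, 1)
        end = date(year, last_month, last_day)
        queries[f"Q{quarter}"] = f"{field}:[{start} TO {end}]"
    return queries


for name, query in quarter_queries(2025).items():
    print(name, query)
```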
Troubleshooting
Common Issues
Issue | Cause | Solution |
|---|---|---|
| Plugin not enabled or not loading | Verify plugin is enabled in both Plugin settings and Index Task Configuration |
Some files missing first index time | Files in excluded directories or not in included directories | Review include_dirs and exclude_dirs settings |
Timestamps resetting unexpectedly | Cache was cleared or file inodes changed | Preserve cache directory; check if files were replaced or restored |
Cache directory permission errors | Diskover service account lacks write access | Set proper ownership and permissions on cache directory |
Plugin not loading | Import errors or missing dependencies | Check diskover logs for error messages during startup |
Verifying Plugin Operation
Check if plugin is loading:
Linux:
grep -i "firstindextime" /var/log/diskover/diskover.log | tail -20
Windows:
Check Diskover service logs or configured log location.
Test plugin import manually:
Linux:
cd /opt/diskover
python3 -c "from plugins.firstindextime import *; print('Plugin loaded successfully')"
Windows:
cd "C:\Program Files\Diskover"
python -c "from plugins.firstindextime import *; print('Plugin loaded successfully')"
Cache Management
The plugin stores first index times in a SQLite cache. If you need to clear the cache (which will reset all first index times), use:
Linux:
rm -rf /opt/diskover/__diskover_firstindextime_plugin_cache__/
Windows:
Remove-Item -Recurse -Force "C:\Program Files\Diskover\__diskover_firstindextime_plugin_cache__"
Warning: Clearing the cache will cause all files to receive new first index times on the next scan. Only clear the cache if you intentionally want to reset arrival tracking.
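Since a cleared cache cannot be rebuilt from the index, it may be worth archiving the directory before removing it. The following is a minimal sketch using only the standard library; the function name and archive naming are hypothetical, and it assumes no scan is writing to the cache while the archive is created.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_cache(cachedir, backup_root):
    """Archive the cache directory as a timestamped .tar.gz and return its path."""
    backup_dir = Path(backup_root)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    base_name = backup_dir / f"firstindextime_cache_{stamp}"
    # make_archive appends the extension and returns the final archive path.
    archive = shutil.make_archive(str(base_name), "gztar", root_dir=str(cachedir))
    return Path(archive)
```

For example, `backup_cache("/opt/diskover/__diskover_firstindextime_plugin_cache__", "/var/backups/diskover")` would write the archive under the second directory, which is itself only an example location.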
Debug Logging
Enable verbose logging in the plugin configuration to troubleshoot issues:
{
"verbose": true
}
Then monitor logs during indexing:
Linux:
tail -f /var/log/diskover/diskover.log | grep -i firstindextime
Support
Last Updated: January 2026
Diskover Data, Inc.