Diskover Index Plugins - User Guide Index
Overview
Index plugins extend Diskover's metadata collection capabilities during the file system crawling process. Unlike post-index plugins that process data after a crawl completes, index plugins run in real-time as files are being indexed—automatically extracting additional metadata fields that become immediately searchable in Diskover.
This page serves as the central reference for all index plugin documentation. Each guide provides installation instructions, configuration details, field mappings, search query examples, and troubleshooting information.
Key Difference from Post-Index Plugins: Index plugins require no manual execution. Simply enable them in your Index Task Configuration, and they'll automatically process files during your next scan.
Plugin Categories
🧬 Bioinformatics & Scientific Data
Plugin | Description | Use Case |
|---|---|---|
BAM Info | Extracts metadata from BAM, SAM, and CRAM sequence alignment files including program groups, header comments, and command-line parameters. | Enable genomics data discovery by making alignment files searchable by software version, processing parameters, and pipeline metadata. |
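To illustrate the header records involved, the sketch below pulls @PG (program) lines and @CO (comment) lines out of a SAM-format text header using only the standard library. It is a minimal sketch, not the plugin's code: the plugin reads BAM/SAM/CRAM through pysam, and the function name `parse_sam_header` and the `pg`/`co` keys are hypothetical stand-ins for the bam_info.pg and bam_info.co fields. How bam_info.co_cmd is derived is not described on this page, so it is not reproduced.

```python
def parse_sam_header(header_text: str) -> dict:
    """Collect @PG records and @CO comments from a SAM-format text header.

    Hypothetical helper for illustration; not the BAM Info plugin's code.
    """
    programs = []
    comments = []
    for line in header_text.splitlines():
        if line.startswith("@PG\t"):
            # @PG fields are tab-separated TAG:VALUE pairs, e.g. ID:bwa PN:bwa VN:0.7.17
            record = {}
            for field in line.split("\t")[1:]:
                # split on the first colon only, since CL: values may contain colons
                tag, _, value = field.partition(":")
                record[tag] = value
            programs.append(record)
        elif line.startswith("@CO\t"):
            # @CO lines carry free text after the record type and tab
            comments.append(line[4:])
    return {"pg": programs, "co": comments}
```

With pysam installed, `pysam.AlignmentFile(path, "rb").header.to_dict()` should return the same records for a BAM file already split into keys such as "PG" and "CO", which is the more robust route for binary files.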
📁 Directory & Project Metadata
Plugin | Description | Use Case |
|---|---|---|
Breadcrumb | Reads metadata from breadcrumb files (.birthday/volinfo) placed in directories, capturing ownership, quotas, project codes, and custom attributes. | Track directory ownership, enforce storage quotas, and enable project-based data governance through embedded metadata files. |
🔐 Data Integrity & Verification
Plugin | Description | Use Case |
|---|---|---|
Checksums | Generates cryptographic hash values (xxHash, MD5, SHA1, SHA256) during indexing for file integrity verification and duplicate detection. | Verify data integrity across storage tiers, identify duplicate files, and support compliance requirements for data authenticity. |
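The core technique is to read each file once, in fixed-size chunks, and feed every enabled hash algorithm from the same buffer, so large files are never loaded whole into memory. The sketch below assumes a plain path on local or mounted storage; `file_hashes` is a hypothetical helper, and using xxHash's 64-bit variant (`xxhash.xxh64`) is an assumption, since this page does not say which xxHash variant the plugin computes.

```python
import hashlib

try:
    import xxhash  # optional third-party package; only needed for the xxHash algorithm
except ImportError:
    xxhash = None


def file_hashes(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Read a file once in 1 MiB chunks and update every requested hasher.

    Hypothetical helper for illustration; not the Checksums plugin's code.
    """
    hashers = {}
    for name in algorithms:
        if name == "xxhash":
            if xxhash is None:
                raise RuntimeError("xxhash requested but the xxhash package is not installed")
            hashers[name] = xxhash.xxh64()  # assumed variant
        else:
            hashers[name] = hashlib.new(name)  # md5, sha1, sha256, ...
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For duplicate detection, files that share a digest (for example, the same sha256 value) are candidates for being identical; comparing sizes first is a cheap pre-filter before hashing.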
📂 File Classification & Organization
Plugin | Description | Use Case |
|---|---|---|
File Kind | Categorizes files into logical groups (Document, Video, Image, Audio, Archive, etc.) based on file extensions. | Search by file category instead of individual extensions—find all "Images" rather than searching for jpg, png, tiff separately. |
First Index Time | Records an immutable timestamp when files first appear in the index, independent of filesystem modification times. | Track data arrival times for backup automation, implement retention policies, and audit when data entered your storage environment. |
Path Tokens | Tokenizes file paths into searchable components, extracting meaningful segments from directory structures. | Search by path segments, find files by project codes embedded in paths, and enable hierarchical organization queries. |
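File Kind and Path Tokens are both simple transformations of the path itself; neither needs to open the file. The sketch below shows the idea under stated assumptions: the category table is a hypothetical, abbreviated example (the plugin's actual categories and extension lists come from its configuration), and `path_tokens` simply splits a POSIX path into its components, which may be coarser or finer than the plugin's own tokenization rules.

```python
from pathlib import PurePosixPath

# Hypothetical, abbreviated extension-to-kind table for illustration only.
FILE_KINDS = {
    "Image": {"jpg", "jpeg", "png", "tif", "tiff", "gif"},
    "Video": {"mov", "mp4", "mxf", "avi"},
    "Audio": {"wav", "mp3", "aif", "flac"},
    "Document": {"pdf", "doc", "docx", "txt"},
    "Archive": {"zip", "tar", "gz", "7z"},
}
# Invert once so each lookup is a single dictionary access.
EXT_TO_KIND = {ext: kind for kind, exts in FILE_KINDS.items() for ext in exts}


def file_kind(filename, default="Other"):
    """Map a file name to a category by its (case-insensitive) last extension."""
    ext = PurePosixPath(filename).suffix.lower().lstrip(".")
    return EXT_TO_KIND.get(ext, default)


def path_tokens(path):
    """Split a POSIX path into its non-root components."""
    return [part for part in PurePosixPath(path).parts if part != "/"]
```

For example, `file_kind("photo.JPG")` returns "Image", and `path_tokens("/mnt/projects/ABC123/renders")` returns `["mnt", "projects", "ABC123", "renders"]`, so a search on the project code "ABC123" can match every file under that directory.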
📈 Time Series & Analytics
Plugin | Description | Use Case |
|---|---|---|
Grafana | Exports directory metrics to daily Elasticsearch indices for time series analysis, tracking size, file counts, and file age distributions over time. | Build Grafana dashboards for storage growth trends, capacity planning, and historical analysis of data accumulation patterns. |
Grafana Cloud | Exports directory metrics to Grafana Cloud (Graphite) for time series analysis in managed SaaS environments. | Integrate storage analytics with cloud-based monitoring infrastructure without maintaining local time series databases. |
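Both plugins start from the same kind of per-directory rollup: walk a tree, total its bytes and file count, and stamp the result with a time. The sketch below computes those two totals with `os.walk` and formats them in Graphite's plaintext protocol (`<metric.path> <value> <unix_timestamp>`). The metric prefix and function names are hypothetical, file-age buckets are omitted, and how the plugins deliver data to Elasticsearch or authenticate to Grafana Cloud is not shown.

```python
import os
import time


def directory_metrics(root):
    """Total file count and bytes under root (symlinks are not followed)."""
    total_size = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable mid-walk
            total_size += st.st_size
            file_count += 1
    return {"size": total_size, "file_count": file_count}


def to_graphite_lines(prefix, metrics, timestamp=None):
    """Format metrics as Graphite plaintext lines: '<path> <value> <timestamp>'."""
    ts = int(timestamp if timestamp is not None else time.time())
    return [f"{prefix}.{key} {value} {ts}" for key, value in sorted(metrics.items())]
```

Sorting the metric keys keeps the output order stable, which makes the lines easy to diff and test.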
🎬 Media & Creative Assets
Plugin | Description | Use Case |
|---|---|---|
Image Info | Extracts metadata from image files (JPEG, PNG, TIFF, GIF, etc.) including dimensions, format, and EXIF data such as camera model, GPS coordinates, and capture date. | Search media libraries by image dimensions, camera equipment, shooting location, or capture date for photo management and DAM workflows. |
Media Info | Extracts technical metadata from video and audio files including codecs, resolution, frame rate, bitrate, and audio channel configurations. | Enable post-production searches by codec, resolution, or bitrate—find all 4K ProRes files or identify videos needing transcoding. |
Open EXR | Extracts metadata from OpenEXR image files used in VFX and CGI pipelines, including channel information, compression settings, and custom attributes. | Support VFX pipeline management by making render outputs searchable by channel configuration, compression, and production metadata. |
PDF Info | Extracts metadata from PDF documents including author, title, creation date, page count, PDF version, and producer application. | Search document repositories by author, title, or creation date—find all PDFs created by a specific department or application. |
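Media Info relies on an external probe tool. The sketch below uses ffprobe's JSON output (`-print_format json -show_format -show_streams`) and groups a few fields by track type, loosely mirroring the General/Video/Audio grouping of the media_info.* fields. The function names and the keys in the returned dictionary are illustrative assumptions; the plugin's own field names follow its MediaInfo-based mapping and may differ.

```python
import json
import subprocess


def run_ffprobe(path):
    """Return ffprobe's JSON description of a media file (requires ffprobe on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def summarize_media(probe):
    """Group selected ffprobe fields by track type (hypothetical output keys)."""
    fmt = probe.get("format", {})
    summary = {
        "General": {
            "format": fmt.get("format_name"),
            # ffprobe reports duration and bit_rate as strings
            "duration": float(fmt["duration"]) if "duration" in fmt else None,
            "bit_rate": int(fmt["bit_rate"]) if "bit_rate" in fmt else None,
        },
        "Video": [],
        "Audio": [],
    }
    for stream in probe.get("streams", []):
        if stream.get("codec_type") == "video":
            summary["Video"].append({
                "codec": stream.get("codec_name"),
                "width": stream.get("width"),
                "height": stream.get("height"),
                "frame_rate": stream.get("r_frame_rate"),  # e.g. "24000/1001"
            })
        elif stream.get("codec_type") == "audio":
            summary["Audio"].append({
                "codec": stream.get("codec_name"),
                "channels": stream.get("channels"),
                "sample_rate": stream.get("sample_rate"),
            })
    return summary
```

Calling `summarize_media(run_ffprobe("clip.mov"))` (a hypothetical file) would return codec, resolution, and frame rate per video stream. The parsing step is kept separate from the subprocess call so it can be tested without ffprobe installed.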
🏷️ Tagging & Metadata Preservation
Plugin | Description | Use Case |
|---|---|---|
Tag Copier | Preserves user-applied tags when re-indexing by copying tags from the previous index to the newly created index. | Maintain tagging continuity across index rebuilds—ensure months of manual classification work isn't lost during re-crawls. |
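Conceptually, tag copying is a join between the old and new index on each file's location. The sketch below keys documents on a hypothetical `(parent_path, name)` pair and merges old tags into matching new documents. It works on in-memory lists of dictionaries rather than querying Elasticsearch, and it is not the plugin's implementation.

```python
def copy_tags(previous_docs, new_docs):
    """Merge `tags` from previous-index documents into matching new documents.

    Documents are dicts with hypothetical keys 'parent_path', 'name', and
    optionally 'tags'. Returns how many new documents received tags.
    """
    # Index the old tags by file location so each lookup is O(1).
    tags_by_path = {
        (doc["parent_path"], doc["name"]): doc["tags"]
        for doc in previous_docs
        if doc.get("tags")
    }
    copied = 0
    for doc in new_docs:
        old_tags = tags_by_path.get((doc["parent_path"], doc["name"]))
        if old_tags:
            # Merge rather than overwrite, keeping order and skipping duplicates.
            merged = list(doc.get("tags", []))
            merged += [tag for tag in old_tags if tag not in merged]
            doc["tags"] = merged
            copied += 1
    return copied
```

Because the join key is the path, a file that was moved or renamed between crawls will not pick up its old tags in this sketch.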
🔒 Permissions & Security
Plugin | Description | Use Case |
|---|---|---|
Unix Perms | Extracts Unix/Linux file permissions and ownership including numeric mode, owner/group names, and permission breakdowns. | Audit file permissions across storage, identify overly permissive files, and support security compliance requirements. |
Windows Owner | Extracts Windows file ownership information including owner username and domain for files on Windows file servers. | Track file ownership on Windows storage, audit data ownership, and support compliance reporting for Windows environments. |
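On Unix-like systems, the fields listed for Unix Perms can be read with the Python standard library alone. The sketch below uses `os.lstat`, `stat.filemode`, and the `pwd`/`grp` name lookups. The output formats (an octal string for `mode`, an `ls`-style string for `permissions`) and the extra `world_writable` flag are assumptions for illustration, not the plugin's documented values. It does not run on Windows, where owner and domain lookups go through Windows security APIs instead.

```python
import grp
import os
import pwd
import stat


def unix_perms(path):
    """Return mode, owner, group, and an ls-style permission string for a path."""
    st = os.lstat(path)  # do not follow symlinks
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid has no local passwd entry (e.g. NFS, containers)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return {
        "mode": format(stat.S_IMODE(st.st_mode), "o"),  # e.g. "640"
        "owner": owner,
        "group": group,
        "permissions": stat.filemode(st.st_mode),       # e.g. "-rw-r-----"
        # hypothetical convenience flag for spotting overly permissive files
        "world_writable": bool(st.st_mode & stat.S_IWOTH),
    }
```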
Quick Reference
Plugins That Create Searchable Fields
The following plugins add new searchable fields to your Diskover indexes:
BAM Info — bam_info.pg, bam_info.co, bam_info.co_cmd
Breadcrumb — volinfo.* (configurable: birthdate, owner, project, quota_gb, etc.)
Checksums — hash.fhash, hash.xxhash, hash.md5, hash.sha1, hash.sha256
File Kind — filekind
First Index Time — firstindextime
Grafana — Creates separate logstash-* indices with @timestamp, path, size, file_count, fileages
Grafana Cloud — Exports metrics to Graphite (no Diskover fields added)
Image Info — image_info.width, image_info.height, image_info.format, image_info.exif_*
Media Info — media_info.Video.*, media_info.Audio.*, media_info.General.*
Open EXR — exr_info.* (channels, compression, attributes)
Path Tokens — path_tokens
PDF Info — pdf_info.* (author, title, page_count, creator, etc.)
Tag Copier — Preserves the existing tags field
Unix Perms — unix_perms.* (mode, owner, group, permissions)
Windows Owner — windows_owner.* (owner, domain)
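Dotted names such as image_info.width refer to a sub-field of an object stored on each file's document. The sketch below flattens a hypothetical document into those dotted names; the function name, the sample document, and its values are invented for illustration.

```python
def dotted_fields(doc, prefix=""):
    """Flatten nested dicts into {'parent.child': value} pairs."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-objects, extending the dotted prefix.
            flat.update(dotted_fields(value, name))
        else:
            flat[name] = value
    return flat
```

Applied to `{"filekind": "Image", "image_info": {"width": 4000, "format": "JPEG"}}`, it yields the keys `filekind`, `image_info.width`, and `image_info.format`. In Elasticsearch query-string syntax, a range search on such a field is written as `image_info.width:>=3840`.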
Plugins That Require External Dependencies
Plugin | Dependency |
|---|---|
BAM Info | pysam 0.22.1 (Python library for BAM/SAM/CRAM) |
Checksums | xxhash (Python library, required for xxHash algorithm) |
Image Info | Pillow 10.4.0+ (Python imaging library) |
Media Info | MediaInfo CLI or ffprobe (external tool) + pymediainfo 6.1.* |
Open EXR | OpenEXR Python library |
PDF Info | PyPDF2 or pikepdf (Python library) |
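Before enabling these plugins, it can help to confirm that their dependencies are visible to the Python interpreter and PATH that run the indexer. The sketch below checks importability with `importlib.util.find_spec` and looks for command-line tools with `shutil.which`. The plugin-to-module mapping follows the dependency table (Pillow imports as `PIL`); the dictionary and function names are hypothetical, and version requirements such as pysam 0.22.1 are not checked.

```python
import importlib.util
import shutil

# Import names for the Python dependencies in the table; any one listed module suffices.
PYTHON_DEPS = {
    "BAM Info": ["pysam"],
    "Checksums (xxHash)": ["xxhash"],
    "Image Info": ["PIL"],                 # Pillow is imported as PIL
    "Media Info": ["pymediainfo"],
    "Open EXR": ["OpenEXR"],
    "PDF Info": ["PyPDF2", "pikepdf"],     # either library is sufficient
}
# External executables; any one listed tool suffices.
CLI_DEPS = {"Media Info": ["mediainfo", "ffprobe"]}


def check_dependencies():
    """Return {plugin: bool} indicating whether its dependencies were found."""
    report = {}
    for plugin, modules in PYTHON_DEPS.items():
        report[plugin] = any(importlib.util.find_spec(m) is not None for m in modules)
    for plugin, tools in CLI_DEPS.items():
        has_tool = any(shutil.which(t) is not None for t in tools)
        # A plugin needing both a Python module and a CLI tool requires both.
        report[plugin] = report.get(plugin, True) and has_tool
    return report
```

To compare installed versions against the table, `importlib.metadata.version("pysam")` returns the installed distribution's version string.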
Plugins for Specific Industries
Life Sciences & Genomics
BAM Info — Genomics data discovery
Media & Entertainment
Image Info — Photo and image management
Media Info — Video/audio production workflows
Open EXR — VFX and CGI pipelines
Enterprise Storage Management
Checksums — Data integrity and compliance
File Kind — Storage analytics and classification
Unix Perms / Windows Owner — Security auditing
Getting Started
New to Index Plugins?
Start with File Kind — simple to configure, immediately useful for storage analysis
Add First Index Time to track data arrival patterns
Enable Tag Copier to preserve tags across re-indexing
Managing Media Assets?
Deploy Image Info for photo libraries
Add Media Info for video production workflows
Use Open EXR for VFX render management
Data Integrity & Compliance?
Enable Checksums for integrity verification
Add Unix Perms or Windows Owner for permission auditing
Building Analytics Dashboards?
Configure Grafana for on-premises time series
Use Grafana Cloud for SaaS monitoring infrastructure
How Index Plugins Work
Unlike post-index plugins that you run manually after crawls complete, index plugins operate automatically:
Enable — Configure the plugin in Diskover Admin and enable it in your Index Task Configuration
Scan — Run a normal index scan using that configuration
Automatic Processing — The plugin extracts metadata in real-time as each file is crawled
Immediate Availability — New fields are searchable in Diskover as soon as the scan completes
No command-line execution, no post-crawl tasks, no additional scheduling required.
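The steps above amount to a per-file hook: the crawler builds a document for each file, and every enabled plugin contributes extra fields to that same document before it is stored. The sketch below models this with a hypothetical plugin signature `(path, stat_result) -> dict` and returns the documents as a list. It is not Diskover's plugin API; in Diskover the documents are written to Elasticsearch rather than returned.

```python
import os
from typing import Callable, Dict, List

# Hypothetical plugin signature: receives a path and its stat result,
# returns extra fields to merge into that file's document.
Plugin = Callable[[str, os.stat_result], Dict[str, object]]


def crawl(root: str, plugins: List[Plugin]) -> List[dict]:
    """Walk root and build one document per file, letting each plugin add fields."""
    docs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # skip files that disappear or cannot be read mid-crawl
            doc = {"parent_path": dirpath, "name": name, "size": st.st_size}
            for plugin in plugins:
                doc.update(plugin(path, st))  # plugin fields land on the same document
            docs.append(doc)
    return docs


def extension_plugin(path, st):
    """Example plugin: add a lower-case extension field."""
    return {"extension": os.path.splitext(path)[1].lstrip(".").lower()}
```

Because plugins run inside the crawl loop, their fields are present on each document from the moment it is written, which is why no separate post-crawl step is needed.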
Support & Resources
Diskover Documentation: docs.diskoverdata.com
Support Portal: support.diskoverdata.com
Last Updated: January 2026
Diskover Data, Inc.