ElasticSearch Query Report Plugin
License: PRO+ (Professional Edition or higher)
Plugin Type: Post-Index Plugin
Author: Diskover Data, Inc.
Overview
The ElasticSearch Query Report plugin lets you run queries against your Diskover indices and automatically generate reports in CSV or Parquet format. Think of it as a way to extract exactly the data you need from Diskover and package it up for sharing, analysis, or compliance purposes.
Whether you need to send a weekly report of files matching certain criteria, feed data into an analytics platform, or create an audit trail for compliance, this plugin handles the heavy lifting of querying, formatting, and delivering the results.
Why Use This Plugin?
Automate recurring reports — Set it and forget it with scheduled jobs
Export data in the format you need — CSV for spreadsheets and human review, Parquet for analytics tools
Deliver reports automatically — Email CSV reports directly to stakeholders
Extract exactly what matters — Choose which fields to include, including metadata from other plugins
Use Cases
Compliance Reporting
Need to prove which files exist in certain locations, identify files that haven't been accessed in years, or document files matching specific naming patterns? The ES Query Report plugin makes this straightforward.
Example scenarios:
Generate a monthly report of all files over 1GB that haven't been modified in 2+ years
Create an inventory of files with specific extensions in regulated directories
Document files tagged with compliance-related tags for audit purposes
Scheduled Reporting
Combine this plugin with Diskover's built-in task scheduling to create fully automated reporting workflows. Reports can be generated on any schedule and automatically emailed to the right people.
Example scenarios:
Daily reports of newly created files in sensitive directories
Weekly summaries of large files for storage management review
Monthly compliance snapshots sent directly to your compliance team
Installation
Prerequisites
| Component | Requirement |
|---|---|
| Python | 3.9 or higher |
| Diskover | Core installation with plugin support |
| Elasticsearch | 7.x or 8.x (as supported by Diskover) |
Python Dependencies
The plugin requires two additional Python packages for data handling:
| Package | Purpose |
|---|---|
| pandas | Data manipulation and CSV generation |
| pyarrow | Parquet file format support |
Installation Steps
- Ensure the plugin file is in your Diskover plugins directory:
Linux:
/opt/diskover/plugins_postindex/diskover_esqueryreport/diskover_esqueryreport.py
Windows:
C:\Program Files\Diskover\plugins_postindex\diskover_esqueryreport\diskover_esqueryreport.py
- Install the required Python dependencies:
Linux:
python3 -m pip install -r /opt/diskover/plugins_postindex/diskover_esqueryreport/requirements.txt
Windows (PowerShell):
python -m pip install -r "C:\Program Files\Diskover\plugins_postindex\diskover_esqueryreport\requirements.txt"
- Verify the installation by checking that the dependencies load correctly:
Linux:
python3 -c "import pandas; import pyarrow; print('Dependencies OK')"
Windows (PowerShell):
python -c "import pandas; import pyarrow; print('Dependencies OK')"
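If you would rather see which package is missing instead of stopping at the first import error, the same check can be scripted. The sketch below is a minimal helper that uses only the standard library; `missing_packages` is a hypothetical name for illustration and is not part of the plugin.

```python
import importlib.util

# Packages the plugin documentation lists as required.
REQUIRED_PACKAGES = ["pandas", "pyarrow"]


def missing_packages(names):
    """Return the subset of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("Dependencies OK")
```

Run it with the same Python interpreter that Diskover uses, since packages installed for a different interpreter will not be found.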
Configuration
Configuration is managed through the Diskover Admin Panel. Navigate to Plugins → Post Index → ES Query Report to access the settings.
Sample Configuration in Diskover Admin:
Here is the beginning of our sample configuration. The ES Query Report plugin has many other configuration options, covered in detail below.
Configuration Parameters
| Option | Type | Default | Description |
|---|---|---|---|
| query | string | (empty) | The Elasticsearch query to run (required) |
| doc_fields | list | name, parent_path, extension, type, size, size_du, mtime, tags | Fields to include in the report |
| human_sizes | boolean | true | Show sizes as "1.5 GB" instead of bytes (CSV only) |
| csv_output | boolean | true | Generate a CSV file |
| csv_file | string | diskover-%indexname-%m%d%Y_%H%M%S.csv | CSV filename pattern |
| csv_output_dir | string | /tmp/ | Where to save CSV files |
| parquet_output | boolean | false | Generate a Parquet file |
|  | string | diskover-%indexname-%m%d%Y_%H%M%S.parquet | Parquet filename pattern |
|  | string | /tmp/ | Where to save Parquet files |
| zipcsv | boolean | true | Compress CSV files to ZIP format |
|  | boolean | false | Delete the CSV after creating the ZIP |
| send_email | boolean | true | Email the report when complete |
| email_to | list | (empty) | Email addresses to receive the report |
| email_from | string | (empty) | Sender email address |
| email_subject | string | Diskover ES Query Report | Subject line for report emails |
| attachcsv | boolean | true | Attach the CSV/ZIP file to the email |
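To get a feel for what the human-readable size option does to a raw byte count, here is a minimal sketch of that kind of conversion. It assumes binary multiples (1 GB = 1024³ bytes, which matches the 1073741824 figure used for 1GB in the query examples in this article) and one decimal place; the plugin's own rounding and unit labels may differ, and `human_size` is a hypothetical name.

```python
def human_size(num_bytes, precision=1):
    """Format a byte count as a short human-readable string.

    Assumes binary multiples (1 KB = 1024 bytes); the plugin's own
    rounding and unit labels may differ.
    """
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == units[-1]:
            if unit == "B":
                return f"{int(value)} B"
            return f"{value:.{precision}f} {unit}"
        value /= 1024


print(human_size(1610612736))  # prints: 1.5 GB
```

Because the converted values are text, they are convenient for people reading a spreadsheet but not for sorting or summing, which is one reason to keep raw bytes for analytics exports.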
Filename Template Tokens
You can use these tokens in your filename patterns to create dynamic, descriptive filenames:
| Token | Description | Example Output |
|---|---|---|
| %indexname | The Diskover index name | diskover-data-2024 |
| %toppath | Top path with underscores | mnt_storage_archive |
| %Y | 4-digit year | 2024 |
| %m | Month (01-12) | 12 |
| %d | Day (01-31) | 05 |
| %H | Hour (00-23) | 14 |
| %M | Minute (00-59) | 30 |
| %S | Second (00-59) | 45 |
All standard Python strftime codes are supported.
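To preview what a filename pattern will produce before you save it, you can approximate the expansion in a few lines of Python. This is only a sketch: `expand_filename` is a hypothetical helper, and the substitution order and the way the top path is flattened are assumptions rather than the plugin's confirmed behavior.

```python
from datetime import datetime


def expand_filename(pattern, index_name, top_path, when=None):
    """Expand %indexname and %toppath, then apply strftime codes.

    Illustrative only: the plugin's real substitution order and
    path-cleaning rules are assumptions here.
    """
    when = when or datetime.now()
    # Turn a path such as /mnt/storage/archive into mnt_storage_archive.
    flat_path = top_path.replace("\\", "/").strip("/").replace("/", "_")
    # Replace the custom tokens first so strftime never sees them.
    expanded = pattern.replace("%indexname", index_name)
    expanded = expanded.replace("%toppath", flat_path)
    return when.strftime(expanded)


example = expand_filename(
    "diskover-%indexname-%m%d%Y_%H%M%S.csv",
    "diskover-data-2024",
    "/mnt/storage/archive",
    datetime(2024, 12, 5, 14, 30, 45),
)
print(example)  # prints: diskover-diskover-data-2024-12052024_143045.csv
```

Note that the default pattern already starts with "diskover-", so an index name that also begins with "diskover-" repeats the prefix, as the example output shows.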
Example Configuration: Compliance Report
This configuration generates a weekly compliance report of large, old files:
query: "size:>1073741824 AND mtime:<now-2y"
doc_fields:
  - name
  - parent_path
  - size
  - mtime
  - atime
  - owner
human_sizes: true
csv_output: true
csv_file: compliance-old-large-files-%Y%m%d.csv
csv_output_dir: /data/reports/
zipcsv: true
send_email: true
email_to:
  - compliance@company.com
  - storage-admin@company.com
email_subject: "Weekly Compliance Report - Large Old Files"
attachcsv: true
parquet_output: false
Working with Nested Fields
If you're using other Diskover plugins that add metadata (like media info or checksums), you can include those fields using dot notation:
doc_fields:
  - name
  - parent_path
  - size
  - mediainfo.duration
  - mediainfo.video_codec
  - hash.md5
Fields that don't exist for a particular file will show as empty in the report.
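Dot notation simply walks down the nested document one key at a time. The following sketch shows one way such a lookup could work, including the empty value for missing fields; `get_field` and the sample document are hypothetical and are not taken from the plugin's source.

```python
def get_field(doc, dotted_name, default=""):
    """Look up a dot-notation field in a nested document dict."""
    current = doc
    for key in dotted_name.split("."):
        # A missing key, or a non-dict value part-way down, means "no value".
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical document: mediainfo is present, hash is not.
doc = {
    "name": "clip.mov",
    "size": 52428800,
    "mediainfo": {"duration": 12.5, "video_codec": "h264"},
}
fields = ["name", "mediainfo.duration", "hash.md5"]
row = [get_field(doc, field) for field in fields]
print(row)  # prints: ['clip.mov', 12.5, '']
```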
Running the Plugin
Basic Usage
Run the plugin from the command line, specifying the index to query:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_esqueryreport/diskover_esqueryreport.py diskover-indexname
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_esqueryreport\diskover_esqueryreport.py" diskover-indexname
The plugin will use the configuration from Diskover Admin by default.
Command Line Options
| Option | Description |
|---|---|
| -c | Use a specific named configuration |
| -q | Override the query from config |
| -e | Override recipient (use multiple times for multiple recipients) |
| -s | Override the email subject |
| -l | Automatically find the most recent index for a path |
| -v | Enable verbose logging |
| -V | Enable very verbose logging |
|  | Print version and exit |
Example: Ad-Hoc Compliance Query
Run a one-time query to find all PDF files larger than 100MB and email the results:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_esqueryreport/diskover_esqueryreport.py \
  -q "extension:pdf AND size:>104857600" \
  -e compliance@company.com \
  -s "Ad-Hoc Report: Large PDF Files" \
  diskover-archive-2024
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_esqueryreport\diskover_esqueryreport.py" `
  -q "extension:pdf AND size:>104857600" `
  -e compliance@company.com `
  -s "Ad-Hoc Report: Large PDF Files" `
  diskover-archive-2024
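If you launch ad-hoc reports from your own tooling, building the argument list in code avoids shell quoting problems and makes the byte arithmetic explicit (sizes in queries are in bytes, so 100MB is 100 × 1024² = 104857600). The sketch below assumes the Linux plugin path and the `-q`, `-e`, and `-s` flags shown in the examples above; `build_command` is a hypothetical helper, and using `sys.executable` as the interpreter is an assumption about your environment.

```python
import sys

# Path taken from the Linux examples in this article.
PLUGIN = "/opt/diskover/plugins_postindex/diskover_esqueryreport/diskover_esqueryreport.py"

MIB = 1024 ** 2  # bytes in one mebibyte


def build_command(index, query, recipients=(), subject=None):
    """Build an argument list using the -q, -e and -s flags."""
    cmd = [sys.executable, PLUGIN, "-q", query]
    for address in recipients:
        # -e is repeated once per recipient.
        cmd += ["-e", address]
    if subject:
        cmd += ["-s", subject]
    cmd.append(index)
    return cmd


query = f"extension:pdf AND size:>{100 * MIB}"
cmd = build_command(
    "diskover-archive-2024",
    query,
    recipients=["compliance@company.com"],
    subject="Ad-Hoc Report: Large PDF Files",
)
print(cmd[1:])
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```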
Example: Using the Latest Index
When you want to always query the most recent index for a particular storage path:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_esqueryreport/diskover_esqueryreport.py \
  -l /mnt/storage/archive \
  -c compliance_config
Windows (PowerShell):
python "C:\Program Files\Diskover\plugins_postindex\diskover_esqueryreport\diskover_esqueryreport.py" `
  -l "D:\storage\archive" `
  -c compliance_config
Setting Up Scheduled Reports
To run reports automatically on a schedule, use Diskover's built-in task scheduling features rather than configuring local cron jobs or Task Scheduler.
Option 1: Custom Task
Create a Custom Task in Diskover Admin to run the ES Query Report plugin on a defined schedule.
- Navigate to Task Panel → Custom Tasks in Diskover Admin
- Create a new Custom Task with the following command:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_esqueryreport/diskover_esqueryreport.py -l /mnt/data -c daily_report
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_esqueryreport\diskover_esqueryreport.py" -l "D:\data" -c daily_report
- Configure the schedule (daily, weekly, monthly, etc.)
- Save and enable the task
Option 2: Post-Crawl Command (Index Task)
Run the ES Query Report automatically after an index completes by adding it as a Post-Crawl Command to your Index Task.
- Navigate to Task Panel → Index Tasks in Diskover Admin
- Edit the Index Task you want to trigger the report from
- Add the ES Query Report plugin command as the Post-Crawl Command
- Save the Index Task
Reviewing the Output
CSV Output
CSV files are human-readable and can be opened in any spreadsheet application like Excel, Google Sheets, or LibreOffice Calc.
What to expect:
- First row contains column headers matching your doc_fields configuration
- Size fields show human-readable values (e.g., "1.5 GB") when human_sizes is enabled
- List fields (like tags) are joined with semicolons
- ZIP compression is applied by default to reduce file size
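The CSV behavior described above can be illustrated with the standard library alone. This is a sketch of the general technique (header row from the field list, list values joined with semicolons, optional ZIP and cleanup), not the plugin's actual code; `write_report` and its parameters are hypothetical names.

```python
import csv
import os
import zipfile


def write_report(rows, fields, csv_path, zip_it=True, delete_csv=False):
    """Write rows to CSV, join list values with ';', optionally ZIP the file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(fields)  # header row mirrors the field list
        for row in rows:
            values = []
            for field in fields:
                value = row.get(field, "")
                if isinstance(value, list):
                    # e.g. tags ["legal", "hold"] becomes "legal;hold"
                    value = ";".join(str(item) for item in value)
                values.append(value)
            writer.writerow(values)
    if not zip_it:
        return csv_path
    zip_path = csv_path + ".zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(csv_path, arcname=os.path.basename(csv_path))
    if delete_csv:
        os.remove(csv_path)
    return zip_path
```

Joining list values with semicolons keeps each document on a single row, so a spreadsheet can still split the tags into separate columns if needed.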
Parquet Output
Parquet files are optimized for analytics tools and maintain raw numeric values for accurate calculations.
Compatible with:
Apache Spark
Pandas (Python)
AWS Athena
Databricks
Snowflake
Note: Parquet files cannot be emailed—they're saved to the configured output directory for pickup by your analytics pipeline.
Email Reports
When email is enabled, you'll receive a message containing:
- The query that was executed
- The attached CSV or ZIP file (if attachcsv is enabled)
- A reference to where the file was saved
Verifying Successful Execution
A successful run will show log output similar to:
INFO - Starting diskover es query report ...
INFO - es query: size:>1073741824 AND mtime:<now-2y
INFO - found 247 matching docs
INFO - Finished searching all index docs matching query
INFO - Saving report to /data/reports/compliance-old-large-files-20240115.csv
INFO - Compressing report /data/reports/compliance-old-large-files-20240115.csv ...
INFO - Emailing report to compliance@company.com, storage-admin@company.com
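If a scheduled job captures this output, a small script can confirm that the run found documents and reached the email step. The sketch below assumes the log wording matches the sample above ("found N matching docs", "Emailing report to"); the real messages may vary between plugin versions, and `summarize_log` is a hypothetical helper.

```python
import re

# Pattern assumed from the sample log line "found 247 matching docs".
MATCH_COUNT = re.compile(r"found (\d+) matching docs")


def summarize_log(log_text):
    """Return (match_count, emailed) parsed from plugin log output."""
    count = None
    emailed = False
    for line in log_text.splitlines():
        found = MATCH_COUNT.search(line)
        if found:
            count = int(found.group(1))
        if "Emailing report to" in line:
            emailed = True
    return count, emailed


sample = """INFO - Starting diskover es query report ...
INFO - found 247 matching docs
INFO - Emailing report to compliance@company.com"""
print(summarize_log(sample))  # prints: (247, True)
```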
Troubleshooting
No Documents Found
Symptom: Log shows "No docs found matching query!"
What to check:
Test your query in the Diskover web interface first to verify it returns results
Confirm the index name is correct and the index contains data
Verify your query syntax—the plugin uses Lucene query string syntax, not Elasticsearch DSL
Quick test:
curl -s "localhost:9200/diskover-myindex/_count" | jq '.count'
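The curl command above counts every document in the index. To count only the documents that match your report query, Elasticsearch's count API also accepts a Lucene query string in the `q` parameter. The sketch below builds that URL and, optionally, fetches the count; it assumes an unauthenticated Elasticsearch at http://localhost:9200 like the curl example, and `count_url` and `count_matches` are hypothetical helper names.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def count_url(index, query, host="http://localhost:9200"):
    """Build an Elasticsearch _count URL for a Lucene query string."""
    params = urlencode({"q": query}, quote_via=quote)
    return f"{host}/{quote(index)}/_count?{params}"


def count_matches(index, query, host="http://localhost:9200"):
    """Return the number of documents in `index` matching `query`."""
    with urlopen(count_url(index, query, host), timeout=10) as response:
        return json.load(response)["count"]


print(count_url("diskover-myindex", "extension:pdf AND size:>104857600"))
# With Elasticsearch reachable: count_matches("diskover-myindex", "extension:pdf")
```

If the count is zero here as well, the problem lies in the query or the index rather than in the plugin configuration.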
Parquet Export Fails
Symptom: Error about missing pandas or pyarrow
Solution: Install the required dependencies:
Linux:
python3 -m pip install -r /opt/diskover/plugins_postindex/diskover_esqueryreport/requirements.txt
Windows (PowerShell):
python -m pip install -r "C:\Program Files\Diskover\plugins_postindex\diskover_esqueryreport\requirements.txt"
Verify installation:
Linux:
python3 -c "import pandas; import pyarrow; print('OK')"
Windows (PowerShell):
python -c "import pandas; import pyarrow; print('OK')"
CSV File Not Created
Symptom: No CSV file appears in the output directory
What to check:
- Verify the output directory exists and is writable by the diskover user
- Check available disk space: df -h /tmp/ (Linux) or check drive properties (Windows)
- Review the plugin logs for write errors
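The first two checks can be combined into one cross-platform script. This is a sketch using only the standard library; `check_output_dir` is a hypothetical helper, the 100 MB free-space threshold is an arbitrary illustration, and the script must run as the same user that runs Diskover for the writability check to be meaningful.

```python
import os
import shutil


def check_output_dir(path, min_free_bytes=100 * 1024 ** 2):
    """Return a list of problems that would stop a report being written."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable by the current user")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free on the volume holding {path}")
    return problems


if __name__ == "__main__":
    for problem in check_output_dir("/tmp/") or ["output directory looks OK"]:
        print(problem)
```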
Email Not Sent
Symptom: CSV is generated but no email arrives
What to check:
- Verify SMTP settings are configured correctly in Diskover Admin
- Confirm email_to and email_from are both set
- Test SMTP connectivity:
nc -zv smtp.company.com 587
- Check spam/junk folders for the report email
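On hosts where nc is not available (Windows, for example), the same reachability test can be done from Python. This sketch only confirms that a TCP connection can be opened; it does not authenticate or send mail. `can_connect` is a hypothetical helper, and smtp.company.com with port 587 is the placeholder from the nc example.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example call (placeholder host from the nc command):
# print(can_connect("smtp.company.com", 587))
```

A result of False points to a network, firewall, or DNS issue rather than to the plugin's email settings.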
Nested Fields Return Empty Values
Symptom: Dot-notation fields like mediainfo.duration are empty in the report
What to check:
- Confirm the source plugin (e.g., mediainfo) ran during indexing
- Verify the exact field path; field names are case-sensitive
- Check if the field exists in Elasticsearch:
curl -s "localhost:9200/diskover-myindex/_search?size=1" | jq '.hits.hits[0]._source.mediainfo'
Need More Data?
- Enable verbose logging with -v or -V flags for more detailed output
- Check the Diskover logs for additional error details:
Linux:
/var/log/diskover/
Windows:
C:\Program Files\Diskover\logs\
Quick Reference
Minimum Required Configuration
query: "your-elasticsearch-query-here"
email_to:
  - recipient@company.com
email_from: diskover@company.com
Common Query Examples
| Goal | Query |
|---|---|
| Files over 1GB | size:>1073741824 |
| Files not modified in 2 years | mtime:<now-2y |
| Specific file types | extension:pdf |
| Files in a path |  |
| Tagged files |  |
| Combined criteria | size:>1073741824 AND mtime:<now-2y |
Support
Last Updated: April 2026