Costs (Post)
License: PRO+ (Professional Edition or higher)
Plugin Type: Post-Index Plugin
Author: Diskover Data, Inc.
Overview / Use Cases
The Costs post-index plugin helps you understand the true expense of your data storage by calculating and assigning cost values to files and directories in Diskover. Once configured, it adds a costpergb field to your indexed data, enabling powerful cost analysis, chargeback reporting, and budget planning directly within the Diskover interface.
Why Use the Costs Plugin?
Storage costs can vary dramatically based on where data lives—premium SSD tiers cost significantly more than archive storage, and different departments may have different billing rates. This plugin lets you model those real-world costs and attach them to your actual data, making storage economics visible and actionable.
Key Capabilities:
| Capability | Description |
|---|---|
| Cost Attribution | Assign storage costs to files and directories based on location, type, or other criteria |
| Flexible Pricing Rules | Define multiple cost rates using Elasticsearch query syntax |
| Binary/Decimal Support | Calculate costs using either binary (1024) or decimal (1000) GB units to match your cost source |
| Chargeback Reporting | Generate cost reports by department, project, or any organizational structure |
| Cost Analysis Dashboard | Visualize storage costs with the built-in Cost Analysis dashboard |
Use Case 1: Departmental Chargeback / Showback
One of the most common uses for the Costs plugin is enabling IT departments to bill or report storage costs back to the business units consuming that storage.
The Challenge:
Organizations often struggle to answer basic questions like "How much does the Marketing department's storage actually cost us?" Without visibility into per-department costs, storage tends to grow unchecked because there's no financial accountability tied to consumption.
The Solution:
Configure the Costs plugin to assign your organization's actual $/GB storage rate to each department's directory structure. Once costs are calculated, you can:
Generate monthly or quarterly chargeback reports showing each department's storage costs
Create showback reports that demonstrate storage consumption without actual billing
Identify departments with unusually high storage costs for follow-up
Track cost trends over time to forecast budget needs
Example Workflow:
System Administrator configures cost rules mapping each department's root directory to the organization's storage cost rate (e.g., $0.12/GB for standard NAS storage)
Cost rules are applied automatically after each index crawl via Post-Crawl Command
Finance or IT Analyst uses the Cost Analysis dashboard to view total costs by directory, then exports data for chargeback invoicing
Department Managers can search Diskover to see their team's storage costs and identify large files or directories driving costs
Sample Configuration:
| Query | Cost Per GB | Description |
|---|---|---|
| | 0.12 | Engineering department storage |
| | 0.12 | Marketing department storage |
| | 0.15 | Finance (higher security tier) |
| | 0.18 | Legal (compliance retention tier) |
| | 0.15 | Human Resources |
End-User Searches for Chargeback Analysis:
Once costs are applied, department managers or analysts can run searches like:
parent_path:\/data\/marketing* AND costpergb:[1 TO *]
Find all Marketing files costing more than $1
parent_path:\/data\/engineering* AND extension:zip AND costpergb:*
Find all Engineering zip archives and their costs
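As a rough illustration of the chargeback step, the sketch below totals per-file costs by department. It assumes the cost data has already been exported as (parent_path, cost) pairs and that each department lives under /data/<department>. The records, the department_of helper, and the path layout are hypothetical and are not part of Diskover.

```python
from collections import defaultdict

# Hypothetical exported records: (parent_path, cost in dollars).
records = [
    ("/data/marketing/campaigns", 1.25),
    ("/data/marketing/video", 4.50),
    ("/data/engineering/builds", 2.00),
    ("/data/finance/reports", 0.75),
]

def department_of(parent_path: str) -> str:
    """Return the directory directly under /data, e.g. 'marketing'."""
    parts = parent_path.strip("/").split("/")
    if len(parts) > 1 and parts[0] == "data":
        return parts[1]
    return "unassigned"

def chargeback_totals(rows):
    """Sum costs per department, rounded to cents for reporting."""
    totals = defaultdict(float)
    for parent_path, cost in rows:
        totals[department_of(parent_path)] += cost
    return {dept: round(total, 2) for dept, total in totals.items()}

for dept, total in sorted(chargeback_totals(records).items()):
    print(f"{dept}: ${total:.2f}")
```

Grouping on the first directory under /data mirrors the idea of mapping each department's root directory to a rate. A real report would group on whatever path convention your index actually uses.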
Use Case 2: Tiered Storage Cost Modeling
Organizations with multiple storage tiers (SSD, HDD, archive, cloud) need to understand the financial impact of data placement decisions.
The Challenge:
When data lives across multiple storage tiers with different price points, it's difficult to:
Understand the true cost of storing a project's data across tiers
Identify data that should be migrated to cheaper storage
Justify investments in data lifecycle management
Compare the cost-effectiveness of different storage strategies
The Solution:
Configure the Costs plugin with different rates for each storage tier. This creates a comprehensive view of storage costs that accounts for where data actually resides, not just how much data exists.
Example Workflow:
Storage Administrator documents the actual $/GB cost for each storage tier (often available from vendor contracts or cloud provider pricing)
System Administrator configures cost rules matching each tier's mount point or path structure to its corresponding cost rate
Capacity Planning Team uses the Cost Analysis dashboard to see total costs by tier and identify optimization opportunities
Data Owners search for their high-cost files on premium storage and evaluate whether they can be moved to archive tiers
Sample Configuration:
| Query | Cost Per GB | Description |
|---|---|---|
| | 0.45 | Premium SSD - high performance workloads |
| | 0.10 | Standard HDD - general file storage |
| | 0.023 | Archive tier - infrequent access |
| | 0.004 | Cold storage - rare access, compliance retention |
For Cloud Storage (S3 Example):
If you're indexing AWS S3 buckets with storage class metadata:
| Query | Cost Per GB | Description |
|---|---|---|
| | 0.023 | S3 Standard |
| | 0.0125 | S3 Infrequent Access |
| | 0.004 | S3 Glacier |
| | 0.00099 | S3 Glacier Deep Archive |
End-User Searches for Tier Optimization:
parent_path:\/tier1_ssd* AND atime:<now-180d AND costpergb:[0.10 TO *]
Find files on premium SSD that haven't been accessed in 6 months and cost more than $0.10 — candidates for migration to cheaper storage
parent_path:\/tier1_ssd* AND extension:(mp4 OR mov OR avi) AND costpergb:*
Find video files on premium SSD and their costs — large media files are often good candidates for archive storage
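To make the migration decision concrete, the following sketch estimates how much a data set would save if moved between tiers. It uses the sample rates from the tier table above (0.45 for premium SSD, 0.023 for archive) and decimal gigabytes. The function name and the 500 GB example are illustrative assumptions, and the result applies to whatever billing period your $/GB rates represent.

```python
GB_DECIMAL = 1000 ** 3  # bytes per decimal gigabyte

def migration_savings(size_bytes: int, current_rate: float,
                      target_rate: float, gb_bytes: int = GB_DECIMAL) -> float:
    """Estimated savings from moving size_bytes to a cheaper tier."""
    size_gb = size_bytes / gb_bytes
    return round(size_gb * (current_rate - target_rate), 6)

# Example: 500 GB of idle video moved from premium SSD to the archive tier.
idle_bytes = 500 * GB_DECIMAL
print(migration_savings(idle_bytes, current_rate=0.45, target_rate=0.023))
```

The estimate ignores one-time migration or retrieval charges, which some archive tiers add on top of the storage rate.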
Additional Use Cases
Budget Planning & Forecasting:
Combine cost data with storage growth trends to project future budget requirements. By comparing costs across multiple index snapshots over time, you can answer questions like "What will our storage costs look like next quarter if growth continues at the current rate?"
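One simple way to turn snapshot totals into a projection is a straight-line extrapolation, sketched below. The totals are made-up figures standing in for the total cost of successive index snapshots, and the forecast_next helper is hypothetical. Real growth is rarely perfectly linear, so treat the output as a starting estimate.

```python
def forecast_next(totals: list[float]) -> float:
    """Extrapolate one period ahead using the average change per period."""
    if len(totals) < 2:
        raise ValueError("need at least two snapshots to estimate growth")
    avg_change = (totals[-1] - totals[0]) / (len(totals) - 1)
    return round(totals[-1] + avg_change, 2)

# Hypothetical total storage cost for three consecutive quarterly snapshots.
quarterly_totals = [1000.0, 1100.0, 1200.0]
print(forecast_next(quarterly_totals))
```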
Project-Based Cost Tracking:
Assign costs to project directories to understand the true storage cost of each initiative. This is particularly valuable for R&D organizations, media production companies, or any business where projects have distinct storage footprints.
Installation
The Costs plugin is included with Diskover PRO+ editions and does not require separate installation. The plugin files are located in the plugins_postindex directory of your Diskover installation.
Prerequisites
| Requirement | Details |
|---|---|
| License | Diskover PRO+ (Professional Edition or higher) |
| Diskover Version | 2.x with post-index plugin support |
| Python | 3.9 or higher (included with Diskover installation) |
| Elasticsearch | 7.x or 8.x (as configured for your Diskover deployment) |
Verifying Installation
To confirm the plugin is available, check for the plugin file:
Linux:
ls -la /opt/diskover/plugins_postindex/diskover_costs.py
Windows:
dir "C:\Program Files\Diskover\plugins_postindex\diskover_costs.py"
You can also verify the plugin version:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_costs.py --version
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_costs.py" --version
Configuration
Configuration is managed through the Diskover Admin Panel. Navigate to Plugins → Post Index → Costs to access the settings.
Sample Configuration in Diskover Admin:
This is the beginning of our sample configuration. The Costs plugin has many other configuration options, covered in detail below.
Note: A default configuration is included with the plugin, but you should update it with your organization's actual storage costs before using the plugin in production.
Configuration Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Max Threads | Integer | 0 | Number of parallel threads for processing. Set to |
| Size Base | 2 or 10 | 2 | How to calculate gigabytes. Use |
| Size Field | size / size_du | size | Which size field to use. Use |
| Costs | List | (example rules) | Your cost rules: each rule pairs an Elasticsearch query with a cost-per-GB rate. |
Understanding Size Base
This setting affects how costs are calculated and should match your cost source:
| Base | GB Calculation | When to Use |
|---|---|---|
| 2 (Binary) | 1 GB = 1,073,741,824 bytes (1024³) | Traditional storage reporting, NAS/SAN vendors, operating system file sizes |
| 10 (Decimal) | 1 GB = 1,000,000,000 bytes (1000³) | Cloud provider billing (AWS, Azure, GCP), disk drive marketing capacity |
Tip: If your cost rates come from a cloud provider invoice, use base 10. If they come from your internal storage team's $/GB calculations based on capacity, ask which unit they use—this can make a ~7% difference in calculated costs.
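The size of that gap is easy to check. The short sketch below compares the two gigabyte definitions from the table above and expresses the same byte count in each unit. The names are illustrative and do not come from the plugin.

```python
GB_BINARY = 1024 ** 3   # base 2: 1,073,741,824 bytes
GB_DECIMAL = 1000 ** 3  # base 10: 1,000,000,000 bytes

def gb_unit_size(base: int) -> int:
    """Return bytes per gigabyte for the given Size Base setting."""
    if base == 2:
        return GB_BINARY
    if base == 10:
        return GB_DECIMAL
    raise ValueError("base must be 2 or 10")

# A binary gigabyte is about 7.37% larger than a decimal one.
difference_pct = (GB_BINARY / GB_DECIMAL - 1) * 100
print(f"difference: {difference_pct:.2f}%")

# The same 1 TiB of data counted in each unit.
one_tib = 1024 ** 4
print(one_tib / gb_unit_size(2), "GB (base 2)")
print(round(one_tib / gb_unit_size(10), 2), "GB (base 10)")
```

Because the byte count is divided by a larger unit under base 2, the same file yields a lower cost with base 2 than with base 10 at an identical $/GB rate.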
Understanding Size Field
| Field | Description | When to Use |
|---|---|---|
| | Logical file size in bytes | Standard filesystems without hardlinks |
| | Disk usage (allocated size) in bytes | Filesystems with hardlinks, sparse files, or when tracking actual disk allocation |
Recommendation: If your filesystem contains hardlinks (common in backup systems and some media workflows), use size_du to avoid counting the same disk blocks multiple times.
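The difference between logical size and allocated size can be seen with a sparse file. The sketch below is a general filesystem illustration, not Diskover code: it assumes a POSIX system where os.stat reports st_blocks in 512-byte units, and it assumes that size_du reflects this kind of allocated-bytes figure, as the table above describes. The helper names are hypothetical.

```python
import os
import tempfile

def logical_and_allocated(path):
    """Return (logical bytes, allocated bytes or None if unavailable)."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # not provided on Windows
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated

def demo_sparse_file(size_bytes=10 * 1024 * 1024):
    """Create a mostly empty file, measure it, then clean up."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.seek(size_bytes - 1)  # skip ahead without writing data
            f.write(b"\0")          # a single byte sets the logical size
        return logical_and_allocated(path)
    finally:
        os.remove(path)

logical, allocated = demo_sparse_file()
print("logical bytes:  ", logical)
print("allocated bytes:", allocated)
```

On filesystems that support sparse files the allocated figure is usually far smaller than the logical one, which is why the choice of size field can change calculated costs noticeably.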
Creating Cost Rules
Each cost rule consists of two parts:
Query — An Elasticsearch query string that identifies which files/directories receive this cost rate
Cost Per GB — The dollar (or other currency) amount per gigabyte
Query Syntax Tips:
Forward slashes in paths must be escaped with a backslash: \/
Use wildcards: * matches multiple characters, ? matches a single character
Combine conditions with AND, OR, NOT
Query any indexed field, not just paths (e.g., extension:mp4, owner:jsmith)
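Escaping slashes by hand is easy to get wrong, so a small helper can generate the query text for a path-based rule. This is a minimal sketch: escape_path and path_rule_query are hypothetical names, and the helper escapes only forward slashes, as the tips above require. Other reserved characters in Elasticsearch query strings are not handled.

```python
def escape_path(path: str) -> str:
    """Escape forward slashes so the path is valid in a query string."""
    return path.replace("/", "\\/")

def path_rule_query(path: str, field: str = "parent_path") -> str:
    """Build a wildcard query matching everything under the given path."""
    return f"{field}:{escape_path(path)}*"

print(path_rule_query("/data/marketing"))
print(path_rule_query("/data/projects") + " AND extension:mp4")
```

The generated text can be pasted into the Query column of a cost rule or into the Diskover search bar to confirm it matches documents.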
Example Configuration: Departmental Chargeback
This configuration assigns cost rates to departmental directories:
| Parameter | Value |
|---|---|
| Max Threads | 8 |
| Size Base | 2 |
| Size Field | size |
Cost Rules:
| Query | Cost Per GB |
|---|---|
| | 0.12 |
| | 0.12 |
| | 0.15 |
| | 0.18 |
Example Configuration: Tiered Storage Pricing
This configuration models different storage tier costs:
| Parameter | Value |
|---|---|
| Max Threads | 4 |
| Size Base | 10 |
| Size Field | size |
Cost Rules:
| Query | Cost Per GB |
|---|---|
| | 0.45 |
| | 0.10 |
| | 0.023 |
| | 0.004 |
Execution / Usage Guide
The Costs plugin can be run manually from the command line or scheduled to run automatically after each index crawl completes.
Manual Execution
To run the plugin manually against an existing index:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_costs.py diskover-indexname-2024.01.15
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_costs.py" diskover-indexname-2024.01.15
Command-Line Options
| Option | Description |
|---|---|
| | Use a specific named configuration from the Admin Panel |
| | Enable detailed logging to see processing progress |
| | Display the plugin version |
| | Show help message |
Example with verbose output:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_costs.py -v diskover-mynas-2024.12.15
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_costs.py" -v diskover-mynas-2024.12.15
Example using a named configuration:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_costs.py -c "Cloud Pricing" diskover-s3bucket-2024.12.15
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_costs.py" -c "Cloud Pricing" diskover-s3bucket-2024.12.15
Automated Execution
For production environments, you'll want the Costs plugin to run automatically after each index crawl completes. Diskover provides two methods for scheduling this automation.
Option 1: Post-Crawl Command (Recommended)
Configure the Costs plugin to run immediately after a specific Index Task completes. This ensures cost data is always current for that index.
In the Diskover Admin Panel, edit your Index Task and configure the Post-Crawl Command:
Linux Example:
| Field | Value |
|---|---|
| Post-Crawl Command | |
| Post-Crawl Command Args | |
Windows Example:
| Field | Value |
|---|---|
| Post-Crawl Command | |
| Post-Crawl Command Args | |
Available Index Task Tokens:
{indexname}: The name of the index that was just created
Important: The Post-Crawl Command field should contain only the executable (python3 on Linux, python on Windows). All script paths, flags, and arguments go in the Post-Crawl Command Args field.
Sample Post-Crawl Command configuration for CIFS ACLS executing with an Index Task:
On your system, replace the ConfigurationName above with the name of a configuration that you have created at Diskover Admin → Plugins → Post-Index → Costs. If you are not using a custom configuration and rely on Default, then the -c flag and the ConfigurationName are not required.
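The split between the command and its arguments can be pictured as building an argument list. The sketch below is only an illustration of that idea: how Diskover itself substitutes tokens and launches the process is not described here, the build_argv helper is hypothetical, and the args string is an assumed example assembled from the manual commands shown earlier.

```python
import shlex

def build_argv(command: str, args_template: str, indexname: str) -> list[str]:
    """Substitute the {indexname} token and split the args into a list."""
    args = args_template.replace("{indexname}", indexname)
    return [command] + shlex.split(args)

# Assumed values for the two Index Task fields (Linux paths).
command = "python3"
args_template = (
    "/opt/diskover/plugins_postindex/diskover_costs.py "
    "-c ConfigurationName {indexname}"
)

argv = build_argv(command, args_template, "diskover-mynas-2024.12.15")
print(argv)
```

Keeping the executable separate from its arguments is what makes the first element of the list unambiguous. Note that shlex.split follows POSIX quoting rules, so Windows paths containing backslashes would need different handling.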
Option 2: Custom Task
Create a standalone scheduled task that runs the Costs plugin on a recurring basis. This is useful when you want to update costs across multiple indices or run cost calculations independently of indexing.
In the Diskover Admin Panel, navigate to Task Scheduling > Custom Tasks and create a new task:
| Field | Value |
|---|---|
| Task Name | Costs Plugin - Production NAS |
| Command | |
| Arguments | |
| Schedule | Daily at 2:00 AM (or your preferred schedule) |
Sample Custom Task Configuration:
This sample shows the run command and arguments needed for the Custom Task. Because a Custom Task does not create an index, the {indexname} variable is not available here. Use the -l (toppath) CLI option instead and pass in your top path.
Expected Behavior During Execution
When the plugin runs, it:
Connects to Elasticsearch and loads your cost configuration
For each cost rule, queries the index for matching documents
Calculates the cost for each document based on file size and your $/GB rate
Performs bulk updates to add the costpergb field to each document
With verbose mode (-v), you'll see progress like:
INFO: Starting diskover costs ...
INFO: Finding and updating costs in index diskover-mynas-2024.12.15...
INFO: es query: parent_path:\/data\/engineering*
INFO: found 125000 matching docs
INFO: thread 0 started updating costs in 1000 docs
INFO: thread 0 finished updating costs for 1000 docs in 0.234s
...
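The rule-by-rule flow described above can be imitated on in-memory data to see what ends up on each document. This is a simplified simulation, not the plugin's implementation: it uses glob matching from the standard library in place of Elasticsearch queries, plain dictionaries in place of indexed documents, and it assumes a later rule overwrites an earlier one when both match. All names are hypothetical.

```python
from fnmatch import fnmatchcase

GB_UNIT = {2: 1024 ** 3, 10: 1000 ** 3}  # bytes per GB for each Size Base

def apply_cost_rules(docs, rules, base=2, size_field="size"):
    """Apply (path_glob, cost_per_gb) rules to docs; return update count."""
    updated = 0
    for pattern, rate in rules:
        # Stand-in for "query the index for matching documents".
        matches = [d for d in docs if fnmatchcase(d["parent_path"], pattern)]
        for doc in matches:
            cost = rate * (doc[size_field] / GB_UNIT[base])
            doc["costpergb"] = round(cost, 6)
        updated += len(matches)
    return updated

docs = [
    {"parent_path": "/data/engineering/builds", "size": 5_368_709_120},
    {"parent_path": "/data/marketing/video", "size": 1_073_741_824},
    {"parent_path": "/scratch/tmp", "size": 2_147_483_648},
]
rules = [("/data/engineering*", 0.15), ("/data/marketing*", 0.12)]

count = apply_cost_rules(docs, rules)
print(count, "documents updated")
for doc in docs:
    print(doc["parent_path"], doc.get("costpergb"))
```

Documents that match no rule, such as the /scratch entry here, are left without a costpergb value, which is what the NOT costpergb:* search is meant to surface.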
Reviewing the Output
After the plugin completes, cost data is immediately available in your Diskover index.
Verifying Costs Were Applied
You can quickly verify that costs have been applied by searching for any document with a cost value:
In the Diskover search bar:
costpergb:*
This returns all documents that have been assigned a cost.
Understanding the Cost Calculation
The plugin calculates costs using this formula:
costpergb = cost_per_gb × (file_size_bytes ÷ gb_unit_size)
Example Calculation:
File size: 5,368,709,120 bytes (5 GiB)
Cost per GB: $0.15
Base: 2 (binary)
GB unit size: 1,073,741,824 bytes (1024³)
Calculation: 0.15 × (5,368,709,120 ÷ 1,073,741,824) = $0.75
Results are rounded to 6 decimal places for precision.
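The formula and rounding can be reproduced in a few lines, which is handy for spot-checking values you see in search results. This is a sketch of the documented formula only. The function name is hypothetical and the code is not taken from the plugin.

```python
def file_cost(size_bytes: int, cost_per_gb: float, base: int = 2) -> float:
    """cost_per_gb x (size_bytes / gb_unit_size), rounded to 6 decimals."""
    if base == 2:
        gb_unit_size = 1024 ** 3
    elif base == 10:
        gb_unit_size = 1000 ** 3
    else:
        raise ValueError("base must be 2 or 10")
    return round(cost_per_gb * (size_bytes / gb_unit_size), 6)

size = 5_368_709_120  # the 5 GiB file from the example above
print(file_cost(size, 0.15, base=2))
print(file_cost(size, 0.15, base=10))
```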
Where to Find Logs
Plugin execution logs are written to the Diskover worker logs:
Linux:
/var/log/diskover/diskover_worker.log
Windows:
C:\Program Files\Diskover\logs\diskover_worker.log
For verbose output during manual execution, the logs are displayed directly in your terminal.
Searching in Diskover
The Costs plugin adds a costpergb field to your indexed documents, enabling powerful cost-based searches and analytics.
Basic Cost Searches
Find all files with assigned costs:
costpergb:*
Find files costing more than $1:
costpergb:[1 TO *]
Find files costing between $0.10 and $5.00:
costpergb:[0.10 TO 5.00]
Find low-cost files (under $0.01):
costpergb:[0 TO 0.01]
Combined Searches
High-cost video files:
costpergb:[10 TO *] AND extension:mp4
Expensive files in a specific directory:
costpergb:[5 TO *] AND parent_path:\/data\/projects*
Files without cost data (not yet processed):
NOT costpergb:*
High-cost files owned by a specific user:
costpergb:[1 TO *] AND owner:jsmith
Old, expensive files on premium storage (migration candidates):
parent_path:\/tier1* AND atime:<now-365d AND costpergb:[1 TO *]
Cost Analysis Dashboard
Diskover includes a Cost Analysis dashboard that provides visual analytics for your storage costs. This dashboard is available once the costpergb field has been populated in your index.
Accessing the Cost Analysis Dashboard:
Navigate to the Diskover web interface
Select Analytics from the main navigation
Choose Cost Analysis from the available dashboards
Dashboard Capabilities:
The Cost Analysis dashboard enables you to:
View Total Costs — See aggregate storage costs across your entire index or filtered by search criteria
Cost by Directory — Visualize which directories are driving the highest storage costs, perfect for identifying chargeback amounts by department
Cost by File Type — Understand which file types (video, archives, documents) contribute most to storage expenses
Cost Distribution — See how costs are distributed across your storage, identifying outliers and optimization opportunities
Cost Trends — When viewing multiple indices over time, track how storage costs are growing or shrinking
Common Dashboard Workflows:
For Chargeback Reporting:
Open Cost Analysis dashboard
Filter to a specific department's directory path
View total cost and cost breakdown by subdirectory
Export data for billing or reporting
For Storage Optimization:
Open Cost Analysis dashboard
Sort by highest-cost directories
Drill down into expensive areas
Use combined searches to find old, large files in those directories for potential cleanup or archival
Troubleshooting
Costs Not Applied to Documents
Symptom: After running the plugin, the costpergb:* search returns no results.
Common Causes & Solutions:
Query syntax error: Ensure forward slashes in paths are escaped with \/
❌ parent_path:/data/projects*
✅ parent_path:\/data\/projects*
Path doesn't match indexed data: Test your query directly in Diskover search to verify it matches documents
Index name incorrect: Double-check the exact index name (case-sensitive)
Diagnostic command (verbose mode):
Linux:
python3 /opt/diskover/plugins_postindex/diskover_costs.py -v diskover-indexname
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_costs.py" -v diskover-indexname
Look for found X matching docs in the output. If it shows found 0 matching docs, your query isn't matching any documents.
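When a configuration has many rules, scanning the verbose output by eye gets tedious. The sketch below lists the queries that matched nothing. It assumes the log format shown in the sample output earlier (an "es query:" line followed by a "found N matching docs" line), and the function name is hypothetical. Actual log lines may differ between versions.

```python
import re

QUERY_RE = re.compile(r"es query: (.+)")
FOUND_RE = re.compile(r"found (\d+) matching docs")

def queries_without_matches(log_lines):
    """Return queries whose following 'found N' line reports zero docs."""
    empty = []
    current_query = None
    for line in log_lines:
        query = QUERY_RE.search(line)
        if query:
            current_query = query.group(1).strip()
            continue
        found = FOUND_RE.search(line)
        if found and current_query is not None:
            if int(found.group(1)) == 0:
                empty.append(current_query)
            current_query = None
    return empty

sample_log = [
    "INFO: es query: parent_path:\\/data\\/engineering*",
    "INFO: found 125000 matching docs",
    "INFO: es query: parent_path:/data/projects*",
    "INFO: found 0 matching docs",
]
print(queries_without_matches(sample_log))
```

Any query reported here is worth pasting into the Diskover search bar to see whether the escaping or the path itself is the problem.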
Incorrect Cost Values
Symptom: Cost values seem too high or too low compared to expectations.
Common Causes & Solutions:
Wrong size base: If using cloud provider pricing, ensure base is set to 10 (decimal). Cloud providers bill in decimal gigabytes (1000³), not binary (1024³). This difference alone can cause a ~7% variance in calculated costs.
Wrong size field: If your filesystem has hardlinks, use size_du instead of size to avoid inflated costs from double-counting.
Cost rate units: Verify your cost_per_gb values are per-gigabyte, not per-terabyte.
Manual Verification:
You can manually verify cost calculations:
File size: 5,368,709,120 bytes
Cost per GB: $0.15
Base 2 (binary): 5,368,709,120 ÷ 1,073,741,824 × 0.15 = $0.75
Base 10 (decimal): 5,368,709,120 ÷ 1,000,000,000 × 0.15 ≈ $0.81
Performance Issues on Large Indices
Symptom: Plugin runs very slowly or times out.
Solutions:
Increase thread count: Set maxthreads to a higher value (e.g., 8-16) or leave at 0 for auto-detection
Run during off-peak hours: Schedule via Custom Task during low-usage periods
Check Elasticsearch health: Ensure your Elasticsearch cluster has adequate resources
Configuration Not Loading
Symptom: Plugin appears to use default settings instead of your custom configuration.
Solutions:
Verify configuration is saved — In Admin Panel, ensure you clicked Save after making changes
Use explicit configuration name:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_costs.py -c "Default" diskover-indexname
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_costs.py" -c "Default" diskover-indexname
Support
Last Updated: April 2026