CSV Enrichment
License: PRO+ (Professional Edition or higher)
Plugin Type: Post-Index Plugin
Author: Diskover Data, Inc.
Overview / Use Cases
The CSV Enrichment plugin lets you bring external metadata into Diskover by mapping data from standard CSV files onto your indexed files. After a Diskover scan completes, this plugin runs as a post-index process — reading your CSV, matching rows to indexed files by filename, and writing the enrichment data back into Elasticsearch. The result is that metadata from external systems (asset management platforms, legal databases, project trackers, etc.) becomes fully searchable alongside your file system data in the Diskover-Web user interface.
Why Use CSV Enrichment?
Bridge external systems to Diskover — If your metadata lives in a spreadsheet export, a DAM system dump, or a database extract, this plugin connects it to your indexed files without any custom development.
Make business context searchable — Go beyond file names and dates. Search by case ID, project code, department, approval status, or any other metadata your organization tracks.
Automate metadata workflows — Schedule enrichment to run automatically after every scan so your Diskover index always reflects the latest external data.
Common Use Cases
Asset Management Integration
Your Digital Asset Management (DAM) system exports asset metadata — asset IDs, creation dates, project associations — as CSV. Configure the plugin to map those columns into Diskover, and suddenly you can search for files by asset ID or project name directly in the Diskover-Web user interface.
Legal / eDiscovery
Legal hold notices require tracking which files are associated with specific case IDs, custodians, and matter numbers. Prepare a CSV with this information, enable tag generation (e.g., case:{case_id}), and use Diskover to quickly locate all files tied to a particular case.
Project Tracking
Project managers can export project assignments from their PM tool, enrich Diskover with project codes and workflow statuses, and then generate reports or track files through workflow stages — all from within Diskover's search interface.
Installation
The CSV Enrichment plugin is included with your Diskover Professional Edition (or higher) installation. The plugin files are located in the plugins_postindex/diskover_csv/ directory of your Diskover installation.
Prerequisites
| Component | Requirement |
|---|---|
| Python | 3.9 or higher |
| Diskover | Core installation with Elasticsearch |
| Storage | Read access to CSV files from the Diskover server |
| License | PRO+ (Professional Edition or higher) |
No external Python dependencies are required — the plugin uses Python's built-in csv module.
Verify Installation
Confirm the plugin file is in place:
Linux:
ls -la /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py
Windows:
dir "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py"
You can also verify the plugin version:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py --version
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" --version
Configuration
Configuration is managed through the Diskover Admin Panel under Plugins > Post Index > CSV Enrichment.
Sample Configuration in Diskover Admin:
This is the beginning of the sample configuration. The CSV plugin has many other configuration options, covered in detail below.
Main Settings
| Parameter | Type | Default | Description |
|---|---|---|---|
| maxthreads | int | 4 | Maximum number of processing threads. Set to |
| csv_file_path | string | "" | Full path to the CSV file containing enrichment data. |
| overwrite_existing | bool | True | If enabled, existing CSV fields on documents are overwritten with new values on subsequent runs. |
| batch_size | int | 1000 | Number of documents processed in each Elasticsearch bulk update batch. |
| nested_field_name | string | "csv" | Top-level field name that contains all CSV data in Elasticsearch. Leave blank to add fields directly at the document root. |
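The effect of the namespace setting can be illustrated with a minimal sketch. The helper name `build_update_doc` is hypothetical and is not taken from the plugin; it only shows the difference between grouping fields under a namespace and writing them at the document root.

```python
def build_update_doc(fields, nested_field_name="csv"):
    """Return the partial document to merge into an indexed file's record.

    Hypothetical helper: with a namespace, the CSV fields are grouped
    under it; with a blank namespace, they sit at the document root.
    """
    if nested_field_name:
        return {nested_field_name: dict(fields)}
    return dict(fields)


fields = {"case_id": "CASE-12345", "project_name": "Project Alpha"}
nested = build_update_doc(fields)                       # grouped under "csv"
flat = build_update_doc(fields, nested_field_name="")   # fields at the root
```

With the default namespace, the fields would be queried as csv.case_id and csv.project_name; with a blank namespace they would be queried as case_id and project_name.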
CSV Field Mappings
Each entry in the csv_fields list defines how a single CSV column maps to an Elasticsearch field:
| Parameter | Type | Default | Description |
|---|---|---|---|
| csv_column | string | (required) | Exact column header name from your CSV file (case-sensitive). |
| es_field | string | (required) | Name of the field to create in Elasticsearch. |
| field_type | string | "string" | Data type: |
| datetime_format | string | "%Y-%m-%d %H:%M:%S" | Python strftime format string for parsing datetime values. Only used when |
Supported Field Types
| Type | Valid CSV Values | Elasticsearch Type |
|---|---|---|
| string | Any text | keyword/text |
| datetime | Date/time strings (see format table below) | date |
| integer | Whole numbers (e.g., "123") | long |
| | Decimal numbers (e.g., "123.45") | double |
| boolean | true, false, yes, no, 1, 0, on, off | boolean |
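A rough sketch of how CSV text values could be converted to the types in the table above. This is an illustration under stated assumptions, not the plugin's code: the function name `coerce_value` is hypothetical, the type name `float` for decimal numbers is an assumption, and matching boolean values case-insensitively is also an assumption.

```python
TRUE_VALUES = {"true", "yes", "1", "on"}
FALSE_VALUES = {"false", "no", "0", "off"}


def coerce_value(raw, field_type):
    """Convert one CSV cell (always text) to a typed Python value."""
    text = raw.strip()
    if field_type == "integer":
        return int(text)
    if field_type == "float":  # assumed type name for decimal numbers
        return float(text)
    if field_type == "boolean":
        lowered = text.lower()  # assumption: case-insensitive matching
        if lowered in TRUE_VALUES:
            return True
        if lowered in FALSE_VALUES:
            return False
        raise ValueError(f"not a recognized boolean value: {raw!r}")
    return text  # "string" and anything unrecognized stay as text


priority = coerce_value("123", "integer")
approved = coerce_value("Yes", "boolean")
```

Raising an error on an unrecognized boolean is a design choice for the sketch; a real enrichment run might instead log a warning and skip the field.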
Common DateTime Formats:
| Format String | Example Value |
|---|---|
| | 2025-06-15 14:30:00 |
| | 2025-06-15 |
| | 06/15/2025 14:30:00 |
| | 2025-06-15T14:30:00Z |
| | 15-06-2025 |
Tip: If the configured format fails to parse a value, the plugin automatically tries common fallback formats and logs which format succeeded.
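The fallback behavior described in the tip can be sketched with Python's `datetime.strptime`. The list of fallback formats below is an assumption chosen so that each example value in the table above parses; the plugin's actual fallback list is not documented here, and `parse_datetime` is a hypothetical name.

```python
from datetime import datetime

# Assumed fallback list: one format per example value shown above.
FALLBACK_FORMATS = [
    "%Y-%m-%d %H:%M:%S",   # 2025-06-15 14:30:00
    "%Y-%m-%d",            # 2025-06-15
    "%m/%d/%Y %H:%M:%S",   # 06/15/2025 14:30:00
    "%Y-%m-%dT%H:%M:%SZ",  # 2025-06-15T14:30:00Z
    "%d-%m-%Y",            # 15-06-2025
]


def parse_datetime(value, primary_format):
    """Try the configured format first, then each fallback in order.

    Returns the parsed datetime and the format that succeeded, so the
    caller can log which one matched.
    """
    candidates = [primary_format] + [
        fmt for fmt in FALLBACK_FORMATS if fmt != primary_format
    ]
    for fmt in candidates:
        try:
            return datetime.strptime(value, fmt), fmt
        except ValueError:
            continue
    raise ValueError(f"no known format matched {value!r}")


parsed, used_format = parse_datetime("06/15/2025 14:30:00", "%Y-%m-%d %H:%M:%S")
```

Note that ambiguous values such as 03/04/2025 parse successfully under both month-first and day-first formats, which is why setting the correct primary format matters.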
Path Mapping
| Parameter | Type | Default | Description |
|---|---|---|---|
| csv_path_column | string | "UNC path" | Name of the CSV column that contains file paths. |
Path Replacement
For environments where CSV paths differ from indexed paths (e.g., UNC paths vs. Linux mount points):
| Parameter | Type | Default | Description |
|---|---|---|---|
| enable | bool | False | Enable path replacement. |
| from_path | string | "" | Path prefix to find and replace (e.g., |
| to_path | string | "" | Replacement path prefix (e.g., |
Example Translation:
CSV Path | Translated Path |
|---|---|
|
|
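As an illustration of prefix replacement, here is a minimal sketch using the from_path and to_path values from the configuration example in this article. The function name `translate_path` is hypothetical, and converting the remaining backslashes to forward slashes is an assumption about how a UNC path would be mapped onto a Linux mount point.

```python
def translate_path(csv_path, from_path, to_path):
    """Swap a leading path prefix; leave non-matching paths untouched."""
    if not csv_path.startswith(from_path):
        return csv_path
    remainder = csv_path[len(from_path):]
    # Assumption: the rest of a Windows-style path uses backslashes
    # that need to become forward slashes on the Linux side.
    return to_path + remainder.replace("\\", "/")


translated = translate_path(
    r"\\server\share\documents\contract_2024.pdf",
    from_path=r"\\server\share",
    to_path="/mnt/server/share",
)
```

Under these assumptions, `translated` is /mnt/server/share/documents/contract_2024.pdf.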
Tag Generation
| Parameter | Type | Default | Description |
|---|---|---|---|
| add_tags | bool | False | Enable automatic tag generation from CSV field values. |
| tag_template | string | "csv:{case_id}" | Template using |
Template Examples:
| Template | CSV Data | Generated Tag |
|---|---|---|
| | case_id: "12345" | |
| | project_name: "Alpha" | |
| | status: "active", priority: 1 | |
Tags are appended to the document's existing tags field — they never replace existing tags.
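A small sketch of template expansion and tag appending, using the case:{case_id} template that appears in this article. It assumes placeholders behave like Python `str.format` fields, which is not confirmed by the source; `append_tag` is a hypothetical helper, and skipping a tag that is already present is an additional assumption.

```python
def append_tag(existing_tags, template, row):
    """Expand the template from one CSV row and append it to the tags."""
    tag = template.format_map(row)  # {case_id} is filled from the row
    tags = list(existing_tags)      # copy, so existing tags are never replaced
    if tag not in tags:             # assumption: avoid duplicate tags
        tags.append(tag)
    return tags


tags = append_tag(["reviewed"], "case:{case_id}", {"case_id": "12345"})
```

Running the same enrichment twice leaves the tag list unchanged in this sketch, which keeps repeated scheduled runs from growing the tags field.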
Complete Configuration Example
Plugins:
  Post Index:
    CSV Enrichment:
      Default:
        maxthreads: 8
        csv_file_path: /data/exports/asset_metadata.csv
        overwrite_existing: true
        batch_size: 1000
        nested_field_name: csv
        csv_fields:
          - csv_column: case_id
            es_field: case_id
            field_type: string
          - csv_column: created_datetime
            es_field: created_datetime
            field_type: datetime
            datetime_format: "%Y-%m-%d %H:%M:%S"
          - csv_column: project_name
            es_field: project_name
            field_type: string
          - csv_column: priority
            es_field: priority
            field_type: integer
          - csv_column: is_approved
            es_field: is_approved
            field_type: boolean
        path_mapping:
          csv_path_column: UNC path
          replace_paths:
            enable: true
            from_path: "\\\\server\\share"
            to_path: /mnt/server/share
        add_tags: true
        tag_template: "case:{case_id}"
Named Configurations
The CSV Enrichment plugin supports named configurations, allowing you to maintain multiple enrichment profiles for different workflows. For example, you might have one configuration for legal/eDiscovery enrichment and another for asset management — each pointing to a different CSV file with its own field mappings.
Named configurations are managed in the Diskover Admin Panel. To create one, add a new configuration block alongside Default under Plugins > Post Index > CSV Enrichment. You then reference the named configuration at runtime using the -c flag (see Execution below).
Execution / Usage Guide
The CSV Enrichment plugin can be run manually from the command line, or scheduled to run automatically after each scan using Custom Tasks or Post-Crawl Commands.
Command-Line Reference
Syntax:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py [OPTIONS] [INDEX]
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" [OPTIONS] [INDEX]
Options:
| Option | Long Form | Description |
|---|---|---|
| -c | | Use a named configuration from the Admin Panel. |
| -f | | Override the configured CSV file path (takes precedence over the Admin Panel setting). |
| -l | | Auto-find the most recent index based on a top path. |
| -v | | Enable verbose logging. |
| -V | | Enable very verbose (debug) logging. |
| | --version | Print plugin version and exit. |
Manual Execution Examples
Basic run against a specific index:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py diskover-myindex
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" diskover-myindex
Use a named configuration:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py -c legal_enrichment diskover-myindex
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" -c legal_enrichment diskover-myindex
Override the CSV file at runtime:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py -f /data/new_export.csv diskover-myindex
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" -f "C:\data\new_export.csv" diskover-myindex
Auto-detect the latest index with verbose output:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py -l /mnt/data -v
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" -l "D:\data" -v
Automated Execution
Important: Never use crontab or local Task Scheduler configurations. All automated execution should be managed through Diskover's built-in scheduling tools.
Option 1: Custom Task
Schedule the CSV Enrichment plugin as a Custom Task in the Diskover Admin Panel. This allows you to run the plugin on a defined schedule or trigger it on demand.
Sample Custom Task Configuration:
The sample shows the Run Command and args needed for the Custom Task. Because a Custom Task does not create an index, the {indexname} variable is not available here; use the -l (toppath) CLI option and pass in your top path instead.
Option 2: Post-Crawl Command
Attach the plugin to an Index Task as a Post-Crawl Command so it runs automatically every time that scan completes. This is ideal for keeping enrichment data in sync with your latest index.
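Linux Example:
| Field | Value |
|---|---|
| Post-Crawl Command | python3 |
| Post-Crawl Command Args | /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py {indexname} |
Windows Example:
| Field | Value |
|---|---|
| Post-Crawl Command | python |
| Post-Crawl Command Args | "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" {indexname} |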
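To use a named configuration with a Post-Crawl Command:
Linux Example:
| Field | Value |
|---|---|
| Post-Crawl Command | python3 |
| Post-Crawl Command Args | /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py -c ConfigurationName {indexname} |
Windows Example:
| Field | Value |
|---|---|
| Post-Crawl Command | python |
| Post-Crawl Command Args | "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" -c ConfigurationName {indexname} |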
Available Index Task Tokens:
{indexname} — The name of the index that was just created
Important:
The Post-Crawl Command field should contain ONLY the executable (python3 or python)
All script paths, flags, and arguments go in the Post-Crawl Command Args field
Sample Post-Crawl Command configuration for the CSV plugin executing with an Index Task:
Replace ConfigurationName with the named configuration you created under Diskover Admin → Plugins → Post-Index → CSV Enrichment. If you are using only the Default configuration, the -c flag and ConfigurationName are not required.
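The split between the two Post-Crawl fields can be illustrated with a short sketch. Both function names are hypothetical, and `expand_tokens` is only a stand-in for whatever Diskover does internally when it substitutes {indexname}; the sketch also assumes no argument contains spaces.

```python
import shlex


def split_for_post_crawl(full_command):
    """Split a command line into (Post-Crawl Command, Post-Crawl Command Args)."""
    parts = shlex.split(full_command)
    # Only the executable goes in the first field; everything else is args.
    return parts[0], " ".join(parts[1:])


def expand_tokens(args, indexname):
    """Illustrative stand-in for replacing the {indexname} token."""
    return args.replace("{indexname}", indexname)


command, args = split_for_post_crawl(
    "python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py "
    "-c ConfigurationName {indexname}"
)
expanded = expand_tokens(args, "diskover-myindex")
```

Here `command` holds only python3, while the script path, the -c flag, and the index token all end up in `args`.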
Expected Behavior During Execution
When the plugin runs, it follows this sequence:
Loads the CSV file — Parses rows and builds a lookup dictionary keyed by filename.
Queries Elasticsearch — Searches the target index for files matching CSV filenames (in efficient batches).
Enriches documents — Writes CSV metadata back to matching documents using bulk updates.
Reports progress — Logs statistics every 3 seconds showing documents processed and update rates.
Completes — Prints a final summary with total documents updated.
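The first two steps of the sequence above can be sketched with the standard library alone. This is a simplified illustration, not the plugin's implementation: `load_lookup` and `batches` are hypothetical names, and the Elasticsearch query and bulk update steps are omitted.

```python
import csv
import io
import ntpath


def load_lookup(csv_text, path_column="UNC path"):
    """Parse CSV text into a dictionary keyed by filename."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # ntpath.basename understands both backslash and forward-slash paths.
        filename = ntpath.basename(row[path_column])
        lookup[filename] = {k: v for k, v in row.items() if k != path_column}
    return lookup


def batches(items, size):
    """Yield fixed-size slices so filenames can be queried in groups."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


SAMPLE = (
    "UNC path,case_id,status\n"
    "\\\\server\\share\\documents\\contract_2024.pdf,CASE-12345,active\n"
    "\\\\server\\share\\documents\\invoice_q1.xlsx,CASE-12346,pending\n"
)
lookup = load_lookup(SAMPLE)
filename_batches = list(batches(sorted(lookup), 1))
```

Because the dictionary is keyed by filename, two CSV rows with the same filename in different folders would collide in this sketch, with the later row winning.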
Reviewing the Output
Where to Find Results
Diskover-Web UI — Enriched metadata appears on file detail pages and is available in search results. If you configured a nested_field_name (default: csv), the fields appear under that namespace (e.g., csv.case_id, csv.project_name).
Tags — If tag generation is enabled, generated tags appear in the standard tags field on each enriched document.
Logs — The plugin logs progress and results to the console during execution. Use -v (verbose) or -V (very verbose) for additional detail.
What Success Looks Like
A successful run produces log output similar to:
Loaded 500 enrichment records from CSV
Searching for 500 filenames from CSV
Processed batch 1/1, matched 487 docs
STATS (docs processed 487, docs updated 487, elapsed 12.3s)
CSV enrichment complete. Total documents updated: 487
Identifying Problems
"0 matched docs" — The filenames in your CSV don't match any files in the index. Check filename spelling, case sensitivity, and path replacement settings.
Datetime parsing warnings — Your datetime_format doesn't match the actual values in the CSV. Run with -V to see sample values.
Bulk upload errors — Usually indicates Elasticsearch issues. Check cluster health and available disk space.
Searching in Diskover
The CSV Enrichment plugin adds searchable metadata fields to your indexed documents. By default, all enriched fields are nested under the csv namespace, making them easy to find and query in the Diskover-Web search bar.
CSV File Format Requirements
Before diving into search, here's what your source CSV needs:
A header row with column names
A path column (configurable, default: UNC path)
UTF-8 encoding (with or without BOM)
Sample CSV:
UNC path,case_id,created_datetime,project_name,status
\\server\share\documents\contract_2024.pdf,CASE-12345,2024-06-15 10:30:00,Project Alpha,active
\\server\share\documents\invoice_q1.xlsx,CASE-12346,2024-07-20 14:15:30,Project Beta,pending
\\server\share\images\logo_final.png,CASE-12345,2024-06-15 10:30:00,Project Alpha,active
Note: Files are matched by filename only (not full path). If multiple files in your index share the same filename, all of them will be enriched with the same CSV metadata.
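The consequence of filename-only matching can be seen in a short sketch. The helper `match_by_filename` and the indexed paths are hypothetical; only logo_final.png and its case ID come from the sample CSV.

```python
import posixpath


def match_by_filename(indexed_paths, lookup):
    """Pair each indexed path with CSV metadata for its filename, if any."""
    matches = {}
    for path in indexed_paths:
        name = posixpath.basename(path)
        if name in lookup:
            matches[path] = lookup[name]
    return matches


lookup = {"logo_final.png": {"case_id": "CASE-12345"}}
indexed_paths = [
    "/mnt/server/share/images/logo_final.png",
    "/mnt/server/share/archive/logo_final.png",  # same name, different folder
    "/mnt/server/share/images/banner.png",
]
matches = match_by_filename(indexed_paths, lookup)
```

Both copies of logo_final.png receive the same metadata here, so CSV exports that reuse filenames across folders deserve a review before enrichment.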
Search Query Examples
Find all files enriched by the CSV plugin:
csv:*
Find files by case ID:
csv.case_id:CASE-12345
Find files by project name:
csv.project_name:"Project Alpha"
Find files whose CSV created_datetime falls within a date range:
csv.created_datetime:[2025-01-01 TO 2025-12-31]
Find files by status:
csv.status:active
Find files tagged by the CSV plugin:
tags:case*
Combine CSV metadata with file system queries:
csv.case_id:CASE-12345 AND extension:pdf AND size:>1048576
csv.project_name:"Project Alpha" AND parent_path:*\/documents\/*
Find files that have NOT been enriched:
type:file AND NOT csv:*
Tip: Use the Diskover-Web filters panel alongside these queries to further narrow results by file type, size, age, or location.
Troubleshooting
CSV File Not Found
Symptom: Error message: CSV file not found: /path/to/file.csv
Solution:
Verify the file path in your configuration is correct.
Confirm the Diskover service account has read permissions on the CSV file.
Check for typos in the path. Paths are case-sensitive on Linux.
No Documents Matched
Symptom: The plugin completes but reports 0 documents updated.
Solution:
Confirm the filenames in your CSV exactly match the indexed filenames (case-sensitive).
If using UNC paths in your CSV, make sure path replacement is enabled and configured correctly.
Run with -V (very verbose) to see the filenames being searched and whether path translation is working.
DateTime Parsing Failures
Symptom: Warnings in the log about failed datetime parsing.
Solution:
Run with -V to see sample values and the format being used.
Ensure your datetime_format matches the actual date format in your CSV. Common mistake: month/day order (US %m/%d/%Y vs. international %d/%m/%Y).
The plugin tries fallback formats automatically, but setting the correct primary format avoids unnecessary warnings.
Encoding Errors
Symptom: UnicodeDecodeError or garbled characters in enriched fields.
Solution:
Save your CSV file with UTF-8 encoding. Most spreadsheet applications offer "CSV UTF-8" as an export option.
The plugin handles UTF-8 with BOM (utf-8-sig) automatically.
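To see why the utf-8-sig codec matters, the sketch below decodes the same header row with and without a byte order mark. It uses only standard Python codecs; `read_header` is a hypothetical helper for checking a file before enrichment, not part of the plugin.

```python
import csv
import io

BOM_BYTES = b"\xef\xbb\xbfUNC path,case_id\n"  # as saved by "CSV UTF-8" exports
PLAIN_BYTES = b"UNC path,case_id\n"


def read_header(raw_bytes):
    """Decode CSV bytes and return the header row as a list of column names."""
    text = raw_bytes.decode("utf-8-sig")  # strips a BOM if one is present
    return next(csv.reader(io.StringIO(text)))


header_with_bom = read_header(BOM_BYTES)
header_plain = read_header(PLAIN_BYTES)

# Decoding with plain "utf-8" keeps the BOM glued to the first column name,
# which would stop a column lookup such as "UNC path" from matching.
first_column_utf8 = BOM_BYTES.decode("utf-8").split(",")[0]
```

Both headers come out identical with utf-8-sig, while the plain utf-8 decode yields a first column name that starts with an invisible \ufeff character.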
Performance Tuning
If the plugin is running slowly on large CSV files:
Increase maxthreads (e.g., 8–16) for more parallel processing.
Increase batch_size for more efficient Elasticsearch bulk operations.
Verify your Elasticsearch cluster is healthy: curl localhost:9200/_cluster/health
Schedule enrichment during off-peak hours when the Elasticsearch cluster is less loaded.
Debug Logging
For detailed troubleshooting, run the plugin with verbose flags:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py -V diskover-myindex
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" -V diskover-myindex
The -V flag enables very verbose output, showing CSV row processing, path translations, Elasticsearch queries, and bulk update details.
Support
Last Updated: April 2026