CSV Enrichment

License: PRO+ (Professional Edition or higher)
Plugin Type: Post-Index Plugin
Author: Diskover Data, Inc.

Overview / Use Cases

The CSV Enrichment plugin lets you bring external metadata into Diskover by mapping data from standard CSV files onto your indexed files. After a Diskover scan completes, this plugin runs as a post-index process — reading your CSV, matching rows to indexed files by filename, and writing the enrichment data back into Elasticsearch. The result is that metadata from external systems (asset management platforms, legal databases, project trackers, etc.) becomes fully searchable alongside your file system data in the Diskover-Web user interface.

Why Use CSV Enrichment?

Bridge external systems to Diskover — If your metadata lives in a spreadsheet export, a DAM system dump, or a database extract, this plugin connects it to your indexed files without any custom development.
Make business context searchable — Go beyond file names and dates. Search by case ID, project code, department, approval status, or any other metadata your organization tracks.
Automate metadata workflows — Schedule enrichment to run automatically after every scan so your Diskover index always reflects the latest external data.

Common Use Cases

Asset Management Integration
Your Digital Asset Management (DAM) system exports asset metadata — asset IDs, creation dates, project associations — as CSV. Configure the plugin to map those columns into Diskover, and suddenly you can search for files by asset ID or project name directly in the Diskover-Web user interface.

Legal / eDiscovery
Legal hold notices require tracking which files are associated with specific case IDs, custodians, and matter numbers. Prepare a CSV with this information, enable tag generation (e.g., case:{case_id}), and use Diskover to quickly locate all files tied to a particular case.

Project Tracking
Project managers can export project assignments from their PM tool, enrich Diskover with project codes and workflow statuses, and then generate reports or track files through workflow stages — all from within Diskover's search interface.

Installation

The CSV Enrichment plugin is included with your Diskover Professional Edition (or higher) installation. The plugin files are located in the plugins_postindex/diskover_csv/ directory of your Diskover installation.

Prerequisites

Component	Requirement
Python	3.9 or higher
Diskover	Core installation with Elasticsearch
Storage	Read access to CSV files from the Diskover server
License	PRO+ (Professional Edition or higher)

No external Python dependencies are required — the plugin uses Python's built-in csv module.

Verify Installation

Confirm the plugin file is in place:

Linux:

ls -la /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py

Windows:

dir "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py"

You can also verify the plugin version:

Linux:

python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py --version

Windows:

python "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" --version

Configuration

Configuration is managed through the Diskover Admin Panel under Plugins > Post Index > CSV Enrichment.

Sample Configuration in Diskover Admin:

Here is the beginning of our sample configuration. The CSV plugin has many more settings, all covered in detail below.

Main Settings

Parameter	Type	Default	Description
`maxthreads`	int	4	Maximum number of processing threads. Set to `0` for auto-detection based on CPU cores.
`csv_file_path`	string	""	Full path to the CSV file containing enrichment data.
`overwrite_existing`	bool	True	If enabled, existing CSV fields on documents are overwritten with new values on subsequent runs.
`batch_size`	int	1000	Number of documents processed in each Elasticsearch bulk update batch.
`nested_field_name`	string	"csv"	Top-level field name that contains all CSV data in Elasticsearch. Leave blank to add fields directly at the document root.

CSV Field Mappings

Each entry in the csv_fields list defines how a single CSV column maps to an Elasticsearch field:

Parameter	Type	Default	Description
`csv_column`	string	(required)	Exact column header name from your CSV file (case-sensitive).
`es_field`	string	(required)	Name of the field to create in Elasticsearch.
`field_type`	string	"string"	Data type: `string`, `datetime`, `integer`, `float`, or `boolean`.
`datetime_format`	string	"%Y-%m-%d %H:%M:%S"	Python `strptime` format string used to parse datetime values (same directives as `strftime`). Only used when `field_type` is `datetime`.

Supported Field Types

Type	Valid CSV Values	Elasticsearch Type
`string`	Any text	keyword/text
`datetime`	Date/time strings (see format table below)	date
`integer`	Whole numbers (e.g., "123")	long
`float`	Decimal numbers (e.g., "123.45")	double
`boolean`	true, false, yes, no, 1, 0, on, off	boolean
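For illustration, the conversions in the table above can be sketched in plain Python. This is a minimal sketch, not the plugin's source: `convert_value` is a hypothetical name, and stripping surrounding whitespace and matching boolean words case-insensitively are assumptions. Datetime parsing is left out of this sketch.

```python
# Hypothetical sketch of the non-datetime conversions in the table above.
# Not the plugin's source; whitespace stripping and case-insensitive
# boolean matching are assumptions.

TRUE_WORDS = {"true", "yes", "1", "on"}
FALSE_WORDS = {"false", "no", "0", "off"}


def convert_value(raw: str, field_type: str = "string"):
    """Convert one CSV cell to a Python value for the given field_type."""
    value = raw.strip()
    if field_type == "string":
        return value
    if field_type == "integer":
        return int(value)  # "123" -> 123; "12.5" raises ValueError
    if field_type == "float":
        return float(value)  # "123.45" -> 123.45
    if field_type == "boolean":
        word = value.lower()
        if word in TRUE_WORDS:
            return True
        if word in FALSE_WORDS:
            return False
        raise ValueError(f"not a recognized boolean: {raw!r}")
    raise ValueError(f"unsupported field_type: {field_type!r}")
```

A value that fails conversion (for example "12.5" as an integer) raises `ValueError` here; how the plugin itself reports such rows is not described on this page.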

Common DateTime Formats:

Format String	Example Value
`%Y-%m-%d %H:%M:%S`	2025-06-15 14:30:00
`%Y-%m-%d`	2025-06-15
`%m/%d/%Y %H:%M:%S`	06/15/2025 14:30:00
`%Y-%m-%dT%H:%M:%SZ`	2025-06-15T14:30:00Z
`%d-%m-%Y`	15-06-2025

Tip: If the configured format fails to parse a value, the plugin automatically tries common fallback formats and logs which format succeeded.
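A minimal sketch of the fallback behavior described in the tip, assuming the fallback list is the five formats in the table above, tried in that order after the configured format (the plugin's actual list and order are not documented here). `parse_datetime` is a hypothetical helper that returns the parsed value together with the format that succeeded, which is the information the plugin logs.

```python
from datetime import datetime

# Assumed fallback list: the formats from the "Common DateTime Formats" table.
FALLBACK_FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%m/%d/%Y %H:%M:%S",
    "%Y-%m-%dT%H:%M:%SZ",
    "%d-%m-%Y",
]


def parse_datetime(raw: str, primary_format: str = "%Y-%m-%d %H:%M:%S"):
    """Return (datetime, format_used), trying primary_format first."""
    value = raw.strip()
    candidates = [primary_format] + [f for f in FALLBACK_FORMATS if f != primary_format]
    for fmt in candidates:
        try:
            # strptime requires the whole string to match the format.
            return datetime.strptime(value, fmt), fmt
        except ValueError:
            continue
    raise ValueError(f"no known datetime format matched {raw!r}")
```

For example, `parse_datetime("15-06-2025")` fails the configured format and returns `(datetime(2025, 6, 15, 0, 0), "%d-%m-%Y")` from the fallbacks.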

Path Mapping

Parameter	Type	Default	Description
`csv_path_column`	string	"UNC path"	Name of the CSV column that contains file paths.

Path Replacement

For environments where CSV paths differ from indexed paths (e.g., UNC paths vs. Linux mount points):

Parameter	Type	Default	Description
`enable`	bool	False	Enable path replacement.
`from_path`	string	""	Path prefix to find and replace (e.g., `\\server\share`).
`to_path`	string	""	Replacement path prefix (e.g., `/mnt/server/share`).

Example Translation:

CSV Path	Translated Path
`\\server\share\docs\file.pdf`	`/mnt/server/share/docs/file.pdf`
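The translation above amounts to a prefix swap followed by separator normalization. A minimal sketch, assuming a case-sensitive prefix match and that any remaining backslashes are converted to forward slashes as in the example; `translate_path` is a hypothetical name, not part of the plugin.

```python
def translate_path(csv_path: str, enable: bool, from_path: str, to_path: str) -> str:
    """Swap a from_path prefix for to_path and normalize separators.

    Assumptions: the prefix match is case-sensitive, and leftover
    backslashes become forward slashes (as in the example translation).
    """
    if not enable or not from_path or not csv_path.startswith(from_path):
        return csv_path  # replacement disabled or prefix not present
    remainder = csv_path[len(from_path):].replace("\\", "/")
    return to_path.rstrip("/") + "/" + remainder.lstrip("/")
```

With `enable=True`, `from_path=r"\\server\share"` and `to_path="/mnt/server/share"`, the path `\\server\share\docs\file.pdf` becomes `/mnt/server/share/docs/file.pdf`.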

Tag Generation

Parameter	Type	Default	Description
`add_tags`	bool	False	Enable automatic tag generation from CSV field values.
`tag_template`	string	"csv:{case_id}"	Template using `{field_name}` placeholders that reference your configured ES field names.

Template Examples:

Template	CSV Data	Generated Tag
`case:{case_id}`	case_id: "12345"	`case:12345`
`project:{project_name}`	project_name: "Alpha"	`project:Alpha`
`{status}_{priority}`	status: "active", priority: 1	`active_1`

Tags are appended to the document's existing tags field — they never replace existing tags.
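Template rendering behaves like Python's `str.format_map` over the enriched field values, which reproduces every row of the examples table. A minimal sketch with hypothetical helper names. How the plugin handles a placeholder with no matching field, or a tag that is already present, is not documented; this sketch simply raises `KeyError` in the first case and appends in the second.

```python
def render_tag(template: str, fields: dict) -> str:
    """Fill {field_name} placeholders from the enriched ES field values."""
    return template.format_map(fields)  # missing field -> KeyError


def append_tag(existing_tags: list, new_tag: str) -> list:
    """Return a new tag list with new_tag added; existing tags are never replaced."""
    return list(existing_tags) + [new_tag]
```

For instance, `render_tag("{status}_{priority}", {"status": "active", "priority": 1})` gives `active_1`, and appending it to `["reviewed"]` yields `["reviewed", "active_1"]`.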

Complete Configuration Example

Plugins:
  Post Index:
    CSV Enrichment:
      Default:
        maxthreads: 8
        csv_file_path: /data/exports/asset_metadata.csv
        overwrite_existing: true
        batch_size: 1000
        nested_field_name: csv
        csv_fields:
          - csv_column: case_id
            es_field: case_id
            field_type: string
          - csv_column: created_datetime
            es_field: created_datetime
            field_type: datetime
            datetime_format: "%Y-%m-%d %H:%M:%S"
          - csv_column: project_name
            es_field: project_name
            field_type: string
          - csv_column: priority
            es_field: priority
            field_type: integer
          - csv_column: is_approved
            es_field: is_approved
            field_type: boolean
        path_mapping:
          csv_path_column: UNC path
        replace_paths:
          enable: true
          from_path: "\\\\server\\share"
          to_path: /mnt/server/share
        add_tags: true
        tag_template: "case:{case_id}"
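Before saving a configuration like this one, you can sanity-check it against the rules on this page: `csv_column` and `es_field` are required, `field_type` must be one of the five supported types, and `tag_template` placeholders should reference configured ES field names. The sketch below is a hypothetical helper, not part of the plugin; it works on the settings as a Python dict, the shape the YAML above would have once loaded.

```python
from string import Formatter

# Hypothetical pre-save check; not part of the CSV Enrichment plugin.
SUPPORTED_TYPES = {"string", "datetime", "integer", "float", "boolean"}


def validate_config(cfg: dict) -> list:
    """Return human-readable problems found in one configuration block."""
    problems = []
    es_fields = set()
    for i, entry in enumerate(cfg.get("csv_fields", [])):
        for key in ("csv_column", "es_field"):
            if not entry.get(key):
                problems.append(f"csv_fields[{i}]: missing required '{key}'")
        field_type = entry.get("field_type", "string")  # documented default
        if field_type not in SUPPORTED_TYPES:
            problems.append(f"csv_fields[{i}]: unknown field_type '{field_type}'")
        if entry.get("es_field"):
            es_fields.add(entry["es_field"])
    if cfg.get("add_tags"):
        template = cfg.get("tag_template", "csv:{case_id}")  # documented default
        # Formatter().parse yields (literal, field_name, spec, conversion) tuples.
        for _, name, _, _ in Formatter().parse(template):
            if name and name not in es_fields:
                problems.append(f"tag_template references unknown field '{name}'")
    return problems
```

An empty list means none of these checks found a problem; it does not guarantee the plugin will accept the configuration.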

Named Configurations

The CSV Enrichment plugin supports named configurations, allowing you to maintain multiple enrichment profiles for different workflows. For example, you might have one configuration for legal/eDiscovery enrichment and another for asset management — each pointing to a different CSV file with its own field mappings.

Named configurations are managed in the Diskover Admin Panel. To create one, add a new configuration block alongside Default under Plugins > Post Index > CSV Enrichment. You then reference the named configuration at runtime using the -c flag (see Execution below).

Execution / Usage Guide

The CSV Enrichment plugin can be run manually from the command line, or scheduled to run automatically after each scan using Custom Tasks or Post-Crawl Commands.

Command-Line Reference

Syntax:

Linux:

python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py [OPTIONS] [INDEX]

Windows:

python "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" [OPTIONS] [INDEX]

Options:

Option	Long Form	Description
`-c`	`--configurationname`	Use a named configuration from the Admin Panel.
`-f`	`--csvfile`	Override the configured CSV file path (takes precedence over the Admin Panel setting).
`-l`	`--latestindex`	Auto-find the most recent index based on a top path.
`-v`	`--verbose`	Enable verbose logging.
`-V`	`--vverbose`	Enable very verbose (debug) logging.
	`--version`	Print plugin version and exit.
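To make the option semantics concrete, here is a hypothetical `argparse` parser that mirrors the table. It is a sketch of how the documented flags combine, not the plugin's own code; the `--version` string is a placeholder, and `resolve_csv_path` illustrates the documented rule that `-f` takes precedence over the Admin Panel path.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    p = argparse.ArgumentParser(prog="diskover_csv.py")
    p.add_argument("index", nargs="?", help="index to enrich, e.g. diskover-myindex")
    p.add_argument("-c", "--configurationname", help="named configuration to use")
    p.add_argument("-f", "--csvfile", help="CSV path overriding the configured one")
    p.add_argument("-l", "--latestindex", metavar="TOPPATH",
                   help="use the most recent index for this top path")
    p.add_argument("-v", "--verbose", action="store_true")
    p.add_argument("-V", "--vverbose", action="store_true")
    # Placeholder version string; the real plugin prints its own.
    p.add_argument("--version", action="version", version="%(prog)s <version>")
    return p


def resolve_csv_path(args: argparse.Namespace, configured_path: str) -> str:
    """-f/--csvfile wins over the Admin Panel csv_file_path."""
    return args.csvfile or configured_path
```

Parsing `-l /mnt/data -v` with this sketch leaves `index` unset and sets `latestindex` to `/mnt/data`, matching the auto-detect example below.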

Manual Execution Examples

Basic run against a specific index:

Linux:

python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py diskover-myindex

Windows:

python "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" diskover-myindex

Use a named configuration:

Linux:

python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py -c legal_enrichment diskover-myindex

Windows:

python "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" -c legal_enrichment diskover-myindex

Override the CSV file at runtime:

Linux:

python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py -f /data/new_export.csv diskover-myindex

Windows:

python "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" -f "C:\data\new_export.csv" diskover-myindex

Auto-detect the latest index with verbose output:

Linux:

python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py -l /mnt/data -v

Windows:

python "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" -l "D:\data" -v

Automated Execution

Important: Never use crontab or local Task Scheduler configurations. All automated execution should be managed through Diskover's built-in scheduling tools.

Option 1: Custom Task

Schedule the CSV Enrichment plugin as a Custom Task in the Diskover Admin Panel. This allows you to run the plugin on a defined schedule or trigger it on demand.

Sample Custom Task Configuration:

Here we can see the Run Command and arguments needed for the Custom Task. Because a Custom Task does not create an index, the {indexname} variable is not available; instead, use the -l (--latestindex) option and pass in your top path.

Option 2: Post-Crawl Command

Attach the plugin to an Index Task as a Post-Crawl Command so it runs automatically every time that scan completes. This is ideal for keeping enrichment data in sync with your latest index.

Linux Example:

Field	Value
Post-Crawl Command	`python3`
Post-Crawl Command Args	`/opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py {indexname}`

Windows Example:

Field	Value
Post-Crawl Command	`python`
Post-Crawl Command Args	`"C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" {indexname}`

To use a named configuration with a Post-Crawl Command:

Linux Example:

Field	Value
Post-Crawl Command	`python3`
Post-Crawl Command Args	`/opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py -c legal_enrichment {indexname}`

Windows Example:

Field	Value
Post-Crawl Command	`python`
Post-Crawl Command Args	`"C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" -c legal_enrichment {indexname}`

Available Index Task Tokens:

{indexname} — The name of the index that was just created

Important:

The Post-Crawl Command field should contain ONLY the executable (python3 or python)
All script paths, flags, and arguments go in the Post-Crawl Command Args field

Sample Post-Crawl Command configuration for the CSV plugin executing with an Index Task:

Replace ConfigurationName in the sample above with a named configuration you've created in the Diskover Admin Panel under Plugins > Post Index > CSV Enrichment. If you are using the Default configuration rather than a named one, the -c flag and the configuration name are not required.

Expected Behavior During Execution

When the plugin runs, it follows this sequence:

1. Loads the CSV file: parses rows and builds a lookup dictionary keyed by filename.
2. Queries Elasticsearch: searches the target index for files matching CSV filenames (in efficient batches).
3. Enriches documents: writes CSV metadata back to matching documents using bulk updates.
4. Reports progress: logs statistics every 3 seconds showing documents processed and update rates.
5. Completes: prints a final summary with the total number of documents updated.
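The load, query, and enrich steps can be sketched with the standard library alone. This is an illustrative outline, not the plugin's implementation. It assumes the index stores each file's name in a `name` field, that matching uses an Elasticsearch `terms` query, and that updates are sent as bulk `update` actions carrying a partial `doc`. The helper names are hypothetical, and field mapping, type conversion, and paging through search results are omitted.

```python
import csv
import io
import ntpath

# Hypothetical outline of the load -> query -> update sequence; not the plugin's code.
# Assumed: file names live in a "name" field, matching uses a "terms" query,
# and updates are bulk "update" actions carrying a partial "doc".


def load_lookup(csv_text: str, path_column: str = "UNC path") -> dict:
    """Step 1: key each CSV row by file name (the last path component)."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = ntpath.basename(row[path_column])  # handles both \ and / separators
        lookup[name] = row  # in this sketch, a later row with the same name wins
    return lookup


def batches(items: list, size: int):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def search_body(names: list) -> dict:
    """Step 2: query body for one batch of file names."""
    return {"query": {"terms": {"name": names}}}


def bulk_actions(index: str, hits: list, lookup: dict,
                 path_column: str = "UNC path", nested_field: str = "csv") -> list:
    """Step 3: action/document pairs for an Elasticsearch _bulk request."""
    actions = []
    for hit in hits:
        row = lookup[hit["_source"]["name"]]
        data = {k: v for k, v in row.items() if k != path_column}
        doc = {nested_field: data} if nested_field else data  # blank = document root
        actions.append({"update": {"_index": index, "_id": hit["_id"]}})
        actions.append({"doc": doc})
    return actions
```

Because matching is by name, every hit that shares a filename receives the same row, which is consistent with the filename-matching note under CSV File Format Requirements.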

Reviewing the Output

Where to Find Results

Diskover-Web UI — Enriched metadata appears on file detail pages and is available in search results. If you configured a nested_field_name (default: csv), the fields appear under that namespace (e.g., csv.case_id, csv.project_name).
Tags — If tag generation is enabled, generated tags appear in the standard tags field on each enriched document.
Logs — The plugin logs progress and results to the console during execution. Use -v (verbose) or -V (very verbose) for additional detail.

What Success Looks Like

A successful run produces log output similar to:

Loaded 500 enrichment records from CSV
Searching for 500 filenames from CSV
Processed batch 1/1, matched 487 docs
STATS (docs processed 487, docs updated 487, elapsed 12.3s)
CSV enrichment complete. Total documents updated: 487

Identifying Problems

"0 matched docs" — The filenames in your CSV don't match any files in the index. Check filename spelling, case sensitivity, and path replacement settings.
Datetime parsing warnings — Your datetime_format doesn't match the actual values in the CSV. Run with -V to see sample values.
Bulk upload errors — Usually indicates Elasticsearch issues. Check cluster health and available disk space.

Searching in Diskover

The CSV Enrichment plugin adds searchable metadata fields to your indexed documents. By default, all enriched fields are nested under the csv namespace, making them easy to find and query in the Diskover-Web search bar.

CSV File Format Requirements

Before diving into search, here's what your source CSV needs:

A header row with column names
A path column (configurable, default: UNC path)
UTF-8 encoding (with or without BOM)

Sample CSV:

UNC path,case_id,created_datetime,project_name,status
\\server\share\documents\contract_2024.pdf,CASE-12345,2024-06-15 10:30:00,Project Alpha,active
\\server\share\documents\invoice_q1.xlsx,CASE-12346,2024-07-20 14:15:30,Project Beta,pending
\\server\share\images\logo_final.png,CASE-12345,2024-06-15 10:30:00,Project Alpha,active
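A quick pre-flight check of a CSV against these requirements might look like the sketch below (a hypothetical helper, not part of the plugin). Opening the file with `encoding="utf-8-sig"` accepts UTF-8 with or without a BOM, and `ntpath.basename` extracts the file name from paths using either `\` or `/`, which is the part used for matching.

```python
import csv
import ntpath


def check_csv(path: str, path_column: str = "UNC path") -> list:
    """Return the file names the CSV would contribute, or raise ValueError.

    A file saved in a non-UTF-8 encoding may raise UnicodeDecodeError here
    or produce garbled text.
    """
    with open(path, newline="", encoding="utf-8-sig") as fh:
        reader = csv.DictReader(fh)
        if not reader.fieldnames:
            raise ValueError("CSV has no header row")
        if path_column not in reader.fieldnames:
            raise ValueError(f"path column {path_column!r} not in header {reader.fieldnames}")
        return [ntpath.basename(row[path_column]) for row in reader if row[path_column]]
```

Run against the sample above, this returns `contract_2024.pdf`, `invoice_q1.xlsx`, and `logo_final.png`.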

Note: Files are matched by filename only (not full path). If multiple files in your index share the same filename, all of them will be enriched with the same CSV metadata.

Search Query Examples

Find all files enriched by the CSV plugin:

csv:*

Find files by case ID:

csv.case_id:CASE-12345

Find files by project name:

csv.project_name:"Project Alpha"

Find files enriched within a date range:

csv.created_datetime:[2025-01-01 TO 2025-12-31]

Find files by status:

csv.status:active

Find files tagged by the CSV plugin:

tags:case*

Combine CSV metadata with file system queries:

csv.case_id:CASE-12345 AND extension:pdf AND size:>1048576

csv.project_name:"Project Alpha" AND parent_path:*\/documents\/*

Find files that have NOT been enriched:

type:file AND NOT csv:*

Tip: Use the Diskover-Web filters panel alongside these queries to further narrow results by file type, size, age, or location.

Troubleshooting

CSV File Not Found

Symptom: Error message: CSV file not found: /path/to/file.csv

Solution:

Verify the file path in your configuration is correct.
Confirm the Diskover service account has read permissions on the CSV file.
Check for typos in the path — this is case-sensitive on Linux.
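To check these points from the Diskover server, you can run a small script as the Diskover service account. A minimal sketch; `csv_readable` is a hypothetical helper. Note that `os.access` reflects the permissions of whichever account runs it, so running it as root can report the file as readable even when the service account cannot read it.

```python
import os


def csv_readable(path: str) -> str:
    """Return "ok" or a short reason the CSV could not be opened."""
    if not os.path.exists(path):
        return "not found (check spelling; paths are case-sensitive on Linux)"
    if not os.path.isfile(path):
        return "path exists but is not a regular file"
    if not os.access(path, os.R_OK):
        return "no read permission for this account"
    return "ok"
```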

No Documents Matched

Symptom: The plugin completes but reports 0 documents updated.

Solution:

Confirm the filenames in your CSV exactly match the indexed filenames (case-sensitive).
If using UNC paths in your CSV, make sure path replacement is enabled and configured correctly.
Run with -V (very verbose) to see the filenames being searched and whether path translation is working.
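Case mismatches are a frequent cause of zero matches. If you have a list of indexed file names (for example, copied from a Diskover-Web search), the hypothetical helper below separates exact matches, names that differ only by case, and names with no counterpart at all.

```python
def diagnose_names(csv_names, indexed_names):
    """Split CSV file names into exact matches, case-only mismatches and misses."""
    indexed = set(indexed_names)
    by_lower = {}
    for name in indexed:
        by_lower.setdefault(name.lower(), []).append(name)
    exact, case_only, missing = [], {}, []
    for name in csv_names:
        if name in indexed:
            exact.append(name)  # would match a case-sensitive lookup
        elif name.lower() in by_lower:
            case_only[name] = sorted(by_lower[name.lower()])  # fix the case in the CSV
        else:
            missing.append(name)  # check spelling or path translation
    return exact, case_only, missing
```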

DateTime Parsing Failures

Symptom: Warnings in the log about failed datetime parsing.

Solution:

Run with -V to see sample values and the format being used.
Ensure your datetime_format matches the actual date format in your CSV. Common mistake: month/day order (US %m/%d/%Y vs. international %d/%m/%Y).
The plugin tries fallback formats automatically, but setting the correct primary format avoids unnecessary warnings.

Encoding Errors

Symptom: UnicodeDecodeError or garbled characters in enriched fields.

Solution:

Save your CSV file with UTF-8 encoding. Most spreadsheet applications offer "CSV UTF-8" as an export option.
The plugin handles UTF-8 with BOM (utf-8-sig) automatically.
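If you cannot re-export the file, you can convert it with a few lines of Python. This sketch assumes the source encoding is Windows-1252 (`cp1252`), which is common for spreadsheet exports on Windows but is only a guess; substitute the real encoding if you know it. `newline=""` keeps the original line endings unchanged.

```python
def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Re-save a CSV as UTF-8 (source_encoding is an assumption; adjust as needed)."""
    with open(src, "r", encoding=source_encoding, newline="") as fin:
        text = fin.read()  # raises UnicodeDecodeError if the guess is wrong
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)
```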

Performance Tuning

If the plugin is running slowly on large CSV files:

Increase maxthreads (e.g., 8–16) for more parallel processing.
Increase batch_size for more efficient Elasticsearch bulk operations.
Verify your Elasticsearch cluster is healthy: curl localhost:9200/_cluster/health
Schedule enrichment during off-peak hours when the Elasticsearch cluster is less loaded.

Debug Logging

For detailed troubleshooting, run the plugin with verbose flags:

Linux:

python3 /opt/diskover/plugins_postindex/diskover_csv/diskover_csv.py -V diskover-myindex

Windows:

python "C:\Program Files\Diskover\plugins_postindex\diskover_csv\diskover_csv.py" -V diskover-myindex

The -V flag enables very verbose output, showing CSV row processing, path translations, Elasticsearch queries, and bulk update details.

Support

Last Updated: April 2026
