Breadcrumb Index Plugin User Guide
License: PRO+ (Professional Edition or higher)
Plugin Type: Index Plugin
Author: Diskover Data, Inc.
Overview
The Breadcrumb plugin extracts metadata from breadcrumb files during Diskover indexing and adds this information to directory documents in your Elasticsearch index. Breadcrumb files are small text files containing key-value metadata about a directory—typically used to track project ownership, creation dates, storage quotas, and other organizational metadata.
When enabled, this plugin automatically reads breadcrumb files as directories are crawled, making their metadata searchable alongside all your other file system data. This allows you to answer questions like "Which directories belong to the Engineering department?" or "Show me all project directories created before 2023" directly from the Diskover search interface.
Key Capabilities
Automatic Metadata Extraction: Reads key-value pairs from breadcrumb files and adds them to directory documents during indexing
Configurable Structure: Customize the breadcrumb directory name, file name, and target field name to match your existing conventions
Auto-Tagging: Automatically tags directories containing breadcrumb data for easy filtering
Date Normalization: Converts birthdate fields from dd/mm/yyyy to yyyy-mm-dd format for proper date searching
Field Filtering: Optionally extract only specific fields from breadcrumb files
Performance Caching: SQLite caching improves re-scan performance for large environments
Use Cases
For Storage Administrators
Use Case | Description |
|---|---|
Quota Management | Find all directories with specific quota allocations for capacity planning |
Age-Based Policies | Identify directories older than a threshold for archival or cleanup |
Ownership Tracking | Locate all directories owned by a specific user or department |
Provisioning Audit | Track which directories were provisioned, when, and by whom |
Migration Planning | Group directories by project or department for targeted data migrations |
For Data Managers and Project Leads
Use Case | Description |
|---|---|
Project Discovery | Find all directories associated with a specific project code |
Department Storage | Locate and measure all storage allocated to a particular department |
Resource Attribution | Track storage consumption by project or cost center |
Compliance Reporting | Generate reports of storage allocation by team, requestor, or time period |
Understanding Breadcrumb Files
Before configuring this plugin, it helps to understand what breadcrumb files are and how they work. This section explains the concept for users who may be new to this approach.
What Are Breadcrumb Files?
Breadcrumb files are simple text files that contain metadata about a directory in a key-value format. Organizations typically create these files during directory provisioning workflows to record important information such as:
When the directory was created (birthdate)
Who owns or requested the directory
Which project or department the directory belongs to
Quota or storage allocation information
Ticket or request references for audit trails
The term "breadcrumb" comes from the idea of leaving a trail of information—these files provide context about why a directory exists and who is responsible for it.
Typical Directory Structure
Breadcrumb files are usually stored in a hidden subdirectory within the target directory. Here's an example structure:
/mnt/projects/
└── alpha_project/
    ├── .birthday/        ← Breadcrumb directory (hidden)
    │   └── volinfo       ← Breadcrumb file with metadata
    ├── data/
    ├── scripts/
    └── results/
In this example, the .birthday directory contains a file called volinfo with metadata about the alpha_project directory.
Breadcrumb File Format
Breadcrumb files use a simple key: value text format with one entry per line. Here's an example breadcrumb file:
birthdate: 15/03/2024
owner: sarah.chen
project: AlphaResearch
department: DataScience
quota_gb: 500
requestor: mike.johnson
cost_center: CC-4521
ticket: STOR-78432
Each line contains a field name, followed by a colon, followed by the value. The plugin reads these pairs and makes them searchable in Diskover.
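The format described above can be read with a few lines of Python. The following is a minimal sketch, not the plugin's actual code: the function name `parse_breadcrumb` is hypothetical, and skipping lines without a colon is an assumption, since this guide does not say how the plugin treats malformed lines.

```python
def parse_breadcrumb(text: str) -> dict[str, str]:
    """Parse 'key: value' lines into a dict, splitting on the first colon only."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # assumption: lines without a key and colon are ignored
        fields[key.strip()] = value.strip()
    return fields


sample = "birthdate: 15/03/2024\nowner: sarah.chen\nquota_gb: 500\n"
print(parse_breadcrumb(sample))
```

Splitting on the first colon only means a value may itself contain colons without being truncated.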
Special Field Handling
Field | Handling |
|---|---|
birthdate | Automatically normalized from dd/mm/yyyy to yyyy-mm-dd format for Elasticsearch date compatibility |
All other fields | Stored as keyword values exactly as they appear in the file |
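The birthdate normalization can be illustrated with Python's standard `datetime` module. This is a sketch under assumptions: the helper name `normalize_birthdate` is hypothetical, and returning non-matching values unchanged is an assumed fallback, not documented plugin behavior.

```python
from datetime import datetime


def normalize_birthdate(value: str) -> str:
    """Convert dd/mm/yyyy to yyyy-mm-dd; return the input unchanged if it does not match."""
    try:
        parsed = datetime.strptime(value.strip(), "%d/%m/%Y")
    except ValueError:
        return value  # assumption: leave unrecognized formats as they are
    return parsed.strftime("%Y-%m-%d")


print(normalize_birthdate("15/03/2024"))  # 2024-03-15
```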
Why Use Breadcrumb Files?
Benefit | Description |
|---|---|
Decentralized Metadata | Metadata lives with the data itself, surviving moves and migrations |
Simple Format | Easy to create and update with any text editor or automated script |
Provisioning Integration | Can be automatically generated during directory provisioning workflows |
Audit Trail | Provides a historical record of when and why directories were created |
Cross-Platform | Plain text format works across all operating systems and storage platforms |
Requirements
License
License: PRO+ (Professional Edition or higher)
Python Dependencies
This plugin has no external Python package dependencies beyond Diskover's core requirements.
System Requirements
Python 3.9 or higher
Diskover indexer with plugin support enabled
Read access to breadcrumb files during indexing
Write access to cache directory (if caching is enabled)
Breadcrumb File Requirements
For the plugin to successfully extract metadata, breadcrumb files must meet these criteria:
Text file with UTF-8 encoding
One key-value pair per line in key: value format
Non-empty file (empty files are skipped with a warning)
Installation
Step 1: Install Dependencies
This plugin has no additional dependencies to install.
Step 2: Configure the Plugin
Navigate to Diskover Admin > Plugins > Index Plugins > Breadcrumb
Enable the plugin
Configure the parameters to match your breadcrumb file structure (see Configuration section below)
Save your configuration
Step 3: Enable in Index Task Configuration
Navigate to Diskover > Configurations > select your configuration (e.g., Default)
Scroll to the bottom to find Index Plugins Enablement
Enable the Breadcrumb plugin
Save the configuration
The plugin will now run automatically during any scan that uses this configuration.
Step 4: Create Breadcrumb Files (If Not Already Present)
If your environment doesn't already have breadcrumb files, you can create them manually or through automated provisioning scripts.
Linux Example:
# Create breadcrumb directory
mkdir -p /mnt/projects/my_project/.birthday

# Create breadcrumb file with metadata
cat > /mnt/projects/my_project/.birthday/volinfo << 'EOF'
birthdate: 15/03/2024
owner: sarah.chen
project: AlphaResearch
department: DataScience
quota_gb: 500
requestor: mike.johnson
cost_center: CC-4521
ticket: STOR-78432
EOF
Windows Example:
# Create breadcrumb directory
New-Item -ItemType Directory -Path "D:\Projects\my_project\.birthday" -Force

# Create breadcrumb file with metadata
@"
birthdate: 15/03/2024
owner: sarah.chen
project: AlphaResearch
department: DataScience
quota_gb: 500
requestor: mike.johnson
cost_center: CC-4521
ticket: STOR-78432
"@ | Out-File -FilePath "D:\Projects\my_project\.birthday\volinfo" -Encoding UTF8
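For automated provisioning, the same steps can be scripted in a platform-neutral way. The sketch below assumes the default `.birthday` directory and `volinfo` file names; the function `write_breadcrumb` is a hypothetical helper, not part of Diskover.

```python
from pathlib import Path


def write_breadcrumb(project_dir, metadata, dir_name=".birthday", file_name="volinfo"):
    """Write metadata as 'key: value' lines to <project_dir>/<dir_name>/<file_name>."""
    # An empty dir_name places the file directly in the project directory
    crumb_dir = Path(project_dir) / dir_name if dir_name else Path(project_dir)
    crumb_dir.mkdir(parents=True, exist_ok=True)
    crumb_file = crumb_dir / file_name
    lines = [f"{key}: {value}" for key, value in metadata.items()]
    crumb_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return crumb_file


# Example call (writes to disk, so it is left commented out):
# write_breadcrumb("/mnt/projects/my_project", {"birthdate": "15/03/2024", "owner": "sarah.chen"})
```

A provisioning workflow could call this once per new directory, passing the ticket, owner, and quota values it already has on hand.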
Configuration
Configuration Parameters
Parameter | Type | Default | Description |
|---|---|---|---|
breadcrumb_dir_name | string | .birthday | Directory name containing breadcrumb files. Leave blank if breadcrumb files are located directly in the parent directory (not a subdirectory). |
breadcrumb_file_name | string | volinfo | Name of the breadcrumb file containing metadata |
target_field_name | string | volinfo | Elasticsearch field name where breadcrumb data will be stored. This determines how you search for the data in Diskover. |
breadcrumb_dir_parent_tag | string | quotadir | Tag automatically applied to directories containing breadcrumb data |
enable_cache | bool | true | Enable SQLite caching for improved re-index performance |
cache_dir | string | See below | Directory for SQLite cache database |
cache_expire_time | int | 0 | Cache entry expiration in seconds (0 = never expire) |
extract_fields | list | [] | List of specific fields to extract. Empty list extracts all fields. |
Default Cache Directories:
Linux: /opt/diskover/__breadcrumb_plugin_cache__/
Windows: C:\Program Files\Diskover\__breadcrumb_plugin_cache__\
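The meaning of `cache_expire_time` (a lifetime in seconds, with 0 meaning entries never expire) can be pictured with a small check. This is only an illustration of the documented semantics; the function `is_fresh` is hypothetical and the plugin's real cache logic is not described in this guide.

```python
import time
from typing import Optional


def is_fresh(cached_at: float, cache_expire_time: int, now: Optional[float] = None) -> bool:
    """Return True if a cache entry written at `cached_at` is still usable."""
    if cache_expire_time == 0:
        return True  # 0 = never expire
    current = time.time() if now is None else now
    return (current - cached_at) < cache_expire_time


print(is_fresh(cached_at=100.0, cache_expire_time=60, now=150.0))  # True
print(is_fresh(cached_at=100.0, cache_expire_time=60, now=161.0))  # False
```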
Understanding Key Parameters
breadcrumb_dir_name
This setting specifies the subdirectory that contains the breadcrumb file. Common configurations:
Value | Use Case |
|---|---|
.birthday | Hidden directory (default convention) |
| Alternative hidden directory name |
| Non-hidden directory for better Windows visibility |
(empty) | Breadcrumb file is directly in the parent directory |
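How these settings combine into a file location can be sketched as follows. The helper `breadcrumb_path` is hypothetical and only mirrors the rule stated above: a non-empty `breadcrumb_dir_name` adds a subdirectory, while an empty value looks for the file directly in the parent directory.

```python
from pathlib import PurePosixPath


def breadcrumb_path(parent_dir: str, breadcrumb_dir_name: str, breadcrumb_file_name: str) -> PurePosixPath:
    """Build the expected breadcrumb file path for a crawled directory."""
    parent = PurePosixPath(parent_dir)
    if breadcrumb_dir_name:
        return parent / breadcrumb_dir_name / breadcrumb_file_name
    return parent / breadcrumb_file_name  # file sits directly in the parent directory


print(breadcrumb_path("/mnt/projects/alpha_project", ".birthday", "volinfo"))
# /mnt/projects/alpha_project/.birthday/volinfo
print(breadcrumb_path("/mnt/projects/alpha_project", "", ".volinfo"))
# /mnt/projects/alpha_project/.volinfo
```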
target_field_name
The Elasticsearch field name determines how you'll search for breadcrumb data in Diskover:
Configuration Value | Search Query Example |
|---|---|
|
|
|
|
|
|
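The relationship between `target_field_name` and search syntax can be sketched in Python. This assumes that breadcrumb keys are stored as sub-fields of the target field and queried in the usual Elasticsearch query-string form `field.subfield:value`; the helper `field_query` is hypothetical, and the three field names are taken from the example configurations in this guide.

```python
def field_query(target_field_name: str, key: str, value: str) -> str:
    """Build a query string for one breadcrumb key under the configured target field."""
    return f"{target_field_name}.{key}:{value}"


for name in ("volinfo", "storage_info", "dirinfo"):
    print(field_query(name, "owner", "sarah.chen"))
# volinfo.owner:sarah.chen
# storage_info.owner:sarah.chen
# dirinfo.owner:sarah.chen
```

Changing `target_field_name` therefore changes the prefix of every query, so saved searches need updating if the setting is changed later.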
extract_fields
Control which fields are extracted from breadcrumb files:
Configuration | Behavior |
|---|---|
[] | Extract all fields from the breadcrumb file |
| Extract only these three fields |
["department", "cost_center"] | Extract only department and cost center |
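The filtering rule (an empty list keeps everything, otherwise only the listed keys are kept) can be expressed as a short sketch. The function `filter_fields` is hypothetical and is not the plugin's own code.

```python
def filter_fields(fields: dict[str, str], extract_fields: list[str]) -> dict[str, str]:
    """Keep only the requested keys; an empty extract_fields list keeps all of them."""
    if not extract_fields:
        return dict(fields)
    return {key: value for key, value in fields.items() if key in extract_fields}


parsed = {"owner": "sarah.chen", "department": "DataScience", "cost_center": "CC-4521"}
print(filter_fields(parsed, ["department", "cost_center"]))
# {'department': 'DataScience', 'cost_center': 'CC-4521'}
```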
Example Configurations
Default Configuration
Standard breadcrumb extraction with all fields:
{
  "breadcrumb_dir_name": ".birthday",
  "breadcrumb_file_name": "volinfo",
  "target_field_name": "volinfo",
  "breadcrumb_dir_parent_tag": "quotadir",
  "enable_cache": true,
  "cache_dir": "/opt/diskover/__breadcrumb_plugin_cache__/",
  "cache_expire_time": 0,
  "extract_fields": []
}
Storage Quota Tracking Configuration
Focused on quota and ownership metadata:
{
  "breadcrumb_dir_name": ".birthday",
  "breadcrumb_file_name": "volinfo",
  "target_field_name": "storage_info",
  "breadcrumb_dir_parent_tag": "quota-tracked",
  "enable_cache": true,
  "cache_dir": "/opt/diskover/__breadcrumb_plugin_cache__/",
  "cache_expire_time": 0,
  "extract_fields": ["owner", "department", "quota_gb", "birthdate"]
}
Windows-Compatible Configuration
Using visible directory names for Windows environments:
{
  "breadcrumb_dir_name": "_metadata",
  "breadcrumb_file_name": "dirinfo.txt",
  "target_field_name": "dirinfo",
  "breadcrumb_dir_parent_tag": "has-metadata",
  "enable_cache": true,
  "cache_dir": "C:\\Program Files\\Diskover\\__breadcrumb_plugin_cache__\\",
  "cache_expire_time": 0,
  "extract_fields": []
}
Direct File Configuration
When breadcrumb files are directly in the parent directory (no subdirectory):
{
  "breadcrumb_dir_name": "",
  "breadcrumb_file_name": ".volinfo",
  "target_field_name": "volinfo",
  "breadcrumb_dir_parent_tag": "quotadir",
  "enable_cache": true,
  "cache_dir": "/opt/diskover/__breadcrumb_plugin_cache__/",
  "cache_expire_time": 0,
  "extract_fields": []
}
Indexed Fields / Elasticsearch Mappings
The Breadcrumb plugin adds a configurable object field (default: volinfo) to directory documents containing the extracted metadata.
Field Mappings
Field Path | ES Type | Description |
|---|---|---|
volinfo | object | Root container for all breadcrumb metadata |
volinfo.birthdate | date | Directory creation date (format: yyyy-MM-dd) |
volinfo.<field_name> | keyword | Any additional fields from the breadcrumb file |
volinfo.breadcrumb_file_path | text | Full path to the breadcrumb file (for debugging) |
Example Indexed Document
Given the example breadcrumb file from earlier:
birthdate: 15/03/2024
owner: sarah.chen
project: AlphaResearch
department: DataScience
quota_gb: 500
requestor: mike.johnson
cost_center: CC-4521
ticket: STOR-78432
The resulting directory document in Elasticsearch would include:
{
  "name": "alpha_project",
  "type": "directory",
  "path": "/mnt/projects/alpha_project",
  "tags": ["quotadir"],
  "volinfo": {
    "birthdate": "2024-03-15",
    "owner": "sarah.chen",
    "project": "AlphaResearch",
    "department": "DataScience",
    "quota_gb": "500",
    "requestor": "mike.johnson",
    "cost_center": "CC-4521",
    "ticket": "STOR-78432",
    "breadcrumb_file_path": "/mnt/projects/alpha_project/.birthday/volinfo"
  }
}
Notice that the birthdate value has been automatically converted from 15/03/2024 to 2024-03-15 for proper date handling.
Searching in Diskover
Once indexing is complete, you can search for directories based on their breadcrumb metadata. The examples below assume the default target_field_name of volinfo and use the sample breadcrumb file shown earlier.
Basic Searches
Find all directories with breadcrumb metadata:
volinfo:*
Find directories tagged as quota directories:
tags:quotadir
Searching by Owner
Query | Description |
|---|---|
| Find directories owned by sarah.chen |
| Find directories owned by anyone with "chen" in their username |
| Find all directories that have an owner specified |
Searching by Project
Query | Description |
|---|---|
| Find directories for the AlphaResearch project |
| Find directories for projects starting with "Alpha" |
| Find all directories with a project assigned |
Searching by Department
Query | Description |
|---|---|
| Find all DataScience department directories |
| Find directories for multiple departments |
| Find directories without a department assigned |
Searching by Cost Center
Query | Description |
|---|---|
| Find directories charged to a specific cost center |
| Find directories for cost centers starting with CC-45 |
Searching by Quota
Query | Description |
|---|---|
| Find directories with 500 GB quota |
| Find directories with quotas of 1 TB or more |
| Find directories with small quotas (100 GB or less) |
Searching by Ticket Reference
Query | Description |
|---|---|
| Find directories provisioned via a specific ticket |
| Find all directories with storage ticket references |
| Find all directories with any ticket reference |
Searching by Birthdate
Query | Description |
|---|---|
| Find directories created on a specific date |
| Find directories created in 2024 |
| Find directories created before 2023 (older directories) |
| Find directories created in the last year |
| Find directories created in the last 90 days |
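Relative searches such as "created in the last 90 days" reduce to a cutoff date plus a range query. The sketch below computes the cutoff in Python and formats it with the Lucene-style range syntax `[start TO *]`. The helper `created_since_query` is hypothetical, and the field path assumes the default `target_field_name` of volinfo; check the resulting string against your own index before relying on it.

```python
from datetime import date, timedelta


def created_since_query(days: int, today: date, field: str = "volinfo.birthdate") -> str:
    """Build a range query matching birthdates from `days` ago up to now."""
    cutoff = today - timedelta(days=days)
    return f"{field}:[{cutoff.isoformat()} TO *]"


print(created_since_query(90, date(2026, 1, 31)))
# volinfo.birthdate:[2025-11-02 TO *]
```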
Combined Searches
These queries combine multiple criteria for more targeted results:
Query | Description |
|---|---|
| Find AlphaResearch directories owned by sarah.chen |
| Find DataScience directories created in 2024 |
| Find quota-tracked directories larger than 1 GB |
| Find directories for a cost center that are older than 2 years |
| Find DataScience directories without an owner specified |
| Find directories requested by Mike but owned by someone else |
Storage Analysis Searches
Query | Description |
|---|---|
| Find old directories (2+ years) that are nearly empty (less than 1 MB) |
| Find directories with large quotas (1 TB+) but small actual usage (less than 10 GB) |
| Find quota-tracked directories in archive locations |
Troubleshooting
Common Issues
Issue | Cause | Solution |
|---|---|---|
Plugin not loading | Import errors or missing dependencies | Check the Diskover log for errors (see Debug Logging below) |
Breadcrumb data not appearing | Breadcrumb file doesn't exist or isn't readable | Verify that the file exists and is readable (see Verifying Breadcrumb File Setup below) |
Birthdate not parsing correctly | Date format doesn't match expected dd/mm/yyyy | Update breadcrumb files to use dd/mm/yyyy or pre-format as yyyy-mm-dd |
Tags not being applied | Breadcrumb data extraction failed | Check logs for parsing errors and verify breadcrumb file format |
Cache permission errors | Diskover service user lacks write access | Create cache directory with appropriate ownership (see below) |
Slow indexing performance | Cache not enabled or cache on slow storage | Enable caching and place cache directory on fast local storage |
Verifying Breadcrumb File Setup
Check if the breadcrumb directory exists:
Linux:
ls -la /path/to/directory/.birthday/
Windows:
Get-ChildItem -Path "D:\path\to\directory\.birthday" -Force
Check if the breadcrumb file is readable and has content:
Linux:
cat /path/to/directory/.birthday/volinfo
wc -l /path/to/directory/.birthday/volinfo
Windows:
Get-Content "D:\path\to\directory\.birthday\volinfo"
Validate file encoding:
Linux:
file /path/to/directory/.birthday/volinfo
The output should indicate UTF-8 or ASCII text.
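On systems without the `file` utility, such as Windows, a similar check can be done in Python by attempting a UTF-8 decode. This is a minimal sketch; the function names `is_utf8_text` and `check_file` are hypothetical.

```python
from pathlib import Path


def is_utf8_text(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (plain ASCII also passes)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_file(path: str) -> bool:
    """Read a breadcrumb file as raw bytes and test its encoding."""
    return is_utf8_text(Path(path).read_bytes())


print(is_utf8_text(b"owner: sarah.chen\n"))  # True
print(is_utf8_text(b"\xff\xfeo\x00w\x00"))   # False (UTF-16 bytes)
```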
Cache Management
If you experience cache-related issues, you can clear the cache to force re-extraction of breadcrumb data.
Linux:
rm -rf /opt/diskover/__breadcrumb_plugin_cache__/
Windows:
Remove-Item -Recurse -Force "C:\Program Files\Diskover\__breadcrumb_plugin_cache__\"
Fix cache directory permissions (Linux):
mkdir -p /opt/diskover/__breadcrumb_plugin_cache__
chown diskover:diskover /opt/diskover/__breadcrumb_plugin_cache__
chmod 755 /opt/diskover/__breadcrumb_plugin_cache__
Debug Logging
To troubleshoot issues, monitor the Diskover log for breadcrumb-related messages:
Linux:
tail -f /var/log/diskover/diskover.log | grep -i breadcrumb
Windows:
Check the Diskover service logs or configured log location for entries containing "breadcrumb".
Test Breadcrumb File Parsing
You can manually test if a breadcrumb file will parse correctly:
# Save as test_breadcrumb.py and run with: python3 test_breadcrumb.py
path = "/path/to/directory/.birthday/volinfo"

# Breadcrumb files are expected to be UTF-8 text
with open(path, "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line:
            # Split on the first colon so values may contain colons
            idx = line.find(":")
            if idx > 0:
                name = line[:idx].strip()
                val = line[idx + 1:].strip()
                print(f"Field: {name} = {val}")
            else:
                print(f"Invalid line (missing key or colon): {line}")
Support
Last Updated: January 2026
Diskover Data, Inc.