Auto Tag
License: PRO+ (Professional Edition or higher)
Plugin Type: Post-Index Plugin
Author: Diskover Data, Inc.
Overview
The Auto Tag plugin automatically applies tags to files and directories in your Diskover index based on Elasticsearch queries. Instead of manually tagging thousands of files, you define rules that identify files matching specific criteria—age, size, location, file type—and Auto Tag does the rest.
This plugin runs after a Diskover indexing scan completes, making it ideal for implementing consistent tagging policies across your entire storage environment.
Why Use Auto Tag?
Data Lifecycle Management — Automatically identify files that haven't been accessed in months or years, flagging them for archive or deletion review
Cleanup Automation — Tag temporary files, cache directories, and backup files that are candidates for removal
Workflow Integration — Prepare files for downstream processing by the Auto Clean plugin through a controlled, two-phase workflow
Policy Enforcement — Apply organizational tagging standards consistently without manual intervention
Installation
Auto Tag is included with Diskover Professional Edition and higher. The plugin files should already be present in your installation.
DNF Installation (Linux RPM)
On Linux systems using DNF package management, this plugin can be installed via RPM:
sudo dnf install diskover-plugin-postindex-autotag
Note: Ensure your system is configured with the Diskover RPM repository before running the install command.
Verify Installation
Confirm the plugin files exist in your Diskover installation:
Linux:
ls -la /opt/diskover/plugins_postindex/diskover_autotag/
Windows:
dir "C:\Program Files\Diskover\plugins_postindex\diskover_autotag\"
You should see:
diskover_autotag.py: The main plugin script
README.md: Plugin documentation
Prerequisites
| Component | Requirement |
|---|---|
| Python | 3.9 or higher |
| Diskover | Professional Edition or higher with plugin support |
| Elasticsearch | 7.x or 8.x (as supported by your Diskover version) |
No additional Python dependencies are required—Auto Tag uses only libraries included with the core Diskover installation.
Configuration
Configuration is managed through the Diskover Admin Panel. Navigate to Plugins → Post Index → AutoTag to access the settings.
Sample Configuration in Diskover Admin:
Here is the beginning of our sample configuration. There are many other configuration options for the AutoTag plugin, covered in detail below.
Configuration Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| | Integer | 4 | Number of parallel processing threads. Increase for larger indices. |
| dirs | List | See below | Tagging rules applied to directories |
| files | List | See below | Tagging rules applied to files |
Tagging Rules Structure
Each rule in the dirs or files list contains:
| Field | Type | Description |
|---|---|---|
| query | String | Elasticsearch query string to match documents (same syntax as the Diskover search bar) |
| tags | List | Tags to apply to matching documents |
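As a rough illustration of the rule shape, the sketch below models a rule with a small stand-in dataclass. The real TagParams class ships with Diskover; the TagParams definition and the validate_rule helper here are hypothetical and exist only to make the example self-contained.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TagParams:
    """Hypothetical stand-in for Diskover's rule object (illustration only)."""
    query: str                                      # Elasticsearch query string
    tags: List[str] = field(default_factory=list)   # tags applied to matches


def validate_rule(rule: TagParams) -> bool:
    """Return True when the rule has a non-empty query and only non-empty tags."""
    has_query = bool(rule.query.strip())
    has_tags = bool(rule.tags) and all(tag.strip() for tag in rule.tags)
    return has_query and has_tags


rule = TagParams(query='mtime:{* TO now/m-6M/d}', tags=['autotag', 'archive'])
print(validate_rule(rule))
```

A check like this can catch an empty query or tag list before a rule is entered in Diskover Admin.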
Default Configuration
The plugin ships with example rules that demonstrate common tagging scenarios:
Directory Rules:
dirs=[
    TagParams(
        query='mtime:{* TO now/m-6M/d}',
        tags=['autotag', 'archive']
    ),
    TagParams(
        query='name:(*tmp* OR *TMP* OR *temp* OR *TEMP* OR *Temp* OR *cache* OR *CACHE* OR *Cache*) AND mtime:{* TO now/m-2M/d}',
        tags=['autotag', 'delete']
    )
]
File Rules:
files=[
    TagParams(
        query='extension:(tmp* OR temp* OR cache* OR bak* OR old* OR delete*) AND mtime:{* TO now/m-6M/d} AND atime:{* TO now/m-6M/d}',
        tags=['autotag', 'cleanlist', 'delete']
    ),
    TagParams(
        query='mtime:{* TO now/m-6M/d}',
        tags=['autotag', 'archive']
    )
]
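These defaults tag directories and files not modified in the last 6 months with archive. They also flag for deletion review temporary and cache directories not modified in the last 2 months, and files with temporary or backup extensions that have been neither modified nor accessed in the last 6 months.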
Here we can see different rule sets for both Directories and Files within a single AutoTag Rule Set!
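To see which timestamp a range such as mtime:{* TO now/m-6M/d} resolves to, the following sketch approximates the date math in plain Python. It assumes Elasticsearch's documented behavior for an exclusive upper bound (round down to the minute, subtract calendar months, round down to the day) and ignores time zones; subtract_months and archive_cutoff are hypothetical helper names, not part of Diskover.

```python
import calendar
from datetime import datetime


def subtract_months(dt: datetime, months: int) -> datetime:
    """Subtract calendar months, clamping the day to the target month's length."""
    index = dt.year * 12 + (dt.month - 1) - months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return dt.replace(year=year, month=month, day=min(dt.day, last_day))


def archive_cutoff(now: datetime, months: int = 6) -> datetime:
    """Approximate now/m-<months>M/d: floor to minute, go back, floor to day."""
    floored = now.replace(second=0, microsecond=0)
    shifted = subtract_months(floored, months)
    return shifted.replace(hour=0, minute=0, second=0, microsecond=0)


cutoff = archive_cutoff(datetime(2025, 1, 14, 10, 37, 52))
print(cutoff.isoformat())  # 2024-07-14T00:00:00
```

Under these assumptions, a scan evaluated on 14 January 2025 would match items whose mtime falls before 14 July 2024.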
Use Cases
Use Case 1: Data Lifecycle Management
Tag old files for archival based on modification time thresholds.
Scenario: Your organization wants to identify files that haven't been modified in 6 months for archive review, and files untouched for a year for potential deletion.
Configuration:
files=[
    TagParams(
        query='mtime:{* TO now/m-6M/d}',
        tags=['lifecycle', 'archive_candidate']
    ),
    TagParams(
        query='mtime:{* TO now/m-1y/d}',
        tags=['lifecycle', 'delete_candidate']
    )
]
Workflow:
Auto Tag runs after each index scan
Storage administrators review tagged files in the Diskover web interface
Approved files are processed for archival or deletion
Use Case 2: Cleanup Candidate Identification
Identify temporary and cache files that are candidates for cleanup.
Scenario: Your storage contains accumulated temporary files, cache directories, and old backups consuming valuable space.
Configuration:
files=[
    TagParams(
        query='extension:(tmp OR temp OR bak OR old OR ~) AND atime:{* TO now/m-30d/d}',
        tags=['cleanup', 'temp_file']
    ),
    TagParams(
        query='name:*.bak AND size:[1048576 TO *]',
        tags=['cleanup', 'large_backup']
    )
]
dirs=[
    TagParams(
        query='name:(*cache* OR *tmp* OR *temp* OR .git) AND mtime:{* TO now/m-60d/d}',
        tags=['cleanup', 'cache_directory']
    )
]
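The size thresholds in these queries appear to be expressed in bytes (1048576 is 1 MiB). A small helper, sketched here under that assumption, avoids hand-computing the numbers; min_size_clause is a hypothetical name used only for illustration.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def min_size_clause(mebibytes: int) -> str:
    """Build an inclusive lower-bound size clause, with the size in bytes."""
    return f'size:[{mebibytes * MIB} TO *]'


print(min_size_clause(1))    # size:[1048576 TO *]
print(min_size_clause(100))  # size:[104857600 TO *]
```

The resulting clause can be combined with other criteria using AND, as in the name:*.bak rule above.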
Use Case 3: Workflow Integration with Auto Clean
Implement a controlled two-phase cleanup workflow where Auto Tag identifies candidates and Auto Clean acts on approved items.
Phase 1 — Auto Tag Configuration:
files=[
    TagParams(
        query='extension:(log OR LOG) AND size:[104857600 TO *] AND mtime:{* TO now/m-7d/d}',
        tags=['autoclean_candidate', 'large_old_log']
    )
]
Phase 2 — Auto Clean Configuration (processes tagged files):
files=[
    Action(
        action='delete',
        query='tags:(autoclean_candidate AND approved)',
        tags=['cleaned']
    )
]
Workflow:
Auto Tag runs and tags large old log files as autoclean_candidate
Administrator reviews tagged files in Diskover web UI
Administrator adds the approved tag to files that should be deleted
Auto Clean runs and deletes files with both tags
This two-phase approach ensures human review before any destructive actions occur.
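The gating logic behind tags:(autoclean_candidate AND approved) can be sketched in a few lines of Python. The document paths and the ready_for_cleanup helper are hypothetical; the point is only that an item is selected when both tags are present, never on one alone.

```python
def ready_for_cleanup(tags, required=('autoclean_candidate', 'approved')):
    """Return True only when every required tag is present."""
    return set(required).issubset(tags)


# Hypothetical documents: file path -> current tags
docs = {
    '/var/log/app/big.log': ['autoclean_candidate', 'large_old_log'],
    '/var/log/app/old.log': ['autoclean_candidate', 'large_old_log', 'approved'],
    '/var/log/app/keep.log': ['approved'],
}

selected = [path for path, tags in docs.items() if ready_for_cleanup(tags)]
print(selected)  # ['/var/log/app/old.log']
```

A file that Auto Tag flagged but nobody approved stays untouched, as does a file that was approved without ever being flagged.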
Use Case 4: Project and Path-Based Classification
Classify files based on their location in the directory hierarchy.
Configuration:
files=[
    TagParams(
        query='parent_path:*\\/projects\\/active\\/*',
        tags=['project', 'active']
    ),
    TagParams(
        query='parent_path:*\\/projects\\/archive\\/*',
        tags=['project', 'archived']
    ),
    TagParams(
        query='parent_path:*\\/departments\\/finance\\/*',
        tags=['department', 'finance', 'sensitive']
    )
]
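Note: Forward slashes in path queries must be escaped with a backslash, because / is a reserved character in Elasticsearch query string syntax. Use \\/ to match a forward slash in paths; the backslash itself is written doubled in the rule text, as in the examples above.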
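The escaping can be generated rather than typed by hand. This sketch assumes the rule text follows Python string-literal rules, where '\\/' in source is a single backslash followed by a slash; escape_slashes and parent_path_query are hypothetical helpers.

```python
def escape_slashes(path: str) -> str:
    """Prefix every forward slash with a backslash for query string syntax."""
    return path.replace('/', '\\/')


def parent_path_query(prefix: str) -> str:
    """Build a wildcard parent_path clause for everything under the prefix."""
    return f'parent_path:*{escape_slashes(prefix)}\\/*'


query = parent_path_query('/projects/active')
print(query)  # parent_path:*\/projects\/active\/*
```

The printed value shows single backslashes because print displays the string itself rather than its source-code form.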
Execution
Auto Tag can be run manually from the command line or scheduled to run automatically after index scans.
Command-Line Options
| Option | Description |
|---|---|
| | Show help message and exit |
| -c | Use a named configuration defined in Diskover Admin |
| -a, --addtags | Add new tags to existing tags instead of replacing them |
| -l | Auto-find most recent index based on top path |
| -v | Enable verbose logging |
| -V | Enable very verbose (debug) logging |
| | Print version number and exit |
Manual Execution
Linux:
# Tag a specific index
python3 /opt/diskover/plugins_postindex/diskover_autotag/diskover_autotag.py diskover-myvolume-2025.01.14

# Auto-find latest index for a path
python3 /opt/diskover/plugins_postindex/diskover_autotag/diskover_autotag.py -l /mnt/data

# Use a specific configuration with verbose logging
python3 /opt/diskover/plugins_postindex/diskover_autotag/diskover_autotag.py -c archive_policy -v diskover-myvolume-2025.01.14

# Add tags to existing tags (instead of replacing)
python3 /opt/diskover/plugins_postindex/diskover_autotag/diskover_autotag.py -a diskover-myvolume-2025.01.14
Windows:
# Tag a specific index
python "C:\Program Files\Diskover\plugins_postindex\diskover_autotag\diskover_autotag.py" diskover-myvolume-2025.01.14

# Auto-find latest index for a path
python "C:\Program Files\Diskover\plugins_postindex\diskover_autotag\diskover_autotag.py" -l E:\data

# Use a specific configuration with verbose logging
python "C:\Program Files\Diskover\plugins_postindex\diskover_autotag\diskover_autotag.py" -c archive_policy -v diskover-myvolume-2025.01.14

# Add tags to existing tags (instead of replacing)
python "C:\Program Files\Diskover\plugins_postindex\diskover_autotag\diskover_autotag.py" -a diskover-myvolume-2025.01.14
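If you launch the plugin from your own tooling, the command can be assembled programmatically. The sketch below builds an argument list using only the flags shown in the examples (-c, -v, -a, -l) and the Linux path; build_autotag_command is a hypothetical helper, and combining flags beyond what the examples show is an assumption.

```python
from typing import List, Optional

# Path from the Linux examples; adjust for your installation.
PLUGIN = '/opt/diskover/plugins_postindex/diskover_autotag/diskover_autotag.py'


def build_autotag_command(index: Optional[str] = None,
                          top_path: Optional[str] = None,
                          config: Optional[str] = None,
                          add_tags: bool = False,
                          verbose: bool = False) -> List[str]:
    """Assemble an argument list from the flags shown in the examples."""
    if (index is None) == (top_path is None):
        raise ValueError('pass exactly one of index or top_path')
    cmd = ['python3', PLUGIN]
    if config:
        cmd += ['-c', config]
    if verbose:
        cmd.append('-v')
    if add_tags:
        cmd.append('-a')
    # Either let the plugin find the latest index (-l) or name it explicitly.
    cmd += ['-l', top_path] if top_path else [index]
    return cmd


cmd = build_autotag_command(index='diskover-myvolume-2025.01.14',
                            config='archive_policy', verbose=True)
print(' '.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list instead of a single string avoids shell quoting problems, which matters on Windows where the install path contains a space.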
Understanding Add vs Replace Mode
The -a/--addtags flag controls how tags are applied:
Replace Mode (default):
Existing tags on matched documents are completely replaced with the new tags
Best when you want clean, definitive tag assignments
Additive Mode (-a flag):
New tags are merged with existing tags
Duplicate tags are automatically skipped
Best when you want to accumulate tags from multiple rules or preserve manual tags
Example:
# File currently has tags: ['project_x', 'reviewed']
# Replace mode result: tags = ['autotag', 'archive']
# Additive mode result: tags = ['project_x', 'reviewed', 'autotag', 'archive']
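The two modes can be modeled with a short function. This is a sketch of the behavior described above, not the plugin's actual code; apply_tags is a hypothetical name.

```python
def apply_tags(existing, new, add_mode=False):
    """Return the resulting tag list for replace (default) or additive mode."""
    if not add_mode:
        return list(new)          # replace: existing tags are discarded
    merged = list(existing)
    for tag in new:
        if tag not in merged:     # additive: skip duplicates
            merged.append(tag)
    return merged


existing = ['project_x', 'reviewed']
new = ['autotag', 'archive']
print(apply_tags(existing, new))                 # ['autotag', 'archive']
print(apply_tags(existing, new, add_mode=True))  # ['project_x', 'reviewed', 'autotag', 'archive']
```

Note that in replace mode manually applied tags such as reviewed are lost, so additive mode is the safer choice when people also tag by hand.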
Automated Execution
To run Auto Tag automatically after each index scan, configure it as a Post-Crawl Command in your Index Task.
Linux Example:
| Field | Value |
|---|---|
| Post-Crawl Command | |
| Post-Crawl Command Args | |
Windows Example:
| Field | Value |
|---|---|
| Post-Crawl Command | |
| Post-Crawl Command Args | |
Available Index Task Tokens:
{indexname}— The name of the index that was just created
On your system, be sure to replace the ConfigurationName above with a named configuration that you've created at Diskover Admin → Plugins → Post-Index → AutoTag. If you are not using a custom configuration and are just using Default, then the -c flag and the ConfigurationName are not required.
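Token substitution can be pictured as a simple string replacement. How Diskover expands tokens internally is not shown here, so treat this as an illustration only; expand_tokens and the argument template are hypothetical.

```python
def expand_tokens(args_template: str, indexname: str) -> str:
    """Replace the {indexname} token with the name of the new index."""
    return args_template.replace('{indexname}', indexname)


# Hypothetical argument template; ConfigurationName is a placeholder.
template = '-c ConfigurationName {indexname}'
print(expand_tokens(template, 'diskover-myvolume-2025.01.14'))
# -c ConfigurationName diskover-myvolume-2025.01.14
```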
Reviewing the Output
During Execution
With verbose logging enabled (-v or -V), Auto Tag displays progress information:
Finding and updating tags in index diskover-myvolume-2025.01.14...
es query: mtime:{* TO now/m-6M/d} AND type:directory
found 1,247 matching docs
thread 0 started tagging 500 docs
thread 1 started tagging 500 docs
thread 0 finished tagging 500 docs in 2.34s
thread 1 finished tagging 500 docs in 2.51s
thread 2 started tagging 247 docs
thread 2 finished tagging 247 docs in 1.12s
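When runs are scheduled, it can help to confirm that every matched document was tagged. The sketch below parses output in the format of the sample above and compares the totals; the exact log wording in your version may differ, and summarize is a hypothetical helper.

```python
import re

SAMPLE = """\
found 1,247 matching docs
thread 0 started tagging 500 docs
thread 1 started tagging 500 docs
thread 0 finished tagging 500 docs in 2.34s
thread 1 finished tagging 500 docs in 2.51s
thread 2 started tagging 247 docs
thread 2 finished tagging 247 docs in 1.12s
"""


def summarize(log_text: str):
    """Return (docs found, docs reported as finished) from verbose output."""
    found = sum(int(n.replace(',', ''))
                for n in re.findall(r'found ([\d,]+) matching docs', log_text))
    finished = sum(int(n) for n in
                   re.findall(r'thread \d+ finished tagging (\d+) docs', log_text))
    return found, finished


found, finished = summarize(SAMPLE)
print(found, finished, found == finished)  # 1247 1247 True
```

A mismatch between the two numbers would suggest that a thread did not finish and the log deserves a closer look.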
Verifying Results
After Auto Tag completes, verify tags were applied correctly:
Open the Diskover web interface
Search for tagged files using a query like tags:autotag
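The same check can be made against Elasticsearch directly with its _count API. This sketch only builds the request path and body; sending it requires an HTTP client and your cluster's address and credentials, which are not shown. The type:directory filter mirrors the sample output earlier, and tag_count_request is a hypothetical helper.

```python
import json
from typing import Optional


def tag_count_request(index: str, tag: str, doc_type: Optional[str] = None):
    """Return the (path, body) pair for an Elasticsearch _count request."""
    query = f'tags:{tag}'
    if doc_type:
        query += f' AND type:{doc_type}'
    body = {'query': {'query_string': {'query': query}}}
    return f'/{index}/_count', body


path, body = tag_count_request('diskover-myvolume-2025.01.14', 'autotag', 'directory')
print(path)
print(json.dumps(body))
```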