Illegal File Name Plugin
License: PRO+ (Professional Edition or higher)
Plugin Type: Post-Index Plugin
Author: Diskover Data, Inc.
Overview
The Illegal File Name plugin helps you identify files and directories with problematic names that could cause issues during migrations, break applications, or exceed path length limits. It scans your indexed data for filenames containing illegal characters or names that are excessively long, then tags them in Diskover for easy discovery and reporting.
Key Capabilities:
Detect filenames with characters that may cause cross-platform compatibility issues
Find files with names exceeding configurable length thresholds
Tag problematic files for easy filtering and reporting in Diskover
Optionally remediate issues through automated file renaming (advanced feature)
Use Cases
Cross-Platform Data Migration (Linux to Windows)
When migrating data from Linux to Windows storage, certain characters that are valid on Linux are prohibited on Windows (such as < > : " | ? *). This plugin identifies these files before migration so you can address naming issues proactively, preventing failed transfers and broken file references.
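As a rough illustration of the kind of check involved, the sketch below looks for Windows-reserved characters in a single file or directory name. It is not the plugin's code: the function name and constant are hypothetical, and the character set covers only the printable reserved characters, not control characters or reserved device names.

```python
# Printable characters that Windows reserves in file and directory names.
# "/" and "\" are path separators on Windows, so they are reserved as well.
WINDOWS_RESERVED_CHARS = set('<>:"/\\|?*')


def find_windows_illegal_chars(filename: str) -> list[str]:
    """Return the reserved characters found in one name component, sorted."""
    return sorted({ch for ch in filename if ch in WINDOWS_RESERVED_CHARS})


if __name__ == "__main__":
    for name in ["report<final>.pdf", "budget|2024.xlsx", "notes.txt"]:
        print(name, find_windows_illegal_chars(name))
```

Pass only the final name component, not a full path, since path separators are themselves in the reserved set.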
Long Path/Filename Compliance
Windows has a traditional MAX_PATH limit of 260 characters for the full file path. Files with long names deep in directory structures may exceed this limit. The plugin flags filenames over a configurable character threshold, helping you identify potential issues before they cause problems.
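The difference between a long filename and a long full path can be sketched as follows. This is an illustrative example only: the helper names are hypothetical, and treating the threshold as inclusive is an assumption rather than documented plugin behavior.

```python
MAX_PATH = 260  # traditional Windows limit; the count includes a terminating null


def path_at_risk(full_path: str, limit: int = MAX_PATH) -> bool:
    """True when the path leaves no room for the terminating null character."""
    return len(full_path) >= limit


def filename_is_long(filename: str, threshold: int) -> bool:
    """True when the name alone reaches the threshold (inclusive: an assumption)."""
    return len(filename) >= threshold


if __name__ == "__main__":
    deep_path = "C:\\" + "\\".join(["projects"] * 30) + "\\report.docx"
    print(len(deep_path), path_at_risk(deep_path))
    print(filename_is_long("report.docx", 32))
```

A short filename can still sit on a path that is too long, so a filename-length threshold is a useful signal but not a full MAX_PATH check.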
Application and Script Compatibility
Legacy applications, backup software, and shell scripts often have difficulty handling special characters in filenames. The plugin can identify files with characters outside a configurable whitelist, ensuring compatibility with sensitive systems.
Installation
Prerequisites
Diskover installation (Professional Edition or higher)
Python 3.9 or higher
Access to the Diskover configuration interface
Installation Steps
The Illegal File Name plugin is included with Diskover Professional Edition and higher. The plugin files are located in the post-index plugins directory:
Linux:
/opt/diskover/plugins_postindex/diskover_illegalfilename/
Windows:
C:\Program Files\Diskover\plugins_postindex\diskover_illegalfilename\
To verify the plugin is installed, check that the following files exist:
diskover_illegalfilename.py
README.md
Configuration
Configuration is managed through the Diskover Admin Panel under Plugins > Post Index > Illegal Filename.
Sample Configuration in Diskover Admin:
Here is the beginning of our sample configuration. The Illegal File Name plugin has many other configuration options, which are covered in detail below.
Core Settings
| Setting | Description | Default |
|---|---|---|
| maxthreads | Number of parallel processing threads. Set to 0 for automatic detection based on CPU cores. | 4 |
| check_file | Enable scanning of files for illegal names | True |
| check_directory | Enable scanning of directories for illegal names | True |
| check_long_names | Enable detection of filenames exceeding the character threshold | True |
| long_name_min_chars | Minimum characters to flag a filename as "long" (the value is written with a v prefix, as in the default) | v32 |
Character Validation Settings
| Setting | Description | Default |
|---|---|---|
| valid_chars | List of allowed non-alphanumeric characters. All ASCII letters and numbers are valid by default. Special characters must be escaped with a backslash. | |
Common Character Escaping:
| Character | Escaped Form |
|---|---|
| Hyphen | \- |
| Period | \. |
| Space | \  |
| Parentheses | \( and \) |
| Brackets | \[ and \] |
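One way to picture how a whitelist of escaped characters could be applied is to join the entries into a negated regular-expression character class. The sketch below does that with the whitelist from this article's Windows migration example. It is an assumption about the mechanism, not the plugin's actual implementation, and the function names are hypothetical.

```python
import re


def build_illegal_pattern(valid_chars: list[str]) -> re.Pattern:
    """Compile a pattern matching any character outside the whitelist.

    Entries are expected to be backslash-escaped already, which is why
    they can be dropped straight into a character class.
    """
    return re.compile("[^a-zA-Z0-9" + "".join(valid_chars) + "]")


# Whitelist taken from this article's Windows migration example.
VALID_CHARS = [r"\-", "_", r"\.", r"\(", r"\)", r"\[", r"\]", r"\ "]
PATTERN = build_illegal_pattern(VALID_CHARS)


def illegal_chars_in(name: str) -> list[str]:
    """Return every character in the name that is not whitelisted."""
    return PATTERN.findall(name)


if __name__ == "__main__":
    print(illegal_chars_in("report<final>.pdf"))
    print(illegal_chars_in("report_final (v2) [draft].pdf"))
```

Under this reading, the escaping matters because characters such as the hyphen and brackets have special meaning inside a character class.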
Filtering Settings
| Setting | Description | Default |
|---|---|---|
| extensions | Limit scanning to specific file extensions. Leave empty | |
| exclude_dirs | Absolute directory paths to skip during scanning. Example: | |
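The effect of these two filters can be sketched as a small predicate. This is a hypothetical illustration: it assumes that an empty extensions list means no extension filtering, that extensions are compared without the leading dot and case-insensitively, and that an excluded directory also excludes everything beneath it. None of these details are confirmed here.

```python
import posixpath


def should_scan(path: str, extensions: list[str], exclude_dirs: list[str]) -> bool:
    """Decide whether a path passes the extension and directory filters."""
    parent = posixpath.dirname(path)
    for excluded in exclude_dirs:
        root = excluded.rstrip("/")
        # Match the excluded directory itself or any directory below it.
        if parent == root or parent.startswith(root + "/"):
            return False
    if not extensions:
        return True  # assumption: an empty list disables extension filtering
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return ext in {e.lstrip(".").lower() for e in extensions}


if __name__ == "__main__":
    excludes = ["/mnt/archive/legacy"]
    print(should_scan("/data/projects/a.pdf", [], excludes))
    print(should_scan("/mnt/archive/legacy/old/a.pdf", [], excludes))
```

Comparing on a trailing slash avoids accidentally excluding a sibling such as /mnt/archive/legacy2.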
Tagging Settings
| Setting | Description | Default |
|---|---|---|
| illegal_tag | Tag applied to files with illegal characters | illegalname |
| long_tag | Tag applied to files with long names | longname |
Remediation Settings (Advanced)
These settings control how filenames are sanitized when using the remediation feature:
| Setting | Description | Default |
|---|---|---|
| | Apply NFKD unicode normalization during sanitization | True |
| | Convert filenames to ASCII (requires | False |
| | Maximum filename length; longer names are truncated | 255 |
| | Replace spaces with underscores during remediation | True |
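To make these steps concrete, here is a minimal sketch of a sanitization pass that mirrors the defaults above: NFKD normalization, underscores for spaces, and truncation at 255 characters. It is not the plugin's code. The whitelist, the choice to drop combining marks after normalization, and the choice to keep the extension when truncating are all assumptions made for illustration.

```python
import os
import re
import unicodedata

# Hypothetical whitelist: ASCII letters, digits, hyphen, underscore, period.
ILLEGAL = re.compile(r"[^A-Za-z0-9\-_\.]")


def sanitize(name: str, max_len: int = 255) -> str:
    """Return a sanitized copy of a single file or directory name."""
    # NFKD splits an accented letter into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Assumption of this sketch: drop the combining marks, keep the base letters.
    name = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Spaces and every other non-whitelisted character become underscores.
    name = ILLEGAL.sub("_", name)
    if len(name) > max_len:
        # Assumption of this sketch: shorten the stem and keep the extension.
        stem, ext = os.path.splitext(name)
        name = (stem[: max(0, max_len - len(ext))] + ext)[:max_len]
    return name


if __name__ == "__main__":
    print(sanitize("file<test>.txt"))
    print(sanitize("caf\u00e9 menu.txt"))
```

Two different source names can sanitize to the same result, so any real rename step also needs a collision check.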
Example Configuration: Windows Migration
For preparing Linux data for Windows migration:
valid_chars: ['\-', '_', '\.', '\(', '\)', '\[', '\]', '\ ']
check_file: True
check_directory: True
check_long_names: True
long_name_min_chars: v200
extensions: []
exclude_dirs: ['/mnt/archive/legacy']
illegal_tag: win_migration_issue
long_tag: win_path_too_long
Execution
Manual Execution
You can run the plugin manually from the command line to scan a specific index.
Linux:
python3 /opt/diskover/plugins_postindex/diskover_illegalfilename/diskover_illegalfilename.py -v <indexname>
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_illegalfilename\diskover_illegalfilename.py" -v <indexname>
Replace <indexname> with your Diskover index name (e.g., diskover-nas01-2024.01.15).
Command-Line Options
| Option | Description |
|---|---|
| -v | Enable detailed logging output |
| -c | Use a named configuration from Diskover Admin |
| -l | Automatically find the most recent index for a given top path |
| -f | Enable automated file renaming (see Advanced: Remediation) |
| --fixnamesdryrun | Preview renaming changes without modifying files |
| | Display plugin version |
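If you launch the plugin from your own automation, a small wrapper can assemble the command line. The sketch below uses only flags that appear in this article (-v, -c, -f, --fixnamesdryrun) and the Linux script path shown earlier. The function name is hypothetical, and refusing to combine the dry-run and rename flags is a safety choice of this sketch, not documented plugin behavior.

```python
import subprocess
from typing import Optional

SCRIPT = "/opt/diskover/plugins_postindex/diskover_illegalfilename/diskover_illegalfilename.py"


def build_command(index_name: str, config_name: Optional[str] = None,
                  dry_run: bool = False, fix: bool = False,
                  verbose: bool = True) -> list[str]:
    """Assemble the argument list for one plugin run."""
    if dry_run and fix:
        raise ValueError("choose either dry_run or fix, not both")
    cmd = ["python3", SCRIPT]
    if dry_run:
        cmd.append("--fixnamesdryrun")
    elif fix:
        cmd.append("-f")
    if config_name:
        cmd += ["-c", config_name]
    if verbose:
        cmd.append("-v")
    cmd.append(index_name)
    return cmd


if __name__ == "__main__":
    command = build_command("diskover-nas01-2024.01.15", dry_run=True)
    print(" ".join(command))
    # Uncomment to actually run the plugin on a host where it is installed:
    # subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with index names or configuration names.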
Automated Execution
Using Custom Tasks
You can schedule the Illegal File Name plugin to run automatically using Diskover's Custom Tasks feature.
Sample Custom Task Configuration:
The Run Command and args needed for the Custom Task are shown here. Note that you cannot use the {indexname} variable in this case, because a Custom Task does not create an index. Use the -l (toppath) CLI option instead and pass in your top path.
Using Post-Crawl Commands
To run the plugin automatically after each index completes, configure it as a Post-Crawl Command in your Index Task.
Linux Example:
| Field | Value |
|---|---|
| Post-Crawl Command | |
| Post-Crawl Command Args | |
Windows Example:
| Field | Value |
|---|---|
| Post-Crawl Command | |
| Post-Crawl Command Args | |
Sample Post-Crawl Command configuration for running Illegal File Name with an Index Task:
On your system, replace ConfigurationName above with a named configuration that you have created in Diskover Admin → Plugins → Post-Index → Illegal Filename. If you are not using a custom configuration and are just using Default, then the -c flag and the ConfigurationName are not required.
Reviewing the Output
Console Output
When running with verbose mode (-v), the plugin provides detailed progress information:
Finding any illegal file names in index diskover-nas01-2024.01.15...
illegal file found: /data/projects/report<final>.pdf
illegal file found: /data/projects/budget|2024.xlsx
Finding any long file names in index diskover-nas01-2024.01.15...
long file name found: /data/documents/This_is_a_very_long_filename_that_exceeds_the_threshold.docx
Finished tagging 47 docs in 00:00:12
Illegal file names found: 35
Long file names found: 12
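If you capture this output in a script, the summary counts can be pulled out with a few regular expressions. The sketch below is based only on the sample lines shown above, so the exact wording it matches is an assumption about the real output, and the function name is hypothetical.

```python
import re

# Summary lines copied from the sample console output in this article.
SAMPLE = """\
Finished tagging 47 docs in 00:00:12
Illegal file names found: 35
Long file names found: 12
"""

SUMMARY_PATTERNS = {
    "tagged": r"Finished tagging (\d+) docs",
    "illegal": r"Illegal file names found: (\d+)",
    "long": r"Long file names found: (\d+)",
}


def parse_summary(output: str) -> dict[str, int]:
    """Extract whichever summary counts are present in the captured output."""
    summary = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = re.search(pattern, output)
        if match:
            summary[key] = int(match.group(1))
    return summary


if __name__ == "__main__":
    print(parse_summary(SAMPLE))
```

Keys are simply omitted when a line is missing, for example when long-name detection is disabled.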
Log Files
Plugin execution logs are written to the standard Diskover log location. Review these logs for detailed information about scan results and any errors encountered.
Success Indicators
A successful scan will show:
Count of illegal filenames found
Count of long filenames found (if enabled)
Total documents tagged
Execution time
Searching in Diskover
After the plugin runs, tagged files are easily discoverable in Diskover's search interface.
Sample Diskover query output for tags:illegalname, showing several files from a Windows scan whose names contain spaces and underscores:
Sample Queries
Find all files with illegal characters:
tags:illegalname
Find all files with long names:
tags:longname
Find files with either issue:
tags:illegalname OR tags:longname
Find illegal files of a specific type:
tags:illegalname AND extension:pdf
Find illegal files in a specific directory:
tags:illegalname AND parent_path:*projects*
Find names of at least a specific length, here 100 characters (direct regex query):
name:/.{100,}/
Find illegal files modified in the last 30 days:
tags:illegalname AND mtime:[now-30d TO now]
Tip: If you configured custom tag names in the plugin settings, use those tag names in your searches instead of the defaults.
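When the same searches are generated by a script, for example with custom tag names, the query strings can be composed programmatically. The helper below is a hypothetical sketch that reproduces the query shapes shown above. Wrapping multiple tags in parentheses when other conditions are present assumes that Diskover's search accepts standard grouping syntax.

```python
from typing import Optional


def build_tag_query(tags: list[str], extension: Optional[str] = None,
                    modified_within_days: Optional[int] = None) -> str:
    """Compose a search string from tag names and optional extra conditions."""
    if not tags:
        raise ValueError("at least one tag is required")
    tag_clause = " OR ".join(f"tags:{tag}" for tag in tags)
    extra = []
    if extension:
        extra.append(f"extension:{extension}")
    if modified_within_days:
        extra.append(f"mtime:[now-{modified_within_days}d TO now]")
    # Group the OR clause only when it is combined with AND conditions.
    if extra and len(tags) > 1:
        tag_clause = f"({tag_clause})"
    return " AND ".join([tag_clause] + extra)


if __name__ == "__main__":
    print(build_tag_query(["illegalname"], extension="pdf"))
    print(build_tag_query(["win_migration_issue", "win_path_too_long"],
                          modified_within_days=30))
```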
Advanced: Remediation (File Renaming)
⚠️ Warning: The remediation feature directly modifies filenames on your filesystem. This is a destructive operation that cannot be undone automatically. Always run a dry-run first and ensure you have backups before proceeding.
The plugin can automatically rename files to remove illegal characters and truncate long names. This is useful when you need to bulk-fix naming issues, but should be used with caution.
Workflow for Safe Remediation
Step 1: Run Detection Only
First, run the plugin in detection mode to identify all problematic files:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_illegalfilename/diskover_illegalfilename.py -v <indexname>
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_illegalfilename\diskover_illegalfilename.py" -v <indexname>
Step 2: Review Tagged Files
Search Diskover for tagged files and review them to ensure you understand what will be changed.
Step 3: Preview Changes (Dry-Run)
Run with --fixnamesdryrun to see what changes would be made without actually modifying files:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_illegalfilename/diskover_illegalfilename.py --fixnamesdryrun -v <indexname>
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_illegalfilename\diskover_illegalfilename.py" --fixnamesdryrun -v <indexname>
The output will show proposed renames:
Renaming /data/file<test>.txt => /data/file_test_.txt (DRY-RUN)
Step 4: Execute Remediation
Only after reviewing the dry-run output, proceed with actual renaming:
Linux:
python3 /opt/diskover/plugins_postindex/diskover_illegalfilename/diskover_illegalfilename.py -f -v <indexname>
Windows:
python "C:\Program Files\Diskover\plugins_postindex\diskover_illegalfilename\diskover_illegalfilename.py" -f -v <indexname>
Remediation Output
After remediation, the plugin reports:
File names fixed: Successfully renamed files
File names fixed (errors): Files that could not be renamed (permissions, file not found, etc.)
File names fixed (skipped): Files where sanitization produced the same name
Troubleshooting
Plugin Not Finding Expected Files
Possible Causes:
check_file or check_directory is set to False
The extensions filter is limiting which files are scanned
Files are in an excluded directory path
The character you expect to be flagged is actually in the valid_chars list
Resolution: Review your configuration settings and adjust the filters as needed.
Remediation Not Renaming Files
Possible Causes:
The Diskover service user doesn't have write permissions to the filesystem
The sanitized filename is the same as the original
The target filename already exists (collision)
Resolution: Check filesystem permissions and review the dry-run output to understand why specific files are being skipped.
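The causes listed above can be checked for a single file before you attempt a rename. The following sketch is a hypothetical diagnostic, not part of the plugin. Run it as the same user that runs Diskover so the permission check is meaningful.

```python
import os
import tempfile


def diagnose_rename(src_path: str, new_name: str) -> str:
    """Report why renaming src_path to new_name might be skipped or fail."""
    parent, old_name = os.path.split(src_path)
    if not os.path.lexists(src_path):
        return "source not found"
    if new_name == old_name:
        return "sanitized name is unchanged"
    if os.path.lexists(os.path.join(parent, new_name)):
        return "target already exists"
    # Renaming an entry requires write access to its parent directory.
    if not os.access(parent, os.W_OK):
        return "no write permission on parent directory"
    return "ok"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "my report.txt")
        open(source, "w").close()
        print(diagnose_rename(source, "my_report.txt"))
```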
Performance Issues
For large indices, consider:
Increasing maxthreads for faster processing
Using the extensions filter to limit scope
Using exclude_dirs to skip unnecessary directories
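To show what a thread-count setting with a 0 for automatic detection convention typically means, here is a minimal sketch using Python's standard thread pool. It illustrates the general technique only. The function names are hypothetical, and the plugin's own threading code may differ.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_thread_count(maxthreads: int) -> int:
    """Use the configured value, or fall back to the CPU count when it is 0."""
    if maxthreads > 0:
        return maxthreads
    return os.cpu_count() or 1  # cpu_count() can return None on some systems


def scan_in_parallel(names, check, maxthreads: int = 4) -> list:
    """Apply a check function to every name using a pool of worker threads."""
    workers = resolve_thread_count(maxthreads)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input names.
        return list(pool.map(check, names))


if __name__ == "__main__":
    sample = ["report<final>.pdf", "notes.txt"]
    print(scan_in_parallel(sample, lambda n: "<" in n, maxthreads=0))
```

More threads help only while the search backend and storage can keep up, so raise the value gradually and compare run times.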
Enabling Debug Output
Run with the -v (verbose) flag to see detailed logging:
python3 /opt/diskover/plugins_postindex/diskover_illegalfilename/diskover_illegalfilename.py -v <indexname>
Support
Last Updated: April 2026