Download
License: PRO+ (Professional Edition or higher)
Plugin Type: File Action
Author: Diskover Data, Inc.
Overview
The Download file action provides a secure way to download files from your indexed storage locations directly through the Diskover web interface. Rather than requiring direct access to storage systems, users can browse, search, and download files they need—all while your organization maintains control over what types of files can be downloaded.
Key Features
Content-Based File Type Detection: Uses libmagic-based MIME detection to identify files by their actual content rather than their file extension, preventing disguised files from bypassing security policies
Configurable File Type Restrictions: Administrators can block specific file types (images, executables, archives) based on organizational policies
PDF Security Scanning: Optional detection of PDFs containing embedded images, useful for preventing unauthorized image extraction
Large File Support: Handles files of any size through efficient chunked processing
Seamless Browser Integration: Downloaded files go directly to your browser's configured download location
Use Cases
Document Distribution
Enable secure download of approved document types (PDF, Word, Excel) while blocking image files to prevent unauthorized extraction of visual assets or sensitive photographs.
Enterprise Data Access
Provide controlled file access for remote workers who need to download files from centralized storage, with restrictions based on organizational data policies.
Compliance and Data Loss Prevention
Implement file type filtering to prevent exfiltration of specific file types. Block executable downloads to prevent malware distribution. Scan PDFs for embedded images to detect potential data hiding.
Media Asset Protection
Media organizations can block direct download of image files through Diskover while still allowing users to browse metadata and download approved file types like documents and spreadsheets.
Understanding MIME Types
The Download file action uses MIME (Multipurpose Internet Mail Extensions) types to identify and filter files. Understanding MIME types helps you make informed decisions about which file types to allow or block.
What Are MIME Types?
MIME types are standardized labels that identify file formats. Unlike file extensions (which can easily be changed), MIME types are determined by examining the actual content of a file. This makes them more reliable for security purposes.
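| MIME Type | Description | Common Extensions |
|---|---|---|
| application/pdf | PDF documents | .pdf |
| image/jpeg | JPEG images | .jpg, .jpeg |
| image/png | PNG images | .png |
| application/zip | ZIP archives | .zip |
| application/vnd.openxmlformats-officedocument.wordprocessingml.document | Word documents | .docx |
| application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Excel spreadsheets | .xlsx |
| application/x-dosexec | Windows executables | .exe, .dll |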
Why Content-Based Detection Matters
The Download file action uses a library called libmagic to examine the actual bytes inside a file to determine its true type. This prevents security bypass attempts where someone might rename a file (for example, renaming virus.exe to document.pdf) to trick extension-based filters.
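The difference between extension-based and content-based detection can be illustrated with a minimal sketch. It uses only the Python standard library and a handful of hard-coded file signatures as a simplified stand-in for libmagic, which recognizes far more formats and inspects files more deeply. The names `SIGNATURES`, `sniff_mime`, and `guess_from_name` are hypothetical and are not part of the Download file action.

```python
import mimetypes

# A few well-known leading byte signatures ("magic numbers").
# Simplified stand-in for libmagic; this list is illustrative, not complete.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),
    (b"MZ", "application/x-dosexec"),
]


def sniff_mime(data: bytes) -> str:
    """Return a MIME type based on the leading bytes of the content."""
    for magic_bytes, mime in SIGNATURES:
        if data.startswith(magic_bytes):
            return mime
    return "application/octet-stream"


def guess_from_name(filename: str) -> str:
    """Return a MIME type based only on the file extension."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime or "application/octet-stream"


# An executable header hiding behind a .pdf name.
disguised_name = "document.pdf"
disguised_content = b"MZ\x90\x00" + b"\x00" * 60

print("by extension:", guess_from_name(disguised_name))
print("by content:  ", sniff_mime(disguised_content))
```

The extension-based guess trusts the `.pdf` name, while the content check reports an executable, which is the mismatch that content-based filtering is meant to catch.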
Requirements
Diskover Requirements
Diskover PRO+ license or higher
Diskover Web interface access
File action support enabled
External Dependencies
The Download file action requires the following Python libraries on the Diskover server:
| Package | Purpose |
|---|---|
| python-magic | Detects file types using content analysis |
| PyMuPDF | Scans PDFs for embedded images |
| Pillow | Image processing and validation |
Installation & Setup
1. Install the File Action Package
dnf install diskover-file-actions-download
2. Install Python Dependencies
python3 -m pip install -r /opt/diskover/fileactions/download/requirements.txt
3. Verify Installation
Run these commands to confirm dependencies are properly installed:
# Verify python-magic
python3 -c "import magic; print('python-magic installed successfully')"
# Verify PyMuPDF
python3 -c "import fitz; print(f'PyMuPDF version: {fitz.version}')"
# Verify Pillow
python3 -c "from PIL import Image; print(f'Pillow version: {Image.__version__}')"
4. Restart Services
# On the Diskover Web host
systemctl restart diskover-admin
# On all Diskover Task Worker host(s)
systemctl restart celery
5. Verify the File Action is Available
Log into the Diskover web interface, select any file, and confirm that Download appears in the Actions menu.
Configuration
Configuration is managed through the Diskover Admin interface.
Location: Diskover Admin > Configuration > Plugins > File Actions > Download
Configuration Parameters
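| Setting | Default | Description |
|---|---|---|
| allow_pdf_with_images | True | When set to False, PDFs are scanned for embedded images and blocked from download if any are found. |
| forbidden_mime_types | See below | List of MIME types that cannot be downloaded. Files matching these types will be blocked regardless of their file extension. |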
Default Blocked File Types
By default, the following file types are blocked:
Image Types:
JPEG, PNG, GIF, BMP, WebP, SVG, TIFF, ICO, HEIF, HEIC
Archive Types:
ZIP, TAR, GZIP
Executable Types:
Windows executables (.exe, .dll)
Note: These defaults can be customized by your administrator based on organizational needs. The blocked types shown above are starting recommendations, not requirements.
Configuration Examples
Standard Configuration
The default configuration blocks common image, archive, and executable formats while allowing documents, spreadsheets, and other business files:
allow_pdf_with_images: True
forbidden_mime_types: Default list (images, archives, executables)
High-Security Configuration
For environments with strict data loss prevention requirements:
allow_pdf_with_images: False (blocks PDFs with embedded images)
forbidden_mime_types: Default list plus additional types like:
  application/x-7z-compressed (7-Zip)
  application/x-rar-compressed (RAR)
  text/html (HTML files)
  application/javascript (JavaScript files)
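As a rough illustration, the high-security settings can be pictured as the default blocklist extended with the extra types. The sketch below uses a plain Python dictionary purely for illustration: the real values are entered in Diskover Admin, the storage format is not documented here, and `DEFAULT_FORBIDDEN` is an abbreviated list with assumed MIME spellings rather than the shipped default.

```python
# Abbreviated, ASSUMED subset of the default blocklist (illustrative only).
DEFAULT_FORBIDDEN = {
    "image/jpeg",
    "image/png",
    "application/zip",
    "application/x-dosexec",
}

# Additional types named in the high-security example.
HIGH_SECURITY_EXTRA = {
    "application/x-7z-compressed",
    "application/x-rar-compressed",
    "text/html",
    "application/javascript",
}

# Hypothetical representation of the two settings for a high-security site.
high_security = {
    "allow_pdf_with_images": False,
    "forbidden_mime_types": sorted(DEFAULT_FORBIDDEN | HIGH_SECURITY_EXTRA),
}

for mime in high_security["forbidden_mime_types"]:
    print(mime)
```

Using a set union keeps the default entries in place and avoids duplicates if a type appears in both lists.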
Document-Only Configuration
For environments where only office documents should be downloadable:
allow_pdf_with_images: True
forbidden_mime_types: Block everything except PDF, Word, Excel, and PowerPoint MIME types
Usage Guide
Downloading Files
Navigate to Your Files: Use Diskover's search or browse features to locate the files you want to download
Select Files: Click the checkbox next to each file you want to download
You can select multiple files for batch download
Only files can be downloaded (not directories)
Launch the Download Action: Click the Actions button in the toolbar, then select Download
Receive Your Files: The selected files will download directly to your browser's configured download location
Batch Downloads
When you select multiple files and initiate a download, each file is processed individually. Files will download as they complete processing, so larger files may finish after smaller ones.
What Happens Behind the Scenes
When you request a download:
The system verifies the file exists and is accessible
The file's actual content is analyzed to determine its true type (not just the extension)
The file type is checked against the blocked list
If it's a PDF and image scanning is enabled, the PDF is checked for embedded images
If all checks pass, the file is prepared and sent to your browser
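The sequence of checks above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the plugin's actual code: `check_download` is a hypothetical name, the MIME detector and PDF scanner are passed in as callables so the sketch runs without python-magic or PyMuPDF, and the "not accessible" message is invented for the example. The other two messages mirror the ones shown under Handling Blocked Files.

```python
import os
import tempfile
from typing import Callable, Iterable, Tuple


def check_download(
    path: str,
    detect_mime: Callable[[str], str],
    forbidden_mime_types: Iterable[str],
    allow_pdf_with_images: bool,
    pdf_has_images: Callable[[str], bool],
) -> Tuple[bool, str]:
    """Return (allowed, message) for one file, following the documented order."""
    # 1. The file must exist and be readable.
    if not os.path.isfile(path) or not os.access(path, os.R_OK):
        return False, f"{path} does not exist or is not accessible."
    # 2. Determine the type from the content, not the extension.
    mime = detect_mime(path)
    # 3. Compare against the blocked list.
    if mime in set(forbidden_mime_types):
        return False, f"The file type '{mime}' cannot be downloaded."
    # 4. Scan PDFs only when PDFs with images are not allowed.
    if mime == "application/pdf" and not allow_pdf_with_images and pdf_has_images(path):
        return False, f"{path} has images. Viewing is blocked."
    # 5. All checks passed.
    return True, "OK"


# Demo with stand-in detectors: a file whose content is "really" a JPEG.
with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp:
    result = check_download(
        tmp.name,
        detect_mime=lambda p: "image/jpeg",
        forbidden_mime_types=["image/jpeg", "image/png"],
        allow_pdf_with_images=True,
        pdf_has_images=lambda p: False,
    )
print(result)
```

Passing the detectors in as parameters keeps the ordering logic separate from the libraries that do the actual inspection, which also makes each step easy to exercise in isolation.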
Handling Blocked Files
If a file cannot be downloaded due to security policies, you'll see one of these messages:
Blocked File Type
Download Error: The file type 'image/jpeg' cannot be downloaded.
What this means: The file's content type is on your organization's blocked list.
What to do:
If you need this file type for legitimate work, contact your administrator to discuss policy adjustments
Consider whether an alternative format might work (for example, embedding images in a document)
PDF with Embedded Images Blocked
/storage/documents/report.pdf has images. Viewing is blocked.
What this means: The PDF contains embedded images, and your organization has enabled PDF image scanning, which blocks downloads of such PDFs.
What to do:
Request a version of the document without embedded images
Contact your administrator if PDF image downloads should be enabled for your use case
Security Considerations
Content-Based Detection
The Download file action examines file contents rather than trusting file extensions. A file named document.pdf that is actually a JPEG image will be correctly identified and blocked (if images are restricted). This protects against:
Renamed malicious files
Extension spoofing attacks
Disguised executables
PDF Image Scanning
When allow_pdf_with_images is set to False, the system scans PDF structure for any embedded images. This includes:
Visible images in the document
Hidden or transparent images
Tracking pixels
Some PDFs may contain invisible images (like 1x1 tracking pixels) that trigger this protection even when no visible images are present.
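One way such a scan could work is to ask PyMuPDF for the images referenced by each page. The sketch below is an assumption about the approach, not the plugin's documented implementation; `document_has_images` and `pdf_has_images` are hypothetical names. It relies on PyMuPDF's `fitz.open()` and `Page.get_images()`, and imports `fitz` lazily so that the page-level helper can be used without the library installed.

```python
from typing import Iterable


def document_has_images(pages: Iterable) -> bool:
    """Return True if any page reports at least one embedded image.

    Each page is expected to provide PyMuPDF's Page.get_images() method,
    which returns a list with one entry per image referenced by the page.
    """
    return any(len(page.get_images(full=True)) > 0 for page in pages)


def pdf_has_images(path: str) -> bool:
    """Open a PDF with PyMuPDF and check every page for embedded images."""
    import fitz  # PyMuPDF; imported here so the helper above stays dependency-free

    with fitz.open(path) as doc:
        # Iterating a PyMuPDF document yields its pages in order.
        return document_has_images(doc)
```

Because the check counts image references rather than judging visibility, a single 1x1 pixel is enough for it to return True, which is consistent with the behavior described above.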
Best Practices
Download only files you need for legitimate work purposes
If a file is unexpectedly blocked, verify with your administrator before seeking workarounds
Report any suspicious file behavior to your IT security team
Troubleshooting
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| Download never starts | File may be blocked or processing | Check for error messages; verify the file type isn't restricted |
| "File type cannot be downloaded" error | File's MIME type is on the blocked list | Contact administrator if you need access to this file type |
| PDF blocked for having images | PDF contains embedded images and scanning is enabled | Request an image-free version or ask administrator to adjust settings |
| Download times out | Very large file or network issues | Try again; for consistently large files, contact administrator about timeout settings |
| File appears corrupted after download | Possible processing issue | Retry the download; if problem persists, report to administrator |
Debug Logging
If you're troubleshooting issues with your administrator, logs can be found at:
Diskover Admin:
/var/log/diskover/diskover-admin.log
Celery Task Worker:
/var/log/celery/worker.log
Web Server (Nginx):
/var/log/nginx/error.log
Set log level to DEBUG in configuration for more detailed output when diagnosing issues.
Verifying the Installation
Administrators can verify the Download file action is working correctly:
# Test libmagic functionality
python3 -c "import magic; m = magic.Magic(mime=True); print(m.from_file('/etc/passwd'))"
# Should output: text/plain
# Verify all Python dependencies
python3 -c "import magic; print('python-magic: OK')"
python3 -c "import fitz; print('PyMuPDF: OK')"
python3 -c "from PIL import Image; print('Pillow: OK')"
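The same dependency check can be run as one short script. This is a sketch using only the standard library; `REQUIRED_MODULES` and `check_dependencies` are hypothetical names, and the import names are taken from the verification commands above. Note that `importlib.util.find_spec` only confirms a module can be located. It does not import it, so a missing libmagic system library would still need the `magic.Magic` test above to surface.

```python
import importlib.util

# Import name for each package listed under External Dependencies.
REQUIRED_MODULES = {
    "python-magic": "magic",
    "PyMuPDF": "fitz",
    "Pillow": "PIL",
}


def check_dependencies(modules=REQUIRED_MODULES):
    """Map each package name to True if its module can be located."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }


for package, found in check_dependencies().items():
    print(f"{package}: {'OK' if found else 'MISSING'}")
```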
Support
Documentation: https://docs.diskoverdata.com
Support Portal: https://support.diskoverdata.com
Last Updated: January 2026
Diskover Data, Inc.