These terms may have different meanings in other contexts, but the definitions below apply specifically to the Diskover user environment.
Data Curation
Data curation refers to managing data from various sources as a valuable asset. It involves having a clear data strategy and reliable methods to access, integrate, cleanse, govern, store, and prepare data for analytics. Effective curation ensures your data stays useful over time and remains available for reuse and long-term preservation.
Directory/Folder
Although “directory” and “folder” can differ slightly in other contexts, in this guide they mean the same thing: a container used to organize files and other directories.
Learn how to use the directories in the user interface →
Elasticsearch
Elasticsearch powers Diskover’s search and analytics engine. It stores indexed metadata and enables fast queries, reports, and visualizations across large storage environments.
Learn more about Diskover’s architecture →
Hardlinks
A hardlink is like a shortcut that acts just like the original file. It connects a name directly to a file's data on disk, so even if the file is renamed, the hardlink still points to the same content. A regular shortcut, by contrast, might break if the file is moved or renamed.
Think of hardlinks like copies of a file, but without using extra disk space. Symbolic (soft) links are only pointers to the original file's path, whereas hardlinks are tied directly to the file's data.
✏️ Hardlinks are commonly used in the media and entertainment industry to reference the same digital asset across multiple shot folders without duplicating the file.
| Feature | Soft Links | Hardlinks |
|---|---|---|
| Can span file systems | Yes | No |
| Can link between directories | Yes | Yes, within the same file system |
| Inode number | Different from the original file | Same as the original file |
| File permissions | Different from the original file | Same as the original file |
| Permissions update | No, permissions are not updated | Yes, permissions are updated if the source file's permissions change |
| Contents | Points to the path, not the contents | Refers directly to the actual contents of the original file |
| File system boundaries | Can cross file system boundaries | Cannot cross file system boundaries |
| Disk space usage | Does not consume additional space | No additional space; the data is freed only when all links are deleted |
| Link type | A pointer to the original file | A reference to the actual data of the file |
| Required for files | No | Yes, a file must have at least one hardlink |
| Can link directories | Yes | No |
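The difference between the two link types can be checked with a short Python sketch. It assumes a POSIX-style file system (on Windows, creating symbolic links may need extra privileges), and the file names are hypothetical.

```python
import os
import tempfile


def compare_links(workdir):
    """Create a file, a hardlink and a soft link to it, then rename the original."""
    original = os.path.join(workdir, "asset.mov")  # hypothetical file name
    hard = os.path.join(workdir, "asset_hard.mov")
    soft = os.path.join(workdir, "asset_soft.mov")

    with open(original, "w") as f:
        f.write("frame data")

    os.link(original, hard)     # hardlink: a second name for the same inode
    os.symlink(original, soft)  # soft link: a small entry that stores the path

    same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
    link_count = os.stat(original).st_nlink  # number of names for this file

    # Rename the original: the hardlink keeps working, the soft link dangles.
    os.rename(original, os.path.join(workdir, "asset_renamed.mov"))
    with open(hard) as f:
        hard_content = f.read()
    soft_resolves = os.path.exists(soft)  # exists() follows soft links

    return same_inode, link_count, hard_content, soft_resolves


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(compare_links(d))
```

The hardlink shares the original's inode and still reads the content after the rename, while the soft link no longer resolves because it only stored the old path.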
Hash Values vs Inodes
Inodes and hash values both identify files, but in different ways and for different purposes, such as search and deduplication.
🔆 When to use what:
Use inodes when tracking a file within the same filesystem (even if it is renamed or moved).
Use hashes when comparing file content across systems or locations.
| DESCRIPTION | INODE | HASH VALUE |
|---|---|---|
| What it represents | A file’s unique ID in the filesystem. | A file’s unique fingerprint based on its content. |
| How it works | Each file or directory is assigned an inode number by the filesystem. | A hash function (e.g., MD5, SHA-1) generates a fixed-length value from the file’s content. |
| Changes when… | File is moved or renamed → Inode stays the same. | File content changes → Hash changes. |
| Used for | Tracking files within the same filesystem (including hardlinks). | Verifying file integrity and detecting duplicates across different storage locations. |
| Diskover search field | | |
| Key limitation | Inode numbers are unique only within a single filesystem; the same file copied to another storage system gets a new inode. | Two different files with identical content will have the same hash, even if their metadata (name, date, location, etc.) differs. |
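As a rough illustration of the table above, the following Python sketch renames a file and then copies it, comparing the inode and content hash at each step. SHA-256 is an assumption chosen for the example (the table mentions MD5 and SHA-1), and the file names are hypothetical. This is not how Diskover computes or stores these values.

```python
import hashlib
import os
import shutil
import tempfile


def file_hash(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file's content in chunks so large files need little memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def describe(path):
    """Return the two identifiers discussed above for one file."""
    return {"inode": os.stat(path).st_ino, "hash": file_hash(path)}


def rename_then_copy(workdir):
    src = os.path.join(workdir, "report.txt")  # hypothetical file name
    with open(src, "w") as f:
        f.write("quarterly numbers")
    before = describe(src)

    renamed = os.path.join(workdir, "report_renamed.txt")
    os.rename(src, renamed)  # same file, new name
    after_rename = describe(renamed)

    copy = os.path.join(workdir, "report_copy.txt")
    shutil.copy(renamed, copy)  # new file, same content
    after_copy = describe(copy)
    return before, after_rename, after_copy


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for row in rename_then_copy(d):
            print(row)
```

The rename keeps the inode, while the copy gets a new inode but the same hash, which is why hashes suit duplicate detection and inodes suit tracking a single file.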
Index/Indexes/Indices
An index is a searchable inventory of all metadata (attributes) about files within a volume. Instead of searching through the operating system directly, Diskover searches through the index for faster results.
Both indexes and indices are correct plural forms; indices is more common in technical contexts.
You can have multiple indices (snapshots/inventories) of the same storage volume from different points in time. Indices contain core metadata such as directory name, file name, size, extension, creation date, modification date, owner, and more. Diskover can also add enriched metadata, which provides business context.
Metadata/Attributes
Metadata is the information about a file — the details that describe what it is, where it’s located, who created it, etc. In Diskover, metadata falls into two main categories:
Core Metadata (always collected):
These are the basic details your file system already knows, such as:
File name
Size
Type/extension
Timestamps (creation date, modification date, last access date)
Path/where it’s located
Owner and permissions
Extra Metadata (added via plugins):
This is additional information that gives files business context, such as:
Customer or asset details
Media-specific metadata (codec, duration, resolution, etc.)
Scientific metadata (bam, sam, etc.)
Diskover stores and organizes metadata so you can search, filter, analyze, and automate using factual information about your files — instead of manually tracking everything yourself.
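To make the idea of core metadata concrete, here is a minimal Python sketch that reads the same kinds of details from the file system for a single file. The function name and dictionary keys are illustrative assumptions, not Diskover field names, and the owner is shown as a numeric user ID as reported on POSIX systems.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def core_metadata(path):
    """Collect basic details the file system already knows about a file."""
    st = os.stat(path)

    def iso(ts):
        return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()

    return {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1].lstrip(".").lower(),
        "size": st.st_size,                            # bytes
        "mtime": iso(st.st_mtime),                     # last modified
        "atime": iso(st.st_atime),                     # last accessed
        "ctime": iso(st.st_ctime),                     # last changed (POSIX)
        "parent_path": os.path.dirname(os.path.abspath(path)),
        "owner_uid": st.st_uid,                        # numeric owner ID
        "permissions": stat.filemode(st.st_mode),      # e.g. "-rw-r--r--"
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        sample = os.path.join(d, "clip.MOV")  # hypothetical file
        with open(sample, "wb") as f:
            f.write(b"12345")
        print(core_metadata(sample))
```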
Path
A path is the full location of a directory or file, for example:
/mnt/lucidlink/projects/Pistachio/WonderfulPistachios_GangnamStyle.mov
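The parts of the example path can be pulled apart with Python's standard `pathlib` module, shown here only as an illustration of what a path contains:

```python
from pathlib import PurePosixPath

path = PurePosixPath(
    "/mnt/lucidlink/projects/Pistachio/WonderfulPistachios_GangnamStyle.mov"
)

print(path.parent)  # the directory that contains the file
print(path.name)    # the file name
print(path.suffix)  # the extension, including the dot
print(path.parts)   # every component, starting at the root "/"
```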
Recursive and Non-Recursive
Non-Recursive: A non-recursive action applies only to the items directly inside the selected path. It does not include any sub-folders or their contents.
Recursive: A recursive action applies to everything inside the selected path, including all sub-folders and their files at every level.
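The two modes can be sketched in Python as a single function with a `recursive` switch. The function and the sample folder layout are hypothetical and only illustrate the definitions above.

```python
import os
import tempfile


def list_files(path, recursive=False):
    """List files directly inside path, or at every level if recursive."""
    if not recursive:
        # Non-recursive: only items directly inside the selected path.
        return sorted(e.path for e in os.scandir(path) if e.is_file())
    found = []
    # Recursive: walk every sub-folder at every level.
    for root, _dirs, files in os.walk(path):
        found.extend(os.path.join(root, name) for name in files)
    return sorted(found)


def build_sample_tree(root):
    """Create a.txt, sub/b.txt and sub/deep/c.txt under root."""
    os.makedirs(os.path.join(root, "sub", "deep"))
    for rel in ("a.txt", "sub/b.txt", "sub/deep/c.txt"):
        with open(os.path.join(root, *rel.split("/")), "w") as f:
            f.write(rel)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        build_sample_tree(d)
        print(len(list_files(d)))                  # files at the top level only
        print(len(list_files(d, recursive=True)))  # files at every level
```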
Size
Definition by field name:
size: The default reported file size. This is the standard “logical size” shown in most Diskover views.
size_du: The allocated size on disk—how much storage the file actually takes up based on filesystem block usage. Useful for capacity planning.
file_size: Another logical size field that behaves like size. If you see both size and file_size in your environment, check with your System Administrator which one applies to your setup.
file_size_du: Allocated size for file items only. Similar to size_du but excludes directories.
pscale.size_logical: Logical file size reported specifically from Dell PowerScale.
pscale.size_physical_data: The actual physical storage consumed on PowerScale, accounting for protection overhead and layout. Most accurate for true footprint on PowerScale systems.
pscale.size_protection: Amount of space used for protection overhead (mirroring, erasure coding, etc.) on PowerScale.
size_du_norecurs: Allocated size without including subfolders. Best used when you want the true size of a single folder level.
size_logical: Logical file size (same meaning as size), but pulled from a different metadata field.
size_norecurs: Logical size of a directory without counting the contents of subfolders.
size_physical_data: The actual bytes stored on disk, reported by supported scanners. Not available in all environments—contact your System Administrator if you don't see this field.
size_protection: The storage space used for data protection overhead (such as mirroring or erasure coding), reported by supported scanners. Not available in all environments—contact your System Administrator if you don't see this field.
🔆 By category:
Standard size fields: size, size_du, size_logical, file_size, file_size_du
Non-recursive fields: size_norecurs, size_du_norecurs
Dell PowerScale fields: pscale.size_logical, pscale.size_physical_data, pscale.size_protection
Vendor-specific / extended fields: size_physical_data, size_protection
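The distinction between logical size and allocated size, and between recursive and non-recursive directory totals, can be approximated with the Python sketch below. It assumes a POSIX system, where `st_blocks` counts 512-byte blocks. The function names are hypothetical, and the sketch does not show how Diskover itself fills these fields.

```python
import os
import tempfile


def logical_and_allocated(path):
    """Return (logical bytes, bytes allocated on disk) for one file."""
    st = os.stat(path)
    logical = st.st_size            # comparable in spirit to size
    allocated = st.st_blocks * 512  # comparable in spirit to size_du
    return logical, allocated


def dir_logical_size(path, recursive=True):
    """Sum logical file sizes in a folder, with or without sub-folders."""
    total = 0
    for root, dirs, files in os.walk(path):
        total += sum(os.lstat(os.path.join(root, n)).st_size for n in files)
        if not recursive:
            dirs.clear()  # stop os.walk from descending any further
    return total


def build_sample_tree(root):
    """Create a 10-byte file at the top level and a 20-byte file in sub/."""
    os.makedirs(os.path.join(root, "sub"))
    with open(os.path.join(root, "top.bin"), "wb") as f:
        f.write(b"x" * 10)
    with open(os.path.join(root, "sub", "nested.bin"), "wb") as f:
        f.write(b"x" * 20)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        build_sample_tree(d)
        print(logical_and_allocated(os.path.join(d, "top.bin")))
        print(dir_logical_size(d, recursive=True))
        print(dir_logical_size(d, recursive=False))
```

The allocated size is usually a multiple of the file system block size, so small files often occupy more space on disk than their logical size suggests.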
Stemming
Stemming is applied to .text fields in Diskover searches. It allows Diskover to match different forms of the same word automatically (e.g., run, running, runner).
⚠️ Stemming only applies to .text fields (such as name.text). Fields like name and path do not use stemming.
| HOW IT WORKS | EXAMPLES | WHAT THIS MEANS FOR USERS |
|---|---|---|
| Stemming matches words with the same root, so you don’t need to search for every variation. Instead of searching for the exact word, Diskover looks for related words with the same root. | | |
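The idea of matching on a shared root can be illustrated with a deliberately simplified Python sketch. This toy suffix stripper is an assumption made for illustration only. It is not the stemming algorithm used by Diskover or Elasticsearch, and it handles only a few English endings.

```python
# Toy stemmer for illustration only; real analyzers use far richer rules.
SUFFIXES = ("ing", "er", "ed", "s")


def toy_stem(word):
    """Reduce a word to a rough root by stripping one common suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final consonant, e.g. "runn" becomes "run".
    if len(word) >= 3 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


def matches(query, candidates):
    """Return the candidates whose toy root equals the query's toy root."""
    root = toy_stem(query)
    return [c for c in candidates if toy_stem(c) == root]


if __name__ == "__main__":
    print(matches("running", ["run", "runner", "runs", "ruin", "bring"]))
```

Searching for one form of a word finds the other forms because they all reduce to the same root before comparison.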
Timestamps
atime → last accessed → the time the file was last opened or read, whether by you, by another program, or by a remote machine. Anytime a file is accessed, its access time changes.
ctime → last changed → the time the file's content or attributes last changed. Whenever anything about a file changes (except its access time), its ctime changes.
mtime → last modified → the time the contents of the file last changed. Only content changes count, not attribute changes. For instance:
If you open a file and change some (or all) of its content, its mtime gets updated.
If you change a file's attribute (like read-write permissions or other metadata), its mtime doesn't change, but its ctime does.
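The mtime and ctime behavior described above can be observed with a small Python sketch. It assumes a POSIX system (on Windows, `st_ctime` has historically reported creation time instead), and the file name is hypothetical.

```python
import os
import tempfile
import time


def timestamps(path):
    """Return the three timestamps in nanoseconds since the epoch."""
    st = os.stat(path)
    return {"atime": st.st_atime_ns, "mtime": st.st_mtime_ns, "ctime": st.st_ctime_ns}


def change_permissions(path):
    """Change only an attribute and return timestamps before and after."""
    before = timestamps(path)
    time.sleep(0.05)       # give the clock time to advance
    os.chmod(path, 0o400)  # attribute change only; content is untouched
    after = timestamps(path)
    return before, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        sample = os.path.join(d, "notes.txt")  # hypothetical file
        with open(sample, "w") as f:
            f.write("hello")
        before, after = change_permissions(sample)
        print("mtime unchanged:", before["mtime"] == after["mtime"])
        print("ctime advanced:", after["ctime"] > before["ctime"])
```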
Volume
A volume is any storage location indexed by Diskover—for example, a Windows share or drive, a Linux mount/NFS export, cloud storage like an S3 bucket, Dell PowerScale storage, a laptop, or even a USB drive.
Your organization may refer to volumes using other names, such as storage volume, mount, mount point, top-level path, or top-level storage.