Overview
Visit the System Readiness section for further information on preparing your system for Diskover.
| Packages | Usage |
|---|---|
| Python 3.8+ | Required for Diskover scanners/workers and Diskover-Web |
| Elasticsearch 8.x | Is the heart of Diskover |
| PHP 8.x and PHP-FPM | Required for Diskover-Web |
| NGINX | Required for Diskover-Web |
Security
Disabling SELinux and using a software firewall are both optional; neither is required to run Diskover.
Internet access is preferred during the installation to download packages with yum.
Recommended Operating Systems
Note that Windows and Mac are only supported for scanners.
| Linux* | Windows | Mac |
|---|---|---|
| | | |
Diskover can technically run on all flavors of Linux, although only the ones mentioned above are fully supported.
Config Architecture Overview
Elasticsearch Requirements
Elasticsearch Version
Diskover is currently tested and deployed with Elasticsearch v8.x. Note that ES7 Python packages are required to connect to an Elasticsearch v8 cluster.
Elasticsearch Architecture Overview and Terminology
Please refer to this diagram to better understand the terminology used by Elasticsearch and throughout the Diskover documentation.
Elasticsearch Cluster
The foundation of the Diskover platform consists of a series of Elasticsearch indices, which are created and stored within the Elasticsearch endpoint.
An important Elasticsearch setting is the Java heap size: set it to half of the Elasticsearch host's RAM, up to a maximum of 32 GB.
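The heap rule above can be sketched as a small calculation. This is a minimal illustration only; the function name `recommended_heap_gb` is hypothetical and not part of Diskover or Elasticsearch.

```python
def recommended_heap_gb(host_ram_gb: float, cap_gb: float = 32.0) -> float:
    """Return half of the host RAM, capped at cap_gb (rule of thumb from the text)."""
    if host_ram_gb <= 0:
        raise ValueError("host_ram_gb must be positive")
    return min(host_ram_gb / 2, cap_gb)


# Example hosts: 16 GB, 64 GB, and 128 GB of RAM.
for ram in (16, 64, 128):
    print(f"{ram} GB host RAM -> {recommended_heap_gb(ram):g} GB heap")
# 16 GB host RAM -> 8 GB heap
# 64 GB host RAM -> 32 GB heap
# 128 GB host RAM -> 32 GB heap
```

In Elasticsearch, the heap size is normally applied through the JVM `-Xms` and `-Xmx` options, with both set to the same value.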
For more detailed Elasticsearch guidelines, please refer to AWS sizing guidelines.
For more information on resilience in small clusters, refer to the Elasticsearch documentation.
Indices
Rule of Thumb for Shard Size
Try to keep shard size between 10 – 50 GB
Ideal shard size approximately 20 – 40 GB
Once you know your index size, you can decide whether to split the index into multiple shards. To check the size of your indices in the user interface, go to ⛭ → Indices.
Examples
An index that is 60 GB in size: you will want to set shards to 3 and replicas* to 1 or 2 and spread across 3 ES nodes.
An index that is 5 GB in size: you will want to set shards to 1 and replicas* to 1 or 2 and be on 1 ES node or spread across 3 ES nodes (recommended).
⚠️ Replicas help with search performance, redundancy and provide fault tolerance. When you change shard/replica numbers, you have to delete the index and re-scan.
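As a rough sketch of how the examples above follow from the shard-size rule of thumb, the helper below divides the index size by a target shard size. The 20 GB target (the lower end of the ideal range) and the function name `suggest_shards` are assumptions chosen for illustration, not Diskover settings.

```python
import math


def suggest_shards(index_size_gb: float, target_shard_gb: float = 20.0) -> int:
    """Suggest a shard count so that each shard is at most target_shard_gb."""
    if index_size_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be non-negative and the target positive")
    # Always use at least one shard, even for very small indices.
    return max(1, math.ceil(index_size_gb / target_shard_gb))


print(suggest_shards(60))  # 3, matching the 60 GB example
print(suggest_shards(5))   # 1, matching the 5 GB example
```

Choosing a larger target, such as 40 GB, yields fewer and larger shards; either choice stays within the 10 to 50 GB range.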
Estimating Elasticsearch Storage Requirements
Individual Index Size
1 GB for every 5 million files/folders
20 GB for every 100 million files/folders
⚠️ The size of the files is not relevant.
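The sizing figures above amount to roughly 1 GB per 5 million indexed items. A minimal sketch of that estimate follows; the function name `estimate_index_gb` is hypothetical.

```python
DOCS_PER_GB = 5_000_000  # from the text: 1 GB for every 5 million files/folders


def estimate_index_gb(doc_count: int) -> float:
    """Estimate index size in GB from the number of files and folders scanned."""
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    return doc_count / DOCS_PER_GB


print(estimate_index_gb(5_000_000))    # 1.0
print(estimate_index_gb(100_000_000))  # 20.0
```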
Replicas/Shard Sizes
Replicas multiply the storage requirement, because each replica is a full copy of the index (all docs) stored on other Elasticsearch nodes. For example, a 20 GB index with 2 replicas requires a total storage capacity of 60 GB. Splitting an index into multiple shards does not increase its size; it only distributes the data across the shards. In a multi-node cluster, this helps with redundancy, especially when combined with replicas.
⚠️ The number of docs per shard is limited to 2 billion, which is a hard Lucene limit.
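The replica arithmetic and the per-shard document limit can be checked with a short sketch. The function names are hypothetical, the limit constant uses the rounded 2 billion figure quoted above, and the per-shard count assumes documents are spread evenly across shards.

```python
import math

MAX_DOCS_PER_SHARD = 2_000_000_000  # rounded Lucene limit quoted in the text


def total_storage_gb(index_gb: float, replicas: int) -> float:
    """Primary copy plus one full copy per replica."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return index_gb * (1 + replicas)


def docs_per_shard(doc_count: int, shards: int) -> int:
    """Approximate docs per shard, assuming an even distribution."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    return math.ceil(doc_count / shards)


print(total_storage_gb(20, 2))  # 60, matching the example above
print(docs_per_shard(3_000_000_000, 1) <= MAX_DOCS_PER_SHARD)  # False
print(docs_per_shard(3_000_000_000, 2) <= MAX_DOCS_PER_SHARD)  # True
```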
Rolling Indices
Each Diskover scan results in the creation of a new Elasticsearch index.
Multiple indices for a target location can be maintained to keep the history of your storage.
Elasticsearch overall storage requirements will depend on historical index requirements.
For rolling indices, multiply the amount of data generated per index for a given storage location by the number of indices you want to retain. For example, if a daily scan of a storage location generates a 2 GB index and you want to keep 30 days of indices, 60 GB of storage is required to maintain a total of 30 indices.
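The retention calculation can be sketched as follows. The function name `rolling_storage_gb` is hypothetical, and the optional `replicas` parameter is an added assumption that applies the replica multiplier described earlier; with the default of 0 it reproduces the example above.

```python
def rolling_storage_gb(gb_per_index: float, indices_retained: int, replicas: int = 0) -> float:
    """Storage needed to retain a number of indices of similar size."""
    if indices_retained < 0 or replicas < 0:
        raise ValueError("indices_retained and replicas must be non-negative")
    return gb_per_index * indices_retained * (1 + replicas)


print(rolling_storage_gb(2, 30))              # 60, as in the example above
print(rolling_storage_gb(2, 30, replicas=1))  # 120 when each index has 1 replica
```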
Requirements for POC and Production Deployments
| | Proof of Concept | Production Deployment |
|---|---|---|
| Nodes | 1 node | 3 nodes for performance and redundancy are recommended |
| CPU | 8 to 32 cores | 8 to 32 cores |
| RAM | 8 to 16 GB (8 GB reserved to Elasticsearch memory heap) | 64 GB per node (32 GB reserved to Elasticsearch memory heap) |
| DISK | 250 to 500 GB of SSD storage per node (root 150 GB, home 25 GB, var 800 GB) | 1 TB of SSD storage per node (root 150 GB, home 25 GB, var 800 GB) |
AWS Instance Sizing Resource Requirements
| | AWS Elasticsearch Domain | AWS EC2 Web-Server | AWS Indexers |
|---|---|---|---|
| Minimum | i3.large | t3.small | t3.large |
| Recommended | i3.xlarge | t3.medium | t3.xlarge |
Diskover-Web Server Requirements
The Diskover-Web HTML5 user interface requires a Web server platform. It provides visibility, analysis, workflows, and file actions from the indices that reside on the Elasticsearch endpoint.
Requirements for POC and Production Deployments
| | Proof of Concept | Production Deployment |
|---|---|---|
| CPU | 8 to 32 cores | 8 to 32 cores |
| RAM | 8 to 16 GB | 8 to 16 GB |
| DISK | 100 GB of SSD storage (root 75 GB, home 25 GB) | 100 GB of SSD storage (root 75 GB, home 25 GB) |
Diskover Scanners Requirements
You can install Diskover scanners on a server or virtual machine. Multiple scanners can be run on a single machine or multiple machines for parallel crawling.
The scanning host uses a separate thread for each directory at level 1 of a top crawl directory. If you have many directories at level 1, you will want to increase the number of CPU cores and adjust max threads in the diskover config. This parameter, as well as many others, can be configured from the user interface, which contains help text to guide you.
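Because one thread is used per level-1 directory, it can help to count those directories before sizing a scanner host. The sketch below does this with the Python standard library; the function name `count_level1_dirs` is hypothetical, and the current working directory stands in for a real top crawl directory.

```python
import os


def count_level1_dirs(top: str) -> int:
    """Count directories directly under top (symlinks are not followed)."""
    with os.scandir(top) as entries:
        return sum(1 for entry in entries if entry.is_dir(follow_symlinks=False))


top_dir = os.getcwd()  # placeholder: replace with your top crawl directory
level1 = count_level1_dirs(top_dir)
cores = os.cpu_count() or 1
print(f"{level1} level-1 directories under {top_dir}; {cores} CPU cores available")
```

If the directory count is much larger than the core count, that is a signal to review the CPU allocation and the max threads setting described above.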
Requirements for POC and Production Deployments
| | Proof of Concept | Production Deployment |
|---|---|---|
| CPU | 8 to 32 cores | 8 to 32 cores |
| RAM | 8 to 16 GB | 8 to 16 GB |
| DISK | 250 to 500 GB SSD | 500 GB (root 450 GB, home 25 GB) |
Skills and Knowledge Requirements
This document is intended for Service Professionals and System Administrators who install the Diskover software components. The installer should have strong familiarity with:
The operating system on which the on-premises Diskover scanner(s) are installed.
Basic knowledge of:
The EC2 operating system on which the Diskover-Web HTML5 user interface is installed.
Configuring a web server (NGINX).
Client-side networking configuration and requirements.
⚠️ Attempting to install and configure Diskover without proper experience or training can affect system performance and security configuration.
⏱️ The initial installation, configuration, and deployment of Diskover are expected to take 1 to 3 hours, depending on the size of your environment and on network connectivity.