Diskover API
Overview
The Diskover API provides programmatic access to file system index data, search, tagging, task management, and platform configuration. It is split into two APIs:
| API | Base path | Auth | Purpose |
|---|---|---|---|
| Main API | /api.php | HTTP Basic or JWT Bearer | Search, indexing, tagging, task and worker management |
| Admin API | /diskover_admin/api/ | JWT Bearer | Configuration management, scopes, templates, workers |
Base URL examples:
https://your-diskover-host/api.php
https://your-diskover-host/diskover_admin/api/
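Main API endpoint paths in this document are appended to the Main API base URL (for example, /list becomes https://diskover.example.com/api.php/list). Admin API endpoints are listed with their full /diskover_admin/api/ path and are appended to the host. Examples use https://diskover.example.com as a placeholder.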
Authentication
HTTP Basic Authentication
Most Main API endpoints accept HTTP Basic Auth:
curl -u username:password https://diskover.example.com/api.php/list
JWT Bearer Token
JWT tokens are required for all Admin API endpoints and can optionally be used for Main API endpoints.
Step 1 — Generate a token (using Basic Auth):
curl -u username:password https://diskover.example.com/api.php/generate_token
Step 2 — Use the token in subsequent requests:
curl -H "Authorization: Bearer <access_token>" https://diskover.example.com/api.php/list
Tokens expire. Use /refresh_token to obtain a new access token without re-authenticating.
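The two steps above can be sketched with the Python standard library. This is a minimal illustration, not an official client: BASE_URL is the placeholder host used throughout this document, and basic_auth_header, generate_token_request and bearer_request are hypothetical helper names. The functions only build urllib.request.Request objects; pass one to urllib.request.urlopen to send it.

```python
import base64
import urllib.request

# Placeholder Main API base URL used throughout this document.
BASE_URL = "https://diskover.example.com/api.php"


def basic_auth_header(username: str, password: str) -> str:
    """Return an HTTP Basic Authorization header value."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def generate_token_request(username: str, password: str) -> urllib.request.Request:
    """Step 1: GET /generate_token using Basic Auth."""
    return urllib.request.Request(
        f"{BASE_URL}/generate_token",
        headers={"Authorization": basic_auth_header(username, password)},
        method="GET",
    )


def bearer_request(path: str, access_token: str) -> urllib.request.Request:
    """Step 2: call a Main API path with the JWT access token."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
```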
Admin API: All /diskover_admin/api/ endpoints require JWT Bearer auth. HTTP Basic Auth is not accepted on the Admin API.
LDAP / Active Directory
LDAP users authenticate with their directory credentials. Group membership determines access level — standard users have read access to search endpoints; admin group members have full API access.
Response Format
All Main API responses follow this structure:
Success (HTTP 200):
{
"status": 200,
"message": {
"data": { ... }
}
}
Error (HTTP 4xx / 5xx):
{
"status": 400,
"message": "Error description"
}
Some endpoints return data directly under message without a data key (noted per endpoint below).
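Client code usually wants the payload regardless of which shape an endpoint uses. A minimal sketch, assuming the envelope described above: unwrap and DiskoverAPIError are hypothetical names, and a boolean status (as returned by /token_validation) is passed through rather than treated as an error.

```python
from typing import Any


class DiskoverAPIError(Exception):
    """Raised when a Main API body reports a non-200 integer status."""

    def __init__(self, status: int, message: Any):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def unwrap(body: dict) -> Any:
    """Return message.data if present, otherwise message itself."""
    status = body.get("status")
    # bool is a subclass of int in Python, so exclude it explicitly:
    # /token_validation returns a boolean status.
    if isinstance(status, int) and not isinstance(status, bool) and status != 200:
        raise DiskoverAPIError(status, body.get("message"))
    message = body.get("message")
    if isinstance(message, dict) and "data" in message:
        return message["data"]
    return message
```

Endpoints that return sibling keys next to data, such as totalhits from search, lose those keys through unwrap; read body["message"] directly for them.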
Error Codes
| Code | Meaning |
|---|---|
| 200 | Success |
| 400 | Bad request: missing or invalid parameters |
| 401 | Unauthorized: invalid or missing credentials |
| 402 | Feature requires a higher license tier |
| 403 | Forbidden: insufficient permissions |
| 404 | Resource not found |
| 500 | Internal server error |
Main API Endpoints
Authentication
Generate JWT Token
GET /generate_token
Generate access and refresh tokens. Requires HTTP Basic Auth.
curl -u username:password https://diskover.example.com/api.php/generate_token
Response:
{
"status": 200,
"message": {
"data": {
"access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...",
"refresh_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9..."
}
}
}
Refresh JWT Token
POST /refresh_token
Generate a new access token from a valid refresh token.
Request body:
{
"token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9..."
}
Response: Same structure as /generate_token.
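A sketch of deciding when to call /refresh_token, assuming the access token is a standard three-part JWT that carries an exp claim (this document does not say which claims Diskover includes). jwt_claims, needs_refresh and refresh_request are hypothetical names. The payload is decoded without verifying the signature, which is acceptable for scheduling a refresh but must never be used for authorization decisions.

```python
import base64
import json
import time
import urllib.request
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def needs_refresh(token: str, skew_seconds: int = 60,
                  now: Optional[float] = None) -> bool:
    """True if the token expires within skew_seconds (assumes an 'exp' claim)."""
    exp = jwt_claims(token).get("exp")
    if exp is None:
        return False  # no exp claim: let the server's 401 decide
    current = time.time() if now is None else now
    return exp - current <= skew_seconds


def refresh_request(base_url: str, refresh_token: str) -> urllib.request.Request:
    """POST /refresh_token with the refresh token in a JSON body."""
    return urllib.request.Request(
        f"{base_url}/refresh_token",
        data=json.dumps({"token": refresh_token}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```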
Validate Session Token
POST /token_validation
Validates a PHP web session token. Used by diskover-admin to verify that a browser session from diskover-web is active. Returns whether the session token is valid — does not validate JWT tokens.
Request body:
{
"session_token": "abc123def456"
}
Response:
{
"status": true,
"message": {
"valid": false
}
}
Note: valid is returned directly under message, not under message.data. status is a boolean in this endpoint, not an integer.
File Action Authorization
POST /fileactionauth
Check whether the current user is authorized to use a specific file action.
Request body:
{
"fileaction_name": "pdf"
}
Response:
{
"status": 200,
"message": {
"data": {
"authorized": true
}
}
}
Index Management
List Indices
GET /list
Returns all Diskover indices in Elasticsearch, sorted by creation date descending.
curl -u username:password https://diskover.example.com/api.php/list
Response:
{
"status": 200,
"message": {
"data": [
{
"index": "diskover-prod-240318143022",
"docs.count": "1024000",
"store.size": "2.5gb",
"health": "green",
"status": "open"
}
]
}
}
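In the sample response, docs.count is a string rather than a number. A minimal sketch that converts it and picks the newest index with a given name prefix; relying on list order follows the statement above that results are sorted by creation date descending. The helper names are hypothetical, and the prefix convention (diskover-prod-...) is taken from the sample and may not hold for every deployment.

```python
from typing import Dict, List, Optional


def doc_counts(indices: List[dict]) -> Dict[str, int]:
    """Map index name to document count, converting the string field to int."""
    return {item["index"]: int(item["docs.count"]) for item in indices}


def newest_with_prefix(indices: List[dict], prefix: str) -> Optional[str]:
    """First matching index; /list is documented as newest-first."""
    for item in indices:
        if item["index"].startswith(prefix):
            return item["index"]
    return None


# Sample data from the /list response above.
sample = [
    {"index": "diskover-prod-240318143022", "docs.count": "1024000",
     "store.size": "2.5gb", "health": "green", "status": "open"},
]
```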
Get Latest Index
GET /latest
GET /latest?toppath={path}
Returns the most recently created index for each top path, or for a specific path if toppath is provided.
# All paths
curl -u username:password https://diskover.example.com/api.php/latest
# Specific path
curl -u username:password "https://diskover.example.com/api.php/latest?toppath=/data/shared"
Response (all paths):
{
"status": 200,
"message": {
"data": {
"/data/shared": "diskover-shared-240318143022",
"/data/archive": "diskover-archive-240317091500"
}
}
}
Response (single path):
{
"status": 200,
"message": {
"data": "diskover-shared-240318143022"
}
}
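The type of data depends on whether toppath is given: without it, data maps each top path to its latest index; with it, data is a bare index name. A small hypothetical helper that accepts either shape:

```python
from typing import Dict, Union


def latest_for(data: Union[str, Dict[str, str]], toppath: str) -> str:
    """Return the latest index for toppath from either /latest response shape."""
    if isinstance(data, str):
        # Response to /latest?toppath=...: already the index name.
        return data
    try:
        return data[toppath]
    except KeyError:
        raise KeyError(f"no index found for top path {toppath!r}") from None
```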
Search and Analytics
Search Files and Directories
GET /{index}/search?query={query}&size={size}&page={page}
Parameters:
| Parameter | Required | Default | Description |
|---|---|---|---|
| query | Yes | none | Lucene query string |
| size | No | 1000 | Results per page |
| page | No | 1 | Page number |
Query syntax examples:
*.pdf
size:>1048576
mtime:[now-7d TO now]
owner:john AND extension:doc
type:file AND NOT tags:archive
curl -u username:password \
  "https://diskover.example.com/api.php/diskover-prod-240318/search?query=*.pdf%20AND%20size:>1048576&size=50"
Response:
{
"status": 200,
"message": {
"totalhits": 245,
"data": [
{
"_index": "diskover-prod-240318",
"_id": "abc123",
"_source": {
"name": "report.pdf",
"parent_path": "/data/documents",
"size": 2457600,
"owner": "john",
"group": "users",
"mtime": "2024-03-18T10:30:00Z",
"type": "file",
"extension": "pdf",
"tags": ["reviewed"]
}
}
]
}
}
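Because results are paged (size defaults to 1000, page starts at 1), collecting every hit means requesting pages until totalhits is reached. A minimal sketch with a caller-supplied fetch function, so the paging logic can be exercised without a live server; search_url and iter_hits are hypothetical names, and BASE_URL is the document's placeholder host. The query is percent-encoded with %20 for spaces, as in the curl example.

```python
import urllib.parse
from typing import Callable, Iterator

BASE_URL = "https://diskover.example.com/api.php"  # placeholder host


def search_url(index: str, query: str, size: int = 1000, page: int = 1) -> str:
    """Build a /{index}/search URL with a percent-encoded Lucene query."""
    params = urllib.parse.urlencode(
        {"query": query, "size": size, "page": page},
        quote_via=urllib.parse.quote,  # %20 for spaces instead of '+'
    )
    return f"{BASE_URL}/{urllib.parse.quote(index)}/search?{params}"


def iter_hits(fetch: Callable[[str], dict], index: str, query: str,
              size: int = 1000) -> Iterator[dict]:
    """Yield every hit; fetch(url) must return the decoded JSON body."""
    page, seen = 1, 0
    while True:
        message = fetch(search_url(index, query, size, page))["message"]
        hits = message["data"]
        yield from hits
        seen += len(hits)
        # Stop on an empty page as well, to avoid looping forever.
        if not hits or seen >= message["totalhits"]:
            break
        page += 1
```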
Get Disk Space
GET /{index}/diskspace
Disk space totals for an index (sourced from spaceinfo documents written at crawl time).
Response:
{
"status": 200,
"message": {
"data": [
{
"path": "/data",
"totalSize": 1099511627776,
"usedSize": 879609302221,
"freeSize": 219902325555,
"freePercent": 20.0,
"availableSize": 219902325555,
"availablePercent": 20.0
}
]
}
}
Get Disk Space Across Multiple Indices
GET /multi-diskspace?indices={index1,index2,...}
Aggregates disk space data across one or more indices. Accepts a comma-separated list of index names.
curl -u username:password \
  "https://diskover.example.com/api.php/multi-diskspace?indices=diskover-prod-240318,diskover-archive-240317"
Response: Same structure as /{index}/diskspace.
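Since both disk space endpoints return the same entries, one helper can summarize either. A minimal sketch with hypothetical names: multi_diskspace_url builds the comma-separated indices parameter (leaving commas unescaped, as in the curl example), and used_percent derives usage from usedSize and totalSize.

```python
import urllib.parse
from typing import Dict, Iterable, List

BASE_URL = "https://diskover.example.com/api.php"  # placeholder host


def multi_diskspace_url(indices: Iterable[str]) -> str:
    """Build /multi-diskspace?indices=a,b with commas left unescaped."""
    query = urllib.parse.urlencode(
        {"indices": ",".join(indices)}, safe=",", quote_via=urllib.parse.quote)
    return f"{BASE_URL}/multi-diskspace?{query}"


def used_percent(entries: List[dict]) -> Dict[str, float]:
    """Percent of total space used per path, rounded to one decimal."""
    return {
        e["path"]: round(100.0 * e["usedSize"] / e["totalSize"], 1)
        for e in entries
        if e["totalSize"]  # skip entries with no size information
    }
```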
Get Top Paths
GET /{index}/toppaths
Returns the top-level paths that were crawled in the index.
Response:
{
"status": 200,
"message": {
"data": ["/data/shared", "/data/archive", "/data/projects"]
}
}
Metrics and Aggregations
Get Metrics
GET /{index}/metrics?field={field}&type={type}&size={size}&interval={interval}
Parameters:
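| Parameter | Default | Description |
|---|---|---|
| field | | Field to aggregate on, e.g. extension, owner, or a date field such as mtime |
| type | | Filter by doc type, e.g. file |
| size | | Number of top buckets to return |
| interval | | Histogram interval, used for date fields |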
Date fields (mtime, atime, ctime) use date histogram aggregation. All other fields use terms aggregation.
# Distribution by extension
curl -u username:password \
  "https://diskover.example.com/api.php/diskover-prod-240318/metrics?field=extension&type=file&size=10"
# Distribution by owner
curl -u username:password \
  "https://diskover.example.com/api.php/diskover-prod-240318/metrics?field=owner"
Response:
{
"status": 200,
"message": {
"index": "diskover-prod-240318",
"field": "extension",
"metrics": {
"overall_stats": {
"total_size": 1099511627776,
"total_size_formatted": "1.0 TB",
"total_count": 209715
},
"overall_counts": {
"total_files": 195000,
"total_directories": 14715,
"total_items": 209715
},
"breakdown": [
{
"key": "pdf",
"doc_count": 45230,
"total_size": 236223201280,
"total_size_formatted": "220.0 GB",
"avg_size": 5222687,
"avg_size_formatted": "4.98 MB",
"size_percentage": 21.5
}
]
}
}
}
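The metrics payload sits directly under message rather than message.data. A minimal sketch (size_shares is a hypothetical name) that ranks buckets by size and recomputes each bucket's share of overall_stats.total_size, which can be checked against the server's size_percentage field.

```python
from typing import List, Tuple


def size_shares(message: dict) -> List[Tuple[str, int, float]]:
    """(key, doc_count, percent of overall size) per bucket, largest first."""
    metrics = message["metrics"]
    overall = metrics["overall_stats"]["total_size"]
    rows = [
        (b["key"], b["doc_count"],
         100.0 * b["total_size"] / overall if overall else 0.0)
        for b in metrics["breakdown"]
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```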
Tagging
Get Tag Counts
GET /{index}/tagcount
GET /{index}/tagcount?tag={tag}&type={type}
Parameters:
| Parameter | Description |
|---|---|
| tag | Specific tag to count. Omit to return counts for all tags. |
| type | Filter by doc type, e.g. file |
Response (specific tag):
{"status": 200, "message": {"data": 150}}
Response (all tags):
{
"status": 200,
"message": {
"data": {"important": 150, "archive": 75, "delete": 23}
}
}
Get Tag Sizes
GET /{index}/tagsize
GET /{index}/tagsize?tag={tag}&type={type}
Total size in bytes of items with a given tag. Same parameters as tagcount.
Response:
{"status": 200, "message": {"data": 107374182400}}
List Tagged Items
GET /{index}/tags?tag={tag}&size={size}&page={page}
Returns documents matching a given tag. Omit tag to return untagged items.
Tag Directories
PUT /{index}/tagdirs
Apply or remove tags on directories. Pass an empty tags array to remove all tags.
Request body:
{
"dirs": ["/data/projects/2024", "/data/archive"],
"tags": ["archive", "reviewed"],
"recursive": "true",
"tagfiles": "true"
}
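| Field | Description |
|---|---|
| dirs | Array of directory paths |
| tags | Tags to apply. An empty array removes all tags. |
| recursive | Apply to subdirectories, passed as a string (e.g. "true") |
| tagfiles | Apply to files within the directories, passed as a string (e.g. "true") |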
Response:
{"status": 200, "message": "523 directory docs updated"}
Tag Files
PUT /{index}/tagfiles
Request body:
{
"files": ["/data/documents/report.pdf", "/data/documents/summary.doc"],
"tags": ["reviewed", "approved"]
}
Response:
{"status": 200, "message": "2 file docs updated"}
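Both tagging endpoints take a PUT with a JSON body. A minimal sketch of building those requests with urllib; the helper names are hypothetical, auth_header is any Basic or Bearer header value, and sending "false" for an unset flag is an assumption (the sample body only shows "true").

```python
import json
import urllib.request
from typing import List

BASE_URL = "https://diskover.example.com/api.php"  # placeholder host


def _put_json(path: str, body: dict, auth_header: str) -> urllib.request.Request:
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="PUT",
    )


def tagdirs_request(index: str, dirs: List[str], tags: List[str],
                    recursive: bool, tagfiles: bool,
                    auth_header: str) -> urllib.request.Request:
    """PUT /{index}/tagdirs; flags are sent as strings, as in the sample body."""
    body = {
        "dirs": dirs,
        "tags": tags,  # an empty list removes all tags
        "recursive": "true" if recursive else "false",
        "tagfiles": "true" if tagfiles else "false",
    }
    return _put_json(f"/{index}/tagdirs", body, auth_header)


def tagfiles_request(index: str, files: List[str], tags: List[str],
                     auth_header: str) -> urllib.request.Request:
    """PUT /{index}/tagfiles."""
    return _put_json(f"/{index}/tagfiles", {"files": files, "tags": tags}, auth_header)
```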
Task Management
List Tasks
GET /tasks
GET /tasks?worker={worker}&id={id}&name={name}&alltasks={bool}
| Parameter | Description |
|---|---|
| worker | Filter by worker name |
| id | Get a specific task by ID |
| name | Get a specific task by name |
| alltasks | Include disabled and running tasks |
Response:
{
"status": 200,
"message": {
"data": {
"tasks": [
{
"id": "abc123def456",
"type": "index",
"name": "Daily Scan - Shared Drive",
"crawl_paths": "/data/shared",
"run_min": "0",
"run_hour": "2",
"run_day_month": "*",
"run_month": "*",
"run_day_week": "*",
"last_status": "success",
"last_start_time": "2024-03-18T02:00:00Z",
"last_finish_time": "2024-03-18T03:45:00Z",
"disabled": false,
"assigned_worker": "worker-01"
}
]
}
}
}
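The run_* fields appear to line up with the five fields of a standard cron expression (minute, hour, day of month, month, day of week); that mapping is inferred from the field names and sample values rather than stated by the API. A small sketch with hypothetical helper names for displaying schedules and picking out tasks by last_status:

```python
from typing import List

# Assumed order, matching standard cron: min hour day-of-month month day-of-week
CRON_FIELDS = ("run_min", "run_hour", "run_day_month", "run_month", "run_day_week")


def cron_expression(task: dict) -> str:
    """Join a task's run_* fields into a five-field cron string."""
    return " ".join(str(task[field]) for field in CRON_FIELDS)


def tasks_with_status(tasks: List[dict], status: str) -> List[str]:
    """Names of tasks whose last_status equals status (e.g. 'failed')."""
    return [t["name"] for t in tasks if t.get("last_status") == status]
```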
Add Task
POST /addtask
Request body (index task):
{
"type": "index",
"name": "Weekly Full Scan",
"description": "Complete filesystem scan",
"crawl_paths": "/data/shared",
"run_min": "0",
"run_hour": "2",
"run_day_month": "*",
"run_month": "*",
"run_day_week": "0",
"auto_index_name": true,
"retries": 3,
"retry_delay": 300,
"timeout": 3600,
"email": "admin@company.com",
"assigned_worker": "any",
"disabled": false,
"run_now": false
}
Request body (custom task):
{
"type": "custom",
"name": "Cleanup Old Logs",
"run_command": "/usr/local/bin/cleanup.sh",
"run_command_args": "--days 30",
"run_min": "0",
"run_hour": "1",
"run_day_month": "*",
"run_month": "*",
"run_day_week": "*",
"retries": 1,
"retry_delay": 60,
"timeout": 1800,
"assigned_worker": "worker-01",
"disabled": false
}
Response:
{
"status": 200,
"message": {
"id": "xyz789ghi012",
"type": "index",
"name": "Weekly Full Scan"
}
}
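Going the other way, an /addtask body can be built from a cron-style string. A minimal sketch under the same assumed field mapping; index_task is a hypothetical helper, and any other documented field (retries, assigned_worker, and so on) is passed through as a keyword argument.

```python
from typing import Any, Dict

# Assumed order, matching standard cron: min hour day-of-month month day-of-week
CRON_FIELDS = ("run_min", "run_hour", "run_day_month", "run_month", "run_day_week")


def index_task(name: str, crawl_paths: str, cron: str, **extra: Any) -> Dict[str, Any]:
    """Build an /addtask body for an index task from a 5-field cron string."""
    parts = cron.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(parts)}: {cron!r}")
    body: Dict[str, Any] = {"type": "index", "name": name, "crawl_paths": crawl_paths}
    body.update(zip(CRON_FIELDS, parts))  # values stay strings, as in the samples
    body.update(extra)  # e.g. retries=3, assigned_worker="any"
    return body
```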
Update Task
PUT /updatetask
Request body:
{
"id": "abc123def456",
"status": "clear",
"disabled": false,
"run_hour": "3"
}
Valid status values: clear, starting, running, success, failed, completed
Response: {"status": 200, "message": "task updated"}
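Checking status against the documented values before sending avoids a round trip for an obvious mistake. A minimal sketch; update_task_body is a hypothetical helper, and other fields (such as disabled or run_hour) are passed through unchanged.

```python
from typing import Any, Dict, Optional

# Status values listed for /updatetask in this document.
VALID_STATUSES = {"clear", "starting", "running", "success", "failed", "completed"}


def update_task_body(task_id: str, status: Optional[str] = None,
                     **changes: Any) -> Dict[str, Any]:
    """Build a PUT /updatetask body, rejecting undocumented status values."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"invalid status {status!r}; expected one of {sorted(VALID_STATUSES)}")
    body: Dict[str, Any] = {"id": task_id, **changes}
    if status is not None:
        body["status"] = status
    return body
```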
Delete Task
DELETE /deletetask
Request body (by ID):
{"id": "abc123def456"}
Request body (by name):
{"name": "Weekly Full Scan"}
Response: {"status": 200, "message": "task deleted"}
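/deletetask takes its selector in a JSON body on a DELETE request. A request body on DELETE has no defined semantics in HTTP, and some clients or proxies drop it, so it is worth setting the method and body explicitly. A minimal urllib sketch with a hypothetical helper name:

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://diskover.example.com/api.php"  # placeholder host


def delete_task_request(auth_header: str, task_id: Optional[str] = None,
                        name: Optional[str] = None) -> urllib.request.Request:
    """DELETE /deletetask by ID or by name (exactly one must be given)."""
    if (task_id is None) == (name is None):
        raise ValueError("pass exactly one of task_id or name")
    body = {"id": task_id} if task_id is not None else {"name": name}
    return urllib.request.Request(
        f"{BASE_URL}/deletetask",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="DELETE",
    )
```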
Worker Management
List Workers
GET /workers
Response:
{
"status": 200,
"message": {
"data": [
{
"name": "worker-01",
"hostname": "server01.company.com",
"state": "idle",
"last_heartbeat": "2024-03-18T10:30:00Z",
"disabled": false
}
]
}
}
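last_heartbeat can be used to spot workers that have stopped reporting. A sketch with hypothetical names; the staleness threshold is an assumption, since this document does not state how often diskoverd sends heartbeats. Disabled workers are skipped.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def parse_ts(value: str) -> datetime:
    """Parse '2024-03-18T10:30:00Z'; fromisoformat accepts 'Z' only on 3.11+."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def stale_workers(workers: List[dict], max_age: timedelta,
                  now: Optional[datetime] = None) -> List[str]:
    """Names of enabled workers whose last heartbeat is older than max_age."""
    current = now if now is not None else datetime.now(timezone.utc)
    return [
        w["name"]
        for w in workers
        if not w.get("disabled") and current - parse_ts(w["last_heartbeat"]) > max_age
    ]
```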
Get Worker Info
GET /workerinfo?worker={name}
Get Worker for Index
GET /{index}/worker4index
Returns the worker that created a specific index.
Get Worker for Path
GET /worker4path?path={path}
Returns the appropriate worker for a given file path based on existing index coverage.
Update Worker
PUT /updateworker
Update worker registration data (typically called by the worker process itself).
Worker Heartbeat
PUT /heartbeat
Keep-alive signal from a worker. Called periodically by diskoverd.
Request body:
{"name": "worker-01"}
Claim Task
PUT /claimtask
Worker claims a task for execution.
Request body:
{"task_id": "abc123def456", "worker_name": "worker-01"}
Add Task Log Entry
PUT /tasklog
Worker posts a task execution record on completion.
Request body:
{
"task_id": "abc123def456",
"task_name": "Weekly Scan",
"task_type": "index",
"worker": "worker-01",
"start_time": "2024-03-18T02:00:00Z",
"finish_time": "2024-03-18T04:30:00Z",
"task_time": 9000,
"status": "finished",
"error": ""
}
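In the sample, task_time (9000) equals the number of seconds between start_time and finish_time, so it can be derived rather than tracked separately; that reading of the field is an inference from the sample. A minimal sketch with a hypothetical helper name:

```python
from datetime import datetime
from typing import Any, Dict


def _ts(value: str) -> datetime:
    # Accept the trailing 'Z' used in the samples on Python < 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def tasklog_entry(task: Dict[str, Any], worker: str, start: str, finish: str,
                  status: str, error: str = "") -> Dict[str, Any]:
    """Build a PUT /tasklog body; task_time is the duration in seconds."""
    return {
        "task_id": task["id"],
        "task_name": task["name"],
        "task_type": task["type"],
        "worker": worker,
        "start_time": start,
        "finish_time": finish,
        "task_time": int((_ts(finish) - _ts(start)).total_seconds()),
        "status": status,
        "error": error,
    }
```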
Admin API Endpoints
The Admin API is served by diskover-admin at /diskover_admin/api/. All endpoints require a JWT Bearer token obtained from the Main API's /generate_token endpoint.
# Get token
TOKEN=$(curl -s -u admin:password https://diskover.example.com/api.php/generate_token \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['message']['data']['access_token'])")
# Use token for admin API
curl -H "Authorization: Bearer $TOKEN" \
  https://diskover.example.com/diskover_admin/api/config/info
Note: The Admin API base path is /diskover_admin/api/, not /api.php. The host is the same.
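Since only the path differs, an Admin API URL can be derived from the Main API base. A minimal sketch (admin_url is a hypothetical name) that keeps the scheme and host and replaces the path; it assumes both APIs are served from the web root, as in the examples here, and would drop any path prefix in front of /api.php.

```python
from urllib.parse import urlsplit, urlunsplit


def admin_url(main_base: str, path: str) -> str:
    """Map a Main API base URL ending in /api.php to an Admin API URL."""
    parts = urlsplit(main_base)
    return urlunsplit((parts.scheme, parts.netloc,
                       "/diskover_admin/api/" + path.lstrip("/"), "", ""))
```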
System Info
Get System Info
GET /diskover_admin/api/config/info
Returns Diskover version, build info, and configuration metadata.
Configuration Scopes
Diskover stores all configuration as named scopes in the database. Each scope maps to a component's Pydantic config model.
List All Scopes
GET /diskover_admin/api/config/scopes
Returns the full scope tree. Plugins are nested under Plugins.Index, Plugins.Post Index, and Plugins.File Actions.
Response structure:
{
"Diskover": {
"Configurations": {"default": {...}},
"Elasticsearch": {...}
},
"Plugins": {
"Index": {"mediainfo": {...}, "checksums": {...}},
"Post Index": {"autotag": {...}},
"File Actions": {"pdf": {...}}
},
"Web": {
"General": {...},
"Elasticsearch": {...}
}
}
Get Scope Config
GET /diskover_admin/api/config/scopes/{scope}
Returns the current configuration for a specific scope. The scope string must be URL-encoded; for example, Plugins.Post Index.autotag.Default becomes Plugins.Post%20Index.autotag.Default.
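Encoding matters for scope names that contain spaces, such as those under Post Index or File Actions in the scope tree. A minimal sketch; ADMIN_BASE is the placeholder host from this document, scope_url is a hypothetical helper, and the scope names in the test are illustrative.

```python
from urllib.parse import quote

ADMIN_BASE = "https://diskover.example.com/diskover_admin/api"  # placeholder host


def scope_url(scope: str) -> str:
    """URL for GET/POST/DELETE config/scopes/{scope}, percent-encoded.

    safe='' also escapes '/', so a scope name cannot add path segments;
    dots are left as-is because quote() never escapes them.
    """
    return f"{ADMIN_BASE}/config/scopes/{quote(scope, safe='')}"
```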
Create / Update Scope Config
POST /diskover_admin/api/config/scopes/{scope}
Create a new named configuration or update an existing one.
Delete Scope Config
DELETE /diskover_admin/api/config/scopes/{scope}
Tasks (Admin API)
The Admin API provides direct task management independent of the worker queue.
List Tasks
GET /diskover_admin/api/config/tasks
Get Task
GET /diskover_admin/api/config/tasks/{task_id}
Update Task
POST /diskover_admin/api/config/tasks/{task_id}
Task Log (Admin API)
Get Task Log
GET /diskover_admin/api/config/tasklog
Add Log Entry
POST /diskover_admin/api/config/tasklog
Clear Task Log
DELETE /diskover_admin/api/config/tasklog
Workers (Admin API)
List Workers
GET /diskover_admin/api/config/workers
Get Worker
GET /diskover_admin/api/config/workers/{name}
Update Worker
POST /diskover_admin/api/config/workers/{name}
Templates (Admin API)
Task templates allow saving a task configuration for reuse.
List Templates
GET /diskover_admin/api/config/templates
Get Template
GET /diskover_admin/api/config/templates/{id}
Create / Update Template
POST /diskover_admin/api/config/templates/{id}
Delete Template
DELETE /diskover_admin/api/config/templates/{id}
Resource Discovery (Admin API)
List Active File Actions
GET /diskover_admin/api/config/fileactions?active=true
Response:
{
"fileactions": [
{"name": "pdf"},
{"name": "fixperms"},
{"name": "liveview"}
]
}
List Scanners
GET /diskover_admin/api/config/scanners
Response:
{
"scanners": ["scandir_s3", "scandir_azure", "scandir_powerscale"]
}
Code Examples
Python
import requests


class DiskoverAPI:
    TIMEOUT = 30  # seconds; requests waits indefinitely without a timeout

    def __init__(self, base_url, username, password):
        # Main API base, e.g. https://diskover.example.com/api.php
        self.base_url = base_url.rstrip('/')
        # Admin API is on the same host under /diskover_admin/api/
        self.admin_base = self.base_url.replace('/api.php', '')
        self.auth = (username, password)
        self.token = None

    def generate_token(self):
        # Basic Auth -> JWT access token (required for the Admin API)
        r = requests.get(f'{self.base_url}/generate_token', auth=self.auth,
                         timeout=self.TIMEOUT)
        r.raise_for_status()
        self.token = r.json()['message']['data']['access_token']
        return self.token

    def _bearer_headers(self):
        if self.token is None:
            self.generate_token()
        return {'Authorization': f'Bearer {self.token}'}

    def search(self, index, query, size=100, page=1):
        r = requests.get(
            f'{self.base_url}/{index}/search',
            auth=self.auth,
            params={'query': query, 'size': size, 'page': page},
            timeout=self.TIMEOUT,
        )
        r.raise_for_status()
        return r.json()

    def tag_files(self, index, files, tags):
        r = requests.put(
            f'{self.base_url}/{index}/tagfiles',
            auth=self.auth,
            json={'files': files, 'tags': tags},
            timeout=self.TIMEOUT,
        )
        r.raise_for_status()
        return r.json()

    def get_admin_config_info(self):
        r = requests.get(
            f'{self.admin_base}/diskover_admin/api/config/info',
            headers=self._bearer_headers(),
            timeout=self.TIMEOUT,
        )
        r.raise_for_status()
        return r.json()


# Usage
api = DiskoverAPI('https://diskover.example.com/api.php', 'admin', 'password')

# Generate token (needed for admin API)
api.generate_token()

# Search for large PDFs
results = api.search('diskover-prod-240318', '*.pdf AND size:>10485760', size=50)
print(f"Found {results['message']['totalhits']} large PDFs")

# Tag files
api.tag_files('diskover-prod-240318',
              ['/data/docs/report.pdf'],
              ['reviewed'])

# Admin API
info = api.get_admin_config_info()
print(info)
curl — Common Operations
BASE="https://diskover.example.com/api.php"
CREDS="admin:password"  # not USER, which the shell already sets to your login name
INDEX="diskover-prod-240318"
# List indices
curl -u "$CREDS" "$BASE/list"
# Search
curl -u "$CREDS" "$BASE/$INDEX/search?query=*.mp4%20AND%20size:>1073741824"
# Get metrics by owner
curl -u "$CREDS" "$BASE/$INDEX/metrics?field=owner&size=20"
# Tag a directory recursively
curl -u "$CREDS" -X PUT "$BASE/$INDEX/tagdirs" \
  -H "Content-Type: application/json" \
  -d '{"dirs":["/data/archive/2022"],"tags":["archive"],"recursive":"true","tagfiles":"true"}'
# Remove tags
curl -u "$CREDS" -X PUT "$BASE/$INDEX/tagdirs" \
  -H "Content-Type: application/json" \
  -d '{"dirs":["/data/archive/2022"],"tags":[],"recursive":"true","tagfiles":"true"}'
# Admin API: get system info (requires Bearer token)
TOKEN=$(curl -s -u "$CREDS" "$BASE/generate_token" \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['message']['data']['access_token'])")
curl -H "Authorization: Bearer $TOKEN" \
  "https://diskover.example.com/diskover_admin/api/config/info"