SEAWEEDFS - helm chart (2.x+)
Add the helm repo
helm repo add seaweedfs https://seaweedfs.github.io/seaweedfs/helm
Install the helm chart
helm install seaweedfs seaweedfs/seaweedfs
(Recommended) Provide values.yaml
helm install --values=values.yaml seaweedfs seaweedfs/seaweedfs
Info:
- master/filer/volume are stateful sets with anti-affinity on the hostname, so your deployment will be spread out and highly available.
- the chart can use memsql (MySQL-compatible) as the filer backend to enable HA (multiple filer instances) and the backup/HA capabilities memsql provides.
- the mysql user/password are created in a k8s secret (default: <release>-seaweedfs-db-secret) and injected into the filer via environment variables.
- cert config exists and can be enabled, but it has not been tested; it requires cert-manager to be installed.
Prerequisites
Database
leveldb is the default database; it supports multiple filer replicas that sync automatically, with some limitations.
When those limitations apply, or for a large number of filer replicas, an external datastore is recommended, such as a MySQL-compatible database, as specified in values.yaml at filer.extraEnvironmentVars.
This database should be pre-configured and initialized. If using the default db-init-config, the configmap name is now dynamic (e.g., <release>-seaweedfs-db-init-config). You can override this name via filer.dbInitConfigName.
To initialize manually:
CREATE TABLE IF NOT EXISTS `filemeta` (
`dirhash` BIGINT NOT NULL COMMENT 'first 64 bits of MD5 hash value of directory field',
`name` VARCHAR(766) NOT NULL COMMENT 'directory or file name',
`directory` TEXT NOT NULL COMMENT 'full path to parent directory',
`meta` LONGBLOB,
PRIMARY KEY (`dirhash`, `name`)
) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
An alternative database can also be configured (e.g. leveldb, postgres) by following the instructions at filer.extraEnvironmentVars.
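As a hedged sketch of what that configuration can look like for a MySQL-compatible backend (the WEED_MYSQL_* variable names mirror SeaweedFS filer.toml options, but verify them against your chart version's values.yaml; in practice the username/password are usually injected from the db secret mentioned above):
filer:
  extraEnvironmentVars:
    WEED_MYSQL_ENABLED: "true"
    WEED_MYSQL_HOSTNAME: "mysql.database.svc"  # placeholder hostname
    WEED_MYSQL_PORT: "3306"
    WEED_MYSQL_DATABASE: "sw_database"
    WEED_MYSQL_USERNAME: "filer"               # typically sourced from the db secret instead
    WEED_MYSQL_PASSWORD: "secret"              # typically sourced from the db secret instead
    WEED_LEVELDB2_ENABLED: "false"             # disable the default leveldb store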
Node Labels
Kubernetes nodes can have labels which help define which node (host) will run which pod. Here is an example:
- s3/filer/master need the label sw-backend=true
- volume needs the label sw-volume=true
To label a node so it can run all pod types in k8s:
kubectl label node YOUR_NODE_NAME sw-volume=true sw-backend=true
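To confirm the labels took effect, kubectl can print them as columns (-L is kubectl's standard label-columns flag):
kubectl get nodes -L sw-volume -L sw-backend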
On a production k8s deployment you will want each pod to run on a different host, especially the volume servers and the masters; all pods (master/volume/filer) should have anti-affinity rules to disallow running multiple component pods on the same host.
If you still want to run multiple pods of the same component (master/volume/filer) on the same host, set/update the corresponding affinity rule in values.yaml to an empty one:
affinity: ""
PVC - storage class
The volume stateful set supports k8s PVCs; the current example uses the simple local-path-provisioner from Rancher (comes included with k3d / k3s): https://github.com/rancher/local-path-provisioner
You can use ANY storage class you like, just set the correct storage class for your deployment (see the sketch below).
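As a sketch only, assuming the volume data block uses the same type/size/storageClass keys as the admin data block shown later in this README (verify against your chart version's values.yaml):
volume:
  data:
    type: "persistentVolumeClaim"
    size: "100Gi"               # placeholder size
    storageClass: "local-path"  # the Rancher local-path provisioner's class name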
Current instance config (AIO): 1 instance for each type (master/filer+s3/volume).
You can update the replica count for each node type in values.yaml; you will need to add more nodes with the corresponding labels, if applicable.
Most of the configuration is available through values.yaml. Any pull requests to expand functionality or usability are greatly appreciated; all pull requests must pass chart-testing.
S3 configuration
To enable an s3 endpoint for your filer with a default install, add the following to your values.yaml:
filer:
  s3:
    enabled: true
Enabling Authentication to S3
To enable authentication for S3, you have two options:
- let the helm chart create an admin user as well as a read-only user
- provide your own s3 config.json file via an existing Kubernetes Secret
Use the default credentials for S3
Example parameters for your values.yaml:
filer:
  s3:
    enabled: true
    enableAuth: true
Provide your own credentials for S3
Example parameters for your values.yaml:
filer:
  s3:
    enabled: true
    enableAuth: true
    existingConfigSecret: my-s3-secret
Example of an existing secret with your s3 config that creates an admin user and a read-only user, both with credentials:
---
# Source: seaweedfs/templates/seaweedfs-s3-secret.yaml
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: my-s3-secret
  namespace: seaweedfs
  labels:
    app.kubernetes.io/name: seaweedfs
    app.kubernetes.io/component: s3
stringData:
  # this key must be an inline json config file
  seaweedfs_s3_config: '{"identities":[{"name":"anvAdmin","credentials":[{"accessKey":"snu8yoP6QAlY0ne4","secretKey":"PNzBcmeLNEdR0oviwm04NQAicOrDH1Km"}],"actions":["Admin","Read","Write"]},{"name":"anvReadOnly","credentials":[{"accessKey":"SCigFee6c5lbi04A","secretKey":"kgFhbT38R8WUYVtiFQ1OiSVOrYr3NKku"}],"actions":["Read"]}]}'
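With the endpoint and credentials in place, a quick smoke test with the AWS CLI might look like this; the service name and the default S3 port 8333 are assumptions, so adjust them to your release:
# port-forward the S3 service locally (service name is an assumption)
kubectl port-forward svc/seaweedfs-s3 8333:8333
# list buckets using the admin credentials from the example secret above
AWS_ACCESS_KEY_ID=anvAdmin \
AWS_SECRET_ACCESS_KEY=PNzBcmeLNEdR0oviwm04NQAicOrDH1Km \
aws --endpoint-url http://localhost:8333 s3 ls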
Admin Component
The admin component provides a modern web-based administration interface for managing SeaweedFS clusters. It includes:
- Dashboard: Real-time cluster status and metrics
- Volume Management: Monitor volume servers, capacity, and health
- File Browser: Browse and manage files in the filer
- Maintenance Operations: Trigger maintenance tasks via workers
- Object Store Management: Create and manage buckets with web interface
Enabling Admin
To enable the admin interface, add the following to your values.yaml:
admin:
  enabled: true
  port: 23646
  grpcPort: 33646 # For worker connections
  adminUser: "admin"
  adminPassword: "your-secure-password" # Leave empty to disable auth
  # Optional: persist admin data
  data:
    type: "persistentVolumeClaim"
    size: "10Gi"
    storageClass: "your-storage-class"
  # Optional: enable ingress
  ingress:
    enabled: true
    host: "admin.seaweedfs.local"
    className: "nginx"
The admin interface will be available at http://<admin-service>:23646 (or via ingress). Workers connect to the admin server via gRPC on port 33646.
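Without an ingress, a standard port-forward reaches the UI; the service name below assumes the <release-name>-admin convention described in the Worker section:
kubectl port-forward svc/seaweedfs-admin 23646:23646
# then open http://localhost:23646 in a browser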
Admin Authentication
If adminPassword is set, the admin interface requires authentication:
- Username: value of adminUser (default: admin)
- Password: value of adminPassword
If adminPassword is empty or not set, the admin interface runs without authentication (not recommended for production).
Admin Data Persistence
The admin component can store configuration and maintenance data. You can configure storage in several ways (a sketch follows the list):
- emptyDir (default): Data is lost when pod restarts
- persistentVolumeClaim: Data persists across pod restarts
- hostPath: Data stored on the host filesystem
- existingClaim: Use an existing PVC
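A minimal values.yaml sketch for the existingClaim option, assuming it takes a claimName key like the worker data block below (the claim name is a placeholder):
admin:
  enabled: true
  data:
    type: "existingClaim"
    claimName: "admin-data-pvc" # pre-provisioned PVC, placeholder name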
Worker Component
Workers are maintenance agents that execute cluster maintenance tasks such as vacuum, volume balancing, and erasure coding. Workers connect to the admin server via gRPC and receive task assignments.
Enabling Workers
To enable workers, add the following to your values.yaml:
worker:
  enabled: true
  replicas: 2 # Scale based on workload
  jobType: "vacuum,volume_balance,erasure_coding" # Job types this worker can handle
  maxDetect: 1 # Maximum concurrent detection requests
  maxExecute: 4 # Maximum concurrent execution jobs per worker
  # Working directory for task execution
  # Default: "/tmp/seaweedfs-worker"
  # Note: /tmp is ephemeral - use persistent storage (hostPath/existingClaim) for long-running tasks
  workingDir: "/tmp/seaweedfs-worker"
  # Optional: configure admin server address
  # If not specified, auto-discovers from admin service in the same namespace by looking for
  # a service named "<release-name>-admin" (e.g., "seaweedfs-admin").
  # Auto-discovery only works if the admin is in the same namespace and same Helm release.
  # For cross-namespace or separate release scenarios, explicitly set this value.
  # Example: If main SeaweedFS is deployed in "production" namespace:
  # adminServer: "seaweedfs-admin.production.svc:33646"
  adminServer: ""
  # Workers need storage for task execution
  # Note: Workers use a Deployment, which does not support `volumeClaimTemplates`
  # for dynamic PVC creation per pod. To use persistent storage, you must
  # pre-provision a PersistentVolumeClaim and use `type: "existingClaim"`.
  data:
    type: "emptyDir" # Options: "emptyDir", "hostPath", or "existingClaim"
    hostPathPrefix: /storage # For hostPath
    # claimName: "worker-pvc" # For existingClaim with pre-provisioned PVC
  # Resource limits for worker pods
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      cpu: "2"
      memory: "2Gi"
Worker Job Types
Workers can be configured with different job types:
- vacuum: Reclaim deleted file space
- volume_balance: Balance volumes across volume servers
- erasure_coding: Handle erasure coding operations
You can configure workers with all job types or create specialized worker pools with specific job types.
Worker Deployment Strategy
For production deployments, consider:
- Multiple Workers: Deploy 2+ worker replicas for high availability
- Resource Allocation: Workers need sufficient CPU/memory for maintenance tasks
- Storage: Workers need temporary storage for vacuum and balance operations (size depends on volume size)
- Specialized Workers: Create separate worker deployments for different job types if needed
For specialized worker pools, deploy separate Helm releases with different job types:
values-worker-vacuum.yaml (for vacuum operations):
# Disable all other components, enable only workers
master:
  enabled: false
volume:
  enabled: false
filer:
  enabled: false
s3:
  enabled: false
admin:
  enabled: false
worker:
  enabled: true
  replicas: 2
  jobType: "vacuum"
  maxExecute: 2
  # REQUIRED: Point to the admin service of your main SeaweedFS release
  # Replace <namespace> with the namespace where your main seaweedfs is deployed
  # Example: If deploying in namespace "production":
  # adminServer: "seaweedfs-admin.production.svc:33646"
  adminServer: "seaweedfs-admin.<namespace>.svc:33646"
values-worker-balance.yaml (for balance operations):
# Disable all other components, enable only workers
master:
  enabled: false
volume:
  enabled: false
filer:
  enabled: false
s3:
  enabled: false
admin:
  enabled: false
worker:
  enabled: true
  replicas: 1
  jobType: "volume_balance"
  maxExecute: 1
  # REQUIRED: Point to the admin service of your main SeaweedFS release
  # Replace <namespace> with the namespace where your main seaweedfs is deployed
  # Example: If deploying in namespace "production":
  # adminServer: "seaweedfs-admin.production.svc:33646"
  adminServer: "seaweedfs-admin.<namespace>.svc:33646"
Deploy the specialized workers as separate releases:
# Deploy vacuum workers
helm install seaweedfs-worker-vacuum seaweedfs/seaweedfs -f values-worker-vacuum.yaml
# Deploy balance workers
helm install seaweedfs-worker-balance seaweedfs/seaweedfs -f values-worker-balance.yaml
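After deploying, it is worth confirming that the worker pods are running and connected; the label selector below is an assumption, so check your rendered manifests for the exact labels:
# check the worker pods are Running
kubectl get pods -l app.kubernetes.io/component=worker
# inspect a worker pod's logs for a successful connection to the admin server
kubectl logs <worker-pod-name>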
Enterprise
For enterprise users, please visit seaweedfs.com for the SeaweedFS Enterprise Edition, which has a self-healing storage format with better data protection.