fix(worker): add metrics HTTP server and health checks for Kubernetes (#7860)
* feat(worker): add metrics HTTP server and debug profiling support
  - Add -metricsPort flag to enable Prometheus metrics endpoint
  - Add -metricsIp flag to configure metrics server bind address
  - Implement /metrics endpoint for Prometheus-compatible metrics
  - Implement /health endpoint for Kubernetes readiness/liveness probes
  - Add -debug flag to enable pprof debugging server
  - Add -debug.port flag to configure debug server port
  - Fix stats package import naming conflict by using alias
  - Update usage examples to show new flags

  Fixes #7843

* feat(helm): add worker metrics and health check support
  - Update worker readiness probe to use httpGet on /health endpoint
  - Update worker liveness probe to use httpGet on /health endpoint
  - Add metricsPort flag to worker command in deployment template
  - Support both httpGet and tcpSocket probe types for backward compatibility
  - Update values.yaml with health check configuration

  This enables Kubernetes pod lifecycle management for worker components
  through proper health checks on the new metrics HTTP endpoint.

* feat(mini): align all services to share single debug and metrics servers
  - Disable S3's separate debug server in mini mode (port 6060 now shared by all)
  - Add metrics server startup to embedded worker for health monitoring
  - All services now share the single metrics port (9327) and single debug port (6060)
  - Consistent pattern with master, filer, volume, webdav services

* fix(worker): fix variable shadowing in health check handler
  - Rename http.ResponseWriter parameter from 'w' to 'rw' to avoid shadowing the outer 'w *worker.Worker' parameter
  - Prevents potential bugs if future code tries to use worker state in handler
  - Improves code clarity and follows Go best practices

* fix(worker): remove unused worker parameter in metrics server
  - Change 'w *worker.Worker' parameter to '_' as it's not used
  - Clarifies intent that parameter is intentionally unused
  - Follows Go best practices and improves code clarity

* fix(helm): fix trailing backslash syntax errors in worker command
  - Fix conditional backslash placement to prevent shell syntax errors
  - Only add backslash when metricsPort OR extraArgs are present
  - Prevents worker pod startup failures due to malformed command arguments
  - Ensures proper shell command parsing regardless of configuration state

* refactor(worker): use standard stats.StartMetricsServer for consistency
  - Replace custom metrics server implementation with stats.StartMetricsServer to match pattern used in master, volume, s3, filer_sync components
  - Simplifies code and improves maintainability
  - Uses glog.Fatal for errors (consistent with other SeaweedFS components)
  - Remove unused net/http and prometheus/promhttp imports
  - Automatically provides /metrics and /health endpoints via standard implementation
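The shadowing fix described in the message above can be illustrated with a minimal sketch. It is not the project's handler: the `Worker` type, the `X-Worker-ID` header, the `probe` helper, and the `200 OK` response body are all assumptions made for illustration. It only shows why naming the `http.ResponseWriter` parameter `rw` keeps an outer `w` (the worker) reachable inside a `/health` handler.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Worker is a hypothetical stand-in for the real worker type.
type Worker struct{ ID string }

// newHealthMux returns a mux that serves /health.
// The ResponseWriter is named rw so it does not shadow the outer w *Worker.
func newHealthMux(w *Worker) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(rw http.ResponseWriter, r *http.Request) {
		// The outer w is still accessible here because rw has its own name.
		rw.Header().Set("X-Worker-ID", w.ID) // hypothetical header
		rw.WriteHeader(http.StatusOK)
		fmt.Fprintln(rw, "OK")
	})
	return mux
}

// probe performs an in-memory GET against the handler, standing in for
// the request a Kubernetes httpGet probe would send over the network.
func probe(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := probe(newHealthMux(&Worker{ID: "worker-1"}), "/health")
	fmt.Printf("%d %q\n", code, body)
}
```

Had the handler parameter been named `w`, the expression `w.ID` inside the closure would refer to the `http.ResponseWriter` and fail to compile, which is the class of mistake the rename guards against.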
@@ -233,8 +233,9 @@ func initMiniS3Flags() {
 	miniS3Options.iamConfig = miniIamConfig
 	miniS3Options.auditLogConfig = cmdMini.Flag.String("s3.auditLogConfig", "", "path to the audit log config file")
 	miniS3Options.allowDeleteBucketNotEmpty = miniS3AllowDeleteBucketNotEmpty
-	miniS3Options.debug = cmdMini.Flag.Bool("s3.debug", false, "serves runtime profiling data via pprof")
-	miniS3Options.debugPort = cmdMini.Flag.Int("s3.debug.port", 6060, "http port for debugging")
+	// In mini mode, S3 uses the shared debug server started at line 681, not its own separate debug server
+	miniS3Options.debug = new(bool) // explicitly false
+	miniS3Options.debugPort = cmdMini.Flag.Int("s3.debug.port", 6060, "http port for debugging (unused in mini mode)")
 }
 
 // initMiniWebDAVFlags initializes WebDAV server flag options
@@ -1060,6 +1061,10 @@ func startMiniWorker() {
 	// Set admin client
 	workerInstance.SetAdminClient(adminClient)
 
+	// Start metrics server for health checks and monitoring (uses shared metrics port like other services)
+	// This allows Kubernetes probes to check worker health via /health endpoint
+	go stats_collect.StartMetricsServer(*miniMetricsHttpIp, *miniMetricsHttpPort)
+
 	// Start the worker
 	err = workerInstance.Start()
 	if err != nil {