seaweedFS

Author	SHA1	Message	Date
Chris Lu	076d504044	fix(admin): reduce memory usage and verbose logging for large clusters (#8927 ) * fix(admin): reduce memory usage and verbose logging for large clusters (#8919) The admin server used excessive memory and produced thousands of log lines on clusters with many volumes (e.g., 33k volumes). Three root causes: 1. Scanner duplicated all volume metrics: getVolumeHealthMetrics() created VolumeHealthMetrics objects, then convertToTaskMetrics() copied them all into identical types.VolumeHealthMetrics. Now uses the task-system type directly, eliminating the duplicate allocation and removing convertToTaskMetrics. 2. All previous task states loaded at startup: LoadTasksFromPersistence read and deserialized every .pb file from disk, logging each one. With thousands of balance tasks persisted, this caused massive startup I/O, memory usage, and log noise (including unguarded DEBUG glog.Infof per task). Now starts with an empty queue — the scanner re-detects current needs from live cluster state. Terminal tasks are purged from memory and disk when new scan results arrive. 3. Verbose per-volume/per-node logging: V(2) and V(3) logs produced thousands of lines per scan. Per-volume logs bumped to V(4), per-node/rack/disk logs bumped to V(3). Topology summary now logs counts instead of full node ID arrays. Also removes lastTopologyInfo field from MaintenanceScanner — the raw protobuf topology is returned as a local value and not retained between 30-minute scans. * fix(admin): delete stale task files at startup, add DeleteAllTaskStates Old task .pb files from previous runs were left on disk. The periodic CleanupCompletedTasks still loads all files to find completed ones — the same expensive 4GB path from the pprof profile. Now at startup, DeleteAllTaskStates removes all .pb files by scanning the directory without reading or deserializing them. The scanner will re-detect any tasks still needed from live cluster state. * fix(admin): don't persist terminal tasks to disk CompleteTask was saving failed/completed tasks to disk where they'd accumulate. The periodic cleanup only triggered for completed tasks, not failed ones. Now terminal tasks are deleted from disk immediately and only kept in memory for the current session's UI. * fix(admin): cap in-memory tasks to 100 per job type Without a limit, the task map grows unbounded — balance could create thousands of pending tasks for a cluster with many imbalanced volumes. Now AddTask rejects new tasks when a job type already has 100 in the queue. The scanner will re-detect skipped volumes on the next scan. * fix(admin): address PR review - memory-only purge, active-only capacity - purgeTerminalTasks now only cleans in-memory map (terminal tasks are already deleted from disk by CompleteTask) - Per-type capacity limit counts only active tasks (pending/assigned/ in_progress), not terminal ones - When at capacity, purge terminal tasks first before rejecting * fix(admin): fix orphaned comment, add TaskStatusCancelled to terminal switch - Move hasQueuedOrActiveTaskForVolume comment to its function definition - Add TaskStatusCancelled to the terminal state switch in CompleteTask so cancelled task files are deleted from disk	2026-04-04 18:45:57 -07:00
Chris Lu	d37b592bc4	Update object_store_users_templ.go	2026-04-04 11:52:57 -07:00
Chris Lu	d1823d3784	fix(s3): include static identities in listing operations (#8903 ) * fix(s3): include static identities in listing operations Static identities loaded from -s3.config file were only stored in the S3 API server's in-memory state. Listing operations (s3.configure shell command, aws iam list-users) queried the credential manager which only returned dynamic identities from the backend store. Register static identities with the credential manager after loading so they are included in LoadConfiguration and ListUsers results, and filtered out before SaveConfiguration to avoid persisting them to the dynamic store. Fixes https://github.com/seaweedfs/seaweedfs/discussions/8896 * fix: avoid mutating caller's config and defensive copies - SaveConfiguration: use shallow struct copy instead of mutating the caller's config.Identities field - SetStaticIdentities: skip nil entries to avoid panics - GetStaticIdentities: defensively copy PolicyNames slice to avoid aliasing the original * fix: filter nil static identities and sync on config reload - SetStaticIdentities: filter nil entries from the stored slice (not just from staticNames) to prevent panics in LoadConfiguration/ListUsers - Extract updateCredentialManagerStaticIdentities helper and call it from both startup and the grace.OnReload handler so the credential manager's static snapshot stays current after config file reloads * fix: add mutex for static identity fields and fix ListUsers for store callers - Add sync.RWMutex to protect staticIdentities/staticNames against concurrent reads during config reload - Revert CredentialManager.ListUsers to return only store users, since internal callers (e.g. DeletePolicy) look up each user in the store and fail on non-existent static entries - Merge static usernames in the filer gRPC ListUsers handler instead, via the new GetStaticUsernames method - Fix CI: TestIAMPolicyManagement/managed_policy_crud_lifecycle was failing because DeletePolicy iterated static users that don't exist in the store * fix: show static identities in admin UI and weed shell The admin UI and weed shell s3.configure command query the filer's credential manager via gRPC, which is a separate instance from the S3 server's credential manager. Static identities were only registered on the S3 server's credential manager, so they never appeared in the filer's responses. - Add CredentialManager.LoadS3ConfigFile to parse a static S3 config file and register its identities - Add FilerOptions.s3ConfigFile so the filer can load the same static config that the S3 server uses - Wire s3ConfigFile through in weed mini and weed server modes - Merge static usernames in filer gRPC ListUsers handler - Add CredentialManager.GetStaticUsernames helper - Add sync.RWMutex to protect concurrent access to static identity fields - Avoid importing weed/filer from weed/credential (which pulled in filer store init() registrations and broke test isolation) - Add docker/compose/s3_static_users_example.json * fix(admin): make static users read-only in admin UI Static users loaded from the -s3.config file should not be editable or deletable through the admin UI since they are managed via the config file. - Add IsStatic field to ObjectStoreUser, set from credential manager - Hide edit, delete, and access key buttons for static users in the users table template - Show a "static" badge next to static user names - Return 403 Forbidden from UpdateUser and DeleteUser API handlers when the target user is a static identity * fix(admin): show details for static users GetObjectStoreUserDetails called credentialManager.GetUser which only queries the dynamic store. For static users this returned ErrUserNotFound. Fall back to GetStaticIdentity when the store lookup fails. * fix(admin): load static S3 identities in admin server The admin server has its own credential manager (gRPC store) which is a separate instance from the S3 server's and filer's. It had no static identity data, so IsStaticIdentity returned false (edit/delete buttons shown) and GetStaticIdentity returned nil (details page failed). Pass the -s3.config file path through to the admin server and call LoadS3ConfigFile on its credential manager, matching the approach used for the filer. * fix: use protobuf is_static field instead of passing config file path The previous approach passed -s3.config file path to every component (filer, admin). This is wrong because the admin server should not need to know about S3 config files. Instead, add an is_static field to the Identity protobuf message. The field is set when static identities are serialized (in GetStaticIdentities and LoadS3ConfigFile). Any gRPC client that loads configuration via GetConfiguration automatically sees which identities are static, without needing the config file. - Add is_static field (tag 8) to iam_pb.Identity proto message - Set IsStatic=true in GetStaticIdentities and LoadS3ConfigFile - Admin GetObjectStoreUsers reads identity.IsStatic from proto - Admin IsStaticUser helper loads config via gRPC to check the flag - Filer GetUser gRPC handler falls back to GetStaticIdentity - Remove s3ConfigFile from AdminOptions and NewAdminServer signature	2026-04-03 20:01:28 -07:00
Chris Lu	995dfc4d5d	chore: remove ~50k lines of unreachable dead code (#8913 ) * chore: remove unreachable dead code across the codebase Remove ~50,000 lines of unreachable code identified by static analysis. Major removals: - weed/filer/redis_lua: entire unused Redis Lua filer store implementation - weed/wdclient/net2, resource_pool: unused connection/resource pool packages - weed/plugin/worker/lifecycle: unused lifecycle plugin worker - weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy, multipart IAM, key rotation, and various SSE helper functions - weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions - weed/mq/offset: unused SQL storage and migration code - weed/worker: unused registry, task, and monitoring functions - weed/query: unused SQL engine, parquet scanner, and type functions - weed/shell: unused EC proportional rebalance functions - weed/storage/erasure_coding/distribution: unused distribution analysis functions - Individual unreachable functions removed from 150+ files across admin, credential, filer, iam, kms, mount, mq, operation, pb, s3api, server, shell, storage, topology, and util packages * fix(s3): reset shared memory store in IAM test to prevent flaky failure TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because the MemoryStore credential backend is a singleton registered via init(). Earlier tests that create anonymous identities pollute the shared store, causing LookupAnonymous() to unexpectedly return true. Fix by calling Reset() on the memory store before the test runs. * style: run gofmt on changed files * fix: restore KMS functions used by integration tests * fix(plugin): prevent panic on send to closed worker session channel The Plugin.sendToWorker method could panic with "send on closed channel" when a worker disconnected while a message was being sent. The race was between streamSession.close() closing the outgoing channel and sendToWorker writing to it concurrently. Add a done channel to streamSession that is closed before the outgoing channel, and check it in sendToWorker's select to safely detect closed sessions without panicking.	2026-04-03 16:04:27 -07:00
Chris Lu	2e98902f29	fix(s3): use URL-safe secret keys for dashboard users and service accounts (#8902 ) * fix(s3): use URL-safe secret keys for admin dashboard users and service accounts The dashboard's generateSecretKey() used base64.StdEncoding which produces +, /, and = characters that break S3 signature authentication. Reuse the IAM package's GenerateSecretAccessKey() which was already fixed in #7990. Fixes #8898 * fix: handle error from GenerateSecretAccessKey instead of ignoring it	2026-04-03 11:20:28 -07:00
Chris Lu	888c32cbde	fix(admin): respect urlPrefix in S3 bucket and S3Tables navigation links (#8885 ) * fix(admin): respect urlPrefix in S3 bucket and S3Tables navigation links (#8884) Several admin UI templates used hardcoded URLs (templ.SafeURL) instead of dash.PUrl(ctx, ...) for navigation links, causing 404 errors when the admin is deployed with --urlPrefix. Fixed in: s3_buckets.templ, s3tables_buckets.templ, s3tables_tables.templ * fix(admin): URL-escape bucketName in S3Tables navigation links Add url.PathEscape(bucketName) for consistency and correctness in s3tables_tables.templ (back-to-namespaces link) and s3tables_buckets.templ (namespace link), matching the escaping already used in the table details link.	2026-04-02 11:54:19 -07:00
Chris Lu	44d5cb8f90	Fix Admin UI master list showing gRPC port instead of HTTP port (#8869 ) * Fix Admin UI master list showing gRPC port instead of HTTP port for followers (#8867) Raft stores server addresses as gRPC addresses. The Admin UI was using these addresses directly via ToHttpAddress(), which cannot extract the HTTP port from a plain gRPC address. Use GrpcAddressToServerAddress() to properly convert gRPC addresses back to HTTP addresses. * Use httpAddress consistently as masterMap key Address review feedback: masterInfo.Address (HTTP form) was already computed but the raw address was used as the map key, causing potential key mismatches between topology and raft data.	2026-04-01 09:43:50 -07:00
Lars Lehtonen	c1acf9e479	Prune unused functions from weed/admin/dash. (#8871 ) * chore(weed/admin/dash): prune unused functions * chore(weed/admin/dash): prune test-only function	2026-04-01 09:22:49 -07:00
Chris Lu	2eaf98a7a2	Use Unix sockets for gRPC in mini mode (#8856 ) * Use Unix sockets for gRPC between co-located services in mini mode In `weed mini`, all services run in one process. Previously, inter-service gRPC traffic (volume↔master, filer↔master, S3↔filer, worker↔admin, etc.) went through TCP loopback. This adds a gRPC Unix socket registry in the pb package: mini mode registers a socket path per gRPC port at startup, each gRPC server additionally listens on its socket, and GrpcDial transparently routes to the socket via WithContextDialer when a match is found. Standalone commands (weed master, weed filer, etc.) are unaffected since no sockets are registered. TCP listeners are kept for external clients. * Handle Serve error and clean up socket file in ServeGrpcOnLocalSocket Log non-expected errors from grpcServer.Serve (ignoring grpc.ErrServerStopped) and always remove the Unix socket file when Serve returns, ensuring cleanup on Stop/GracefulStop.	2026-03-30 18:18:52 -07:00
Chris Lu	a95b8396e4	plugin scheduler: run iceberg and lifecycle lanes concurrently (#8821 ) * plugin scheduler: run iceberg and lifecycle lanes concurrently The default lane serialises job types under a single admin lock because volume-management operations share global state. Iceberg and lifecycle lanes have no such constraint, so run each of their job types independently in separate goroutines. * Fix concurrent lane scheduler status * plugin scheduler: address review feedback - Extract collectDueJobTypes helper to deduplicate policy loading between locked and concurrent iteration paths. - Use atomic.Bool instead of sync.Mutex for hadJobs in the concurrent path. - Set lane loop state to "busy" before launching concurrent goroutines so the lane is not reported as idle while work runs. - Convert TestLaneRequiresLock to table-driven style. - Add TestRunLaneSchedulerIterationLockBehavior to verify the scheduler acquires the admin lock only for lanes that require it. - Fix flaky TestGetLaneSchedulerStatusShowsActiveConcurrentLaneWork by not starting background scheduler goroutines that race with the direct runJobTypeIteration call.	2026-03-29 00:06:20 -07:00
Chris Lu	8c8d21d7e2	Update plugin_lane_templ.go	2026-03-26 23:11:10 -07:00
Chris Lu	2604ec7deb	Remove min_interval_seconds from plugin workers; vacuum default to 17m (#8790 ) remove min_interval_seconds from plugin workers and default vacuum interval to 17m The worker-level min_interval_seconds was redundant with the admin-side DetectionIntervalSeconds, complicating scheduling logic. Remove it from vacuum, volume_balance, erasure_coding, and ec_balance handlers. Also change the vacuum default DetectionIntervalSeconds from 2 hours to 17 minutes to match the previous default behavior.	2026-03-26 23:04:36 -07:00
Chris Lu	cc2f790c73	feat: add per-lane scheduler status API and lane worker UI pages - GET /api/plugin/lanes returns all lanes with status and job types - GET /api/plugin/workers?lane=X filters workers by lane - GET /api/plugin/scheduler-states?lane=X filters job types by lane - GET /api/plugin/scheduler-status?lane=X returns lane-scoped status - GET /plugin/lanes/{lane}/workers renders per-lane worker page - SchedulerJobTypeState now includes a "lane" field The lane worker pages show scheduler status, job type configuration, and connected workers scoped to a single lane, with links back to the main plugin overview.	2026-03-26 19:33:42 -07:00
Chris Lu	e3e015e108	feat: introduce scheduler lanes for independent per-workload scheduling Split the single plugin scheduler loop into independent per-lane goroutines so that volume management, iceberg compaction, and lifecycle operations never block each other. Each lane has its own: - Goroutine (laneSchedulerLoop) - Wake channel for immediate scheduling - Admin lock scope (e.g. "plugin scheduler:default") - Configurable idle sleep duration - Loop state tracking Three lanes are defined: - default: vacuum, volume_balance, ec_balance, erasure_coding, admin_script - iceberg: iceberg_maintenance - lifecycle: s3_lifecycle (new, handler coming in a later commit) Job types are mapped to lanes via a hardcoded map with LaneDefault as the fallback. The SchedulerJobTypeState and SchedulerStatus types now include a Lane field for API consumers.	2026-03-26 19:32:18 -07:00
Chris Lu	d95df76bca	feat: separate scheduler lanes for iceberg, lifecycle, and volume management (#8787 ) * feat: introduce scheduler lanes for independent per-workload scheduling Split the single plugin scheduler loop into independent per-lane goroutines so that volume management, iceberg compaction, and lifecycle operations never block each other. Each lane has its own: - Goroutine (laneSchedulerLoop) - Wake channel for immediate scheduling - Admin lock scope (e.g. "plugin scheduler:default") - Configurable idle sleep duration - Loop state tracking Three lanes are defined: - default: vacuum, volume_balance, ec_balance, erasure_coding, admin_script - iceberg: iceberg_maintenance - lifecycle: s3_lifecycle (new, handler coming in a later commit) Job types are mapped to lanes via a hardcoded map with LaneDefault as the fallback. The SchedulerJobTypeState and SchedulerStatus types now include a Lane field for API consumers. * feat: per-lane execution reservation pools for resource isolation Each scheduler lane now maintains its own execution reservation map so that a busy volume lane cannot consume execution slots needed by iceberg or lifecycle lanes. The per-lane pool is used by default when dispatching jobs through the lane scheduler; the global pool remains as a fallback for the public DispatchProposals API. * feat: add per-lane scheduler status API and lane worker UI pages - GET /api/plugin/lanes returns all lanes with status and job types - GET /api/plugin/workers?lane=X filters workers by lane - GET /api/plugin/scheduler-states?lane=X filters job types by lane - GET /api/plugin/scheduler-status?lane=X returns lane-scoped status - GET /plugin/lanes/{lane}/workers renders per-lane worker page - SchedulerJobTypeState now includes a "lane" field The lane worker pages show scheduler status, job type configuration, and connected workers scoped to a single lane, with links back to the main plugin overview. * feat: add s3_lifecycle worker handler for object store lifecycle management Implements a full plugin worker handler for S3 lifecycle management, assigned to the new "lifecycle" scheduler lane. Detection phase: - Reads filer.conf to find buckets with TTL lifecycle rules - Creates one job proposal per bucket with active lifecycle rules - Supports bucket_filter wildcard pattern from admin config Execution phase: - Walks the bucket directory tree breadth-first - Identifies expired objects by checking TtlSec + Crtime < now - Deletes expired objects in configurable batches - Reports progress with scanned/expired/error counts - Supports dry_run mode for safe testing Configurable via admin UI: - batch_size: entries per filer listing page (default 1000) - max_deletes_per_bucket: safety cap per run (default 10000) - dry_run: detect without deleting - delete_marker_cleanup: clean expired delete markers - abort_mpu_days: abort stale multipart uploads The handler integrates with the existing PutBucketLifecycle flow which sets TtlSec on entries via filer.conf path rules. * feat: add per-lane submenu items under Workers sidebar menu Replace the single "Workers" sidebar link with a collapsible submenu containing three lane entries: - Default (volume management + admin scripts) -> /plugin - Iceberg (table compaction) -> /plugin/lanes/iceberg/workers - Lifecycle (S3 object expiration) -> /plugin/lanes/lifecycle/workers The submenu auto-expands when on any /plugin page and highlights the active lane. Icons match each lane's job type descriptor (server, snowflake, hourglass). * feat: scope plugin pages to their scheduler lane The plugin overview, configuration, detection, queue, and execution pages now filter workers, job types, scheduler states, and scheduler status to only show data for their lane. - Plugin() templ function accepts a lane parameter (default: "default") - JavaScript appends ?lane= to /api/plugin/workers, /job-types, /scheduler-states, and /scheduler-status API calls - GET /api/plugin/job-types now supports ?lane= filtering - When ?job= is provided (e.g. ?job=iceberg_maintenance), the lane is auto-derived from the job type so the page scopes correctly This ensures /plugin shows only default-lane workers and /plugin/configuration?job=iceberg_maintenance scopes to the iceberg lane. * fix: remove "Lane" from lane worker page titles and capitalize properly "lifecycle Lane Workers" -> "Lifecycle Workers" "iceberg Lane Workers" -> "Iceberg Workers" * refactor: promote lane items to top-level sidebar menu entries Move Default, Iceberg, and Lifecycle from a collapsible submenu to direct top-level items under the WORKERS heading. Removes the intermediate "Workers" parent link and collapse toggle. * admin: unify plugin lane routes and handlers * admin: filter plugin jobs and activities by lane * admin: reuse plugin UI for worker lane pages * fix: use ServerAddress.ToGrpcAddress() for filer connections in lifecycle handler ClusterContext addresses use ServerAddress format (host:port.grpcPort). Convert to the actual gRPC address via ToGrpcAddress() before dialing, and add a Ping verification after connecting. Fixes: "dial tcp: lookup tcp/8888.18888: unknown port" * fix: resolve ServerAddress gRPC port in iceberg and lifecycle filer connections ClusterContext addresses use ServerAddress format (host:httpPort.grpcPort). Both the iceberg and lifecycle handlers now detect the compound format and extract the gRPC port via ToGrpcAddress() before dialing. Plain host:port addresses (e.g. from tests) are passed through unchanged. Fixes: "dial tcp: lookup tcp/8888.18888: unknown port" * align url * Potential fix for code scanning alert no. 335: Incorrect conversion between integer types Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * fix: address PR review findings across scheduler lanes and lifecycle handler - Fix variable shadowing: rename loop var `w` to `worker` in GetPluginWorkersAPI to avoid shadowing the http.ResponseWriter param - Fix stale GetSchedulerStatus: aggregate loop states across all lanes instead of reading never-updated legacy schedulerLoopState - Scope InProcessJobs to lane in GetLaneSchedulerStatus - Fix AbortMPUDays=0 treated as unset: change <= 0 to < 0 so 0 disables - Propagate listing errors in lifecycle bucket walk instead of swallowing - Implement DeleteMarkerCleanup: scan for S3 delete marker entries and remove them - Implement AbortMPUDays: scan .uploads directory and remove stale multipart uploads older than the configured threshold - Fix success determination: mark job failed when result.errors > 0 even if no fatal error occurred - Add regression test for jobTypeLaneMap to catch drift from handler registrations * fix: guard against nil result in lifecycle completion and trim filer addresses - Guard result dereference in completion summary: use local vars defaulting to 0 when result is nil to prevent panic - Append trimmed filer addresses instead of originals so whitespace is not passed to the gRPC dialer * fix: propagate ctx cancellation from deleteExpiredObjects and add config logging - deleteExpiredObjects now returns a third error value when the context is canceled mid-batch; the caller stops processing further batches and returns the cancellation error to the job completion handler - readBoolConfig and readInt64Config now log unexpected ConfigValue types at V(1) for debugging, consistent with readStringConfig * fix: propagate errors in lifecycle cleanup helpers and use correct delete marker key - cleanupDeleteMarkers: return error on ctx cancellation and SeaweedList failures instead of silently continuing - abortIncompleteMPUs: log SeaweedList errors instead of discarding - isDeleteMarker: use ExtDeleteMarkerKey ("Seaweed-X-Amz-Delete-Marker") instead of ExtLatestVersionIsDeleteMarker which is for the parent entry - batchSize cap: use math.MaxInt instead of math.MaxInt32 * fix: propagate ctx cancellation from abortIncompleteMPUs and log unrecognized bool strings - abortIncompleteMPUs now returns (aborted, errors, ctxErr) matching cleanupDeleteMarkers; caller stops on cancellation or listing failure - readBoolConfig logs unrecognized string values before falling back * fix: shared per-bucket budget across lifecycle phases and allow cleanup without expired objects - Thread a shared remaining counter through TTL deletion, delete marker cleanup, and MPU abort so the total operations per bucket never exceed MaxDeletesPerBucket - Remove early return when no TTL-expired objects found so delete marker cleanup and MPU abort still run - Add NOTE on cleanupDeleteMarkers about version-safety limitation --------- Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2026-03-26 19:28:13 -07:00
Chris Lu	67a551fd62	admin UI: add anonymous user creation checkbox (#8773 ) Add an "Anonymous" checkbox next to the username field in the Create User modal. When checked, the username is set to "anonymous" and the credential generation checkbox is disabled since anonymous users do not need keys. The checkbox is only shown when no anonymous user exists yet. The manage-access-keys button in the users table is hidden for the anonymous user.	2026-03-25 21:24:10 -07:00
Anton	7f0cf72574	admin/plugin: delete job_detail files when jobs are pruned from memory (#8722 ) * admin/plugin: delete job_detail files when jobs are pruned from memory pruneTrackedJobsLocked evicts the oldest terminal jobs from the in-memory tracker when the total exceeds maxTrackedJobsTotal (1000). However the dedicated per-job detail files in jobs/job_details/ were never removed, causing them to accumulate indefinitely on disk. Add ConfigStore.DeleteJobDetail and call it from pruneTrackedJobsLocked so that the file is cleaned up together with the in-memory entry. Deletion errors are logged at verbosity level 2 and do not abort the prune. * admin/plugin: add test for DeleteJobDetail --------- Co-authored-by: Anton Ustyugov <anton@devops> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-03-21 14:23:32 -07:00
Anton	90277ceed5	admin/plugin: migrate inline job details asynchronously to avoid slow startup (#8721 ) loadPersistedMonitorState performed a backward-compatibility migration that wrote every job with inline rich detail fields to a dedicated per-job detail file synchronously during startup. On deployments with many historical jobs (e.g. 1000+) stored on distributed block storage (e.g. Longhorn), each individual file write requires an fsync round-trip, making startup disproportionately slow and causing readiness/liveness probe failures. The in-memory state is populated correctly before the goroutine is started because stripTrackedJobDetailFields is still called in-place; only the disk writes are deferred. A completion log message at V(1) is emitted once the background migration finishes. Co-authored-by: Anton Ustyugov <anton@devops>	2026-03-21 14:18:42 -07:00
Anton	ae170f1fbb	admin: fix manual job run to use scheduler dispatch with capacity management and retry (#8720 ) RunPluginJobTypeAPI previously executed proposals with a naive sequential loop calling ExecutePluginJob per proposal. This had two bugs: 1. Double-lock: RunPluginJobTypeAPI held pluginLock while calling ExecutePluginJob, which tried to re-acquire the same lock for every job in the loop. 2. No capacity management: proposals were fired directly at workers without reserveScheduledExecutor, so every job beyond the worker concurrency limit received an immediate at_capacity error with no retry or backoff. Fix: add Plugin.DispatchProposals which reuses dispatchScheduledProposals - the same code path the scheduler loop uses - with executor reservation, configurable concurrency, and per-job retry with backoff. RunPluginJobTypeAPI now calls DispatchPluginProposals (a thin AdminServer wrapper) after holding pluginLock once. Co-authored-by: Anton Ustyugov <anton@devops>	2026-03-21 14:09:31 -07:00
Chris Lu	6ccda3e809	fix(s3): allow deleting the anonymous user from admin webui (#8706 ) Remove the block that prevented deleting the "anonymous" identity and stop auto-creating it when absent. If no anonymous identity exists (or it is disabled), LookupAnonymous returns not-found and both auth paths return ErrAccessDenied for anonymous requests. To enable anonymous access, explicitly create the "anonymous" user. To revoke it, delete the user like any other identity. Closes #8694	2026-03-19 18:10:20 -07:00
Jayshan Raghunandan	1f1eac4f08	feat: improve aio support for admin/volume ingress and fix UI links (#8679 ) * feat: improve allInOne mode support for admin/volume ingress and fix master UI links - Add allInOne support to admin ingress template, matching the pattern used by filer and s3 ingress templates (or-based enablement with ternary service name selection) - Add allInOne support to volume ingress template, which previously required volume.enabled even when the volume server runs within the allInOne pod - Expose admin ports in allInOne deployment and service when allInOne.admin.enabled is set - Add allInOne.admin config section to values.yaml (enabled by default, ports inherit from admin.) - Fix legacy master UI templates (master.html, masterNewRaft.html) to prefer PublicUrl over internal Url when linking to volume server UI. The new admin UI already handles this correctly. fix: revert admin allInOne changes and fix PublicUrl in admin dashboard The admin binary (`weed admin`) is a separate process that cannot run inside `weed server` (allInOne mode). Revert the admin-related allInOne helm chart changes that caused 503 errors on admin ingress. Fix bug in cluster_topology.go where VolumeServer.PublicURL was set to node.Id (internal pod address) instead of the actual public URL. Add public_url field to DataNodeInfo proto message so the topology gRPC response carries the public URL set via -volume.publicUrl flag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use HTTP /dir/status to populate PublicUrl in admin dashboard The gRPC DataNodeInfo proto does not include PublicUrl, so the admin dashboard showed internal pod IPs instead of the configured public URL. Fetch PublicUrl from the master's /dir/status HTTP endpoint and apply it in both GetClusterTopology and GetClusterVolumeServers code paths. Also reverts the unnecessary proto field additions from the previous commit and cleans up a stray blank line in all-in-one-service.yml. * fix: apply PublicUrl link fix to masterNewRaft.html Match the same conditional logic already applied to master.html — prefer PublicUrl when set and different from Url. * fix: add HTTP timeout and status check to fetchPublicUrlMap Use a 5s-timeout client instead of http.DefaultClient to prevent blocking indefinitely when the master is unresponsive. Also check the HTTP status code before attempting to parse the response body. * fix: fall back to node address when PublicUrl is empty Prevents blank links in the admin dashboard when PublicUrl is not configured, such as in standalone or mixed-version clusters. * fix: log io.ReadAll error in fetchPublicUrlMap --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-03-18 13:20:55 -07:00
Chris Lu	a3717cd4b5	fix(admin): show anonymous user in Object Store Users UI (#8671 ) The anonymous identity was explicitly filtered out of the user listing, making it invisible in the admin console. Users could not view or edit its permissions. Attempting to recreate it failed with "already exists". Remove the anonymous skip in GetObjectStoreUsers so it appears like any other identity. Add a guard in DeleteObjectStoreUser to prevent deletion of the anonymous system identity, which would break unauthenticated S3 access. Fixes #8466 Co-authored-by: Copilot <copilot@github.com>	2026-03-16 19:54:57 -07:00
Chris Lu	7c83460b10	adjust template path	2026-03-16 15:29:12 -07:00
Chris Lu	e8914ac879	feat(admin): add -urlPrefix flag for subdirectory deployment (#8670 ) Allow the admin server to run behind a reverse proxy under a subdirectory by adding a -urlPrefix flag (e.g. -urlPrefix=/seaweedfs). Closes #8646	2026-03-16 15:26:02 -07:00
Chris Lu	8cde3d4486	Add data file compaction to iceberg maintenance (Phase 2) (#8503 ) * Add iceberg_maintenance plugin worker handler (Phase 1) Implement automated Iceberg table maintenance as a new plugin worker job type. The handler scans S3 table buckets for tables needing maintenance and executes operations in the correct Iceberg order: expire snapshots, remove orphan files, and rewrite manifests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add data file compaction to iceberg maintenance handler (Phase 2) Implement bin-packing compaction for small Parquet data files: - Enumerate data files from manifests, group by partition - Merge small files using parquet-go (read rows, write merged output) - Create new manifest with ADDED/DELETED/EXISTING entries - Commit new snapshot with compaction metadata Add 'compact' operation to maintenance order (runs before expire_snapshots), configurable via target_file_size_bytes and min_input_files thresholds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix memory exhaustion in mergeParquetFiles by processing files sequentially Previously all source Parquet files were loaded into memory simultaneously, risking OOM when a compaction bin contained many small files. Now each file is loaded, its rows are streamed into the output writer, and its data is released before the next file is loaded — keeping peak memory proportional to one input file plus the output buffer. * Validate bucket/namespace/table names against path traversal Reject names containing '..', '/', or '\' in Execute to prevent directory traversal via crafted job parameters. * Add filer address failover in iceberg maintenance handler Try each filer address from cluster context in order instead of only using the first one. This improves resilience when the primary filer is temporarily unreachable. * Add separate MinManifestsToRewrite config for manifest rewrite threshold The rewrite_manifests operation was reusing MinInputFiles (meant for compaction bin file counts) as its manifest count threshold. Add a dedicated MinManifestsToRewrite field with its own config UI section and default value (5) so the two thresholds can be tuned independently. * Fix risky mtime fallback in orphan removal that could delete new files When entry.Attributes is nil, mtime defaulted to Unix epoch (1970), which would always be older than the safety threshold, causing the file to be treated as eligible for deletion. Skip entries with nil Attributes instead, matching the safer logic in operations.go. * Fix undefined function references in iceberg_maintenance_handler.go Use the exported function names (ShouldSkipDetectionByInterval, BuildDetectorActivity, BuildExecutorActivity) matching their definitions in vacuum_handler.go. * Remove duplicated iceberg maintenance handler in favor of iceberg/ subpackage The IcebergMaintenanceHandler and its compaction code in the parent pluginworker package duplicated the logic already present in the iceberg/ subpackage (which self-registers via init()). The old code lacked stale-plan guards, proper path normalization, CAS-based xattr updates, and error-returning parseOperations. Since the registry pattern (default "all") makes the old handler unreachable, remove it entirely. All functionality is provided by iceberg.Handler with the reviewed improvements. * Fix MinManifestsToRewrite clamping to match UI minimum of 2 The clamp reset values below 2 to the default of 5, contradicting the UI's advertised MinValue of 2. Clamp to 2 instead. * Sort entries by size descending in splitOversizedBin for better packing Entries were processed in insertion order which is non-deterministic from map iteration. Sorting largest-first before the splitting loop improves bin packing efficiency by filling bins more evenly. * Add context cancellation check to drainReader loop The row-streaming loop in drainReader did not check ctx between iterations, making long compaction merges uncancellable. Check ctx.Done() at the top of each iteration. * Fix splitOversizedBin to always respect targetSize limit The minFiles check in the split condition allowed bins to grow past targetSize when they had fewer than minFiles entries, defeating the OOM protection. Now bins always split at targetSize, and a trailing runt with fewer than minFiles entries is merged into the previous bin. * Add integration tests for iceberg table maintenance plugin worker Tests start a real weed mini cluster, create S3 buckets and Iceberg table metadata via filer gRPC, then exercise the iceberg.Handler operations (ExpireSnapshots, RemoveOrphans, RewriteManifests) against the live filer. A full maintenance cycle test runs all operations in sequence and verifies metadata consistency. Also adds exported method wrappers (testing_api.go) so the integration test package can call the unexported handler methods. * Fix splitOversizedBin dropping files and add source path to drainReader errors The runt-merge step could leave leading bins with fewer than minFiles entries (e.g. [80,80,10,10] with targetSize=100, minFiles=2 would drop the first 80-byte file). Replace the filter-based approach with an iterative merge that folds any sub-minFiles bin into its smallest neighbor, preserving all eligible files. Also add the source file path to drainReader error messages so callers can identify which Parquet file caused a read/write failure. * Harden integration test error handling - s3put: fail immediately on HTTP 4xx/5xx instead of logging and continuing - lookupEntry: distinguish NotFound (return nil) from unexpected RPC errors (fail the test) - writeOrphan and orphan creation in FullMaintenanceCycle: check CreateEntryResponse.Error in addition to the RPC error * go fmt --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 11:27:42 -07:00
Chris Lu	a838661b83	feat(plugin): EC shard balance handler for plugin worker (#8629 ) * feat(ec_balance): add TaskTypeECBalance constant and protobuf definitions Add the ec_balance task type constant to both topology and worker type systems. Define EcBalanceTaskParams, EcShardMoveSpec, and EcBalanceTaskConfig protobuf messages for EC shard balance operations. * feat(ec_balance): add configuration for EC shard balance task Config includes imbalance threshold, min server count, collection filter, disk type, and preferred tags for tag-aware placement. * feat(ec_balance): add multi-phase EC shard balance detection algorithm Implements four detection phases adapted from the ec.balance shell command: 1. Duplicate shard detection and removal proposals 2. Cross-rack shard distribution balancing 3. Within-rack node-level shard balancing 4. Global shard count equalization across nodes Detection is side-effect-free: it builds an EC topology view from ActiveTopology and generates move proposals without executing them. * feat(ec_balance): add EC shard move task execution Implements the shard move sequence using the same VolumeEcShardsCopy, VolumeEcShardsMount, VolumeEcShardsUnmount, and VolumeEcShardsDelete RPCs as the shell ec.balance command. Supports both regular shard moves and dedup-phase deletions (unmount+delete without copy). * feat(ec_balance): add task registration and scheduling Register EC balance task definition with auto-config update support. Scheduling respects max concurrent limits and worker capabilities. * feat(ec_balance): add plugin handler for EC shard balance Implements the full plugin handler with detection, execution, admin and worker config forms, proposal building, and decision trace reporting. Supports collection/DC/disk type filtering, preferred tag placement, and configurable detection intervals. Auto-registered via init() with the handler registry. * test(ec_balance): add tests for detection algorithm and plugin handler Detection tests cover: duplicate shard detection, cross-rack imbalance, within-rack imbalance, global rebalancing, topology building, collection filtering, and edge cases. Handler tests cover: config derivation with clamping, proposal building, protobuf encode/decode round-trip, fallback parameter decoding, capability, and config policy round-trip. * fix(ec_balance): address PR review feedback and fix CI test failure - Update TestWorkerDefaultJobTypes to expect 6 handlers (was 5) - Extract threshold constants (ecBalanceMinImbalanceThreshold, etc.) to eliminate magic numbers in Descriptor and config derivation - Remove duplicate ShardIdsToUint32 helper (use erasure_coding package) - Add bounds checks for int64→int/uint32 conversions to fix CodeQL integer conversion warnings * fix(ec_balance): address code review findings storage_impact.go: - Add TaskTypeECBalance case returning shard-level reservation (ShardSlots: -1/+1) instead of falling through to default which incorrectly reserves a full volume slot on target. detection.go: - Use dc:rack composite key to avoid cross-DC rack name collisions. Only create rack entries after confirming node has matching disks. - Add exceedsImbalanceThreshold check to cross-rack, within-rack, and global phases so trivial skews below the configured threshold are ignored. Dedup phase always runs since duplicates are errors. - Reserve destination capacity after each planned move (decrement destNode.freeSlots, update rackShardCount/nodeShardCount) to prevent overbooking the same destination. - Skip nodes with freeSlots <= 0 when selecting minNode in global balance to avoid proposing moves to full nodes. - Include loop index and source/target node IDs in TaskID to guarantee uniqueness across moves with the same volumeID/shardID. ec_balance_handler.go: - Fail fast with error when shard_id is absent in fallback parameter decoding instead of silently defaulting to shard 0. ec_balance_task.go: - Delegate GetProgress() to BaseTask.GetProgress() so progress updates from ReportProgressWithStage are visible to callers. - Add fail-fast guard rejecting multiple sources/targets until batch execution is implemented. Findings verified but not changed (matches existing codebase pattern in vacuum/balance/erasure_coding handlers): - register.go globalTaskDef.Config race: same unsynchronized pattern in all 4 task packages. - CreateTask using generated ID: same fmt.Sprintf pattern in all 4 task packages. * fix(ec_balance): harden parameter decoding, progress tracking, and validation ec_balance_handler.go (decodeECBalanceTaskParams): - Validate execution-critical fields (Sources[0].Node, ShardIds, Targets[0].Node, ShardIds) after protobuf deserialization. - Require source_disk_id and target_disk_id in legacy fallback path so Targets[0].DiskId is populated for VolumeEcShardsCopyRequest. - All error messages reference decodeECBalanceTaskParams and the specific missing field (TaskParams, shard_id, Targets[0].DiskId, EcBalanceTaskParams) for debuggability. ec_balance_task.go: - Track progress in ECBalanceTask.progress field, updated via reportProgress() helper called before ReportProgressWithStage(), so GetProgress() returns real stage progress instead of stale 0. - Validate: require exactly 1 source and 1 target (mirrors Execute guard), require ShardIds on both, with error messages referencing ECBalanceTask.Validate and the specific field. * fix(ec_balance): fix dedup execution path, stale topology, collection filter, timeout, and dedupeKey detection.go: - Dedup moves now set target=source so isDedupPhase() triggers the unmount+delete-only execution path instead of attempting a copy. - Apply moves to in-memory topology between phases via applyMovesToTopology() so subsequent phases see updated shard placement and don't conflict with already-planned moves. - detectGlobalImbalance now accepts allowedVids and filters both shard counting and shard selection to respect CollectionFilter. ec_balance_task.go: - Apply EcBalanceTaskParams.TimeoutSeconds to the context via context.WithTimeout so all RPC operations respect the configured timeout instead of hanging indefinitely. ec_balance_handler.go: - Include source node ID in dedupeKey so dedup deletions from different source nodes for the same shard aren't collapsed. - Clamp minServerCountRaw and minIntervalRaw lower bounds on int64 before narrowing to int, preventing undefined overflow on 32-bit. * fix(ec_balance): log warning before cancelling on progress send failure Log the error, job ID, job type, progress percentage, and stage before calling execCancel() in the progress callback so failed progress sends are diagnosable instead of silently cancelling.	2026-03-14 21:34:53 -07:00
Chris Lu	6fc0489dd8	feat(plugin): make page tabs and sub-tabs addressable by URLs (#8626 ) * feat(plugin): make page tabs and sub-tabs addressable by URLs Update the plugin page so that clicking tabs and sub-tabs pushes browser history via history.pushState(), enabling bookmarkable URLs, browser back/forward navigation, and shareable links. URL mapping: - /plugin → Overview tab - /plugin/configuration → Configuration sub-tab - /plugin/detection → Job Detection sub-tab - /plugin/queue → Job Queue sub-tab - /plugin/execution → Job Execution sub-tab Job-type-specific URLs use the ?job= query parameter (e.g., /plugin/configuration?job=vacuum) so that a specific job type tab is pre-selected on page load. Changes: - Add initialJob parameter to Plugin() template and handler - Extract ?job= query param in renderPluginPage handler - Add buildPluginURL/updateURL helpers in JavaScript - Push history state on top-tab, sub-tab, and job-type clicks - Listen for popstate to restore tab state on back/forward - Replace initial history entry on page load via replaceState * make popstate handler async with proper error handling Await loadDescriptorAndConfig so data loading completes before rendering dependent views. Log errors instead of silently swallowing them.	2026-03-13 23:02:43 -07:00
Chris Lu	baae672b6f	feat: auto-disable master vacuum when plugin worker is active (#8624 ) * feat: auto-disable master vacuum when plugin vacuum worker is active When a vacuum-capable plugin worker connects to the admin server, the admin server calls DisableVacuum on the master to prevent the automatic scheduled vacuum from conflicting with the plugin worker's vacuum. When the worker disconnects, EnableVacuum is called to restore the default behavior. A safety net in the topology refresh loop re-enables vacuum if the admin server disconnects without cleanup. * rename isAdminServerConnected to isAdminServerConnectedFunc * add 5s timeout to DisableVacuum/EnableVacuum gRPC calls Prevents the monitor goroutine from blocking indefinitely if the master is unresponsive. * track plugin ownership of vacuum disable to avoid overriding operator - Add vacuumDisabledByPlugin flag to Topology, set when DisableVacuum is called while admin server is connected (i.e., by plugin monitor) - Safety net only re-enables vacuum when it was disabled by plugin, not when an operator intentionally disabled it via shell command - EnableVacuum clears the plugin flag * extract syncVacuumState for testability, add fake toggler tests Extract the single sync step into syncVacuumState() with a vacuumToggler interface. Add TestSyncVacuumState with a fake toggler that verifies disable/enable calls on state transitions. * use atomic.Bool for isDisableVacuum and vacuumDisabledByPlugin Both fields are written by gRPC handlers and read by the vacuum goroutine, causing a data race. Use atomic.Bool with Store/Load for thread-safe access. * use explicit by_plugin field instead of connection heuristic Add by_plugin bool to DisableVacuumRequest proto so the caller declares intent explicitly. The admin server monitor sets it to true; shell commands leave it false. This prevents an operator's intentional disable from being auto-reversed by the safety net. * use setter for admin server callback instead of function parameter Move isAdminServerConnected from StartRefreshWritableVolumes parameter to Topology.SetAdminServerConnectedFunc() setter. Keeps the function signature stable and decouples the topology layer from the admin server concept. * suppress repeated log messages on persistent sync failures Add retrying parameter to syncVacuumState so the initial state transition is logged at V(0) but subsequent retries of the same transition are silent until the call succeeds. * clear plugin ownership flag on manual DisableVacuum Prevents stale plugin flag from causing incorrect auto-enable when an operator manually disables vacuum after a plugin had previously disabled it. * add by_plugin to EnableVacuumRequest for symmetric ownership tracking Plugin-driven EnableVacuum now only re-enables if the plugin was the one that disabled it. If an operator manually disabled vacuum after the plugin, the plugin's EnableVacuum is a no-op. This prevents the plugin monitor from overriding operator intent on worker disconnect. * use cancellable context for monitorVacuumWorker goroutine Replace context.Background() with a cancellable context stored as bgCancel on AdminServer. Shutdown() calls bgCancel() so monitorVacuumWorker exits cleanly via ctx.Done(). * track operator and plugin vacuum disables independently Replace single isDisableVacuum flag with two independent flags: vacuumDisabledByOperator and vacuumDisabledByPlugin. Each caller only flips its own flag. The effective disabled state is the OR of both. This prevents a plugin connect/disconnect cycle from overriding an operator's manual disable, and vice versa. * fix safety net to clear plugin flag, not operator flag The safety net should call EnableVacuumByPlugin() to clear only the plugin disable flag when the admin server disconnects. The previous call to EnableVacuum() incorrectly cleared the operator flag instead.	2026-03-13 22:49:12 -07:00
Chris Lu	a6774f0e01	add git commit hash on admin ui	2026-03-12 23:51:25 -07:00
Chris Lu	e4a77b8b16	feat(admin): support env var and security.toml for credentials (#8606 ) * feat(security): add [admin] section to security.toml scaffold Add admin credential fields (user, password, readonly.user, readonly.password) to security.toml. Via viper's WEED_ env prefix and AutomaticEnv(), these are automatically overridable as WEED_ADMIN_USER, WEED_ADMIN_PASSWORD, etc. Ref: https://github.com/seaweedfs/seaweedfs/discussions/8586 * feat(admin): support env var and security.toml fallbacks for credentials Add applyViperFallback() to read admin credentials from security.toml / WEED_* environment variables when CLI flags are not explicitly set. This allows systems like NixOS to pass secrets via env vars instead of CLI flags, which appear in process listings. Precedence: CLI flag > env var / security.toml > default value. Also change -adminUser default from "admin" to "" so that credentials are fully opt-in. Ref: https://github.com/seaweedfs/seaweedfs/discussions/8586 * feat(helm): use WEED_ env vars for admin credentials instead of CLI flags Rename SEAWEEDFS_ADMIN_USER/PASSWORD to WEED_ADMIN_USER/PASSWORD so viper picks them up natively. Remove -adminUser/-adminPassword shell expansion from command args since the Go binary now reads these directly via viper. * docs(admin): document env var and security.toml credential support Add environment variable mapping table, security.toml example, and precedence rules to the admin README. * style(security): use nested [admin.readonly] table in security.toml Use a nested TOML table instead of dotted keys for the readonly credentials. More idiomatic and easier to read; no change in how Viper parses it. * fix(admin): use util.GetViper() for env var support and fix README example applyViperFallback() was using viper.GetString() directly, which bypasses the WEED_ env prefix and AutomaticEnv setup that only happens in util.GetViper(). Switch to util.GetViper().GetString() so WEED_ADMIN_* environment variables are actually picked up. Also fix the README example to include WEED_ADMIN_USER alongside WEED_ADMIN_PASSWORD, since runAdmin() rejects an empty username when a password is set. * fix(admin): restore default adminUser to "admin" Defaulting adminUser to "" broke the common flow of setting only WEED_ADMIN_PASSWORD — runAdmin() rejects an empty username when a password is set. Restore "admin" as the default so that setting only the password works out of the box. * docs(admin): align README security.toml example with scaffold format Use nested [admin.readonly] table instead of flat dotted keys to match the format in weed/command/scaffold/security.toml. * docs(admin): remove README.md in favor of wiki page Admin documentation lives at the wiki (Admin-UI.md). Remove the in-repo README to avoid maintaining duplicate docs. --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-11 17:40:24 -07:00
Chris Lu	ac579c1746	Fix plugin configuration tab layout overflow (#8596 ) Fix plugin configuration tab layout overflow (#8587) Remove h-100 from Job Scheduling Settings card, which caused it to stretch to 100% of the row height and push the Next Run card below the row boundary, overflowing into the Detection Results section.	2026-03-10 19:19:42 -07:00
Chris Lu	47cad59c70	Remove misleading Workers sub-menu items from admin sidebar (#8594 ) * Remove misleading Workers sub-menu items from admin sidebar The sidebar sub-items (Job Detection, Job Queue, Job Execution, Configuration) always navigated to the first job type's tabs (typically EC Encoding) rather than showing cross-job-type views. This was confusing as noted in #8590. Since the in-page tabs already provide this navigation, remove the redundant sidebar sub-items and keep only the top-level Workers link. Fixes #8590 * Update layout_templ.go	2026-03-10 15:55:14 -07:00
Chris Lu	b17e2b411a	Add dynamic timeouts to plugin worker vacuum gRPC calls (#8593 ) * add dynamic timeouts to plugin worker vacuum gRPC calls All vacuum gRPC calls used context.Background() with no deadline, so the plugin scheduler's execution timeout could kill a job while a large volume compact was still in progress. Use volume-size-scaled timeouts matching the topology vacuum approach: 3 min/GB for compact, 1 min/GB for check, commit, and cleanup. Fixes #8591 * scale scheduler execution timeout by volume size The scheduler's per-job execution timeout (default 240s) would kill vacuum jobs on large volumes before they finish. Three changes: 1. Vacuum detection now includes estimated_runtime_seconds in job proposals, computed as 5 min/GB of volume size. 2. The scheduler checks for estimated_runtime_seconds in job parameters and uses it as the execution timeout when larger than the default — a generic mechanism any handler can use. 3. Vacuum task gRPC calls now use the passed-in ctx as parent instead of context.Background(), so scheduler cancellation propagates to in-flight RPCs. * extend job type runtime when proposals need more time The JobTypeMaxRuntime (default 30 min) wraps both detection and execution. Its context is the parent of all per-job execution contexts, so even with per-job estimated_runtime_seconds, jobCtx would cancel everything when it expires. After detection, scan proposals for the maximum estimated_runtime_seconds. If any proposal needs more time than the remaining JobTypeMaxRuntime, create a new execution context with enough headroom. This lets large vacuum jobs complete without being killed by the job type deadline while still respecting the configured limit for normal-sized jobs. * log missing volume size metric, remove dead minimum runtime guard Add a debug log in vacuumTimeout when t.volumeSize is 0 so operators can investigate why metrics are missing for a volume. Remove the unreachable estimatedRuntimeSeconds < 180 check in buildVacuumProposal — volumeSizeGB always >= 1 (due to +1 floor), so estimatedRuntimeSeconds is always >= 300. * cap estimated runtime and fix status check context - Cap maxEstimatedRuntime and per-job timeout overrides to 8 hours to prevent unbounded timeouts from bad metrics. - Check execCtx.Err() instead of jobCtx.Err() for status reporting, since dispatch runs under execCtx which may have a longer deadline. A successful dispatch under execCtx was misreported as "timeout" when jobCtx had expired.	2026-03-10 13:48:42 -07:00
Chris Lu	00000ec006	Update s3_buckets_templ.go	2026-03-09 22:41:07 -07:00
Chris Lu	1bd7a98a4a	simplify plugin scheduler: remove configurable IdleSleepSeconds, use constant 61s The SchedulerConfig struct and its persistence/API were unnecessary indirection. Replace with a simple constant (reduced from 613s to 61s) so the scheduler re-checks for detectable job types promptly after going idle, improving the clean-install experience.	2026-03-09 22:41:03 -07:00
Chris Lu	5f85bf5e8a	Batch volume balance: run multiple moves per job (#8561 ) * proto: add BalanceMoveSpec and batch fields to BalanceTaskParams Add BalanceMoveSpec message for encoding individual volume moves, and max_concurrent_moves + repeated moves fields to BalanceTaskParams to support batching multiple volume moves in a single job. * balance handler: add batch execution with concurrent volume moves Refactor Execute() into executeSingleMove() (backward compatible) and executeBatchMoves() which runs multiple volume moves concurrently using a semaphore-bounded goroutine pool. When BalanceTaskParams.Moves is populated, the batch path is taken; otherwise the single-move path. Includes aggregate progress reporting across concurrent moves, per-move error collection, and partial failure support. * balance handler: add batch config fields to Descriptor and worker config Add max_concurrent_moves and batch_size fields to the worker config form and deriveBalanceWorkerConfig(). These control how many volume moves run concurrently within a batch job and the maximum batch size. * balance handler: group detection proposals into batch jobs When batch_size > 1, the Detect method groups detection results into batch proposals where each proposal encodes multiple BalanceMoveSpec entries in BalanceTaskParams.Moves. Single-result batches fall back to the existing single-move proposal format for backward compatibility. * admin UI: add volume balance execution plan and batch badge Add renderBalanceExecutionPlan() for rich rendering of volume balance jobs in the job detail modal. Single-move jobs show source/target/volume info; batch jobs show a moves table with all volume moves. Add batch badge (e.g., "5 moves") next to job type in the execution jobs table when the job has batch=true label. * Update plugin_templ.go * fix: detection algorithm uses greedy target instead of divergent topology scores The detection loop tracked effective volume counts via an adjustments map, but createBalanceTask independently called planBalanceDestination which used the topology's LoadCount — a separate, unadjusted source of truth. This divergence caused multiple moves to pile onto the same server. Changes: - Add resolveBalanceDestination to resolve the detection loop's greedy target (minServer) rather than independently picking a destination - Add oscillation guard: stop when max-min <= 1 since no single move can improve the balance beyond that point - Track unseeded destinations: if a target server wasn't in the initial serverVolumeCounts, add it so subsequent iterations include it - Add TestDetection_UnseededDestinationDoesNotOverload * fix: handler force_move propagation, partial failure, deterministic dedupe - Propagate ForceMove from outer BalanceTaskParams to individual move TaskParams so batch moves respect the force_move flag - Fix partial failure: mark job successful if at least one move succeeded (succeeded > 0 \|\| failed == 0) to avoid re-running already-completed moves on retry - Use SHA-256 hash for deterministic dedupe key fallback instead of time.Now().UnixNano() which is non-deterministic - Remove unused successDetails variable - Extract maxProposalStringLength constant to replace magic number 200 * admin UI: use template literals in balance execution plan rendering * fix: integration test handles batch proposals from batched detection With batch_size=20, all moves are grouped into a single proposal containing BalanceParams.Moves instead of top-level Sources/Targets. Update assertions to handle both batch and single-move proposal formats. * fix: verify volume size on target before deleting source during balance Add a pre-delete safety check that reads the volume file status on both source and target, then compares .dat file size and file count. If they don't match, the move is aborted — leaving the source intact rather than risking irreversible data loss. Also removes the redundant mountVolume call since VolumeCopy already mounts the volume on the target server. * fix: clamp maxConcurrent, serialize progress sends, validate config as int64 - Clamp maxConcurrentMoves to defaultMaxConcurrentMoves before creating the semaphore so a stale or malicious job cannot request unbounded concurrent volume moves - Extend progressMu to cover sender.SendProgress calls since the underlying gRPC stream is not safe for concurrent writes - Perform bounds checks on max_concurrent_moves and batch_size in int64 space before casting to int, avoiding potential overflow on 32-bit * fix: check disk capacity in resolveBalanceDestination Skip disks where VolumeCount >= MaxVolumeCount so the detection loop does not propose moves to a full disk that would fail at execution time. * test: rename unseeded destination test to match actual behavior The test exercises a server with 0 volumes that IS seeded from topology (matching disk type), not an unseeded destination. Rename to TestDetection_ZeroVolumeServerIncludedInBalance and fix comments. * test: tighten integration test to assert exactly one batch proposal With default batch_size=20, all moves should be grouped into a single batch proposal. Assert len(proposals)==1 and require BalanceParams with Moves, removing the legacy single-move else branch. * fix: propagate ctx to RPCs and restore source writability on abort - All helper methods (markVolumeReadonly, copyVolume, tailVolume, readVolumeFileStatus, deleteVolume) now accept a context parameter instead of using context.Background(), so Execute's ctx propagates cancellation and timeouts into every volume server RPC - Add deferred cleanup that restores the source volume to writable if any step after markVolumeReadonly fails, preventing the source from being left permanently readonly on abort - Add markVolumeWritable helper using VolumeMarkWritableRequest * fix: deep-copy protobuf messages in test recording sender Use proto.Clone in recordingExecutionSender to store immutable snapshots of JobProgressUpdate and JobCompleted, preventing assertions from observing mutations if the handler reuses message pointers. * fix: add VolumeMarkWritable and ReadVolumeFileStatus to fake volume server The balance task now calls ReadVolumeFileStatus for pre-delete verification and VolumeMarkWritable to restore writability on abort. Add both RPCs to the test fake, and drop the mountCalls assertion since BalanceTask no longer calls VolumeMount directly (VolumeCopy handles it). * fix: use maxConcurrentMovesLimit (50) for clamp, not defaultMaxConcurrentMoves defaultMaxConcurrentMoves (5) is the fallback when the field is unset, not an upper bound. Clamping to it silently overrides valid config values like 10/20/50. Introduce maxConcurrentMovesLimit (50) matching the descriptor's MaxValue and clamp to that instead. * fix: cancel batch moves on progress stream failure Derive a cancellable batchCtx from the caller's ctx. If sender.SendProgress returns an error (client disconnect, context cancelled), capture it, skip further sends, and cancel batchCtx so in-flight moves abort via their propagated context rather than running blind to completion. * fix: bound cleanup timeout and validate batch move fields - Use a 30-second timeout for the deferred markVolumeWritable cleanup instead of context.Background() which can block indefinitely if the volume server is unreachable - Validate required fields (VolumeID, SourceNode, TargetNode) before appending moves to a batch proposal, skipping invalid entries - Fall back to a single-move proposal when filtering leaves only one valid move in a batch * fix: cancel task execution on SendProgress stream failure All handler progress callbacks previously ignored SendProgress errors, allowing tasks to continue executing after the client disconnected. Now each handler creates a derived cancellable context and cancels it on the first SendProgress error, stopping the in-flight task promptly. Handlers fixed: erasure_coding, vacuum, volume_balance (single-move), and admin_script (breaks command loop on send failure). * fix: validate batch moves before scheduling in executeBatchMoves Reject empty batches, enforce a hard upper bound (100 moves), and filter out nil or incomplete move specs (missing source/target/volume) before allocating progress tracking and launching goroutines. * test: add batch balance execution integration test Tests the batch move path with 3 volumes, max concurrency 2, using fake volume servers. Verifies all moves complete with correct readonly, copy, tail, and delete RPC counts. * test: add MarkWritableCount and ReadFileStatusCount accessors Expose the markWritableCalls and readFileStatusCalls counters on the fake volume server, following the existing MarkReadonlyCount pattern. * fix: oscillation guard uses global effective counts for heterogeneous capacity The oscillation guard (max-min <= 1) previously used maxServer/minServer which are determined by utilization ratio. With heterogeneous capacity, maxServer by utilization can have fewer raw volumes than minServer, producing a negative diff and incorrectly triggering the guard. Now scans all servers' effective counts to find the true global max/min volume counts, so the guard works correctly regardless of whether utilization-based or raw-count balancing is used. * fix: admin script handler breaks outer loop on SendProgress failure The break on SendProgress error inside the shell.Commands scan only exited the inner loop, letting the outer command loop continue executing commands on a broken stream. Use a sendBroken flag to propagate the break to the outer execCommands loop.	2026-03-09 19:30:08 -07:00
Chris Lu	b991acf634	fix: paginate bucket listing in Admin UI to show all buckets (#8585 ) * fix: paginate bucket listing in Admin UI to show all buckets The Admin UI's GetS3Buckets() had a hardcoded Limit of 1000 in the ListEntries request, causing the Total Buckets count to cap at 1000 even when more buckets exist. This adds pagination to iterate through all buckets by continuing from the last entry name when a full page is returned. Fixes seaweedfs/seaweedfs#8564 * feat: add server-side pagination and sorting to S3 buckets page Add pagination controls, page size selector, and sortable column headers to the Admin UI's Object Store buckets page, following the same pattern used by the Cluster Volumes page. This ensures the UI remains responsive with thousands of buckets. - Add CurrentPage, TotalPages, PageSize, SortBy, SortOrder to S3BucketsData - Accept page/pageSize/sortBy/sortOrder query params in ShowS3Buckets handler - Sort buckets by name, owner, created, objects, logical/physical size - Paginate results server-side (default 100 per page) - Add pagination nav, page size dropdown, and sort indicators to template * Update s3_buckets_templ.go * Update object_store_users_templ.go * fix: use errors.Is(err, io.EOF) instead of string comparison Replace brittle err.Error() == "EOF" string comparison with idiomatic errors.Is(err, io.EOF) for checking stream end in bucket listing. * fix: address PR review findings for bucket pagination - Clamp page to totalPages when page exceeds total, preventing empty results with misleading pagination state - Fix sort comparator to use explicit ascending/descending comparisons with a name tie-breaker, satisfying strict weak ordering for sort.Slice - Capture SnapshotTsNs from first ListEntries response and pass it to subsequent requests for consistent pagination across pages - Replace non-focusable <th onclick> sort headers with <a> tags and reuse getSortIcon, matching the cluster_volumes accessibility pattern - Change exportBucketList() to fetch all buckets from /api/s3/buckets instead of scraping DOM rows (which now only contain the current page)	2026-03-09 18:55:47 -07:00
Chris Lu	02d3e3195c	Update object_store_users_templ.go	2026-03-09 18:34:58 -07:00
Chris Lu	470075dd90	admin/balance: fix Max Volumes display and balancer source selection (#8583 ) * admin: fix Max Volumes column always showing 0 GetClusterVolumeServers() computed DiskCapacity from diskInfo.MaxVolumeCount but never populated the MaxVolumes field on the VolumeServer struct, causing the column to always display 0. * balance: use utilization ratio for source server selection The balancer selected the source server (to move volumes FROM) by raw volume count. In clusters with heterogeneous MaxVolumeCount settings, the server with the highest capacity naturally holds the most volumes and was always picked as the source, even when it had the lowest utilization ratio. Change source selection and imbalance calculation to use utilization ratio (effectiveCount / maxVolumeCount) so servers are compared by how full they are relative to their capacity, not by absolute volume count. This matches how destination scoring already works via calculateBalanceScore().	2026-03-09 18:34:11 -07:00
Chris Lu	6dab90472b	admin: fix access key creation UX (#8579 ) * admin: remove misleading "secret key only shown once" warning The access key details modal already allows viewing both the access key and secret key at any time, so the warning about the secret key only being displayed once is incorrect and misleading. * admin: allow specifying custom access key and secret key Add optional access_key and secret_key fields to the create access key API. When provided, the specified keys are used instead of generating random ones. The UI now shows a form with optional fields when creating a new key, with a note that leaving them blank auto-generates keys. * admin: check access key uniqueness before creating Access keys must be globally unique across all users since S3 auth looks them up in a single global map. Add an explicit check using GetUserByAccessKey before creating, so the user gets a clear error ("access key is already in use") rather than a generic store error. * Update object_store_users_templ.go * admin: address review feedback for access key creation Handler: - Use decodeJSONBody/newJSONMaxReader instead of raw json.Decode to enforce request size limits and handle malformed JSON properly - Return 409 Conflict for duplicate access keys, 400 Bad Request for validation errors, instead of generic 500 Backend: - Validate access key length (4-128 chars) and secret key length (8-128 chars) when user-provided Frontend: - Extract resetCreateKeyForm() helper to avoid duplicated cleanup logic - Wire resetCreateKeyForm to accessKeysModal hidden.bs.modal event so form state is always cleared when modal is dismissed - Change secret key input to type="password" with a visibility toggle * admin: guard against nil request and handle GetUserByAccessKey errors - Add nil check for the CreateAccessKeyRequest pointer before dereferencing, defaulting to an empty request (auto-generate both keys). - Handle non-"not found" errors from GetUserByAccessKey explicitly instead of silently proceeding, so store errors (e.g. db connection failures) surface rather than being swallowed. * Update object_store_users_templ.go * admin: fix access key uniqueness check with gRPC store GetUserByAccessKey returns a gRPC NotFound status error (not the sentinel credential.ErrAccessKeyNotFound) when using the gRPC store, causing the uniqueness check to fail with a spurious error. Treat the lookup as best-effort: only reject when a user is found (err == nil). Any error (not-found via any store, connectivity issues) falls through to the store's own CreateAccessKey which enforces uniqueness definitively. * admin: fix error handling and input validation for access key creation Backend: - Remove access key value from the duplicate-key error message to avoid logging the caller-supplied identifier. Handler: - Handle empty POST body (io.EOF) as a valid request that auto-generates both keys, instead of rejecting it as malformed JSON. - Return 404 for "not found" errors (e.g. non-existent user) instead of collapsing them into a 500. Frontend: - Add minlength/maxlength attributes matching backend constraints (access key 4-128, secret key 8-128). - Call reportValidity() before submitting so invalid lengths are caught client-side without a round trip. * admin: use sentinel errors and fix GetUserByAccessKey error handling Backend (user_management.go): - Define sentinel errors (ErrAccessKeyInUse, ErrUserNotFound, ErrInvalidInput) and wrap them in returned errors so callers can use errors.Is. - Handle GetUserByAccessKey errors properly: check the sentinel credential.ErrAccessKeyNotFound first, then fall back to string matching for stores (gRPC) that return non-sentinel not-found errors. Surface unexpected errors instead of silently proceeding. Handler (user_handlers.go): - Replace fragile strings.Contains error matching with errors.Is against the new dash sentinels. Frontend (object_store_users.templ): - Add double-submit guard (isCreatingKey flag + button disabling) to prevent duplicate access key creation requests.	2026-03-09 14:03:41 -07:00
Chris Lu	55bce53953	reduce logs	2026-03-09 12:14:25 -07:00
Chris Lu	992db11d2b	iam: add IAM group management (#8560 ) * iam: add Group message to protobuf schema Add Group message (name, members, policy_names, disabled) and add groups field to S3ApiConfiguration for IAM group management support (issue #7742). * iam: add group CRUD to CredentialStore interface and all backends Add group management methods (CreateGroup, GetGroup, DeleteGroup, ListGroups, UpdateGroup) to the CredentialStore interface with implementations for memory, filer_etc, postgres, and grpc stores. Wire group loading/saving into filer_etc LoadConfiguration and SaveConfiguration. * iam: add group IAM response types Add XML response types for group management IAM actions: CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, ListGroupsForUser. * iam: add group management handlers to embedded IAM API Add CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, and ListGroupsForUser handlers with dispatch in ExecuteAction. * iam: add group management handlers to standalone IAM API Add group handlers (CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, ListGroupsForUser) and wire into DoActions dispatch. Also add helper functions for user/policy side effects. * iam: integrate group policies into authorization Add groups and userGroups reverse index to IdentityAccessManagement. Populate both maps during ReplaceS3ApiConfiguration and MergeS3ApiConfiguration. Modify evaluateIAMPolicies to evaluate policies from user's enabled groups in addition to user policies. Update VerifyActionPermission to consider group policies when checking hasAttachedPolicies. * iam: add group side effects on user deletion and rename When a user is deleted, remove them from all groups they belong to. When a user is renamed, update group membership references. Applied to both embedded and standalone IAM handlers. * iam: watch /etc/iam/groups directory for config changes Add groups directory to the filer subscription watcher so group file changes trigger IAM configuration reloads. * admin: add group management page to admin UI Add groups page with CRUD operations, member management, policy attachment, and enable/disable toggle. Register routes in admin handlers and add Groups entry to sidebar navigation. * test: add IAM group management integration tests Add comprehensive integration tests for group CRUD, membership, policy attachment, policy enforcement, disabled group behavior, user deletion side effects, and multi-group membership. Add "group" test type to CI matrix in s3-iam-tests workflow. * iam: address PR review comments for group management - Fix XSS vulnerability in groups.templ: replace innerHTML string concatenation with DOM APIs (createElement/textContent) for rendering member and policy lists - Use userGroups reverse index in embedded IAM ListGroupsForUser for O(1) lookup instead of iterating all groups - Add buildUserGroupsIndex helper in standalone IAM handlers; use it in ListGroupsForUser and removeUserFromAllGroups for efficient lookup - Add note about gRPC store load-modify-save race condition limitation * iam: add defensive copies, validation, and XSS fixes for group management - Memory store: clone groups on store/retrieve to prevent mutation - Admin dash: deep copy groups before mutation, validate user/policy exists - HTTP handlers: translate credential errors to proper HTTP status codes, use bool for Enabled field to distinguish missing vs false - Groups templ: use data attributes + event delegation instead of inline onclick for XSS safety, prevent stale async responses iam: add explicit group methods to PropagatingCredentialStore Add CreateGroup, GetGroup, DeleteGroup, ListGroups, and UpdateGroup methods instead of relying on embedded interface fallthrough. Group changes propagate via filer subscription so no RPC propagation needed. * iam: detect postgres unique constraint violation and add groups index Return ErrGroupAlreadyExists when INSERT hits SQLState 23505 instead of a generic error. Add index on groups(disabled) for filtered queries. * iam: add Marker field to group list response types Add Marker string field to GetGroupResult, ListGroupsResult, ListAttachedGroupPoliciesResult, and ListGroupsForUserResult to match AWS IAM pagination response format. * iam: check group attachment before policy deletion Reject DeletePolicy if the policy is attached to any group, matching AWS IAM behavior. Add PolicyArn to ListAttachedGroupPolicies response. * iam: include group policies in IAM authorization Merge policy names from user's enabled groups into the IAMIdentity used for authorization, so group-attached policies are evaluated alongside user-attached policies. * iam: check for name collision before renaming user in UpdateUser Scan identities and inline policies for newUserName before mutating, returning EntityAlreadyExists if a collision is found. Reuse the already-loaded policies instead of loading them again inside the loop. * test: use t.Cleanup for bucket cleanup in group policy test * iam: wrap ErrUserNotInGroup sentinel in RemoveGroupMember error Wrap credential.ErrUserNotInGroup so errors.Is works in groupErrorToHTTPStatus, returning proper 400 instead of 500. * admin: regenerate groups_templ.go with XSS-safe data attributes Regenerated from groups.templ which uses data-group-name attributes instead of inline onclick with string interpolation. * iam: add input validation and persist groups during migration - Validate nil/empty group name in CreateGroup and UpdateGroup - Save groups in migrateToMultiFile so they survive legacy migration * admin: use groupErrorToHTTPStatus in GetGroupMembers and GetGroupPolicies * iam: short-circuit UpdateUser when newUserName equals current name * iam: require empty PolicyNames before group deletion Reject DeleteGroup when group has attached policies, matching the existing members check. Also fix GetGroup error handling in DeletePolicy to only skip ErrGroupNotFound, not all errors. * ci: add weed/pb/** to S3 IAM test trigger paths * test: replace time.Sleep with require.Eventually for propagation waits Use polling with timeout instead of fixed sleeps to reduce flakiness in integration tests waiting for IAM policy propagation. * fix: use credentialManager.GetPolicy for AttachGroupPolicy validation Policies created via CreatePolicy through credentialManager are stored in the credential store, not in s3cfg.Policies (which only has static config policies). Change AttachGroupPolicy to use credentialManager.GetPolicy() for policy existence validation. * feat: add UpdateGroup handler to embedded IAM API Add UpdateGroup action to enable/disable groups and rename groups via the IAM API. This is a SeaweedFS extension (not in AWS SDK) used by tests to toggle group disabled status. * fix: authenticate raw IAM API calls in group tests The embedded IAM endpoint rejects anonymous requests. Replace callIAMAPI with callIAMAPIAuthenticated that uses JWT bearer token authentication via the test framework. * feat: add UpdateGroup handler to standalone IAM API Mirror the embedded IAM UpdateGroup handler in the standalone IAM API for parity. * fix: add omitempty to Marker XML tags in group responses Non-truncated responses should not emit an empty <Marker/> element. * fix: distinguish backend errors from missing policies in AttachGroupPolicy Return ServiceFailure for credential manager errors instead of masking them as NoSuchEntity. Also switch ListGroupsForUser to use s3cfg.Groups instead of in-memory reverse index to avoid stale data. Add duplicate name check to UpdateGroup rename. * fix: standalone IAM AttachGroupPolicy uses persisted policy store Check managed policies from GetPolicies() instead of s3cfg.Policies so dynamically created policies are found. Also add duplicate name check to UpdateGroup rename. * fix: rollback inline policies on UpdateUser PutPolicies failure If PutPolicies fails after moving inline policies to the new username, restore both the identity name and the inline policies map to their original state to avoid a partial-write window. * fix: correct test cleanup ordering for group tests Replace scattered defers with single ordered t.Cleanup in each test to ensure resources are torn down in reverse-creation order: remove membership, detach policies, delete access keys, delete users, delete groups, delete policies. Move bucket cleanup to parent test scope and delete objects before bucket. * fix: move identity nil check before map lookup and refine hasAttachedPolicies Move the nil check on identity before accessing identity.Name to prevent panic. Also refine hasAttachedPolicies to only consider groups that are enabled and have actual policies attached, so membership in a no-policy group doesn't incorrectly trigger IAM authorization. * fix: fail group reload on unreadable or corrupt group files Return errors instead of logging and continuing when group files cannot be read or unmarshaled. This prevents silently applying a partial IAM config with missing group memberships or policies. * fix: use errors.Is for sql.ErrNoRows comparison in postgres group store * docs: explain why group methods skip propagateChange Group changes propagate to S3 servers via filer subscription (watching /etc/iam/groups/) rather than gRPC RPCs, since there are no group-specific RPCs in the S3 cache protocol. * fix: remove unused policyNameFromArn and strings import * fix: update service account ParentUser on user rename When renaming a user via UpdateUser, also update ParentUser references in service accounts to prevent them from becoming orphaned after the next configuration reload. * fix: wrap DetachGroupPolicy error with ErrPolicyNotAttached sentinel Use credential.ErrPolicyNotAttached so groupErrorToHTTPStatus maps it to 400 instead of falling back to 500. * fix: use admin S3 client for bucket cleanup in enforcement test The user S3 client may lack permissions by cleanup time since the user is removed from the group in an earlier subtest. Use the admin S3 client to ensure bucket and object cleanup always succeeds. * fix: add nil guard for group param in propagating store log calls Prevent potential nil dereference when logging group.Name in CreateGroup and UpdateGroup of PropagatingCredentialStore. * fix: validate Disabled field in UpdateGroup handlers Reject values other than "true" or "false" with InvalidInputException instead of silently treating them as false. * fix: seed mergedGroups from existing groups in MergeS3ApiConfiguration Previously the merge started with empty group maps, dropping any static-file groups. Now seeds from existing iam.groups before overlaying dynamic config, and builds the reverse index after merging to avoid stale entries from overridden groups. * fix: use errors.Is for filer_pb.ErrNotFound comparison in group loading Replace direct equality (==) with errors.Is() to correctly match wrapped errors, consistent with the rest of the codebase. * fix: add ErrUserNotFound and ErrPolicyNotFound to groupErrorToHTTPStatus Map these sentinel errors to 404 so AddGroupMember and AttachGroupPolicy return proper HTTP status codes. * fix: log cleanup errors in group integration tests Replace fire-and-forget cleanup calls with error-checked versions that log failures via t.Logf for debugging visibility. * fix: prevent duplicate group test runs in CI matrix The basic lane's -run "TestIAM" regex also matched TestIAMGroup* tests, causing them to run in both the basic and group lanes. Replace with explicit test function names. * fix: add GIN index on groups.members JSONB for membership lookups Without this index, ListGroupsForUser and membership queries require full table scans on the groups table. * fix: handle cross-directory moves in IAM config subscription When a file is moved out of an IAM directory (e.g., /etc/iam/groups), the dir variable was overwritten with NewParentPath, causing the source directory change to be missed. Now also notifies handlers about the source directory for cross-directory moves. * fix: validate members/policies before deleting group in admin handler AdminServer.DeleteGroup now checks for attached members and policies before delegating to credentialManager, matching the IAM handler guards. * fix: merge groups by name instead of blind append during filer load Match the identity loader's merge behavior: find existing group by name and replace, only append when no match exists. Prevents duplicates when legacy and multi-file configs overlap. * fix: check DeleteEntry response error when cleaning obsolete group files Capture and log resp.Error from filer DeleteEntry calls during group file cleanup, matching the pattern used in deleteGroupFile. * fix: verify source user exists before no-op check in UpdateUser Reorder UpdateUser to find the source identity first and return NoSuchEntityException if not found, before checking if the rename is a no-op. Previously a non-existent user renamed to itself would incorrectly return success. * fix: update service account parent refs on user rename in embedded IAM The embedded IAM UpdateUser handler updated group membership but not service account ParentUser fields, unlike the standalone handler. * fix: replay source-side events for all handlers on cross-dir moves Pass nil newEntry to bucket, IAM, and circuit-breaker handlers for the source directory during cross-directory moves, so all watchers can clear caches for the moved-away resource. * fix: don't seed mergedGroups from existing iam.groups in merge Groups are always dynamic (from filer), never static (from s3.config). Seeding from iam.groups caused stale deleted groups to persist. Now only uses config.Groups from the dynamic filer config. * fix: add deferred user cleanup in TestIAMGroupUserDeletionSideEffect Register t.Cleanup for the created user so it gets cleaned up even if the test fails before the inline DeleteUser call. * fix: assert UpdateGroup HTTP status in disabled group tests Add require.Equal checks for 200 status after UpdateGroup calls so the test fails immediately on API errors rather than relying on the subsequent Eventually timeout. * fix: trim whitespace from group name in filer store operations Trim leading/trailing whitespace from group.Name before validation in CreateGroup and UpdateGroup to prevent whitespace-only filenames. Also merge groups by name during multi-file load to prevent duplicates. * fix: add nil/empty group validation in gRPC store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics and invalid persistence. * fix: add nil/empty group validation in postgres store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics from nil member access and empty-name row inserts. * fix: add name collision check in embedded IAM UpdateUser The embedded IAM handler renamed users without checking if the target name already existed, unlike the standalone handler. * fix: add ErrGroupNotEmpty sentinel and map to HTTP 409 AdminServer.DeleteGroup now wraps conflict errors with ErrGroupNotEmpty, and groupErrorToHTTPStatus maps it to 409 Conflict instead of 500. * fix: use appropriate error message in GetGroupDetails based on status Return "Group not found" only for 404, use "Failed to retrieve group" for other error statuses instead of always saying "Group not found". * fix: use backend-normalized group.Name in CreateGroup response After credentialManager.CreateGroup may normalize the name (e.g., trim whitespace), use group.Name instead of the raw input for the returned GroupData to ensure consistency. * fix: add nil/empty group validation in memory store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics from nil pointer dereference on map access. * fix: reorder embedded IAM UpdateUser to verify source first Find the source identity before checking for collisions, matching the standalone handler's logic. Previously a non-existent user renamed to an existing name would get EntityAlreadyExists instead of NoSuchEntity. * fix: handle same-directory renames in metadata subscription Replay a delete event for the old entry name during same-directory renames so handlers like onBucketMetadataChange can clean up stale state for the old name. * fix: abort GetGroups on non-ErrGroupNotFound errors Only skip groups that return ErrGroupNotFound. Other errors (e.g., transient backend failures) now abort the handler and return the error to the caller instead of silently producing partial results. * fix: add aria-label and title to icon-only group action buttons Add accessible labels to View and Delete buttons so screen readers and tooltips provide meaningful context. * fix: validate group name in saveGroup to prevent invalid filenames Trim whitespace and reject empty names before writing group JSON files, preventing creation of files like ".json". * fix: add /etc/iam/groups to filer subscription watched directories The groups directory was missing from the watched directories list, so S3 servers in a cluster would not detect group changes made by other servers via filer. The onIamConfigChange handler already had code to handle group directory changes but it was never triggered. * add direct gRPC propagation for group changes to S3 servers Groups now have the same dual propagation as identities and policies: direct gRPC push via propagateChange + async filer subscription. - Add PutGroup/RemoveGroup proto messages and RPCs - Add PutGroup/RemoveGroup in-memory cache methods on IAM - Add PutGroup/RemoveGroup gRPC server handlers - Update PropagatingCredentialStore to call propagateChange on group mutations * reduce log verbosity for config load summary Change ReplaceS3ApiConfiguration log from Infof to V(1).Infof to avoid noisy output on every config reload. * admin: show user groups in view and edit user modals - Add Groups field to UserDetails and populate from credential manager - Show groups as badges in user details view modal - Add group management to edit user modal: display current groups, add to group via dropdown, remove from group via badge x button * fix: remove duplicate showAlert that broke modal-alerts.js admin.js defined showAlert(type, message) which overwrote the modal-alerts.js version showAlert(message, type), causing broken unstyled alert boxes. Remove the duplicate and swap all callers in admin.js to use the correct (message, type) argument order. * fix: unwrap groups API response in edit user modal The /api/groups endpoint returns {"groups": [...]}, not a bare array. * Update object_store_users_templ.go * test: assert AccessDenied error code in group denial tests Replace plain assert.Error checks with awserr.Error type assertion and AccessDenied code verification, matching the pattern used in other IAM integration tests. * fix: propagate GetGroups errors in ShowGroups handler getGroupsPageData was swallowing errors and returning an empty page with 200 status. Now returns the error so ShowGroups can respond with a proper error status. * fix: reject AttachGroupPolicy when credential manager is nil Previously skipped policy existence validation when credentialManager was nil, allowing attachment of nonexistent policies. Now returns a ServiceFailureException error. * fix: preserve groups during partial MergeS3ApiConfiguration updates UpsertIdentity calls MergeS3ApiConfiguration with a partial config containing only the updated identity (nil Groups). This was wiping all in-memory group state. Now only replaces groups when config.Groups is non-nil (full config reload). * fix: propagate errors from group lookup in GetObjectStoreUserDetails ListGroups and GetGroup errors were silently ignored, potentially showing incomplete group data in the UI. * fix: use DOM APIs for group badge remove button to prevent XSS Replace innerHTML with onclick string interpolation with DOM createElement + addEventListener pattern. Also add aria-label and title to the add-to-group button. * fix: snapshot group policies under RLock to prevent concurrent map access evaluateIAMPolicies was copying the map reference via groupMap := iam.groups under RLock then iterating after RUnlock, while PutGroup mutates the map in-place. Now copies the needed policy names into a slice while holding the lock. * fix: add nil IAM check to PutGroup and RemoveGroup gRPC handlers Match the nil guard pattern used by PutPolicy/DeletePolicy to prevent nil pointer dereference when IAM is not initialized.	2026-03-09 11:54:32 -07:00
Chris Lu	78a3441b30	fix: volume balance detection returns multiple tasks per run (#8559 ) * fix: volume balance detection now returns multiple tasks per run (#8551) Previously, detectForDiskType() returned at most 1 balance task per disk type, making the MaxJobsPerDetection setting ineffective. The detection loop now iterates within each disk type, planning multiple moves until the imbalance drops below threshold or maxResults is reached. Effective volume counts are adjusted after each planned move so the algorithm correctly re-evaluates which server is overloaded. * fix: factor pending tasks into destination scoring and use UnixNano for task IDs - Use UnixNano instead of Unix for task IDs to avoid collisions when multiple tasks are created within the same second - Adjust calculateBalanceScore to include LoadCount (pending + assigned tasks) in the utilization estimate, so the destination picker avoids stacking multiple planned moves onto the same target disk * test: add comprehensive balance detection tests for complex scenarios Cover multi-server convergence, max-server shifting, destination spreading, pre-existing pending task skipping, no-duplicate-volume invariant, and parameterized convergence verification across different cluster shapes and thresholds. * fix: address PR review findings in balance detection - hasMore flag: compute from len(results) >= maxResults so the scheduler knows more pages may exist, matching vacuum/EC handler pattern - Exhausted server fallthrough: when no eligible volumes remain on the current maxServer (all have pending tasks) or destination planning fails, mark the server as exhausted and continue to the next overloaded server instead of stopping the entire detection loop - Return canonical destination server ID directly from createBalanceTask instead of resolving via findServerIDByAddress, eliminating the fragile address→ID lookup for adjustment tracking - Fix bestScore sentinel: use math.Inf(-1) instead of -1.0 so disks with negative scores (high pending load, same rack/DC) are still selected as the best available destination - Add TestDetection_ExhaustedServerFallsThrough covering the scenario where the top server's volumes are all blocked by pre-existing tasks * test: fix computeEffectiveCounts and add len guard in no-duplicate test - computeEffectiveCounts now takes a servers slice to seed counts for all known servers (including empty ones) and uses an address→ID map from the topology spec instead of scanning metrics, so destination servers with zero initial volumes are tracked correctly - TestDetection_NoDuplicateVolumesAcrossIterations now asserts len > 1 before checking duplicates, so the test actually fails if Detection regresses to returning a single task * fix: remove redundant HasAnyTask check in createBalanceTask The HasAnyTask check in createBalanceTask duplicated the same check already performed in detectForDiskType's volume selection loop. Since detection runs single-threaded (MaxDetectionConcurrency: 1), no race can occur between the two points. * fix: consistent hasMore pattern and remove double-counted LoadCount in scoring - Adopt vacuum_handler's hasMore pattern: over-fetch by 1, check len > maxResults, and truncate — consistent truncation semantics - Remove direct LoadCount penalty in calculateBalanceScore since LoadCount is already factored into effectiveVolumeCount for utilization scoring; bump utilization weight from 40 to 50 to compensate for the removed 10-point load penalty * fix: handle zero maxResults as no-cap, emit trace after trim, seed empty servers - When MaxResults is 0 (omitted), treat as no explicit cap instead of defaulting to 1; only apply the +1 over-fetch probe when caller supplies a positive limit - Move decision trace emission after hasMore/trim so the trace accurately reflects the returned proposals - Seed serverVolumeCounts from ActiveTopology so servers that have a matching disk type but zero volumes are included in the imbalance calculation and MinServerCount check * fix: nil-guard clusterInfo, uncap legacy DetectionFunc, deterministic disk type order - Add early nil guard for clusterInfo in Detection to prevent panics in downstream helpers (detectForDiskType, createBalanceTask) - Change register.go DetectionFunc wrapper from maxResults=1 to 0 (no cap) so the legacy code path returns all detected tasks - Sort disk type keys before iteration so results are deterministic when maxResults spans multiple disk types (HDD/SSD) * fix: don't over-fetch in stateful detection to avoid orphaned pending tasks Detection registers planned moves in ActiveTopology via AddPendingTask, so requesting maxResults+1 would create an extra pending task that gets discarded during trim. Use len(results) >= maxResults as the hasMore signal instead, which is correct since Detection already caps internally. * fix: return explicit truncated flag from Detection instead of approximating Detection now returns (results, truncated, error) where truncated is true only when the loop stopped because it hit maxResults, not when it ran out of work naturally. This eliminates false hasMore signals when detection happens to produce exactly maxResults results by resolving the imbalance. * cleanup: simplify detection logic and remove redundancies - Remove redundant clusterInfo nil check in detectForDiskType since Detection already guards against nil clusterInfo - Remove adjustments loop for destination servers not in serverVolumeCounts — topology seeding ensures all servers with matching disk type are already present - Merge two-loop min/max calculation into a single loop: min across all servers, max only among non-exhausted servers - Replace magic number 100 with len(metrics) for minC initialization in convergence test * fix: accurate truncation flag, deterministic server order, indexed volume lookup - Track balanced flag to distinguish "hit maxResults cap" from "cluster balanced at exactly maxResults" — truncated is only true when there's genuinely more work to do - Sort servers for deterministic iteration and tie-breaking when multiple servers have equal volume counts - Pre-index volumes by server with per-server cursors to avoid O(maxResults * volumes) rescanning on each iteration - Add truncation flag assertions to RespectsMaxResults test: true when capped, false when detection finishes naturally * fix: seed trace server counts from ActiveTopology to match detection logic The decision trace was building serverVolumeCounts only from metrics, missing zero-volume servers seeded from ActiveTopology by Detection. This could cause the trace to report wrong server counts, incorrect imbalance ratios, or spurious "too few servers" messages. Pass activeTopology into the trace function and seed server counts the same way Detection does. * fix: don't exhaust server on per-volume planning failure, sort volumes by ID - When createBalanceTask returns nil, continue to the next volume on the same server instead of marking the entire server as exhausted. The failure may be volume-specific (not found in topology, pending task registration failed) and other volumes on the server may still be viable candidates. - Sort each server's volume slice by VolumeID after pre-indexing so volume selection is fully deterministic regardless of input order. * fix: use require instead of assert to prevent nil dereference panic in CORS test The test used assert.NoError (non-fatal) for GetBucketCors, then immediately accessed getResp.CORSRules. When the API returns an error, getResp is nil causing a panic. Switch to require.NoError/NotNil/Len so the test stops before dereferencing a nil response. * fix: deterministic disk tie-breaking and stronger pre-existing task test - Sort available disks by NodeID then DiskID before scoring so destination selection is deterministic when two disks score equally - Add task count bounds assertion to SkipsPreExistingPendingTasks test: with 15 of 20 volumes already having pending tasks, at most 5 new tasks should be created and at least 1 (imbalance still exists) * fix: seed adjustments from existing pending/assigned tasks to prevent over-scheduling Detection now calls ActiveTopology.GetTaskServerAdjustments() to initialize the adjustments map with source/destination deltas from existing pending and assigned balance tasks. This ensures effectiveCounts reflects in-flight moves, preventing the algorithm from planning additional moves in the same direction when prior moves already address the imbalance. Added GetTaskServerAdjustments(taskType) to ActiveTopology which iterates pending and assigned tasks, decrementing source servers and incrementing destination servers for the given task type.	2026-03-08 21:34:03 -07:00
Chris Lu	ba66411337	Update plugin_templ.go	2026-03-08 14:29:06 -07:00
Chris Lu	7808b301ef	admin: remove Scheduler Settings cards from plugin UI (#8558 ) * admin: remove Scheduler Settings cards, make Next Run full-width Remove the two "Scheduler Settings" placeholder cards from the plugin UI (overview page and scheduler tab). They only contained a text note saying detection intervals are configured per job type, which is self-evident from the per-job-type settings form. Make the "Next Run" card full-width on the overview page since it no longer shares a row with the removed card. * plugin UI: promote Next Run to top summary card row Move "Next Run" from a standalone card into the top row alongside Workers, Active Jobs, and Activities as a compact stat card.	2026-03-08 14:27:57 -07:00
Chris Lu	fa7da0f57e	template	2026-03-08 14:05:42 -07:00
Chris Lu	961c270aba	admin: expose per-job-type detection interval in plugin UI (#8552 ) * admin: expose per-job-type detection interval in plugin UI The detection_interval_seconds field was not editable in the admin UI. collectAdminSettings() silently preserved the existing value, making it impossible for users to change how often a job type checks for new work. Users would change the global "Sleep Between Iterations" setting expecting it to control job scheduling frequency, but that only controls the scheduler loop's idle polling rate. Add a "Detection Interval (s)" input to the per-job-type admin settings form so users can actually configure it. Fixes #8549 * admin: remove global Sleep Between Iterations setting Now that per-job-type detection intervals are exposed in the UI, the global IdleSleepSeconds setting is redundant and confusing. It only controlled the scheduler loop's idle polling rate, which is always overridden by earliestNextDetectionAt() when job types exist. Replace the three usages with simpler alternatives: - Scheduler loop sleep: use defaultSchedulerIdleSleep constant - Initial delay for new job types: use policy.DetectionInterval/2 (more logical since it's already per-job-type) - Status fallback: use the constant The API endpoints are kept for backward compatibility but the UI no longer exposes or calls them. * admin: restore configurable idle sleep in scheduler loop The EC integration test sets idle_sleep_seconds=1 via the scheduler config API so the scheduler wakes quickly after workers connect. The previous commit replaced this with a hardcoded 613s constant, causing the scheduler to sleep through the entire test window. Restore GetSchedulerConfig().IdleSleepDuration() in the scheduler loop and status reporting. The UI removal of the setting is still correct — the API endpoint remains for programmatic use (e.g., tests). * admin: cap first-run initial delay to 5s instead of DetectionInterval/2 The initial delay for first-run job types was set to policy.DetectionInterval/2, which creates unbounded first-run latency (e.g., 1 hour for vacuum with a 2-hour detection interval). A small fixed 5-second delay provides sufficient stagger without penalizing startup time.	2026-03-08 14:03:51 -07:00
Chris Lu	e25558e4d8	admin: fix mobile sidebar menu inaccessible in portrait mode (#8556 ) * admin: fix mobile sidebar menu inaccessible in portrait mode The hamburger button only toggled the user dropdown, leaving the sidebar navigation inaccessible on mobile devices in portrait mode. Add a dedicated sidebar toggle button (visible only on mobile), give the sidebar an id so Bootstrap collapse can target it, add a backdrop overlay for the open state, and auto-close the sidebar when a nav link is clicked. Fixes #8550 * admin: address review feedback on mobile sidebar - Remove redundant JS show/hide.bs.collapse listeners; CSS sibling selector already handles backdrop visibility - Use const instead of var for non-reassigned variables - Move inline style on user icon to CSS class * admin: add aria attributes to user-menu toggler, use CSS variable for navbar height - Add aria-controls, aria-expanded, and aria-label to the user-menu toggle button for assistive technology - Extract hard-coded 56px navbar height into --navbar-height CSS custom property used by sidebar and backdrop positioning * admin: extract hideSidebar helper, use toggler visibility for breakpoint check - Extract duplicated collapse-hide logic into a hideSidebar helper - Replace hardcoded window.innerWidth < 768 with a check on the sidebar toggler's computed display, decoupling JS from CSS breakpoints - Add aria-expanded="false" to sidebar toggle button --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-08 12:32:14 -07:00
Chris Lu	3f946fc0c0	mount: make metadata cache rebuilds snapshot-consistent (#8531 ) * filer: expose metadata events and list snapshots * mount: invalidate hot directory caches * mount: read hot directories directly from filer * mount: add sequenced metadata cache applier * mount: apply metadata responses through cache applier * mount: replay snapshot-consistent directory builds * mount: dedupe self metadata events * mount: factor directory build cleanup * mount: replace proto marshal dedup with composite key and ring buffer The dedup logic was doing a full deterministic proto.Marshal on every metadata event just to produce a dedup key. Replace with a cheap composite string key (TsNs\|Directory\|OldName\|NewName). Also replace the sliding-window slice (which leaked the backing array unboundedly) with a fixed-size ring buffer that reuses the same array. * filer: remove mutex and proto.Clone from request-scoped MetadataEventSink MetadataEventSink is created per-request and only accessed by the goroutine handling the gRPC call. The mutex and double proto.Clone (once in Record, once in Last) were unnecessary overhead on every filer write operation. Store the pointer directly instead. * mount: skip proto.Clone for caller-owned metadata events Add ApplyMetadataResponseOwned that takes ownership of the response without cloning. Local metadata events (mkdir, create, flush, etc.) are freshly constructed and never shared, so the clone is unnecessary. * filer: only populate MetadataEvent on successful DeleteEntry Avoid calling eventSink.Last() on error paths where the sink may contain a partial event from an intermediate child deletion during recursive deletes. * mount: avoid map allocation in collectDirectoryNotifications Replace the map with a fixed-size array and linear dedup. There are at most 3 directories to notify (old parent, new parent, new child if directory), so a 3-element array avoids the heap allocation on every metadata event. * mount: fix potential deadlock in enqueueApplyRequest Release applyStateMu before the blocking channel send. Previously, if the channel was full (cap 128), the send would block while holding the mutex, preventing Shutdown from acquiring it to set applyClosed. * mount: restore signature-based self-event filtering as fast path Re-add the signature check that was removed when content-based dedup was introduced. Checking signatures is O(1) on a small slice and avoids enqueuing and processing events that originated from this mount instance. The content-based dedup remains as a fallback. * filer: send snapshotTsNs only in first ListEntries response The snapshot timestamp is identical for every entry in a single ListEntries stream. Sending it in every response message wastes wire bandwidth for large directories. The client already reads it only from the first response. * mount: exit read-through mode after successful full directory listing MarkDirectoryRefreshed was defined but never called, so directories that entered read-through mode (hot invalidation threshold) stayed there permanently, hitting the filer on every readdir even when cold. Call it after a complete read-through listing finishes. * mount: include event shape and full paths in dedup key The previous dedup key only used Names, which could collapse distinct rename targets. Include the event shape (C/D/U/R), source directory, new parent path, and both entry names so structurally different events are never treated as duplicates. * mount: drain pending requests on shutdown in runApplyLoop After receiving the shutdown sentinel, drain any remaining requests from applyCh non-blockingly and signal each with errMetaCacheClosed so callers waiting on req.done are released. * mount: include IsDirectory in synthetic delete events metadataDeleteEvent now accepts an isDirectory parameter so the applier can distinguish directory deletes from file deletes. Rmdir passes true, Unlink passes false. * mount: fall back to synthetic event when MetadataEvent is nil In mknod and mkdir, if the filer response omits MetadataEvent (e.g. older filer without the field), synthesize an equivalent local metadata event so the cache is always updated. * mount: make Flush metadata apply best-effort after successful commit After filer_pb.CreateEntryWithResponse succeeds, the entry is persisted. Don't fail the Flush syscall if the local metadata cache apply fails — log and invalidate the directory cache instead. Also fall back to a synthetic event when MetadataEvent is nil. * mount: make Rename metadata apply best-effort The rename has already succeeded on the filer by the time we apply the local metadata event. Log failures instead of returning errors that would be dropped by the caller anyway. * mount: make saveEntry metadata apply best-effort with fallback After UpdateEntryWithResponse succeeds, treat local metadata apply as non-fatal. Log and invalidate the directory cache on failure. Also fall back to a synthetic event when MetadataEvent is nil. * filer_pb: preserve snapshotTsNs on error in ReadDirAllEntriesWithSnapshot Return the snapshot timestamp even when the first page fails, so callers receive the snapshot boundary when partial data was received. * filer: send snapshot token for empty directory listings When no entries are streamed, send a final ListEntriesResponse with only SnapshotTsNs so clients always receive the snapshot boundary. * mount: distinguish not-found vs transient errors in lookupEntry Return fuse.EIO for non-not-found filer errors instead of unconditionally returning ENOENT, so transient failures don't masquerade as missing entries. * mount: make CacheRemoteObject metadata apply best-effort The file content has already been cached successfully. Don't fail the read if the local metadata cache update fails. * mount: use consistent snapshot for readdir in direct mode Capture the SnapshotTsNs from the first loadDirectoryEntriesDirect call and store it on the DirectoryHandle. Subsequent batch loads pass this stored timestamp so all batches use the same snapshot. Also export DoSeaweedListWithSnapshot so mount can use it directly with snapshot passthrough. * filer_pb: fix test fake to send SnapshotTsNs only on first response Match the server behavior: only the first ListEntriesResponse in a page carries the snapshot timestamp, subsequent entries leave it zero. * Fix nil pointer dereference in ListEntries stream consumers Remove the empty-directory snapshot-only response from ListEntries that sent a ListEntriesResponse with Entry==nil, which crashed every raw stream consumer that assumed resp.Entry is always non-nil. Also add defensive nil checks for resp.Entry in all raw ListEntries stream consumers across: S3 listing, broker topic lookup, broker topic config, admin dashboard, topic retention, hybrid message scanner, Kafka integration, and consumer offset storage. * Add nil guards for resp.Entry in remaining ListEntries stream consumers Covers: S3 object lock check, MQ management dashboard (version/ partition/offset loops), and topic retention version loop. * Make applyLocalMetadataEvent best-effort in Link and Symlink The filer operations already succeeded; failing the syscall because the local cache apply failed is wrong. Log a warning and invalidate the parent directory cache instead. * Make applyLocalMetadataEvent best-effort in Mkdir/Rmdir/Mknod/Unlink The filer RPC already committed; don't fail the syscall when the local metadata cache apply fails. Log a warning and invalidate the parent directory cache to force a re-fetch on next access. * flushFileMetadata: add nil-fallback for metadata event and best-effort apply Synthesize a metadata event when resp.GetMetadataEvent() is nil (matching doFlush), and make the apply best-effort with cache invalidation on failure. * Prevent double-invocation of cleanupBuild in doEnsureVisited Add a cleanupDone guard so the deferred cleanup and inline error-path cleanup don't both call DeleteFolderChildren/AbortDirectoryBuild. * Fix comment: signature check is O(n) not O(1) * Prevent deferred cleanup after successful CompleteDirectoryBuild Set cleanupDone before returning from the success path so the deferred context-cancellation check cannot undo a published build. * Invalidate parent directory caches on rename metadata apply failure When applyLocalMetadataEvent fails during rename, invalidate the source and destination parent directory caches so subsequent accesses trigger a re-fetch from the filer. * Add event nil-fallback and cache invalidation to Link and Symlink Synthesize metadata events when the server doesn't return one, and invalidate parent directory caches on apply failure. * Match requested partition when scanning partition directories Parse the partition range format (NNNN-NNNN) and match against the requested partition parameter instead of using the first directory. * Preserve snapshot timestamp across empty directory listings Initialize actualSnapshotTsNs from the caller-requested value so it isn't lost when the server returns no entries. Re-add the server-side snapshot-only response for empty directories (all raw stream consumers now have nil guards for Entry). * Fix CreateEntry error wrapping to support errors.Is/errors.As Use errors.New + %w instead of %v for resp.Error so callers can unwrap the underlying error. * Fix object lock pagination: only advance on non-nil entries Move entriesReceived inside the nil check so nil entries don't cause repeated ListEntries calls with the same lastFileName. * Guard Attributes nil check before accessing Mtime in MQ management * Do not send nil-Entry response for empty directory listings The snapshot-only ListEntriesResponse (with Entry == nil) for empty directories breaks consumers that treat any received response as an entry (Java FilerClient, S3 listing). The Go client-side DoSeaweedListWithSnapshot already preserves the caller-requested snapshot via actualSnapshotTsNs initialization, so the server-side send is unnecessary. * Fix review findings: subscriber dedup, invalidation normalization, nil guards, shutdown race - Remove self-signature early-return in processEventFn so all events flow through the applier (directory-build buffering sees self-originated events that arrive after a snapshot) - Normalize NewParentPath in collectEntryInvalidations to avoid duplicate invalidations when NewParentPath is empty (same-directory update) - Guard resp.Entry.Attributes for nil in admin_server.go and topic_retention.go to prevent panics on entries without attributes - Fix enqueueApplyRequest race with shutdown by using select on both applyCh and applyDone, preventing sends after the apply loop exits - Add cleanupDone check to deferred cleanup in meta_cache_init.go for clarity alongside the existing guard in cleanupBuild - Add empty directory test case for snapshot consistency * Propagate authoritative metadata event from CacheRemoteObjectToLocalCluster and generate client-side snapshot for empty directories - Add metadata_event field to CacheRemoteObjectToLocalClusterResponse proto so the filer-emitted event is available to callers - Use WithMetadataEventSink in the server handler to capture the event from NotifyUpdateEvent and return it on the response - Update filehandle_read.go to prefer the RPC's metadata event over a locally fabricated one, falling back to metadataUpdateEvent when the server doesn't provide one (e.g., older filers) - Generate a client-side snapshot cutoff in DoSeaweedListWithSnapshot when the server sends no snapshot (empty directory), so callers like CompleteDirectoryBuild get a meaningful boundary for filtering buffered events * Skip directory notifications for dirs being built to prevent mid-build cache wipe When a metadata event is buffered during a directory build, applyMetadataSideEffects was still firing noteDirectoryUpdate for the building directory. If the directory accumulated enough updates to become "hot", markDirectoryReadThrough would call DeleteFolderChildren, wiping entries that EnsureVisited had already inserted. The build would then complete and mark the directory cached with incomplete data. Fix by using applyMetadataSideEffectsSkippingBuildingDirs for buffered events, which suppresses directory notifications for dirs currently in buildingDirs while still applying entry invalidations. * Add test for directory notification suppression during active build TestDirectoryNotificationsSuppressedDuringBuild verifies that metadata events targeting a directory under active EnsureVisited build do NOT fire onDirectoryUpdate for that directory. In production, this prevents markDirectoryReadThrough from calling DeleteFolderChildren mid-build, which would wipe entries already inserted by the listing. The test inserts an entry during a build, sends multiple metadata events for the building directory, asserts no notifications fired for it, verifies the entry survives, and confirms buffered events are replayed after CompleteDirectoryBuild. * Fix create invalidations, build guard, event shape, context, and snapshot error path - collectEntryInvalidations: invalidate FUSE kernel cache on pure create events (OldEntry==nil && NewEntry!=nil), not just updates and deletes - completeDirectoryBuildNow: only call markCachedFn when an active build existed (state != nil), preventing an unpopulated directory from being marked as cached - Add metadataCreateEvent helper that produces a create-shaped event (NewEntry only, no OldEntry) and use it in mkdir, mknod, symlink, and hardlink create fallback paths instead of metadataUpdateEvent which incorrectly set both OldEntry and NewEntry - applyMetadataResponseEnqueue: use context.Background() for the queued mutation so a cancelled caller context cannot abort the apply loop mid-write - DoSeaweedListWithSnapshot: move snapshot initialization before ListEntries call so the error path returns the preserved snapshot instead of 0 * Fix review findings: test loop, cache race, context safety, snapshot consistency - Fix build test loop starting at i=1 instead of i=0, missing new-0.txt verification - Re-check IsDirectoryCached after cache miss to avoid ENOENT race with markDirectoryReadThrough - Use context.Background() in enqueueAndWait so caller cancellation can't abort build/complete mid-way - Pass dh.snapshotTsNs in skip-batch loadDirectoryEntriesDirect for snapshot consistency - Prefer resp.MetadataEvent over fallback in Unlink event derivation - Add comment on MetadataEventSink.Record single-event assumption * Fix empty-directory snapshot clock skew and build cancellation race Empty-directory snapshot: Remove client-side time.Now() synthesis when the server returns no entries. Instead return snapshotTsNs=0, and in completeDirectoryBuildNow replay ALL buffered events when snapshot is 0. This eliminates the clock-skew bug where a client ahead of the filer would filter out legitimate post-list events. Build cancellation: Use context.Background() for BeginDirectoryBuild and CompleteDirectoryBuild calls in doEnsureVisited, so errgroup cancellation doesn't cause enqueueAndWait to return early and trigger cleanupBuild while the operation is still queued. * Add tests for empty-directory build replay and cancellation resilience TestEmptyDirectoryBuildReplaysAllBufferedEvents: verifies that when CompleteDirectoryBuild receives snapshotTsNs=0 (empty directory, no server snapshot), ALL buffered events are replayed regardless of their TsNs values — no clock-skew-sensitive filtering occurs. TestBuildCompletionSurvivesCallerCancellation: verifies that once CompleteDirectoryBuild is enqueued, a cancelled caller context does not prevent the build from completing. The apply loop runs with context.Background(), so the directory becomes cached and buffered events are replayed even when the caller gives up waiting. * Fix directory subtree cleanup, Link rollback, test robustness - applyMetadataResponseLocked: when a directory entry is deleted or moved, call DeleteFolderChildren on the old path so cached descendants don't leak as stale entries. - Link: save original HardLinkId/Counter before mutation. If CreateEntryWithResponse fails after the source was already updated, rollback the source entry to its original state via UpdateEntry. - TestBuildCompletionSurvivesCallerCancellation: replace fixed time.Sleep(50ms) with a deadline-based poll that checks IsDirectoryCached in a loop, failing only after 2s timeout. - TestReadDirAllEntriesWithSnapshotEmptyDirectory: assert that ListEntries was actually invoked on the mock client so the test exercises the RPC path. - newMetadataEvent: add early return when both oldEntry and newEntry are nil to avoid emitting events with empty Directory. --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-07 09:19:40 -08:00
Chris Lu	f9311a3422	s3api: fix static IAM policy enforcement after reload (#8532 ) * s3api: honor attached IAM policies over legacy actions * s3api: hydrate IAM policy docs during config reload * s3api: use policy-aware auth when listing buckets * credential: propagate context through filer_etc policy reads * credential: make legacy policy deletes durable * s3api: exercise managed policy runtime loader * s3api: allow static IAM users without session tokens * iam: deny unmatched attached policies under default allow * iam: load embedded policy files from filer store * s3api: require session tokens for IAM presigning * s3api: sync runtime policies into zero-config IAM * credential: respect context in policy file loads * credential: serialize legacy policy deletes * iam: align filer policy store naming * s3api: use authenticated principals for presigning * iam: deep copy policy conditions * s3api: require request creation in policy tests * filer: keep ReadInsideFiler as the context-aware API * iam: harden filer policy store writes * credential: strengthen legacy policy serialization test * credential: forward runtime policy loaders through wrapper * s3api: harden runtime policy merging * iam: require typed already-exists errors	2026-03-06 12:35:08 -08:00

1 2 3 4 5

202 Commits