seaweedFS

Author	SHA1	Message	Date
Chris Lu	1f3df6e9ef	admin: remove Alpha badge and unused Metrics/Logs menu items (#8525) * admin: remove Alpha badge and unused Metrics/Logs menu items * Update layout_templ.go	2026-03-05 11:51:11 -08:00
Chris Lu	230ae9c24e	no need to set default scripts now	2026-03-04 22:27:02 -08:00
Chris Lu	b3f7472fd3	4.15	2026-03-04 22:13:57 -08:00
Chris Lu	b3620c7e14	admin: auto migrating master maintenance scripts to admin_script plugin config (#8509 ) * admin: seed admin_script plugin config from master maintenance scripts When the admin server starts, fetch the maintenance scripts configuration from the master via GetMasterConfiguration. If the admin_script plugin worker does not already have a saved config, use the master's scripts as the default value. This enables seamless migration from master.toml [master.maintenance] to the admin script plugin worker. Changes: - Add maintenance_scripts and maintenance_sleep_minutes fields to GetMasterConfigurationResponse in master.proto - Populate the new fields from viper config in master_grpc_server.go - On admin server startup, fetch the master config and seed the admin_script plugin config if no config exists yet - Strip lock/unlock commands from the master scripts since the admin script worker handles locking automatically Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address review comments on admin_script seeding - Replace TOCTOU race (separate Load+Save) with atomic SaveJobTypeConfigIfNotExists on ConfigStore and Plugin - Replace ineffective polling loop with single GetMaster call using 30s context timeout, since GetMaster respects context cancellation - Add unit tests for SaveJobTypeConfigIfNotExists (in-memory + on-disk) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: apply maintenance script defaults in gRPC handler The gRPC handler for GetMasterConfiguration read maintenance scripts from viper without calling SetDefault, relying on startAdminScripts having run first. If the admin server calls GetMasterConfiguration before startAdminScripts sets the defaults, viper returns empty strings and the seeding is silently skipped. Apply SetDefault in the gRPC handler itself so it is self-contained. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Revert "fix: apply maintenance script defaults in gRPC handler" This reverts commit 068a5063303f6bc34825a07bb681adfa67e6f9de. * fix: use atomic save in ensureJobTypeConfigFromDescriptor ensureJobTypeConfigFromDescriptor used a separate Load + Save, racing with seedAdminScriptFromMaster. If the descriptor defaults (empty script) were saved first, SaveJobTypeConfigIfNotExists in the seeding goroutine would see an existing config and skip, losing the master's maintenance scripts. Switch to SaveJobTypeConfigIfNotExists so both paths are atomic. Whichever wins, the other is a safe no-op. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: fetch master scripts inline during config bootstrap, not in goroutine Replace the seedAdminScriptFromMaster goroutine with a ConfigDefaultsProvider callback. When the plugin bootstraps admin_script defaults from the worker descriptor, it calls the provider which fetches maintenance scripts from the master synchronously. This eliminates the race between the seeding goroutine and the descriptor-based config bootstrap. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * skip commented lock unlock Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * reduce grpc calls --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-04 22:11:07 -08:00
Chris Lu	7799804200	4.14 Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-04 19:22:39 -08:00
Chris Lu	c19f88eef1	fix: resolve ServerAddress to NodeId in maintenance task sync (#8508 ) * fix: maintenance task topology lookup, retry, and stale task cleanup 1. Strip gRPC port from ServerAddress in SyncTask using ToHttpAddress() so task targets match topology disk keys (NodeId format). 2. Skip capacity check when topology has no disks yet (startup race where tasks are loaded from persistence before first topology update). 3. Don't retry permanent errors like "volume not found" - these will never succeed on retry. 4. Cancel all pending tasks for each task type before re-detection, ensuring stale proposals from previous cycles are cleaned up. This prevents stale tasks from blocking new detection and from repeatedly failing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * logs Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * less lock scope Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-04 19:20:28 -08:00
Fábio Henrique Araújo	88e8342e44	style: Reseted padding to container-fluid div in layout template (#8505) * style: Reseted padding to container-fluid div in layout template * address comment Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Chris Lu <chris.lu@gmail.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-04 14:24:23 -08:00
Chris Lu	df5e8210df	Implement IAM managed policy operations (#8507 ) * feat: Implement IAM managed policy operations (GetPolicy, ListPolicies, DeletePolicy, AttachUserPolicy, DetachUserPolicy) - Add response type aliases in iamapi_response.go for managed policy operations - Implement 6 handler methods in iamapi_management_handlers.go: - GetPolicy: Lookup managed policy by ARN - DeletePolicy: Remove managed policy - ListPolicies: List all managed policies - AttachUserPolicy: Attach managed policy to user, aggregating inline + managed actions - DetachUserPolicy: Detach managed policy from user - ListAttachedUserPolicies: List user's attached managed policies - Add computeAllActionsForUser() to aggregate actions from both inline and managed policies - Wire 6 new DoActions switch cases for policy operations - Add comprehensive tests for all new handlers - Fixes #8506 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review feedback for IAM managed policy operations - Add parsePolicyArn() helper with proper ARN prefix validation, replacing fragile strings.Split parsing in GetPolicy, DeletePolicy, AttachUserPolicy, and DetachUserPolicy - DeletePolicy now detaches the policy from all users and recomputes their aggregated actions, preventing stale permissions after deletion - Set changed=true for DeletePolicy DoActions case so identity updates persist - Make PolicyId consistent: CreatePolicy now uses Hash(&policyName) matching GetPolicy and ListPolicies - Remove redundant nil map checks (Go handles nil map lookups safely) - DRY up action deduplication in computeAllActionsForUser with addUniqueActions closure - Add tests for invalid/empty ARN rejection and DeletePolicy identity cleanup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add integration tests for managed policy lifecycle (#8506) Add two integration tests covering the user-reported use case where managed policy operations returned 500 errors: - 
TestS3IAMManagedPolicyLifecycle: end-to-end workflow matching the issue report — CreatePolicy, ListPolicies, GetPolicy, AttachUserPolicy, ListAttachedUserPolicies, idempotent re-attach, DeletePolicy while attached (expects DeleteConflict), DetachUserPolicy, DeletePolicy, and verification that deleted policy is gone - TestS3IAMManagedPolicyErrorCases: covers error paths — nonexistent policy/user for GetPolicy, DeletePolicy, AttachUserPolicy, DetachUserPolicy, and ListAttachedUserPolicies Also fixes DeletePolicy to reject deletion when policy is still attached to a user (AWS-compatible DeleteConflictException), and adds the 409 status code mapping for DeleteConflictException in the error response handler. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: nil map panic in CreatePolicy, add PolicyId test assertions - Initialize policies.Policies map in CreatePolicy if nil (prevents panic when no policies exist yet); also handle filer_pb.ErrNotFound like other callers - Add PolicyId assertions in TestGetPolicy and TestListPolicies to lock in the consistent Hash(&policyName) behavior - Remove redundant time.Sleep calls from new integration tests (startMiniCluster already blocks on waitForS3Ready) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: PutUserPolicy and DeleteUserPolicy now preserve managed policy actions PutUserPolicy and DeleteUserPolicy were calling computeAggregatedActionsForUser (inline-only), overwriting ident.Actions and dropping managed policy actions. Both now call computeAllActionsForUser which unions inline + managed actions. Add TestManagedPolicyActionsPreservedAcrossInlineMutations regression test: attaches a managed policy, adds an inline policy (verifies both actions present), deletes the inline policy, then asserts managed policy actions still persist. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: PutUserPolicy verifies user exists before persisting inline policy Previously the inline policy was written to storage before checking if the target user exists in s3cfg.Identities, leaving orphaned policy data when the user was absent. Now validates the user first, returning NoSuchEntityException immediately if not found. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: prevent stale/lost actions on computeAllActionsForUser failure - PutUserPolicy: on recomputation failure, preserve existing ident.Actions instead of falling back to only the current inline policy's actions - DeleteUserPolicy: on recomputation failure, preserve existing ident.Actions instead of assigning nil (which wiped all permissions) - AttachUserPolicy: roll back ident.PolicyNames and return error if action recomputation fails, keeping identity consistent - DetachUserPolicy: roll back ident.PolicyNames and return error if GetPolicies or action recomputation fails - Add doc comment on newTestIamApiServer noting it only sets s3ApiConfig Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 14:18:07 -08:00
Chris Lu	10a30a83e1	s3api: add GetObjectAttributes API support (#8504 ) * s3api: add error code and header constants for GetObjectAttributes Add ErrInvalidAttributeName error code and header constants (X-Amz-Object-Attributes, X-Amz-Max-Parts, X-Amz-Part-Number-Marker, X-Amz-Delete-Marker) needed by the S3 GetObjectAttributes API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: implement GetObjectAttributes handler Add GetObjectAttributesHandler that returns selected object metadata (ETag, Checksum, StorageClass, ObjectSize, ObjectParts) without returning the object body. Follows the same versioning and conditional header patterns as HeadObjectHandler. The handler parses the X-Amz-Object-Attributes header to determine which attributes to include in the XML response, and supports ObjectParts pagination via X-Amz-Max-Parts and X-Amz-Part-Number-Marker. Ref: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectAttributes.html Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: register GetObjectAttributes route Register the GET /{object}?attributes route for the GetObjectAttributes API, placed before other object query routes to ensure proper matching. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add integration tests for GetObjectAttributes Test coverage: - Basic: simple object with all attribute types - MultipartObject: multipart upload with parts pagination - SelectiveAttributes: requesting only specific attributes - InvalidAttribute: server rejects invalid attribute names - NonExistentObject: returns NoSuchKey for missing objects Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add versioned object test for GetObjectAttributes Test puts two versions of the same object and verifies that: - GetObjectAttributes returns the latest version by default - GetObjectAttributes with versionId returns the specific version - ObjectSize and VersionId are correct for each version Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: fix combined conditional header evaluation per RFC 7232 Per RFC 7232: - Section 3.4: If-Unmodified-Since MUST be ignored when If-Match is present (If-Match is the more accurate replacement) - Section 3.3: If-Modified-Since MUST be ignored when If-None-Match is present (If-None-Match is the more accurate replacement) Previously, all four conditional headers were evaluated independently. This caused incorrect 412 responses when If-Match succeeded but If-Unmodified-Since failed (should return 200 per AWS S3 behavior). Fix applied to both validateConditionalHeadersForReads (GET/HEAD) and validateConditionalHeaders (PUT) paths. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add conditional header combination tests for GetObjectAttributes Test the RFC 7232 combined conditional header semantics: - If-Match=true + If-Unmodified-Since=false => 200 (If-Unmodified-Since ignored) - If-None-Match=false + If-Modified-Since=true => 304 (If-Modified-Since ignored) - If-None-Match=true + If-Modified-Since=false => 200 (If-Modified-Since ignored) - If-Match=true + If-Unmodified-Since=true => 200 - If-Match=false => 412 regardless Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: document Checksum attribute as not yet populated Checksum is accepted in validation (so clients requesting it don't get a 400 error, matching AWS behavior for objects without checksums) but SeaweedFS does not yet store S3 checksums. Add a comment explaining this and noting where to populate it when checksum storage is added. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add s3:GetObjectAttributes IAM action for ?attributes query Previously, GET /{object}?attributes resolved to s3:GetObject via the fallback path since resolveFromQueryParameters had no case for the "attributes" query parameter. Add S3_ACTION_GET_OBJECT_ATTRIBUTES constant ("s3:GetObjectAttributes") and a branch in resolveFromQueryParameters to return it for GET requests with the "attributes" query parameter, so IAM policies can distinguish GetObjectAttributes from GetObject. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: evaluate conditional headers after version resolution Move conditional header evaluation (If-Match, If-None-Match, etc.) to after the version resolution step in GetObjectAttributesHandler. This ensures that when a specific versionId is requested, conditions are checked against the correct version entry rather than always against the latest version. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: use bounded HTTP client in GetObjectAttributes tests Replace http.DefaultClient with a timeout-aware http.Client (10s) in the signedGetObjectAttributes helper and testGetObjectAttributesInvalid to prevent tests from hanging indefinitely. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: check attributes query before versionId in action resolver Move the GetObjectAttributes action check before the versionId check in resolveFromQueryParameters. This fixes GET /bucket/key?attributes&versionId=xyz being incorrectly classified as s3:GetObjectVersion instead of s3:GetObjectAttributes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add tests for versioned conditional headers and action resolver Add integration test that verifies conditional headers (If-Match, If-None-Match) are evaluated against the requested version entry, not the latest version. This covers the fix in 55c409dec. Add unit test for ResolveS3Action verifying that the attributes query parameter takes precedence over versionId, so GET ?attributes&versionId resolves to s3:GetObjectAttributes. This covers the fix in b92c61c95. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: guard negative chunk indices and rename PartsCount field Add bounds checks for b.StartChunk >= 0 and b.EndChunk >= 0 in buildObjectAttributesParts to prevent panics from corrupted metadata with negative index values. Rename ObjectAttributesParts.PartsCount to TotalPartsCount to match the AWS SDK v2 Go field naming convention, while preserving the XML element name "PartsCount" via the struct tag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: reject malformed max-parts and part-number-marker headers Return ErrInvalidMaxParts and ErrInvalidPartNumberMarker when the X-Amz-Max-Parts or X-Amz-Part-Number-Marker headers contain non-integer or negative values, matching ListObjectPartsHandler behavior. 
Previously these were silently ignored with defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 12:52:09 -08:00
Racci	9e26d6f5dd	fix: port in SNI address when using domainName instead of IP for master (#8500)	2026-03-04 07:05:45 -08:00
Copilot	e475cbfef8	Merge branch 'master' of https://github.com/seaweedfs/seaweedfs	2026-03-04 00:41:26 -08:00
Copilot	70ed9c2a55	Update plugin_templ.go Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-04 00:41:20 -08:00
Chris Lu	45ce18266a	Disable master maintenance scripts when admin server runs (#8499) * Disable master maintenance scripts when admin server runs * Stop defaulting master maintenance scripts * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Clarify master scripts are disabled by default * Skip master maintenance scripts when admin server is connected * Restore default master maintenance scripts * Document admin server skip for master maintenance scripts --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-04 00:40:40 -08:00
Chris Lu	18ccc9b773	Plugin scheduler: sequential iterations with max runtime (#8496 ) * pb: add job type max runtime setting * plugin: default job type max runtime * plugin: redesign scheduler loop * admin ui: update scheduler settings * plugin: fix scheduler loop state name * plugin scheduler: restore backlog skip * plugin scheduler: drop legacy detection helper * admin api: require scheduler config body * admin ui: preserve detection interval on save * plugin scheduler: use job context and drain cancels * plugin scheduler: respect detection intervals * plugin scheduler: gate runs and drain queue * ec test: reuse req/resp vars * ec test: add scheduler debug logs * Adjust scheduler idle sleep and initial run delay * Clear pending job queue before scheduler runs * Log next detection time in EC integration test * Improve plugin scheduler debug logging in EC test * Expose scheduler next detection time * Log scheduler next detection time in EC test * Wake scheduler on config or worker updates * Expose scheduler sleep interval in UI * Fix scheduler sleep save value selection * Set scheduler idle sleep default to 613s * Show scheduler next run time in plugin UI --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-03 23:09:49 -08:00
Chris Lu	e1e5b4a8a6	add admin script worker (#8491 ) * admin: add plugin lock coordination * shell: allow bypassing lock checks * plugin worker: add admin script handler * mini: include admin_script in plugin defaults * admin script UI: drop name and enlarge text * admin script: add default script * admin_script: make run interval configurable * plugin: gate other jobs during admin_script runs * plugin: use last completed admin_script run * admin: backfill plugin config defaults * templ Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * comparable to default version Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * default to run Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * format Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * shell: respect pre-set noLock for fix.replication * shell: add force no-lock mode for admin scripts * volume balance worker already exists Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * admin: expose scheduler status JSON * shell: add sleep command * shell: restrict sleep syntax * Revert "shell: respect pre-set noLock for fix.replication" This reverts commit 2b14e8b82602a740d3a473c085e3b3a14f1ddbb3. * templ Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * fix import Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * less logs Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * Reduce master client logs on canceled contexts * Update mini default job type count --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-03 15:10:40 -08:00
Peter Dodd	16f2269a33	feat(filer): lazy metadata pulling (#8454 ) * Add remote storage index for lazy metadata pull Introduces remoteStorageIndex, which maintains a map of filer directory to remote storage client/location, refreshed periodically from the filer's mount mappings. Provides lazyFetchFromRemote, ensureRemoteEntryInFiler, and isRemoteBacked on S3ApiServer as integration points for handler-level work in a follow-up PR. Nothing is wired into the server yet. Made-with: Cursor * Add unit tests for remote storage index and wire field into S3ApiServer Adds tests covering isEmpty, findForPath (including longest-prefix resolution), and isRemoteBacked. Also removes a stray PR review annotation from the index file and adds the remoteStorageIdx field to S3ApiServer so the package compiles ahead of the wiring PR. Made-with: Cursor * Address review comments on remote storage index - Use filer_pb.CreateEntry helper so resp.Error is checked, not just the RPC error - Extract keepPrev closure to remove duplicated error-handling in refresh loop - Add comment explaining availability-over-consistency trade-off on filer save failure Made-with: Cursor * Move lazy metadata pull from S3 API to filer - Add maybeLazyFetchFromRemote in filer: on FindEntry miss, stat remote and CreateEntry when path is under a remote mount - Use singleflight for dedup; context guard prevents CreateEntry recursion - Availability-over-consistency: return in-memory entry if CreateEntry fails - Add longest-prefix test for nested mounts in remote_storage_test.go - Remove remoteStorageIndex, lazyFetchFromRemote, ensureRemoteEntryInFiler, doLazyFetch from s3api; filer now owns metadata operations - Add filer_lazy_remote_test.go with tests for hit, miss, not-found, CreateEntry failure, longest-prefix, and FindEntry integration Made-with: Cursor * Address review: fix context guard test, add FindMountDirectory comment, remove dead code Made-with: Cursor * Nitpicks: restore prev maker in registerStubMaker, 
instance-scope lazyFetchGroup, nil-check remoteEntry Made-with: Cursor * Fix remotePath when mountDir is root: ensure relPath has leading slash Made-with: Cursor * filer: decouple lazy-fetch persistence from caller context Use context.Background() inside the singleflight closure for CreateEntry so persistence is not cancelled when the winning request's context is cancelled. Fixes CreateEntry failing for all waiters when the first caller times out. Made-with: Cursor * filer: remove redundant Mode bitwise OR with zero Made-with: Cursor * filer: use bounded context for lazy-fetch persistence Replace context.Background() with context.WithTimeout(30s) and defer cancel() to prevent indefinite blocking and release resources. Made-with: Cursor * filer: use checked type assertion for singleflight result Made-with: Cursor * filer: rename persist context vars to avoid shadowing function parameter Made-with: Cursor	2026-03-03 13:01:10 -08:00
Chris Lu	a61a2affe3	Expire stuck plugin jobs (#8492) * Add stale job expiry and expire API * Add expire job button * Add test hook and coverage for ExpirePluginJobAPI * Document scheduler filtering side effect and reuse helper * Restore job spec proposal test * Regenerate plugin template output --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-03 01:27:25 -08:00
Chris Lu	f5c35240be	Add volume dir tags and EC placement priority (#8472 ) * Add volume dir tags to topology Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add preferred tag config for EC Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Prioritize EC destinations by tags Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add EC placement planner tag tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Refactor EC placement tests to reuse buildActiveTopology Remove buildActiveTopologyWithDiskTags helper function and consolidate tag setup inline in test cases. Tests now use UpdateTopology to apply tags after topology creation, reusing the existing buildActiveTopology function rather than duplicating its logic. All tag scenario tests pass: - TestECPlacementPlannerPrefersTaggedDisks - TestECPlacementPlannerFallsBackWhenTagsInsufficient Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Consolidate normalizeTagList into shared util package Extract normalizeTagList from three locations (volume.go, detection.go, erasure_coding_handler.go) into new weed/util/tag.go as exported NormalizeTagList function. Replace all duplicate implementations with imports and calls to util.NormalizeTagList. This improves code reuse and maintainability by centralizing tag normalization logic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add PreferredTags to EC config persistence Add preferred_tags field to ErasureCodingTaskConfig protobuf with field number 5. Update GetConfigSpec to include preferred_tags field in the UI configuration schema. Add PreferredTags to ToTaskPolicy to serialize config to protobuf. Add PreferredTags to FromTaskPolicy to deserialize from protobuf with defensive copy to prevent external mutation. This allows EC preferred tags to be persisted and restored across worker restarts. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add defensive copy for Tags slice in DiskLocation Copy the incoming tags slice in NewDiskLocation instead of storing by reference. This prevents external callers from mutating the DiskLocation.Tags slice after construction, improving encapsulation and preventing unexpected changes to disk metadata. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add doc comment to buildCandidateSets method Document the tiered candidate selection and fallback behavior. Explain that for a planner with preferredTags, it accumulates disks matching each tag in order into progressively larger tiers, emits a candidate set once a tier reaches shardsNeeded, and finally falls back to the full candidates set if preferred-tag tiers are insufficient. This clarifies the intended semantics for future maintainers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Apply final PR review fixes 1. Update parseVolumeTags to replicate single tag entry to all folders instead of leaving some folders with nil tags. This prevents nil pointer dereferences when processing folders without explicit tags. 2. Add defensive copy in ToTaskPolicy for PreferredTags slice to match the pattern used in FromTaskPolicy, preventing external mutation of the returned TaskPolicy. 3. Add clarifying comment in buildCandidateSets explaining that the shardsNeeded <= 0 branch is a defensive check for direct callers, since selectDestinations guarantees shardsNeeded > 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix nil pointer dereference in parseVolumeTags Ensure all folder tags are initialized to either normalized tags or empty slices, not nil. When multiple tag entries are provided and there are more folders than entries, remaining folders now get empty slices instead of nil, preventing nil pointer dereference in downstream code. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix NormalizeTagList to return empty slice instead of nil Change NormalizeTagList to always return a non-nil slice. When all tags are empty or whitespace after normalization, return an empty slice instead of nil. This prevents nil pointer dereferences in downstream code that expects a valid (possibly empty) slice. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add nil safety check for v.tags pointer Add a safety check to handle the case where v.tags might be nil, preventing a nil pointer dereference. If v.tags is nil, use an empty string instead. This is defensive programming to prevent panics in edge cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add volume.tags flag to weed server and weed mini commands Add the volume.tags CLI option to both the 'weed server' and 'weed mini' commands. This allows users to specify disk tags when running the combined server modes, just like they can with 'weed volume'. The flag uses the same format and description as the volume command: comma-separated tag groups per data dir with ':' separators (e.g. fast:ssd,archive). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-01 10:22:00 -08:00
Chris Lu	2dd3944819	Respect -minFreeSpace during ec.decode (#8467 ) * shell: add ec.decode ignoreMinFreeSpace flag Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: respect minFreeSpace in ec.decode Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: rename ec.decode minFreeSpace flag Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: error when ec.decode has no shards Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: select ec.decode target with zero shards Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * shell: adjust free counts across ec.decode Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * unused * Update weed/shell/command_ec_decode.go Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2026-02-27 23:54:30 -08:00
Chris Lu	7354fa87f1	refactor ec shard distribution (#8465) * refactor ec shard distribution * fix shard assignment merge and mount errors * fix mount error aggregation scope * make WithFields compatible and wrap errors	2026-02-27 17:21:13 -08:00
Chris Lu	e8946e59ca	fix(s3api): correctly extract host header port in extractHostHeader (#8464) * Prevent concurrent maintenance tasks per volume * fix panic * fix(s3api): correctly extract host header port when X-Forwarded-Port is present * test(s3api): add test cases for misreported X-Forwarded-Port	2026-02-27 13:41:45 -08:00
Chris Lu	b9e560dcf1	Prevent overlapping maintenance tasks per volume (#8463) * Prevent concurrent maintenance tasks per volume * fix panic	2026-02-27 13:14:52 -08:00
Chris Lu	4f647e1036	Worker set its working directory (#8461) * set working directory * consolidate to worker directory * working directory * correct directory name * refactoring to use wildcard matcher * simplify * cleaning ec working directory * fix reference * clean * adjust test	2026-02-27 12:22:21 -08:00
Chris Lu	cf3b7b3ad7	adjust weight	2026-02-26 19:46:38 -08:00
Chris Lu	09a1ace53a	adjust display name	2026-02-26 19:33:52 -08:00
Chris Lu	c73e65ad5e	Add customizable plugin display names and weights (#8459 ) * feat: add customizable plugin display names and weights - Add weight field to JobTypeCapability proto message - Modify ListKnownJobTypes() to return JobTypeInfo with display names and weights - Modify ListPluginJobTypes() to return JobTypeInfo instead of string - Sort plugins by weight (descending) then alphabetically - Update admin API to return enriched job type metadata - Update plugin UI template to display names instead of IDs - Consolidate API by reusing existing function names instead of suffixed variants * perf: optimize plugin job type capability lookup and add null-safe parsing - Pre-calculate job type capabilities in a map to reduce O(nm) nested loops to O(n+m) lookup time in ListKnownJobTypes() - Add parseJobTypeItem() helper function for null-safe job type item parsing - Refactor plugin.templ to use parseJobTypeItem() in all job type access points (hasJobType, applyInitialNavigation, ensureActiveNavigation, renderTopTabs) - Deterministic capability resolution by using first worker's capability templ * refactor: use parseJobTypeItem helper consistently in plugin.templ Replace duplicated job type extraction logic at line 1296-1298 with parseJobTypeItem() helper function for consistency and maintainability. * improve: prefer richer capability metadata and add null-safety checks - Improve capability selection in ListKnownJobTypes() to prefer capabilities with non-empty DisplayName and higher Weight across all workers instead of first-wins approach. Handles mixed-version clusters better. 
- Add defensive null checks in renderJobTypeSummary() to safely access parseJobTypeItem() result before property access - Ensures malformed or missing entries won't break the rendering pipeline * fix: preserve existing DisplayName when merging capabilities Fix capability merge logic to respect existing DisplayName values: - If existing has DisplayName but candidate doesn't, preserve existing - If existing doesn't have DisplayName but candidate does, use candidate - Only use Weight comparison if DisplayName status is equal - Prevents higher-weight capabilities with empty DisplayName from overriding capabilities with non-empty DisplayName	2026-02-26 19:20:48 -08:00
Chris Lu	8eba7ba5b2	feat: drop table location mapping support (#8458 ) * feat: drop table location mapping support Disable external metadata locations for S3 Tables and remove the table location mapping index entirely. Table metadata must live under the table bucket paths, so lookups no longer use mapping directories. Changes: - Remove mapping lookup and cache from bucket path resolution - Reject metadataLocation in CreateTable and UpdateTable - Remove mapping helpers and tests * compile * refactor * fix: accept metadataLocation in S3 Tables API requests We removed the external table location mapping feature, but still need to accept and store metadataLocation values from clients like Trino. The mapping feature was an internal implementation detail that mapped external buckets to internal table paths. The metadataLocation field itself is part of the S3 Tables API and should be preserved. * fmt * fix: handle MetadataLocation in UpdateTable requests Mirror handleCreateTable behavior by updating metadata.MetadataLocation when req.MetadataLocation is provided in UpdateTable requests. This ensures table metadata location can be updated, not just set during creation.	2026-02-26 16:36:24 -08:00
Chris Lu	641351da78	fix: table location mappings to /etc/s3tables (#8457 ) * fix: move table location mappings to /etc/s3tables to avoid bucket name validation Fixes #8362 - table location mappings were stored under /buckets/.table-location-mappings which fails bucket name validation because it starts with a dot. Moving them to /etc/s3tables resolves the migration error for upgrades. Changes: - Table location mappings now stored under /etc/s3tables - Ensure parent /etc directory exists before creating /etc/s3tables - Normal writes go to new location only (no legacy compatibility) - Removed bucket name validation exception for old location * refactor: simplify lookupTableLocationMapping by removing redundant mappingPath parameter The mappingPath function parameter was redundant as the path can be derived from mappingDir and bucket using path.Join. This simplifies the code and reduces the risk of path mismatches between parameters.	2026-02-26 15:35:13 -08:00
blitt001	3d81d5bef7	Fix S3 signature verification behind reverse proxies (#8444 ) * Fix S3 signature verification behind reverse proxies When SeaweedFS is deployed behind a reverse proxy (e.g. nginx, Kong, Traefik), AWS S3 Signature V4 verification fails because the Host header the client signed with (e.g. "localhost:9000") differs from the Host header SeaweedFS receives on the backend (e.g. "seaweedfs:8333"). This commit adds a new -s3.externalUrl parameter (and S3_EXTERNAL_URL environment variable) that tells SeaweedFS what public-facing URL clients use to connect. When set, SeaweedFS uses this host value for signature verification instead of the Host header from the incoming request. New parameter: -s3.externalUrl (flag) or S3_EXTERNAL_URL (environment variable) Example: -s3.externalUrl=http://localhost:9000 Example: S3_EXTERNAL_URL=https://s3.example.com The environment variable is particularly useful in Docker/Kubernetes deployments where the external URL is injected via container config. The flag takes precedence over the environment variable when both are set. At startup, the URL is parsed and default ports are stripped to match AWS SDK behavior (port 80 for HTTP, port 443 for HTTPS), so "http://s3.example.com:80" and "http://s3.example.com" are equivalent. 
Bugs fixed: - Default port stripping was removed by a prior PR, causing signature mismatches when clients connect on standard ports (80/443) - X-Forwarded-Port was ignored when X-Forwarded-Host was not present - Scheme detection now uses proper precedence: X-Forwarded-Proto > TLS connection > URL scheme > "http" - Test expectations for standard port stripping were incorrect - expectedHost field in TestSignatureV4WithForwardedPort was declared but never actually checked (self-referential test) * Add Docker integration test for S3 proxy signature verification Docker Compose setup with nginx reverse proxy to validate that the -s3.externalUrl parameter (or S3_EXTERNAL_URL env var) correctly resolves S3 signature verification when SeaweedFS runs behind a proxy. The test uses nginx proxying port 9000 to SeaweedFS on port 8333, with X-Forwarded-Host/Port/Proto headers set. SeaweedFS is configured with -s3.externalUrl=http://localhost:9000 so it uses "localhost:9000" for signature verification, matching what the AWS CLI signs with. The test can be run with aws CLI on the host or without it by using the amazon/aws-cli Docker image with --network host. Test covers: create-bucket, list-buckets, put-object, head-object, list-objects-v2, get-object, content round-trip integrity, delete-object, and delete-bucket — all through the reverse proxy. * Create s3-proxy-signature-tests.yml * fix CLI * fix CI * Update s3-proxy-signature-tests.yml * address comments * Update Dockerfile * add user * no need for fuse * Update s3-proxy-signature-tests.yml * debug * weed mini * fix health check * health check * fix health checking --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-02-26 14:20:42 -08:00
Chris Lu	9b6fc49946	Chart createBuckets config #8368 : Add TTL, Object Lock, and Versioning support (#8375 ) * Chart createBuckets config #8368: Add TTL, Object Lock, and Versioning support * Update weed/shell/command_s3_bucket_versioning.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * address comments * address comments * go fmt * fix failures are still treated like “bucket not found” --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-26 11:56:10 -08:00
Lars Lehtonen	0fac6e39ea	weed/s3api/s3tables: fix dropped errors (#8456 ) * weed/s3api/s3tables: fix dropped errors * enhance errors * fail fast when listing tables --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-02-26 11:12:10 -08:00
Chris Lu	453310b057	Add plugin worker integration tests for erasure coding (#8450 ) * test: add plugin worker integration harness * test: add erasure coding detection integration tests * test: add erasure coding execution integration tests * ci: add plugin worker integration workflow * test: extend fake volume server for vacuum and balance * test: expand erasure coding detection topologies * test: add large erasure coding detection topology * test: add vacuum plugin worker integration tests * test: add volume balance plugin worker integration tests * ci: run plugin worker tests per worker * fixes * erasure coding: stop after placement failures * erasure coding: record hasMore when early stopping * erasure coding: relax large topology expectations	2026-02-25 22:11:41 -08:00
Chris Lu	d2b92938ee	Make EC detection context aware (#8449 ) * Make EC detection context aware * Update register.go * Speed up EC detection planning * Add tests for EC detection planner * optimizations detection.go: extracted ParseCollectionFilter (exported) and feed it into the detection loop so both detection and tracing share the same parsing/whitelisting logic; the detection loop now iterates on a sorted list of volume IDs, checks the context at every iteration, and only sets hasMore when there are still unprocessed groups after hitting maxResults, keeping runtime bounded while still scheduling planned tasks before returning the results. erasure_coding_handler.go: dropped the duplicated inline filter parsing in emitErasureCodingDetectionDecisionTrace and now reuse erasurecodingtask.ParseCollectionFilter, and the summary suffix logic now only accounts for the hasMore case that can actually happen. detection_test.go: updated the helper topology builder to use master_pb.VolumeInformationMessage (matching the current protobuf types) and tightened the cancellation/max-results tests so they reliably exercise the detection logic (cancel before calling Detection, and provide enough disks so one result is produced before the limit). * use working directory * fix compilation * fix compilation * rename * go vet * fix getenv * address comments, fix error	2026-02-25 18:02:35 -08:00
Chris Lu	7f6e58b791	Fix SFTP file upload failures with JWT filer tokens (#8448 ) * Fix SFTP file upload failures with JWT filer tokens (issue #8425) When JWT authentication is enabled for filer operations via jwt.filer_signing.* configuration, SFTP server file upload requests were rejected because they lacked JWT authorization headers. Changes: - Added JWT signing key and expiration fields to SftpServer struct - Modified putFile() to generate and include JWT tokens in upload requests - Enhanced SFTPServiceOptions with JWT configuration fields - Updated SFTP command startup to load and pass JWT config to service This allows SFTP uploads to authenticate with JWT-enabled filers, consistent with how other SeaweedFS components (S3 API, file browser) handle filer auth. Fixes #8425 * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-25 14:30:21 -08:00
Chris Lu	b9fa05153a	Allow multipart upload operations when s3:PutObject is authorized (#8445 ) * Allow multipart upload operations when s3:PutObject is authorized Multipart upload is an implementation detail of putting objects, not a separate permission. When a policy grants s3:PutObject, it should implicitly allow: - s3:CreateMultipartUpload - s3:UploadPart - s3:CompleteMultipartUpload - s3:AbortMultipartUpload - s3:ListParts This fixes a compatibility issue where clients like PyArrow that use multipart uploads by default would fail even though the role had s3:PutObject permission. The session policy intersection still applies - both the identity-based policy AND session policy must allow s3:PutObject for multipart operations to work. Implementation: - Added constants for S3 multipart action strings - Added multipartActionSet to efficiently check if action is multipart-related - Updated MatchesAction method to implicitly grant multipart when PutObject allowed * Update weed/s3api/policy_engine/types.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Add s3:ListMultipartUploads to multipart action set Include s3:ListMultipartUploads in the multipartActionSet so that listing multipart uploads is implicitly granted when s3:PutObject is authorized. ListMultipartUploads is a critical part of the multipart upload workflow, allowing clients to query in-progress uploads before completing them. Changes: - Added s3ListMultipartUploads constant definition - Included s3ListMultipartUploads in multipartActionSet initialization - Existing references to multipartActionSet automatically now cover ListMultipartUploads All policy engine tests pass (0.351s execution time) * Refactor: reuse multipart action constants from s3_constants package Remove duplicate constant definitions from policy_engine/types.go and import the canonical definitions from s3api/s3_constants/s3_actions.go instead. 
This eliminates duplication and ensures a single source of truth for multipart action strings: - ACTION_CREATE_MULTIPART_UPLOAD - ACTION_UPLOAD_PART - ACTION_COMPLETE_MULTIPART - ACTION_ABORT_MULTIPART - ACTION_LIST_PARTS - ACTION_LIST_MULTIPART_UPLOADS All policy engine tests pass (0.350s execution time) * Fix S3_ACTION_LIST_MULTIPART_UPLOADS constant value Move S3_ACTION_LIST_MULTIPART_UPLOADS from bucket operations to multipart operations section and change value from 's3:ListBucketMultipartUploads' to 's3:ListMultipartUploads' to match the action strings used in policy_engine and s3_actions.go. This ensures consistent action naming across all S3 constant definitions. * refactor names * Fix S3 action constant mismatches and MatchesAction early return bug Fix two critical issues in policy engine: 1. S3Actions map had incorrect multipart action mappings: - 'ListMultipartUploads' was 's3:ListMultipartUploads' (should be 's3:ListBucketMultipartUploads') - 'ListParts' was 's3:ListParts' (should be 's3:ListMultipartUploadParts') These mismatches caused authorization checks to fail for list operations 2. CompiledStatement.MatchesAction() had early return bug: - Previously returned true immediately upon first direct action match - This prevented scanning remaining matchers for s3:PutObject permission - Now scans ALL matchers before returning, tracking both direct match and PutObject grant - Ensures multipart operations inherit s3:PutObject authorization even when explicitly requested action doesn't match (e.g., s3:ListMultipartUploadParts) Changes: - Track matchedAction flag to defer ... 
All policy engine tests pass (0.334s execution time) * Refactor S3Actions map to use s3_constants Replace hardcoded action strings in the S3Actions map with references to canonical S3_ACTION_* constants from s3_constants/s3_action_strings.go. Benefits: - Single source of truth for S3 action values - Eliminates string duplication across codebase - Ensures consistency between policy engine and middleware - Reduces maintenance burden when action strings need updates All policy engine tests pass (0.334s execution time) * Remove unused S3Actions map The S3Actions map in types.go was never referenced anywhere in the codebase. All action mappings are handled by GetActionMappings() in integration.go instead. This removes 42 lines of dead code. * Fix test: reload configuration function must also reload IAM state TestEmbeddedIamAttachUserPolicyRefreshesIAM was failing because the test's reloadConfigurationFunc only updated mockConfig but didn't reload the actual IAM state. When AttachUserPolicy calls refreshIAMConfiguration(), it would use the test's incomplete reload function instead of the real LoadS3ApiConfigurationFromCredentialManager(). Fixed by making the test's reloadConfigurationFunc also call e.iam.LoadS3ApiConfigurationFromCredentialManager() so lookupByIdentityName() sees the updated policy attachments. --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-25 12:31:04 -08:00
Chris Lu	d5e71eb0d8	Revert "s3api: preserve Host header port in signature verification (#8434 )" This reverts commit `98d89ffad7`.	2026-02-25 12:28:44 -08:00
Chris Lu	8c0c7248b3	Refresh IAM config after policy attachments (#8439 ) * Refresh IAM cache after policy attachments * error handling	2026-02-25 10:30:05 -08:00
Chris Lu	e3decd2e3b	go fmt	2026-02-25 10:25:44 -08:00
Chris Lu	7296f51f48	Merge branch 'master' of https://github.com/seaweedfs/seaweedfs	2026-02-25 10:25:31 -08:00
Chris Lu	a3cb7fa8cc	go fmt	2026-02-25 10:25:23 -08:00
Peter Dodd	0910252e31	feat: add statfile remote storage (#8443 ) * feat: add statfile; add error for remote storage misses * feat: statfile implementations for storage providers * test: add unit tests for StatFile method across providers Add comprehensive unit tests for the StatFile implementation covering: - S3: interface compliance and error constant accessibility - Azure: interface compliance, error constants, and field population - GCS: interface compliance, error constants, error detection, and field population Also fix variable shadowing issue in S3 and Azure StatFile implementations where named return parameters were being shadowed by local variable declarations. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: address StatFile review feedback - Use errors.New for ErrRemoteObjectNotFound sentinel - Fix S3 HeadObject 404 detection to use awserr.Error code check - Remove hollow field-population tests that tested nothing - Remove redundant stdlib error detection tests - Trim verbose doc comment on ErrRemoteObjectNotFound Co-authored-by: Cursor <cursoragent@cursor.com> * fix: address second round of StatFile review feedback - Rename interface assertion tests to TestXxxRemoteStorageClientImplementsInterface - Delegate readFileRemoteEntry to StatFile in all three providers - Revert S3 404 detection to RequestFailure.StatusCode() check - Fix double-slash in GCS error message format string - Add storage type prefix to S3 error message for consistency Co-authored-by: Cursor <cursoragent@cursor.com> * fix: comments --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-25 10:24:06 -08:00
Chris Lu	b565a0cc86	Adds volume.merge command with deduplication and disk-based backend (#8441 ) * Enhance volume.merge command with deduplication and disk-based backend * Fix copyVolume function call with correct argument order and missing bool parameter * Revert "Fix copyVolume function call with correct argument order and missing bool parameter" This reverts commit 7b4a190643576fec11f896b26bcad03dd02da2f7. * Fix critical issues: per-replica writable tracking, tail goroutine cancellation via done channel, and debug logging for allocation failures * Optimize memory usage with watermark approach for duplicate detection * Fix critical issues: swap copyVolume arguments, increase idle timeout, remove file double-close, use glog for logging * Replace temporary file with in-memory buffer for needle blob serialization * test(volume.merge): Add comprehensive unit and integration tests Add 7 unit tests covering: - Ordering by timestamp - Cross-stream duplicate deduplication - Empty stream handling - Complex multi-stream deduplication - Single stream passthrough - Large needle ID support - LastModified fallback when timestamp unavailable Add 2 integration validation tests: - TestMergeWorkflowValidation: Documents 9-stage merge workflow - TestMergeEdgeCaseHandling: Validates 10 edge case handling All tests passing (9/9) * fix(volume.merge): Use time window for deduplication to handle clock skew The same needle ID can have different timestamps on different servers due to clock skew and replication lag. Needles with the same ID within a 5-second time window are now treated as duplicates (same write with timestamp variance). 
Key changes: - Add mergeDeduplicationWindowNs constant (5 seconds) - Replace exact timestamp matching with time window comparison - Use windowInitialized flag to properly detect window transitions - Add TestMergeNeedleStreamsTimeWindowDeduplication test This ensures that replicated writes with slight timestamp differences are properly deduplicated during merge, while separate updates to the same file ID (outside the window) are preserved. All tests passing (10/10) * test: Add volume.merge integration tests with 5 comprehensive test cases * test: integration tests for volume.merge command * Fix integration tests: use TripleVolumeCluster for volume.merge testing - Created new TripleVolumeCluster framework (cluster_triple.go) with 3 volume servers - Rebuilt weed binary with volume.merge command compiled in - Updated all 5 integration tests to use TripleVolumeCluster instead of DualVolumeCluster - Tests now properly allocate volumes on 2 servers and let merge allocate on 3rd - All 5 integration tests now pass: - TestVolumeMergeBasic - TestVolumeMergeReadonly - TestVolumeMergeRestore - TestVolumeMergeTailNeedles - TestVolumeMergeDivergentReplicas * Refactor test framework: use parameterized server count instead of hardcoded - Renamed TripleVolumeCluster to MultiVolumeCluster with serverCount parameter - Replaced hardcoded volumePort0/1/2 with slices for flexible server count - Updated StartTripleVolumeCluster as backward-compatible wrapper calling StartMultiVolumeCluster(t, profile, 3) - Made directory creation, port allocation, and server startup loop-based - Updated accessor methods (VolumeAdminAddress, VolumeGRPCAddress, etc.) 
to support any server count - All 5 integration tests continue to pass with new parameterized cluster framework - Enables future testing with 2, 4, 5+ volume servers by calling StartMultiVolumeCluster directly * Consolidate cluster frameworks: StartDualVolumeCluster now uses MultiVolumeCluster - Made DualVolumeCluster a type alias for MultiVolumeCluster - Updated StartDualVolumeCluster to call StartMultiVolumeCluster(t, profile, 2) - Removed duplicate code from cluster_dual.go (now just 17 lines) - All existing tests using StartDualVolumeCluster continue to work without changes - Backward compatible: existing code continues to use the old function signatures - Added wrapper functions in cluster_multi.go for StartTripleVolumeCluster - Enables unified cluster management across all test suites * Address PR review comments: improve error handling and clean up code - Replace parse error swallow with proper error return - Log cleanup and restoration errors instead of silently discarding them - Remove unused offset field from memoryBackendFile struct - Fix WriteAt buffer truncation bug to preserve trailing bytes - All unit tests passing (10/10) - Code compiles successfully * Fix PR review findings: test improvements and code quality - Add timeout to runWeedShell to prevent hanging - Add server 1 readonly status verification in tests - Assert merge fails when replicas writable (not just log output) - Replace sleep with polling for writable restoration check - Fix WriteAt stale data snapshot bug in memoryBackendFile - Fix startVolume error logging to show current server log - Fix volumePubPorts double assignment in port allocation - Rename test to reflect behavior: DoesNotDeduplicateAcrossWindows - Fix misleading dedup window comment Unit tests: 10/10 passing Binary: Compiles successfully * Fix test assumption: merge command marks volumes readonly automatically TestVolumeMergeReadonly was expecting merge to fail on writable volumes, but the merge command is designed to mark 
volumes readonly as part of its operation. Fixed test to verify merge succeeds on writable volumes and properly restores writable state afterward. Removed redundant Test 2 code that duplicated the new behavior. * fmt * Fix deduplication logic to correctly handle same-stream vs cross-stream duplicates The dedup map previously used only NeedleId as key, causing same-stream overwrites to be incorrectly skipped as duplicates. Changed to track which stream first processed each needle ID in the current window: - Cross-stream duplicates (same ID from different streams, within window) are skipped - Same-stream duplicates (overwrites from same stream) are kept - Map now stores: needleId -> streamIndex of first occurrence in window Added TestMergeNeedleStreamsSameStreamDuplicates to verify same-stream overwrites are preserved while cross-stream duplicates are skipped. All unit tests passing (11/11) Binary compiles successfully	2026-02-25 10:12:09 -08:00
Chris Lu	da4edb5fe6	Fix live volume move tail timestamp (#8440 ) * Improve move tail timestamp * Add move tail timestamp integration test * Simulate traffic during move	2026-02-24 20:07:26 -08:00
Chris Lu	27e763222a	Fix inline user policy retrieval (#8437 ) * Fix IAM inline user policy retrieval * fmt * Persist inline user policies to avoid loss on server restart - Use s3ApiConfig.PutPolicies/GetPolicies for persistent storage instead of non-persistent global map - Remove unused global policyDocuments map - Update PutUserPolicy to store policies in persistent storage - Update GetUserPolicy to read from persistent storage - Update DeleteUserPolicy to clean up persistent storage - Add mock IamS3ApiConfig for testing - Improve test to verify policy statements are not merged or lost * Fix inline policy key collision and action aggregation * Improve error handling and optimize inline policy management - GetUserPolicy: Propagate GetPolicies errors instead of silently falling through - DeleteUserPolicy: Return error immediately on GetPolicies failure - computeAggregatedActionsForUser: Add optional Policies parameter for I/O optimization - PutUserPolicy: Reuse fetched policies to avoid redundant GetPolicies call - Improve logging with clearer messages about best-effort aggregation - Update test to use exact action string matching instead of substring checks All 15 tests pass with no regressions. 
* Add per-user policy index for O(1) lookup performance - Extend Policies struct with InlinePolicies map[userName]map[policyName] - Add getOrCreateUserPolicies() helper for safe user map management - Update computeAggregatedActionsForUser to use direct user map access - Update PutUserPolicy, GetUserPolicy, DeleteUserPolicy for new structure - Performance: O(1) user lookups instead of O(all_policies) iteration - Eliminates string prefix matching loop - All tests pass; backward compatible with managed policies * Fix DeleteUserPolicy to validate user existence before storage modification Refactor DeleteUserPolicy handler to check user existence early: - First iterate s3cfg.Identities to verify user exists - Return NoSuchEntity error immediately if user not found - Only then proceed with GetPolicies and policy deletion - Capture reference to found identity for direct update This ensures consistency: if user doesn't exist, storage is not modified. Previously the code would delete from storage first and check identity afterwards, potentially leaving orphaned policies. Benefits: - Fail-fast validation before storage operations - No orphaned policies in storage if validation fails - Atomic from logical perspective - Direct identity reference eliminates redundant loop - All error paths preserved and tested All 15 tests pass; no functional changes to behavior. * Fix GetUserPolicy to return NoSuchEntity when inline policy not found When InlinePolicies[userName] exists but does not contain policyName, the handler now immediately returns NoSuchEntity error instead of falling through to the reconstruction logic. 
Changes: - Add else clause after userPolicies[policyName] lookup - Return IamError(NoSuchEntityException, "policy not found") immediately - Prevents incorrect fallback to reconstructing ident.Actions - Ensures explicit error when policy explicitly doesn't exist This improves error semantics: - Policy exists in stored inline policies → return error (not reconstruct) - Policy doesn't exist in stored inline policies → try reconstruction (backward compat) - Storage error → return service failure error All 15 tests pass; no behavioral changes to existing error or success paths.	2026-02-24 18:01:17 -08:00
Anton	427c975ff3	fix(plugin/worker): make VacuumHandler report MaxExecutionConcurrency from worker startup flag (#8435 ) * fix(plugin/worker): make VacuumHandler report MaxExecutionConcurrency from worker startup flag Previously, MaxExecutionConcurrency was hardcoded to 2 in VacuumHandler.Capability(). The scheduler's schedulerWorkerExecutionLimit() takes the minimum of the UI-configured PerWorkerExecutionConcurrency and the worker-reported capability limit, so the hardcoded value silently capped each worker to 2 concurrent vacuum executions regardless of the --max-execute flag passed at worker startup. Pass maxExecutionConcurrency into NewVacuumHandler() and wire it through buildPluginWorkerHandler/buildPluginWorkerHandlers so the capability reflects the actual worker configuration. The default falls back to 2 when the value is unset or zero. * Update weed/command/worker_runtime.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Anton Ustyugov <anton@devops> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-02-24 15:13:00 -08:00
Chris Lu	ce4940b441	fix filer link on dashboard	2026-02-24 14:21:27 -08:00
Anton	b4c7d42a06	fix(admin): release mutex before disk I/O in maintenance queue; remove per-request LoadAllTaskStates (#8433 ) * fix(admin): release mutex before disk I/O in maintenance queue saveTaskState performs synchronous BoltDB writes. Calling it while holding mq.mutex.Lock() in AddTask, GetNextTask, and CompleteTask blocks all readers (GetTasks via RLock) for the full disk write duration on every task state change. During a maintenance scan AddTasksFromResults calls AddTask for every volume — potentially hundreds of times — meaning the write lock is held almost continuously. The HTTP handler for /maintenance calls GetTasks which blocks on RLock, exceeding the 30s timeout and returning 408 to the browser. Fix: update in-memory state (mq.tasks, mq.pendingTasks) under the lock as before, then unlock before calling saveTaskState. In-memory state is the authoritative source; persistence is crash-recovery only and does not require lock protection during the write. * fix(admin): add mutex to ConfigPersistence to synchronize tasks/ filesystem ops saveTaskState is now called outside mq.mutex, meaning SaveTaskState, LoadAllTaskStates, DeleteTaskState, and CleanupCompletedTasks can be invoked concurrently from multiple goroutines. ConfigPersistence had no internal synchronization, creating races on the tasks/ directory: - concurrent os.WriteFile + os.ReadFile on the same .pb file could yield a partial read and unmarshal error - LoadAllTaskStates (ReadDir + per-file ReadFile) could see a directory entry for a file being written or deleted concurrently - CleanupCompletedTasks (LoadAllTaskStates + DeleteTaskState) could race with SaveTaskState on the same file Fix: add tasksMu sync.Mutex to ConfigPersistence, acquired at the top of SaveTaskState, LoadTaskState, LoadAllTaskStates, DeleteTaskState, and CleanupCompletedTasks. Extract private Locked helpers so that CleanupCompletedTasks (which holds tasksMu) can call them internally without deadlocking. 
--------- Co-authored-by: Anton Ustyugov <anton@devops>	2026-02-24 13:41:41 -08:00
Chris Lu	cba69f4593	Update layout_templ.go	2026-02-24 13:22:12 -08:00
Chris Lu	98d89ffad7	s3api: preserve Host header port in signature verification (#8434 ) Avoid stripping default ports (80/443) from the Host header in extractHostHeader. This fixes SignatureDoesNotMatch errors when SeaweedFS is accessed via a proxy (like Kong Ingress) that explicitly includes the port in the Host header or X-Forwarded-Host, which S3 clients sign. Also cleaned up unused variables and logic after refactoring.	2026-02-24 13:09:40 -08:00
Xiao Wei	9fa95dd2c6	fix: unload leveldb not take effect (#8431 )	2026-02-24 07:32:13 -08:00

8381 Commits