seaweedFS

Author	SHA1	Message	Date
Chris Lu	efbed39e25	S3: map canned ACL to file permissions and add configurable default file mode (#8886 ) * S3: map canned ACL to file permissions and add configurable default file mode S3 uploads were hardcoded to 0660 regardless of ACL headers. Now the X-Amz-Acl header maps to Unix file permissions per-object: - public-read, authenticated-read, bucket-owner-read → 0644 - public-read-write → 0666 - private, bucket-owner-full-control → 0660 Also adds -defaultFileMode / -s3.defaultFileMode flag to set a server-wide default when no ACL header is present. Closes #8874 * Address review feedback for S3 file mode feature - Extract hardcoded 0660 to defaultFileMode constant - Change parseDefaultFileMode to return error instead of calling Fatalf - Add -s3.defaultFileMode flag to filer.go and mini.go (was missing) - Add doc comment to S3Options about updating all four flag sites - Add TestResolveFileMode with 10 test cases covering ACL mapping, server default, and priority ordering	2026-04-02 11:51:54 -07:00
Chris Lu	7d5cbfd547	s3: support s3:x-amz-server-side-encryption policy condition (#8806 ) * s3: support s3:x-amz-server-side-encryption policy condition (#7680) - Normalize x-amz-server-side-encryption header values to canonical form (aes256 → AES256, aws:kms mixed-case → aws:kms) so StringEquals conditions work regardless of client capitalisation - Exempt UploadPart and UploadPartCopy from SSE Null conditions: these actions inherit SSE from the initial CreateMultipartUpload request and do not re-send the header, so Deny/Null("true") should not block them - Add sse_condition_test.go covering StringEquals, Null, case-insensitive normalisation, and multipart continuation action exemption * s3: address review comments on SSE condition support - Replace "inherited" sentinel in injectSSEForMultipart with "AES256" so that StringEquals/Null conditions evaluate against a meaningful value; add TODO noting that KMS multipart uploads need the actual algorithm looked up from the upload state - Rewrite TestSSECaseInsensitiveNormalization to drive normalisation through EvaluatePolicyForRequest with a real http.Request so regressions in the production code path are caught; split into AES256 and aws:kms variants to cover both normalisation branches s3: plumb real inherited SSE from multipart upload state into policy eval Instead of injecting a static "AES256" sentinel for UploadPart/UploadPartCopy, look up the actual SSE algorithm from the stored CreateMultipartUpload entry and pass it through the evaluation chain. Changes: - PolicyEvaluationArgs gains InheritedSSEAlgorithm string; set by the BucketPolicyEngine wrapper for multipart continuation actions - injectSSEForMultipart(conditions, inheritedSSE) now accepts the real algorithm; empty string means no SSE → Null("true") fires correctly - IsMultipartContinuationAction exported so the s3api wrapper can use it - BucketPolicyEngine gets a MultipartSSELookup callback (set by S3ApiServer) that fetches the upload entry and reads SeaweedFSSSEKMSKeyID / SeaweedFSSSES3Encryption to determine the algorithm - S3ApiServer.getMultipartSSEAlgorithm implements the lookup via getEntry - Tests updated: three multipart cases (AES256, aws:kms, no-SSE-must-deny) plus UploadPartCopy coverage	2026-03-27 23:15:01 -07:00
Chris Lu	0adb78bc6b	s3api: make conditional mutations atomic and AWS-compatible (#8802 ) * s3api: serialize conditional write finalization * s3api: add conditional delete mutation checks * s3api: enforce destination conditions for copy * s3api: revalidate multipart completion under lock * s3api: rollback failed put finalization hooks * s3api: report delete-marker version deletions * s3api: fix copy destination versioning edge cases * s3api: make versioned multipart completion idempotent * test/s3: cover conditional mutation regressions * s3api: rollback failed copy version finalization * s3api: resolve suspended delete conditions via latest entry * s3api: remove copy test null-version injection * s3api: reject out-of-order multipart completions * s3api: preserve multipart replay version metadata * s3api: surface copy destination existence errors * s3api: simplify delete condition target resolution * test/s3: make conditional delete assertions order independent * test/s3: add distributed lock gateway integration * s3api: fail closed multipart versioned completion * s3api: harden copy metadata and overwrite paths * s3api: create delete markers for suspended deletes * s3api: allow duplicate multipart completion parts	2026-03-27 19:22:26 -07:00
Chris Lu	992db11d2b	iam: add IAM group management (#8560 ) * iam: add Group message to protobuf schema Add Group message (name, members, policy_names, disabled) and add groups field to S3ApiConfiguration for IAM group management support (issue #7742). * iam: add group CRUD to CredentialStore interface and all backends Add group management methods (CreateGroup, GetGroup, DeleteGroup, ListGroups, UpdateGroup) to the CredentialStore interface with implementations for memory, filer_etc, postgres, and grpc stores. Wire group loading/saving into filer_etc LoadConfiguration and SaveConfiguration. * iam: add group IAM response types Add XML response types for group management IAM actions: CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, ListGroupsForUser. * iam: add group management handlers to embedded IAM API Add CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, and ListGroupsForUser handlers with dispatch in ExecuteAction. * iam: add group management handlers to standalone IAM API Add group handlers (CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, ListGroupsForUser) and wire into DoActions dispatch. Also add helper functions for user/policy side effects. * iam: integrate group policies into authorization Add groups and userGroups reverse index to IdentityAccessManagement. Populate both maps during ReplaceS3ApiConfiguration and MergeS3ApiConfiguration. Modify evaluateIAMPolicies to evaluate policies from user's enabled groups in addition to user policies. Update VerifyActionPermission to consider group policies when checking hasAttachedPolicies. * iam: add group side effects on user deletion and rename When a user is deleted, remove them from all groups they belong to. When a user is renamed, update group membership references. Applied to both embedded and standalone IAM handlers. * iam: watch /etc/iam/groups directory for config changes Add groups directory to the filer subscription watcher so group file changes trigger IAM configuration reloads. * admin: add group management page to admin UI Add groups page with CRUD operations, member management, policy attachment, and enable/disable toggle. Register routes in admin handlers and add Groups entry to sidebar navigation. * test: add IAM group management integration tests Add comprehensive integration tests for group CRUD, membership, policy attachment, policy enforcement, disabled group behavior, user deletion side effects, and multi-group membership. Add "group" test type to CI matrix in s3-iam-tests workflow. * iam: address PR review comments for group management - Fix XSS vulnerability in groups.templ: replace innerHTML string concatenation with DOM APIs (createElement/textContent) for rendering member and policy lists - Use userGroups reverse index in embedded IAM ListGroupsForUser for O(1) lookup instead of iterating all groups - Add buildUserGroupsIndex helper in standalone IAM handlers; use it in ListGroupsForUser and removeUserFromAllGroups for efficient lookup - Add note about gRPC store load-modify-save race condition limitation * iam: add defensive copies, validation, and XSS fixes for group management - Memory store: clone groups on store/retrieve to prevent mutation - Admin dash: deep copy groups before mutation, validate user/policy exists - HTTP handlers: translate credential errors to proper HTTP status codes, use bool for Enabled field to distinguish missing vs false - Groups templ: use data attributes + event delegation instead of inline onclick for XSS safety, prevent stale async responses iam: add explicit group methods to PropagatingCredentialStore Add CreateGroup, GetGroup, DeleteGroup, ListGroups, and UpdateGroup methods instead of relying on embedded interface fallthrough. Group changes propagate via filer subscription so no RPC propagation needed. * iam: detect postgres unique constraint violation and add groups index Return ErrGroupAlreadyExists when INSERT hits SQLState 23505 instead of a generic error. Add index on groups(disabled) for filtered queries. * iam: add Marker field to group list response types Add Marker string field to GetGroupResult, ListGroupsResult, ListAttachedGroupPoliciesResult, and ListGroupsForUserResult to match AWS IAM pagination response format. * iam: check group attachment before policy deletion Reject DeletePolicy if the policy is attached to any group, matching AWS IAM behavior. Add PolicyArn to ListAttachedGroupPolicies response. * iam: include group policies in IAM authorization Merge policy names from user's enabled groups into the IAMIdentity used for authorization, so group-attached policies are evaluated alongside user-attached policies. * iam: check for name collision before renaming user in UpdateUser Scan identities and inline policies for newUserName before mutating, returning EntityAlreadyExists if a collision is found. Reuse the already-loaded policies instead of loading them again inside the loop. * test: use t.Cleanup for bucket cleanup in group policy test * iam: wrap ErrUserNotInGroup sentinel in RemoveGroupMember error Wrap credential.ErrUserNotInGroup so errors.Is works in groupErrorToHTTPStatus, returning proper 400 instead of 500. * admin: regenerate groups_templ.go with XSS-safe data attributes Regenerated from groups.templ which uses data-group-name attributes instead of inline onclick with string interpolation. * iam: add input validation and persist groups during migration - Validate nil/empty group name in CreateGroup and UpdateGroup - Save groups in migrateToMultiFile so they survive legacy migration * admin: use groupErrorToHTTPStatus in GetGroupMembers and GetGroupPolicies * iam: short-circuit UpdateUser when newUserName equals current name * iam: require empty PolicyNames before group deletion Reject DeleteGroup when group has attached policies, matching the existing members check. Also fix GetGroup error handling in DeletePolicy to only skip ErrGroupNotFound, not all errors. * ci: add weed/pb/** to S3 IAM test trigger paths * test: replace time.Sleep with require.Eventually for propagation waits Use polling with timeout instead of fixed sleeps to reduce flakiness in integration tests waiting for IAM policy propagation. * fix: use credentialManager.GetPolicy for AttachGroupPolicy validation Policies created via CreatePolicy through credentialManager are stored in the credential store, not in s3cfg.Policies (which only has static config policies). Change AttachGroupPolicy to use credentialManager.GetPolicy() for policy existence validation. * feat: add UpdateGroup handler to embedded IAM API Add UpdateGroup action to enable/disable groups and rename groups via the IAM API. This is a SeaweedFS extension (not in AWS SDK) used by tests to toggle group disabled status. * fix: authenticate raw IAM API calls in group tests The embedded IAM endpoint rejects anonymous requests. Replace callIAMAPI with callIAMAPIAuthenticated that uses JWT bearer token authentication via the test framework. * feat: add UpdateGroup handler to standalone IAM API Mirror the embedded IAM UpdateGroup handler in the standalone IAM API for parity. * fix: add omitempty to Marker XML tags in group responses Non-truncated responses should not emit an empty <Marker/> element. * fix: distinguish backend errors from missing policies in AttachGroupPolicy Return ServiceFailure for credential manager errors instead of masking them as NoSuchEntity. Also switch ListGroupsForUser to use s3cfg.Groups instead of in-memory reverse index to avoid stale data. Add duplicate name check to UpdateGroup rename. * fix: standalone IAM AttachGroupPolicy uses persisted policy store Check managed policies from GetPolicies() instead of s3cfg.Policies so dynamically created policies are found. Also add duplicate name check to UpdateGroup rename. * fix: rollback inline policies on UpdateUser PutPolicies failure If PutPolicies fails after moving inline policies to the new username, restore both the identity name and the inline policies map to their original state to avoid a partial-write window. * fix: correct test cleanup ordering for group tests Replace scattered defers with single ordered t.Cleanup in each test to ensure resources are torn down in reverse-creation order: remove membership, detach policies, delete access keys, delete users, delete groups, delete policies. Move bucket cleanup to parent test scope and delete objects before bucket. * fix: move identity nil check before map lookup and refine hasAttachedPolicies Move the nil check on identity before accessing identity.Name to prevent panic. Also refine hasAttachedPolicies to only consider groups that are enabled and have actual policies attached, so membership in a no-policy group doesn't incorrectly trigger IAM authorization. * fix: fail group reload on unreadable or corrupt group files Return errors instead of logging and continuing when group files cannot be read or unmarshaled. This prevents silently applying a partial IAM config with missing group memberships or policies. * fix: use errors.Is for sql.ErrNoRows comparison in postgres group store * docs: explain why group methods skip propagateChange Group changes propagate to S3 servers via filer subscription (watching /etc/iam/groups/) rather than gRPC RPCs, since there are no group-specific RPCs in the S3 cache protocol. * fix: remove unused policyNameFromArn and strings import * fix: update service account ParentUser on user rename When renaming a user via UpdateUser, also update ParentUser references in service accounts to prevent them from becoming orphaned after the next configuration reload. * fix: wrap DetachGroupPolicy error with ErrPolicyNotAttached sentinel Use credential.ErrPolicyNotAttached so groupErrorToHTTPStatus maps it to 400 instead of falling back to 500. * fix: use admin S3 client for bucket cleanup in enforcement test The user S3 client may lack permissions by cleanup time since the user is removed from the group in an earlier subtest. Use the admin S3 client to ensure bucket and object cleanup always succeeds. * fix: add nil guard for group param in propagating store log calls Prevent potential nil dereference when logging group.Name in CreateGroup and UpdateGroup of PropagatingCredentialStore. * fix: validate Disabled field in UpdateGroup handlers Reject values other than "true" or "false" with InvalidInputException instead of silently treating them as false. * fix: seed mergedGroups from existing groups in MergeS3ApiConfiguration Previously the merge started with empty group maps, dropping any static-file groups. Now seeds from existing iam.groups before overlaying dynamic config, and builds the reverse index after merging to avoid stale entries from overridden groups. * fix: use errors.Is for filer_pb.ErrNotFound comparison in group loading Replace direct equality (==) with errors.Is() to correctly match wrapped errors, consistent with the rest of the codebase. * fix: add ErrUserNotFound and ErrPolicyNotFound to groupErrorToHTTPStatus Map these sentinel errors to 404 so AddGroupMember and AttachGroupPolicy return proper HTTP status codes. * fix: log cleanup errors in group integration tests Replace fire-and-forget cleanup calls with error-checked versions that log failures via t.Logf for debugging visibility. * fix: prevent duplicate group test runs in CI matrix The basic lane's -run "TestIAM" regex also matched TestIAMGroup* tests, causing them to run in both the basic and group lanes. Replace with explicit test function names. * fix: add GIN index on groups.members JSONB for membership lookups Without this index, ListGroupsForUser and membership queries require full table scans on the groups table. * fix: handle cross-directory moves in IAM config subscription When a file is moved out of an IAM directory (e.g., /etc/iam/groups), the dir variable was overwritten with NewParentPath, causing the source directory change to be missed. Now also notifies handlers about the source directory for cross-directory moves. * fix: validate members/policies before deleting group in admin handler AdminServer.DeleteGroup now checks for attached members and policies before delegating to credentialManager, matching the IAM handler guards. * fix: merge groups by name instead of blind append during filer load Match the identity loader's merge behavior: find existing group by name and replace, only append when no match exists. Prevents duplicates when legacy and multi-file configs overlap. * fix: check DeleteEntry response error when cleaning obsolete group files Capture and log resp.Error from filer DeleteEntry calls during group file cleanup, matching the pattern used in deleteGroupFile. * fix: verify source user exists before no-op check in UpdateUser Reorder UpdateUser to find the source identity first and return NoSuchEntityException if not found, before checking if the rename is a no-op. Previously a non-existent user renamed to itself would incorrectly return success. * fix: update service account parent refs on user rename in embedded IAM The embedded IAM UpdateUser handler updated group membership but not service account ParentUser fields, unlike the standalone handler. * fix: replay source-side events for all handlers on cross-dir moves Pass nil newEntry to bucket, IAM, and circuit-breaker handlers for the source directory during cross-directory moves, so all watchers can clear caches for the moved-away resource. * fix: don't seed mergedGroups from existing iam.groups in merge Groups are always dynamic (from filer), never static (from s3.config). Seeding from iam.groups caused stale deleted groups to persist. Now only uses config.Groups from the dynamic filer config. * fix: add deferred user cleanup in TestIAMGroupUserDeletionSideEffect Register t.Cleanup for the created user so it gets cleaned up even if the test fails before the inline DeleteUser call. * fix: assert UpdateGroup HTTP status in disabled group tests Add require.Equal checks for 200 status after UpdateGroup calls so the test fails immediately on API errors rather than relying on the subsequent Eventually timeout. * fix: trim whitespace from group name in filer store operations Trim leading/trailing whitespace from group.Name before validation in CreateGroup and UpdateGroup to prevent whitespace-only filenames. Also merge groups by name during multi-file load to prevent duplicates. * fix: add nil/empty group validation in gRPC store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics and invalid persistence. * fix: add nil/empty group validation in postgres store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics from nil member access and empty-name row inserts. * fix: add name collision check in embedded IAM UpdateUser The embedded IAM handler renamed users without checking if the target name already existed, unlike the standalone handler. * fix: add ErrGroupNotEmpty sentinel and map to HTTP 409 AdminServer.DeleteGroup now wraps conflict errors with ErrGroupNotEmpty, and groupErrorToHTTPStatus maps it to 409 Conflict instead of 500. * fix: use appropriate error message in GetGroupDetails based on status Return "Group not found" only for 404, use "Failed to retrieve group" for other error statuses instead of always saying "Group not found". * fix: use backend-normalized group.Name in CreateGroup response After credentialManager.CreateGroup may normalize the name (e.g., trim whitespace), use group.Name instead of the raw input for the returned GroupData to ensure consistency. * fix: add nil/empty group validation in memory store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics from nil pointer dereference on map access. * fix: reorder embedded IAM UpdateUser to verify source first Find the source identity before checking for collisions, matching the standalone handler's logic. Previously a non-existent user renamed to an existing name would get EntityAlreadyExists instead of NoSuchEntity. * fix: handle same-directory renames in metadata subscription Replay a delete event for the old entry name during same-directory renames so handlers like onBucketMetadataChange can clean up stale state for the old name. * fix: abort GetGroups on non-ErrGroupNotFound errors Only skip groups that return ErrGroupNotFound. Other errors (e.g., transient backend failures) now abort the handler and return the error to the caller instead of silently producing partial results. * fix: add aria-label and title to icon-only group action buttons Add accessible labels to View and Delete buttons so screen readers and tooltips provide meaningful context. * fix: validate group name in saveGroup to prevent invalid filenames Trim whitespace and reject empty names before writing group JSON files, preventing creation of files like ".json". * fix: add /etc/iam/groups to filer subscription watched directories The groups directory was missing from the watched directories list, so S3 servers in a cluster would not detect group changes made by other servers via filer. The onIamConfigChange handler already had code to handle group directory changes but it was never triggered. * add direct gRPC propagation for group changes to S3 servers Groups now have the same dual propagation as identities and policies: direct gRPC push via propagateChange + async filer subscription. - Add PutGroup/RemoveGroup proto messages and RPCs - Add PutGroup/RemoveGroup in-memory cache methods on IAM - Add PutGroup/RemoveGroup gRPC server handlers - Update PropagatingCredentialStore to call propagateChange on group mutations * reduce log verbosity for config load summary Change ReplaceS3ApiConfiguration log from Infof to V(1).Infof to avoid noisy output on every config reload. * admin: show user groups in view and edit user modals - Add Groups field to UserDetails and populate from credential manager - Show groups as badges in user details view modal - Add group management to edit user modal: display current groups, add to group via dropdown, remove from group via badge x button * fix: remove duplicate showAlert that broke modal-alerts.js admin.js defined showAlert(type, message) which overwrote the modal-alerts.js version showAlert(message, type), causing broken unstyled alert boxes. Remove the duplicate and swap all callers in admin.js to use the correct (message, type) argument order. * fix: unwrap groups API response in edit user modal The /api/groups endpoint returns {"groups": [...]}, not a bare array. * Update object_store_users_templ.go * test: assert AccessDenied error code in group denial tests Replace plain assert.Error checks with awserr.Error type assertion and AccessDenied code verification, matching the pattern used in other IAM integration tests. * fix: propagate GetGroups errors in ShowGroups handler getGroupsPageData was swallowing errors and returning an empty page with 200 status. Now returns the error so ShowGroups can respond with a proper error status. * fix: reject AttachGroupPolicy when credential manager is nil Previously skipped policy existence validation when credentialManager was nil, allowing attachment of nonexistent policies. Now returns a ServiceFailureException error. * fix: preserve groups during partial MergeS3ApiConfiguration updates UpsertIdentity calls MergeS3ApiConfiguration with a partial config containing only the updated identity (nil Groups). This was wiping all in-memory group state. Now only replaces groups when config.Groups is non-nil (full config reload). * fix: propagate errors from group lookup in GetObjectStoreUserDetails ListGroups and GetGroup errors were silently ignored, potentially showing incomplete group data in the UI. * fix: use DOM APIs for group badge remove button to prevent XSS Replace innerHTML with onclick string interpolation with DOM createElement + addEventListener pattern. Also add aria-label and title to the add-to-group button. * fix: snapshot group policies under RLock to prevent concurrent map access evaluateIAMPolicies was copying the map reference via groupMap := iam.groups under RLock then iterating after RUnlock, while PutGroup mutates the map in-place. Now copies the needed policy names into a slice while holding the lock. * fix: add nil IAM check to PutGroup and RemoveGroup gRPC handlers Match the nil guard pattern used by PutPolicy/DeletePolicy to prevent nil pointer dereference when IAM is not initialized.	2026-03-09 11:54:32 -07:00
Chris Lu	d89eb8267f	s3: use url.PathUnescape for X-Amz-Copy-Source header (#8545 ) * s3: use url.PathUnescape for X-Amz-Copy-Source header (#8544) The X-Amz-Copy-Source header is a URL-encoded path, not a query string. Using url.QueryUnescape incorrectly converts literal '+' characters to spaces, which can cause object key mismatches during copy operations. Switch to url.PathUnescape in CopyObjectHandler, CopyObjectPartHandler, and pathToBucketObjectAndVersion to correctly handle special characters like '!', '+', and other RFC 3986 sub-delimiters that S3 clients may percent-encode (e.g. '!' as %21). * s3: add path validation to CopyObjectPartHandler CopyObjectPartHandler was missing the validateTableBucketObjectPath checks that CopyObjectHandler has, allowing potential path traversal in the source bucket/object of copy part requests. * s3: fix case-sensitive HeadersRegexp for copy source routing The HeadersRegexp for X-Amz-Copy-Source used `%2F` which only matched uppercase hex encoding. RFC 3986 allows both `%2F` and `%2f`, so clients sending lowercase percent-encoding would bypass the copy handler and hit PutObjectHandler instead. Add (?i) flag for case-insensitive matching. Also add test coverage for the versionId branch in pathToBucketObjectAndVersion and for lowercase %2f routing.	2026-03-07 11:10:02 -08:00
Chris Lu	540fc97e00	s3/iam: reuse one request id per request (#8538 ) * request_id: add shared request middleware * s3err: preserve request ids in responses and logs * iam: reuse request ids in XML responses * sts: reuse request ids in XML responses * request_id: drop legacy header fallback * request_id: use AWS-style request id format * iam: fix AWS-compatible XML format for ErrorResponse and field ordering - ErrorResponse uses bare <RequestId> at root level instead of <ResponseMetadata> wrapper, matching the AWS IAM error response spec - Move CommonResponse to last field in success response structs so <ResponseMetadata> serializes after result elements - Add randomness to request ID generation to avoid collisions - Add tests for XML ordering and ErrorResponse format * iam: remove duplicate error_response_test.go Test is already covered by responses_test.go. * address PR review comments - Guard against typed nil pointers in SetResponseRequestID before interface assertion (CodeRabbit) - Use regexp instead of strings.Index in test helpers for extracting request IDs (Gemini) * request_id: prevent spoofing, fix nil-error branch, thread reqID to error writers - Ensure() now always generates a server-side ID, ignoring client-sent x-amz-request-id headers to prevent request ID spoofing. Uses a private context key (contextKey{}) instead of the header string. - writeIamErrorResponse in both iamapi and embedded IAM now accepts reqID as a parameter instead of calling Ensure() internally, ensuring a single request ID per request lifecycle. - The nil-iamError branch in writeIamErrorResponse now writes a 500 Internal Server Error response instead of returning silently. - Updated tests to set request IDs via context (not headers) and added tests for spoofing prevention and context reuse. * sts: add request-id consistency assertions to ActionInBody tests * test: update admin test to expect server-generated request IDs The test previously sent a client x-amz-request-id header and expected it echoed back. Since Ensure() now ignores client headers to prevent spoofing, update the test to verify the server returns a non-empty server-generated request ID instead. * iam: add generic WithRequestID helper alongside reflection-based fallback Add WithRequestID[T] that uses generics to take the address of a value type, satisfying the pointer receiver on SetRequestId without reflection. The existing SetResponseRequestID is kept for the two call sites that operate on interface{} (from large action switches where the concrete type varies at runtime). Generics cannot replace reflection there since Go cannot infer type parameters from interface{}. * Remove reflection and generics from request ID setting Call SetRequestId directly on concrete response types in each switch branch before boxing into interface{}, eliminating the need for WithRequestID (generics) and SetResponseRequestID (reflection). * iam: return pointer responses in action dispatch * Fix IAM error handling consistency and ensure request IDs on all responses - UpdateUser/CreatePolicy error branches: use writeIamErrorResponse instead of s3err.WriteErrorResponse to preserve IAM formatting and request ID - ExecuteAction: accept reqID parameter and generate one if empty, ensuring every response carries a RequestId regardless of caller * Clean up inline policies on DeleteUser and UpdateUser rename DeleteUser: remove InlinePolicies[userName] from policy storage before removing the identity, so policies are not orphaned. UpdateUser: move InlinePolicies[userName] to InlinePolicies[newUserName] when renaming, so GetUserPolicy/DeleteUserPolicy work under the new name. Both operations persist the updated policies and return an error if the storage write fails, preventing partial state.	2026-03-06 15:22:39 -08:00
Chris Lu	10a30a83e1	s3api: add GetObjectAttributes API support (#8504 ) * s3api: add error code and header constants for GetObjectAttributes Add ErrInvalidAttributeName error code and header constants (X-Amz-Object-Attributes, X-Amz-Max-Parts, X-Amz-Part-Number-Marker, X-Amz-Delete-Marker) needed by the S3 GetObjectAttributes API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: implement GetObjectAttributes handler Add GetObjectAttributesHandler that returns selected object metadata (ETag, Checksum, StorageClass, ObjectSize, ObjectParts) without returning the object body. Follows the same versioning and conditional header patterns as HeadObjectHandler. The handler parses the X-Amz-Object-Attributes header to determine which attributes to include in the XML response, and supports ObjectParts pagination via X-Amz-Max-Parts and X-Amz-Part-Number-Marker. Ref: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectAttributes.html Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: register GetObjectAttributes route Register the GET /{object}?attributes route for the GetObjectAttributes API, placed before other object query routes to ensure proper matching. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add integration tests for GetObjectAttributes Test coverage: - Basic: simple object with all attribute types - MultipartObject: multipart upload with parts pagination - SelectiveAttributes: requesting only specific attributes - InvalidAttribute: server rejects invalid attribute names - NonExistentObject: returns NoSuchKey for missing objects Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add versioned object test for GetObjectAttributes Test puts two versions of the same object and verifies that: - GetObjectAttributes returns the latest version by default - GetObjectAttributes with versionId returns the specific version - ObjectSize and VersionId are correct for each version Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: fix combined conditional header evaluation per RFC 7232 Per RFC 7232: - Section 3.4: If-Unmodified-Since MUST be ignored when If-Match is present (If-Match is the more accurate replacement) - Section 3.3: If-Modified-Since MUST be ignored when If-None-Match is present (If-None-Match is the more accurate replacement) Previously, all four conditional headers were evaluated independently. This caused incorrect 412 responses when If-Match succeeded but If-Unmodified-Since failed (should return 200 per AWS S3 behavior). Fix applied to both validateConditionalHeadersForReads (GET/HEAD) and validateConditionalHeaders (PUT) paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add conditional header combination tests for GetObjectAttributes Test the RFC 7232 combined conditional header semantics: - If-Match=true + If-Unmodified-Since=false => 200 (If-Unmodified-Since ignored) - If-None-Match=false + If-Modified-Since=true => 304 (If-Modified-Since ignored) - If-None-Match=true + If-Modified-Since=false => 200 (If-Modified-Since ignored) - If-Match=true + If-Unmodified-Since=true => 200 - If-Match=false => 412 regardless Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: document Checksum attribute as not yet populated Checksum is accepted in validation (so clients requesting it don't get a 400 error, matching AWS behavior for objects without checksums) but SeaweedFS does not yet store S3 checksums. Add a comment explaining this and noting where to populate it when checksum storage is added. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add s3:GetObjectAttributes IAM action for ?attributes query Previously, GET /{object}?attributes resolved to s3:GetObject via the fallback path since resolveFromQueryParameters had no case for the "attributes" query parameter. Add S3_ACTION_GET_OBJECT_ATTRIBUTES constant ("s3:GetObjectAttributes") and a branch in resolveFromQueryParameters to return it for GET requests with the "attributes" query parameter, so IAM policies can distinguish GetObjectAttributes from GetObject. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: evaluate conditional headers after version resolution Move conditional header evaluation (If-Match, If-None-Match, etc.) to after the version resolution step in GetObjectAttributesHandler. This ensures that when a specific versionId is requested, conditions are checked against the correct version entry rather than always against the latest version. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: use bounded HTTP client in GetObjectAttributes tests Replace http.DefaultClient with a timeout-aware http.Client (10s) in the signedGetObjectAttributes helper and testGetObjectAttributesInvalid to prevent tests from hanging indefinitely. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: check attributes query before versionId in action resolver Move the GetObjectAttributes action check before the versionId check in resolveFromQueryParameters. This fixes GET /bucket/key?attributes&versionId=xyz being incorrectly classified as s3:GetObjectVersion instead of s3:GetObjectAttributes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add tests for versioned conditional headers and action resolver Add integration test that verifies conditional headers (If-Match, If-None-Match) are evaluated against the requested version entry, not the latest version. This covers the fix in 55c409dec. Add unit test for ResolveS3Action verifying that the attributes query parameter takes precedence over versionId, so GET ?attributes&versionId resolves to s3:GetObjectAttributes. This covers the fix in b92c61c95. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: guard negative chunk indices and rename PartsCount field Add bounds checks for b.StartChunk >= 0 and b.EndChunk >= 0 in buildObjectAttributesParts to prevent panics from corrupted metadata with negative index values. Rename ObjectAttributesParts.PartsCount to TotalPartsCount to match the AWS SDK v2 Go field naming convention, while preserving the XML element name "PartsCount" via the struct tag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: reject malformed max-parts and part-number-marker headers Return ErrInvalidMaxParts and ErrInvalidPartNumberMarker when the X-Amz-Max-Parts or X-Amz-Part-Number-Marker headers contain non-integer or negative values, matching ListObjectPartsHandler behavior. Previously these were silently ignored with defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 12:52:09 -08:00
blitt001	3d81d5bef7	Fix S3 signature verification behind reverse proxies (#8444 ) * Fix S3 signature verification behind reverse proxies When SeaweedFS is deployed behind a reverse proxy (e.g. nginx, Kong, Traefik), AWS S3 Signature V4 verification fails because the Host header the client signed with (e.g. "localhost:9000") differs from the Host header SeaweedFS receives on the backend (e.g. "seaweedfs:8333"). This commit adds a new -s3.externalUrl parameter (and S3_EXTERNAL_URL environment variable) that tells SeaweedFS what public-facing URL clients use to connect. When set, SeaweedFS uses this host value for signature verification instead of the Host header from the incoming request. New parameter: -s3.externalUrl (flag) or S3_EXTERNAL_URL (environment variable) Example: -s3.externalUrl=http://localhost:9000 Example: S3_EXTERNAL_URL=https://s3.example.com The environment variable is particularly useful in Docker/Kubernetes deployments where the external URL is injected via container config. The flag takes precedence over the environment variable when both are set. At startup, the URL is parsed and default ports are stripped to match AWS SDK behavior (port 80 for HTTP, port 443 for HTTPS), so "http://s3.example.com:80" and "http://s3.example.com" are equivalent. Bugs fixed: - Default port stripping was removed by a prior PR, causing signature mismatches when clients connect on standard ports (80/443) - X-Forwarded-Port was ignored when X-Forwarded-Host was not present - Scheme detection now uses proper precedence: X-Forwarded-Proto > TLS connection > URL scheme > "http" - Test expectations for standard port stripping were incorrect - expectedHost field in TestSignatureV4WithForwardedPort was declared but never actually checked (self-referential test) * Add Docker integration test for S3 proxy signature verification Docker Compose setup with nginx reverse proxy to validate that the -s3.externalUrl parameter (or S3_EXTERNAL_URL env var) correctly resolves S3 signature verification when SeaweedFS runs behind a proxy. The test uses nginx proxying port 9000 to SeaweedFS on port 8333, with X-Forwarded-Host/Port/Proto headers set. SeaweedFS is configured with -s3.externalUrl=http://localhost:9000 so it uses "localhost:9000" for signature verification, matching what the AWS CLI signs with. The test can be run with aws CLI on the host or without it by using the amazon/aws-cli Docker image with --network host. Test covers: create-bucket, list-buckets, put-object, head-object, list-objects-v2, get-object, content round-trip integrity, delete-object, and delete-bucket — all through the reverse proxy. * Create s3-proxy-signature-tests.yml * fix CLI * fix CI * Update s3-proxy-signature-tests.yml * address comments * Update Dockerfile * add user * no need for fuse * Update s3-proxy-signature-tests.yml * debug * weed mini * fix health check * health check * fix health checking --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-02-26 14:20:42 -08:00
Chris Lu	d1fecdface	Fix IAM defaults and S3Tables IAM regression (#8374 ) * Fix IAM defaults and s3tables identities * Refine S3Tables identity tests * Clarify identity tests	2026-02-18 18:20:03 -08:00
Chris Lu	eda4a000cc	Revert "Fix IAM defaults and s3tables identities" This reverts commit `bf71fe0039`.	2026-02-18 16:23:13 -08:00
Chris Lu	bf71fe0039	Fix IAM defaults and s3tables identities	2026-02-18 16:21:48 -08:00
Chris Lu	0d8588e3ae	S3: Implement IAM defaults and STS signing key fallback (#8348 ) * S3: Implement IAM defaults and STS signing key fallback logic * S3: Refactor startup order to init SSE-S3 key manager before IAM * S3: Derive STS signing key from KEK using HKDF for security isolation * S3: Document STS signing key fallback in security.toml * fix(s3api): refine anonymous access logic and secure-by-default behavior - Initialize anonymous identity by default in `NewIdentityAccessManagement` to prevent nil pointer exceptions. - Ensure `ReplaceS3ApiConfiguration` preserves the anonymous identity if not present in the new configuration. - Update `NewIdentityAccessManagement` signature to accept `filerClient`. - In legacy mode (no policy engine), anonymous defaults to Deny (no actions), preserving secure-by-default behavior. - Use specific `LookupAnonymous` method instead of generic map lookup. - Update tests to accommodate signature changes and verify improved anonymous handling. * feat(s3api): make IAM configuration optional - Start S3 API server without a configuration file if `EnableIam` option is set. - Default to `Allow` effect for policy engine when no configuration is provided (Zero-Config mode). - Handle empty configuration path gracefully in `loadIAMManagerFromConfig`. - Add integration test `iam_optional_test.go` to verify empty config behavior. * fix(iamapi): fix signature mismatch in NewIdentityAccessManagementWithStore * fix(iamapi): properly initialize FilerClient instead of passing nil * fix(iamapi): properly initialize filer client for IAM management - Instead of passing `nil`, construct a `wdclient.FilerClient` using the provided `Filers` addresses. - Ensure `NewIdentityAccessManagementWithStore` receives a valid `filerClient` to avoid potential nil pointer dereferences or limited functionality. * clean: remove dead code in s3api_server.go * refactor(s3api): improve IAM initialization, safety and anonymous access security * fix(s3api): ensure IAM config loads from filer after client init * fix(s3): resolve test failures in integration, CORS, and tagging tests - Fix CORS tests by providing explicit anonymous permissions config - Fix S3 integration tests by setting admin credentials in init - Align tagging test credentials in CI with IAM defaults - Added goroutine to retry IAM config load in iamapi server * fix(s3): allow anonymous access to health targets and S3 Tables when identities are present * fix(ci): use /healthz for Caddy health check in awscli tests * iam, s3api: expose DefaultAllow from IAM and Policy Engine This allows checking the global "Open by Default" configuration from other components like S3 Tables. * s3api/s3tables: support DefaultAllow in permission logic and handler Updated CheckPermissionWithContext to respect the DefaultAllow flag in PolicyContext. This enables "Open by Default" behavior for unauthenticated access in zero-config environments. Added a targeted unit test to verify the logic. * s3api/s3tables: propagate DefaultAllow through handlers Propagated the DefaultAllow flag to individual handlers for namespaces, buckets, tables, policies, and tagging. This ensures consistent "Open by Default" behavior across all S3 Tables API endpoints. * s3api: wire up DefaultAllow for S3 Tables API initialization Updated registerS3TablesRoutes to query the global IAM configuration and set the DefaultAllow flag on the S3 Tables API server. This completes the end-to-end propagation required for anonymous access in zero-config environments. Added a SetDefaultAllow method to S3TablesApiServer to facilitate this. * s3api: fix tests by adding DefaultAllow to mock IAM integrations The IAMIntegration interface was updated to include DefaultAllow(), breaking several mock implementations in tests. This commit fixes the build errors by adding the missing method to the mocks. * env * ensure ports * env * env * fix default allow * add one more test using non-anonymous user * debug * add more debug * less logs	2026-02-16 13:59:13 -08:00
Chris Lu	e863767ac7	cleanup(iam): final removal of temporary debug logging from STS and S3 API	2026-02-14 22:15:06 -08:00
Chris Lu	cf8e383e1e	STS: Fallback to Caller Identity when RoleArn is missing in AssumeRole (#8345 ) * s3api: make RoleArn optional in AssumeRole * s3api: address PR feedback for optional RoleArn * iam: add configurable default role for AssumeRole * S3 STS: Use caller identity when RoleArn is missing - Fallback to PrincipalArn/Context in AssumeRole if RoleArn is empty - Handle User ARNs in prepareSTSCredentials - Fix PrincipalArn generation for env var credentials * Test: Add unit test for AssumeRole caller identity fallback * fix(s3api): propagate admin permissions to assumed role session when using caller identity fallback * STS: Fix is_admin propagation and optimize IAM policy evaluation for assumed roles - Restore is_admin propagation via JWT req_ctx - Optimize IsActionAllowed to skip role lookups for admin sessions - Ensure session policies are still applied for downscoping - Remove debug logging - Fix syntax errors in cleanup * fix(iam): resolve STS policy bypass for admin sessions - Fixed IsActionAllowed in iam_manager.go to correctly identify and validate internal STS tokens, ensuring session policies are enforced. - Refactored VerifyActionPermission in auth_credentials.go to properly handle session tokens and avoid legacy authorization short-circuits. - Added debug logging for better tracing of policy evaluation and session validation.	2026-02-14 22:00:59 -08:00
Chris Lu	7799915e50	Fix IAM identity loss on S3 restart migration (#8343 ) * Fix IAM reload after legacy config migration Handle legacy identity.json metadata events by reloading from the credential manager instead of parsing event content, and watch the correct /etc/iam multi-file directories so identity changes are applied. Add regression tests for legacy deletion and /etc/iam/identities change events. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix auth_credentials_subscribe_test helper to not pollute global memory store The SaveConfiguration call was affecting other tests. Use local credential manager and ReplaceS3ApiConfiguration instead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix IAM event watching: subscribe to IAM directories and improve directory matching - Add /etc/iam and its subdirectories (identities, policies, service_accounts) to directoriesToWatch - Fix directory matching to avoid false positives from sibling directories - Use exact match or prefix with trailing slash instead of plain HasPrefix - Prevents matching hypothetical /etc/iam/identities_backup directory This ensures IAM config change events are actually delivered to the handler. * fix tests --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-02-13 22:49:27 -08:00
Chris Lu	49a64f50f1	Add session policy support to IAM (#8338 ) * Add session policy support to IAM - Implement policy evaluation for session tokens in policy_engine.go - Add session_policy field to session claims for tracking applied policies - Update STS service to include session policies in token generation - Add IAM integration tests for session policy validation - Update IAM manager to support policy attachment to sessions - Extend S3 API STS endpoint to handle session policy restrictions * fix: optimize session policy evaluation and add documentation * sts: add NormalizeSessionPolicy helper for inline session policies * sts: support inline session policies for AssumeRoleWithWebIdentity and credential-based flows * s3api: parse and normalize Policy parameter for STS HTTP handlers * tests: add session policy unit tests and integration tests for inline policy downscoping * tests: add s3tables STS inline policy integration * iam: handle user principals and validate tokens * sts: enforce inline session policy size limit * tests: harden s3tables STS integration config * iam: clarify principal policy resolution errors * tests: improve STS integration endpoint selection	2026-02-13 13:58:22 -08:00
Chris Lu	4e1065e485	Fix: preserve request body for STS signature verification (#8324 ) * Fix: preserve request body for STS signature verification - Save and restore request body in UnifiedPostHandler after ParseForm() - This allows STS handler to verify signatures correctly - Fixes 'invalid AWS signature: 53' error (ErrContentSHA256Mismatch) - ParseForm() consumes the body, so we need to restore it for downstream handlers * Improve error handling in UnifiedPostHandler - Add http.MaxBytesReader to limit body size to 10 MiB (iamRequestBodyLimit) - Add proper error handling for io.ReadAll failures - Log errors when body reading fails - Prevents DoS attacks from oversized request bodies - Addresses code review feedback	2026-02-12 13:28:12 -08:00
Chris Lu	c1a9263e37	Fix STS AssumeRole with POST body param (#8320 ) * Fix STS AssumeRole with POST body param and add integration test * Add STS integration test to CI workflow * Address code review feedback: fix HPP vulnerability and style issues * Refactor: address code review feedback - Fix HTTP Parameter Pollution vulnerability in UnifiedPostHandler - Refactor permission check logic for better readability - Extract test helpers to testutil/docker.go to reduce duplication - Clean up imports and simplify context setting * Add SigV4-style test variant for AssumeRole POST body routing - Added ActionInBodyWithSigV4Style test case to validate real-world scenario - Test confirms routing works correctly for AWS SigV4-signed requests - Addresses code review feedback about testing with SigV4 signatures * Fix: always set identity in context when non-nil - Ensure UnifiedPostHandler always calls SetIdentityInContext when identity is non-nil - Only call SetIdentityNameInContext when identity.Name is non-empty - This ensures downstream handlers (embeddedIam.DoActions) always have access to identity - Addresses potential issue where empty identity.Name would skip context setting	2026-02-12 12:04:07 -08:00
Chris Lu	be6b5db65a	s3: fix health check endpoints returning 404 for HEAD requests #8243 (#8248 ) * Fix disk errors handling in vacuum compaction When a disk reports IO errors during vacuum compaction (e.g., 'read /mnt/d1/weed/oc_xyz.dat: input/output error'), the vacuum task should signal the error to the master so it can: 1. Drop the faulty volume replica 2. Rebuild the replica from healthy copies Changes: - Add checkReadWriteError() calls in vacuum read paths (ReadNeedleBlob, ReadData, ScanVolumeFile) to flag EIO errors in volume.lastIoError - Preserve error wrapping using %w format instead of %v so EIO propagates correctly - The existing heartbeat logic will detect lastIoError and remove the bad volume Fixes issue #8237 * error * s3: fix health check endpoints returning 404 for HEAD requests #8243	2026-02-08 19:08:10 -08:00
Chris Lu	1274cf038c	s3: enforce authentication and JSON error format for Iceberg REST Catalog (#8192 ) * s3: enforce authentication and JSON error format for Iceberg REST Catalog * s3/iceberg: align error exception types with OpenAPI spec examples * s3api: refactor AuthenticateRequest to return identity object * s3/iceberg: propagate full identity object to request context * s3/iceberg: differentiate NotAuthorizedException and ForbiddenException * s3/iceberg: reject requests if authenticator is nil to prevent auth bypass * s3/iceberg: refactor Auth middleware to build context incrementally and use switch for error mapping * s3api: update misleading comment for authRequestWithAuthType * s3api: return ErrAccessDenied if IAM is not configured to prevent auth bypass * s3/iceberg: optimize context update in Auth middleware * s3api: export CanDo for external authorization use * s3/iceberg: enforce identity-based authorization in all API handlers * s3api: fix compilation errors by updating internal CanDo references * s3/iceberg: robust identity validation and consistent action usage in handlers * s3api: complete CanDo rename across tests and policy engine integration * s3api: fix integration tests by allowing admin access when auth is disabled and explicit gRPC ports * duckdb * create test bucket	2026-02-03 11:55:12 -08:00
Chris Lu	f1e27b8f30	s3: change s3 tables to use RESTful API (#8169 ) * s3: refactor s3 tables to use RESTful API * test/s3tables: guard empty namespaces * s3api: document tag parsing and validate get-table * s3api: limit S3Tables REST body size * Update weed/s3api/s3api_tables.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/s3api/s3tables/handler.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * s3api: accept encoded table bucket ARNs * s3api: validate namespaces and close body * s3api: match encoded table bucket ARNs * s3api: scope table bucket ARN routes * s3api: dedupe table bucket request builders * test/s3tables: allow list tables without namespace * s3api: validate table params and tag ARN * s3api: tighten tag handling and get-table params * s3api: loosen tag ARN route matching * Fix S3 Tables REST routing and tests * Adjust S3 Tables request parsing * Gate S3 Tables target routing * Avoid double decoding namespaces --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-30 10:37:34 -08:00
Chris Lu	6a12438351	s3api: register S3 Tables routes in API server - Add S3 Tables route registration in s3api_server.go registerRouter method - Enable S3 Tables API operations to be routed through S3 API server - Routes handled by s3api_tables.go integration layer - Minimal changes to existing S3 API structure	2026-01-28 00:55:39 -08:00
Chris Lu	551a31e156	Implement IAM propagation to S3 servers (#8130 ) * Implement IAM propagation to S3 servers - Add PropagatingCredentialStore to propagate IAM changes to S3 servers via gRPC - Add Policy management RPCs to S3 proto and S3ApiServer - Update CredentialManager to use PropagatingCredentialStore when MasterClient is available - Wire FilerServer to enable propagation * Implement parallel IAM propagation and fix S3 cluster registration - Parallelized IAM change propagation with 10s timeout. - Refined context usage in PropagatingCredentialStore. - Added S3Type support to cluster node management. - Enabled S3 servers to register with gRPC address to the master. - Ensured IAM configuration reload after policy updates via gRPC. * Optimize IAM propagation with direct in-memory cache updates * Secure IAM propagation: Use metadata to skip persistence only on propagation * pb: refactor IAM and S3 services for unidirectional IAM propagation - Move SeaweedS3IamCache service from iam.proto to s3.proto. - Remove legacy IAM management RPCs and empty SeaweedS3 service from s3.proto. - Enforce that S3 servers only use the synchronization interface. * pb: regenerate Go code for IAM and S3 services Updated generated code following the proto refactoring of IAM synchronization services. * s3api: implement read-only mode for Embedded IAM API - Add readOnly flag to EmbeddedIamApi to reject write operations via HTTP. - Enable read-only mode by default in S3ApiServer. - Handle AccessDenied error in writeIamErrorResponse. - Embed SeaweedS3IamCacheServer in S3ApiServer. * credential: refactor PropagatingCredentialStore for unidirectional IAM flow - Update to use s3_pb.SeaweedS3IamCacheClient for propagation to S3 servers. - Propagate full Identity object via PutIdentity for consistency. - Remove redundant propagation of specific user/account/policy management RPCs. - Add timeout context for propagation calls. * s3api: implement SeaweedS3IamCacheServer for unidirectional sync - Update S3ApiServer to implement the cache synchronization gRPC interface. - Methods (PutIdentity, RemoveIdentity, etc.) now perform direct in-memory cache updates. - Register SeaweedS3IamCacheServer in command/s3.go. - Remove registration for the legacy and now empty SeaweedS3 service. * s3api: update tests for read-only IAM and propagation - Added TestEmbeddedIamReadOnly to verify rejection of write operations in read-only mode. - Update test setup to pass readOnly=false to NewEmbeddedIamApi in routing tests. - Updated EmbeddedIamApiForTest helper with read-only checks matching production behavior. * s3api: add back temporary debug logs for IAM updates Log IAM updates received via: - gRPC propagation (PutIdentity, PutPolicy, etc.) - Metadata configuration reloads (LoadS3ApiConfigurationFromCredentialManager) - Core identity management (UpsertIdentity, RemoveIdentity) * IAM: finalize propagation fix with reduced logging and clarified architecture * Allow configuring IAM read-only mode for S3 server integration tests * s3api: add defensive validation to UpsertIdentity * s3api: fix log message to reference correct IAM read-only flag * test/s3/iam: ensure WaitForS3Service checks for IAM write permissions * test: enable writable IAM in Makefile for integration tests * IAM: add GetPolicy/ListPolicies RPCs to s3.proto * S3: add GetBucketPolicy and ListBucketPolicies helpers * S3: support storing generic IAM policies in IdentityAccessManagement * S3: implement IAM policy RPCs using IdentityAccessManagement * IAM: fix stale user identity on rename propagation	2026-01-26 22:59:43 -08:00
Chris Lu	6394e2f6a5	Fix IAM OIDC role mapping and OIDC claims in trust policy (#8104 ) * Fix IAM OIDC role mapping and OIDC claims in trust policy * Address PR review: Add config safety checks and refactor tests	2026-01-23 21:35:26 -08:00
Chris Lu	f6318edbc9	Refactor Admin UI to use unified IAM storage and add MultipleFileStore (#8101 ) * Refactor Admin UI to use unified IAM storage and add MultipleFileStore * Address PR feedback: fix renames, error handling, and sync logic in FilerMultipleStore * Address refined PR feedback: safe rename order, rollback logic, and structural sync refinement * Optimize LoadConfiguration: use streaming callback for memory efficiency * Refactor UpdateUser: log rollback failures during rename * Implement PolicyManager for FilerMultipleStore * include the filer_multiple backend configuration * Implement cross-S3 synchronization and proper shutdown for all IAM backends * Extract Admin UI refactoring to a separate PR	2026-01-23 20:12:59 -08:00
Chris Lu	d664ca5ed3	fix: IAM authentication with AWS Signature V4 and environment credentials (#8099 ) * fix: IAM authentication with AWS Signature V4 and environment credentials Three key fixes for authenticated IAM requests to work: 1. Fix request body consumption before signature verification - iamMatcher was calling r.ParseForm() which consumed POST body - This broke AWS Signature V4 verification on subsequent reads - Now only check query string in matcher, preserving body for verification - File: weed/s3api/s3api_server.go 2. Preserve environment variable credentials across config reloads - After IAM mutations, config reload overwrote env var credentials - Extract env var loading into loadEnvironmentVariableCredentials() - Call after every config reload to persist credentials - File: weed/s3api/auth_credentials.go 3. Add authenticated IAM tests and test infrastructure - New TestIAMAuthenticated suite with AWS SDK + Signature V4 - Dynamic port allocation for independent test execution - Flag reset to prevent state leakage between tests - CI workflow to run S3 and IAM tests separately - Files: test/s3/example/, .github/workflows/s3-example-integration-tests.yml All tests pass: - TestIAMCreateUser (unauthenticated) - TestIAMAuthenticated (with AWS Signature V4) - S3 integration tests fmt * chore: rename test/s3/example to test/s3/normal * simplify: CI runs all integration tests in single job * Update s3-example-integration-tests.yml * ci: run each test group separately to avoid raft registry conflicts	2026-01-23 16:27:42 -08:00
Chris Lu	ee3813787e	feat(s3api): Implement S3 Policy Variables (#8039 ) * feat: Add AWS IAM Policy Variables support to S3 API Implements policy variables for dynamic access control in bucket policies. Supported variables: - aws:username - Extracted from principal ARN - aws:userid - User identifier (same as username in SeaweedFS) - aws:principaltype - IAMUser, IAMRole, or AssumedRole - jwt:* - Any JWT claim (e.g., jwt:preferred_username, jwt:sub) Key changes: - Added PolicyVariableRegex to detect ${...} patterns - Extended CompiledStatement with DynamicResourcePatterns, DynamicPrincipalPatterns, DynamicActionPatterns - Added Claims field to PolicyEvaluationArgs for JWT claim access - Implemented SubstituteVariables() for variable replacement from context and JWT claims - Implemented extractPrincipalVariables() for ARN parsing - Updated EvaluateConditions() to support variable substitution - Comprehensive unit and integration tests Resolves #8037 * feat: Add LDAP and PrincipalAccount variable support Completes future enhancements for policy variables: - Added ldap:* variable support for LDAP claims - ldap:username - LDAP username from claims - ldap:dn - LDAP distinguished name from claims - ldap:* - Any LDAP claim - Added aws:PrincipalAccount extraction from ARN - Extracts account ID from principal ARN - Available as ${aws:PrincipalAccount} in policies Updated SubstituteVariables() to check LDAP claims Updated extractPrincipalVariables() to extract account ID Added comprehensive tests for new variables * feat(s3api): implement IAM policy variables core logic and optimization * feat(s3api): integrate policy variables with S3 authentication and handlers * test(s3api): add integration tests for policy variables * cleanup: remove unused policy conversion files * Add S3 policy variables integration tests and path support - Add comprehensive integration tests for policy variables - Test username isolation, JWT claims, LDAP claims - Add support for IAM paths in principal ARN parsing - Add tests for principals with paths * Fix IAM Role principal variable extraction IAM Roles should not have aws:userid or aws:PrincipalAccount according to AWS behavior. Only IAM Users and Assumed Roles should have these variables. Fixes TestExtractPrincipalVariables test failures. * Security fixes and bug fixes for S3 policy variables SECURITY FIXES: - Prevent X-SeaweedFS-Principal header spoofing by clearing internal headers at start of authentication (auth_credentials.go) - Restrict policy variable substitution to safe allowlist to prevent client header injection (iam/policy/policy_engine.go) - Add core policy validation before storing bucket policies BUG FIXES: - Remove unused sid variable in evaluateStatement - Fix LDAP claim lookup to check both prefixed and unprefixed keys - Add ValidatePolicy call in PutBucketPolicyHandler These fixes prevent privilege escalation via header injection and ensure only validated identity claims are used in policy evaluation. * Additional security fixes and code cleanup SECURITY FIXES: - Fixed X-Forwarded-For spoofing by only trusting proxy headers from private/localhost IPs (s3_iam_middleware.go) - Changed context key from "sourceIP" to "aws:SourceIp" for proper policy variable substitution CODE IMPROVEMENTS: - Kept aws:PrincipalAccount for IAM Roles to support condition evaluations - Removed redundant STS principaltype override - Removed unused service variable - Cleaned up commented-out debug logging statements - Updated tests to reflect new IAM Role behavior These changes prevent IP spoofing attacks and ensure policy variables work correctly with the safe allowlist. * Add security documentation for ParseJWTToken Added comprehensive security comments explaining that ParseJWTToken is safe despite parsing without verification because: - It's only used for routing to the correct verification method - All code paths perform cryptographic verification before trusting claims - OIDC tokens: validated via validateExternalOIDCToken - STS tokens: validated via ValidateSessionToken Enhanced function documentation with clear security warnings about proper usage to prevent future misuse. * Fix IP condition evaluation to use aws:SourceIp key Fixed evaluateIPCondition in IAM policy engine to use "aws:SourceIp" instead of "sourceIP" to match the updated extractRequestContext. This fixes the failing IP-restricted role test where IP-based policy conditions were not being evaluated correctly. Updated all test cases to use the correct "aws:SourceIp" key. * Address code review feedback: optimize and clarify PERFORMANCE IMPROVEMENT: - Optimized expandPolicyVariables to use regexp.ReplaceAllStringFunc for single-pass variable substitution instead of iterating through all safe variables. This improves performance from O(nm) to O(m) where n is the number of safe variables and m is the pattern length. CODE CLARITY: - Added detailed comment explaining LDAP claim fallback mechanism (checks both prefixed and unprefixed keys for compatibility) - Enhanced TODO comment for trusted proxy configuration with rationale and recommendations for supporting cloud load balancers, CDNs, and complex network topologies All tests passing. Address Copilot code review feedback BUG FIXES: - Fixed type switch for int/int32/int64 - separated into individual cases since interface type switches only match the first type in multi-type cases - Fixed grammatically incorrect error message in types.go CODE QUALITY: - Removed duplicate Resource/NotResource validation (already in ValidateStatement) - Added comprehensive comment explaining isEnabled() logic and security implications - Improved trusted proxy NOTE comment to be more concise while noting limitations All tests passing. * Fix test failures after extractSourceIP security changes Updated tests to work with the security fix that only trusts X-Forwarded-For/X-Real-IP headers from private IP addresses: - Set RemoteAddr to 127.0.0.1 in tests to simulate trusted proxy - Changed context key from "sourceIP" to "aws:SourceIp" - Added test case for untrusted proxy (public RemoteAddr) - Removed invalid ValidateStatement call (validation happens in ValidatePolicy) All tests now passing. * Address remaining Gemini code review feedback CODE SAFETY: - Deep clone Action field in CompileStatement to prevent potential data races if the original policy document is modified after compilation TEST CLEANUP: - Remove debug logging (fmt.Fprintf) from engine_notresource_test.go - Remove unused imports in engine_notresource_test.go All tests passing. * Fix insecure JWT parsing in IAM auth flow SECURITY FIX: - Renamed ParseJWTToken to ParseUnverifiedJWTToken with explicit security warnings. - Refactored AuthenticateJWT to use the trusted SessionInfo returned by ValidateSessionToken instead of relying on unverified claims from the initial parse. - Refactored ValidatePresignedURLWithIAM to reuse the robust AuthenticateJWT logic, removing duplicated and insecure manual token parsing. This ensures all identity information (Role, Principal, Subject) used for authorization decisions is derived solely from cryptographically verified tokens. * Security: Fix insecure JWT claim extraction in policy engine - Refactored EvaluatePolicy to accept trusted claims from verified Identity instead of parsing unverified tokens - Updated AuthenticateJWT to populate Claims in IAMIdentity from verified sources (SessionInfo/ExternalIdentity) - Updated s3api_server and handlers to pass claims correctly - Improved isPrivateIP to support IPv6 loopback, link-local, and ULA - Fixed flaky distributed_session_consistency test with retry logic * fix(iam): populate Subject in STSSessionInfo to ensure correct identity propagation This fixes the TestS3IAMAuthentication/valid_jwt_token_authentication failure by ensuring the session subject (sub) is correctly mapped to the internal SessionInfo struct, allowing bucket ownership validation to succeed. * Optimized isPrivateIP * Create s3-policy-tests.yml * fix tests * fix tests * tests(s3/iam): simplify policy to resource-based \ (step 1) * tests(s3/iam): add explicit Deny NotResource for isolation (step 2) * fixes * policy: skip resource matching for STS trust policies to allow AssumeRole evaluation * refactor: remove debug logging and hoist policy variables for performance * test: fix TestS3IAMBucketPolicyIntegration cleanup to handle per-subtest object lifecycle * test: fix bucket name generation to comply with S3 63-char limit * test: skip TestS3IAMPolicyEnforcement until role setup is implemented * test: use weed mini for simpler test server deployment Replace 'weed server' with 'weed mini' for IAM tests to avoid port binding issues and simplify the all-in-one server deployment. This improves test reliability and execution time. * security: prevent allocation overflow in policy evaluation Add maxPoliciesForEvaluation constant to cap the number of policies evaluated in a single request. This prevents potential integer overflow when allocating slices for policy lists that may be influenced by untrusted input. Changes: - Add const maxPoliciesForEvaluation = 1024 to set an upper bound - Validate len(policies) < maxPoliciesForEvaluation before appending bucket policy - Use append() instead of make([]string, len+1) to avoid arithmetic overflow - Apply fix to both IsActionAllowed policy evaluation paths	2026-01-16 11:12:28 -08:00
Feng Shao	8abcdc6d00	use "s" flag of regexp to let . match \n (#8024 ) * use "s" flag of regexp to let . match \n the partten "/{object:.+}" cause the mux failed to match URI of object with new line char, and the request fall thru into bucket handlers. * refactor --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-01-14 13:49:49 -08:00
Chris Lu	06391701ed	Add AssumeRole and AssumeRoleWithLDAPIdentity STS actions (#8003 ) * test: add integration tests for AssumeRole and AssumeRoleWithLDAPIdentity STS actions - Add s3_sts_assume_role_test.go with comprehensive tests for AssumeRole: * Parameter validation (missing RoleArn, RoleSessionName, invalid duration) * AWS SigV4 authentication with valid/invalid credentials * Temporary credential generation and usage - Add s3_sts_ldap_test.go with tests for AssumeRoleWithLDAPIdentity: * Parameter validation (missing LDAP credentials, RoleArn) * LDAP authentication scenarios (valid/invalid credentials) * Integration with LDAP server (when configured) - Update Makefile with new test targets: * test-sts: run all STS tests * test-sts-assume-role: run AssumeRole tests only * test-sts-ldap: run LDAP STS tests only * test-sts-suite: run tests with full service lifecycle - Enhance setup_all_tests.sh: * Add OpenLDAP container setup for LDAP testing * Create test LDAP users (testuser, ldapadmin) * Set LDAP environment variables for tests * Update cleanup to remove LDAP container - Fix setup_keycloak.sh: * Enable verbose error logging for realm creation * Improve error diagnostics Tests use fail-fast approach (t.Fatal) when server not configured, ensuring clear feedback when infrastructure is missing. * feat: implement AssumeRole and AssumeRoleWithLDAPIdentity STS actions Implement two new STS actions to match MinIO's STS feature set: AssumeRole Implementation: - Add handleAssumeRole with full AWS SigV4 authentication - Integrate with existing IAM infrastructure via verifyV4Signature - Validate required parameters (RoleArn, RoleSessionName) - Validate DurationSeconds (900-43200 seconds range) - Generate temporary credentials with expiration - Return AWS-compatible XML response AssumeRoleWithLDAPIdentity Implementation: - Add handleAssumeRoleWithLDAPIdentity handler (stub) - Validate LDAP-specific parameters (LDAPUsername, LDAPPassword) - Validate common STS parameters (RoleArn, RoleSessionName, DurationSeconds) - Return proper error messages for missing LDAP provider - Ready for LDAP provider integration Routing Fixes: - Add explicit routes for AssumeRole and AssumeRoleWithLDAPIdentity - Prevent IAM handler from intercepting authenticated STS requests - Ensure proper request routing priority Handler Infrastructure: - Add IAM field to STSHandlers for SigV4 verification - Update NewSTSHandlers to accept IAM reference - Add STS-specific error codes and response types - Implement writeSTSErrorResponse for AWS-compatible errors The AssumeRole action is fully functional and tested. AssumeRoleWithLDAPIdentity requires LDAP provider implementation. * fix: update IAM matcher to exclude STS actions from interception Update the IAM handler matcher to check for STS actions (AssumeRole, AssumeRoleWithWebIdentity, AssumeRoleWithLDAPIdentity) and exclude them from IAM handler processing. This allows STS requests to be handled by the STS fallback handler even when they include AWS SigV4 authentication. The matcher now parses the form data to check the Action parameter and returns false for STS actions, ensuring they are routed to the correct handler. Note: This is a work-in-progress fix. Tests are still showing some routing issues that need further investigation. * fix: address PR review security issues for STS handlers This commit addresses all critical security issues from PR review: Security Fixes: - Use crypto/rand for cryptographically secure credential generation instead of time.Now().UnixNano() (fixes predictable credentials) - Add sts:AssumeRole permission check via VerifyActionPermission to prevent unauthorized role assumption - Generate proper session tokens using crypto/rand instead of placeholder strings Code Quality Improvements: - Refactor DurationSeconds parsing into reusable parseDurationSeconds() helper function used by all three STS handlers - Create generateSecureCredentials() helper for consistent and secure temporary credential generation - Fix iamMatcher to check query string as fallback when Action not found in form data LDAP Provider Implementation: - Add go-ldap/ldap/v3 dependency - Create LDAPProvider implementing IdentityProvider interface with full LDAP authentication support (connect, bind, search, groups) - Update ProviderFactory to create real LDAP providers - Wire LDAP provider into AssumeRoleWithLDAPIdentity handler Test Infrastructure: - Add LDAP user creation verification step in setup_all_tests.sh * fix: address PR feedback (Round 2) - config validation & provider improvements - Implement `validateLDAPConfig` in `ProviderFactory` - Improve `LDAPProvider.Initialize`: - Support `connectionTimeout` parsing (string/int/float) from config map - Warn if `BindDN` is present but `BindPassword` is empty - Improve `LDAPProvider.GetUserInfo`: - Add fallback to `searchUserGroups` if `memberOf` returns no groups (consistent with Authenticate) * fix: address PR feedback (Round 3) - LDAP connection improvements & build fix - Improve `LDAPProvider` connection handling: - Use `net.Dialer` with configured timeout for connection establishment - Enforce TLS 1.2+ (`MinVersion: tls.VersionTLS12`) for both LDAPS and StartTLS - Fix build error in `s3api_sts.go` (format verb for ErrorCode) * fix: address PR feedback (Round 4) - LDAP hardening, Authz check & Routing fix - LDAP Provider Hardening: - Prevent re-initialization - Enforce single user match in `GetUserInfo` (was explicit only in Authenticate) - Ensure connection closure if StartTLS fails - STS Handlers: - Add robust provider detection using type assertion - Security: Implement authorization check (`VerifyActionPermission`) after LDAP authentication - Routing: - Update tests to reflect that STS actions are handled by STS handler, not generic IAM * fix: address PR feedback (Round 5) - JWT tokens, ARN formatting, PrincipalArn CRITICAL FIXES: - Replace standalone credential generation with STS service JWT tokens - handleAssumeRole now generates proper JWT session tokens - handleAssumeRoleWithLDAPIdentity now generates proper JWT session tokens - Session tokens can be validated across distributed instances - Fix ARN formatting in responses - Extract role name from ARN using utils.ExtractRoleNameFromArn() - Prevents malformed ARNs like "arn:aws:sts::assumed-role/arn:aws:iam::..." - Add configurable AccountId for federated users - Add AccountId field to STSConfig (defaults to "111122223333") - PrincipalArn now uses configured account ID instead of hardcoded "aws" - Enables proper trust policy validation IMPROVEMENTS: - Sanitize LDAP authentication error messages (don't leak internal details) - Remove duplicate comment in provider detection - Add utils import for ARN parsing utilities * feat: implement LDAP connection pooling to prevent resource exhaustion PERFORMANCE IMPROVEMENT: - Add connection pool to LDAPProvider (default size: 10 connections) - Reuse LDAP connections across authentication requests - Prevent file descriptor exhaustion under high load IMPLEMENTATION: - connectionPool struct with channel-based connection management - getConnection(): retrieves from pool or creates new connection - returnConnection(): returns healthy connections to pool - createConnection(): establishes new LDAP connection with TLS support - Close(): cleanup method to close all pooled connections - Connection health checking (IsClosing()) before reuse BENEFITS: - Reduced connection overhead (no TCP handshake per request) - Better resource utilization under load - Prevents "too many open files" errors - Non-blocking pool operations (creates new conn if pool empty) * fix: correct TokenGenerator access in STS handlers CRITICAL FIX: - Make TokenGenerator public in STSService (was private tokenGenerator) - Update all references from Config.TokenGenerator to TokenGenerator - Remove TokenGenerator from STSConfig (it belongs in STSService) This fixes the "NotImplemented" errors in distributed and Keycloak tests. The issue was that Round 5 changes tried to access Config.TokenGenerator which didn't exist - TokenGenerator is a field in STSService, not STSConfig. The TokenGenerator is properly initialized in STSService.Initialize() and is now accessible for JWT token generation in AssumeRole handlers. * fix: update tests to use public TokenGenerator field Following the change to make TokenGenerator public in STSService, this commit updates the test files to reference the correct public field name. This resolves compilation errors in the IAM STS test suite. * fix: update distributed tests to use valid Keycloak users Updated s3_iam_distributed_test.go to use 'admin-user' and 'read-user' which exist in the standard Keycloak setup provided by setup_keycloak.sh. This resolves 'unknown test user' errors in distributed integration tests. * fix: ensure iam_config.json exists in setup target for CI The GitHub Actions workflow calls 'make setup' which was not creating iam_config.json, causing the server to start without IAM integration enabled (iamIntegration = nil), resulting in NotImplemented errors. Now 'make setup' copies iam_config.local.json to iam_config.json if it doesn't exist, ensuring IAM is properly configured in CI. * fix(iam/ldap): fix connection pool race and rebind corruption - Add atomic 'closed' flag to connection pool to prevent racing on Close() - Rebind authenticated user connections back to service account before returning to pool - Close connections on error instead of returning potentially corrupted state to pool * fix(iam/ldap): populate standard TokenClaims fields in ValidateToken - Set Subject, Issuer, Audience, IssuedAt, and ExpiresAt to satisfy the interface - Use time.Time for timestamps as required by TokenClaims struct - Default to 1 hour TTL for LDAP tokens * fix(s3api): include account ID in STS AssumedRoleUser ARN - Consistent with AWS, include the account ID in the assumed-role ARN - Use the configured account ID from STS service if available, otherwise default to '111122223333' - Apply to both AssumeRole and AssumeRoleWithLDAPIdentity handlers - Also update .gitignore to ignore IAM test environment files * refactor(s3api): extract shared STS credential generation logic - Move common logic for session claims and credential generation to prepareSTSCredentials - Update handleAssumeRole and handleAssumeRoleWithLDAPIdentity to use the helper - Remove stale comments referencing outdated line numbers * feat(iam/ldap): make pool size configurable and add audience support - Add PoolSize to LDAPConfig (default 10) - Add Audience to LDAPConfig to align with OIDC validation - Update initialization and ValidateToken to use new fields * update tests * debug * chore(iam): cleanup debug prints and fix test config port * refactor(iam): use mapstructure for LDAP config parsing * feat(sts): implement strict trust policy validation for AssumeRole * test(iam): refactor STS tests to use AWS SDK signer * test(s3api): implement ValidateTrustPolicyForPrincipal in MockIAMIntegration * fix(s3api): ensure IAM matcher checks query string on ParseForm error * fix(sts): use crypto/rand for secure credentials and extract constants * fix(iam): fix ldap connection leaks and add insecure warning * chore(iam): improved error wrapping and test parameterization * feat(sts): add support for LDAPProviderName parameter * Update weed/iam/ldap/ldap_provider.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/s3api/s3api_sts.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(sts): use STSErrSTSNotReady when LDAP provider is missing * fix(sts): encapsulate TokenGenerator in STSService and add getter --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-12 10:45:24 -08:00
Chris Lu	e3db95e0c1	Fix: Route unauthenticated specific STS requests to STS handler correctly (#7920 ) * Fix STS Access Denied for AssumeRoleWithWebIdentity (Issue #7917) * Fix logging regression: ensure IAM status is logged even if STS is enabled * Address PR feedback: fix duplicate log, clarify comments, add comprehensive routing tests * Add edge case test: authenticated STS action routes to IAM (auth precedence)	2025-12-30 16:34:05 -08:00
Chris Lu	7a18c3a16f	Fix critical authentication bypass vulnerability (#7912 ) (#7915 ) * Fix critical authentication bypass vulnerability (#7912) The isRequestPostPolicySignatureV4() function was incorrectly returning true for ANY POST request with multipart/form-data content type, causing all such requests to bypass authentication in authRequest(). This allowed unauthenticated access to S3 API endpoints, as reported in issue #7912 where any credentials (or no credentials) were accepted. The fix removes isRequestPostPolicySignatureV4() entirely, preventing authTypePostPolicy from ever being set. PostPolicy signature verification is still properly handled in PostPolicyBucketHandler via doesPolicySignatureMatch(). Fixes #7912 * add AuthPostPolicy * refactor * Optimizing Auth Credentials * Update auth_credentials.go * Update auth_credentials.go	2025-12-30 12:40:59 -08:00
Chris Lu	ae9a943ef6	IAM: Add Service Account Support (#7744 ) (#7901 ) * iam: add ServiceAccount protobuf schema Add ServiceAccount message type to iam.proto with support for: - Unique ID and parent user linkage - Optional expiration timestamp - Separate credentials (access key/secret) - Action restrictions (subset of parent) - Enable/disable status This is the first step toward implementing issue #7744 (IAM Service Account Support). * iam: add service account response types Add IAM API response types for service account operations: - ServiceAccountInfo struct for marshaling account details - CreateServiceAccountResponse - DeleteServiceAccountResponse - ListServiceAccountsResponse - GetServiceAccountResponse - UpdateServiceAccountResponse Also add type aliases in iamapi package for backwards compatibility. Part of issue #7744 (IAM Service Account Support). * iam: implement service account API handlers Add CRUD operations for service accounts: - CreateServiceAccount: Creates service account with ABIA key prefix - DeleteServiceAccount: Removes service account and parent linkage - ListServiceAccounts: Lists all or filtered by parent user - GetServiceAccount: Retrieves service account details - UpdateServiceAccount: Modifies status, description, expiration Service accounts inherit parent user's actions by default and support optional expiration timestamps. Part of issue #7744 (IAM Service Account Support). * sts: add AssumeRoleWithWebIdentity HTTP endpoint Add STS API HTTP endpoint for AWS SDK compatibility: - Create s3api_sts.go with HTTP handlers matching AWS STS spec - Support AssumeRoleWithWebIdentity action with JWT token - Return XML response with temporary credentials (AccessKeyId, SecretAccessKey, SessionToken) matching AWS format - Register STS route at POST /?Action=AssumeRoleWithWebIdentity This enables AWS SDKs (boto3, AWS CLI, etc.) to obtain temporary S3 credentials using OIDC/JWT tokens. Part of issue #7744 (IAM Service Account Support). * test: add service account and STS integration tests Add integration tests for new IAM features: s3_service_account_test.go: - TestServiceAccountLifecycle: Create, Get, List, Update, Delete - TestServiceAccountValidation: Error handling for missing params s3_sts_test.go: - TestAssumeRoleWithWebIdentityValidation: Parameter validation - TestAssumeRoleWithWebIdentityWithMockJWT: JWT token handling Tests skip gracefully when SeaweedFS is not running or when IAM features are not configured. Part of issue #7744 (IAM Service Account Support). * iam: address code review comments - Add constants for service account ID and key lengths - Use strconv.ParseInt instead of fmt.Sscanf for better error handling - Allow clearing descriptions by checking key existence in url.Values - Replace magic numbers (12, 20, 40) with named constants Addresses review comments from gemini-code-assist[bot] * test: add proper error handling in service account tests Use require.NoError(t, err) for io.ReadAll and xml.Unmarshal to prevent silent failures and ensure test reliability. Addresses review comment from gemini-code-assist[bot] * test: add proper error handling in STS tests Use require.NoError(t, err) for io.ReadAll and xml.Unmarshal to prevent silent failures and ensure test reliability. Repeated this fix throughout the file. Addresses review comment from gemini-code-assist[bot] in PR #7901. * iam: address additional code review comments - Specific error code mapping for STS service errors - Distinguish between Sender and Receiver error types in STS responses - Add nil checks for credentials in List/GetServiceAccount - Validate expiration date is in the future - Improve integration test error messages (include response body) - Add credential verification step in service account tests Addresses remaining review comments from gemini-code-assist[bot] across multiple files. * iam: fix shared slice reference in service account creation Copy parent's actions to create an independent slice for the service account instead of sharing the underlying array. This prevents unexpected mutations when the parent's actions are modified later. Addresses review comment from coderabbitai[bot] in PR #7901. * iam: remove duplicate unused constant Removed redundant iamServiceAccountKeyPrefix as ServiceAccountKeyPrefix is already defined and used. Addresses remaining cleanup task. * sts: document limitation of string-based error mapping Added TODO comment explaining that the current string-based error mapping approach is fragile and should be replaced with typed errors from the STS service in a future refactoring. This addresses the architectural concern raised in code review while deferring the actual implementation to a separate PR to avoid scope creep in the current service account feature addition. * iam: fix remaining review issues - Add future-date validation for expiration in UpdateServiceAccount - Reorder tests so credential verification happens before deletion - Fix compilation error by using correct JWT generation methods Addresses final review comments from coderabbitai[bot]. * iam: fix service account access key length The access key IDs were incorrectly generated with 24 characters instead of the AWS-standard 20 characters. This was caused by generating 20 random characters and then prepending the 4-character ABIA prefix. Fixed by subtracting the prefix length from AccessKeyLength, so the final key is: ABIA (4 chars) + random (16 chars) = 20 chars total. This ensures compatibility with S3 clients that validate key length. * test: add comprehensive service account security tests Added comprehensive integration tests for service account functionality: - TestServiceAccountS3Access: Verify SA credentials work for S3 operations - TestServiceAccountExpiration: Test expiration date validation and enforcement - TestServiceAccountInheritedPermissions: Verify parent-child relationship - TestServiceAccountAccessKeyFormat: Validate AWS-compatible key format (ABIA prefix, 20 char length) These tests ensure SeaweedFS service accounts are compatible with AWS conventions and provide robust security coverage. * iam: remove unused UserAccessKeyPrefix constant Code cleanup to remove unused constants. * iam: remove unused iamCommonResponse type alias Code cleanup to remove unused type aliases. * iam: restore and use UserAccessKeyPrefix constant Restored UserAccessKeyPrefix constant and updated s3api tests to use it instead of hardcoded strings for better maintainability and consistency. * test: improve error handling in service account security tests Added explicit error checking for io.ReadAll and xml.Unmarshal in TestServiceAccountExpiration to ensure failures are reported correctly and cleanup is performed only when appropriate. Also added logging for failed responses. * test: use t.Cleanup for reliable resource cleanup Replaced defer with t.Cleanup to ensure service account cleanup runs even when require.NoError fails. Also switched from manual error checking to require.NoError for more idiomatic testify usage. * iam: add CreatedBy field and optimize identity lookups - Added createdBy parameter to CreateServiceAccount to track who created each service account - Extract creator identity from request context using GetIdentityNameFromContext - Populate created_by field in ServiceAccount protobuf - Added findIdentityByName helper function to optimize identity lookups - Replaced nested loops with O(n) helper function calls in CreateServiceAccount and DeleteServiceAccount This addresses code review feedback for better auditing and performance. * iam: prevent user deletion when service accounts exist Following AWS IAM behavior, prevent deletion of users that have active service accounts. This ensures explicit cleanup and prevents orphaned service account resources with invalid ParentUser references. Users must delete all associated service accounts before deleting the parent user, providing safer resource management. * sts: enhance TODO with typed error implementation guidance Updated TODO comment with detailed implementation approach for replacing string-based error matching with typed errors using errors.Is(). This provides a clear roadmap for a follow-up PR to improve error handling robustness and maintainability. * iam: add operational limits for service account creation Added AWS IAM-compatible safeguards to prevent resource exhaustion: - Maximum 100 service accounts per user (LimitExceededException) - Maximum 1000 character description length (InvalidInputException) These limits prevent accidental or malicious resource exhaustion while not impacting legitimate use cases. * iam: add missing operational limit constants Added MaxServiceAccountsPerUser and MaxDescriptionLength constants that were referenced in the previous commit but not defined. * iam: enforce service account expiration during authentication CRITICAL SECURITY FIX: Expired service account credentials were not being rejected during authentication, allowing continued access after expiration. Changes: - Added Expiration field to Credential struct - Populate expiration when loading service accounts from configuration - Check expiration in all authentication paths (V2 and V4 signatures) - Return ErrExpiredToken for expired credentials This ensures expired service accounts are properly rejected at authentication time, matching AWS IAM behavior and preventing unauthorized access. * iam: fix error code for expired service account credentials Use ErrAccessDenied instead of non-existent ErrExpiredToken for expired service account credentials. This provides appropriate access denial for expired credentials while maintaining AWS-compatible error responses. * iam: fix remaining ErrExpiredToken references Replace all remaining instances of non-existent ErrExpiredToken with ErrAccessDenied for expired service account credentials. * iam: apply AWS-standard key format to user access keys Updated CreateAccessKey to generate AWS-standard 20-character access keys with AKIA prefix for regular users, matching the format used for service accounts. This ensures consistency across all access key types and full AWS compatibility. - Access keys: AKIA + 16 random chars = 20 total (was 21 chars, no prefix) - Secret keys: 40 random chars (was 42, now matches AWS standard) - Uses AccessKeyLength and UserAccessKeyPrefix constants * sts: replace fragile string-based error matching with typed errors Implemented robust error handling using typed errors and errors.Is() instead of fragile strings.Contains() matching. This decouples the HTTP layer from service implementation details and prevents errors from being miscategorized if error messages change. Changes: - Added typed error variables to weed/iam/sts/constants.go: * ErrTypedTokenExpired * ErrTypedInvalidToken * ErrTypedInvalidIssuer * ErrTypedInvalidAudience * ErrTypedMissingClaims - Updated STS service to wrap provider authentication errors with typed errors - Replaced strings.Contains() with errors.Is() in HTTP layer for error checking - Removed TODO comment as the improvement is now implemented This makes error handling more maintainable and reliable. * sts: eliminate all string-based error matching with provider-level typed errors Completed the typed error implementation by adding provider-level typed errors and updating provider implementations to return them. This eliminates ALL fragile string matching throughout the entire error handling stack. Changes: - Added typed error definitions to weed/iam/providers/errors.go: * ErrProviderTokenExpired * ErrProviderInvalidToken * ErrProviderInvalidIssuer * ErrProviderInvalidAudience * ErrProviderMissingClaims - Updated OIDC provider to wrap JWT validation errors with typed provider errors - Replaced strings.Contains() with errors.Is() in STS service for error mapping - Complete error chain: Provider -> STS -> HTTP layer, all using errors.Is() This provides: - Reliable error classification independent of error message content - Type-safe error checking throughout the stack - No order-dependent string matching - Maintainable error handling that won't break with message changes * oidc: use jwt.ErrTokenExpired instead of string matching Replaced the last remaining string-based error check with the JWT library's exported typed error. This makes the error detection independent of error message content and more robust against library updates. Changed from: strings.Contains(errMsg, "expired") To: errors.Is(err, jwt.ErrTokenExpired) This completes the elimination of ALL string-based error matching throughout the entire authentication stack. * iam: add description length validation to UpdateServiceAccount Fixed inconsistency where UpdateServiceAccount didn't validate description length against MaxDescriptionLength, allowing operational limits to be bypassed during updates. Now validates that updated descriptions don't exceed 1000 characters, matching the validation in CreateServiceAccount. * iam: refactor expiration check into helper method Extracted duplicated credential expiration check logic into a helper method to reduce code duplication and improve maintainability. Added Credential.isCredentialExpired() method and replaced 5 instances of inline expiration checks across auth_signature_v2.go and auth_signature_v4.go. * iam: address critical Copilot security and consistency feedback Fixed three critical issues identified by Copilot code review: 1. SECURITY: Prevent loading disabled service account credentials - Added check to skip disabled service accounts during credential loading - Disabled accounts can no longer authenticate 2. Add DurationSeconds validation for STS AssumeRoleWithWebIdentity - Enforce AWS-compatible range: 900-43200 seconds (15 min - 12 hours) - Returns proper error for out-of-range values 3. Fix expiration update consistency in UpdateServiceAccount - Added key existence check like Description field - Allows explicit clearing of expiration by setting to empty string - Distinguishes between "not updating" and "clearing expiration" * sts: remove unused durationSecondsStr variable Fixed build error from unused variable after refactoring duration parsing. * iam: address remaining Copilot feedback and remove dead code Completed remaining Copilot code review items: 1. Remove unused getPermission() method (dead code) - Method was defined but never called anywhere 2. Improve slice modification safety in DeleteServiceAccount - Replaced append-with-slice-operations with filter pattern - Avoids potential issues from mutating slice during iteration 3. Fix route registration order - Moved STS route registration BEFORE IAM route - Prevents IAM route from intercepting STS requests - More specific route (with query parameter) now registered first * iam: improve expiration validation and test cleanup robustness Addressed additional Copilot feedback: 1. Make expiration validation more explicit - Added explicit check for negative values - Added comment clarifying that 0 is allowed to clear expiration - Improves code readability and intent 2. Fix test cleanup order in s3_service_account_test.go - Track created service accounts in a slice - Delete all service accounts before deleting parent user - Prevents DeleteConflictException during cleanup - More robust cleanup even if test fails mid-execution Note: s3_service_account_security_test.go already had correct cleanup order due to LIFO defer execution. * test: remove redundant variable assignments Removed duplicate assignments of createdSAId, createdAccessKeyId, and createdSecretAccessKey on lines 148-150 that were already assigned on lines 132-134.	2025-12-29 20:17:23 -08:00
Chris Lu	8d6bcddf60	Add S3 volume encryption support with -s3.encryptVolumeData flag (#7890 ) * Add S3 volume encryption support with -s3.encryptVolumeData flag This change adds volume-level encryption support for S3 uploads, similar to the existing -filer.encryptVolumeData option. Each chunk is encrypted with its own auto-generated CipherKey when the flag is enabled. Changes: - Add -s3.encryptVolumeData flag to weed s3, weed server, and weed mini - Wire Cipher option through S3ApiServer and ChunkedUploadOption - Add integration tests for multi-chunk range reads with encryption - Tests verify encryption works across chunk boundaries Usage: weed s3 -encryptVolumeData weed server -s3 -s3.encryptVolumeData weed mini -s3.encryptVolumeData Integration tests: go test -v -tags=integration -timeout 5m ./test/s3/sse/... * Add GitHub Actions CI for S3 volume encryption tests - Add test-volume-encryption target to Makefile that starts server with -s3.encryptVolumeData - Add s3-volume-encryption job to GitHub Actions workflow - Tests run with integration build tag and 10m timeout - Server logs uploaded on failure for debugging * Fix S3 client credentials to use environment variables The test was using hardcoded credentials "any"/"any" but the Makefile sets AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY to "some_access_key1"/ "some_secret_key1". Updated getS3Client() to read from environment variables with fallback to "any"/"any" for manual testing. * Change bucket creation errors from skip to fatal Tests should fail, not skip, when bucket creation fails. This ensures that credential mismatches and other configuration issues are caught rather than silently skipped. * Make copy and multipart test jobs fail instead of succeed Changed exit 0 to exit 1 for s3-sse-copy-operations and s3-sse-multipart jobs. These jobs document known limitations but should fail to ensure the issues are tracked and addressed, not silently ignored. * Hardcode S3 credentials to match Makefile Changed from environment variables to hardcoded credentials "some_access_key1"/"some_secret_key1" to match the Makefile configuration. This ensures tests work reliably. * fix Double Encryption * fix Chunk Size Mismatch * Added IsCompressed * is gzipped * fix copying * only perform HEAD request when len(cipherKey) > 0 * Revert "Make copy and multipart test jobs fail instead of succeed" This reverts commit bc34a7eb3c103ae7ab2000da2a6c3925712eb226. * fix security vulnerability * fix security * Update s3api_object_handlers_copy.go * Update s3api_object_handlers_copy.go * jwt to get content length	2025-12-27 00:09:14 -08:00
G-OD	504b258258	s3: fix remote object not caching (#7790 ) * s3: fix remote object not caching * s3: address review comments for remote object caching - Fix leading slash in object name by using strings.TrimPrefix - Return cached entry from CacheRemoteObjectToLocalCluster to get updated local chunk locations - Reuse existing helper function instead of inline gRPC call * s3/filer: add singleflight deduplication for remote object caching - Add singleflight.Group to FilerServer to deduplicate concurrent cache operations - Wrap CacheRemoteObjectToLocalCluster with singleflight to ensure only one caching operation runs per object when multiple clients request the same file - Add early-return check for already-cached objects - S3 API calls filer gRPC with timeout and graceful fallback on error - Clear negative bucket cache when bucket is created via weed shell - Add integration tests for remote cache with singleflight deduplication This benefits all clients (S3, HTTP, Hadoop) accessing remote-mounted objects by preventing redundant cache operations and improving concurrent access performance. Fixes: https://github.com/seaweedfs/seaweedfs/discussions/7599 * fix: data race in concurrent remote object caching - Add mutex to protect chunks slice from concurrent append - Add mutex to protect fetchAndWriteErr from concurrent read/write - Fix incorrect error check (was checking assignResult.Error instead of parseErr) - Rename inner variable to avoid shadowing fetchAndWriteErr * fix: address code review comments - Remove duplicate remote caching block in GetObjectHandler, keep only singleflight version - Add mutex protection for concurrent chunk slice and error access (data race fix) - Use lazy initialization for S3 client in tests to avoid panic during package load - Fix markdown linting: add language specifier to code fence, blank lines around tables - Add 'all' target to Makefile as alias for test-with-server - Remove unused 'util' import * style: remove emojis from test files * fix: add defensive checks and sort chunks by offset - Add nil check and type assertion check for singleflight result - Sort chunks by offset after concurrent fetching to maintain file order * fix: improve test diagnostics and path normalization - runWeedShell now returns error for better test diagnostics - Add all targets to .PHONY in Makefile (logs-primary, logs-remote, health) - Strip leading slash from normalizedObject to avoid double slashes in path --------- Co-authored-by: chrislu <chris.lu@gmail.com> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>	2025-12-16 12:41:04 -08:00
Chris Lu	f5c666052e	feat: add S3 bucket size and object count metrics (#7776 ) * feat: add S3 bucket size and object count metrics Adds periodic collection of bucket size metrics: - SeaweedFS_s3_bucket_size_bytes: logical size (deduplicated across replicas) - SeaweedFS_s3_bucket_physical_size_bytes: physical size (including replicas) - SeaweedFS_s3_bucket_object_count: object count (deduplicated) Collection runs every 1 minute via background goroutine that queries filer Statistics RPC for each bucket's collection. Also adds Grafana dashboard panels for: - S3 Bucket Size (logical vs physical) - S3 Bucket Object Count * address PR comments: fix bucket size metrics collection 1. Fix collectCollectionInfoFromMaster to use master VolumeList API - Now properly queries master for topology info - Uses WithMasterClient to get volume list from master - Correctly calculates logical vs physical size based on replication 2. Return error when filerClient is nil to trigger fallback - Changed from 'return nil, nil' to 'return nil, error' - Ensures fallback to filer stats is properly triggered 3. Implement pagination in listBucketNames - Added listBucketPageSize constant (1000) - Uses StartFromFileName for pagination - Continues fetching until fewer entries than limit returned 4. Handle NewReplicaPlacementFromByte error and prevent division by zero - Check error return from NewReplicaPlacementFromByte - Default to 1 copy if error occurs - Add explicit check for copyCount == 0 * simplify bucket size metrics: remove filer fallback, align with quota enforcement - Remove fallback to filer Statistics RPC - Use only master topology for collection info (same as s3.bucket.quota.enforce) - Updated comments to clarify this runs the same collection logic as quota enforcement - Simplified code by removing collectBucketSizeFromFilerStats * use s3a.option.Masters directly instead of querying filer * address PR comments: fix dashboard overlaps and improve metrics collection Grafana dashboard fixes: - Fix overlapping panels 55 and 59 in grafana_seaweedfs.json (moved 59 to y=30) - Fix grid collision in k8s dashboard (moved panel 72 to y=48) - Aggregate bucket metrics with max() by (bucket) for multi-instance S3 gateways Go code improvements: - Add graceful shutdown support via context cancellation - Use ticker instead of time.Sleep for better shutdown responsiveness - Distinguish EOF from actual errors in stream handling * improve bucket size metrics: multi-master failover and proper error handling - Initial delay now respects context cancellation using select with time.After - Use WithOneOfGrpcMasterClients for multi-master failover instead of hardcoding Masters[0] - Properly propagate stream errors instead of just logging them (EOF vs real errors) * improve bucket size metrics: distributed lock and volume ID deduplication - Add distributed lock (LiveLock) so only one S3 instance collects metrics at a time - Add IsLocked() method to LiveLock for checking lock status - Fix deduplication: use volume ID tracking instead of dividing by copyCount - Previous approach gave wrong results if replicas were missing - Now tracks seen volume IDs and counts each volume only once - Physical size still includes all replicas for accurate disk usage reporting * rename lock to s3.leader * simplify: remove StartBucketSizeMetricsCollection wrapper function * fix data race: use atomic operations for LiveLock.isLocked field - Change isLocked from bool to int32 - Use atomic.LoadInt32/StoreInt32 for all reads/writes - Sync shared isLocked field in StartLongLivedLock goroutine * add nil check for topology info to prevent panic * fix bucket metrics: use Ticker for consistent intervals, fix pagination logic - Use time.Ticker instead of time.After for consistent interval execution - Fix pagination: count all entries (not just directories) for proper termination - Update lastFileName for all entries to prevent pagination issues * address PR comments: remove redundant atomic store, propagate context - Remove redundant atomic.StoreInt32 in StartLongLivedLock (AttemptToLock already sets it) - Propagate context through metrics collection for proper cancellation on shutdown - collectAndUpdateBucketSizeMetrics now accepts ctx - collectCollectionInfoFromMaster uses ctx for VolumeList RPC - listBucketNames uses ctx for ListEntries RPC	2025-12-15 19:23:25 -08:00
Chris Lu	60649460b2	fix: default policy storeType to memory when not specified (#7754 ) When loading IAM config from JSON, if the policy section exists but storeType is not specified, default to 'memory' instead of 'filer'. This ensures that policies defined in JSON config files are properly loaded into memory for test environments and standalone setups that don't rely on the filer for policy persistence.	2025-12-14 21:16:02 -08:00
Chris Lu	f41925b60b	Embed IAM API into S3 server (#7740 ) * Embed IAM API into S3 server This change simplifies the S3 and IAM deployment by embedding the IAM API directly into the S3 server, following the patterns used by MinIO and Ceph RGW. Changes: - Add -iam flag to S3 server (enabled by default) - Create embedded IAM API handler in s3api package - Register IAM routes (POST to /) in S3 server when enabled - Deprecate standalone 'weed iam' command with warning Benefits: - Single binary, single port for both S3 and IAM APIs - Simpler deployment and configuration - Shared credential manager between S3 and IAM - Backward compatible: 'weed iam' still works with deprecation warning Usage: - weed s3 -port=8333 # S3 + IAM on same port (default) - weed s3 -iam=false # S3 only, disable embedded IAM - weed iam -port=8111 # Deprecated, shows warning * Fix nil pointer panic: add s3.iam flag to weed server command The enableIam field was not initialized when running S3 via 'weed server', causing a nil pointer dereference when checking s3opt.enableIam. Fix nil pointer panic: add s3.iam flag to weed filer command The enableIam field was not initialized when running S3 via 'weed filer -s3', causing a nil pointer dereference when checking s3opt.enableIam. Add integration tests for embedded IAM API Tests cover: - CreateUser, ListUsers, GetUser, UpdateUser, DeleteUser - CreateAccessKey, DeleteAccessKey, ListAccessKeys - CreatePolicy, PutUserPolicy, GetUserPolicy - Implicit username extraction from authorization header - Full user lifecycle workflow test These tests validate the embedded IAM API functionality that was added in the S3 server, ensuring IAM operations work correctly when served from the same port as S3. * Security: Use crypto/rand for IAM credential generation SECURITY FIX: Replace math/rand with crypto/rand for generating access keys and secret keys. Using math/rand is not cryptographically secure and can lead to predictable credentials. This change: 1. Replaces math/rand with crypto/rand in both: - weed/s3api/s3api_embedded_iam.go (embedded IAM) - weed/iamapi/iamapi_management_handlers.go (standalone IAM) 2. Removes the seededRand variable that was initialized with time-based seed (predictable) 3. Updates StringWithCharset/iamStringWithCharset to: - Use crypto/rand.Int() for secure random index generation - Return an error for proper error handling 4. Updates CreateAccessKey to handle the new error return 5. Updates DoActions handlers to propagate errors properly * Fix critical bug: DeleteUserPolicy was deleting entire user instead of policy BUG FIX: DeleteUserPolicy was incorrectly deleting the entire user identity from s3cfg.Identities instead of just clearing the user's inline policy (Actions). Before (wrong): s3cfg.Identities = append(s3cfg.Identities[:i], s3cfg.Identities[i+1:]...) After (correct): ident.Actions = nil Also: - Added proper iamDeleteUserPolicyResponse / DeleteUserPolicyResponse types - Fixed return type from iamPutUserPolicyResponse to iamDeleteUserPolicyResponse Affected files: - weed/s3api/s3api_embedded_iam.go (embedded IAM) - weed/iamapi/iamapi_management_handlers.go (standalone IAM) - weed/iamapi/iamapi_response.go (response types) * Add tests for DeleteUserPolicy to prevent regression Added two tests: 1. TestEmbeddedIamDeleteUserPolicy - Verifies that: - User is NOT deleted (identity still exists) - Credentials are NOT deleted - Only Actions (policy) are cleared to nil 2. TestEmbeddedIamDeleteUserPolicyUserNotFound - Verifies: - Returns 404 when user doesn't exist These tests ensure the bug fixed in the previous commit (deleting user instead of policy) doesn't regress. * Fix race condition: Add mutex lock to IAM DoActions The DoActions function performs a read-modify-write operation on the shared IAM configuration without any locking. This could lead to race conditions and data loss if multiple requests modify the IAM config concurrently. Added mutex lock at the start of DoActions in both: - weed/s3api/s3api_embedded_iam.go (embedded IAM) - weed/iamapi/iamapi_management_handlers.go (standalone IAM) The lock protects the entire read-modify-write cycle: 1. GetS3ApiConfiguration (read) 2. Modify s3cfg based on action 3. PutS3ApiConfiguration (write) * Fix action comparison and document CreatePolicy limitation 1. Replace reflect.DeepEqual with order-independent string slice comparison - Added iamStringSlicesEqual/stringSlicesEqual helper functions - Prevents duplicate policy statements when actions are in different order 2. Document CreatePolicy limitation in embedded IAM - Added TODO comment explaining that managed policies are not persisted - Users should use PutUserPolicy for inline policies 3. Fix deadlock in standalone IAM's CreatePolicy - Removed nested lock acquisition (DoActions already holds the lock) Files changed: - weed/s3api/s3api_embedded_iam.go - weed/iamapi/iamapi_management_handlers.go * Add rate limiting to embedded IAM endpoint Apply circuit breaker rate limiting to the IAM endpoint to prevent abuse. Also added request tracking for IAM operations. The IAM endpoint now follows the same pattern as other S3 endpoints: - track() for request metrics - s3a.iam.Auth() for authentication - s3a.cb.Limit() for rate limiting * Fix handleImplicitUsername to properly look up username from AccessKeyId According to AWS spec, when UserName is not specified in an IAM request, IAM should determine the username implicitly based on the AccessKeyId signing the request. Previously, the code incorrectly extracted s[2] (region field) from the SigV4 credential string as the username. This fix: 1. Extracts the AccessKeyId from s[0] of the credential string 2. Looks up the AccessKeyId in the credential store using LookupByAccessKey 3. Uses the identity's Name field as the username if found Also: - Added exported LookupByAccessKey wrapper method to IdentityAccessManagement - Updated tests to verify correct access key lookup behavior - Applied fix to both embedded IAM and standalone IAM implementations * Fix CreatePolicy to not trigger unnecessary save CreatePolicy validates the policy document and returns metadata but does not actually store the policy (SeaweedFS uses inline policies attached via PutUserPolicy). However, 'changed' was left as true, triggering an unnecessary save operation. Set changed = false after successful CreatePolicy validation in both embedded IAM and standalone IAM implementations. * Improve embedded IAM test quality - Remove unused mock types (mockCredentialManager, mockEmbeddedIamApi) - Use proto.Clone instead of proto.Merge for proper deep copy semantics - Replace brittle regex-based XML error extraction with proper XML unmarshalling - Remove unused regexp import - Add state and field assertions to tests: - CreateUser: verify username in response and user persisted in config - ListUsers: verify response contains expected users - GetUser: verify username in response - CreatePolicy: verify policy metadata in response - PutUserPolicy: verify actions were attached to user - CreateAccessKey: verify credentials in response and persisted in config * Remove shared test state and improve executeEmbeddedIamRequest - Remove package-level embeddedIamApi variable to avoid shared test state - Update executeEmbeddedIamRequest to accept API instance as parameter - Only call xml.Unmarshal when v != nil, making nil-v cases explicit - Return unmarshal error properly instead of always returning it - Update all tests to create their own EmbeddedIamApiForTest instance - Each test now has isolated state, preventing test interdependencies * Add comprehensive test coverage for embedded IAM Added tests for previously uncovered functions: - iamStringSlicesEqual: 0% → 100% - iamMapToStatementAction: 40% → 100% - iamMapToIdentitiesAction: 30% → 70% - iamHash: 100% - iamStringWithCharset: 85.7% - GetPolicyDocument: 75% → 100% - CreatePolicy: 91.7% → 100% - DeleteUser: 83.3% → 100% - GetUser: 83.3% → 100% - ListAccessKeys: 55.6% → 88.9% New test cases for helper functions, error handling, and edge cases. * Document IAM code duplication and reference GitHub issue #7747 Added comments to both IAM implementations noting the code duplication and referencing the tracking issue for future refactoring: - weed/s3api/s3api_embedded_iam.go (embedded IAM) - weed/iamapi/iamapi_management_handlers.go (standalone IAM) See: https://github.com/seaweedfs/seaweedfs/issues/7747 * Implement granular IAM authorization for self-service operations Previously, all IAM actions required ACTION_ADMIN permission, which was overly restrictive. This change implements AWS-like granular permissions: Self-service operations (allowed without admin for own resources): - CreateAccessKey (on own user) - DeleteAccessKey (on own user) - ListAccessKeys (on own user) - GetUser (on own user) - UpdateAccessKey (on own user) Admin-only operations: - CreateUser, DeleteUser, UpdateUser - PutUserPolicy, GetUserPolicy, DeleteUserPolicy - CreatePolicy - ListUsers - Operations on other users The new AuthIam middleware: 1. Authenticates the request (signature verification) 2. Parses the IAM Action and target UserName 3. For self-service actions, allows if user is operating on own resources 4. For all other actions or operations on other users, requires admin * Fix misleading comment in standalone IAM CreatePolicy The comment incorrectly stated that CreatePolicy only validates the policy document. In the standalone IAM server, CreatePolicy actually persists the policy via iama.s3ApiConfig.PutPolicies(). The changed flag is false because it doesn't modify s3cfg.Identities, not because nothing is stored. * Simplify IAM auth and add RequestId to responses - Remove redundant ACTION_ADMIN fallback in AuthIam: The action parameter in authRequest is for permission checking, not signature verification. If auth fails with ACTION_READ, it will fail with ACTION_ADMIN too. - Add SetRequestId() call before writing IAM responses for AWS compatibility. All IAM response structs embed iamCommonResponse which has SetRequestId(). * Address code review feedback for IAM implementation 1. auth_credentials.go: Add documentation warning that LookupByAccessKey returns internal pointers that should not be mutated. 2. iamapi_management_handlers.go & s3api_embedded_iam.go: Add input guards for StringWithCharset/iamStringWithCharset when length <= 0 or charset is empty to avoid runtime errors from rand.Int. 3. s3api_embedded_iam_test.go: Don't ignore xml.Marshal errors in test DoActions handler. Return proper error response if marshaling fails. 4. s3api_embedded_iam_test.go: Use obviously fake access key IDs (AKIATESTFAKEKEY) to avoid CI secret scanner false positives. Address code review feedback for IAM implementation (batch 2) 1. iamapi/iamapi_management_handlers.go: - Redact Authorization header log (security: avoid exposing signature) - Add nil-guard for iama.iam before LookupByAccessKey call 2. iamapi/iamapi_test.go: - Replace real-looking access keys with obviously fake ones (AKIATESTFAKEKEY) to avoid CI secret scanner false positives 3. s3api/s3api_embedded_iam.go - CreateUser: - Validate UserName is not empty (return ErrCodeInvalidInputException) - Check for duplicate users (return ErrCodeEntityAlreadyExistsException) 4. s3api/s3api_embedded_iam.go - CreateAccessKey: - Return ErrCodeNoSuchEntityException if user doesn't exist - Removed implicit user creation behavior 5. s3api/s3api_embedded_iam.go - getActions: - Fix S3 ARN parsing for bucket/path patterns - Handle mybucket, mybucket/, mybucket/path/* correctly - Return error if no valid actions found in policy 6. s3api/s3api_embedded_iam.go - handleImplicitUsername: - Redact Authorization header log - Add nil-guard for e.iam 7. s3api/s3api_embedded_iam.go - DoActions: - Reload in-memory IAM maps after credential mutations - Call LoadS3ApiConfigurationFromCredentialManager after save 8. s3api/auth_credentials.go - AuthSignatureOnly: - Add new signature-only authentication method - Bypasses S3 authorization checks for IAM operations - Used by AuthIam to properly separate signature verification from IAM-specific permission checks * Fix nil pointer dereference and error handling in IAM 1. AuthIam: Add nil check for identity after AuthSignatureOnly - AuthSignatureOnly can return nil identity with ErrNone for authTypePostPolicy or authTypeStreamingUnsigned - Now returns ErrAccessDenied if identity is nil 2. writeIamErrorResponse: Add missing error code cases - ErrCodeEntityAlreadyExistsException -> HTTP 409 Conflict - ErrCodeInvalidInputException -> HTTP 400 Bad Request 3. UpdateUser: Use consistent error handling - Changed from direct ErrInvalidRequest to writeIamErrorResponse - Now returns correct HTTP status codes based on error type * Add IAM config reload to standalone IAM server after mutations Match the behavior of embedded IAM (s3api_embedded_iam.go) by reloading the in-memory identity maps after persisting configuration changes. This ensures newly created access keys are visible to LookupByAccessKey immediately without requiring a server restart. * Minor improvements to test helpers and log masking 1. iamapi_test.go: Update mustMarshalJSON to use t.Helper() and t.Fatal() instead of panic() for better test diagnostics 2. s3api_embedded_iam.go: Mask access key in 'not found' log message to avoid exposing full access key IDs in logs * Mask access key in standalone IAM log message for consistency Match the embedded IAM version by masking the access key ID in the 'not found' log message (show only first 4 chars).	2025-12-14 16:02:06 -08:00
Chris Lu	d80d8be012	fix(s3): start KeepConnectedToMaster for filer discovery with filerGroup (#7732 ) Fixes #7721 When S3 server is configured with a filerGroup, it creates a MasterClient to enable dynamic filer discovery. However, the KeepConnectedToMaster() background goroutine was never started, causing GetMaster() to block indefinitely in WaitUntilConnected(). This resulted in the log message: WaitUntilConnected still waiting for master connection (attempt N)... being logged repeatedly every ~20 seconds. The fix adds the missing 'go masterClient.KeepConnectedToMaster(ctx)' call to properly establish the connection to master servers. Also adds unit tests to verify WaitUntilConnected respects context cancellation.	2025-12-13 12:10:15 -08:00
Chris Lu	d6d893c8c3	s3: add s3:ExistingObjectTag condition support for bucket policies (#7677 ) * s3: add s3:ExistingObjectTag condition support in policy engine Add support for s3:ExistingObjectTag/<tag-key> condition keys in bucket policies, allowing access control based on object tags. Changes: - Add ObjectEntry field to PolicyEvaluationArgs (entry.Extended metadata) - Update EvaluateConditions to handle s3:ExistingObjectTag/<key> format - Extract tag value from entry metadata using X-Amz-Tagging-<key> prefix This enables policies like: { "Condition": { "StringEquals": { "s3:ExistingObjectTag/status": ["public"] } } } Fixes: https://github.com/seaweedfs/seaweedfs/issues/7447 * s3: update EvaluatePolicy to accept object entry for tag conditions Update BucketPolicyEngine.EvaluatePolicy to accept objectEntry parameter (entry.Extended metadata) for evaluating tag-based policy conditions. Changes: - Add objectEntry parameter to EvaluatePolicy method - Update callers in auth_credentials.go and s3api_bucket_handlers.go - Pass nil for objectEntry in auth layer (entry fetched later in handlers) For tag-based conditions to work, handlers should call EvaluatePolicy with the object's entry.Extended after fetching the entry from filer. * s3: add tests for s3:ExistingObjectTag policy conditions Add comprehensive tests for object tag-based policy conditions: - TestExistingObjectTagCondition: Basic tag matching scenarios - Matching/non-matching tag values - Missing tags, no tags, empty tags - Multiple tags with one matching - TestExistingObjectTagConditionMultipleTags: Multiple tag conditions - Both tags match - Only one tag matches - TestExistingObjectTagDenyPolicy: Deny policies with tag conditions - Default allow without tag - Deny when specific tag present * s3: document s3:ExistingObjectTag support and feature status Update policy engine documentation: - Add s3:ExistingObjectTag/<tag-key> to supported condition keys - Add 'Object Tag-Based Access Control' section with examples - Add 'Feature Status' section with implemented and planned features Planned features for future implementation: - s3:RequestObjectTag/<key> - s3:RequestObjectTagKeys - s3:x-amz-server-side-encryption - Cross-account access * Implement tag-based policy re-check in handlers - Add checkPolicyWithEntry helper to S3ApiServer for handlers to re-check policy after fetching object entry (for s3:ExistingObjectTag conditions) - Add HasPolicyForBucket method to policy engine for efficient check - Integrate policy re-check in GetObjectHandler after entry is fetched - Integrate policy re-check in HeadObjectHandler after entry is fetched - Update auth_credentials.go comments to explain two-phase evaluation - Update documentation with supported operations for tag-based conditions This implements 'Approach 1' where handlers re-check the policy with the object entry after fetching it, allowing tag-based conditions to be properly evaluated. * Add integration tests for s3:ExistingObjectTag conditions - Add TestCheckPolicyWithEntry: tests checkPolicyWithEntry helper with various tag scenarios (matching tags, non-matching tags, empty entry, nil entry) - Add TestCheckPolicyWithEntryNoPolicyForBucket: tests early return when no policy - Add TestCheckPolicyWithEntryNilPolicyEngine: tests nil engine handling - Add TestCheckPolicyWithEntryDenyPolicy: tests deny policies with tag conditions - Add TestHasPolicyForBucket: tests HasPolicyForBucket method These tests cover the Phase 2 policy evaluation with object entry metadata, ensuring tag-based conditions are properly evaluated. * Address code review nitpicks - Remove unused extractObjectTags placeholder function (engine.go) - Add clarifying comment about s3:ExistingObjectTag/<key> evaluation - Consolidate duplicate tag-based examples in README - Factor out tagsToEntry helper to package level in tests * Address code review feedback - Fix unsafe type assertions in GetObjectHandler and HeadObjectHandler when getting identity from context (properly handle type assertion failure) - Extract getConditionContextValue helper to eliminate duplicated logic between EvaluateConditions and EvaluateConditionsLegacy - Ensure consistent handling of missing condition keys (always return empty slice) * Fix GetObjectHandler to match HeadObjectHandler pattern Add safety check for nil objectEntryForSSE before tag-based policy evaluation, ensuring tag-based conditions are always evaluated rather than silently skipped if entry is unexpectedly nil. Addresses review comment from Copilot. * Fix HeadObject action name in docs for consistency Change 'HeadObject' to 's3:HeadObject' to match other action names. * Extract recheckPolicyWithObjectEntry helper to reduce duplication Move the repeated identity extraction and policy re-check logic from GetObjectHandler and HeadObjectHandler into a shared helper method. * Add validation for empty tag key in s3:ExistingObjectTag condition Prevent potential issues with malformed policies containing s3:ExistingObjectTag/ (empty tag key after slash).	2025-12-09 09:48:13 -08:00
Chris Lu	f5c0bcafa3	s3: fix ListBuckets not showing buckets created by authenticated users (#7648 ) * s3: fix ListBuckets not showing buckets created by authenticated users Fixes #7647 ## Problem Users with proper Admin permissions could create buckets but couldn't list them. The issue occurred because ListBucketsHandler was not wrapped with the Auth middleware, so the authenticated identity was never set in the request context. ## Root Cause - PutBucketHandler uses iam.Auth() middleware which sets identity in context - ListBucketsHandler did NOT use iam.Auth() middleware - Without the middleware, GetIdentityNameFromContext() returned empty string - Bucket ownership checks failed because no identity was present ## Changes 1. Wrap ListBucketsHandler with iam.Auth() middleware (s3api_server.go) 2. Update ListBucketsHandler to get identity from context (s3api_bucket_handlers.go) 3. Add lookupByIdentityName() helper method (auth_credentials.go) 4. Add comprehensive test TestListBucketsIssue7647 (s3api_bucket_handlers_test.go) ## Testing - All existing tests pass (1348 tests in s3api package) - New test TestListBucketsIssue7647 validates the fix - Verified admin users can see their created buckets - Verified admin users can see all buckets - Verified backward compatibility maintained * s3: fix ListBuckets for JWT/Keycloak authentication The previous fix broke JWT/Keycloak authentication because JWT identities are created on-the-fly and not stored in the iam.identities list. The lookupByIdentityName() would return nil for JWT users. Solution: Store the full Identity object in the request context, not just the name. This allows ListBucketsHandler to retrieve the complete identity for all authentication types (SigV2, SigV4, JWT, Anonymous). Changes: - Add SetIdentityInContext/GetIdentityFromContext in s3_constants/header.go - Update Auth middleware to store full identity in context - Update ListBucketsHandler to retrieve identity from context first, with fallback to lookup for backward compatibility * s3: optimize lookupByIdentityName to O(1) using map Address code review feedback: Use a map for O(1) lookups instead of O(N) linear scan through identities list. Changes: - Add nameToIdentity map to IdentityAccessManagement struct - Populate map in loadS3ApiConfiguration (consistent with accessKeyIdent pattern) - Update lookupByIdentityName to use map lookup instead of loop This improves performance when many identities are configured and aligns with the existing pattern used for accessKeyIdent lookups. * s3: address code review feedback on nameToIdentity and logging Address two code review points: 1. Wire nameToIdentity into env-var fallback path - The AWS env-var fallback in NewIdentityAccessManagementWithStore now populates nameToIdentity map along with accessKeyIdent - Keeps all identity lookup maps in sync - Avoids potential issues if handlers rely on lookupByIdentityName 2. Improve access key lookup logging - Reduce log verbosity: V(1) -> V(2) for failed lookups - Truncate access keys in logs (show first 4 chars + **) - Include key length for debugging - Prevents credential exposure in production logs - Reduces log noise from misconfigured clients fmt * s3: refactor truncation logic and improve error handling Address additional code review feedback: 1. DRY principle: Extract key truncation logic into local function - Define truncate() helper at function start - Reuse throughout lookupByAccessKey - Eliminates code duplication 2. Enhanced security: Mask very short access keys - Keys <= 4 chars now show as '***' instead of full key - Prevents any credential exposure even for short keys - Consistent masking across all log statements 3. Improved robustness: Add warning log for type assertion failure - Log unexpected type when identity context object is wrong type - Helps debug potential middleware or context issues - Better production diagnostics 4. Documentation: Add comment about future optimization opportunity - Note potential for lightweight identity view in context - Suggests credential-free view for better data minimization - Documents design decision for future maintainers	2025-12-08 01:24:12 -08:00
chrislu	5167bbd2a9	Remove deprecated allowEmptyFolder CLI option The allowEmptyFolder option is no longer functional because: 1. The code that used it was already commented out 2. Empty folder cleanup is now handled asynchronously by EmptyFolderCleaner The CLI flags are kept for backward compatibility but marked as deprecated and ignored. This removes: - S3ApiServerOption.AllowEmptyFolder field - The actual usage in s3api_object_handlers_list.go - Helm chart values and template references - References in test Makefiles and docker-compose files	2025-12-06 21:54:12 -08:00
Chris Lu	edf0ef7a80	Filer, S3: Feature/add concurrent file upload limit (#7554 ) * Support multiple filers for S3 and IAM servers with automatic failover This change adds support for multiple filer addresses in the 'weed s3' and 'weed iam' commands, enabling high availability through automatic failover. Key changes: - Updated S3ApiServerOption.Filer to Filers ([]pb.ServerAddress) - Updated IamServerOption.Filer to Filers ([]pb.ServerAddress) - Modified -filer flag to accept comma-separated addresses - Added getFilerAddress() helper methods for backward compatibility - Updated all filer client calls to support multiple addresses - Uses pb.WithOneOfGrpcFilerClients for automatic failover Usage: weed s3 -filer=localhost:8888,localhost:8889 weed iam -filer=localhost:8888,localhost:8889 The underlying FilerClient already supported multiple filers with health tracking and automatic failover - this change exposes that capability through the command-line interface. * Add filer discovery: treat initial filers as seeds and discover peers from master Enhances FilerClient to automatically discover additional filers in the same filer group by querying the master server. This allows users to specify just a few seed filers, and the client will discover all other filers in the cluster. Key changes to wdclient/FilerClient: - Added MasterClient, FilerGroup, and DiscoveryInterval fields - Added thread-safe filer list management with RWMutex - Implemented discoverFilers() background goroutine - Uses cluster.ListExistingPeerUpdates() to query master for filers - Automatically adds newly discovered filers to the list - Added Close() method to clean up discovery goroutine New FilerClientOption fields: - MasterClient: enables filer discovery from master - FilerGroup: specifies which filer group to discover - DiscoveryInterval: how often to refresh (default 5 minutes) Usage example: masterClient := wdclient.NewMasterClient(...) filerClient := wdclient.NewFilerClient( []pb.ServerAddress{"localhost:8888"}, // seed filers grpcDialOption, dataCenter, &wdclient.FilerClientOption{ MasterClient: masterClient, FilerGroup: "my-group", }, ) defer filerClient.Close() The initial filers act as seeds - the client discovers and adds all other filers in the same group from the master. Discovered filers are added dynamically without removing existing ones (relying on health checks for unavailable filers). * Address PR review comments: implement full failover for IAM operations Critical fixes based on code review feedback: 1. IAM API Failover (Critical): - Replace pb.WithGrpcFilerClient with pb.WithOneOfGrpcFilerClients in: * GetS3ApiConfigurationFromFiler() * PutS3ApiConfigurationToFiler() * GetPolicies() * PutPolicies() - Now all IAM operations support automatic failover across multiple filers 2. Validation Improvements: - Add validation in NewIamApiServerWithStore() to require at least one filer - Add validation in NewS3ApiServerWithStore() to require at least one filer - Add warning log when no filers configured for credential store 3. Error Logging: - Circuit breaker now logs when config load fails instead of silently ignoring - Helps operators understand why circuit breaker limits aren't applied 4. Code Quality: - Use ToGrpcAddress() for filer address in credential store setup - More consistent with rest of codebase and future-proof These changes ensure IAM operations have the same high availability guarantees as S3 operations, completing the multi-filer failover implementation. * Fix IAM manager initialization: remove code duplication, add TODO for HA Addresses review comment on s3api_server.go:145 Changes: - Remove duplicate code for getting first filer address - Extract filerAddr variable once and reuse - Add TODO comment documenting the HA limitation for IAM manager - Document that loadIAMManagerFromConfig and NewS3IAMIntegration need updates to support multiple filers for full HA Note: This is a known limitation when using filer-backed IAM stores. The interfaces need to be updated to accept multiple filer addresses. For now, documenting this limitation clearly. * Document credential store HA limitation with TODO Addresses review comment on auth_credentials.go:149 Changes: - Add TODO comment documenting that SetFilerClient interface needs update for multi-filer support - Add informative log message indicating HA limitation - Document that this is a known limitation for filer-backed credential stores The SetFilerClient interface currently only accepts a single filer address. To properly support HA, the credential store interfaces need to be updated to handle multiple filer addresses. * Track current active filer in FilerClient for better HA Add GetCurrentFiler() method to FilerClient that returns the currently active filer based on the filerIndex which is updated on successful operations. This provides better availability than always using the first filer. Changes: - Add FilerClient.GetCurrentFiler() method that returns current active filer - Update S3ApiServer.getFilerAddress() to use FilerClient's current filer - Add fallback to first filer if FilerClient not yet initialized - Document IAM limitation (doesn't have FilerClient access) Benefits: - Single-filer operations (URLs, ReadFilerConf, etc.) now use the currently active/healthy filer - Better distribution and failover behavior - FilerClient's round-robin and health tracking automatically determines which filer to use * Document ReadFilerConf HA limitation in lifecycle handlers Addresses review comment on s3api_bucket_handlers.go:880 Add comment documenting that ReadFilerConf uses the current active filer from FilerClient (which is better than always using first filer), but doesn't have built-in multi-filer failover. Add TODO to update filer.ReadFilerConf to support multiple filers for complete HA. For now, it uses the currently active/healthy filer tracked by FilerClient which provides reasonable availability. * Document multipart upload URL HA limitation Addresses review comment on s3api_object_handlers_multipart.go:442 Add comment documenting that part upload URLs point to the current active filer (tracked by FilerClient), which is better than always using the first filer but still creates a potential point of failure if that filer becomes unavailable during upload. Suggest TODO solutions: - Use virtual hostname/load balancer for filers - Have S3 server proxy uploads to healthy filers Current behavior provides reasonable availability by using the currently active/healthy filer rather than being pinned to first filer. * Document multipart completion Location URL limitation Addresses review comment on filer_multipart.go:187 Add comment documenting that the Location URL in CompleteMultipartUpload response points to the current active filer (tracked by FilerClient). Note that clients should ideally use the S3 API endpoint rather than this direct URL. If direct access is attempted and the specific filer is unavailable, the request will fail. Current behavior uses the currently active/healthy filer rather than being pinned to the first filer, providing better availability. * Make credential store use current active filer for HA Update FilerEtcStore to use a function that returns the current active filer instead of a fixed address, enabling high availability. Changes: - Add SetFilerAddressFunc() method to FilerEtcStore - Store uses filerAddressFunc instead of fixed filerGrpcAddress - withFilerClient() calls the function to get current active filer - Keep SetFilerClient() for backward compatibility (marked deprecated) - Update S3ApiServer to pass FilerClient.GetCurrentFiler to store Benefits: - Credential store now uses currently active/healthy filer - Automatic failover when filer becomes unavailable - True HA for credential operations - Backward compatible with old SetFilerClient interface This addresses the credential store limitation - no longer pinned to first filer, uses FilerClient's tracked current active filer. * Clarify multipart URL comments: filer address not used for uploads Update comments to reflect that multipart upload URLs are not actually used for upload traffic - uploads go directly to volume servers. Key clarifications: - genPartUploadUrl: Filer address is parsed out, only path is used - CompleteMultipartUpload Location: Informational field per AWS S3 spec - Actual uploads bypass filer proxy and go directly to volume servers The filer address in these URLs is NOT a HA concern because: 1. Part uploads: URL is parsed for path, upload goes to volume servers 2. Location URL: Informational only, clients use S3 endpoint This addresses the observation that S3 uploads don't go through filers, only metadata operations do. * Remove filer address from upload paths - pass path directly Eliminate unnecessary filer address from upload URLs by passing file paths directly instead of full URLs that get immediately parsed. Changes: - Rename genPartUploadUrl() → genPartUploadPath() (returns path only) - Rename toFilerUrl() → toFilerPath() (returns path only) - Update putToFiler() to accept filePath instead of uploadUrl - Remove URL parsing code (no longer needed) - Remove net/url import (no longer used) - Keep old function names as deprecated wrappers for compatibility Benefits: - Cleaner code - no fake URL construction/parsing - No dependency on filer address for internal operations - More accurate naming (these are paths, not URLs) - Eliminates confusion about HA concerns This completely removes the filer address from upload operations - it was never actually used for routing, only parsed for the path. * Remove deprecated functions: use new path-based functions directly Remove deprecated wrapper functions and update all callers to use the new function names directly. Removed: - genPartUploadUrl() → all callers now use genPartUploadPath() - toFilerUrl() → all callers now use toFilerPath() - SetFilerClient() → removed along with fallback code Updated: - s3api_object_handlers_multipart.go: uploadUrl → filePath - s3api_object_handlers_put.go: uploadUrl → filePath, versionUploadUrl → versionFilePath - s3api_object_versioning.go: toFilerUrl → toFilerPath - s3api_object_handlers_test.go: toFilerUrl → toFilerPath - auth_credentials.go: removed SetFilerClient fallback - filer_etc_store.go: removed deprecated SetFilerClient method Benefits: - Cleaner codebase with no deprecated functions - All variable names accurately reflect that they're paths, not URLs - Single interface for credential stores (SetFilerAddressFunc only) All code now consistently uses the new path-based approach. * Fix toFilerPath: remove URL escaping for raw file paths The toFilerPath function should return raw file paths, not URL-escaped paths. URL escaping was needed when the path was embedded in a URL (old toFilerUrl), but now that we pass paths directly to putToFiler, they should be unescaped. This fixes S3 integration test failures: - test_bucket_listv2_encoding_basic - test_bucket_list_encoding_basic - test_bucket_listv2_delimiter_whitespace - test_bucket_list_delimiter_whitespace The tests were failing because paths were double-encoded (escaped when stored, then escaped again when listed), resulting in %252B instead of %2B for '+' characters. Root cause: When we removed URL parsing in putToFiler, we should have also removed URL escaping in toFilerPath since paths are now used directly without URL encoding/decoding. * Add thread safety to FilerEtcStore and clarify credential store comments Address review suggestions for better thread safety and code clarity: 1. Thread Safety: Add RWMutex to FilerEtcStore - Protects filerAddressFunc and grpcDialOption from concurrent access - Initialize() uses write lock when setting function - SetFilerAddressFunc() uses write lock - withFilerClient() uses read lock to get function and dial option - GetPolicies() uses read lock to check if configured 2. Improved Error Messages: - Prefix errors with "filer_etc:" for easier debugging - "filer address not configured" → "filer_etc: filer address function not configured" - "filer address is empty" → "filer_etc: filer address is empty" 3. Clarified Comments: - auth_credentials.go: Clarify that initial setup is temporary - Document that it's updated in s3api_server.go after FilerClient creation - Remove ambiguity about when FilerClient.GetCurrentFiler is used Benefits: - Safe for concurrent credential operations - Clear error messages for debugging - Explicit documentation of initialization order * Enable filer discovery: pass master addresses to FilerClient Fix two critical issues: 1. Filer Discovery Not Working: Master client was not being passed to FilerClient, so peer discovery couldn't work 2. Credential Store Design: Already uses FilerClient via GetCurrentFiler function - this is the correct design for HA Changes: Command (s3.go): - Read master addresses from GetFilerConfiguration response - Pass masterAddresses to S3ApiServerOption - Log master addresses for visibility S3ApiServerOption: - Add Masters []pb.ServerAddress field for discovery S3ApiServer: - Create MasterClient from Masters when available - Pass MasterClient + FilerGroup to FilerClient via options - Enable discovery with 5-minute refresh interval - Log whether discovery is enabled or disabled Credential Store: - Already correctly uses filerClient.GetCurrentFiler via function - This provides HA without tight coupling to FilerClient struct - Function-based design is clean and thread-safe Discovery Flow: 1. S3 command reads filer config → gets masters + filer group 2. S3ApiServer creates MasterClient from masters 3. FilerClient uses MasterClient to query for peer filers 4. Background goroutine refreshes peer list every 5 minutes 5. Credential store uses GetCurrentFiler to get active filer Now filer discovery actually works! �� * Use S3 endpoint in multipart Location instead of filer address * Add multi-filer failover to ReadFilerConf * Address CodeRabbit review: fix buffer reuse and improve lock safety Address two code review suggestions: 1. Fix buffer reuse in ReadFilerConfFromFilers: - Use local []byte data instead of shared buffer - Prevents partial data from failed attempts affecting successful reads - Creates fresh buffer inside callback for masterClient path - More robust to future changes in read helpers 2. Improve lock safety in FilerClient: - Add WithHealth variants that accept health pointer - Get health pointer while holding lock, then release before calling - Eliminates potential for lock confusion (though no actual deadlock existed) - Clearer separation: lock for data access, atomics for health ops Changes: - ReadFilerConfFromFilers: var data []byte, create buf inside callback - shouldSkipUnhealthyFilerWithHealth(health filerHealth) - recordFilerSuccessWithHealth(health filerHealth) - recordFilerFailureWithHealth(health filerHealth) - Keep old functions for backward compatibility (marked deprecated) - Update LookupVolumeIds to use WithHealth variants Benefits: - More robust multi-filer configuration reading - Clearer lock vs atomic operation boundaries - No lock held during health checks (even though atomics don't block) - Better code organization and maintainability * add constant * Fix IAM manager and post policy to use current active filer * Fix critical race condition and goroutine leak * Update weed/s3api/filer_multipart.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Fix compilation error and address code review suggestions Address remaining unresolved comments: 1. Fix compilation error: Add missing net/url import - filer_multipart.go used url.PathEscape without import - Added "net/url" to imports 2. Fix Location URL formatting (all 4 occurrences): - Add missing slash between bucket and key - Use url.PathEscape for bucket names - Use urlPathEscape for object keys - Handles special characters in bucket/key names - Before: http://host/bucketkey - After: http://host/bucket/key (properly escaped) 3. Optimize discovery loop (O(NM) → O(N+M)): - Use map for existing filers (O(1) lookup) - Reduces time holding write lock - Better performance with many filers - Before: Nested loop for each discovered filer - After: Build map once, then O(1) lookups Changes: - filer_multipart.go: Import net/url, fix all Location URLs - filer_client.go: Use map for efficient filer discovery Benefits: - Compiles successfully - Proper URL encoding (handles spaces, special chars) - Faster discovery with less lock contention - Production-ready URL formatting Fix race conditions and make Close() idempotent Address CodeRabbit review #3512078995: 1. Critical: Fix unsynchronized read in error message - Line 584 read len(fc.filerAddresses) without lock - Race with refreshFilerList appending to slice - Fixed: Take RLock to read length safely - Prevents race detector warnings 2. Important: Make Close() idempotent - Closing already-closed channel panics - Can happen with layered cleanup in shutdown paths - Fixed: Use sync.Once to ensure single close - Safe to call Close() multiple times now 3. Nitpick: Add warning for empty filer address - getFilerAddress() can return empty string - Helps diagnose unexpected state - Added: Warning log when no filers available 4. Nitpick: Guard deprecated index-based helpers - shouldSkipUnhealthyFiler, recordFilerSuccess/Failure - Accessed filerHealth without lock (races with discovery) - Fixed: Take RLock and check bounds before array access - Prevents index out of bounds and races Changes: - filer_client.go: - Add closeDiscoveryOnce sync.Once field - Use Do() in Close() for idempotent channel close - Add RLock guards to deprecated index-based helpers - Add bounds checking to prevent panics - Synchronized read of filerAddresses length in error - s3api_server.go: - Add warning log when getFilerAddress returns empty Benefits: - No race conditions (passes race detector) - No panic on double-close - Better error diagnostics - Safe with discovery enabled - Production-hardened shutdown logic * Fix hardcoded http scheme and add panic recovery Address CodeRabbit review #3512114811: 1. Major: Fix hardcoded http:// scheme in Location URLs - Location URLs always used http:// regardless of client connection - HTTPS clients got http:// URLs (incorrect) - Fixed: Detect scheme from request - Check X-Forwarded-Proto header (for proxies) first - Check r.TLS != nil for direct HTTPS - Fallback to http for plain connections - Applied to all 4 CompleteMultipartUploadResult locations 2. Major: Add panic recovery to discovery goroutine - Long-running background goroutine could crash entire process - Panic in refreshFilerList would terminate program - Fixed: Add defer recover() with error logging - Goroutine failures now logged, not fatal 3. Note: Close() idempotency already implemented - Review flagged as duplicate issue - Already fixed in commit 3d7a65c7e - sync.Once (closeDiscoveryOnce) prevents double-close panic - Safe to call Close() multiple times Changes: - filer_multipart.go: - Add getRequestScheme() helper function - Update all 4 Location URLs to use dynamic scheme - Format: scheme://host/bucket/key (was: http://...) - filer_client.go: - Add panic recovery to discoverFilers() - Log panics instead of crashing Benefits: - Correct scheme (https/http) in Location URLs - Works behind proxies (X-Forwarded-Proto) - No process crashes from discovery failures - Production-hardened background goroutine - Proper AWS S3 API compliance * filer: add ConcurrentFileUploadLimit option to limit number of concurrent uploads This adds a new configuration option ConcurrentFileUploadLimit that limits the number of concurrent file uploads based on file count, complementing the existing ConcurrentUploadLimit which limits based on total data size. This addresses an OOM vulnerability where requests with missing/zero Content-Length headers could bypass the size-based rate limiter. Changes: - Add ConcurrentFileUploadLimit field to FilerOption - Add inFlightUploads counter to FilerServer - Update upload handler to check both size and count limits - Add -concurrentFileUploadLimit command line flag (default: 0 = unlimited) Fixes #7529 * s3: add ConcurrentFileUploadLimit option to limit number of concurrent uploads This adds a new configuration option ConcurrentFileUploadLimit that limits the number of concurrent file uploads based on file count, complementing the existing ConcurrentUploadLimit which limits based on total data size. This addresses an OOM vulnerability where requests with missing/zero Content-Length headers could bypass the size-based rate limiter. Changes: - Add ConcurrentUploadLimit and ConcurrentFileUploadLimit fields to S3ApiServerOption - Add inFlightDataSize, inFlightUploads, and inFlightDataLimitCond to S3ApiServer - Add s3a reference to CircuitBreaker for upload limiting - Enhance CircuitBreaker.Limit() to apply upload limiting for write actions - Add -concurrentUploadLimitMB and -concurrentFileUploadLimit command line flags - Add s3.concurrentUploadLimitMB and s3.concurrentFileUploadLimit flags to filer command The upload limiting is integrated into the existing CircuitBreaker.Limit() function, avoiding creation of new wrapper functions and reusing the existing handler registration pattern. Fixes #7529 * server: add missing concurrentFileUploadLimit flags for server command The server command was missing the initialization of concurrentFileUploadLimit flags for both filer and S3, causing a nil pointer dereference when starting the server in combined mode. This adds: - filer.concurrentFileUploadLimit flag to server command - s3.concurrentUploadLimitMB flag to server command - s3.concurrentFileUploadLimit flag to server command Fixes the panic: runtime error: invalid memory address or nil pointer dereference at filer.go:332 * http status 503 --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-26 12:07:54 -08:00
Chris Lu	5075381060	Support multiple filers for S3 and IAM servers with automatic failover (#7550 ) * Support multiple filers for S3 and IAM servers with automatic failover This change adds support for multiple filer addresses in the 'weed s3' and 'weed iam' commands, enabling high availability through automatic failover. Key changes: - Updated S3ApiServerOption.Filer to Filers ([]pb.ServerAddress) - Updated IamServerOption.Filer to Filers ([]pb.ServerAddress) - Modified -filer flag to accept comma-separated addresses - Added getFilerAddress() helper methods for backward compatibility - Updated all filer client calls to support multiple addresses - Uses pb.WithOneOfGrpcFilerClients for automatic failover Usage: weed s3 -filer=localhost:8888,localhost:8889 weed iam -filer=localhost:8888,localhost:8889 The underlying FilerClient already supported multiple filers with health tracking and automatic failover - this change exposes that capability through the command-line interface. * Add filer discovery: treat initial filers as seeds and discover peers from master Enhances FilerClient to automatically discover additional filers in the same filer group by querying the master server. This allows users to specify just a few seed filers, and the client will discover all other filers in the cluster. Key changes to wdclient/FilerClient: - Added MasterClient, FilerGroup, and DiscoveryInterval fields - Added thread-safe filer list management with RWMutex - Implemented discoverFilers() background goroutine - Uses cluster.ListExistingPeerUpdates() to query master for filers - Automatically adds newly discovered filers to the list - Added Close() method to clean up discovery goroutine New FilerClientOption fields: - MasterClient: enables filer discovery from master - FilerGroup: specifies which filer group to discover - DiscoveryInterval: how often to refresh (default 5 minutes) Usage example: masterClient := wdclient.NewMasterClient(...) filerClient := wdclient.NewFilerClient( []pb.ServerAddress{"localhost:8888"}, // seed filers grpcDialOption, dataCenter, &wdclient.FilerClientOption{ MasterClient: masterClient, FilerGroup: "my-group", }, ) defer filerClient.Close() The initial filers act as seeds - the client discovers and adds all other filers in the same group from the master. Discovered filers are added dynamically without removing existing ones (relying on health checks for unavailable filers). * Address PR review comments: implement full failover for IAM operations Critical fixes based on code review feedback: 1. IAM API Failover (Critical): - Replace pb.WithGrpcFilerClient with pb.WithOneOfGrpcFilerClients in: * GetS3ApiConfigurationFromFiler() * PutS3ApiConfigurationToFiler() * GetPolicies() * PutPolicies() - Now all IAM operations support automatic failover across multiple filers 2. Validation Improvements: - Add validation in NewIamApiServerWithStore() to require at least one filer - Add validation in NewS3ApiServerWithStore() to require at least one filer - Add warning log when no filers configured for credential store 3. Error Logging: - Circuit breaker now logs when config load fails instead of silently ignoring - Helps operators understand why circuit breaker limits aren't applied 4. Code Quality: - Use ToGrpcAddress() for filer address in credential store setup - More consistent with rest of codebase and future-proof These changes ensure IAM operations have the same high availability guarantees as S3 operations, completing the multi-filer failover implementation. * Fix IAM manager initialization: remove code duplication, add TODO for HA Addresses review comment on s3api_server.go:145 Changes: - Remove duplicate code for getting first filer address - Extract filerAddr variable once and reuse - Add TODO comment documenting the HA limitation for IAM manager - Document that loadIAMManagerFromConfig and NewS3IAMIntegration need updates to support multiple filers for full HA Note: This is a known limitation when using filer-backed IAM stores. The interfaces need to be updated to accept multiple filer addresses. For now, documenting this limitation clearly. * Document credential store HA limitation with TODO Addresses review comment on auth_credentials.go:149 Changes: - Add TODO comment documenting that SetFilerClient interface needs update for multi-filer support - Add informative log message indicating HA limitation - Document that this is a known limitation for filer-backed credential stores The SetFilerClient interface currently only accepts a single filer address. To properly support HA, the credential store interfaces need to be updated to handle multiple filer addresses. * Track current active filer in FilerClient for better HA Add GetCurrentFiler() method to FilerClient that returns the currently active filer based on the filerIndex which is updated on successful operations. This provides better availability than always using the first filer. Changes: - Add FilerClient.GetCurrentFiler() method that returns current active filer - Update S3ApiServer.getFilerAddress() to use FilerClient's current filer - Add fallback to first filer if FilerClient not yet initialized - Document IAM limitation (doesn't have FilerClient access) Benefits: - Single-filer operations (URLs, ReadFilerConf, etc.) now use the currently active/healthy filer - Better distribution and failover behavior - FilerClient's round-robin and health tracking automatically determines which filer to use * Document ReadFilerConf HA limitation in lifecycle handlers Addresses review comment on s3api_bucket_handlers.go:880 Add comment documenting that ReadFilerConf uses the current active filer from FilerClient (which is better than always using first filer), but doesn't have built-in multi-filer failover. Add TODO to update filer.ReadFilerConf to support multiple filers for complete HA. For now, it uses the currently active/healthy filer tracked by FilerClient which provides reasonable availability. * Document multipart upload URL HA limitation Addresses review comment on s3api_object_handlers_multipart.go:442 Add comment documenting that part upload URLs point to the current active filer (tracked by FilerClient), which is better than always using the first filer but still creates a potential point of failure if that filer becomes unavailable during upload. Suggest TODO solutions: - Use virtual hostname/load balancer for filers - Have S3 server proxy uploads to healthy filers Current behavior provides reasonable availability by using the currently active/healthy filer rather than being pinned to first filer. * Document multipart completion Location URL limitation Addresses review comment on filer_multipart.go:187 Add comment documenting that the Location URL in CompleteMultipartUpload response points to the current active filer (tracked by FilerClient). Note that clients should ideally use the S3 API endpoint rather than this direct URL. If direct access is attempted and the specific filer is unavailable, the request will fail. Current behavior uses the currently active/healthy filer rather than being pinned to the first filer, providing better availability. * Make credential store use current active filer for HA Update FilerEtcStore to use a function that returns the current active filer instead of a fixed address, enabling high availability. Changes: - Add SetFilerAddressFunc() method to FilerEtcStore - Store uses filerAddressFunc instead of fixed filerGrpcAddress - withFilerClient() calls the function to get current active filer - Keep SetFilerClient() for backward compatibility (marked deprecated) - Update S3ApiServer to pass FilerClient.GetCurrentFiler to store Benefits: - Credential store now uses currently active/healthy filer - Automatic failover when filer becomes unavailable - True HA for credential operations - Backward compatible with old SetFilerClient interface This addresses the credential store limitation - no longer pinned to first filer, uses FilerClient's tracked current active filer. * Clarify multipart URL comments: filer address not used for uploads Update comments to reflect that multipart upload URLs are not actually used for upload traffic - uploads go directly to volume servers. Key clarifications: - genPartUploadUrl: Filer address is parsed out, only path is used - CompleteMultipartUpload Location: Informational field per AWS S3 spec - Actual uploads bypass filer proxy and go directly to volume servers The filer address in these URLs is NOT a HA concern because: 1. Part uploads: URL is parsed for path, upload goes to volume servers 2. Location URL: Informational only, clients use S3 endpoint This addresses the observation that S3 uploads don't go through filers, only metadata operations do. * Remove filer address from upload paths - pass path directly Eliminate unnecessary filer address from upload URLs by passing file paths directly instead of full URLs that get immediately parsed. Changes: - Rename genPartUploadUrl() → genPartUploadPath() (returns path only) - Rename toFilerUrl() → toFilerPath() (returns path only) - Update putToFiler() to accept filePath instead of uploadUrl - Remove URL parsing code (no longer needed) - Remove net/url import (no longer used) - Keep old function names as deprecated wrappers for compatibility Benefits: - Cleaner code - no fake URL construction/parsing - No dependency on filer address for internal operations - More accurate naming (these are paths, not URLs) - Eliminates confusion about HA concerns This completely removes the filer address from upload operations - it was never actually used for routing, only parsed for the path. * Remove deprecated functions: use new path-based functions directly Remove deprecated wrapper functions and update all callers to use the new function names directly. Removed: - genPartUploadUrl() → all callers now use genPartUploadPath() - toFilerUrl() → all callers now use toFilerPath() - SetFilerClient() → removed along with fallback code Updated: - s3api_object_handlers_multipart.go: uploadUrl → filePath - s3api_object_handlers_put.go: uploadUrl → filePath, versionUploadUrl → versionFilePath - s3api_object_versioning.go: toFilerUrl → toFilerPath - s3api_object_handlers_test.go: toFilerUrl → toFilerPath - auth_credentials.go: removed SetFilerClient fallback - filer_etc_store.go: removed deprecated SetFilerClient method Benefits: - Cleaner codebase with no deprecated functions - All variable names accurately reflect that they're paths, not URLs - Single interface for credential stores (SetFilerAddressFunc only) All code now consistently uses the new path-based approach. * Fix toFilerPath: remove URL escaping for raw file paths The toFilerPath function should return raw file paths, not URL-escaped paths. URL escaping was needed when the path was embedded in a URL (old toFilerUrl), but now that we pass paths directly to putToFiler, they should be unescaped. This fixes S3 integration test failures: - test_bucket_listv2_encoding_basic - test_bucket_list_encoding_basic - test_bucket_listv2_delimiter_whitespace - test_bucket_list_delimiter_whitespace The tests were failing because paths were double-encoded (escaped when stored, then escaped again when listed), resulting in %252B instead of %2B for '+' characters. Root cause: When we removed URL parsing in putToFiler, we should have also removed URL escaping in toFilerPath since paths are now used directly without URL encoding/decoding. * Add thread safety to FilerEtcStore and clarify credential store comments Address review suggestions for better thread safety and code clarity: 1. Thread Safety: Add RWMutex to FilerEtcStore - Protects filerAddressFunc and grpcDialOption from concurrent access - Initialize() uses write lock when setting function - SetFilerAddressFunc() uses write lock - withFilerClient() uses read lock to get function and dial option - GetPolicies() uses read lock to check if configured 2. Improved Error Messages: - Prefix errors with "filer_etc:" for easier debugging - "filer address not configured" → "filer_etc: filer address function not configured" - "filer address is empty" → "filer_etc: filer address is empty" 3. Clarified Comments: - auth_credentials.go: Clarify that initial setup is temporary - Document that it's updated in s3api_server.go after FilerClient creation - Remove ambiguity about when FilerClient.GetCurrentFiler is used Benefits: - Safe for concurrent credential operations - Clear error messages for debugging - Explicit documentation of initialization order * Enable filer discovery: pass master addresses to FilerClient Fix two critical issues: 1. Filer Discovery Not Working: Master client was not being passed to FilerClient, so peer discovery couldn't work 2. Credential Store Design: Already uses FilerClient via GetCurrentFiler function - this is the correct design for HA Changes: Command (s3.go): - Read master addresses from GetFilerConfiguration response - Pass masterAddresses to S3ApiServerOption - Log master addresses for visibility S3ApiServerOption: - Add Masters []pb.ServerAddress field for discovery S3ApiServer: - Create MasterClient from Masters when available - Pass MasterClient + FilerGroup to FilerClient via options - Enable discovery with 5-minute refresh interval - Log whether discovery is enabled or disabled Credential Store: - Already correctly uses filerClient.GetCurrentFiler via function - This provides HA without tight coupling to FilerClient struct - Function-based design is clean and thread-safe Discovery Flow: 1. S3 command reads filer config → gets masters + filer group 2. S3ApiServer creates MasterClient from masters 3. FilerClient uses MasterClient to query for peer filers 4. Background goroutine refreshes peer list every 5 minutes 5. Credential store uses GetCurrentFiler to get active filer Now filer discovery actually works! �� * Use S3 endpoint in multipart Location instead of filer address * Add multi-filer failover to ReadFilerConf * Address CodeRabbit review: fix buffer reuse and improve lock safety Address two code review suggestions: 1. Fix buffer reuse in ReadFilerConfFromFilers: - Use local []byte data instead of shared buffer - Prevents partial data from failed attempts affecting successful reads - Creates fresh buffer inside callback for masterClient path - More robust to future changes in read helpers 2. Improve lock safety in FilerClient: - Add WithHealth variants that accept health pointer - Get health pointer while holding lock, then release before calling - Eliminates potential for lock confusion (though no actual deadlock existed) - Clearer separation: lock for data access, atomics for health ops Changes: - ReadFilerConfFromFilers: var data []byte, create buf inside callback - shouldSkipUnhealthyFilerWithHealth(health filerHealth) - recordFilerSuccessWithHealth(health filerHealth) - recordFilerFailureWithHealth(health filerHealth) - Keep old functions for backward compatibility (marked deprecated) - Update LookupVolumeIds to use WithHealth variants Benefits: - More robust multi-filer configuration reading - Clearer lock vs atomic operation boundaries - No lock held during health checks (even though atomics don't block) - Better code organization and maintainability * add constant * Fix IAM manager and post policy to use current active filer * Fix critical race condition and goroutine leak * Update weed/s3api/filer_multipart.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Fix compilation error and address code review suggestions Address remaining unresolved comments: 1. Fix compilation error: Add missing net/url import - filer_multipart.go used url.PathEscape without import - Added "net/url" to imports 2. Fix Location URL formatting (all 4 occurrences): - Add missing slash between bucket and key - Use url.PathEscape for bucket names - Use urlPathEscape for object keys - Handles special characters in bucket/key names - Before: http://host/bucketkey - After: http://host/bucket/key (properly escaped) 3. Optimize discovery loop (O(NM) → O(N+M)): - Use map for existing filers (O(1) lookup) - Reduces time holding write lock - Better performance with many filers - Before: Nested loop for each discovered filer - After: Build map once, then O(1) lookups Changes: - filer_multipart.go: Import net/url, fix all Location URLs - filer_client.go: Use map for efficient filer discovery Benefits: - Compiles successfully - Proper URL encoding (handles spaces, special chars) - Faster discovery with less lock contention - Production-ready URL formatting Fix race conditions and make Close() idempotent Address CodeRabbit review #3512078995: 1. Critical: Fix unsynchronized read in error message - Line 584 read len(fc.filerAddresses) without lock - Race with refreshFilerList appending to slice - Fixed: Take RLock to read length safely - Prevents race detector warnings 2. Important: Make Close() idempotent - Closing already-closed channel panics - Can happen with layered cleanup in shutdown paths - Fixed: Use sync.Once to ensure single close - Safe to call Close() multiple times now 3. Nitpick: Add warning for empty filer address - getFilerAddress() can return empty string - Helps diagnose unexpected state - Added: Warning log when no filers available 4. Nitpick: Guard deprecated index-based helpers - shouldSkipUnhealthyFiler, recordFilerSuccess/Failure - Accessed filerHealth without lock (races with discovery) - Fixed: Take RLock and check bounds before array access - Prevents index out of bounds and races Changes: - filer_client.go: - Add closeDiscoveryOnce sync.Once field - Use Do() in Close() for idempotent channel close - Add RLock guards to deprecated index-based helpers - Add bounds checking to prevent panics - Synchronized read of filerAddresses length in error - s3api_server.go: - Add warning log when getFilerAddress returns empty Benefits: - No race conditions (passes race detector) - No panic on double-close - Better error diagnostics - Safe with discovery enabled - Production-hardened shutdown logic * Fix hardcoded http scheme and add panic recovery Address CodeRabbit review #3512114811: 1. Major: Fix hardcoded http:// scheme in Location URLs - Location URLs always used http:// regardless of client connection - HTTPS clients got http:// URLs (incorrect) - Fixed: Detect scheme from request - Check X-Forwarded-Proto header (for proxies) first - Check r.TLS != nil for direct HTTPS - Fallback to http for plain connections - Applied to all 4 CompleteMultipartUploadResult locations 2. Major: Add panic recovery to discovery goroutine - Long-running background goroutine could crash entire process - Panic in refreshFilerList would terminate program - Fixed: Add defer recover() with error logging - Goroutine failures now logged, not fatal 3. Note: Close() idempotency already implemented - Review flagged as duplicate issue - Already fixed in commit 3d7a65c7e - sync.Once (closeDiscoveryOnce) prevents double-close panic - Safe to call Close() multiple times Changes: - filer_multipart.go: - Add getRequestScheme() helper function - Update all 4 Location URLs to use dynamic scheme - Format: scheme://host/bucket/key (was: http://...) - filer_client.go: - Add panic recovery to discoverFilers() - Log panics instead of crashing Benefits: - Correct scheme (https/http) in Location URLs - Works behind proxies (X-Forwarded-Proto) - No process crashes from discovery failures - Production-hardened background goroutine - Proper AWS S3 API compliance * Fix S3 WithFilerClient to use filer failover Critical fix for multi-filer deployments: Problem: - S3ApiServer.WithFilerClient() was creating direct connections to ONE filer - Used pb.WithGrpcClient() with single filer address - No failover - if that filer failed, ALL operations failed - Caused test failures: "bucket directory not found" - IAM Integration Tests failing with 500 Internal Error Root Cause: - WithFilerClient bypassed filerClient connection management - Always connected to getFilerAddress() (current filer only) - Didn't retry other filers on failure - All getEntry(), updateEntry(), etc. operations failed if current filer down Solution: 1. Added FilerClient.GetAllFilers() method - Returns snapshot of all filer addresses - Thread-safe copy to avoid races 2. Implemented withFilerClientFailover() - Try current filer first (fast path) - On failure, try all other filers - Log successful failover - Return error only if ALL filers fail 3. Updated WithFilerClient() - Use filerClient for failover when available - Fallback to direct connection for testing/init Impact: ✅ All S3 operations now support multi-filer failover ✅ Bucket metadata reads work with any available filer ✅ Entry operations (getEntry, updateEntry) failover automatically ✅ IAM tests should pass now ✅ Production-ready HA support Files Changed: - wdclient/filer_client.go: Add GetAllFilers() method - s3api/s3api_handlers.go: Implement failover logic This fixes the test failure where bucket operations failed when the primary filer was temporarily unavailable during cleanup. * Update current filer after successful failover Address code review: https://github.com/seaweedfs/seaweedfs/pull/7550#pullrequestreview-3512223723 Issue: After successful failover, the current filer index was not updated. This meant every subsequent request would still try the (potentially unhealthy) original filer first, then failover again. Solution: 1. Added FilerClient.SetCurrentFiler(addr) method: - Finds the index of specified filer address - Atomically updates filerIndex to point to it - Thread-safe with RLock 2. Call SetCurrentFiler after successful failover: - Update happens immediately after successful connection - Future requests start with the known-healthy filer - Reduces unnecessary failover attempts Benefits: ✅ Subsequent requests use healthy filer directly ✅ No repeated failover to same unhealthy filer ✅ Better performance - fast path hits healthy filer ✅ Comment now matches actual behavior * Integrate health tracking with S3 failover Address code review suggestion to leverage existing health tracking instead of simple iteration through all filers. Changes: 1. Added address-based health tracking API to FilerClient: - ShouldSkipUnhealthyFiler(addr) - check circuit breaker - RecordFilerSuccess(addr) - reset failure count - RecordFilerFailure(addr) - increment failure count These methods find the filer by address and delegate to existing WithHealth methods for actual health management. 2. Updated withFilerClientFailover to use health tracking: - Record success/failure for every filer attempt - Skip unhealthy filers during failover (circuit breaker) - Only try filers that haven't exceeded failure threshold - Automatic re-check after reset timeout Benefits:* ✅ Circuit breaker prevents wasting time on known-bad filers ✅ Health tracking shared across all operations ✅ Automatic recovery when unhealthy filers come back ✅ Reduced latency - skip filers in failure state ✅ Better visibility with health metrics Behavior: - Try current filer first (fast path) - If fails, record failure and try other HEALTHY filers - Skip filers with failureCount >= threshold (default 3) - Re-check unhealthy filers after resetTimeout (default 30s) - Record all successes/failures for health tracking * Update weed/wdclient/filer_client.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Enable filer discovery with empty filerGroup Empty filerGroup is a valid value representing the default group. The master client can discover filers even when filerGroup is empty. Change: - Remove the filerGroup != "" check in NewFilerClient - Keep only masterClient != nil check - Empty string will be passed to ListClusterNodes API as-is This enables filer discovery to work with the default group. --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-26 11:29:55 -08:00
Chris Lu	c1b8d4bf0d	S3: adds FilerClient to use cached volume id (#7518 ) * adds FilerClient to use cached volume id * refactor: MasterClient embeds vidMapClient to eliminate ~150 lines of duplication - Create masterVolumeProvider that implements VolumeLocationProvider - MasterClient now embeds vidMapClient instead of maintaining duplicate cache logic - Removed duplicate methods: LookupVolumeIdsWithFallback, getStableVidMap, etc. - MasterClient still receives real-time updates via KeepConnected streaming - Updates call inherited addLocation/deleteLocation from vidMapClient - Benefits: DRY principle, shared singleflight, cache chain logic reused - Zero behavioral changes - only architectural improvement * refactor: mount uses FilerClient for efficient volume location caching - Add configurable vidMap cache size (default: 5 historical snapshots) - Add FilerClientOption struct for clean configuration * GrpcTimeout: default 5 seconds (prevents hanging requests) * UrlPreference: PreferUrl or PreferPublicUrl * CacheSize: number of historical vidMap snapshots (for volume moves) - NewFilerClient uses option struct for better API extensibility - Improved error handling in filerVolumeProvider.LookupVolumeIds: * Distinguish genuine 'not found' from communication failures * Log volumes missing from filer response * Return proper error context with volume count * Document that filer Locations lacks Error field (unlike master) - FilerClient.GetLookupFileIdFunction() handles URL preference automatically - Mount (WFS) creates FilerClient with appropriate options - Benefits for weed mount: * Singleflight: Deduplicates concurrent volume lookups * Cache history: Old volume locations available briefly when volumes move * Configurable cache depth: Tune for different deployment environments * Battle-tested vidMap cache with cache chain * Better concurrency handling with timeout protection * Improved error visibility and debugging - Old filer.LookupFn() kept for backward compatibility - Performance improvement for mount operations with high concurrency * fix: prevent vidMap swap race condition in LookupFileIdWithFallback - Hold vidMapLock.RLock() during entire vm.LookupFileId() call - Prevents resetVidMap() from swapping vidMap mid-operation - Ensures atomic access to the current vidMap instance - Added documentation warnings to getStableVidMap() about swap risks - Enhanced withCurrentVidMap() documentation for clarity This fixes a subtle race condition where: 1. Thread A: acquires lock, gets vm pointer, releases lock 2. Thread B: calls resetVidMap(), swaps vc.vidMap 3. Thread A: calls vm.LookupFileId() on old/stale vidMap While the old vidMap remains valid (in cache chain), holding the lock ensures we consistently use the current vidMap for the entire operation. * fix: FilerClient supports multiple filer addresses for high availability Critical fix: FilerClient now accepts []ServerAddress instead of single address - Prevents mount failure when first filer is down (regression fix) - Implements automatic failover to remaining filers - Uses round-robin with atomic index tracking (same pattern as WFS.WithFilerClient) - Retries all configured filers before giving up - Updates successful filer index for future requests Changes: - NewFilerClient([]pb.ServerAddress, ...) instead of (pb.ServerAddress, ...) - filerVolumeProvider references FilerClient for failover access - LookupVolumeIds tries all filers with util.Retry pattern - Mount passes all option.FilerAddresses for HA - S3 wraps single filer in slice for API consistency This restores the high availability that existed in the old implementation where mount would automatically failover between configured filers. * fix: restore leader change detection in KeepConnected stream loop Critical fix: Leader change detection was accidentally removed from the streaming loop - Master can announce leader changes during an active KeepConnected stream - Without this check, client continues talking to non-leader until connection breaks - This can lead to stale data or operational errors The check needs to be in TWO places: 1. Initial response (lines 178-187): Detect redirect on first connect 2. Stream loop (lines 203-209): Detect leader changes during active stream Restored the loop check that was accidentally removed during refactoring. This ensures the client immediately reconnects to new leader when announced. * improve: address code review findings on error handling and documentation 1. Master provider now preserves per-volume errors - Surface detailed errors from master (e.g., misconfiguration, deletion) - Return partial results with aggregated errors using errors.Join - Callers can now distinguish specific volume failures from general errors - Addresses issue of losing vidLoc.Error details 2. Document GetMaster initialization contract - Add comprehensive documentation explaining blocking behavior - Clarify that KeepConnectedToMaster must be started first - Provide typical initialization pattern example - Prevent confusing timeouts during warm-up 3. Document partial results API contract - LookupVolumeIdsWithFallback explicitly documents partial results - Clear examples of how to handle result + error combinations - Helps prevent callers from discarding valid partial results 4. Add safeguards to legacy filer.LookupFn - Add deprecation warning with migration guidance - Implement simple 10,000 entry cache limit - Log warning when limit reached - Recommend wdclient.FilerClient for new code - Prevents unbounded memory growth in long-running processes These changes improve API clarity and operational safety while maintaining backward compatibility. * fix: handle partial results correctly in LookupVolumeIdsWithFallback callers Two callers were discarding partial results by checking err before processing the result map. While these are currently single-volume lookups (so partial results aren't possible), the code was fragile and would break if we ever batched multiple volumes together. Changes: - Check result map FIRST, then conditionally check error - If volume is found in result, use it (ignore errors about other volumes) - If volume is NOT found and err != nil, include error context with %w - Add defensive comments explaining the pattern for future maintainers This makes the code: 1. Correct for future batched lookups 2. More informative (preserves underlying error details) 3. Consistent with filer_grpc_server.go which already handles this correctly Example: If looking up ["1", "2", "999"] and only 999 fails, callers looking for volumes 1 or 2 will succeed instead of failing unnecessarily. * improve: address remaining code review findings 1. Lazy initialize FilerClient in mount for proxy-only setups - Only create FilerClient when VolumeServerAccess != "filerProxy" - Avoids wasted work when all reads proxy through filer - filerClient is nil for proxy mode, initialized for direct access 2. Fix inaccurate deprecation comment in filer.LookupFn - Updated comment to reflect current behavior (10k bounded cache) - Removed claim of "unbounded growth" after adding size limit - Still directs new code to wdclient.FilerClient for better features 3. Audit all MasterClient usages for KeepConnectedToMaster - Verified all production callers start KeepConnectedToMaster early - Filer, Shell, Master, Broker, Benchmark, Admin all correct - IAM creates MasterClient but never uses it (harmless) - Test code doesn't need KeepConnectedToMaster (mocks) All callers properly follow the initialization pattern documented in GetMaster(), preventing unexpected blocking or timeouts. * fix: restore observability instrumentation in MasterClient During the refactoring, several important stats counters and logging statements were accidentally removed from tryConnectToMaster. These are critical for monitoring and debugging the health of master client connections. Restored instrumentation: 1. stats.MasterClientConnectCounter("total") - tracks all connection attempts 2. stats.MasterClientConnectCounter(FailedToKeepConnected) - when KeepConnected stream fails 3. stats.MasterClientConnectCounter(FailedToReceive) - when Recv() fails in loop 4. stats.MasterClientConnectCounter(Failed) - when overall gprcErr occurs 5. stats.MasterClientConnectCounter(OnPeerUpdate) - when peer updates detected Additionally restored peer update logging: - "+ filer@host noticed group.type address" for node additions - "- filer@host noticed group.type address" for node removals - Only logs updates matching the client's FilerGroup for noise reduction This information is valuable for: - Monitoring cluster health and connection stability - Debugging cluster membership changes - Tracking master failover and reconnection patterns - Identifying network issues between clients and masters No functional changes - purely observability restoration. * improve: implement gRPC-aware retry for FilerClient volume lookups The previous implementation used util.Retry which only retries errors containing the string "transport". This is insufficient for handling the full range of transient gRPC errors. Changes: 1. Added isRetryableGrpcError() to properly inspect gRPC status codes - Retries: Unavailable, DeadlineExceeded, ResourceExhausted, Aborted - Falls back to string matching for non-gRPC network errors 2. Replaced util.Retry with custom retry loop - 3 attempts with exponential backoff (1s, 1.5s, 2.25s) - Tries all N filers on each attempt (N3 total attempts max) - Fast-fails on non-retryable errors (NotFound, PermissionDenied, etc.) 3. Improved logging - Shows both filer attempt (x/N) and retry attempt (y/3) - Logs retry reason and wait time for debugging Benefits: - Better handling of transient gRPC failures (server restarts, load spikes) - Faster failure for permanent errors (no wasted retries) - More informative logs for troubleshooting - Maintains existing HA failover across multiple filers Example: If all 3 filers return Unavailable (server overload): - Attempt 1: try all 3 filers, wait 1s - Attempt 2: try all 3 filers, wait 1.5s - Attempt 3: try all 3 filers, fail Example: If filer returns NotFound (volume doesn't exist): - Attempt 1: try all 3 filers, fast-fail (no retry) fmt * improve: add circuit breaker to skip known-unhealthy filers The previous implementation tried all filers on every failure, including known-unhealthy ones. This wasted time retrying permanently down filers. Problem scenario (3 filers, filer0 is down): - Last successful: filer1 (saved as filerIndex=1) - Next lookup when filer1 fails: Retry 1: filer1(fail) → filer2(fail) → filer0(fail, wastes 5s timeout) Retry 2: filer1(fail) → filer2(fail) → filer0(fail, wastes 5s timeout) Retry 3: filer1(fail) → filer2(fail) → filer0(fail, wastes 5s timeout) Total wasted: 15 seconds on known-bad filer! Solution: Circuit breaker pattern - Track consecutive failures per filer (atomic int32) - Skip filers with 3+ consecutive failures - Re-check unhealthy filers every 30 seconds - Reset failure count on success New behavior: - filer0 fails 3 times → marked unhealthy - Future lookups skip filer0 for 30 seconds - After 30s, re-check filer0 (allows recovery) - If filer0 succeeds, reset failure count to 0 Benefits: 1. Avoids wasting time on known-down filers 2. Still sticks to last healthy filer (via filerIndex) 3. Allows recovery (30s re-check window) 4. No configuration needed (automatic) Implementation details: - filerHealth struct tracks failureCount (atomic) + lastFailureTime - shouldSkipUnhealthyFiler(): checks if we should skip this filer - recordFilerSuccess(): resets failure count to 0 - recordFilerFailure(): increments count, updates timestamp - Logs when skipping unhealthy filers (V(2) level) Example with circuit breaker: - filer0 down, saved filerIndex=1 (filer1 healthy) - Lookup 1: filer1(ok) → Done (0.01s) - Lookup 2: filer1(fail) → filer2(ok) → Done, save filerIndex=2 (0.01s) - Lookup 3: filer2(fail) → skip filer0 (unhealthy) → filer1(ok) → Done (0.01s) Much better than wasting 15s trying filer0 repeatedly! * fix: OnPeerUpdate should only process updates for matching FilerGroup Critical bug: The OnPeerUpdate callback was incorrectly moved outside the FilerGroup check when restoring observability instrumentation. This caused clients to process peer updates for ALL filer groups, not just their own. Problem: Before: mc.OnPeerUpdate only called for update.FilerGroup == mc.FilerGroup Bug: mc.OnPeerUpdate called for ALL updates regardless of FilerGroup Impact: - Multi-tenant deployments with separate filer groups would see cross-group updates (e.g., group A clients processing group B updates) - Could cause incorrect cluster membership tracking - OnPeerUpdate handlers (like Filer's DLM ring updates) would receive irrelevant updates from other groups Example scenario: Cluster has two filer groups: "production" and "staging" Production filer connects with FilerGroup="production" Incorrect behavior (bug): - Receives "staging" group updates - Incorrectly adds staging filers to production DLM ring - Cross-tenant data access issues Correct behavior (fixed): - Only receives "production" group updates - Only adds production filers to production DLM ring - Proper isolation between groups Fix: Moved mc.OnPeerUpdate(update, time.Now()) back INSIDE the FilerGroup check where it belongs, matching the original implementation. The logging and stats counter were already correctly scoped to matching FilerGroup, so they remain inside the if block as intended. * improve: clarify Aborted error handling in volume lookups Added documentation and logging to address the concern that codes.Aborted might not always be retryable in all contexts. Context-specific justification for treating Aborted as retryable: Volume location lookups (LookupVolume RPC) are simple, read-only operations: - No transactions - No write conflicts - No application-level state changes - Idempotent (safe to retry) In this context, Aborted is most likely caused by: - Filer restarting/recovering (transient) - Connection interrupted mid-request (transient) - Server-side resource cleanup (transient) NOT caused by: - Application-level conflicts (no writes) - Transaction failures (no transactions) - Logical errors (read-only lookup) Changes: 1. Added detailed comment explaining the context-specific reasoning 2. Added V(1) logging when treating Aborted as retryable - Helps detect misclassification if it occurs - Visible in verbose logs for troubleshooting 3. Split switch statement for clarity (one case per line) If future analysis shows Aborted should not be retried, operators will now have visibility via logs to make that determination. The logging provides evidence for future tuning decisions. Alternative approaches considered but not implemented: - Removing Aborted entirely (too conservative for read-only ops) - Message content inspection (adds complexity, no known patterns yet) - Different handling per RPC type (premature optimization) * fix: IAM server must start KeepConnectedToMaster for masterClient usage The IAM server creates and uses a MasterClient but never started KeepConnectedToMaster, which could cause blocking if IAM config files have chunks requiring volume lookups. Problem flow: NewIamApiServerWithStore() → creates masterClient → ❌ NEVER starts KeepConnectedToMaster GetS3ApiConfigurationFromFiler() → filer.ReadEntry(iama.masterClient, ...) → StreamContent(masterClient, ...) if file has chunks → masterClient.GetLookupFileIdFunction() → GetMaster(ctx) ← BLOCKS indefinitely waiting for connection! While IAM config files (identity & policies) are typically small and stored inline without chunks, the code path exists and would block if the files ever had chunks. Fix: Start KeepConnectedToMaster in background goroutine right after creating masterClient, following the documented pattern: mc := wdclient.NewMasterClient(...) go mc.KeepConnectedToMaster(ctx) This ensures masterClient is usable if ReadEntry ever needs to stream chunked content from volume servers. Note: This bug was dormant because IAM config files are small (<256 bytes) and SeaweedFS stores small files inline in Entry.Content, not as chunks. The bug would only manifest if: - IAM config grew > 256 bytes (inline threshold) - Config was stored as chunks on volume servers - ReadEntry called StreamContent - GetMaster blocked indefinitely Now all 9 production MasterClient instances correctly follow the pattern. * fix: data race on filerHealth.lastFailureTime in circuit breaker The circuit breaker tracked lastFailureTime as time.Time, which was written in recordFilerFailure and read in shouldSkipUnhealthyFiler without synchronization, causing a data race. Data race scenario: Goroutine 1: recordFilerFailure(0) health.lastFailureTime = time.Now() // ❌ unsynchronized write Goroutine 2: shouldSkipUnhealthyFiler(0) time.Since(health.lastFailureTime) // ❌ unsynchronized read → RACE DETECTED by -race detector Fix: Changed lastFailureTime from time.Time to int64 (lastFailureTimeNs) storing Unix nanoseconds for atomic access: Write side (recordFilerFailure): atomic.StoreInt64(&health.lastFailureTimeNs, time.Now().UnixNano()) Read side (shouldSkipUnhealthyFiler): lastFailureNs := atomic.LoadInt64(&health.lastFailureTimeNs) if lastFailureNs == 0 { return false } // Never failed lastFailureTime := time.Unix(0, lastFailureNs) time.Since(lastFailureTime) > 30time.Second Benefits: - Atomic reads/writes (no data race) - Efficient (int64 is 8 bytes, always atomic on 64-bit systems) - Zero value (0) naturally means "never failed" - No mutex needed (lock-free circuit breaker) Note: sync/atomic was already imported for failureCount, so no new import needed. fix: create fresh timeout context for each filer retry attempt The timeout context was created once at function start and reused across all retry attempts, causing subsequent retries to run with progressively shorter (or expired) deadlines. Problem flow: Line 244: timeoutCtx, cancel := context.WithTimeout(ctx, 5s) defer cancel() Retry 1, filer 0: client.LookupVolume(timeoutCtx, ...) ← 5s available ✅ Retry 1, filer 1: client.LookupVolume(timeoutCtx, ...) ← 3s left Retry 1, filer 2: client.LookupVolume(timeoutCtx, ...) ← 0.5s left Retry 2, filer 0: client.LookupVolume(timeoutCtx, ...) ← EXPIRED! ❌ Result: Retries always fail with DeadlineExceeded, defeating the purpose of retries. Fix: Moved context.WithTimeout inside the per-filer loop, creating a fresh timeout context for each attempt: for x := 0; x < n; x++ { timeoutCtx, cancel := context.WithTimeout(ctx, fc.grpcTimeout) err := pb.WithGrpcFilerClient(..., func(client) { resp, err := client.LookupVolume(timeoutCtx, ...) ... }) cancel() // Clean up immediately after call } Benefits: - Each filer attempt gets full fc.grpcTimeout (default 5s) - Retries actually have time to complete - No context leaks (cancel called after each attempt) - More predictable timeout behavior Example with fix: Retry 1, filer 0: fresh 5s timeout ✅ Retry 1, filer 1: fresh 5s timeout ✅ Retry 2, filer 0: fresh 5s timeout ✅ Total max time: 3 retries × 3 filers × 5s = 45s (plus backoff) Note: The outer ctx (from caller) still provides overall cancellation if the caller cancels or times out the entire operation. * fix: always reset vidMap cache on master reconnection The previous refactoring removed the else block that resets vidMap when the first message from a newly connected master is not a VolumeLocation. Problem scenario: 1. Client connects to master-1 and builds vidMap cache 2. Master-1 fails, client connects to master-2 3. First message from master-2 is a ClusterNodeUpdate (not VolumeLocation) 4. Old code: vidMap is reset and updated ✅ 5. New code: vidMap is NOT reset ❌ 6. Result: Client uses stale cache from master-1 → data access errors Example flow with bug: Connect to master-2 First message: ClusterNodeUpdate {filer.x added} → No resetVidMap() call → vidMap still has master-1's stale volume locations → Client reads from wrong volume servers → 404 errors Fix: Restored the else block that resets vidMap when first message is not a VolumeLocation: if resp.VolumeLocation != nil { // ... check leader, reset, and update ... } else { // First message is ClusterNodeUpdate or other type // Must still reset to avoid stale data mc.resetVidMap() } This ensures the cache is always cleared when establishing a new master connection, regardless of what the first message type is. Root cause: During the vidMapClient refactoring, this else block was accidentally dropped, making failover behavior fragile and non-deterministic (depends on which message type arrives first from the new master). Impact: - High severity for master failover scenarios - Could cause read failures, 404s, or wrong data access - Only manifests when first message is not VolumeLocation * fix: goroutine and connection leak in IAM server shutdown The IAM server's KeepConnectedToMaster goroutine used context.Background(), which is non-cancellable, causing the goroutine and its gRPC connections to leak on server shutdown. Problem: go masterClient.KeepConnectedToMaster(context.Background()) - context.Background() never cancels - KeepConnectedToMaster goroutine runs forever - gRPC connection to master stays open - No way to stop cleanly on server shutdown Result: Resource leaks when IAM server is stopped Fix: 1. Added shutdownContext and shutdownCancel to IamApiServer struct 2. Created cancellable context in NewIamApiServerWithStore: shutdownCtx, shutdownCancel := context.WithCancel(context.Background()) 3. Pass shutdownCtx to KeepConnectedToMaster: go masterClient.KeepConnectedToMaster(shutdownCtx) 4. Added Shutdown() method to invoke cancel: func (iama IamApiServer) Shutdown() { if iama.shutdownCancel != nil { iama.shutdownCancel() } } 5. Stored masterClient reference on IamApiServer for future use Benefits: - Goroutine stops cleanly when Shutdown() is called - gRPC connections are closed properly - No resource leaks on server restart/stop - Shutdown() is idempotent (safe to call multiple times) Usage (for future graceful shutdown): iamServer, _ := iamapi.NewIamApiServer(...) defer iamServer.Shutdown() // or in signal handler: sigChan := make(chan os.Signal, 1) signal.Notify(sigChan, syscall.SIGTERM, syscall.SIGINT) go func() { <-sigChan iamServer.Shutdown() os.Exit(0) }() Note: Current command implementations (weed/command/iam.go) don't have shutdown paths yet, but this makes IAM server ready for proper lifecycle management when that infrastructure is added. refactor: remove unnecessary KeepMasterClientConnected wrapper in filer The Filer.KeepMasterClientConnected() method was an unnecessary wrapper that just forwarded to MasterClient.KeepConnectedToMaster(). This wrapper added no value and created inconsistency with other components that call KeepConnectedToMaster directly. Removed: filer.go:178-180 func (fs Filer) KeepMasterClientConnected(ctx context.Context) { fs.MasterClient.KeepConnectedToMaster(ctx) } Updated caller: filer_server.go:181 - go fs.filer.KeepMasterClientConnected(context.Background()) + go fs.filer.MasterClient.KeepConnectedToMaster(context.Background()) Benefits: - Consistent with other components (S3, IAM, Shell, Mount) - Removes unnecessary indirection - Clearer that KeepConnectedToMaster runs in background goroutine - Follows the documented pattern from MasterClient.GetMaster() Note: shell/commands.go was verified and already correctly starts KeepConnectedToMaster in a background goroutine (shell_liner.go:51): go commandEnv.MasterClient.KeepConnectedToMaster(ctx) fix: use client ID instead of timeout for gRPC signature parameter The pb.WithGrpcFilerClient signature parameter is meant to be a client identifier for logging and tracking (added as 'sw-client-id' gRPC metadata in streaming mode), not a timeout value. Problem: timeoutMs := int32(fc.grpcTimeout.Milliseconds()) // 5000 (5 seconds) err := pb.WithGrpcFilerClient(false, timeoutMs, filerAddress, ...) - Passing timeout (5000ms) as signature/client ID - Misuse of API: signature should be a unique client identifier - Timeout is already handled by timeoutCtx passed to gRPC call - Inconsistent with other callers (all use 0 or proper client ID) How WithGrpcFilerClient uses signature parameter: func WithGrpcClient(..., signature int32, ...) { if streamingMode && signature != 0 { md := metadata.New(map[string]string{"sw-client-id": fmt.Sprintf("%d", signature)}) ctx = metadata.NewOutgoingContext(ctx, md) } ... } It's for client identification, not timeout control! Fix: 1. Added clientId int32 field to FilerClient struct 2. Initialize with rand.Int31() in NewFilerClient for unique ID 3. Removed timeoutMs variable (and misleading comment) 4. Use fc.clientId in pb.WithGrpcFilerClient call Before: err := pb.WithGrpcFilerClient(false, timeoutMs, ...) ^^^^^^^^^ Wrong! (5000) After: err := pb.WithGrpcFilerClient(false, fc.clientId, ...) ^^^^^^^^^^^^ Correct! (random int31) Benefits: - Correct API usage (signature = client ID, not timeout) - Timeout still works via timeoutCtx (unchanged) - Consistent with other pb.WithGrpcFilerClient callers - Enables proper client tracking on filer side via gRPC metadata - Each FilerClient instance has unique ID for debugging Examples of correct usage elsewhere: weed/iamapi/iamapi_server.go:145 pb.WithGrpcFilerClient(false, 0, ...) weed/command/s3.go:215 pb.WithGrpcFilerClient(false, 0, ...) weed/shell/commands.go:110 pb.WithGrpcFilerClient(streamingMode, 0, ...) All use 0 (or a proper signature), not a timeout value. * fix: add timeout to master volume lookup to prevent indefinite blocking The masterVolumeProvider.LookupVolumeIds method was using the context directly without a timeout, which could cause it to block indefinitely if the master is slow to respond or unreachable. Problem: err := pb.WithMasterClient(false, p.masterClient.GetMaster(ctx), ...) resp, err := client.LookupVolume(ctx, &master_pb.LookupVolumeRequest{...}) - No timeout on gRPC call to master - Could block indefinitely if master is unresponsive - Inconsistent with FilerClient which uses 5s timeout - This is a fallback path (cache miss) but still needs protection Scenarios where this could hang: 1. Master server under heavy load (slow response) 2. Network issues between client and master 3. Master server hung or deadlocked 4. Master in process of shutting down Fix: timeoutCtx, cancel := context.WithTimeout(ctx, 5time.Second) defer cancel() err := pb.WithMasterClient(false, p.masterClient.GetMaster(timeoutCtx), ...) resp, err := client.LookupVolume(timeoutCtx, &master_pb.LookupVolumeRequest{...}) Benefits: - Prevents indefinite blocking on master lookup - Consistent with FilerClient timeout pattern (5 seconds) - Faster failure detection when master is unresponsive - Caller's context still honored (timeout is in addition, not replacement) - Improves overall system resilience Note: 5 seconds is a reasonable default for volume lookups: - Long enough for normal master response (~10-50ms) - Short enough to fail fast on issues - Matches FilerClient's grpcTimeout default purge * refactor: address code review feedback on comments and style Fixed several code quality issues identified during review: 1. Corrected backoff algorithm description in filer_client.go: - Changed "Exponential backoff" to "Multiplicative backoff with 1.5x factor" - The formula waitTime * 3/2 produces 1s, 1.5s, 2.25s, not exponential 2^n - More accurate terminology prevents confusion 2. Removed redundant nil check in vidmap_client.go: - After the for loop, node is guaranteed to be non-nil - Loop either returns early or assigns non-nil value to node - Simplified: if node != nil { node.cache.Store(nil) } → node.cache.Store(nil) 3. Added startup logging to IAM server for consistency: - Log when master client connection starts - Matches pattern in S3ApiServer (line 100 in s3api_server.go) - Improves operational visibility during startup - Added missing glog import 4. Fixed indentation in filer/reader_at.go: - Lines 76-91 had incorrect indentation (extra tab level) - Line 93 also misaligned - Now properly aligned with surrounding code 5. Updated deprecation comment to follow Go convention: - Changed "DEPRECATED:" to "Deprecated:" (standard Go format) - Tools like staticcheck and IDEs recognize the standard format - Enables automated deprecation warnings in tooling - Better developer experience All changes are cosmetic and do not affect functionality. * fmt * refactor: make circuit breaker parameters configurable in FilerClient The circuit breaker failure threshold (3) and reset timeout (30s) were hardcoded, making it difficult to tune the client's behavior in different deployment environments without modifying the code. Problem: func shouldSkipUnhealthyFiler(index int32) bool { if failureCount < 3 { // Hardcoded threshold return false } if time.Since(lastFailureTime) > 30time.Second { // Hardcoded timeout return false } } Different environments have different needs: - High-traffic production: may want lower threshold (2) for faster failover - Development/testing: may want higher threshold (5) to tolerate flaky networks - Low-latency services: may want shorter reset timeout (10s) - Batch processing: may want longer reset timeout (60s) Solution: 1. Added fields to FilerClientOption: - FailureThreshold int32 (default: 3) - ResetTimeout time.Duration (default: 30s) 2. Added fields to FilerClient: - failureThreshold int32 - resetTimeout time.Duration 3. Applied defaults in NewFilerClient with option override: failureThreshold := int32(3) resetTimeout := 30 time.Second if opt.FailureThreshold > 0 { failureThreshold = opt.FailureThreshold } if opt.ResetTimeout > 0 { resetTimeout = opt.ResetTimeout } 4. Updated shouldSkipUnhealthyFiler to use configurable values: if failureCount < fc.failureThreshold { ... } if time.Since(lastFailureTime) > fc.resetTimeout { ... } Benefits: ✓ Tunable for different deployment environments ✓ Backward compatible (defaults match previous hardcoded values) ✓ No breaking changes to existing code ✓ Better maintainability and flexibility Example usage: // Aggressive failover for low-latency production fc := wdclient.NewFilerClient(filers, dialOpt, dc, &wdclient.FilerClientOption{ FailureThreshold: 2, ResetTimeout: 10 * time.Second, }) // Tolerant of flaky networks in development fc := wdclient.NewFilerClient(filers, dialOpt, dc, &wdclient.FilerClientOption{ FailureThreshold: 5, ResetTimeout: 60 * time.Second, }) * retry parameters * refactor: make retry and timeout parameters configurable Made retry logic and gRPC timeouts configurable across FilerClient and MasterClient to support different deployment environments and network conditions. Problem 1: Hardcoded retry parameters in FilerClient waitTime := time.Second // Fixed at 1s maxRetries := 3 // Fixed at 3 attempts waitTime = waitTime * 3 / 2 // Fixed 1.5x multiplier Different environments have different needs: - Unstable networks: may want more retries (5) with longer waits (2s) - Low-latency production: may want fewer retries (2) with shorter waits (500ms) - Batch processing: may want exponential backoff (2x) instead of 1.5x Problem 2: Hardcoded gRPC timeout in MasterClient timeoutCtx, cancel := context.WithTimeout(ctx, 5time.Second) Master lookups may need different timeouts: - High-latency cross-region: may need 10s timeout - Local network: may use 2s timeout for faster failure detection Solution for FilerClient: 1. Added fields to FilerClientOption: - MaxRetries int (default: 3) - InitialRetryWait time.Duration (default: 1s) - RetryBackoffFactor float64 (default: 1.5) 2. Added fields to FilerClient: - maxRetries int - initialRetryWait time.Duration - retryBackoffFactor float64 3. Updated LookupVolumeIds to use configurable values: waitTime := fc.initialRetryWait maxRetries := fc.maxRetries for retry := 0; retry < maxRetries; retry++ { ... waitTime = time.Duration(float64(waitTime) fc.retryBackoffFactor) } Solution for MasterClient: 1. Added grpcTimeout field to MasterClient (default: 5s) 2. Initialize in NewMasterClient with 5 * time.Second default 3. Updated masterVolumeProvider to use p.masterClient.grpcTimeout Benefits: ✓ Tunable for different network conditions and deployment scenarios ✓ Backward compatible (defaults match previous hardcoded values) ✓ No breaking changes to existing code ✓ Consistent configuration pattern across FilerClient and MasterClient Example usage: // Fast-fail for low-latency production with stable network fc := wdclient.NewFilerClient(filers, dialOpt, dc, &wdclient.FilerClientOption{ MaxRetries: 2, InitialRetryWait: 500 * time.Millisecond, RetryBackoffFactor: 2.0, // Exponential backoff GrpcTimeout: 2 * time.Second, }) // Patient retries for unstable network or batch processing fc := wdclient.NewFilerClient(filers, dialOpt, dc, &wdclient.FilerClientOption{ MaxRetries: 5, InitialRetryWait: 2 * time.Second, RetryBackoffFactor: 1.5, GrpcTimeout: 10 * time.Second, }) Note: MasterClient timeout is currently set at construction time and not user-configurable via NewMasterClient parameters. Future enhancement could add a MasterClientOption struct similar to FilerClientOption. * fix: rename vicCacheLock to vidCacheLock for consistency Fixed typo in variable name for better code consistency and readability. Problem: vidCache := make(map[string]filer_pb.Locations) var vicCacheLock sync.RWMutex // Typo: vic instead of vid vicCacheLock.RLock() locations, found := vidCache[vid] vicCacheLock.RUnlock() The variable name 'vicCacheLock' is inconsistent with 'vidCache'. Both should use 'vid' prefix (volume ID) not 'vic'. Fix: Renamed all 5 occurrences: - var vicCacheLock → var vidCacheLock (line 56) - vicCacheLock.RLock() → vidCacheLock.RLock() (line 62) - vicCacheLock.RUnlock() → vidCacheLock.RUnlock() (line 64) - vicCacheLock.Lock() → vidCacheLock.Lock() (line 81) - vicCacheLock.Unlock() → vidCacheLock.Unlock() (line 91) Benefits: ✓ Consistent variable naming convention ✓ Clearer intent (volume ID cache lock) ✓ Better code readability ✓ Easier code navigation fix: use defer cancel() with anonymous function for proper context cleanup Fixed context cancellation to use defer pattern correctly in loop iteration. Problem: for x := 0; x < n; x++ { timeoutCtx, cancel := context.WithTimeout(ctx, fc.grpcTimeout) err := pb.WithGrpcFilerClient(...) cancel() // Only called on normal return, not on panic } Issues with original approach: 1. If pb.WithGrpcFilerClient panics, cancel() is never called → context leak 2. If callback returns early (though unlikely here), cleanup might be missed 3. Not following Go best practices for context.WithTimeout usage Problem with naive defer in loop: for x := 0; x < n; x++ { timeoutCtx, cancel := context.WithTimeout(ctx, fc.grpcTimeout) defer cancel() // ❌ WRONG: All defers accumulate until function returns } In Go, defer executes when the surrounding function returns, not when the loop iteration ends. This would accumulate n deferred cancel() calls and leak contexts until LookupVolumeIds returns. Solution: Wrap in anonymous function for x := 0; x < n; x++ { err := func() error { timeoutCtx, cancel := context.WithTimeout(ctx, fc.grpcTimeout) defer cancel() // ✅ Executes when anonymous function returns (per iteration) return pb.WithGrpcFilerClient(...) }() } Benefits: ✓ Context always cancelled, even on panic ✓ defer executes after each iteration (not accumulated) ✓ Follows Go best practices for context.WithTimeout ✓ No resource leaks during retry loop execution ✓ Cleaner error handling Reference: Go documentation for context.WithTimeout explicitly shows: ctx, cancel := context.WithTimeout(...) defer cancel() This is the idiomatic pattern that should always be followed. * Can't use defer directly in loop * improve: add data center preference and URL shuffling for consistent performance Added missing data center preference and load distribution (URL shuffling) to ensure consistent performance and behavior across all code paths. Problem 1: PreferPublicUrl path missing DC preference and shuffling Location: weed/wdclient/filer_client.go lines 184-192 The custom PreferPublicUrl implementation was simply iterating through locations and building URLs without considering: 1. Data center proximity (latency optimization) 2. Load distribution across volume servers Before: for _, loc := range locations { url := loc.PublicUrl if url == "" { url = loc.Url } fullUrls = append(fullUrls, "http://"+url+"/"+fileId) } return fullUrls, nil After: var sameDcUrls, otherDcUrls []string dataCenter := fc.GetDataCenter() for _, loc := range locations { url := loc.PublicUrl if url == "" { url = loc.Url } httpUrl := "http://" + url + "/" + fileId if dataCenter != "" && dataCenter == loc.DataCenter { sameDcUrls = append(sameDcUrls, httpUrl) } else { otherDcUrls = append(otherDcUrls, httpUrl) } } rand.Shuffle(len(sameDcUrls), ...) rand.Shuffle(len(otherDcUrls), ...) fullUrls = append(sameDcUrls, otherDcUrls...) Problem 2: Cache miss path missing URL shuffling Location: weed/wdclient/vidmap_client.go lines 95-108 The cache miss path (fallback lookup) was missing URL shuffling, while the cache hit path (vm.LookupFileId) already shuffles URLs. This inconsistency meant: - Cache hit: URLs shuffled → load distributed - Cache miss: URLs not shuffled → first server always hit Before: var sameDcUrls, otherDcUrls []string // ... build URLs ... fullUrls = append(sameDcUrls, otherDcUrls...) return fullUrls, nil After: var sameDcUrls, otherDcUrls []string // ... build URLs ... rand.Shuffle(len(sameDcUrls), ...) rand.Shuffle(len(otherDcUrls), ...) fullUrls = append(sameDcUrls, otherDcUrls...) return fullUrls, nil Benefits: ✓ Reduced latency by preferring same-DC volume servers ✓ Even load distribution across all volume servers ✓ Consistent behavior between cache hit/miss paths ✓ Consistent behavior between PreferUrl and PreferPublicUrl ✓ Matches behavior of existing vidMap.LookupFileId implementation Impact on performance: - Lower read latency (same-DC preference) - Better volume server utilization (load spreading) - No single volume server becomes a hotspot Note: Added math/rand import to vidmap_client.go for shuffle support. * Update weed/wdclient/masterclient.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * improve: call IAM server Shutdown() for best-effort cleanup Added call to iamApiServer.Shutdown() to ensure cleanup happens when possible, and documented the limitations of the current approach. Problem: The Shutdown() method was defined in IamApiServer but never called anywhere, meaning the KeepConnectedToMaster goroutine would continue running even when the IAM server stopped, causing resource leaks. Changes: 1. Store iamApiServer instance in weed/command/iam.go - Changed: _, iamApiServer_err := iamapi.NewIamApiServer(...) - To: iamApiServer, iamApiServer_err := iamapi.NewIamApiServer(...) 2. Added defer call for best-effort cleanup - defer iamApiServer.Shutdown() - This will execute if startIamServer() returns normally 3. Added logging in Shutdown() method - Log when shutdown is triggered for visibility 4. Documented limitations and future improvements - Added note that defer only works for normal function returns - SeaweedFS commands don't currently have signal handling - Suggested future enhancement: add SIGTERM/SIGINT handling Current behavior: - ✓ Cleanup happens if HTTP server fails to start (glog.Fatalf path) - ✓ Cleanup happens if Serve() returns with error (unlikely) - ✗ Cleanup does NOT happen on SIGTERM/SIGINT (process killed) The last case is a limitation of the current command architecture - all SeaweedFS commands (s3, filer, volume, master, iam) lack signal handling for graceful shutdown. This is a systemic issue that affects all services. Future enhancement: To properly handle SIGTERM/SIGINT, the command layer would need: sigChan := make(chan os.Signal, 1) signal.Notify(sigChan, syscall.SIGTERM, syscall.SIGINT) go func() { httpServer.Serve(listener) // Non-blocking }() <-sigChan glog.V(0).Infof("Received shutdown signal") iamApiServer.Shutdown() httpServer.Shutdown(context.Background()) This would require refactoring the command structure for all services, which is out of scope for this change. Benefits of current approach: ✓ Best-effort cleanup (better than nothing) ✓ Proper cleanup in error paths ✓ Documented for future improvement ✓ Consistent with how other SeaweedFS services handle lifecycle * data racing in test --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-11-20 20:50:26 -08:00
Chris Lu	ca84a8a713	S3: Directly read write volume servers (#7481 ) * Lazy Versioning Check, Conditional SSE Entry Fetch, HEAD Request Optimization * revert Reverted the conditional versioning check to always check versioning status Reverted the conditional SSE entry fetch to always fetch entry metadata Reverted the conditional versioning check to always check versioning status Reverted the conditional SSE entry fetch to always fetch entry metadata * Lazy Entry Fetch for SSE, Skip Conditional Header Check * SSE-KMS headers are present, this is not an SSE-C request (mutually exclusive) * SSE-C is mutually exclusive with SSE-S3 and SSE-KMS * refactor * Removed Premature Mutual Exclusivity Check * check for the presence of the X-Amz-Server-Side-Encryption header * not used * fmt * directly read write volume servers * HTTP Range Request Support * set header * md5 * copy object * fix sse * fmt * implement sse * sse continue * fixed the suffix range bug (bytes=-N for "last N bytes") * debug logs * Missing PartsCount Header * profiling * url encoding * test_multipart_get_part * headers * debug * adjust log level * handle part number * Update s3api_object_handlers.go * nil safety * set ModifiedTsNs * remove * nil check * fix sse header * same logic as filer * decode values * decode ivBase64 * s3: Fix SSE decryption JWT authentication and streaming errors Critical fix for SSE (Server-Side Encryption) test failures: 1. JWT Authentication Bug (Root Cause): - Changed from GenJwtForFilerServer to GenJwtForVolumeServer - S3 API now uses correct JWT when directly reading from volume servers - Matches filer's authentication pattern for direct volume access - Fixes 'unexpected EOF' and 500 errors in SSE tests 2. Streaming Error Handling: - Added error propagation in getEncryptedStreamFromVolumes goroutine - Use CloseWithError() to properly communicate stream failures - Added debug logging for streaming errors 3. Response Header Timing: - Removed premature WriteHeader(http.StatusOK) call - Let Go's http package write status automatically on first write - Prevents header lock when errors occur during streaming 4. Enhanced SSE Decryption Debugging: - Added IV/Key validation and logging for SSE-C, SSE-KMS, SSE-S3 - Better error messages for missing or invalid encryption metadata - Added glog.V(2) debugging for decryption setup This fixes SSE integration test failures where encrypted objects could not be retrieved due to volume server authentication failures. The JWT bug was causing volume servers to reject requests, resulting in truncated/empty streams (EOF) or internal errors. * s3: Fix SSE multipart upload metadata preservation Critical fix for SSE multipart upload test failures (SSE-C and SSE-KMS): Root Cause - Incomplete SSE Metadata Copying: The old code only tried to copy 'SeaweedFSSSEKMSKey' from the first part to the completed object. This had TWO bugs: 1. Wrong Constant Name (Key Mismatch Bug): - Storage uses: SeaweedFSSSEKMSKeyHeader = 'X-SeaweedFS-SSE-KMS-Key' - Old code read: SeaweedFSSSEKMSKey = 'x-seaweedfs-sse-kms-key' - Result: SSE-KMS metadata was NEVER copied → 500 errors 2. Missing SSE-C and SSE-S3 Headers: - SSE-C requires: IV, Algorithm, KeyMD5 - SSE-S3 requires: encrypted key data + standard headers - Old code: copied nothing for SSE-C/SSE-S3 → decryption failures Fix - Complete SSE Header Preservation: Now copies ALL SSE headers from first part to completed object: - SSE-C: SeaweedFSSSEIV, CustomerAlgorithm, CustomerKeyMD5 - SSE-KMS: SeaweedFSSSEKMSKeyHeader, AwsKmsKeyId, ServerSideEncryption - SSE-S3: SeaweedFSSSES3Key, ServerSideEncryption Applied consistently to all 3 code paths: 1. Versioned buckets (creates version file) 2. Suspended versioning (creates main object with null versionId) 3. Non-versioned buckets (creates main object) Why This Is Correct: The headers copied EXACTLY match what putToFiler stores during part upload (lines 496-521 in s3api_object_handlers_put.go). This ensures detectPrimarySSEType() can correctly identify encrypted multipart objects and trigger inline decryption with proper metadata. Fixes: TestSSEMultipartUploadIntegration (SSE-C and SSE-KMS subtests) * s3: Add debug logging for versioning state diagnosis Temporary debug logging to diagnose test_versioning_obj_plain_null_version_overwrite_suspended failure. Added glog.V(0) logging to show: 1. setBucketVersioningStatus: when versioning status is changed 2. PutObjectHandler: what versioning state is detected (Enabled/Suspended/none) 3. PutObjectHandler: which code path is taken (putVersionedObject vs putSuspendedVersioningObject) This will help identify if: - The versioning status is being set correctly in bucket config - The cache is returning stale/incorrect versioning state - The switch statement is correctly routing to suspended vs enabled handlers * s3: Enhanced versioning state tracing for suspended versioning diagnosis Added comprehensive logging across the entire versioning state flow: PutBucketVersioningHandler: - Log requested status (Enabled/Suspended) - Log when calling setBucketVersioningStatus - Log success/failure of status change setBucketVersioningStatus: - Log bucket and status being set - Log when config is updated - Log completion with error code updateBucketConfig: - Log versioning state being written to cache - Immediate cache verification after Set - Log if cache verification fails getVersioningState: - Log bucket name and state being returned - Log if object lock forces VersioningEnabled - Log errors This will reveal: 1. If PutBucketVersioning(Suspended) is reaching the handler 2. If the cache update succeeds 3. What state getVersioningState returns during PUT 4. Any cache consistency issues Expected to show why bucket still reports 'Enabled' after 'Suspended' call. * s3: Add SSE chunk detection debugging for multipart uploads Added comprehensive logging to diagnose why TestSSEMultipartUploadIntegration fails: detectPrimarySSEType now logs: 1. Total chunk count and extended header count 2. All extended headers with 'sse'/'SSE'/'encryption' in the name 3. For each chunk: index, SseType, and whether it has metadata 4. Final SSE type counts (SSE-C, SSE-KMS, SSE-S3) This will reveal if: - Chunks are missing SSE metadata after multipart completion - Extended headers are copied correctly from first part - The SSE detection logic is working correctly Expected to show if chunks have SseType=0 (none) or proper SSE types set. * s3: Trace SSE chunk metadata through multipart completion and retrieval Added end-to-end logging to track SSE chunk metadata lifecycle: During Multipart Completion (filer_multipart.go): 1. Log finalParts chunks BEFORE mkFile - shows SseType and metadata 2. Log versionEntry.Chunks INSIDE mkFile callback - shows if mkFile preserves SSE info 3. Log success after mkFile completes During GET Retrieval (s3api_object_handlers.go): 1. Log retrieved entry chunks - shows SseType and metadata after retrieval 2. Log detected SSE type result This will reveal at which point SSE chunk metadata is lost: - If finalParts have SSE metadata but versionEntry.Chunks don't → mkFile bug - If versionEntry.Chunks have SSE metadata but retrieved chunks don't → storage/retrieval bug - If chunks never have SSE metadata → multipart completion SSE processing bug Expected to show chunks with SseType=NONE during retrieval even though they were created with proper SseType during multipart completion. * s3: Fix SSE-C multipart IV base64 decoding bug Critical Bug Found: SSE-C multipart uploads were failing because: Root Cause: - entry.Extended[SeaweedFSSSEIV] stores base64-encoded IV (24 bytes for 16-byte IV) - SerializeSSECMetadata expects raw IV bytes (16 bytes) - During multipart completion, we were passing base64 IV directly → serialization error Error Message: "Failed to serialize SSE-C metadata for chunk in part X: invalid IV length: expected 16 bytes, got 24" Fix: - Base64-decode IV before passing to SerializeSSECMetadata - Added error handling for decode failures Impact: - SSE-C multipart uploads will now correctly serialize chunk metadata - Chunks will have proper SSE metadata for decryption during GET This fixes the SSE-C subtest of TestSSEMultipartUploadIntegration. SSE-KMS still has a separate issue (error code 23) being investigated. * fixes * kms sse * handle retry if not found in .versions folder and should read the normal object * quick check (no retries) to see if the .versions/ directory exists * skip retry if object is not found * explicit update to avoid sync delay * fix map update lock * Remove fmt.Printf debug statements * Fix SSE-KMS multipart base IV fallback to fail instead of regenerating * fmt * Fix ACL grants storage logic * header handling * nil handling * range read for sse content * test range requests for sse objects * fmt * unused code * upload in chunks * header case * fix url * bucket policy error vs bucket not found * jwt handling * fmt * jwt in request header * Optimize Case-Insensitive Prefix Check * dead code * Eliminated Unnecessary Stream Prefetch for Multipart SSE * range sse * sse * refactor * context * fmt * fix type * fix SSE-C IV Mismatch * Fix Headers Being Set After WriteHeader * fix url parsing * propergate sse headers * multipart sse-s3 * aws sig v4 authen * sse kms * set content range * better errors * Update s3api_object_handlers_copy.go * Update s3api_object_handlers.go * Update s3api_object_handlers.go * avoid magic number * clean up * Update s3api_bucket_policy_handlers.go * fix url parsing * context * data and metadata both use background context * adjust the offset * SSE Range Request IV Calculation * adjust logs * IV relative to offset in each part, not the whole file * collect logs * offset * fix offset * fix url * logs * variable * jwt * Multipart ETag semantics: conditionally set object-level Md5 for single-chunk uploads only. * sse * adjust IV and offset * multipart boundaries * ensures PUT and GET operations return consistent ETags * Metadata Header Case * CommonPrefixes Sorting with URL Encoding * always sort * remove the extra PathUnescape call * fix the multipart get part ETag * the FileChunk is created without setting ModifiedTsNs * Sort CommonPrefixes lexicographically to match AWS S3 behavior * set md5 for multipart uploads * prevents any potential data loss or corruption in the small-file inline storage path * compiles correctly * decryptedReader will now be properly closed after use * Fixed URL encoding and sort order for CommonPrefixes * Update s3api_object_handlers_list.go * SSE-x Chunk View Decryption * Different IV offset calculations for single-part vs multipart objects * still too verbose in logs * less logs * ensure correct conversion * fix listing * nil check * minor fixes * nil check * single character delimiter * optimize * range on empty object or zero-length * correct IV based on its position within that part, not its position in the entire object * adjust offset * offset Fetch FULL encrypted chunk (not just the range) Adjust IV by PartOffset/ChunkOffset only Decrypt full chunk Skip in the DECRYPTED stream to reach OffsetInChunk * look breaking * refactor * error on no content * handle intra-block byte skipping * Incomplete HTTP Response Error Handling * multipart SSE * Update s3api_object_handlers.go * address comments * less logs * handling directory * Optimized rejectDirectoryObjectWithoutSlash() to avoid unnecessary lookups * Revert "handling directory" This reverts commit 3a335f0ac33c63f51975abc63c40e5328857a74b. * constant * Consolidate nil entry checks in GetObjectHandler * add range tests * Consolidate redundant nil entry checks in HeadObjectHandler * adjust logs * SSE type * large files * large files Reverted the plain-object range test * ErrNoEncryptionConfig * Fixed SSERangeReader Infinite Loop Vulnerability * Fixed SSE-KMS Multipart ChunkReader HTTP Body Leak * handle empty directory in S3, added PyArrow tests * purge unused code * Update s3_parquet_test.py * Update requirements.txt * According to S3 specifications, when both partNumber and Range are present, the Range should apply within the selected part's boundaries, not to the full object. * handle errors * errors after writing header * https * fix: Wait for volume assignment readiness before running Parquet tests The test-implicit-dir-with-server test was failing with an Internal Error because volume assignment was not ready when tests started. This fix adds a check that attempts a volume assignment and waits for it to succeed before proceeding with tests. This ensures that: 1. Volume servers are registered with the master 2. Volume growth is triggered if needed 3. The system can successfully assign volumes for writes Fixes the timeout issue where boto3 would retry 4 times and fail with 'We encountered an internal error, please try again.' * sse tests * store derived IV * fix: Clean up gRPC ports between tests to prevent port conflicts The second test (test-implicit-dir-with-server) was failing because the volume server's gRPC port (18080 = VOLUME_PORT + 10000) was still in use from the first test. The cleanup code only killed HTTP port processes, not gRPC port processes. Added cleanup for gRPC ports in all stop targets: - Master gRPC: MASTER_PORT + 10000 (19333) - Volume gRPC: VOLUME_PORT + 10000 (18080) - Filer gRPC: FILER_PORT + 10000 (18888) This ensures clean state between test runs in CI. * add import * address comments * docs: Add placeholder documentation files for Parquet test suite Added three missing documentation files referenced in test/s3/parquet/README.md: 1. TEST_COVERAGE.md - Documents 43 total test cases (17 Go unit tests, 6 Python integration tests, 20 Python end-to-end tests) 2. FINAL_ROOT_CAUSE_ANALYSIS.md - Explains the s3fs compatibility issue with PyArrow, the implicit directory problem, and how the fix works 3. MINIO_DIRECTORY_HANDLING.md - Compares MinIO's directory handling approach with SeaweedFS's implementation Each file contains: - Title and overview - Key technical details relevant to the topic - TODO sections for future expansion These placeholder files resolve the broken README links and provide structure for future detailed documentation. * clean up if metadata operation failed * Update s3_parquet_test.py * clean up * Update Makefile * Update s3_parquet_test.py * Update Makefile * Handle ivSkip for non-block-aligned offsets * Update README.md * stop volume server faster * stop volume server in 1 second * different IV for each chunk in SSE-S3 and SSE-KMS * clean up if fails * testing upload * error propagation * fmt * simplify * fix copying * less logs * endian * Added marshaling error handling * handling invalid ranges * error handling for adding to log buffer * fix logging * avoid returning too quickly and ensure proper cleaning up * Activity Tracking for Disk Reads * Cleanup Unused Parameters * Activity Tracking for Kafka Publishers * Proper Test Error Reporting * refactoring * less logs * less logs * go fmt * guard it with if entry.Attributes.TtlSec > 0 to match the pattern used elsewhere. * Handle bucket-default encryption config errors explicitly for multipart * consistent activity tracking * obsolete code for s3 on filer read/write handlers * Update weed/s3api/s3api_object_handlers_list.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-18 23:18:35 -08:00
Chris Lu	2a9d4d1e23	Refactor data structure (#7472 ) * refactor to avoids circular dependency * converts a policy.PolicyDocument to policy_engine.PolicyDocument * convert numeric types to strings * Update weed/s3api/policy_conversion.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * refactoring * not skipping numeric and boolean values in arrays * avoid nil * edge cases * handling conversion failure The handling of unsupported types in convertToString could lead to silent policy alterations. The conversion of map-based principals in convertPrincipal is too generic and could misinterpret policies. * concise * fix doc * adjust warning * recursion * return errors * reject empty principals * better error message --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-12 23:46:52 -08:00
Chris Lu	508d06d9a5	S3: Enforce bucket policy (#7471 ) * evaluate policies during authorization * cache bucket policy * refactor * matching with regex special characters * Case Sensitivity, pattern cache, Dead Code Removal * Fixed Typo, Restored []string Case, Added Cache Size Limit * hook up with policy engine * remove old implementation * action mapping * validate * if not specified, fall through to IAM checks * fmt * Fail-close on policy evaluation errors * Explicit `Allow` bypasses IAM checks * fix error message * arn:seaweed => arn:aws * remove legacy support * fix tests * Clean up bucket policy after this test * fix for tests * address comments * security fixes * fix tests * temp comment out	2025-11-12 22:14:50 -08:00
Nial	20e0d91037	IAM: add support for advanced IAM config file to server command (#7317 ) * IAM: add support for advanced IAM config file to server command * Add support for advanced IAM config file in S3 options * Fix S3 IAM config handling to simplify checks for configuration presence * simplify * simplify again * copy the value * const --------- Co-authored-by: chrislu <chris.lu@gmail.com> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>	2025-10-28 17:30:12 -07:00
Konstantin Lebedev	bf58c5a688	fix: Use a mixed of virtual and path styles within a single subdomain (#7357 ) * fix: Use a mixed of virtual and path styles within a single subdomain * address comments * add tests --------- Co-authored-by: chrislu <chris.lu@gmail.com> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>	2025-10-24 01:45:22 -07:00
Chris Lu	f7bd75ef3b	S3: Avoid in-memory map concurrent writes in SSE-S3 key manager (#7358 ) * Fix concurrent map writes in SSE-S3 key manager This commit fixes issue #7352 where parallel uploads to SSE-S3 enabled buckets were causing 'fatal error: concurrent map writes' crashes. The SSES3KeyManager struct had an unsynchronized map that was being accessed from multiple goroutines during concurrent PUT operations. Changes: - Added sync.RWMutex to SSES3KeyManager struct - Protected StoreKey() with write lock - Protected GetKey() with read lock - Updated GetOrCreateKey() with proper read/write locking pattern including double-check to prevent race conditions All existing SSE tests pass successfully. Fixes #7352 * Improve SSE-S3 key manager with envelope encryption Replace in-memory key storage with envelope encryption using a super key (KEK). Instead of storing DEKs in a map, the key manager now: - Uses a randomly generated 256-bit super key (KEK) - Encrypts each DEK with the super key using AES-GCM - Stores the encrypted DEK in object metadata - Decrypts the DEK on-demand when reading objects Benefits: - Eliminates unbounded memory growth from caching DEKs - Provides better security with authenticated encryption (AES-GCM) - Follows envelope encryption best practices (similar to AWS KMS) - No need for mutex-protected map lookups on reads - Each object's encrypted DEK is self-contained in its metadata This approach matches the design pattern used in the local KMS provider and is more suitable for production use. * Persist SSE-S3 KEK in filer for multi-server support Store the SSE-S3 super key (KEK) in the filer at /.seaweedfs/s3/kek instead of generating it per-server. This ensures: 1. Multi-server consistency: All S3 API servers use the same KEK 2. Persistence across restarts: KEK survives server restarts 3. Centralized management: KEK stored in filer, accessible to all servers 4. Automatic initialization: KEK is created on first startup if it doesn't exist The KEK is: - Stored as hex-encoded bytes in filer - Protected with file mode 0600 (read/write for owner only) - Located in /.seaweedfs/s3/ directory (mode 0700) - Loaded on S3 API server startup - Reused across all S3 API server instances This matches the architecture of centralized configuration in SeaweedFS and enables proper SSE-S3 support in multi-server deployments. * Change KEK storage location to /etc/s3/kek Move SSE-S3 KEK from /.seaweedfs/s3/kek to /etc/s3/kek for better organization and consistency with other SeaweedFS configuration files. The /etc directory is the standard location for configuration files in SeaweedFS. * use global sse-se key manager when copying * Update volume_growth_reservation_test.go * Rename KEK file to sse_kek for clarity Changed /etc/s3/kek to /etc/s3/sse_kek to make it clear this key is specifically for SSE-S3 encryption, not for other KMS purposes. This improves clarity and avoids potential confusion with the separate KMS provider system used for SSE-KMS. * Use constants for SSE-S3 KEK directory and file name Refactored to use named constants instead of string literals: - SSES3KEKDirectory = "/etc/s3" - SSES3KEKParentDir = "/etc" - SSES3KEKDirName = "s3" - SSES3KEKFileName = "sse_kek" This improves maintainability and makes it easier to change the storage location if needed in the future. * Address PR review: Improve error handling and robustness Addresses review comments from https://github.com/seaweedfs/seaweedfs/pull/7358#pullrequestreview-3367476264 Critical fixes: 1. Distinguish between 'not found' and other errors when loading KEK - Only generate new KEK if ErrNotFound - Fail fast on connectivity/permission errors to prevent data loss - Prevents creating new KEK that would make existing data undecryptable 2. Make SSE-S3 initialization failure fatal - Return error instead of warning when initialization fails - Prevents server from running in broken state 3. Improve directory creation error handling - Only ignore 'file exists' errors - Fail on permission/connectivity errors These changes ensure the SSE-S3 key manager is robust against transient errors and prevents accidental data loss. * Fix KEK path conflict with /etc/s3 file Changed KEK storage from /etc/s3/sse_kek to /etc/seaweedfs/s3_sse_kek to avoid conflict with the circuit breaker config at /etc/s3. The /etc/s3 path is used by CircuitBreakerConfigDir and may exist as a file (circuit_breaker.json), causing the error: 'CreateEntry /etc/s3/sse_kek: /etc/s3 should be a directory' New KEK location: /etc/seaweedfs/s3_sse_kek This uses the seaweedfs subdirectory which is more appropriate for internal SeaweedFS configuration files. Fixes startup failure when /etc/s3 exists as a file. * Revert KEK path back to /etc/s3/sse_kek Changed back from /etc/seaweedfs/s3_sse_kek to /etc/s3/sse_kek as requested. The /etc/s3 directory will be created properly when it doesn't exist. * Fix directory creation with proper ModeDir flag Set FileMode to uint32(0755 \| os.ModeDir) when creating /etc/s3 directory to ensure it's created as a directory, not a file. Without the os.ModeDir flag, the entry was being created as a file, which caused the error 'CreateEntry: /etc/s3 is a file' when trying to create the KEK file inside it. Uses 0755 permissions (rwxr-xr-x) for the directory and adds os import for os.ModeDir constant.	2025-10-22 14:12:31 -07:00

1 2 3

143 Commits