seaweedFS

Author	SHA1	Message	Date
Lars Lehtonen	b01a74c6bb	Prune Unused Functions from weed/s3api (#8815 ) * weed/s3api: prune calculatePartOffset() * weed/s3api: prune clearCachedListMetadata() * weed/s3api: prune S3ApiServer.isObjectRetentionActive() * weed/s3api: prune S3ApiServer.ensureDirectoryAllEmpty() * weed/s3api: prune s3ApiServer.getEncryptionTypeString() * weed/s3api: prune newStreamError() * weed/s3api: prune S3ApiServer.rotateSSECKey() weed/s3api: prune S3ApiServer.rotateSSEKMSKey() weed/s3api: prune S3ApiServer.rotateSSEKMSMetadataOnly() weed/s3api: prune S3ApiServer.rotateSSECChunks() weed/s3api: prune S3ApiServer.rotateSSEKMSChunks() weed/s3api: prune S3ApiServer.rotateSSECChunk() weed/s3api: prune S3ApiServer.rotateSSEKMSChunk() * weed/s3api: prune addCounterToIV() * weed/s3api: prune minInt() * weed/s3api: prune isMethodActionMismatch() * weed/s3api: prune hasSpecificQueryParameters() * weed/s3api: prune handlePutToFilerError() weed/s3api: prune handlePutToFilerInternalError() weed/s3api: prune logErrorAndReturn() weed/s3api: prune logInternalError weed/s3api: prune handleSSEError() weed/s3api: prune handleSSEInternalError() * weed/s3api: prune encryptionConfigToProto() * weed/s3api: prune S3ApiServer.touch()	2026-03-28 13:24:11 -07:00
Chris Lu	f6ec9941cb	lifecycle worker: NoncurrentVersionExpiration support (#8810 ) * lifecycle worker: add NoncurrentVersionExpiration support Add version-aware scanning to the rule-based execution path. When the walker encounters a .versions directory, processVersionsDirectory(): - Lists all version entries (v_<versionId>) - Sorts by version timestamp (newest first) - Walks non-current versions with ShouldExpireNoncurrentVersion() which handles both NoncurrentDays and NewerNoncurrentVersions - Extracts successor time from version IDs (both old/new format) - Skips delete markers in noncurrent version counting - Falls back to entry Mtime when version ID timestamp is unavailable Helper functions: - sortVersionsByTimestamp: insertion sort by version ID timestamp - getEntryVersionTimestamp: extracts timestamp with Mtime fallback * lifecycle worker: address review feedback for noncurrent versions - Use sentinel errLimitReached in versions directory handler - Set NoncurrentIndex on ObjectInfo for proper NewerNoncurrentVersions evaluation * lifecycle worker: fail closed on XML parse error, guard zero Mtime - Fail closed when lifecycle XML exists but fails to parse, instead of falling back to TTL which could apply broader rules - Guard Mtime > 0 before using time.Unix(mtime, 0) to avoid mapping unset Mtime to 1970, which would misorder versions and cause premature expiration * lifecycle worker: count delete markers toward NoncurrentIndex Noncurrent delete markers should count toward the NewerNoncurrentVersions retention threshold so data versions get the correct position index. Previously, skipping delete markers without incrementing the index could retain too many versions after delete/recreate cycles. * lifecycle worker: fix version ordering, error propagation, and fail-closed scope 1. Use full version ID comparison (CompareVersionIds) for sorting .versions entries, not just decoded timestamps. Two versions with the same timestamp prefix but different random suffixes were previously misordered, potentially treating the newest version as noncurrent and deleting it. 2. Propagate .versions listing failures to the caller instead of swallowing them with (nil, 0). Transient filer errors on a .versions directory now surface in the job result. 3. Narrow the fail-closed path to only malformed lifecycle XML (errMalformedLifecycleXML). Transient filer LookupEntry errors now fall back to TTL with a warning, matching the original intent of "fail closed on bad config, not on network blips." * lifecycle worker: only skip .uploads at bucket root * lifecycle worker: sort.Slice, mixed-format test, XML presence tracking - Replace manual insertion sort with sort.Slice in sortVersionsByVersionId - Add TestCompareVersionIdsMixedFormats covering old/new format ordering - Distinguish "no lifecycle XML" (nil) from "XML present but no effective rules" (non-nil empty slice) so buckets with all-disabled rules don't incorrectly fall back to filer.conf TTL expiration * lifecycle worker: guard nil Attributes, use TrimSuffix in test - Guard entry.Attributes != nil before accessing GetFileSize() and Mtime in both listExpiredObjectsByRules and processVersionsDirectory - Use strings.TrimPrefix/TrimSuffix in TestVersionsDirectoryNaming to match the production code pattern * lifecycle worker: skip TTL scan when XML present, fix test assertions - When lifecycle XML is present but has no effective rules, skip object scanning entirely instead of falling back to TTL path - Test sort output against concrete expected names instead of re-using the same comparator as the sort itself --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-28 12:58:21 -07:00
Chris Lu	54dd4f091d	s3lifecycle: add lifecycle rule evaluator package and extend XML types (#8807 ) * s3api: extend lifecycle XML types with NoncurrentVersionExpiration, AbortIncompleteMultipartUpload Add missing S3 lifecycle rule types to the XML data model: - NoncurrentVersionExpiration with NoncurrentDays and NewerNoncurrentVersions - NoncurrentVersionTransition with NoncurrentDays and StorageClass - AbortIncompleteMultipartUpload with DaysAfterInitiation - Filter.ObjectSizeGreaterThan and ObjectSizeLessThan - And.ObjectSizeGreaterThan and ObjectSizeLessThan - Filter.UnmarshalXML to properly parse Tag, And, and size filter elements Each new type follows the existing set-field pattern for conditional XML marshaling. No behavior changes - these types are not yet wired into handlers or the lifecycle worker. * s3lifecycle: add lifecycle rule evaluator package New package weed/s3api/s3lifecycle/ provides a pure-function lifecycle rule evaluation engine. The evaluator accepts flattened Rule structs and ObjectInfo metadata, and returns the appropriate Action. Components: - evaluator.go: Evaluate() for per-object actions with S3 priority ordering (delete marker > noncurrent version > current expiration), ShouldExpireNoncurrentVersion() with NewerNoncurrentVersions support, EvaluateMPUAbort() for multipart upload rules - filter.go: prefix, tag, and size-based filter matching - tags.go: ExtractTags() extracts S3 tags from filer Extended metadata, HasTagRules() for scan-time optimization - version_time.go: GetVersionTimestamp() extracts timestamps from SeaweedFS version IDs (both old and new format) Comprehensive test coverage: 54 tests covering all action types, filter combinations, edge cases, and version ID formats. * s3api: add UnmarshalXML for Expiration, Transition, ExpireDeleteMarker Add UnmarshalXML methods that set the internal 'set' flag during XML parsing. Previously these flags were only set programmatically, causing XML round-trip to drop elements. This ensures lifecycle configurations stored as XML survive unmarshal/marshal cycles correctly. Add comprehensive XML round-trip tests for all lifecycle rule types including NoncurrentVersionExpiration, AbortIncompleteMultipartUpload, Filter with Tag/And/size constraints, and a complete Terraform-style lifecycle configuration. * s3lifecycle: address review feedback - Fix version_time.go overflow: guard timestampPart > MaxInt64 before the inversion subtraction to prevent uint64 wrap - Make all expiry checks inclusive (!now.Before instead of now.After) so actions trigger at the exact scheduled instant - Add NoncurrentIndex to ObjectInfo so Evaluate() can properly handle NewerNoncurrentVersions via ShouldExpireNoncurrentVersion() - Add test for high-bit overflow version ID * s3lifecycle: guard ShouldExpireNoncurrentVersion against zero SuccessorModTime Add early return when obj.IsLatest or obj.SuccessorModTime.IsZero() to prevent premature expiration of versions with uninitialized successor timestamps (zero value would compute to epoch, always expired). --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-28 11:10:31 -07:00
Chris Lu	7d5cbfd547	s3: support s3:x-amz-server-side-encryption policy condition (#8806 ) * s3: support s3:x-amz-server-side-encryption policy condition (#7680) - Normalize x-amz-server-side-encryption header values to canonical form (aes256 → AES256, aws:kms mixed-case → aws:kms) so StringEquals conditions work regardless of client capitalisation - Exempt UploadPart and UploadPartCopy from SSE Null conditions: these actions inherit SSE from the initial CreateMultipartUpload request and do not re-send the header, so Deny/Null("true") should not block them - Add sse_condition_test.go covering StringEquals, Null, case-insensitive normalisation, and multipart continuation action exemption * s3: address review comments on SSE condition support - Replace "inherited" sentinel in injectSSEForMultipart with "AES256" so that StringEquals/Null conditions evaluate against a meaningful value; add TODO noting that KMS multipart uploads need the actual algorithm looked up from the upload state - Rewrite TestSSECaseInsensitiveNormalization to drive normalisation through EvaluatePolicyForRequest with a real http.Request so regressions in the production code path are caught; split into AES256 and aws:kms variants to cover both normalisation branches s3: plumb real inherited SSE from multipart upload state into policy eval Instead of injecting a static "AES256" sentinel for UploadPart/UploadPartCopy, look up the actual SSE algorithm from the stored CreateMultipartUpload entry and pass it through the evaluation chain. Changes: - PolicyEvaluationArgs gains InheritedSSEAlgorithm string; set by the BucketPolicyEngine wrapper for multipart continuation actions - injectSSEForMultipart(conditions, inheritedSSE) now accepts the real algorithm; empty string means no SSE → Null("true") fires correctly - IsMultipartContinuationAction exported so the s3api wrapper can use it - BucketPolicyEngine gets a MultipartSSELookup callback (set by S3ApiServer) that fetches the upload entry and reads SeaweedFSSSEKMSKeyID / SeaweedFSSSES3Encryption to determine the algorithm - S3ApiServer.getMultipartSSEAlgorithm implements the lookup via getEntry - Tests updated: three multipart cases (AES256, aws:kms, no-SSE-must-deny) plus UploadPartCopy coverage	2026-03-27 23:15:01 -07:00
Chris Lu	e3f052cd84	s3api: preserve lifecycle config responses for Terraform (#8805 ) * s3api: preserve lifecycle configs for terraform * s3api: bound lifecycle config request bodies * s3api: make bucket config updates copy-on-write * s3api: tighten string slice cloning	2026-03-27 22:50:02 -07:00
Chris Lu	0adb78bc6b	s3api: make conditional mutations atomic and AWS-compatible (#8802 ) * s3api: serialize conditional write finalization * s3api: add conditional delete mutation checks * s3api: enforce destination conditions for copy * s3api: revalidate multipart completion under lock * s3api: rollback failed put finalization hooks * s3api: report delete-marker version deletions * s3api: fix copy destination versioning edge cases * s3api: make versioned multipart completion idempotent * test/s3: cover conditional mutation regressions * s3api: rollback failed copy version finalization * s3api: resolve suspended delete conditions via latest entry * s3api: remove copy test null-version injection * s3api: reject out-of-order multipart completions * s3api: preserve multipart replay version metadata * s3api: surface copy destination existence errors * s3api: simplify delete condition target resolution * test/s3: make conditional delete assertions order independent * test/s3: add distributed lock gateway integration * s3api: fail closed multipart versioned completion * s3api: harden copy metadata and overwrite paths * s3api: create delete markers for suspended deletes * s3api: allow duplicate multipart completion parts	2026-03-27 19:22:26 -07:00
Chris Lu	17028fbf59	fix: serialize SSE-KMS metadata when bucket default encryption applies KMS (#8780 ) * fix: serialize SSE-KMS metadata when bucket default encryption applies KMS When a bucket has default SSE-KMS encryption enabled and a file is uploaded without explicit SSE headers, the encryption was applied correctly but the SSE-KMS metadata (x-seaweedfs-sse-kms-key) was not serialized. This caused downloads to fail with "empty SSE-KMS metadata" because the entry's Extended map stored an empty byte slice. The existing code already handled this for SSE-S3 bucket defaults (SerializeSSES3Metadata) but was missing the equivalent call to SerializeSSEKMSMetadata for the KMS path. Fixes seaweedfs/seaweedfs#8776 * ci: add KMS integration tests to GitHub Actions Add a kms-tests.yml workflow that runs on changes to KMS/SSE code with two jobs: 1. KMS provider tests: starts OpenBao via Docker, runs Go integration tests in test/kms/ against a real KMS backend 2. S3 KMS e2e tests: starts OpenBao + weed mini built from source, runs test_s3_kms.sh which covers bucket-default SSE-KMS upload/download (the exact scenario from #8776) Supporting changes: - test/kms/Makefile: add CI targets (test-provider-ci, test-s3-kms-ci) that manage OpenBao via plain Docker and run weed from source - test/kms/s3-config-openbao-template.json: S3 config template with OpenBao KMS provider for weed mini * refactor: combine SSE-S3 and SSE-KMS metadata serialization into else-if SSE-S3 and SSE-KMS bucket default encryption are mutually exclusive, so use a single if/else-if block instead of two independent if blocks. * Update .github/workflows/kms-tests.yml Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * fix(ci): start weed mini from data dir to avoid Docker filer.toml weed mini reads filer.toml from the current working directory first. When running from test/kms/, it picked up the Docker-targeted filer.toml which has dir="/data/filerdb" (a path that doesn't exist in CI), causing a fatal crash at filer store initialization. Fix by cd-ing to the data directory before starting weed mini. Also improve log visibility on failure. --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2026-03-26 14:07:01 -07:00
Chris Lu	28fe92065a	S3: reject part uploads after AbortMultipartUpload (#8768 ) * S3: reject part uploads after AbortMultipartUpload PutObjectPartHandler did not verify that the multipart upload session still exists before accepting parts. After AbortMultipartUpload deleted the upload directory, the ErrNotFound from getEntry was silently ignored (treated as "may be non-SSE upload"), allowing parts to be stored as orphaned files. Now return ErrNoSuchUpload when the upload directory is not found, matching AWS S3 behavior. Fixes #8766 * S3: check upload existence unconditionally in PutObjectPartHandler Move the getEntry call out of the SSE-type conditional so the upload existence check runs for all part uploads, including SSE-C. Previously the SSE-C path skipped the check entirely, allowing parts to be uploaded after abort when SSE-C headers were present. Also flattens the nested SSE branching by one level now that getEntry is called once upfront. * S3: address PR review feedback for PutObjectPartHandler - Log at error level when getEntry fails with an unexpected error, since we return ErrInternalError to the client - Distinguish base IV decode errors from length validation failures with separate, clearer error messages --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-24 18:11:51 -07:00
Chris Lu	0b3867dca3	filer: add structured error codes to CreateEntryResponse (#8767 ) * filer: add FilerError enum and error_code field to CreateEntryResponse Add a machine-readable error code alongside the existing string error field. This follows the precedent set by PublishMessageResponse in the MQ broker proto. The string field is kept for human readability and backward compatibility. Defined codes: OK, ENTRY_NAME_TOO_LONG, PARENT_IS_FILE, EXISTING_IS_DIRECTORY, EXISTING_IS_FILE, ENTRY_ALREADY_EXISTS. * filer: add sentinel errors and error code mapping in filer_pb Define sentinel errors (ErrEntryNameTooLong, ErrParentIsFile, etc.) in the filer_pb package so both the filer and consumers can reference them without circular imports. Add FilerErrorToSentinel() to map proto error codes to sentinels, and update CreateEntryWithResponse() to check error_code first, falling back to the string-based path for backward compatibility with old servers. * filer: return wrapped sentinel errors and set proto error codes Replace fmt.Errorf string errors in filer.CreateEntry, UpdateEntry, and ensureParentDirectoryEntry with wrapped filer_pb sentinel errors (using %w). This preserves errors.Is() traversal on the server side. In the gRPC CreateEntry handler, map sentinel errors to the corresponding FilerError proto codes using errors.Is(), setting both resp.Error (string, for backward compat) and resp.ErrorCode (enum). * S3: use errors.Is() with filer sentinels instead of string matching Replace fragile string-based error matching in filerErrorToS3Error and other S3 API consumers with errors.Is() checks against filer_pb sentinel errors. This works because the updated CreateEntryWithResponse helper reconstructs sentinel errors from the proto FilerError code. Update iceberg stage_create and metadata_files to check resp.ErrorCode instead of parsing resp.Error strings. Update SSE-S3 to use errors.Is() for the already-exists check. String matching is retained only for non-filer errors (gRPC transport errors, checksum validation) that don't go through CreateEntryResponse. * filer: remove backward-compat string fallbacks for error codes Clients and servers are always deployed together, so there is no need for backward-compatibility fallback paths that parse resp.Error strings when resp.ErrorCode is unset. Simplify all consumers to rely solely on the structured error code. * iceberg: ensure unknown non-OK error codes are not silently ignored When FilerErrorToSentinel returns nil for an unrecognized error code, return an error including the code and message rather than falling through to return nil. * filer: fix redundant error message and restore error wrapping in helper Use request path instead of resp.Error in the sentinel error format string to avoid duplicating the sentinel message (e.g. "entry already exists: entry already exists"). Restore %w wrapping with errors.New() in the fallback paths so callers can use errors.Is()/errors.As(). * filer: promote file to directory on path conflict instead of erroring S3 allows both "foo/bar" (object) and "foo/bar/xyzzy" (another object) to coexist because S3 has a flat key space. When ensureParentDirectoryEntry finds a parent path that is a file instead of a directory, promote it to a directory by setting ModeDir while preserving the original content and chunks. Use Store.UpdateEntry directly to bypass the Filer.UpdateEntry type-change guard. This fixes the S3 compatibility test failures where creating overlapping keys (e.g. "foo/bar" then "foo/bar/xyzzy") returned ExistingObjectIsFile.	2026-03-24 17:08:22 -07:00
Chris Lu	152884eff2	S3: add s3: prefix to x-amz-* condition keys for AWS compatibility (#8765 ) AWS S3 policy conditions reference request headers with the s3: namespace prefix (e.g., s3:x-amz-server-side-encryption). The extraction code was storing these headers without the prefix, so bucket policy conditions using the standard AWS key names would never match.	2026-03-24 14:04:42 -07:00
Chris Lu	2877febd73	S3: fix silent PutObject failure and enforce 1024-byte key limit (#8764 ) * S3: add KeyTooLongError error code Add ErrKeyTooLongError (HTTP 400, code "KeyTooLongError") to match the standard AWS S3 error for object keys that exceed length limits. * S3: fix silent PutObject failure when entry name exceeds max_file_name_length putToFiler called client.CreateEntry() directly and discarded the gRPC response. The filer embeds application errors like "entry name too long" in resp.Error (not as gRPC transport errors), so the error was silently swallowed and clients received HTTP 200 with an ETag for objects that were never stored. Switch to the filer_pb.CreateEntry() helper which properly checks resp.Error, and map "entry name too long" to KeyTooLongError (HTTP 400). To avoid fragile string parsing across the gRPC boundary, define shared error message constants in weed/util/constants and use them in both the filer (producing errors) and S3 API (matching errors). Switch filerErrorToS3Error to use strings.Contains/HasSuffix with these constants so matches work regardless of any wrapper prefix. Apply filerErrorToS3Error to the mkdir path for directory markers. Fixes #8759 * S3: enforce 1024-byte maximum object key length AWS S3 limits object keys to 1024 bytes. Add early validation on write paths (PutObject, CopyObject, CreateMultipartUpload) to reject keys exceeding the limit with the standard KeyTooLongError (HTTP 400). The key length check runs before bucket auto-creation to prevent overlong keys from triggering unnecessary side effects. Also use filerErrorToS3Error for CopyObject's mkFile error paths so name-too-long errors from the filer return KeyTooLongError instead of InternalError. Ref #8758 * S3: add handler-level tests for key length validation and error mapping Add tests for filerErrorToS3Error mapping "entry name too long" to KeyTooLongError, including a regression test for the CreateEntry-prefixed "existing ... is a directory" form. Add handler-level integration tests that exercise PutObjectHandler, CopyObjectHandler, and NewMultipartUploadHandler via httptest, verifying HTTP 400 and KeyTooLongError XML response for overlong keys and acceptance of keys at the 1024-byte limit.	2026-03-24 13:35:28 -07:00
Chris Lu	e5f72077ee	fix: resolve CORS cache race condition causing stale 404 responses (#8748 ) The metadata subscription handler (updateBucketConfigCacheFromEntry) was making a separate RPC call via loadCORSFromBucketContent to load CORS configuration. This created a race window where a slow CreateBucket subscription event could re-cache stale data after PutBucketCors had already cleared the cache, causing subsequent GetBucketCors to return 404 NoSuchCORSConfiguration. Parse CORS directly from the subscription entry's Content field instead of making a separate RPC. Also fix getBucketConfig to parse CORS from the already-fetched entry, eliminating a redundant RPC call. Fix TestCORSCaching to use require.NoError to prevent nil pointer dereference panics when GetBucketCors fails.	2026-03-23 19:33:20 -07:00
Chris Lu	d5ee35c8df	Fix S3 delete for non-empty directory markers (#8740 ) * Fix S3 delete for non-empty directory markers * Address review feedback on directory marker deletes * Stabilize FUSE concurrent directory operations	2026-03-23 13:35:16 -07:00
Chris Lu	6ccda3e809	fix(s3): allow deleting the anonymous user from admin webui (#8706 ) Remove the block that prevented deleting the "anonymous" identity and stop auto-creating it when absent. If no anonymous identity exists (or it is disabled), LookupAnonymous returns not-found and both auth paths return ErrAccessDenied for anonymous requests. To enable anonymous access, explicitly create the "anonymous" user. To revoke it, delete the user like any other identity. Closes #8694	2026-03-19 18:10:20 -07:00
Chris Lu	80f3079d2a	fix(s3): include directory markers in ListObjects without delimiter (#8704 ) * fix(s3): include directory markers in ListObjects without delimiter (#8698) Directory key objects (zero-byte objects with keys ending in "/") created via PutObject were omitted from ListObjects/ListObjectsV2 results when no delimiter was specified. AWS S3 includes these as regular keys in Contents. The issue was in doListFilerEntries: when recursing into directories in non-delimiter mode, directory key objects were only emitted when prefixEndsOnDelimiter was true. Added an else branch to emit them in the general recursive case as well. * remove issue reference from inline comment * test: add child-under-marker and paginated listing coverage Extend test 6 to place a child object under the directory marker and paginate with MaxKeys=1 so the emit-then-recurse truncation path is exercised. * fix(test): skip directory markers in Spark temporary artifacts check The listing check now correctly shows directory markers (keys ending in "/") after the ListObjects fix. These 0-byte metadata objects are not data artifacts — filter them from the listing check since the HeadObject-based check already verifies their cleanup with a timeout.	2026-03-19 15:36:11 -07:00
Chris Lu	c197206897	fix(s3): return ETag header for directory marker PutObject requests (#8688 ) * fix(s3): return ETag header for directory marker PutObject requests The PutObject handler has a special path for keys ending with "/" (directory markers) that creates the entry via mkdir. This path never computed or set the ETag response header, unlike the regular PutObject path. AWS S3 always returns an ETag header, even for empty-body puts. Compute the MD5 of the content (empty or otherwise), store it in the entry attributes and extended attributes, and set the ETag response header. Fixes #8682 * fix: handle io.ReadAll error and chunked encoding for directory markers Address review feedback: - Handle error from io.ReadAll instead of silently discarding it - Change condition from ContentLength > 0 to ContentLength != 0 to correctly handle chunked transfer encoding (ContentLength == -1) * fix hanging tests	2026-03-18 17:26:33 -07:00
Chris Lu	9984ce7dcb	fix(s3): omit NotResource:null from bucket policy JSON response (#8658 ) * fix(s3): omit NotResource:null from bucket policy JSON response (#8657) Change NotResource from value type to pointer (StringOrStringSlice) so that omitempty properly omits it when unset, matching the existing Principal field pattern. This prevents IaC tools (Terraform, Ansible) from detecting false configuration drift. Add bucket policy round-trip idempotency integration tests. simplify JSON comparison in bucket policy idempotency test Use require.JSONEq directly on the raw JSON strings instead of round-tripping through unmarshal/marshal, since JSONEq already handles normalization internally. * fix bucket policy test cases that locked out the admin user The Deny+NotResource test cases used Action:"s3:" which denied the admin's own GetBucketPolicy call. Scope deny to s3:GetObject only, and add an Allow+NotResource variant instead. fix(s3): also make Resource a pointer to fix empty string in JSON Apply the same omitempty pointer fix to the Resource field, which was emitting "Resource":"" when only NotResource was set. Add NewStringOrStringSlicePtr helper, make Strings() nil-safe, and handle StringOrStringSlice in normalizeToStringSliceWithError. improve bucket policy integration tests per review feedback - Replace time.Sleep with waitForClusterReady using ListBuckets - Use structural hasKey check instead of brittle substring NotContains - Assert specific NoSuchBucketPolicy error code after delete - Handle single-statement policies in hasKey helper	2026-03-16 12:58:26 -07:00
Chris Lu	acea36a181	filer: add conditional update preconditions (#8647 ) * filer: add conditional update preconditions * iceberg: tighten metadata CAS preconditions	2026-03-16 12:33:32 -07:00
Chris Lu	8cde3d4486	Add data file compaction to iceberg maintenance (Phase 2) (#8503 ) * Add iceberg_maintenance plugin worker handler (Phase 1) Implement automated Iceberg table maintenance as a new plugin worker job type. The handler scans S3 table buckets for tables needing maintenance and executes operations in the correct Iceberg order: expire snapshots, remove orphan files, and rewrite manifests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add data file compaction to iceberg maintenance handler (Phase 2) Implement bin-packing compaction for small Parquet data files: - Enumerate data files from manifests, group by partition - Merge small files using parquet-go (read rows, write merged output) - Create new manifest with ADDED/DELETED/EXISTING entries - Commit new snapshot with compaction metadata Add 'compact' operation to maintenance order (runs before expire_snapshots), configurable via target_file_size_bytes and min_input_files thresholds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix memory exhaustion in mergeParquetFiles by processing files sequentially Previously all source Parquet files were loaded into memory simultaneously, risking OOM when a compaction bin contained many small files. Now each file is loaded, its rows are streamed into the output writer, and its data is released before the next file is loaded — keeping peak memory proportional to one input file plus the output buffer. * Validate bucket/namespace/table names against path traversal Reject names containing '..', '/', or '\' in Execute to prevent directory traversal via crafted job parameters. * Add filer address failover in iceberg maintenance handler Try each filer address from cluster context in order instead of only using the first one. This improves resilience when the primary filer is temporarily unreachable. * Add separate MinManifestsToRewrite config for manifest rewrite threshold The rewrite_manifests operation was reusing MinInputFiles (meant for compaction bin file counts) as its manifest count threshold. Add a dedicated MinManifestsToRewrite field with its own config UI section and default value (5) so the two thresholds can be tuned independently. * Fix risky mtime fallback in orphan removal that could delete new files When entry.Attributes is nil, mtime defaulted to Unix epoch (1970), which would always be older than the safety threshold, causing the file to be treated as eligible for deletion. Skip entries with nil Attributes instead, matching the safer logic in operations.go. * Fix undefined function references in iceberg_maintenance_handler.go Use the exported function names (ShouldSkipDetectionByInterval, BuildDetectorActivity, BuildExecutorActivity) matching their definitions in vacuum_handler.go. * Remove duplicated iceberg maintenance handler in favor of iceberg/ subpackage The IcebergMaintenanceHandler and its compaction code in the parent pluginworker package duplicated the logic already present in the iceberg/ subpackage (which self-registers via init()). The old code lacked stale-plan guards, proper path normalization, CAS-based xattr updates, and error-returning parseOperations. Since the registry pattern (default "all") makes the old handler unreachable, remove it entirely. All functionality is provided by iceberg.Handler with the reviewed improvements. * Fix MinManifestsToRewrite clamping to match UI minimum of 2 The clamp reset values below 2 to the default of 5, contradicting the UI's advertised MinValue of 2. Clamp to 2 instead. * Sort entries by size descending in splitOversizedBin for better packing Entries were processed in insertion order which is non-deterministic from map iteration. Sorting largest-first before the splitting loop improves bin packing efficiency by filling bins more evenly. * Add context cancellation check to drainReader loop The row-streaming loop in drainReader did not check ctx between iterations, making long compaction merges uncancellable. Check ctx.Done() at the top of each iteration. * Fix splitOversizedBin to always respect targetSize limit The minFiles check in the split condition allowed bins to grow past targetSize when they had fewer than minFiles entries, defeating the OOM protection. Now bins always split at targetSize, and a trailing runt with fewer than minFiles entries is merged into the previous bin. * Add integration tests for iceberg table maintenance plugin worker Tests start a real weed mini cluster, create S3 buckets and Iceberg table metadata via filer gRPC, then exercise the iceberg.Handler operations (ExpireSnapshots, RemoveOrphans, RewriteManifests) against the live filer. A full maintenance cycle test runs all operations in sequence and verifies metadata consistency. Also adds exported method wrappers (testing_api.go) so the integration test package can call the unexported handler methods. * Fix splitOversizedBin dropping files and add source path to drainReader errors The runt-merge step could leave leading bins with fewer than minFiles entries (e.g. [80,80,10,10] with targetSize=100, minFiles=2 would drop the first 80-byte file). Replace the filter-based approach with an iterative merge that folds any sub-minFiles bin into its smallest neighbor, preserving all eligible files. Also add the source file path to drainReader error messages so callers can identify which Parquet file caused a read/write failure. * Harden integration test error handling - s3put: fail immediately on HTTP 4xx/5xx instead of logging and continuing - lookupEntry: distinguish NotFound (return nil) from unexpected RPC errors (fail the test) - writeOrphan and orphan creation in FullMaintenanceCycle: check CreateEntryResponse.Error in addition to the RPC error * go fmt --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 11:27:42 -07:00
Chris Lu	8ac4caf930	fix(s3api): return no-encryption instead of error when bucket metadata is missing When getEncryptionConfiguration encounters a not-found error (e.g., during bucket recreation after a partial delete), return ErrNoSuchBucketEncryptionConfiguration instead of ErrInternalError. This prevents uploads from failing with 500 errors during recovery.	2026-03-11 13:49:21 -07:00
Chris Lu	ab85f46529	fix(s3api): clear negative cache in autoCreateBucket when bucket exists When autoCreateBucket finds the bucket already exists, remove it from the negative cache so subsequent requests don't unnecessarily trigger another auto-create attempt.	2026-03-11 13:49:15 -07:00
Chris Lu	5208c7c727	fix(s3api): improve PutBucketHandler comment for orphaned collection recovery Clarify the comment and log message for the case where a collection exists but the bucket directory is missing, explaining the root cause (partial deletion) more precisely.	2026-03-11 13:49:11 -07:00
Chris Lu	12b360f499	fix(s3api): delete bucket directory before collection to prevent inconsistent state Reorder DeleteBucketHandler to remove the bucket directory first, then delete the collection. If collection deletion fails, the bucket is still effectively deleted and can be recreated. Previously, if directory deletion succeeded but collection deletion failed, the bucket was left in an unrecoverable state.	2026-03-11 13:49:03 -07:00
Chris Lu	d1a631123f	fix(s3api): allow bucket recreation when orphaned collection exists (#8605 ) * fix(s3api): allow bucket recreation when orphaned collection exists (#8601) When a bucket is deleted, its filer directory is removed but the underlying collection/volumes may not be fully cleaned up yet. If the bucket is immediately recreated, PutBucketHandler was returning ErrBucketAlreadyExists due to the orphaned collection, blocking bucket recreation and causing subsequent uploads to fail with InternalError. Allow bucket creation to proceed when a collection exists without a corresponding bucket directory, since this is a transient orphaned state from a previous deletion. * fix(s3api): handle concurrent bucket creation race in mkdir On mkdir failure, re-check whether the bucket directory now exists and return BucketAlreadyExists instead of InternalError when another request created the bucket concurrently.	2026-03-11 13:42:43 -07:00
Chris Lu	e1e4c9437a	fix(s3api): ListObjects with trailing-slash prefix matches sibling directories (#8599 ) fix(s3api): ListObjects with trailing-slash prefix returns wrong results When ListObjectsV2 is called with a prefix ending in "/" (e.g., "foo/"), normalizePrefixMarker strips the trailing slash and splits into dir="parent" and prefix="foo". The filer then lists entries matching prefix "foo", which returns both directory "foo" and "foo1000". The prefixEndsOnDelimiter guard correctly identifies directory "foo" as the target and recurses into it, but then resets the guard to false. The loop continues and incorrectly recurses into "foo1000" as well, causing the listing to return objects from unrelated directories. Fix: after recursing into the exact directory targeted by the trailing-slash prefix, return immediately from the listing loop. There is no reason to process sibling entries since the original prefix specifically targeted one directory.	2026-03-11 02:28:34 -07:00
Chris Lu	0a5c5ed4ce	Persist S3 bucket counter metrics across idle periods (#8595 ) * Stop deleting counter metrics during bucket TTL cleanup Counter metrics (traffic bytes, request counts, object counts) are monotonically increasing by design. Deleting them after 10 minutes of bucket inactivity causes them to vanish from /metrics output and reset to zero when traffic resumes, breaking Prometheus rate()/increase() queries and making historical traffic reporting impossible. Only delete gauges and histograms in the TTL cleanup loop, as these represent current state and are safely re-populated on next activity. Fixes https://github.com/seaweedfs/seaweedfs/issues/8521 * Clean up all bucket metrics on bucket deletion Add DeleteBucketMetrics() to delete all metrics (including counters) for a bucket when it is explicitly deleted. This prevents unbounded label cardinality from accumulating for buckets that no longer exist. Called from DeleteBucketHandler after successful bucket deletion. * Reduce mutex scope in bucket metrics TTL sweep Collect expired bucket names under the lock, then release before calling DeletePartialMatch on Prometheus metrics. This prevents RecordBucketActiveTime from blocking during the expensive cleanup.	2026-03-10 19:00:40 -07:00
Chris Lu	f8d783f80e	fix: ListObjectVersions interleave Version and DeleteMarker in sort order (#8567 ) * fix: ListObjectVersions interleave Version and DeleteMarker in sort order Go's default xml.Marshal serializes struct fields in definition order, causing all <Version> elements to appear before all <DeleteMarker> elements. The S3 API contract requires these elements to be interleaved in the correct global sort order (by key ascending, then newest version first within each key). This broke clients that validate version list ordering within a single key — an older Version would appear before a newer DeleteMarker for the same object. Fix: Replace the separate Versions/DeleteMarkers/CommonPrefixes arrays with a single Entries []VersionListEntry slice. Each VersionListEntry uses a per-element MarshalXML that outputs the correct XML tag name (<Version>, <DeleteMarker>, or <CommonPrefixes>) based on which field is populated. Since the entries are already in their correct sorted order from buildSortedCombinedList, the XML output is automatically interleaved correctly. Also removes the unused ListObjectVersionsResult struct. Note: The reporter also mentioned a cross-key timestamp ordering issue when paginating with max-keys=1, but that is correct S3 behavior — ListObjectVersions sorts by key name (ascending), not by timestamp. Different keys having non-monotonic timestamps is expected. * test: add CommonPrefixes XML marshaling coverage for ListObjectVersions * fix: validate VersionListEntry has exactly one field set in MarshalXML Return an error instead of silently emitting an empty <Version> element when no field (or multiple fields) are populated. Also clean up the misleading xml:"Version" struct tag on the Entries field.	2026-03-09 12:37:59 -07:00
Chris Lu	992db11d2b	iam: add IAM group management (#8560 ) * iam: add Group message to protobuf schema Add Group message (name, members, policy_names, disabled) and add groups field to S3ApiConfiguration for IAM group management support (issue #7742). * iam: add group CRUD to CredentialStore interface and all backends Add group management methods (CreateGroup, GetGroup, DeleteGroup, ListGroups, UpdateGroup) to the CredentialStore interface with implementations for memory, filer_etc, postgres, and grpc stores. Wire group loading/saving into filer_etc LoadConfiguration and SaveConfiguration. * iam: add group IAM response types Add XML response types for group management IAM actions: CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, ListGroupsForUser. * iam: add group management handlers to embedded IAM API Add CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, and ListGroupsForUser handlers with dispatch in ExecuteAction. * iam: add group management handlers to standalone IAM API Add group handlers (CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, ListGroupsForUser) and wire into DoActions dispatch. Also add helper functions for user/policy side effects. * iam: integrate group policies into authorization Add groups and userGroups reverse index to IdentityAccessManagement. Populate both maps during ReplaceS3ApiConfiguration and MergeS3ApiConfiguration. Modify evaluateIAMPolicies to evaluate policies from user's enabled groups in addition to user policies. Update VerifyActionPermission to consider group policies when checking hasAttachedPolicies. * iam: add group side effects on user deletion and rename When a user is deleted, remove them from all groups they belong to. When a user is renamed, update group membership references. Applied to both embedded and standalone IAM handlers. * iam: watch /etc/iam/groups directory for config changes Add groups directory to the filer subscription watcher so group file changes trigger IAM configuration reloads. * admin: add group management page to admin UI Add groups page with CRUD operations, member management, policy attachment, and enable/disable toggle. Register routes in admin handlers and add Groups entry to sidebar navigation. * test: add IAM group management integration tests Add comprehensive integration tests for group CRUD, membership, policy attachment, policy enforcement, disabled group behavior, user deletion side effects, and multi-group membership. Add "group" test type to CI matrix in s3-iam-tests workflow. * iam: address PR review comments for group management - Fix XSS vulnerability in groups.templ: replace innerHTML string concatenation with DOM APIs (createElement/textContent) for rendering member and policy lists - Use userGroups reverse index in embedded IAM ListGroupsForUser for O(1) lookup instead of iterating all groups - Add buildUserGroupsIndex helper in standalone IAM handlers; use it in ListGroupsForUser and removeUserFromAllGroups for efficient lookup - Add note about gRPC store load-modify-save race condition limitation * iam: add defensive copies, validation, and XSS fixes for group management - Memory store: clone groups on store/retrieve to prevent mutation - Admin dash: deep copy groups before mutation, validate user/policy exists - HTTP handlers: translate credential errors to proper HTTP status codes, use bool for Enabled field to distinguish missing vs false - Groups templ: use data attributes + event delegation instead of inline onclick for XSS safety, prevent stale async responses iam: add explicit group methods to PropagatingCredentialStore Add CreateGroup, GetGroup, DeleteGroup, ListGroups, and UpdateGroup methods instead of relying on embedded interface fallthrough. Group changes propagate via filer subscription so no RPC propagation needed. * iam: detect postgres unique constraint violation and add groups index Return ErrGroupAlreadyExists when INSERT hits SQLState 23505 instead of a generic error. Add index on groups(disabled) for filtered queries. * iam: add Marker field to group list response types Add Marker string field to GetGroupResult, ListGroupsResult, ListAttachedGroupPoliciesResult, and ListGroupsForUserResult to match AWS IAM pagination response format. * iam: check group attachment before policy deletion Reject DeletePolicy if the policy is attached to any group, matching AWS IAM behavior. Add PolicyArn to ListAttachedGroupPolicies response. * iam: include group policies in IAM authorization Merge policy names from user's enabled groups into the IAMIdentity used for authorization, so group-attached policies are evaluated alongside user-attached policies. * iam: check for name collision before renaming user in UpdateUser Scan identities and inline policies for newUserName before mutating, returning EntityAlreadyExists if a collision is found. Reuse the already-loaded policies instead of loading them again inside the loop. * test: use t.Cleanup for bucket cleanup in group policy test * iam: wrap ErrUserNotInGroup sentinel in RemoveGroupMember error Wrap credential.ErrUserNotInGroup so errors.Is works in groupErrorToHTTPStatus, returning proper 400 instead of 500. * admin: regenerate groups_templ.go with XSS-safe data attributes Regenerated from groups.templ which uses data-group-name attributes instead of inline onclick with string interpolation. * iam: add input validation and persist groups during migration - Validate nil/empty group name in CreateGroup and UpdateGroup - Save groups in migrateToMultiFile so they survive legacy migration * admin: use groupErrorToHTTPStatus in GetGroupMembers and GetGroupPolicies * iam: short-circuit UpdateUser when newUserName equals current name * iam: require empty PolicyNames before group deletion Reject DeleteGroup when group has attached policies, matching the existing members check. Also fix GetGroup error handling in DeletePolicy to only skip ErrGroupNotFound, not all errors. * ci: add weed/pb/** to S3 IAM test trigger paths * test: replace time.Sleep with require.Eventually for propagation waits Use polling with timeout instead of fixed sleeps to reduce flakiness in integration tests waiting for IAM policy propagation. * fix: use credentialManager.GetPolicy for AttachGroupPolicy validation Policies created via CreatePolicy through credentialManager are stored in the credential store, not in s3cfg.Policies (which only has static config policies). Change AttachGroupPolicy to use credentialManager.GetPolicy() for policy existence validation. * feat: add UpdateGroup handler to embedded IAM API Add UpdateGroup action to enable/disable groups and rename groups via the IAM API. This is a SeaweedFS extension (not in AWS SDK) used by tests to toggle group disabled status. * fix: authenticate raw IAM API calls in group tests The embedded IAM endpoint rejects anonymous requests. Replace callIAMAPI with callIAMAPIAuthenticated that uses JWT bearer token authentication via the test framework. * feat: add UpdateGroup handler to standalone IAM API Mirror the embedded IAM UpdateGroup handler in the standalone IAM API for parity. * fix: add omitempty to Marker XML tags in group responses Non-truncated responses should not emit an empty <Marker/> element. * fix: distinguish backend errors from missing policies in AttachGroupPolicy Return ServiceFailure for credential manager errors instead of masking them as NoSuchEntity. Also switch ListGroupsForUser to use s3cfg.Groups instead of in-memory reverse index to avoid stale data. Add duplicate name check to UpdateGroup rename. * fix: standalone IAM AttachGroupPolicy uses persisted policy store Check managed policies from GetPolicies() instead of s3cfg.Policies so dynamically created policies are found. Also add duplicate name check to UpdateGroup rename. * fix: rollback inline policies on UpdateUser PutPolicies failure If PutPolicies fails after moving inline policies to the new username, restore both the identity name and the inline policies map to their original state to avoid a partial-write window. * fix: correct test cleanup ordering for group tests Replace scattered defers with single ordered t.Cleanup in each test to ensure resources are torn down in reverse-creation order: remove membership, detach policies, delete access keys, delete users, delete groups, delete policies. Move bucket cleanup to parent test scope and delete objects before bucket. * fix: move identity nil check before map lookup and refine hasAttachedPolicies Move the nil check on identity before accessing identity.Name to prevent panic. Also refine hasAttachedPolicies to only consider groups that are enabled and have actual policies attached, so membership in a no-policy group doesn't incorrectly trigger IAM authorization. * fix: fail group reload on unreadable or corrupt group files Return errors instead of logging and continuing when group files cannot be read or unmarshaled. This prevents silently applying a partial IAM config with missing group memberships or policies. * fix: use errors.Is for sql.ErrNoRows comparison in postgres group store * docs: explain why group methods skip propagateChange Group changes propagate to S3 servers via filer subscription (watching /etc/iam/groups/) rather than gRPC RPCs, since there are no group-specific RPCs in the S3 cache protocol. * fix: remove unused policyNameFromArn and strings import * fix: update service account ParentUser on user rename When renaming a user via UpdateUser, also update ParentUser references in service accounts to prevent them from becoming orphaned after the next configuration reload. * fix: wrap DetachGroupPolicy error with ErrPolicyNotAttached sentinel Use credential.ErrPolicyNotAttached so groupErrorToHTTPStatus maps it to 400 instead of falling back to 500. * fix: use admin S3 client for bucket cleanup in enforcement test The user S3 client may lack permissions by cleanup time since the user is removed from the group in an earlier subtest. Use the admin S3 client to ensure bucket and object cleanup always succeeds. * fix: add nil guard for group param in propagating store log calls Prevent potential nil dereference when logging group.Name in CreateGroup and UpdateGroup of PropagatingCredentialStore. * fix: validate Disabled field in UpdateGroup handlers Reject values other than "true" or "false" with InvalidInputException instead of silently treating them as false. * fix: seed mergedGroups from existing groups in MergeS3ApiConfiguration Previously the merge started with empty group maps, dropping any static-file groups. Now seeds from existing iam.groups before overlaying dynamic config, and builds the reverse index after merging to avoid stale entries from overridden groups. * fix: use errors.Is for filer_pb.ErrNotFound comparison in group loading Replace direct equality (==) with errors.Is() to correctly match wrapped errors, consistent with the rest of the codebase. * fix: add ErrUserNotFound and ErrPolicyNotFound to groupErrorToHTTPStatus Map these sentinel errors to 404 so AddGroupMember and AttachGroupPolicy return proper HTTP status codes. * fix: log cleanup errors in group integration tests Replace fire-and-forget cleanup calls with error-checked versions that log failures via t.Logf for debugging visibility. * fix: prevent duplicate group test runs in CI matrix The basic lane's -run "TestIAM" regex also matched TestIAMGroup* tests, causing them to run in both the basic and group lanes. Replace with explicit test function names. * fix: add GIN index on groups.members JSONB for membership lookups Without this index, ListGroupsForUser and membership queries require full table scans on the groups table. * fix: handle cross-directory moves in IAM config subscription When a file is moved out of an IAM directory (e.g., /etc/iam/groups), the dir variable was overwritten with NewParentPath, causing the source directory change to be missed. Now also notifies handlers about the source directory for cross-directory moves. * fix: validate members/policies before deleting group in admin handler AdminServer.DeleteGroup now checks for attached members and policies before delegating to credentialManager, matching the IAM handler guards. * fix: merge groups by name instead of blind append during filer load Match the identity loader's merge behavior: find existing group by name and replace, only append when no match exists. Prevents duplicates when legacy and multi-file configs overlap. * fix: check DeleteEntry response error when cleaning obsolete group files Capture and log resp.Error from filer DeleteEntry calls during group file cleanup, matching the pattern used in deleteGroupFile. * fix: verify source user exists before no-op check in UpdateUser Reorder UpdateUser to find the source identity first and return NoSuchEntityException if not found, before checking if the rename is a no-op. Previously a non-existent user renamed to itself would incorrectly return success. * fix: update service account parent refs on user rename in embedded IAM The embedded IAM UpdateUser handler updated group membership but not service account ParentUser fields, unlike the standalone handler. * fix: replay source-side events for all handlers on cross-dir moves Pass nil newEntry to bucket, IAM, and circuit-breaker handlers for the source directory during cross-directory moves, so all watchers can clear caches for the moved-away resource. * fix: don't seed mergedGroups from existing iam.groups in merge Groups are always dynamic (from filer), never static (from s3.config). Seeding from iam.groups caused stale deleted groups to persist. Now only uses config.Groups from the dynamic filer config. * fix: add deferred user cleanup in TestIAMGroupUserDeletionSideEffect Register t.Cleanup for the created user so it gets cleaned up even if the test fails before the inline DeleteUser call. * fix: assert UpdateGroup HTTP status in disabled group tests Add require.Equal checks for 200 status after UpdateGroup calls so the test fails immediately on API errors rather than relying on the subsequent Eventually timeout. * fix: trim whitespace from group name in filer store operations Trim leading/trailing whitespace from group.Name before validation in CreateGroup and UpdateGroup to prevent whitespace-only filenames. Also merge groups by name during multi-file load to prevent duplicates. * fix: add nil/empty group validation in gRPC store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics and invalid persistence. * fix: add nil/empty group validation in postgres store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics from nil member access and empty-name row inserts. * fix: add name collision check in embedded IAM UpdateUser The embedded IAM handler renamed users without checking if the target name already existed, unlike the standalone handler. * fix: add ErrGroupNotEmpty sentinel and map to HTTP 409 AdminServer.DeleteGroup now wraps conflict errors with ErrGroupNotEmpty, and groupErrorToHTTPStatus maps it to 409 Conflict instead of 500. * fix: use appropriate error message in GetGroupDetails based on status Return "Group not found" only for 404, use "Failed to retrieve group" for other error statuses instead of always saying "Group not found". * fix: use backend-normalized group.Name in CreateGroup response After credentialManager.CreateGroup may normalize the name (e.g., trim whitespace), use group.Name instead of the raw input for the returned GroupData to ensure consistency. * fix: add nil/empty group validation in memory store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics from nil pointer dereference on map access. * fix: reorder embedded IAM UpdateUser to verify source first Find the source identity before checking for collisions, matching the standalone handler's logic. Previously a non-existent user renamed to an existing name would get EntityAlreadyExists instead of NoSuchEntity. * fix: handle same-directory renames in metadata subscription Replay a delete event for the old entry name during same-directory renames so handlers like onBucketMetadataChange can clean up stale state for the old name. * fix: abort GetGroups on non-ErrGroupNotFound errors Only skip groups that return ErrGroupNotFound. Other errors (e.g., transient backend failures) now abort the handler and return the error to the caller instead of silently producing partial results. * fix: add aria-label and title to icon-only group action buttons Add accessible labels to View and Delete buttons so screen readers and tooltips provide meaningful context. * fix: validate group name in saveGroup to prevent invalid filenames Trim whitespace and reject empty names before writing group JSON files, preventing creation of files like ".json". * fix: add /etc/iam/groups to filer subscription watched directories The groups directory was missing from the watched directories list, so S3 servers in a cluster would not detect group changes made by other servers via filer. The onIamConfigChange handler already had code to handle group directory changes but it was never triggered. * add direct gRPC propagation for group changes to S3 servers Groups now have the same dual propagation as identities and policies: direct gRPC push via propagateChange + async filer subscription. - Add PutGroup/RemoveGroup proto messages and RPCs - Add PutGroup/RemoveGroup in-memory cache methods on IAM - Add PutGroup/RemoveGroup gRPC server handlers - Update PropagatingCredentialStore to call propagateChange on group mutations * reduce log verbosity for config load summary Change ReplaceS3ApiConfiguration log from Infof to V(1).Infof to avoid noisy output on every config reload. * admin: show user groups in view and edit user modals - Add Groups field to UserDetails and populate from credential manager - Show groups as badges in user details view modal - Add group management to edit user modal: display current groups, add to group via dropdown, remove from group via badge x button * fix: remove duplicate showAlert that broke modal-alerts.js admin.js defined showAlert(type, message) which overwrote the modal-alerts.js version showAlert(message, type), causing broken unstyled alert boxes. Remove the duplicate and swap all callers in admin.js to use the correct (message, type) argument order. * fix: unwrap groups API response in edit user modal The /api/groups endpoint returns {"groups": [...]}, not a bare array. * Update object_store_users_templ.go * test: assert AccessDenied error code in group denial tests Replace plain assert.Error checks with awserr.Error type assertion and AccessDenied code verification, matching the pattern used in other IAM integration tests. * fix: propagate GetGroups errors in ShowGroups handler getGroupsPageData was swallowing errors and returning an empty page with 200 status. Now returns the error so ShowGroups can respond with a proper error status. * fix: reject AttachGroupPolicy when credential manager is nil Previously skipped policy existence validation when credentialManager was nil, allowing attachment of nonexistent policies. Now returns a ServiceFailureException error. * fix: preserve groups during partial MergeS3ApiConfiguration updates UpsertIdentity calls MergeS3ApiConfiguration with a partial config containing only the updated identity (nil Groups). This was wiping all in-memory group state. Now only replaces groups when config.Groups is non-nil (full config reload). * fix: propagate errors from group lookup in GetObjectStoreUserDetails ListGroups and GetGroup errors were silently ignored, potentially showing incomplete group data in the UI. * fix: use DOM APIs for group badge remove button to prevent XSS Replace innerHTML with onclick string interpolation with DOM createElement + addEventListener pattern. Also add aria-label and title to the add-to-group button. * fix: snapshot group policies under RLock to prevent concurrent map access evaluateIAMPolicies was copying the map reference via groupMap := iam.groups under RLock then iterating after RUnlock, while PutGroup mutates the map in-place. Now copies the needed policy names into a slice while holding the lock. * fix: add nil IAM check to PutGroup and RemoveGroup gRPC handlers Match the nil guard pattern used by PutPolicy/DeletePolicy to prevent nil pointer dereference when IAM is not initialized.	2026-03-09 11:54:32 -07:00
Chris Lu	d89eb8267f	s3: use url.PathUnescape for X-Amz-Copy-Source header (#8545 ) * s3: use url.PathUnescape for X-Amz-Copy-Source header (#8544) The X-Amz-Copy-Source header is a URL-encoded path, not a query string. Using url.QueryUnescape incorrectly converts literal '+' characters to spaces, which can cause object key mismatches during copy operations. Switch to url.PathUnescape in CopyObjectHandler, CopyObjectPartHandler, and pathToBucketObjectAndVersion to correctly handle special characters like '!', '+', and other RFC 3986 sub-delimiters that S3 clients may percent-encode (e.g. '!' as %21). * s3: add path validation to CopyObjectPartHandler CopyObjectPartHandler was missing the validateTableBucketObjectPath checks that CopyObjectHandler has, allowing potential path traversal in the source bucket/object of copy part requests. * s3: fix case-sensitive HeadersRegexp for copy source routing The HeadersRegexp for X-Amz-Copy-Source used `%2F` which only matched uppercase hex encoding. RFC 3986 allows both `%2F` and `%2f`, so clients sending lowercase percent-encoding would bypass the copy handler and hit PutObjectHandler instead. Add (?i) flag for case-insensitive matching. Also add test coverage for the versionId branch in pathToBucketObjectAndVersion and for lowercase %2f routing.	2026-03-07 11:10:02 -08:00
Chris Lu	3f946fc0c0	mount: make metadata cache rebuilds snapshot-consistent (#8531 ) * filer: expose metadata events and list snapshots * mount: invalidate hot directory caches * mount: read hot directories directly from filer * mount: add sequenced metadata cache applier * mount: apply metadata responses through cache applier * mount: replay snapshot-consistent directory builds * mount: dedupe self metadata events * mount: factor directory build cleanup * mount: replace proto marshal dedup with composite key and ring buffer The dedup logic was doing a full deterministic proto.Marshal on every metadata event just to produce a dedup key. Replace with a cheap composite string key (TsNs\|Directory\|OldName\|NewName). Also replace the sliding-window slice (which leaked the backing array unboundedly) with a fixed-size ring buffer that reuses the same array. * filer: remove mutex and proto.Clone from request-scoped MetadataEventSink MetadataEventSink is created per-request and only accessed by the goroutine handling the gRPC call. The mutex and double proto.Clone (once in Record, once in Last) were unnecessary overhead on every filer write operation. Store the pointer directly instead. * mount: skip proto.Clone for caller-owned metadata events Add ApplyMetadataResponseOwned that takes ownership of the response without cloning. Local metadata events (mkdir, create, flush, etc.) are freshly constructed and never shared, so the clone is unnecessary. * filer: only populate MetadataEvent on successful DeleteEntry Avoid calling eventSink.Last() on error paths where the sink may contain a partial event from an intermediate child deletion during recursive deletes. * mount: avoid map allocation in collectDirectoryNotifications Replace the map with a fixed-size array and linear dedup. There are at most 3 directories to notify (old parent, new parent, new child if directory), so a 3-element array avoids the heap allocation on every metadata event. * mount: fix potential deadlock in enqueueApplyRequest Release applyStateMu before the blocking channel send. Previously, if the channel was full (cap 128), the send would block while holding the mutex, preventing Shutdown from acquiring it to set applyClosed. * mount: restore signature-based self-event filtering as fast path Re-add the signature check that was removed when content-based dedup was introduced. Checking signatures is O(1) on a small slice and avoids enqueuing and processing events that originated from this mount instance. The content-based dedup remains as a fallback. * filer: send snapshotTsNs only in first ListEntries response The snapshot timestamp is identical for every entry in a single ListEntries stream. Sending it in every response message wastes wire bandwidth for large directories. The client already reads it only from the first response. * mount: exit read-through mode after successful full directory listing MarkDirectoryRefreshed was defined but never called, so directories that entered read-through mode (hot invalidation threshold) stayed there permanently, hitting the filer on every readdir even when cold. Call it after a complete read-through listing finishes. * mount: include event shape and full paths in dedup key The previous dedup key only used Names, which could collapse distinct rename targets. Include the event shape (C/D/U/R), source directory, new parent path, and both entry names so structurally different events are never treated as duplicates. * mount: drain pending requests on shutdown in runApplyLoop After receiving the shutdown sentinel, drain any remaining requests from applyCh non-blockingly and signal each with errMetaCacheClosed so callers waiting on req.done are released. * mount: include IsDirectory in synthetic delete events metadataDeleteEvent now accepts an isDirectory parameter so the applier can distinguish directory deletes from file deletes. Rmdir passes true, Unlink passes false. * mount: fall back to synthetic event when MetadataEvent is nil In mknod and mkdir, if the filer response omits MetadataEvent (e.g. older filer without the field), synthesize an equivalent local metadata event so the cache is always updated. * mount: make Flush metadata apply best-effort after successful commit After filer_pb.CreateEntryWithResponse succeeds, the entry is persisted. Don't fail the Flush syscall if the local metadata cache apply fails — log and invalidate the directory cache instead. Also fall back to a synthetic event when MetadataEvent is nil. * mount: make Rename metadata apply best-effort The rename has already succeeded on the filer by the time we apply the local metadata event. Log failures instead of returning errors that would be dropped by the caller anyway. * mount: make saveEntry metadata apply best-effort with fallback After UpdateEntryWithResponse succeeds, treat local metadata apply as non-fatal. Log and invalidate the directory cache on failure. Also fall back to a synthetic event when MetadataEvent is nil. * filer_pb: preserve snapshotTsNs on error in ReadDirAllEntriesWithSnapshot Return the snapshot timestamp even when the first page fails, so callers receive the snapshot boundary when partial data was received. * filer: send snapshot token for empty directory listings When no entries are streamed, send a final ListEntriesResponse with only SnapshotTsNs so clients always receive the snapshot boundary. * mount: distinguish not-found vs transient errors in lookupEntry Return fuse.EIO for non-not-found filer errors instead of unconditionally returning ENOENT, so transient failures don't masquerade as missing entries. * mount: make CacheRemoteObject metadata apply best-effort The file content has already been cached successfully. Don't fail the read if the local metadata cache update fails. * mount: use consistent snapshot for readdir in direct mode Capture the SnapshotTsNs from the first loadDirectoryEntriesDirect call and store it on the DirectoryHandle. Subsequent batch loads pass this stored timestamp so all batches use the same snapshot. Also export DoSeaweedListWithSnapshot so mount can use it directly with snapshot passthrough. * filer_pb: fix test fake to send SnapshotTsNs only on first response Match the server behavior: only the first ListEntriesResponse in a page carries the snapshot timestamp, subsequent entries leave it zero. * Fix nil pointer dereference in ListEntries stream consumers Remove the empty-directory snapshot-only response from ListEntries that sent a ListEntriesResponse with Entry==nil, which crashed every raw stream consumer that assumed resp.Entry is always non-nil. Also add defensive nil checks for resp.Entry in all raw ListEntries stream consumers across: S3 listing, broker topic lookup, broker topic config, admin dashboard, topic retention, hybrid message scanner, Kafka integration, and consumer offset storage. * Add nil guards for resp.Entry in remaining ListEntries stream consumers Covers: S3 object lock check, MQ management dashboard (version/ partition/offset loops), and topic retention version loop. * Make applyLocalMetadataEvent best-effort in Link and Symlink The filer operations already succeeded; failing the syscall because the local cache apply failed is wrong. Log a warning and invalidate the parent directory cache instead. * Make applyLocalMetadataEvent best-effort in Mkdir/Rmdir/Mknod/Unlink The filer RPC already committed; don't fail the syscall when the local metadata cache apply fails. Log a warning and invalidate the parent directory cache to force a re-fetch on next access. * flushFileMetadata: add nil-fallback for metadata event and best-effort apply Synthesize a metadata event when resp.GetMetadataEvent() is nil (matching doFlush), and make the apply best-effort with cache invalidation on failure. * Prevent double-invocation of cleanupBuild in doEnsureVisited Add a cleanupDone guard so the deferred cleanup and inline error-path cleanup don't both call DeleteFolderChildren/AbortDirectoryBuild. * Fix comment: signature check is O(n) not O(1) * Prevent deferred cleanup after successful CompleteDirectoryBuild Set cleanupDone before returning from the success path so the deferred context-cancellation check cannot undo a published build. * Invalidate parent directory caches on rename metadata apply failure When applyLocalMetadataEvent fails during rename, invalidate the source and destination parent directory caches so subsequent accesses trigger a re-fetch from the filer. * Add event nil-fallback and cache invalidation to Link and Symlink Synthesize metadata events when the server doesn't return one, and invalidate parent directory caches on apply failure. * Match requested partition when scanning partition directories Parse the partition range format (NNNN-NNNN) and match against the requested partition parameter instead of using the first directory. * Preserve snapshot timestamp across empty directory listings Initialize actualSnapshotTsNs from the caller-requested value so it isn't lost when the server returns no entries. Re-add the server-side snapshot-only response for empty directories (all raw stream consumers now have nil guards for Entry). * Fix CreateEntry error wrapping to support errors.Is/errors.As Use errors.New + %w instead of %v for resp.Error so callers can unwrap the underlying error. * Fix object lock pagination: only advance on non-nil entries Move entriesReceived inside the nil check so nil entries don't cause repeated ListEntries calls with the same lastFileName. * Guard Attributes nil check before accessing Mtime in MQ management * Do not send nil-Entry response for empty directory listings The snapshot-only ListEntriesResponse (with Entry == nil) for empty directories breaks consumers that treat any received response as an entry (Java FilerClient, S3 listing). The Go client-side DoSeaweedListWithSnapshot already preserves the caller-requested snapshot via actualSnapshotTsNs initialization, so the server-side send is unnecessary. * Fix review findings: subscriber dedup, invalidation normalization, nil guards, shutdown race - Remove self-signature early-return in processEventFn so all events flow through the applier (directory-build buffering sees self-originated events that arrive after a snapshot) - Normalize NewParentPath in collectEntryInvalidations to avoid duplicate invalidations when NewParentPath is empty (same-directory update) - Guard resp.Entry.Attributes for nil in admin_server.go and topic_retention.go to prevent panics on entries without attributes - Fix enqueueApplyRequest race with shutdown by using select on both applyCh and applyDone, preventing sends after the apply loop exits - Add cleanupDone check to deferred cleanup in meta_cache_init.go for clarity alongside the existing guard in cleanupBuild - Add empty directory test case for snapshot consistency * Propagate authoritative metadata event from CacheRemoteObjectToLocalCluster and generate client-side snapshot for empty directories - Add metadata_event field to CacheRemoteObjectToLocalClusterResponse proto so the filer-emitted event is available to callers - Use WithMetadataEventSink in the server handler to capture the event from NotifyUpdateEvent and return it on the response - Update filehandle_read.go to prefer the RPC's metadata event over a locally fabricated one, falling back to metadataUpdateEvent when the server doesn't provide one (e.g., older filers) - Generate a client-side snapshot cutoff in DoSeaweedListWithSnapshot when the server sends no snapshot (empty directory), so callers like CompleteDirectoryBuild get a meaningful boundary for filtering buffered events * Skip directory notifications for dirs being built to prevent mid-build cache wipe When a metadata event is buffered during a directory build, applyMetadataSideEffects was still firing noteDirectoryUpdate for the building directory. If the directory accumulated enough updates to become "hot", markDirectoryReadThrough would call DeleteFolderChildren, wiping entries that EnsureVisited had already inserted. The build would then complete and mark the directory cached with incomplete data. Fix by using applyMetadataSideEffectsSkippingBuildingDirs for buffered events, which suppresses directory notifications for dirs currently in buildingDirs while still applying entry invalidations. * Add test for directory notification suppression during active build TestDirectoryNotificationsSuppressedDuringBuild verifies that metadata events targeting a directory under active EnsureVisited build do NOT fire onDirectoryUpdate for that directory. In production, this prevents markDirectoryReadThrough from calling DeleteFolderChildren mid-build, which would wipe entries already inserted by the listing. The test inserts an entry during a build, sends multiple metadata events for the building directory, asserts no notifications fired for it, verifies the entry survives, and confirms buffered events are replayed after CompleteDirectoryBuild. * Fix create invalidations, build guard, event shape, context, and snapshot error path - collectEntryInvalidations: invalidate FUSE kernel cache on pure create events (OldEntry==nil && NewEntry!=nil), not just updates and deletes - completeDirectoryBuildNow: only call markCachedFn when an active build existed (state != nil), preventing an unpopulated directory from being marked as cached - Add metadataCreateEvent helper that produces a create-shaped event (NewEntry only, no OldEntry) and use it in mkdir, mknod, symlink, and hardlink create fallback paths instead of metadataUpdateEvent which incorrectly set both OldEntry and NewEntry - applyMetadataResponseEnqueue: use context.Background() for the queued mutation so a cancelled caller context cannot abort the apply loop mid-write - DoSeaweedListWithSnapshot: move snapshot initialization before ListEntries call so the error path returns the preserved snapshot instead of 0 * Fix review findings: test loop, cache race, context safety, snapshot consistency - Fix build test loop starting at i=1 instead of i=0, missing new-0.txt verification - Re-check IsDirectoryCached after cache miss to avoid ENOENT race with markDirectoryReadThrough - Use context.Background() in enqueueAndWait so caller cancellation can't abort build/complete mid-way - Pass dh.snapshotTsNs in skip-batch loadDirectoryEntriesDirect for snapshot consistency - Prefer resp.MetadataEvent over fallback in Unlink event derivation - Add comment on MetadataEventSink.Record single-event assumption * Fix empty-directory snapshot clock skew and build cancellation race Empty-directory snapshot: Remove client-side time.Now() synthesis when the server returns no entries. Instead return snapshotTsNs=0, and in completeDirectoryBuildNow replay ALL buffered events when snapshot is 0. This eliminates the clock-skew bug where a client ahead of the filer would filter out legitimate post-list events. Build cancellation: Use context.Background() for BeginDirectoryBuild and CompleteDirectoryBuild calls in doEnsureVisited, so errgroup cancellation doesn't cause enqueueAndWait to return early and trigger cleanupBuild while the operation is still queued. * Add tests for empty-directory build replay and cancellation resilience TestEmptyDirectoryBuildReplaysAllBufferedEvents: verifies that when CompleteDirectoryBuild receives snapshotTsNs=0 (empty directory, no server snapshot), ALL buffered events are replayed regardless of their TsNs values — no clock-skew-sensitive filtering occurs. TestBuildCompletionSurvivesCallerCancellation: verifies that once CompleteDirectoryBuild is enqueued, a cancelled caller context does not prevent the build from completing. The apply loop runs with context.Background(), so the directory becomes cached and buffered events are replayed even when the caller gives up waiting. * Fix directory subtree cleanup, Link rollback, test robustness - applyMetadataResponseLocked: when a directory entry is deleted or moved, call DeleteFolderChildren on the old path so cached descendants don't leak as stale entries. - Link: save original HardLinkId/Counter before mutation. If CreateEntryWithResponse fails after the source was already updated, rollback the source entry to its original state via UpdateEntry. - TestBuildCompletionSurvivesCallerCancellation: replace fixed time.Sleep(50ms) with a deadline-based poll that checks IsDirectoryCached in a loop, failing only after 2s timeout. - TestReadDirAllEntriesWithSnapshotEmptyDirectory: assert that ListEntries was actually invoked on the mock client so the test exercises the RPC path. - newMetadataEvent: add early return when both oldEntry and newEntry are nil to avoid emitting events with empty Directory. --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-07 09:19:40 -08:00
Chris Lu	540fc97e00	s3/iam: reuse one request id per request (#8538 ) * request_id: add shared request middleware * s3err: preserve request ids in responses and logs * iam: reuse request ids in XML responses * sts: reuse request ids in XML responses * request_id: drop legacy header fallback * request_id: use AWS-style request id format * iam: fix AWS-compatible XML format for ErrorResponse and field ordering - ErrorResponse uses bare <RequestId> at root level instead of <ResponseMetadata> wrapper, matching the AWS IAM error response spec - Move CommonResponse to last field in success response structs so <ResponseMetadata> serializes after result elements - Add randomness to request ID generation to avoid collisions - Add tests for XML ordering and ErrorResponse format * iam: remove duplicate error_response_test.go Test is already covered by responses_test.go. * address PR review comments - Guard against typed nil pointers in SetResponseRequestID before interface assertion (CodeRabbit) - Use regexp instead of strings.Index in test helpers for extracting request IDs (Gemini) * request_id: prevent spoofing, fix nil-error branch, thread reqID to error writers - Ensure() now always generates a server-side ID, ignoring client-sent x-amz-request-id headers to prevent request ID spoofing. Uses a private context key (contextKey{}) instead of the header string. - writeIamErrorResponse in both iamapi and embedded IAM now accepts reqID as a parameter instead of calling Ensure() internally, ensuring a single request ID per request lifecycle. - The nil-iamError branch in writeIamErrorResponse now writes a 500 Internal Server Error response instead of returning silently. - Updated tests to set request IDs via context (not headers) and added tests for spoofing prevention and context reuse. * sts: add request-id consistency assertions to ActionInBody tests * test: update admin test to expect server-generated request IDs The test previously sent a client x-amz-request-id header and expected it echoed back. Since Ensure() now ignores client headers to prevent spoofing, update the test to verify the server returns a non-empty server-generated request ID instead. * iam: add generic WithRequestID helper alongside reflection-based fallback Add WithRequestID[T] that uses generics to take the address of a value type, satisfying the pointer receiver on SetRequestId without reflection. The existing SetResponseRequestID is kept for the two call sites that operate on interface{} (from large action switches where the concrete type varies at runtime). Generics cannot replace reflection there since Go cannot infer type parameters from interface{}. * Remove reflection and generics from request ID setting Call SetRequestId directly on concrete response types in each switch branch before boxing into interface{}, eliminating the need for WithRequestID (generics) and SetResponseRequestID (reflection). * iam: return pointer responses in action dispatch * Fix IAM error handling consistency and ensure request IDs on all responses - UpdateUser/CreatePolicy error branches: use writeIamErrorResponse instead of s3err.WriteErrorResponse to preserve IAM formatting and request ID - ExecuteAction: accept reqID parameter and generate one if empty, ensuring every response carries a RequestId regardless of caller * Clean up inline policies on DeleteUser and UpdateUser rename DeleteUser: remove InlinePolicies[userName] from policy storage before removing the identity, so policies are not orphaned. UpdateUser: move InlinePolicies[userName] to InlinePolicies[newUserName] when renaming, so GetUserPolicy/DeleteUserPolicy work under the new name. Both operations persist the updated policies and return an error if the storage write fails, preventing partial state.	2026-03-06 15:22:39 -08:00
Aaron	14cd0f53ba	Places the CommonResponse struct at the end of all IAM responses. (#8537 ) * Places the CommonResponse struct at the end of all IAM responses, rather than the start. * iam: fix error response request id layout * iam: add XML ordering regression test * iam: share request id generation --------- Co-authored-by: Aaron Segal <aaron.segal@rpsolutions.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-03-06 12:53:23 -08:00
Chris Lu	f9311a3422	s3api: fix static IAM policy enforcement after reload (#8532 ) * s3api: honor attached IAM policies over legacy actions * s3api: hydrate IAM policy docs during config reload * s3api: use policy-aware auth when listing buckets * credential: propagate context through filer_etc policy reads * credential: make legacy policy deletes durable * s3api: exercise managed policy runtime loader * s3api: allow static IAM users without session tokens * iam: deny unmatched attached policies under default allow * iam: load embedded policy files from filer store * s3api: require session tokens for IAM presigning * s3api: sync runtime policies into zero-config IAM * credential: respect context in policy file loads * credential: serialize legacy policy deletes * iam: align filer policy store naming * s3api: use authenticated principals for presigning * iam: deep copy policy conditions * s3api: require request creation in policy tests * filer: keep ReadInsideFiler as the context-aware API * iam: harden filer policy store writes * credential: strengthen legacy policy serialization test * credential: forward runtime policy loaders through wrapper * s3api: harden runtime policy merging * iam: require typed already-exists errors	2026-03-06 12:35:08 -08:00
Chris Lu	1b6e96614d	s3api: cache parsed IAM policy engines for fallback auth Previously, evaluateIAMPolicies created a new PolicyEngine and re-parsed the JSON policy document for every policy on every request. This adds a shared iamPolicyEngine field that caches compiled policies, kept in sync by PutPolicy, DeletePolicy, and bulk config reload paths. - PutPolicy deletes the old cache entry before setting the new one, so a parse failure on update does not leave a stale allow. - Log warnings when policy compilation fails instead of silently discarding errors. - Add test for valid-to-invalid policy update regression.	2026-03-05 14:27:48 -08:00
SrikanthBhandary	4eb45ecc5e	s3api: add IAM policy fallback authorization tests (#8518 ) * s3api: add IAM policy fallback auth with tests * s3api: use policy engine for IAM fallback evaluation	2026-03-05 12:13:18 -08:00
Chris Lu	10a30a83e1	s3api: add GetObjectAttributes API support (#8504 ) * s3api: add error code and header constants for GetObjectAttributes Add ErrInvalidAttributeName error code and header constants (X-Amz-Object-Attributes, X-Amz-Max-Parts, X-Amz-Part-Number-Marker, X-Amz-Delete-Marker) needed by the S3 GetObjectAttributes API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: implement GetObjectAttributes handler Add GetObjectAttributesHandler that returns selected object metadata (ETag, Checksum, StorageClass, ObjectSize, ObjectParts) without returning the object body. Follows the same versioning and conditional header patterns as HeadObjectHandler. The handler parses the X-Amz-Object-Attributes header to determine which attributes to include in the XML response, and supports ObjectParts pagination via X-Amz-Max-Parts and X-Amz-Part-Number-Marker. Ref: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectAttributes.html Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: register GetObjectAttributes route Register the GET /{object}?attributes route for the GetObjectAttributes API, placed before other object query routes to ensure proper matching. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add integration tests for GetObjectAttributes Test coverage: - Basic: simple object with all attribute types - MultipartObject: multipart upload with parts pagination - SelectiveAttributes: requesting only specific attributes - InvalidAttribute: server rejects invalid attribute names - NonExistentObject: returns NoSuchKey for missing objects Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add versioned object test for GetObjectAttributes Test puts two versions of the same object and verifies that: - GetObjectAttributes returns the latest version by default - GetObjectAttributes with versionId returns the specific version - ObjectSize and VersionId are correct for each version Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: fix combined conditional header evaluation per RFC 7232 Per RFC 7232: - Section 3.4: If-Unmodified-Since MUST be ignored when If-Match is present (If-Match is the more accurate replacement) - Section 3.3: If-Modified-Since MUST be ignored when If-None-Match is present (If-None-Match is the more accurate replacement) Previously, all four conditional headers were evaluated independently. This caused incorrect 412 responses when If-Match succeeded but If-Unmodified-Since failed (should return 200 per AWS S3 behavior). Fix applied to both validateConditionalHeadersForReads (GET/HEAD) and validateConditionalHeaders (PUT) paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add conditional header combination tests for GetObjectAttributes Test the RFC 7232 combined conditional header semantics: - If-Match=true + If-Unmodified-Since=false => 200 (If-Unmodified-Since ignored) - If-None-Match=false + If-Modified-Since=true => 304 (If-Modified-Since ignored) - If-None-Match=true + If-Modified-Since=false => 200 (If-Modified-Since ignored) - If-Match=true + If-Unmodified-Since=true => 200 - If-Match=false => 412 regardless Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: document Checksum attribute as not yet populated Checksum is accepted in validation (so clients requesting it don't get a 400 error, matching AWS behavior for objects without checksums) but SeaweedFS does not yet store S3 checksums. Add a comment explaining this and noting where to populate it when checksum storage is added. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add s3:GetObjectAttributes IAM action for ?attributes query Previously, GET /{object}?attributes resolved to s3:GetObject via the fallback path since resolveFromQueryParameters had no case for the "attributes" query parameter. Add S3_ACTION_GET_OBJECT_ATTRIBUTES constant ("s3:GetObjectAttributes") and a branch in resolveFromQueryParameters to return it for GET requests with the "attributes" query parameter, so IAM policies can distinguish GetObjectAttributes from GetObject. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: evaluate conditional headers after version resolution Move conditional header evaluation (If-Match, If-None-Match, etc.) to after the version resolution step in GetObjectAttributesHandler. This ensures that when a specific versionId is requested, conditions are checked against the correct version entry rather than always against the latest version. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: use bounded HTTP client in GetObjectAttributes tests Replace http.DefaultClient with a timeout-aware http.Client (10s) in the signedGetObjectAttributes helper and testGetObjectAttributesInvalid to prevent tests from hanging indefinitely. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: check attributes query before versionId in action resolver Move the GetObjectAttributes action check before the versionId check in resolveFromQueryParameters. This fixes GET /bucket/key?attributes&versionId=xyz being incorrectly classified as s3:GetObjectVersion instead of s3:GetObjectAttributes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: add tests for versioned conditional headers and action resolver Add integration test that verifies conditional headers (If-Match, If-None-Match) are evaluated against the requested version entry, not the latest version. This covers the fix in 55c409dec. Add unit test for ResolveS3Action verifying that the attributes query parameter takes precedence over versionId, so GET ?attributes&versionId resolves to s3:GetObjectAttributes. This covers the fix in b92c61c95. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: guard negative chunk indices and rename PartsCount field Add bounds checks for b.StartChunk >= 0 and b.EndChunk >= 0 in buildObjectAttributesParts to prevent panics from corrupted metadata with negative index values. Rename ObjectAttributesParts.PartsCount to TotalPartsCount to match the AWS SDK v2 Go field naming convention, while preserving the XML element name "PartsCount" via the struct tag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * s3api: reject malformed max-parts and part-number-marker headers Return ErrInvalidMaxParts and ErrInvalidPartNumberMarker when the X-Amz-Max-Parts or X-Amz-Part-Number-Marker headers contain non-integer or negative values, matching ListObjectPartsHandler behavior. Previously these were silently ignored with defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 12:52:09 -08:00
Chris Lu	e8946e59ca	fix(s3api): correctly extract host header port in extractHostHeader (#8464 ) * Prevent concurrent maintenance tasks per volume * fix panic * fix(s3api): correctly extract host header port when X-Forwarded-Port is present * test(s3api): add test cases for misreported X-Forwarded-Port	2026-02-27 13:41:45 -08:00
Chris Lu	4f647e1036	Worker set its working directory (#8461 ) * set working directory * consolidate to worker directory * working directory * correct directory name * refactoring to use wildcard matcher * simplify * cleaning ec working directory * fix reference * clean * adjust test	2026-02-27 12:22:21 -08:00
Chris Lu	8eba7ba5b2	feat: drop table location mapping support (#8458 ) * feat: drop table location mapping support Disable external metadata locations for S3 Tables and remove the table location mapping index entirely. Table metadata must live under the table bucket paths, so lookups no longer use mapping directories. Changes: - Remove mapping lookup and cache from bucket path resolution - Reject metadataLocation in CreateTable and UpdateTable - Remove mapping helpers and tests * compile * refactor * fix: accept metadataLocation in S3 Tables API requests We removed the external table location mapping feature, but still need to accept and store metadataLocation values from clients like Trino. The mapping feature was an internal implementation detail that mapped external buckets to internal table paths. The metadataLocation field itself is part of the S3 Tables API and should be preserved. * fmt * fix: handle MetadataLocation in UpdateTable requests Mirror handleCreateTable behavior by updating metadata.MetadataLocation when req.MetadataLocation is provided in UpdateTable requests. This ensures table metadata location can be updated, not just set during creation.	2026-02-26 16:36:24 -08:00
Chris Lu	641351da78	fix: table location mappings to /etc/s3tables (#8457 ) * fix: move table location mappings to /etc/s3tables to avoid bucket name validation Fixes #8362 - table location mappings were stored under /buckets/.table-location-mappings which fails bucket name validation because it starts with a dot. Moving them to /etc/s3tables resolves the migration error for upgrades. Changes: - Table location mappings now stored under /etc/s3tables - Ensure parent /etc directory exists before creating /etc/s3tables - Normal writes go to new location only (no legacy compatibility) - Removed bucket name validation exception for old location * refactor: simplify lookupTableLocationMapping by removing redundant mappingPath parameter The mappingPath function parameter was redundant as the path can be derived from mappingDir and bucket using path.Join. This simplifies the code and reduces the risk of path mismatches between parameters.	2026-02-26 15:35:13 -08:00
blitt001	3d81d5bef7	Fix S3 signature verification behind reverse proxies (#8444 ) * Fix S3 signature verification behind reverse proxies When SeaweedFS is deployed behind a reverse proxy (e.g. nginx, Kong, Traefik), AWS S3 Signature V4 verification fails because the Host header the client signed with (e.g. "localhost:9000") differs from the Host header SeaweedFS receives on the backend (e.g. "seaweedfs:8333"). This commit adds a new -s3.externalUrl parameter (and S3_EXTERNAL_URL environment variable) that tells SeaweedFS what public-facing URL clients use to connect. When set, SeaweedFS uses this host value for signature verification instead of the Host header from the incoming request. New parameter: -s3.externalUrl (flag) or S3_EXTERNAL_URL (environment variable) Example: -s3.externalUrl=http://localhost:9000 Example: S3_EXTERNAL_URL=https://s3.example.com The environment variable is particularly useful in Docker/Kubernetes deployments where the external URL is injected via container config. The flag takes precedence over the environment variable when both are set. At startup, the URL is parsed and default ports are stripped to match AWS SDK behavior (port 80 for HTTP, port 443 for HTTPS), so "http://s3.example.com:80" and "http://s3.example.com" are equivalent. Bugs fixed: - Default port stripping was removed by a prior PR, causing signature mismatches when clients connect on standard ports (80/443) - X-Forwarded-Port was ignored when X-Forwarded-Host was not present - Scheme detection now uses proper precedence: X-Forwarded-Proto > TLS connection > URL scheme > "http" - Test expectations for standard port stripping were incorrect - expectedHost field in TestSignatureV4WithForwardedPort was declared but never actually checked (self-referential test) * Add Docker integration test for S3 proxy signature verification Docker Compose setup with nginx reverse proxy to validate that the -s3.externalUrl parameter (or S3_EXTERNAL_URL env var) correctly resolves S3 signature verification when SeaweedFS runs behind a proxy. The test uses nginx proxying port 9000 to SeaweedFS on port 8333, with X-Forwarded-Host/Port/Proto headers set. SeaweedFS is configured with -s3.externalUrl=http://localhost:9000 so it uses "localhost:9000" for signature verification, matching what the AWS CLI signs with. The test can be run with aws CLI on the host or without it by using the amazon/aws-cli Docker image with --network host. Test covers: create-bucket, list-buckets, put-object, head-object, list-objects-v2, get-object, content round-trip integrity, delete-object, and delete-bucket — all through the reverse proxy. * Create s3-proxy-signature-tests.yml * fix CLI * fix CI * Update s3-proxy-signature-tests.yml * address comments * Update Dockerfile * add user * no need for fuse * Update s3-proxy-signature-tests.yml * debug * weed mini * fix health check * health check * fix health checking --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-02-26 14:20:42 -08:00
Lars Lehtonen	0fac6e39ea	weed/s3api/s3tables: fix dropped errors (#8456 ) * weed/s3api/s3tables: fix dropped errors * enhance errors * fail fast when listing tables --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-02-26 11:12:10 -08:00
Chris Lu	b9fa05153a	Allow multipart upload operations when s3:PutObject is authorized (#8445 ) * Allow multipart upload operations when s3:PutObject is authorized Multipart upload is an implementation detail of putting objects, not a separate permission. When a policy grants s3:PutObject, it should implicitly allow: - s3:CreateMultipartUpload - s3:UploadPart - s3:CompleteMultipartUpload - s3:AbortMultipartUpload - s3:ListParts This fixes a compatibility issue where clients like PyArrow that use multipart uploads by default would fail even though the role had s3:PutObject permission. The session policy intersection still applies - both the identity-based policy AND session policy must allow s3:PutObject for multipart operations to work. Implementation: - Added constants for S3 multipart action strings - Added multipartActionSet to efficiently check if action is multipart-related - Updated MatchesAction method to implicitly grant multipart when PutObject allowed * Update weed/s3api/policy_engine/types.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Add s3:ListMultipartUploads to multipart action set Include s3:ListMultipartUploads in the multipartActionSet so that listing multipart uploads is implicitly granted when s3:PutObject is authorized. ListMultipartUploads is a critical part of the multipart upload workflow, allowing clients to query in-progress uploads before completing them. Changes: - Added s3ListMultipartUploads constant definition - Included s3ListMultipartUploads in multipartActionSet initialization - Existing references to multipartActionSet automatically now cover ListMultipartUploads All policy engine tests pass (0.351s execution time) * Refactor: reuse multipart action constants from s3_constants package Remove duplicate constant definitions from policy_engine/types.go and import the canonical definitions from s3api/s3_constants/s3_actions.go instead. This eliminates duplication and ensures a single source of truth for multipart action strings: - ACTION_CREATE_MULTIPART_UPLOAD - ACTION_UPLOAD_PART - ACTION_COMPLETE_MULTIPART - ACTION_ABORT_MULTIPART - ACTION_LIST_PARTS - ACTION_LIST_MULTIPART_UPLOADS All policy engine tests pass (0.350s execution time) * Fix S3_ACTION_LIST_MULTIPART_UPLOADS constant value Move S3_ACTION_LIST_MULTIPART_UPLOADS from bucket operations to multipart operations section and change value from 's3:ListBucketMultipartUploads' to 's3:ListMultipartUploads' to match the action strings used in policy_engine and s3_actions.go. This ensures consistent action naming across all S3 constant definitions. * refactor names * Fix S3 action constant mismatches and MatchesAction early return bug Fix two critical issues in policy engine: 1. S3Actions map had incorrect multipart action mappings: - 'ListMultipartUploads' was 's3:ListMultipartUploads' (should be 's3:ListBucketMultipartUploads') - 'ListParts' was 's3:ListParts' (should be 's3:ListMultipartUploadParts') These mismatches caused authorization checks to fail for list operations 2. CompiledStatement.MatchesAction() had early return bug: - Previously returned true immediately upon first direct action match - This prevented scanning remaining matchers for s3:PutObject permission - Now scans ALL matchers before returning, tracking both direct match and PutObject grant - Ensures multipart operations inherit s3:PutObject authorization even when explicitly requested action doesn't match (e.g., s3:ListMultipartUploadParts) Changes: - Track matchedAction flag to defer Fix two critical issues in policy engine: 1. S3Actions map had incorrect multipart action mappings: - 'ListMultipartUploads' was 's3:ListMultipartUplPer 1. S3Actions map had incorrect multiparAll - 'ListMultipartUploads(0.334s execution time) * Refactor S3Actions map to use s3_constants Replace hardcoded action strings in the S3Actions map with references to canonical S3_ACTION_* constants from s3_constants/s3_action_strings.go. Benefits: - Single source of truth for S3 action values - Eliminates string duplication across codebase - Ensures consistency between policy engine and middleware - Reduces maintenance burden when action strings need updates All policy engine tests pass (0.334s execution time) * Remove unused S3Actions map The S3Actions map in types.go was never referenced anywhere in the codebase. All action mappings are handled by GetActionMappings() in integration.go instead. This removes 42 lines of dead code. * Fix test: reload configuration function must also reload IAM state TestEmbeddedIamAttachUserPolicyRefreshesIAM was failing because the test's reloadConfigurationFunc only updated mockConfig but didn't reload the actual IAM state. When AttachUserPolicy calls refreshIAMConfiguration(), it would use the test's incomplete reload function instead of the real LoadS3ApiConfigurationFromCredentialManager(). Fixed by making the test's reloadConfigurationFunc also call e.iam.LoadS3ApiConfigurationFromCredentialManager() so lookupByIdentityName() sees the updated policy attachments. --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-25 12:31:04 -08:00
Chris Lu	d5e71eb0d8	Revert "s3api: preserve Host header port in signature verification (#8434 )" This reverts commit `98d89ffad7`.	2026-02-25 12:28:44 -08:00
Chris Lu	8c0c7248b3	Refresh IAM config after policy attachments (#8439 ) * Refresh IAM cache after policy attachments * error handling	2026-02-25 10:30:05 -08:00
Chris Lu	98d89ffad7	s3api: preserve Host header port in signature verification (#8434 ) Avoid stripping default ports (80/443) from the Host header in extractHostHeader. This fixes SignatureDoesNotMatch errors when SeaweedFS is accessed via a proxy (like Kong Ingress) that explicitly includes the port in the Host header or X-Forwarded-Host, which S3 clients sign. Also cleaned up unused variables and logic after refactoring.	2026-02-24 13:09:40 -08:00
Plamen Nikolov	ff84ef880d	fix(s3api): make ListObjectsV1 namespaced and prevent marker-echo pagination loops (#8409 ) * fix(s3api): make ListObjectsV1 namespaced and stop marker-echo pagination loops * test(s3api): harden marker-echo coverage and align V1 encoding tag * test(s3api): cover encoded marker matching and trim redundant setup * refactor(s3api): tighten V1 list helper visibility and test mock docs	2026-02-23 23:45:08 -08:00
Chris Lu	2d65d7f499	Embed role policies in AssumeRole STS tokens (#8421 ) * Embed role policies in AssumeRole STS tokens * Log STS policy lookup failures * Use IAMManager provider * Guard policy embedding role lookup	2026-02-23 22:59:53 -08:00
Chris Lu	8e8edd7706	not empty only if there are actual files in the bucket	2026-02-23 00:12:04 -08:00
Chris Lu	8e25c55bfb	S3: Truncate timestamps to milliseconds for CopyObjectResult and CopyPartResult (#8398 ) * S3: Truncate timestamps to milliseconds for CopyObjectResult and CopyPartResult Fixes #8394 * S3: Address nitpick comments in copy handlers - Synchronize Mtime and LastModified by capturing time once\n- Optimize copyChunksForRange loop\n- Use built-in min/max\n- Remove dead previewLen code	2026-02-20 21:01:31 -08:00

1 2 3 4 5 ...

1018 Commits