seaweedFS

Author	SHA1	Message	Date
Chris Lu	3a3fff1399	Fix TUS chunked upload and resume failures (#8783 ) (#8786 ) * Fix TUS chunked upload and resume failures caused by request context cancellation (#8783) The filer's TCP connections use a 10-second inactivity timeout (net_timeout.go). After the TUS PATCH request body is fully consumed, internal operations (assigning file IDs via gRPC to the master, uploading data to volume servers, completing uploads) do not generate any activity on the client connection, so the inactivity timer fires and Go's HTTP server cancels the request context. This caused HTTP 500 errors on PATCH requests where body reading + internal processing exceeded the timeout. Fix by using context.WithoutCancel in TUS create and patch handlers, matching the existing pattern used by assignNewFileInfo. This ensures internal operations complete regardless of client connection state. Fixes seaweedfs/seaweedfs#8783 * Add comment to tusCreateHandler explaining context.WithoutCancel rationale * Run TUS integration tests on all PRs, not just TUS file changes The previous path filter meant these tests only ran when TUS-specific files changed. This allowed regressions from changes to shared infrastructure (net_timeout.go, upload paths, gRPC) to go undetected — which is exactly how the context cancellation bug in #8783 was missed. Matches the pattern used by s3-go-tests.yml.	2026-03-26 14:06:21 -07:00
Chris Lu	94bfa2b340	mount: stream all filer mutations over single ordered gRPC stream (#8770 ) * filer: add StreamMutateEntry bidi streaming RPC Add a bidirectional streaming RPC that carries all filer mutation types (create, update, delete, rename) over a single ordered stream. This eliminates per-request connection overhead for pipelined operations and guarantees mutation ordering within a stream. The server handler delegates each request to the existing unary handlers (CreateEntry, UpdateEntry, DeleteEntry) and uses a proxy stream adapter for rename operations to reuse StreamRenameEntry logic. The is_last field signals completion for multi-response operations (rename sends multiple events per request; create/update/delete always send exactly one response with is_last=true). * mount: add streaming mutation multiplexer (streamMutateMux) Implement a client-side multiplexer that routes all filer mutation RPCs (create, update, delete, rename) over a single bidirectional gRPC stream. Multiple goroutines submit requests through a send channel; a dedicated sendLoop serializes them on the stream; a recvLoop dispatches responses to waiting callers via per-request channels. Key features: - Lazy stream opening on first use - Automatic reconnection on stream failure - Permanent fallback to unary RPCs if filer returns Unimplemented - Monotonic request_id for response correlation - Multi-response support for rename operations (is_last signaling) The mux is initialized on WFS and closed during unmount cleanup. No call sites use it yet — wiring comes in subsequent commits. * mount: route CreateEntry and UpdateEntry through streaming mux Wire all CreateEntry call sites to use wfs.streamCreateEntry() which routes through the StreamMutateEntry stream when available, falling back to unary RPCs otherwise. Also wire Link's UpdateEntry calls through wfs.streamUpdateEntry(). Updated call sites: - flushMetadataToFiler (file flush after write) - Mkdir (directory creation) - Symlink (symbolic link creation) - createRegularFile non-deferred path (Mknod) - flushFileMetadata (periodic metadata flush) - Link (hard link: update source + create link + rollback) * mount: route UpdateEntry and DeleteEntry through streaming mux Wire remaining mutation call sites through the streaming mux: - saveEntry (Setattr/chmod/chown/utimes) → streamUpdateEntry - Unlink → streamDeleteEntry (replaces RemoveWithResponse) - Rmdir → streamDeleteEntry (replaces RemoveWithResponse) All filer mutations except Rename now go through StreamMutateEntry when the filer supports it, with automatic unary RPC fallback. * mount: route Rename through streaming mux Wire Rename to use streamMutate.Rename() when available, with fallback to the existing StreamRenameEntry unary stream. The streaming mux sends rename as a StreamRenameEntryRequest oneof variant. The server processes it through the existing rename logic and sends multiple StreamRenameEntryResponse events (one per moved entry), with is_last=true on the final response. All filer mutations now go through a single ordered stream. * mount: fix stream mux connection ownership WithGrpcClient(streamingMode=true) closes the gRPC connection when the callback returns, destroying the stream. Own the connection directly via pb.GrpcDial so it stays alive for the stream's lifetime. Close it explicitly in recvLoop on stream failure and in Close on shutdown. * mount: fix rename failure for deferred-create files Three fixes for rename operations over the streaming mux: 1. lookupEntry: fall back to local metadata store when filer returns "not found" for entries in uncached directories. Files created with deferFilerCreate=true exist only in the local leveldb store until flushed; lookupEntry skipped the local store when the parent directory had never been readdir'd, causing rename to fail with ENOENT. 2. Rename: wait for pending async flushes and force synchronous flush of dirty metadata before sending rename to the filer. Covers the writebackCache case where close() defers the flush to a background worker that may not complete before rename fires. 3. StreamMutateEntry: propagate rename errors from server to client. Add error/errno fields to StreamMutateEntryResponse so the mount can map filer errors to correct FUSE status codes instead of silently returning OK. Also fix the existing Rename error handler which could return fuse.OK on unrecognized errors. * mount: fix streaming mux error handling, sendLoop lifecycle, and fallback Address PR review comments: 1. Server: populate top-level Error/Errno on StreamMutateEntryResponse for create/update/delete errors, not just rename. Previously update errors were silently dropped and create/delete errors were only in nested response fields that the client didn't check. 2. Client: check nested error fields in CreateEntry (ErrorCode, Error) and DeleteEntry (Error) responses, matching CreateEntryWithResponse behavior. 3. Fix sendLoop lifecycle: give each stream generation a stopSend channel. recvLoop closes it on error to stop the paired sendLoop. Previously a reconnect left the old sendLoop draining sendCh, breaking ordering. 4. Transparent fallback: stream helpers and doRename fall back to unary RPCs on transport errors (ErrStreamTransport), including the first Unimplemented from ensureStream. Previously the first call failed instead of degrading. 5. Filer rotation in openStream: try all filer addresses on dial failure, matching WithFilerClient behavior. Stop early on Unimplemented. 6. Pass metadata-bearing context to StreamMutateEntry RPC call so sw-client-id header is actually sent. 7. Gate lookupEntry local-cache fallback on open dirty handle or pending async flush to avoid resurrecting deleted/renamed entries. 8. Remove dead code in flushFileMetadata (err=nil followed by if err!=nil). 9. Use string matching for rename error-to-errno mapping in the mount to stay portable across Linux/macOS (numeric errno values differ). * mount: make failAllPending idempotent with delete-before-close Change failAllPending to collect pending entries into a local slice (deleting from the sync.Map first) before closing channels. This prevents double-close panics if called concurrently. Also remove the unused err parameter. * mount: add stream generation tracking and teardownStream Introduce a generation counter on streamMutateMux that increments each time a new stream is created. Requests carry the generation they were enqueued for so sendLoop can reject stale requests after reconnect. Add teardownStream(gen) which is idempotent (only acts when gen matches current generation and stream is non-nil). Both sendLoop and recvLoop call it on error, replacing the inline cleanup in recvLoop. sendLoop now actively triggers teardown on send errors instead of silently exiting. ensureStream waits for the prior generation's recvDone before creating a new stream, ensuring all old pending waiters are failed before reconnect. recvLoop now takes the stream, generation, and recvDone channel as parameters to avoid accessing shared fields without the lock. * mount: harden Close to prevent races with teardownStream Nil out stream, cancel, and grpcConn under the lock so that any concurrent teardownStream call from recvLoop/sendLoop becomes a no-op. Call failAllPending before closing sendCh to unblock waiters promptly. Guard recvDone with a nil check for the case where Close is called before any stream was ever opened. * mount: make errCh receive ctx-aware in doUnary and Rename Replace the blocking <-sendReq.errCh with a select that also observes ctx.Done(). If sendLoop exits via stopSend without consuming a buffered request, the caller now returns ctx.Err() instead of blocking forever. The buffered errCh (capacity 1) ensures late acknowledgements from sendLoop don't block the sender. * mount: fix sendLoop/Close race and recvLoop/teardown pending channel race Three related fixes: 1. Stop closing sendCh in Close(). Closing the shared producer channel races with callers who passed ensureStream() but haven't sent yet, causing send-on-closed-channel panics. sendCh is now left open; ensureStream checks m.closed to reject new callers. 2. Drain buffered sendCh items on shutdown. sendLoop defers drainSendCh() on exit so buffered requests get an ErrStreamTransport on their errCh instead of blocking forever. Close() drains again for any stragglers enqueued between sendLoop's drain and the final shutdown. 3. Move failAllPending from teardownStream into recvLoop's defer. teardownStream (called from sendLoop on send error) was closing pending response channels while recvLoop could be between pending.Load and the channel send — a send-on-closed-channel panic. recvLoop is now the sole closer of pending channels, eliminating the race. Close() waits on recvDone (with cancel() to guarantee Recv unblocks) so pending cleanup always completes. * filer/mount: add debug logging for hardlink lifecycle Add V(0) logging at every point where a HardLinkId is created, stored, read, or deleted to trace orphaned hardlink references. Logging covers: - gRPC server: CreateEntry/UpdateEntry when request carries HardLinkId - FilerStoreWrapper: InsertEntry/UpdateEntry when entry has HardLinkId - handleUpdateToHardLinks: entry path, HardLinkId, counter, chunk count - setHardLink: KvPut with blob size - maybeReadHardLink: V(1) on read attempt and successful decode - DeleteHardLink: counter decrement/deletion events - Mount Link(): when NewHardLinkId is generated and link is created This helps diagnose how a git pack .rev file ended up with a HardLinkId during a clone (no hard links should be involved). * test: add git clone/pull integration test for FUSE mount Shell script that exercises git operations on a SeaweedFS mount: 1. Creates a bare repo on the mount 2. Clones locally, makes 3 commits, pushes to mount 3. Clones from mount bare repo into an on-mount working dir 4. Verifies clone integrity (files, content, commit hashes) 5. Pushes 2 more commits with renames and deletes 6. Checks out an older revision on the mount clone 7. Returns to branch and pulls with real changes 8. Verifies file content, renames, deletes after pull 9. Checks git log integrity and clean status 27 assertions covering file existence, content, commit hashes, file counts, renames, deletes, and git status. Run against any existing mount: bash test-git-on-mount.sh /path/to/mount * test: add git clone/pull FUSE integration test to CI suite Add TestGitOperations to the existing fuse_integration test framework. The test exercises git's full file operation surface on the mount: 1. Creates a bare repo on the mount (acts as remote) 2. Clones locally, makes 3 commits (files, bulk data, renames), pushes 3. Clones from mount bare repo into an on-mount working dir 4. Verifies clone integrity (content, commit hash, file count) 5. Pushes 2 more commits with new files, renames, and deletes 6. Checks out an older revision on the mount clone 7. Returns to branch and pulls with real fast-forward changes 8. Verifies post-pull state: content, renames, deletes, file counts 9. Checks git log integrity (5 commits) and clean status Runs automatically in the existing fuse-integration.yml CI workflow. * mount: fix permission check with uid/gid mapping The permission checks in createRegularFile() and Access() compared the caller's local uid/gid against the entry's filer-side uid/gid without applying the uid/gid mapper. With -map.uid 501:0, a directory created as uid 0 on the filer would not match the local caller uid 501, causing hasAccess() to fall through to "other" permission bits and reject write access (0755 → other has r-x, no w). Fix: map entry uid/gid from filer-space to local-space before the hasAccess() call so both sides are in the same namespace. This fixes rsync -a failing with "Permission denied" on mkstempat when using uid/gid mapping. * mount: fix Mkdir/Symlink returning filer-side uid/gid to kernel Mkdir and Symlink used `defer wfs.mapPbIdFromFilerToLocal(entry)` to restore local uid/gid, but `outputPbEntry` writes the kernel response before the function returns — so the kernel received filer-side uid/gid (e.g., 0:0). macFUSE then caches these and rejects subsequent child operations (mkdir, create) because the caller uid (501) doesn't match the directory owner (0), and "other" bits (0755 → r-x) lack write permission. Fix: replace the defer with an explicit call to mapPbIdFromFilerToLocal before outputPbEntry, so the kernel gets local uid/gid. Also add nil guards for UidGidMapper in Access and createRegularFile to prevent panics in tests that don't configure a mapper. This fixes rsync -a "Permission denied" on mkpathat for nested directories when using uid/gid mapping. * mount: fix Link outputting filer-side uid/gid to kernel, add nil guards Link had the same defer-before-outputPbEntry bug as Mkdir and Symlink: the kernel received filer-side uid/gid because the defer hadn't run yet when outputPbEntry wrote the response. Also add nil guards for UidGidMapper in Access and createRegularFile so tests without a mapper don't panic. Audit of all outputPbEntry/outputFilerEntry call sites: - Mkdir: fixed in prior commit (explicit map before output) - Symlink: fixed in prior commit (explicit map before output) - Link: fixed here (explicit map before output) - Create (existing file): entry from maybeLoadEntry (already mapped) - Create (deferred): entry has local uid/gid (never mapped to filer) - Create (non-deferred): createRegularFile defer runs before return - Mknod: createRegularFile defer runs before return - Lookup: entry from lookupEntry (already mapped) - GetAttr: entry from maybeReadEntry/maybeLoadEntry (already mapped) - readdir: entry from cache (mapIdFromFilerToLocal) or filer (mapped) - saveEntry: no kernel output - flushMetadataToFiler: no kernel output - flushFileMetadata: no kernel output * test: fix git test for same-filesystem FUSE clone When both the bare repo and working clone live on the same FUSE mount, git's local transport uses hardlinks and cross-repo stat calls that fail on FUSE. Fix: - Use --no-local on clone to disable local transport optimizations - Use reset --hard instead of checkout to stay on branch - Use fetch + reset --hard origin/<branch> instead of git pull to avoid local transport stat failures during fetch * adjust logging * test: use plain git clone/pull to exercise real FUSE behavior Remove --no-local and fetch+reset workarounds. The test should use the same git commands users run (clone, reset --hard, pull) so it reveals real FUSE issues rather than hiding them. * test: enable V(1) logging for filer/mount and collect logs on failure - Run filer and mount with -v=1 so hardlink lifecycle logs (V(0): create/delete/insert, V(1): read attempts) are captured - On test failure, automatically dump last 16KB of all process logs (master, volume, filer, mount) to test output - Copy process logs to /tmp/seaweedfs-fuse-logs/ for CI artifact upload - Update CI workflow to upload SeaweedFS process logs alongside test output * mount: clone entry for filer flush to prevent uid/gid race flushMetadataToFiler and flushFileMetadata used entry.GetEntry() which returns the file handle's live proto entry pointer, then mutated it in-place via mapPbIdFromLocalToFiler. During the gRPC call window, a concurrent Lookup (which takes entryLock.RLock but NOT fhLockTable) could observe filer-side uid/gid (e.g., 0:0) on the file handle entry and return it to the kernel. The kernel caches these attributes, so subsequent opens by the local user (uid 501) fail with EACCES. Fix: proto.Clone the entry before mapping uid/gid for the filer request. The file handle's live entry is never mutated, so concurrent Lookup always sees local uid/gid. This fixes the intermittent "Permission denied" on .git/FETCH_HEAD after the first git pull on a mount with uid/gid mapping. * mount: add debug logging for stale lock file investigation Add V(0) logging to trace the HEAD.lock recreation issue: - Create: log when O_EXCL fails (file already exists) with uid/gid/mode - completeAsyncFlush: log resolved path, saved path, dirtyMetadata, isDeleted at entry to trace whether async flush fires after rename - flushMetadataToFiler: log the dir/name/fullpath being flushed This will show whether the async flush is recreating the lock file after git renames HEAD.lock → HEAD. * mount: prevent async flush from recreating renamed .lock files When git renames HEAD.lock → HEAD, the async flush from the prior close() can run AFTER the rename and re-insert HEAD.lock into the meta cache via its CreateEntryRequest response event. The next git pull then sees HEAD.lock and fails with "File exists". Fix: add isRenamed flag on FileHandle, set by Rename before waiting for the pending async flush. The async flush checks this flag and skips the metadata flush for renamed files (same pattern as isDeleted for unlinked files). The data pages still flush normally. The Rename handler flushes deferred metadata synchronously (Case 1) before setting isRenamed, ensuring the entry exists on the filer for the rename to proceed. For already-released handles (Case 2), the entry was created by a prior flush. * mount: also mark renamed inodes via entry.Attributes.Inode fallback When GetInode fails (Forget already removed the inode mapping), the Rename handler couldn't find the pending async flush to set isRenamed. The async flush then recreated the .lock file on the filer. Fix: fall back to oldEntry.Attributes.Inode to find the pending async flush when the inode-to-path mapping is gone. Also extract MarkInodeRenamed into a method on FileHandleToInode for clarity. * mount: skip async metadata flush when saved path no longer maps to inode The isRenamed flag approach failed for refs/remotes/origin/HEAD.lock because neither GetInode nor oldEntry.Attributes.Inode could find the inode (Forget already evicted the mapping, and the entry's stored inode was 0). Add a direct check in completeAsyncFlush: before flushing metadata, verify that the saved path still maps to this inode in the inode-to-path table. If the path was renamed or removed (inode mismatch or not found), skip the metadata flush to avoid recreating a stale entry. This catches all rename cases regardless of whether the Rename handler could set the isRenamed flag. * mount: wait for pending async flush in Unlink before filer delete Unlink was deleting the filer entry first, then marking the draining async-flush handle as deleted. The async flush worker could race between these two operations and recreate the just-unlinked entry on the filer. This caused git's .lock files (e.g. refs/remotes/origin/HEAD.lock) to persist after git pull, breaking subsequent git operations. Move the isDeleted marking and add waitForPendingAsyncFlush() before the filer delete so any in-flight flush completes first. Even if the worker raced past the isDeleted check, the wait ensures it finishes before the filer delete cleans up any recreated entry. * mount: reduce async flush and metadata flush log verbosity Raise completeAsyncFlush entry log, saved-path-mismatch skip log, and flushMetadataToFiler entry log from V(0) to V(3)/V(4). These fire for every file close with writebackCache and are too noisy for normal use. * filer: reduce hardlink debug log verbosity from V(0) to V(4) HardLinkId logs in filerstore_wrapper, filerstore_hardlink, and filer_grpc_server fire on every hardlinked file operation (git pack files use hardlinks extensively) and produce excessive noise. * mount/filer: reduce noisy V(0) logs for link, rmdir, and empty folder check - weedfs_link.go: hardlink creation logs V(0) → V(4) - weedfs_dir_mkrm.go: non-empty folder rmdir error V(0) → V(1) - empty_folder_cleaner.go: "not empty" check log V(0) → V(4) * filer: handle missing hardlink KV as expected, not error A "kv: not found" on hardlink read is normal when the link blob was already cleaned up but a stale entry still references it. Log at V(1) for not-found; keep Error level for actual KV failures. * test: add waitForDir before git pull in FUSE git operations test After git reset --hard, the FUSE mount's metadata cache may need a moment to settle on slow CI. The git pull subprocess (unpack-objects) could fail to stat the working directory. Poll for up to 5s. * Update git_operations_test.go * wait * test: simplify FUSE test framework to use weed mini Replace the 4-process setup (master + volume + filer + mount) with 2 processes: "weed mini" (all-in-one) + "weed mount". This simplifies startup, reduces port allocation, and is faster on CI. * test: fix mini flag -admin → -admin.ui	2026-03-25 20:06:34 -07:00
Copilot	2b6271469b	Merge branch 'master' of https://github.com/seaweedfs/seaweedfs	2026-03-23 11:22:45 -07:00
Copilot	963ec4c6e6	remove claude from github ci	2026-03-23 11:22:19 -07:00
dependabot[bot]	fb442a57d7	build(deps): bump actions/checkout from 4 to 6 (#8738 ) Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6. - [Release notes](https://github.com/actions/checkout/releases) - [Commits](https://github.com/actions/checkout/compare/v4...v6) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-23 10:50:22 -07:00
dependabot[bot]	3a765df2ff	build(deps): bump dorny/test-reporter from 2 to 3 (#8733 ) Bumps [dorny/test-reporter](https://github.com/dorny/test-reporter) from 2 to 3. - [Release notes](https://github.com/dorny/test-reporter/releases) - [Changelog](https://github.com/dorny/test-reporter/blob/main/CHANGELOG.md) - [Commits](https://github.com/dorny/test-reporter/compare/v2...v3) --- updated-dependencies: - dependency-name: dorny/test-reporter dependency-version: '3' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-23 09:16:38 -07:00
dependabot[bot]	bb151d8e57	build(deps): bump actions/setup-node from 4 to 6 (#8732 ) Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4 to 6. - [Release notes](https://github.com/actions/setup-node/releases) - [Commits](https://github.com/actions/setup-node/compare/v4...v6) --- updated-dependencies: - dependency-name: actions/setup-node dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-23 09:16:21 -07:00
Chris Lu	9434d3733d	mount: async flush on close() when writebackCache is enabled (#8727 ) * mount: async flush on close() when writebackCache is enabled When -writebackCache is enabled, defer data upload and metadata flush from Flush() (triggered by close()) to a background goroutine in Release(). This allows processes like rsync that write many small files to proceed to the next file immediately instead of blocking on two network round-trips (volume upload + filer metadata) per file. Fixes #8718 * mount: add retry with backoff for async metadata flush The metadata flush in completeAsyncFlush now retries up to 3 times with exponential backoff (1s, 2s, 4s) on transient gRPC errors. Since the chunk data is already safely on volume servers at this point, only the filer metadata reference needs persisting — retrying is both safe and effective. Data flush (FlushData) is not retried externally because UploadWithRetry already handles transient HTTP/gRPC errors internally; if it still fails, the chunk memory has been freed. * test: add integration tests for writebackCache async flush Add comprehensive FUSE integration tests for the writebackCache async flush feature (issue #8718): - Basic operations: write/read, sequential files, large files, empty files, overwrites - Fsync correctness: fsync forces synchronous flush even in writeback mode, immediate read-after-fsync - Concurrent small files: multi-worker parallel writes (rsync-like workload), multi-directory, rapid create/close - Data integrity: append after close, partial writes, file size correctness, binary data preservation - Performance comparison: writeback vs synchronous flush throughput - Stress test: 16 workers x 100 files with content verification - Mixed concurrent operations: reads, writes, creates running together Also fix pre-existing test infrastructure issues: - Rename framework.go to framework_test.go (fixes Go package conflict) - Fix undefined totalSize variable in concurrent_operations_test.go * ci: update fuse-integration workflow to run full test suite The workflow previously only ran placeholder tests (simple_test.go, working_demo_test.go) in a temp directory due to a Go module conflict. Now that framework.go is renamed to framework_test.go, the full test suite compiles and runs correctly from test/fuse_integration/. Changes: - Run go test directly in test/fuse_integration/ (no temp dir copy) - Install weed binary to /usr/local/bin for test framework discovery - Configure /etc/fuse.conf with user_allow_other for FUSE mounts - Install fuse3 for modern FUSE support - Stream test output to log file for artifact upload * mount: fix three P1 races in async flush P1-1: Reopen overwrites data still flushing in background ReleaseByHandle removes the old handle from fhMap before the deferred flush finishes. A reopen of the same inode during that window would build from stale filer metadata, overwriting the async flush. Fix: Track in-flight async flushes per inode via pendingAsyncFlush map. AcquireHandle now calls waitForPendingAsyncFlush(inode) to block until any pending flush completes before reading filer metadata. P1-2: Deferred flush races rename and unlink after close completeAsyncFlush captured the path once at entry, but rename or unlink after close() could cause metadata to be written under the wrong name or recreate a deleted file. Fix: Re-resolve path from inode via GetPath right before metadata flush. GetPath returns the current path (reflecting renames) or ENOENT (if unlinked), in which case we skip the metadata flush. P1-3: SIGINT/SIGTERM bypasses the async-flush drain grace.OnInterrupt runs hooks then calls os.Exit(0), so WaitForAsyncFlush after server.Serve() never executes on signal. Fix: Add WaitForAsyncFlush (with 10s timeout) to the WFS interrupt handler, before cache cleanup. The timeout prevents hanging on Ctrl-C when the filer is unreachable. * mount: fix P1 races — draining handle stays in fhMap P1-1: Reopen TOCTOU The gap between ReleaseByHandle removing from fhMap and submitAsyncFlush registering in pendingAsyncFlush allowed a concurrent AcquireHandle to slip through with stale metadata. Fix: Hold pendingAsyncFlushMu across both the counter decrement (ReleaseByHandle) and the pending registration. The handle is registered as pending before the lock is released, so waitForPendingAsyncFlush always sees it. P1-2: Rename/unlink can't find draining handle ReleaseByHandle deleted from fhMap immediately. Rename's FindFileHandle(inode) at line 251 could not find the handle to update entry.Name. Unlink could not coordinate either. Fix: When asyncFlushPending is true, ReleaseByHandle/ReleaseByInode leave the handle in fhMap (counter=0 but maps intact). The handle stays visible to FindFileHandle so rename can update entry.Name. completeAsyncFlush re-resolves the path from the inode (GetPath) right before metadata flush for correctness after rename/unlink. After drain, RemoveFileHandle cleans up the maps. Double-return prevention: ReleaseByHandle/ReleaseByInode return nil if counter is already <= 0, so Forget after Release doesn't start a second drain goroutine. P1-3: SIGINT deletes swap files under running goroutines After the 10s timeout, os.RemoveAll deleted the write cache dir (containing swap files) while FlushData goroutines were still reading from them. Fix: Increase timeout to 30s. If timeout expires, skip write cache dir removal so in-flight goroutines can finish reading swap files. The OS (or next mount) cleans them up. Read cache is always removed. * mount: never skip metadata flush when Forget drops inode mapping Forget removes the inode→path mapping when the kernel's lookup count reaches zero, but this does NOT mean the file was unlinked — it only means the kernel evicted its cache entry. completeAsyncFlush was treating GetPath failure as "file unlinked" and skipping the metadata flush, which orphaned the just-uploaded chunks for live files. Fix: Save dir and name at doFlush defer time. In completeAsyncFlush, try GetPath first to pick up renames; if the mapping is gone, fall back to the saved dir/name. Always attempt the metadata flush — the filer is the authority on whether the file exists, not the local inode cache. * mount: distinguish Forget from Unlink in async flush path fallback The saved-path fallback (from the previous fix) always flushed metadata when GetPath failed, which recreated files that were explicitly unlinked after close(). The same stale fallback could recreate the pre-rename path if Forget dropped the inode mapping after a rename. Root cause: GetPath failure has two meanings: 1. Forget — kernel evicted the cache entry (file still exists) 2. Unlink — file was explicitly deleted (should not recreate) Fix (three coordinated changes): Unlink (weedfs_file_mkrm.go): Before RemovePath, look up the inode and find any draining handle via FindFileHandle. Set fh.isDeleted = true so the async flush knows the file was explicitly removed. Rename (weedfs_rename.go): When renaming a file with a draining handle, update asyncFlushDir/asyncFlushName to the post-rename location. This keeps the saved-path fallback current so Forget after rename doesn't flush to the old (pre-rename) path. completeAsyncFlush (weedfs_async_flush.go): Check fh.isDeleted first — if true, skip metadata flush (file was unlinked, chunks become orphans for volume.fsck). Otherwise, try GetPath for the current path (renames); fall back to saved path if Forget dropped the mapping (file is live, just evicted from kernel cache). * test/ci: address PR review nitpicks concurrent_operations_test.go: - Restore precise totalSize assertion instead of info.Size() > 0 writeback_cache_test.go: - Check rand.Read errors in all 3 locations (lines 310, 512, 757) - Check os.MkdirAll error in stress test (line 752) - Remove dead verifyErrors variable (line 332) - Replace both time.Sleep(5s) with polling via waitForFileContent to avoid flaky tests under CI load (lines 638, 700) fuse-integration.yml: - Add set -o pipefail so go test failures propagate through tee * ci: fix fuse3/fuse package conflict on ubuntu-22.04 runner fuse3 is pre-installed on ubuntu-22.04 runners and conflicts with the legacy fuse package. Only install libfuse3-dev for the headers. * mount/page_writer: remove debug println statements Remove leftover debug println("read new data1/2") from ReadDataAt in MemChunk and SwapFileChunk. * test: fix findWeedBinary matching source directory instead of binary findWeedBinary() matched ../../weed (the source directory) via os.Stat before checking PATH, then tried to exec a directory which fails with "permission denied" on the CI runner. Fix: Check PATH first (reliable in CI where the binary is installed to /usr/local/bin). For relative paths, verify the candidate is a regular file (!info.IsDir()). Add ../../weed/weed as a candidate for in-tree builds. * test: fix framework — dynamic ports, output capture, data dirs The integration test framework was failing in CI because: 1. All tests used hardcoded ports (19333/18080/18888), so sequential tests could conflict when prior processes hadn't fully released their ports yet. 2. Data subdirectories (data/master, data/volume) were not created before starting processes. 3. Master was started with -peers=none which is not a valid address. 4. Process stdout/stderr was not captured, making failures opaque ("service not ready within timeout" with no diagnostics). 5. The unmount fallback used 'umount' instead of 'fusermount -u'. 6. The mount used -cacheSizeMB (nonexistent) instead of -cacheCapacityMB and was missing -allowOthers=false for unprivileged CI runners. Fixes: - Dynamic port allocation via freePort() (net.Listen ":0") - Explicit gRPC ports via -port.grpc to avoid default port conflicts - Create data/master and data/volume directories in Setup() - Remove invalid -peers=none and -raftBootstrap flags - Capture process output to logDir/.log via startProcess() helper - dumpLog() prints tail of log file on service startup failure - Use fusermount3/fusermount -u for unmount - Fix mount flag names (-cacheCapacityMB, -allowOthers=false) test: remove explicit -port.grpc flags from test framework SeaweedFS convention: gRPC port = HTTP port + 10000. Volume and filer discover the master gRPC port by this convention. Setting explicit -port.grpc on master/volume/filer broke inter-service communication because the volume server computed master gRPC as HTTP+10000 but the actual gRPC was on a different port. Remove all -port.grpc flags and let the default convention work. Dynamic HTTP ports already ensure uniqueness; the derived gRPC ports (HTTP+10000) will also be unique. --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-22 15:24:08 -07:00
Chris Lu	d6a872c4b9	Preserve explicit directory markers with octet-stream MIME (#8726 ) * Preserve octet-stream MIME on explicit directory markers * Run empty directory marker regression in CI * Run S3 Spark workflow for filer changes	2026-03-21 19:31:56 -07:00
Chris Lu	ba855f9962	fix(telemetry): use correct TopologyId field in integration test (#8714 ) * fix(telemetry): use correct TopologyId field in integration test The proto field was renamed from cluster_id to topology_id but the integration test was not updated, causing a compilation error. * ci: add telemetry integration test workflow Runs the telemetry integration test (server startup, protobuf marshaling, client send, metrics/stats/instances API checks) on changes to telemetry/ or weed/telemetry/. * fix(telemetry): improve error message specificity in integration test * fix(ci): pre-build telemetry server binary for integration test go run compiles the server on the fly, which exceeds the 15s startup timeout in CI. Build the binary first so the test starts instantly. * fix(telemetry): fix ClusterId references in server and CI build path - Replace ClusterId with TopologyId in server storage and API handler (same rename as the integration test fix) - Fix CI build: telemetry server has its own go.mod, so build from within its directory * ci(telemetry): add least-privilege permissions to workflow Scope the workflow token to read-only repository contents, matching the convention used in go.yml. * fix(telemetry): set TopologyId in client integration test The client only populates TopologyId when SetTopologyId has been called. The test was missing this call, causing the server to reject the request with 400 (missing required field). * fix(telemetry): delete clusterInfo metric on instance cleanup The cleanup loop removed all per-instance metrics except clusterInfo, leaking that label set after eviction.	2026-03-20 22:15:05 -07:00
Chris Lu	5e76f55077	fix(helm): namespace app-specific global values under global.seaweedfs (#8700 ) * fix(helm): namespace app-specific values under global.seaweedfs Move all app-specific values from the global namespace to global.seaweedfs.* to avoid polluting the shared .Values.global namespace when the chart is used as a subchart. Standard Helm conventions (global.imageRegistry, global.imagePullSecrets) remain at the global level as they are designed to be shared across subcharts. Fixes seaweedfs/seaweedfs#8699 BREAKING CHANGE: global values have been restructured. Users must update their values files to use the new paths: - global.registry → global.imageRegistry - global.repository → global.seaweedfs.image.repository - global.imageName → global.seaweedfs.image.name - global.<key> → global.seaweedfs.<key> (for all other app-specific values) * fix(ci): update helm CI tests to use new global.seaweedfs.* value paths Update all --set flags in helm_ci.yml to use the new namespaced global.seaweedfs.* paths matching the values.yaml restructuring. * fix(ci): install Claude Code via npm to avoid install.sh 403 The claude-code-action's built-in installer uses `curl https://claude.ai/install.sh \| bash` which can fail with 403. Due to the pipe, bash exits 0 on empty input, masking the curl failure and leaving the `claude` binary missing. Work around this by installing Claude Code via npm before invoking the action, and passing the executable path via path_to_claude_code_executable. * revert: remove claude-code-review.yml changes from this PR The claude-code-action OIDC token exchange validates that the workflow file matches the version on the default branch. Modifying it in a PR causes the review job to fail with "Workflow validation failed". The Claude Code install fix will need to be applied directly to master or in a separate PR. * fix: update stale references to old global.* value paths - admin-statefulset.yaml: fix fail message to reference global.seaweedfs.masterServer - values.yaml: fix comment to reference image.name instead of imageName - helm_ci.yml: fix diagnostic message to reference global.seaweedfs.enableSecurity * feat(helm): add backward-compat shim for old global.* value paths Add _compat.tpl with a seaweedfs.compat helper that detects old-style global.* keys (e.g. global.enableSecurity, global.registry) and merges them into the new global.seaweedfs.* namespace. Since the old keys no longer have defaults in values.yaml, their presence means the user explicitly provided them. The helper uses in-place mutation via `set` so all templates see the merged values. This ensures existing deployments using old value paths continue to work without changes after upgrading. * fix: update stale comment references in values.yaml Update comments referencing global.enableSecurity and global.masterServer to the new global.seaweedfs.* paths. --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-19 13:00:48 -07:00
Chris Lu	5606557c6b	fix(ci): install Claude Code via npm to avoid install.sh 403 (#8701 ) The claude-code-action's built-in installer uses `curl https://claude.ai/install.sh \| bash` which can fail with 403. Due to the pipe, bash exits 0 on empty input, masking the curl failure and leaving the `claude` binary missing. Work around this by installing Claude Code via npm before invoking the action, and passing the executable path via path_to_claude_code_executable. Co-authored-by: Copilot <copilot@github.com>	2026-03-19 11:42:52 -07:00
Copilot	509fa23200	fix(ci): allow all bots to trigger Claude Code review	2026-03-19 10:59:31 -07:00
hoppla20	e989ad3ee9	feat(ci): publish helm chart to ghcr (#8697 )	2026-03-19 05:50:15 -07:00
Chris Lu	e59bcfebcc	Add Claude Code GitHub Workflow (#8687 ) * "Claude PR Assistant workflow" * "Claude Code Review workflow"	2026-03-18 12:17:53 -07:00
dependabot[bot]	6df5009fae	build(deps): bump docker/metadata-action from 5 to 6 (#8652 ) Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 5 to 6. - [Release notes](https://github.com/docker/metadata-action/releases) - [Commits](https://github.com/docker/metadata-action/compare/v5...v6) --- updated-dependencies: - dependency-name: docker/metadata-action dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-16 10:46:36 -07:00
Chris Lu	0443b66a75	fix(helm): trim whitespace before s3 TLS args to prevent command breakage (#8614 ) * fix(helm): trim whitespace before s3 TLS args to prevent command breakage (#8613) When global.enableSecurity is enabled, the `{{ include }}` call for s3 TLS args lacked the leading dash (`{{-`), producing an extra blank line in the rendered shell command. This broke shell continuation and caused the filer (and s3/all-in-one) to crash because arguments after the blank line were silently dropped. * ci(helm): assert no blank lines in security+S3 command blocks Renders the chart with global.enableSecurity=true and S3 enabled for normal mode (filer + s3 deployments) and all-in-one mode, then parses every /bin/sh -ec command block and fails if any contains blank lines. This catches the whitespace regression from #8613 where a missing {{- dash on the seaweedfs.s3.tlsArgs include produced a blank line that broke shell continuation. * ci(helm): enable S3 in all-in-one security render test The s3.tlsArgs include is gated by allInOne.s3.enabled, so without this flag the all-in-one command block wasn't actually exercising the TLS args path.	2026-03-12 15:35:22 -07:00
Chris Lu	bfd0d5c084	fix(helm): use componentName for all service names to fix truncation mismatch (#8612 ) * fix(helm): use componentName for all service names to fix truncation mismatch (#8610) PR #8143 updated statefulsets and deployments to use the componentName helper (which truncates the fullname before appending the suffix), but left service definitions using the old `printf + trunc 63` pattern. When release names are long enough, these two strategies produce different names, causing DNS resolution failures (e.g., S3 cannot find the filer-client service and falls back to localhost:8888). Unify all service name definitions and cluster address helpers to use the componentName helper consistently. * refactor(helm): simplify cluster address helpers with ternary * test(helm): add regression test for service name truncation with long release names Renders the chart with a >63-char fullname in both normal and all-in-one modes, then asserts that Service metadata.name values match the hostnames produced by cluster.masterAddress, cluster.filerAddress, and the S3 deployment's -filer= argument. Prevents future truncation/DNS mismatch regressions like #8610. * fix(helm-ci): limit S3_FILER_HOST extraction to first match	2026-03-12 11:59:24 -07:00
Chris Lu	4c88fbfd5e	Fix nil pointer crash during concurrent vacuum compaction (#8592 ) * check for nil needle map before compaction sync When CommitCompact runs concurrently, it sets v.nm = nil under dataFileAccessLock. CompactByIndex does not hold that lock, so v.nm.Sync() can hit a nil pointer. Add an early nil check to return an error instead of crashing. Fixes #8591 * guard copyDataBasedOnIndexFile size check against nil needle map The post-compaction size validation at line 538 accesses v.nm.ContentSize() and v.nm.DeletedSize(). If CommitCompact has concurrently set v.nm to nil, this causes a SIGSEGV. Skip the validation when v.nm is nil since the actual data copy uses local needle maps (oldNm/newNm) and is unaffected. Fixes #8591 * use atomic.Bool for compaction flags to prevent concurrent vacuum races The isCompacting and isCommitCompacting flags were plain bools read and written from multiple goroutines without synchronization. This allowed concurrent vacuums on the same volume to pass the guard checks and run simultaneously, leading to the nil pointer crash. Using atomic.Bool with CompareAndSwap ensures only one compaction or commit can run per volume at a time. Fixes #8591 * use go-version-file in CI workflows instead of hardcoded versions Use go-version-file: 'go.mod' so CI automatically picks up the Go version from go.mod, avoiding future version drift. Reordered checkout before setup-go in go.yml and e2e.yml so go.mod is available. Removed the now-unused GO_VERSION env vars. * capture v.nm locally in CompactByIndex to close TOCTOU race A bare nil check on v.nm followed by v.nm.Sync() has a race window where CommitCompact can set v.nm = nil between the two. Snapshot the pointer into a local variable so the nil check and Sync operate on the same reference. * add dynamic timeouts to plugin worker vacuum gRPC calls All vacuum gRPC calls used context.Background() with no deadline, so the plugin scheduler's execution timeout could kill a job while a large volume compact was still in progress. Use volume-size-scaled timeouts matching the topology vacuum approach: 3 min/GB for compact, 1 min/GB for check, commit, and cleanup. Fixes #8591 * Revert "add dynamic timeouts to plugin worker vacuum gRPC calls" This reverts commit 80951934c37416bc4f6c1472a5d3f8d204a637d9. * unify compaction lifecycle into single atomic flag Replace separate isCompacting and isCommitCompacting flags with a single isCompactionInProgress atomic.Bool. This ensures CompactBy*, CommitCompact, Close, and Destroy are mutually exclusive — only one can run at a time per volume. Key changes: - All entry points use CompareAndSwap(false, true) to claim exclusive access. CompactByVolumeData and CompactByIndex now also guard v.nm and v.DataBackend with local captures. - Close() waits for the flag outside dataFileAccessLock to avoid deadlocking with CommitCompact (which holds the flag while waiting for the lock). It claims the flag before acquiring the lock so no new compaction can start. - Destroy() uses CAS instead of a racy Load check, preventing concurrent compaction from racing with volume teardown. - unmountVolumeByCollection no longer deletes from the map; DeleteCollectionFromDiskLocation removes entries only after successful Destroy, preventing orphaned volumes on failure. Fixes #8591	2026-03-10 13:31:45 -07:00
Chris Lu	3d9f7f6f81	go 1.25	2026-03-09 23:10:27 -07:00
dependabot[bot]	e1c4faba38	build(deps): bump org.apache.zookeeper:zookeeper from 3.9.4 to 3.9.5 in /test/java/spark (#8580 ) * build(deps): bump org.apache.zookeeper:zookeeper in /test/java/spark Bumps org.apache.zookeeper:zookeeper from 3.9.4 to 3.9.5. --- updated-dependencies: - dependency-name: org.apache.zookeeper:zookeeper dependency-version: 3.9.5 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * fix: use go-version-file instead of hardcoded Go version in CI workflows The hardcoded go-version '1.24' is too old for go.mod which requires go >= 1.25.0, causing build failures in Spark integration tests. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-03-09 17:29:53 -07:00
Chris Lu	b3d32fe73b	fix go version	2026-03-09 14:18:46 -07:00
Chris Lu	89f1096c0e	Update ec-integration.yml	2026-03-09 14:05:39 -07:00
Chris Lu	992db11d2b	iam: add IAM group management (#8560 ) * iam: add Group message to protobuf schema Add Group message (name, members, policy_names, disabled) and add groups field to S3ApiConfiguration for IAM group management support (issue #7742). * iam: add group CRUD to CredentialStore interface and all backends Add group management methods (CreateGroup, GetGroup, DeleteGroup, ListGroups, UpdateGroup) to the CredentialStore interface with implementations for memory, filer_etc, postgres, and grpc stores. Wire group loading/saving into filer_etc LoadConfiguration and SaveConfiguration. * iam: add group IAM response types Add XML response types for group management IAM actions: CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, ListGroupsForUser. * iam: add group management handlers to embedded IAM API Add CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, and ListGroupsForUser handlers with dispatch in ExecuteAction. * iam: add group management handlers to standalone IAM API Add group handlers (CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy, ListAttachedGroupPolicies, ListGroupsForUser) and wire into DoActions dispatch. Also add helper functions for user/policy side effects. * iam: integrate group policies into authorization Add groups and userGroups reverse index to IdentityAccessManagement. Populate both maps during ReplaceS3ApiConfiguration and MergeS3ApiConfiguration. Modify evaluateIAMPolicies to evaluate policies from user's enabled groups in addition to user policies. Update VerifyActionPermission to consider group policies when checking hasAttachedPolicies. * iam: add group side effects on user deletion and rename When a user is deleted, remove them from all groups they belong to. When a user is renamed, update group membership references. Applied to both embedded and standalone IAM handlers. * iam: watch /etc/iam/groups directory for config changes Add groups directory to the filer subscription watcher so group file changes trigger IAM configuration reloads. * admin: add group management page to admin UI Add groups page with CRUD operations, member management, policy attachment, and enable/disable toggle. Register routes in admin handlers and add Groups entry to sidebar navigation. * test: add IAM group management integration tests Add comprehensive integration tests for group CRUD, membership, policy attachment, policy enforcement, disabled group behavior, user deletion side effects, and multi-group membership. Add "group" test type to CI matrix in s3-iam-tests workflow. * iam: address PR review comments for group management - Fix XSS vulnerability in groups.templ: replace innerHTML string concatenation with DOM APIs (createElement/textContent) for rendering member and policy lists - Use userGroups reverse index in embedded IAM ListGroupsForUser for O(1) lookup instead of iterating all groups - Add buildUserGroupsIndex helper in standalone IAM handlers; use it in ListGroupsForUser and removeUserFromAllGroups for efficient lookup - Add note about gRPC store load-modify-save race condition limitation * iam: add defensive copies, validation, and XSS fixes for group management - Memory store: clone groups on store/retrieve to prevent mutation - Admin dash: deep copy groups before mutation, validate user/policy exists - HTTP handlers: translate credential errors to proper HTTP status codes, use bool for Enabled field to distinguish missing vs false - Groups templ: use data attributes + event delegation instead of inline onclick for XSS safety, prevent stale async responses iam: add explicit group methods to PropagatingCredentialStore Add CreateGroup, GetGroup, DeleteGroup, ListGroups, and UpdateGroup methods instead of relying on embedded interface fallthrough. Group changes propagate via filer subscription so no RPC propagation needed. * iam: detect postgres unique constraint violation and add groups index Return ErrGroupAlreadyExists when INSERT hits SQLState 23505 instead of a generic error. Add index on groups(disabled) for filtered queries. * iam: add Marker field to group list response types Add Marker string field to GetGroupResult, ListGroupsResult, ListAttachedGroupPoliciesResult, and ListGroupsForUserResult to match AWS IAM pagination response format. * iam: check group attachment before policy deletion Reject DeletePolicy if the policy is attached to any group, matching AWS IAM behavior. Add PolicyArn to ListAttachedGroupPolicies response. * iam: include group policies in IAM authorization Merge policy names from user's enabled groups into the IAMIdentity used for authorization, so group-attached policies are evaluated alongside user-attached policies. * iam: check for name collision before renaming user in UpdateUser Scan identities and inline policies for newUserName before mutating, returning EntityAlreadyExists if a collision is found. Reuse the already-loaded policies instead of loading them again inside the loop. * test: use t.Cleanup for bucket cleanup in group policy test * iam: wrap ErrUserNotInGroup sentinel in RemoveGroupMember error Wrap credential.ErrUserNotInGroup so errors.Is works in groupErrorToHTTPStatus, returning proper 400 instead of 500. * admin: regenerate groups_templ.go with XSS-safe data attributes Regenerated from groups.templ which uses data-group-name attributes instead of inline onclick with string interpolation. * iam: add input validation and persist groups during migration - Validate nil/empty group name in CreateGroup and UpdateGroup - Save groups in migrateToMultiFile so they survive legacy migration * admin: use groupErrorToHTTPStatus in GetGroupMembers and GetGroupPolicies * iam: short-circuit UpdateUser when newUserName equals current name * iam: require empty PolicyNames before group deletion Reject DeleteGroup when group has attached policies, matching the existing members check. Also fix GetGroup error handling in DeletePolicy to only skip ErrGroupNotFound, not all errors. * ci: add weed/pb/** to S3 IAM test trigger paths * test: replace time.Sleep with require.Eventually for propagation waits Use polling with timeout instead of fixed sleeps to reduce flakiness in integration tests waiting for IAM policy propagation. * fix: use credentialManager.GetPolicy for AttachGroupPolicy validation Policies created via CreatePolicy through credentialManager are stored in the credential store, not in s3cfg.Policies (which only has static config policies). Change AttachGroupPolicy to use credentialManager.GetPolicy() for policy existence validation. * feat: add UpdateGroup handler to embedded IAM API Add UpdateGroup action to enable/disable groups and rename groups via the IAM API. This is a SeaweedFS extension (not in AWS SDK) used by tests to toggle group disabled status. * fix: authenticate raw IAM API calls in group tests The embedded IAM endpoint rejects anonymous requests. Replace callIAMAPI with callIAMAPIAuthenticated that uses JWT bearer token authentication via the test framework. * feat: add UpdateGroup handler to standalone IAM API Mirror the embedded IAM UpdateGroup handler in the standalone IAM API for parity. * fix: add omitempty to Marker XML tags in group responses Non-truncated responses should not emit an empty <Marker/> element. * fix: distinguish backend errors from missing policies in AttachGroupPolicy Return ServiceFailure for credential manager errors instead of masking them as NoSuchEntity. Also switch ListGroupsForUser to use s3cfg.Groups instead of in-memory reverse index to avoid stale data. Add duplicate name check to UpdateGroup rename. * fix: standalone IAM AttachGroupPolicy uses persisted policy store Check managed policies from GetPolicies() instead of s3cfg.Policies so dynamically created policies are found. Also add duplicate name check to UpdateGroup rename. * fix: rollback inline policies on UpdateUser PutPolicies failure If PutPolicies fails after moving inline policies to the new username, restore both the identity name and the inline policies map to their original state to avoid a partial-write window. * fix: correct test cleanup ordering for group tests Replace scattered defers with single ordered t.Cleanup in each test to ensure resources are torn down in reverse-creation order: remove membership, detach policies, delete access keys, delete users, delete groups, delete policies. Move bucket cleanup to parent test scope and delete objects before bucket. * fix: move identity nil check before map lookup and refine hasAttachedPolicies Move the nil check on identity before accessing identity.Name to prevent panic. Also refine hasAttachedPolicies to only consider groups that are enabled and have actual policies attached, so membership in a no-policy group doesn't incorrectly trigger IAM authorization. * fix: fail group reload on unreadable or corrupt group files Return errors instead of logging and continuing when group files cannot be read or unmarshaled. This prevents silently applying a partial IAM config with missing group memberships or policies. * fix: use errors.Is for sql.ErrNoRows comparison in postgres group store * docs: explain why group methods skip propagateChange Group changes propagate to S3 servers via filer subscription (watching /etc/iam/groups/) rather than gRPC RPCs, since there are no group-specific RPCs in the S3 cache protocol. * fix: remove unused policyNameFromArn and strings import * fix: update service account ParentUser on user rename When renaming a user via UpdateUser, also update ParentUser references in service accounts to prevent them from becoming orphaned after the next configuration reload. * fix: wrap DetachGroupPolicy error with ErrPolicyNotAttached sentinel Use credential.ErrPolicyNotAttached so groupErrorToHTTPStatus maps it to 400 instead of falling back to 500. * fix: use admin S3 client for bucket cleanup in enforcement test The user S3 client may lack permissions by cleanup time since the user is removed from the group in an earlier subtest. Use the admin S3 client to ensure bucket and object cleanup always succeeds. * fix: add nil guard for group param in propagating store log calls Prevent potential nil dereference when logging group.Name in CreateGroup and UpdateGroup of PropagatingCredentialStore. * fix: validate Disabled field in UpdateGroup handlers Reject values other than "true" or "false" with InvalidInputException instead of silently treating them as false. * fix: seed mergedGroups from existing groups in MergeS3ApiConfiguration Previously the merge started with empty group maps, dropping any static-file groups. Now seeds from existing iam.groups before overlaying dynamic config, and builds the reverse index after merging to avoid stale entries from overridden groups. * fix: use errors.Is for filer_pb.ErrNotFound comparison in group loading Replace direct equality (==) with errors.Is() to correctly match wrapped errors, consistent with the rest of the codebase. * fix: add ErrUserNotFound and ErrPolicyNotFound to groupErrorToHTTPStatus Map these sentinel errors to 404 so AddGroupMember and AttachGroupPolicy return proper HTTP status codes. * fix: log cleanup errors in group integration tests Replace fire-and-forget cleanup calls with error-checked versions that log failures via t.Logf for debugging visibility. * fix: prevent duplicate group test runs in CI matrix The basic lane's -run "TestIAM" regex also matched TestIAMGroup* tests, causing them to run in both the basic and group lanes. Replace with explicit test function names. * fix: add GIN index on groups.members JSONB for membership lookups Without this index, ListGroupsForUser and membership queries require full table scans on the groups table. * fix: handle cross-directory moves in IAM config subscription When a file is moved out of an IAM directory (e.g., /etc/iam/groups), the dir variable was overwritten with NewParentPath, causing the source directory change to be missed. Now also notifies handlers about the source directory for cross-directory moves. * fix: validate members/policies before deleting group in admin handler AdminServer.DeleteGroup now checks for attached members and policies before delegating to credentialManager, matching the IAM handler guards. * fix: merge groups by name instead of blind append during filer load Match the identity loader's merge behavior: find existing group by name and replace, only append when no match exists. Prevents duplicates when legacy and multi-file configs overlap. * fix: check DeleteEntry response error when cleaning obsolete group files Capture and log resp.Error from filer DeleteEntry calls during group file cleanup, matching the pattern used in deleteGroupFile. * fix: verify source user exists before no-op check in UpdateUser Reorder UpdateUser to find the source identity first and return NoSuchEntityException if not found, before checking if the rename is a no-op. Previously a non-existent user renamed to itself would incorrectly return success. * fix: update service account parent refs on user rename in embedded IAM The embedded IAM UpdateUser handler updated group membership but not service account ParentUser fields, unlike the standalone handler. * fix: replay source-side events for all handlers on cross-dir moves Pass nil newEntry to bucket, IAM, and circuit-breaker handlers for the source directory during cross-directory moves, so all watchers can clear caches for the moved-away resource. * fix: don't seed mergedGroups from existing iam.groups in merge Groups are always dynamic (from filer), never static (from s3.config). Seeding from iam.groups caused stale deleted groups to persist. Now only uses config.Groups from the dynamic filer config. * fix: add deferred user cleanup in TestIAMGroupUserDeletionSideEffect Register t.Cleanup for the created user so it gets cleaned up even if the test fails before the inline DeleteUser call. * fix: assert UpdateGroup HTTP status in disabled group tests Add require.Equal checks for 200 status after UpdateGroup calls so the test fails immediately on API errors rather than relying on the subsequent Eventually timeout. * fix: trim whitespace from group name in filer store operations Trim leading/trailing whitespace from group.Name before validation in CreateGroup and UpdateGroup to prevent whitespace-only filenames. Also merge groups by name during multi-file load to prevent duplicates. * fix: add nil/empty group validation in gRPC store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics and invalid persistence. * fix: add nil/empty group validation in postgres store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics from nil member access and empty-name row inserts. * fix: add name collision check in embedded IAM UpdateUser The embedded IAM handler renamed users without checking if the target name already existed, unlike the standalone handler. * fix: add ErrGroupNotEmpty sentinel and map to HTTP 409 AdminServer.DeleteGroup now wraps conflict errors with ErrGroupNotEmpty, and groupErrorToHTTPStatus maps it to 409 Conflict instead of 500. * fix: use appropriate error message in GetGroupDetails based on status Return "Group not found" only for 404, use "Failed to retrieve group" for other error statuses instead of always saying "Group not found". * fix: use backend-normalized group.Name in CreateGroup response After credentialManager.CreateGroup may normalize the name (e.g., trim whitespace), use group.Name instead of the raw input for the returned GroupData to ensure consistency. * fix: add nil/empty group validation in memory store Guard CreateGroup and UpdateGroup against nil group or empty name to prevent panics from nil pointer dereference on map access. * fix: reorder embedded IAM UpdateUser to verify source first Find the source identity before checking for collisions, matching the standalone handler's logic. Previously a non-existent user renamed to an existing name would get EntityAlreadyExists instead of NoSuchEntity. * fix: handle same-directory renames in metadata subscription Replay a delete event for the old entry name during same-directory renames so handlers like onBucketMetadataChange can clean up stale state for the old name. * fix: abort GetGroups on non-ErrGroupNotFound errors Only skip groups that return ErrGroupNotFound. Other errors (e.g., transient backend failures) now abort the handler and return the error to the caller instead of silently producing partial results. * fix: add aria-label and title to icon-only group action buttons Add accessible labels to View and Delete buttons so screen readers and tooltips provide meaningful context. * fix: validate group name in saveGroup to prevent invalid filenames Trim whitespace and reject empty names before writing group JSON files, preventing creation of files like ".json". * fix: add /etc/iam/groups to filer subscription watched directories The groups directory was missing from the watched directories list, so S3 servers in a cluster would not detect group changes made by other servers via filer. The onIamConfigChange handler already had code to handle group directory changes but it was never triggered. * add direct gRPC propagation for group changes to S3 servers Groups now have the same dual propagation as identities and policies: direct gRPC push via propagateChange + async filer subscription. - Add PutGroup/RemoveGroup proto messages and RPCs - Add PutGroup/RemoveGroup in-memory cache methods on IAM - Add PutGroup/RemoveGroup gRPC server handlers - Update PropagatingCredentialStore to call propagateChange on group mutations * reduce log verbosity for config load summary Change ReplaceS3ApiConfiguration log from Infof to V(1).Infof to avoid noisy output on every config reload. * admin: show user groups in view and edit user modals - Add Groups field to UserDetails and populate from credential manager - Show groups as badges in user details view modal - Add group management to edit user modal: display current groups, add to group via dropdown, remove from group via badge x button * fix: remove duplicate showAlert that broke modal-alerts.js admin.js defined showAlert(type, message) which overwrote the modal-alerts.js version showAlert(message, type), causing broken unstyled alert boxes. Remove the duplicate and swap all callers in admin.js to use the correct (message, type) argument order. * fix: unwrap groups API response in edit user modal The /api/groups endpoint returns {"groups": [...]}, not a bare array. * Update object_store_users_templ.go * test: assert AccessDenied error code in group denial tests Replace plain assert.Error checks with awserr.Error type assertion and AccessDenied code verification, matching the pattern used in other IAM integration tests. * fix: propagate GetGroups errors in ShowGroups handler getGroupsPageData was swallowing errors and returning an empty page with 200 status. Now returns the error so ShowGroups can respond with a proper error status. * fix: reject AttachGroupPolicy when credential manager is nil Previously skipped policy existence validation when credentialManager was nil, allowing attachment of nonexistent policies. Now returns a ServiceFailureException error. * fix: preserve groups during partial MergeS3ApiConfiguration updates UpsertIdentity calls MergeS3ApiConfiguration with a partial config containing only the updated identity (nil Groups). This was wiping all in-memory group state. Now only replaces groups when config.Groups is non-nil (full config reload). * fix: propagate errors from group lookup in GetObjectStoreUserDetails ListGroups and GetGroup errors were silently ignored, potentially showing incomplete group data in the UI. * fix: use DOM APIs for group badge remove button to prevent XSS Replace innerHTML with onclick string interpolation with DOM createElement + addEventListener pattern. Also add aria-label and title to the add-to-group button. * fix: snapshot group policies under RLock to prevent concurrent map access evaluateIAMPolicies was copying the map reference via groupMap := iam.groups under RLock then iterating after RUnlock, while PutGroup mutates the map in-place. Now copies the needed policy names into a slice while holding the lock. * fix: add nil IAM check to PutGroup and RemoveGroup gRPC handlers Match the nil guard pattern used by PutPolicy/DeletePolicy to prevent nil pointer dereference when IAM is not initialized.	2026-03-09 11:54:32 -07:00
dependabot[bot]	1272612bbd	build(deps): bump docker/setup-qemu-action from 3 to 4 (#8574 ) Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 3 to 4. - [Release notes](https://github.com/docker/setup-qemu-action/releases) - [Commits](https://github.com/docker/setup-qemu-action/compare/v3...v4) --- updated-dependencies: - dependency-name: docker/setup-qemu-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-09 11:33:38 -07:00
dependabot[bot]	e568d85a5c	build(deps): bump docker/build-push-action from 6 to 7 (#8572 ) Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6 to 7. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/v6...v7) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-09 11:33:17 -07:00
dependabot[bot]	f79ba1eb37	build(deps): bump docker/login-action from 3 to 4 (#8569 ) Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/v3...v4) --- updated-dependencies: - dependency-name: docker/login-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-09 11:13:06 -07:00
dependabot[bot]	b132232895	build(deps): bump docker/setup-buildx-action from 3 to 4 (#8570 ) Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3 to 4. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](https://github.com/docker/setup-buildx-action/compare/v3...v4) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-09 11:12:56 -07:00
dependabot[bot]	d765ff50e6	build(deps): bump actions/dependency-review-action from 4.8.3 to 4.9.0 (#8571 ) Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.8.3 to 4.9.0. - [Release notes](https://github.com/actions/dependency-review-action/releases) - [Commits](`05fe457637...2031cfc080`) --- updated-dependencies: - dependency-name: actions/dependency-review-action dependency-version: 4.9.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-09 11:12:40 -07:00
Chris Lu	f249fb7e63	ci: build _full and _large_disk_full images for arm64 (#8548 ) The _full and _large_disk_full Docker image variants were only built for linux/amd64, preventing ARM64 users from using features like gocdk_pub_sub (RabbitMQ notifications) that require the gocdk build tag. Add linux/arm64 platform target to these variants. Closes #8546	2026-03-07 13:55:37 -08:00
dependabot[bot]	74593f7065	build(deps): bump actions/upload-artifact from 6 to 7 (#8482 ) Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 6 to 7. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/v6...v7) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-02 10:27:57 -08:00
Chris Lu	340339f678	Add Apache Polaris integration tests (#8478 ) * test: add polaris integration test harness * test: add polaris integration coverage * ci: run polaris s3 tables tests * test: harden polaris harness * test: DRY polaris integration tests * ci: pre-pull Polaris image * test: extend Polaris pull timeout * test: refine polaris credentials selection * test: keep Polaris tables inside allowed location * test: use fresh context for polaris cleanup * test: prefer specific Polaris storage credential * test: tolerate Polaris credential variants * test: request Polaris vended credentials * test: load Polaris table credentials * test: allow polaris vended access via bucket policy * test: align Polaris object keys with table location	2026-03-01 23:08:50 -08:00
blitt001	3d81d5bef7	Fix S3 signature verification behind reverse proxies (#8444 ) * Fix S3 signature verification behind reverse proxies When SeaweedFS is deployed behind a reverse proxy (e.g. nginx, Kong, Traefik), AWS S3 Signature V4 verification fails because the Host header the client signed with (e.g. "localhost:9000") differs from the Host header SeaweedFS receives on the backend (e.g. "seaweedfs:8333"). This commit adds a new -s3.externalUrl parameter (and S3_EXTERNAL_URL environment variable) that tells SeaweedFS what public-facing URL clients use to connect. When set, SeaweedFS uses this host value for signature verification instead of the Host header from the incoming request. New parameter: -s3.externalUrl (flag) or S3_EXTERNAL_URL (environment variable) Example: -s3.externalUrl=http://localhost:9000 Example: S3_EXTERNAL_URL=https://s3.example.com The environment variable is particularly useful in Docker/Kubernetes deployments where the external URL is injected via container config. The flag takes precedence over the environment variable when both are set. At startup, the URL is parsed and default ports are stripped to match AWS SDK behavior (port 80 for HTTP, port 443 for HTTPS), so "http://s3.example.com:80" and "http://s3.example.com" are equivalent. Bugs fixed: - Default port stripping was removed by a prior PR, causing signature mismatches when clients connect on standard ports (80/443) - X-Forwarded-Port was ignored when X-Forwarded-Host was not present - Scheme detection now uses proper precedence: X-Forwarded-Proto > TLS connection > URL scheme > "http" - Test expectations for standard port stripping were incorrect - expectedHost field in TestSignatureV4WithForwardedPort was declared but never actually checked (self-referential test) * Add Docker integration test for S3 proxy signature verification Docker Compose setup with nginx reverse proxy to validate that the -s3.externalUrl parameter (or S3_EXTERNAL_URL env var) correctly resolves S3 signature verification when SeaweedFS runs behind a proxy. The test uses nginx proxying port 9000 to SeaweedFS on port 8333, with X-Forwarded-Host/Port/Proto headers set. SeaweedFS is configured with -s3.externalUrl=http://localhost:9000 so it uses "localhost:9000" for signature verification, matching what the AWS CLI signs with. The test can be run with aws CLI on the host or without it by using the amazon/aws-cli Docker image with --network host. Test covers: create-bucket, list-buckets, put-object, head-object, list-objects-v2, get-object, content round-trip integrity, delete-object, and delete-bucket — all through the reverse proxy. * Create s3-proxy-signature-tests.yml * fix CLI * fix CI * Update s3-proxy-signature-tests.yml * address comments * Update Dockerfile * add user * no need for fuse * Update s3-proxy-signature-tests.yml * debug * weed mini * fix health check * health check * fix health checking --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-02-26 14:20:42 -08:00
Chris Lu	453310b057	Add plugin worker integration tests for erasure coding (#8450 ) * test: add plugin worker integration harness * test: add erasure coding detection integration tests * test: add erasure coding execution integration tests * ci: add plugin worker integration workflow * test: extend fake volume server for vacuum and balance * test: expand erasure coding detection topologies * test: add large erasure coding detection topology * test: add vacuum plugin worker integration tests * test: add volume balance plugin worker integration tests * ci: run plugin worker tests per worker * fixes * erasure coding: stop after placement failures * erasure coding: record hasMore when early stopping * erasure coding: relax large topology expectations	2026-02-25 22:11:41 -08:00
dependabot[bot]	c5e8e4f049	build(deps): bump actions/dependency-review-action from 4.8.2 to 4.8.3 (#8414 ) Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.8.2 to 4.8.3. - [Release notes](https://github.com/actions/dependency-review-action/releases) - [Commits](`3c4e3dcb1a...05fe457637`) --- updated-dependencies: - dependency-name: actions/dependency-review-action dependency-version: 4.8.3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-23 13:41:57 -08:00
dependabot[bot]	b033823611	build(deps): bump helm/kind-action from 1.13.0 to 1.14.0 (#8412 ) Bumps [helm/kind-action](https://github.com/helm/kind-action) from 1.13.0 to 1.14.0. - [Release notes](https://github.com/helm/kind-action/releases) - [Commits](https://github.com/helm/kind-action/compare/v1.13.0...v1.14.0) --- updated-dependencies: - dependency-name: helm/kind-action dependency-version: 1.14.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-23 13:41:40 -08:00
Michał Szynkiewicz	2f837c4780	Fix error on deleting non-empty bucket (#8376 ) * Move check for non-empty bucket deletion out of `WithFilerClient` call * Added proper checking if a bucket has "user" objects	2026-02-19 22:56:50 -08:00
Chris Lu	0d8588e3ae	S3: Implement IAM defaults and STS signing key fallback (#8348 ) * S3: Implement IAM defaults and STS signing key fallback logic * S3: Refactor startup order to init SSE-S3 key manager before IAM * S3: Derive STS signing key from KEK using HKDF for security isolation * S3: Document STS signing key fallback in security.toml * fix(s3api): refine anonymous access logic and secure-by-default behavior - Initialize anonymous identity by default in `NewIdentityAccessManagement` to prevent nil pointer exceptions. - Ensure `ReplaceS3ApiConfiguration` preserves the anonymous identity if not present in the new configuration. - Update `NewIdentityAccessManagement` signature to accept `filerClient`. - In legacy mode (no policy engine), anonymous defaults to Deny (no actions), preserving secure-by-default behavior. - Use specific `LookupAnonymous` method instead of generic map lookup. - Update tests to accommodate signature changes and verify improved anonymous handling. * feat(s3api): make IAM configuration optional - Start S3 API server without a configuration file if `EnableIam` option is set. - Default to `Allow` effect for policy engine when no configuration is provided (Zero-Config mode). - Handle empty configuration path gracefully in `loadIAMManagerFromConfig`. - Add integration test `iam_optional_test.go` to verify empty config behavior. * fix(iamapi): fix signature mismatch in NewIdentityAccessManagementWithStore * fix(iamapi): properly initialize FilerClient instead of passing nil * fix(iamapi): properly initialize filer client for IAM management - Instead of passing `nil`, construct a `wdclient.FilerClient` using the provided `Filers` addresses. - Ensure `NewIdentityAccessManagementWithStore` receives a valid `filerClient` to avoid potential nil pointer dereferences or limited functionality. * clean: remove dead code in s3api_server.go * refactor(s3api): improve IAM initialization, safety and anonymous access security * fix(s3api): ensure IAM config loads from filer after client init * fix(s3): resolve test failures in integration, CORS, and tagging tests - Fix CORS tests by providing explicit anonymous permissions config - Fix S3 integration tests by setting admin credentials in init - Align tagging test credentials in CI with IAM defaults - Added goroutine to retry IAM config load in iamapi server * fix(s3): allow anonymous access to health targets and S3 Tables when identities are present * fix(ci): use /healthz for Caddy health check in awscli tests * iam, s3api: expose DefaultAllow from IAM and Policy Engine This allows checking the global "Open by Default" configuration from other components like S3 Tables. * s3api/s3tables: support DefaultAllow in permission logic and handler Updated CheckPermissionWithContext to respect the DefaultAllow flag in PolicyContext. This enables "Open by Default" behavior for unauthenticated access in zero-config environments. Added a targeted unit test to verify the logic. * s3api/s3tables: propagate DefaultAllow through handlers Propagated the DefaultAllow flag to individual handlers for namespaces, buckets, tables, policies, and tagging. This ensures consistent "Open by Default" behavior across all S3 Tables API endpoints. * s3api: wire up DefaultAllow for S3 Tables API initialization Updated registerS3TablesRoutes to query the global IAM configuration and set the DefaultAllow flag on the S3 Tables API server. This completes the end-to-end propagation required for anonymous access in zero-config environments. Added a SetDefaultAllow method to S3TablesApiServer to facilitate this. * s3api: fix tests by adding DefaultAllow to mock IAM integrations The IAMIntegration interface was updated to include DefaultAllow(), breaking several mock implementations in tests. This commit fixes the build errors by adding the missing method to the mocks. * env * ensure ports * env * env * fix default allow * add one more test using non-anonymous user * debug * add more debug * less logs	2026-02-16 13:59:13 -08:00
Chris Lu	f1bf60d288	faster	2026-02-13 14:09:30 -08:00
Chris Lu	beeb375a88	Add volume server integration test suite and CI workflow (#8322 ) * docs(volume_server): add integration test development plan * test(volume_server): add integration harness and profile matrix * test(volume_server/http): add admin and options integration coverage * test(volume_server/grpc): add state and status integration coverage * test(volume_server): auto-build weed binary and harden cluster startup * test(volume_server/http): add upload read range head delete coverage * test(volume_server/grpc): expand admin lifecycle and state coverage * docs(volume_server): update progress tracker for implemented tests * test(volume_server/http): cover if-none-match and invalid-range branches * test(volume_server/grpc): add batch delete integration coverage * docs(volume_server): log latest HTTP and gRPC test coverage * ci(volume_server): run volume server integration tests in github actions * test(volume_server/grpc): add needle status configure ping and leave coverage * docs(volume_server): record additional grpc coverage progress * test(volume_server/grpc): add vacuum integration coverage * docs(volume_server): record vacuum test coverage progress * test(volume_server/grpc): add read and write needle blob error-path coverage * docs(volume_server): record data rw grpc coverage progress * test(volume_server/http): add jwt auth integration coverage * test(volume_server/grpc): add sync copy and stream error-path coverage * docs(volume_server): record jwt and sync/copy test coverage * test(volume_server/grpc): add scrub and query integration coverage * test(volume_server/grpc): add volume tail sender and receiver coverage * docs(volume_server): record scrub query and tail test progress * test(volume_server/grpc): add readonly writable and collection lifecycle coverage * test(volume_server/http): add public-port cors and method parity coverage * test(volume_server/grpc): add blob meta and read-all success path coverage * test(volume_server/grpc): expand scrub and query variation coverage * test(volume_server/grpc): add tiering and remote fetch error-path coverage * test(volume_server/http): add unchanged write and delete edge-case coverage * test(volume_server/grpc): add ping unknown and unreachable target coverage * test(volume_server/grpc): add volume delete only-empty variation coverage * test(volume_server/http): add jwt fid-mismatch auth coverage * test(volume_server/grpc): add scrub ec auto-select empty coverage * test(volume_server/grpc): stabilize ping timestamp assertion * docs(volume_server): update integration coverage progress log * test(volume_server/grpc): add tier remote backend and config variation coverage * docs(volume_server): record tier remote variation progress * test(volume_server/grpc): add incremental copy and receive-file protocol coverage * test(volume_server/http): add read path shape and if-modified-since coverage * test(volume_server/grpc): add copy-file compaction and receive-file success coverage * test(volume_server/http): add passthrough headers and static asset coverage * test(volume_server/grpc): add ping filer unreachable coverage * docs(volume_server): record copy receive and http variant progress * test(volume_server/grpc): add erasure coding maintenance and missing-path coverage * docs(volume_server): record initial erasure coding rpc coverage * test(volume_server/http): add multi-range multipart response coverage * docs(volume_server): record multi-range http coverage progress * test(volume_server/grpc): add query empty-stripe no-match coverage * docs(volume_server): record query no-match stream behavior coverage * test(volume_server/http): add upload throttling timeout and replicate bypass coverage * docs(volume_server): record upload throttling coverage progress * test(volume_server/http): add download throttling timeout coverage * docs(volume_server): record download throttling coverage progress * test(volume_server/http): add jwt wrong-cookie fid mismatch coverage * docs(volume_server): record jwt wrong-cookie mismatch coverage * test(volume_server/http): add jwt expired-token rejection coverage * docs(volume_server): record jwt expired-token coverage * test(volume_server/http): add jwt query and cookie transport coverage * docs(volume_server): record jwt token transport coverage * test(volume_server/http): add jwt token-source precedence coverage * docs(volume_server): record jwt token-source precedence coverage * test(volume_server/http): add jwt header-over-cookie precedence coverage * docs(volume_server): record jwt header cookie precedence coverage * test(volume_server/http): add jwt query-over-cookie precedence coverage * docs(volume_server): record jwt query cookie precedence coverage * test(volume_server/grpc): add setstate version mismatch and nil-state coverage * docs(volume_server): record setstate validation coverage * test(volume_server/grpc): add readonly persist-true lifecycle coverage * docs(volume_server): record readonly persist variation coverage * test(volume_server/http): add options origin cors header coverage * docs(volume_server): record options origin cors coverage * test(volume_server/http): add trace unsupported-method parity coverage * docs(volume_server): record trace method parity coverage * test(volume_server/grpc): add batch delete cookie-check variation coverage * docs(volume_server): record batch delete cookie-check coverage * test(volume_server/grpc): add admin lifecycle missing and maintenance variants * docs(volume_server): record admin lifecycle edge-case coverage * test(volume_server/grpc): add mixed batch delete status matrix coverage * docs(volume_server): record mixed batch delete matrix coverage * test(volume_server/http): add jwt-profile ui access gating coverage * docs(volume_server): record jwt ui-gating http coverage * test(volume_server/http): add propfind unsupported-method parity coverage * docs(volume_server): record propfind method parity coverage * test(volume_server/grpc): add volume configure success and rollback-path coverage * docs(volume_server): record volume configure branch coverage * test(volume_server/grpc): add volume needle status missing-path coverage * docs(volume_server): record volume needle status error-path coverage * test(volume_server/http): add readDeleted query behavior coverage * docs(volume_server): record readDeleted http behavior coverage * test(volume_server/http): add delete ts override parity coverage * docs(volume_server): record delete ts parity coverage * test(volume_server/grpc): add invalid blob/meta offset coverage * docs(volume_server): record invalid blob/meta offset coverage * test(volume_server/grpc): add read-all mixed volume abort coverage * docs(volume_server): record read-all mixed-volume abort coverage * test(volume_server/http): assert head response body parity * docs(volume_server): record head body parity assertion * test(volume_server/grpc): assert status state and memory payload completeness * docs(volume_server): record volume server status payload coverage * test(volume_server/grpc): add batch delete chunk-manifest rejection coverage * docs(volume_server): record batch delete chunk-manifest coverage * test(volume_server/grpc): add query cookie-mismatch eof parity coverage * docs(volume_server): record query cookie-mismatch parity coverage * test(volume_server/grpc): add ping master success target coverage * docs(volume_server): record ping master success coverage * test(volume_server/http): add head if-none-match conditional parity * docs(volume_server): record head if-none-match parity coverage * test(volume_server/http): add head if-modified-since parity coverage * docs(volume_server): record head if-modified-since parity coverage * test(volume_server/http): add connect unsupported-method parity coverage * docs(volume_server): record connect method parity coverage * test(volume_server/http): assert options allow-headers cors parity * docs(volume_server): record options allow-headers coverage * test(volume_server/framework): add dual volume cluster integration harness * test(volume_server/http): add missing-local read mode proxy redirect local coverage * docs(volume_server): record read mode missing-local matrix coverage * test(volume_server/http): add download over-limit replica proxy fallback coverage * docs(volume_server): record download replica fallback coverage * test(volume_server/http): add missing-local readDeleted proxy redirect parity coverage * docs(volume_server): record missing-local readDeleted mode coverage * test(volume_server/framework): add single-volume cluster with filer harness * test(volume_server/grpc): add ping filer success target coverage * docs(volume_server): record ping filer success coverage * test(volume_server/http): add proxied-loop guard download timeout coverage * docs(volume_server): record proxied-loop download coverage * test(volume_server/http): add disabled upload and download limit coverage * docs(volume_server): record disabled throttling path coverage * test(volume_server/grpc): add idempotent volume server leave coverage * docs(volume_server): record leave idempotence coverage * test(volume_server/http): add redirect collection query preservation coverage * docs(volume_server): record redirect collection query coverage * test(volume_server/http): assert admin server headers on status and health * docs(volume_server): record admin server header coverage * test(volume_server/http): assert healthz request-id echo parity * docs(volume_server): record healthz request-id parity coverage * test(volume_server/http): add over-limit invalid-vid download branch coverage * docs(volume_server): record over-limit invalid-vid branch coverage * test(volume_server/http): add public-port static asset coverage * docs(volume_server): record public static endpoint coverage * test(volume_server/http): add public head method parity coverage * docs(volume_server): record public head parity coverage * test(volume_server/http): add throttling wait-then-proceed path coverage * docs(volume_server): record throttling wait-then-proceed coverage * test(volume_server/http): add read cookie-mismatch not-found coverage * docs(volume_server): record read cookie-mismatch coverage * test(volume_server/http): add throttling timeout-recovery coverage * docs(volume_server): record throttling timeout-recovery coverage * test(volume_server/grpc): add ec generate mount info unmount lifecycle coverage * docs(volume_server): record ec positive lifecycle coverage * test(volume_server/grpc): add ec shard read and blob delete lifecycle coverage * docs(volume_server): record ec shard read/blob delete lifecycle coverage * test(volume_server/grpc): add ec rebuild and to-volume error branch coverage * docs(volume_server): record ec rebuild and to-volume branch coverage * test(volume_server/grpc): add ec shards-to-volume success roundtrip coverage * docs(volume_server): record ec shards-to-volume success coverage * test(volume_server/grpc): add ec receive and copy-file missing-source coverage * docs(volume_server): record ec receive and copy-file coverage * test(volume_server/grpc): add ec last-shard delete cleanup coverage * docs(volume_server): record ec last-shard delete cleanup coverage * test(volume_server/grpc): add volume copy success path coverage * docs(volume_server): record volume copy success coverage * test(volume_server/grpc): add volume copy overwrite-destination coverage * docs(volume_server): record volume copy overwrite coverage * test(volume_server/http): add write error-path variant coverage * docs(volume_server): record http write error-path coverage * test(volume_server/http): add conditional header precedence coverage * docs(volume_server): record conditional header precedence coverage * test(volume_server/http): add oversized combined range guard coverage * docs(volume_server): record oversized range guard coverage * test(volume_server/http): add image resize and crop read coverage * docs(volume_server): record image transform coverage * test(volume_server/http): add chunk-manifest expansion and bypass coverage * docs(volume_server): record chunk-manifest read coverage * test(volume_server/http): add compressed read encoding matrix coverage * docs(volume_server): record compressed read matrix coverage * test(volume_server/grpc): add tail receiver source replication coverage * docs(volume_server): record tail receiver replication coverage * test(volume_server/grpc): add tail sender large-needle chunking coverage * docs(volume_server): record tail sender chunking coverage * test(volume_server/grpc): add ec-backed volume needle status coverage * docs(volume_server): record ec-backed needle status coverage * test(volume_server/grpc): add ec shard copy from peer success coverage * docs(volume_server): record ec shard copy success coverage * test(volume_server/http): add chunk-manifest delete child cleanup coverage * docs(volume_server): record chunk-manifest delete cleanup coverage * test(volume_server/http): add chunk-manifest delete failure-path coverage * docs(volume_server): record chunk-manifest delete failure coverage * test(volume_server/grpc): add ec shard copy source-unavailable coverage * docs(volume_server): record ec shard copy source-unavailable coverage * parallel	2026-02-13 00:40:56 -08:00
Chris Lu	796f23f68a	Fix STS InvalidAccessKeyId and request body consumption issues (#8328 ) * Fix STS InvalidAccessKeyId and request body consumption in Lakekeeper integration test * Remove debug prints * Add Lakekeeper integration tests to CI * Fix connection refused in CI by binding to 0.0.0.0 * Add timeout to docker run in Lakekeeper integration test * Update weed/s3api/auth_credentials.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-02-12 17:37:07 -08:00
Chris Lu	c1a9263e37	Fix STS AssumeRole with POST body param (#8320 ) * Fix STS AssumeRole with POST body param and add integration test * Add STS integration test to CI workflow * Address code review feedback: fix HPP vulnerability and style issues * Refactor: address code review feedback - Fix HTTP Parameter Pollution vulnerability in UnifiedPostHandler - Refactor permission check logic for better readability - Extract test helpers to testutil/docker.go to reduce duplication - Clean up imports and simplify context setting * Add SigV4-style test variant for AssumeRole POST body routing - Added ActionInBodyWithSigV4Style test case to validate real-world scenario - Test confirms routing works correctly for AWS SigV4-signed requests - Addresses code review feedback about testing with SigV4 signatures * Fix: always set identity in context when non-nil - Ensure UnifiedPostHandler always calls SetIdentityInContext when identity is non-nil - Only call SetIdentityNameInContext when identity.Name is non-empty - This ensures downstream handlers (embeddedIam.DoActions) always have access to identity - Addresses potential issue where empty identity.Name would skip context setting	2026-02-12 12:04:07 -08:00
Chris Lu	b8ef48c8f1	Add RisingWave catalog tests (#8308 ) * Add RisingWave catalog tests for S3 tables * Add RisingWave catalog integration tests to CI workflow * Refactor RisingWave catalog tests based on PR feedback * Address PR feedback: optimize checks, cleanup logs * fix tests * consistent	2026-02-11 22:00:06 -08:00
Chris Lu	ac242d04ee	one time manual run	2026-02-11 13:20:52 -08:00
Chris Lu	21543134c8	fix manual build process	2026-02-11 13:20:41 -08:00
Chris Lu	2d97685390	ci: fix container_release_unified manual dispatch and workflow parsing (#8293 ) ci: fix unified release workflow dispatch matrix filtering	2026-02-10 13:15:52 -08:00
Chris Lu	b73bd08470	ci: move manual container builds to unified release workflow (#8290 ) * ci: move manual dev container build into unified release workflow * ci: make unified manual container build release-tag based	2026-02-10 12:47:39 -08:00
Chris Lu	b261c89675	Fix RocksDB container build compatibility and add manual rocksdb dispatch (#8288 ) * Fix RocksDB build compatibility and add manual rocksdb trigger * Upgrade RocksDB defaults and keep grocksdb v1.10.7 * Add manual latest-image trigger inputs for ref and variant * Allow manual latest build to set image tag and source ref * Fix manual variant selection using setup job matrix output	2026-02-10 11:48:42 -08:00
Chris Lu	c9428c2c0f	Modify AI review comments checklist in PR template Updated AI code review checklist to include potential additional reviews.	2026-02-09 08:58:35 -08:00
Chris Lu	458c12fb99	test: add Spark S3 integration regression for issue #8234 (#8249 ) * test: add Spark S3 regression integration test for issue 8234 * Update test/s3/spark/setup_test.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-08 21:13:31 -08:00

1 2 3 4 5 ...

580 Commits