41 Commits

Author SHA1 Message Date
Chris Lu
995dfc4d5d chore: remove ~50k lines of unreachable dead code (#8913)
* chore: remove unreachable dead code across the codebase

Remove ~50,000 lines of unreachable code identified by static analysis.

Major removals:
- weed/filer/redis_lua: entire unused Redis Lua filer store implementation
- weed/wdclient/net2, resource_pool: unused connection/resource pool packages
- weed/plugin/worker/lifecycle: unused lifecycle plugin worker
- weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy,
  multipart IAM, key rotation, and various SSE helper functions
- weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions
- weed/mq/offset: unused SQL storage and migration code
- weed/worker: unused registry, task, and monitoring functions
- weed/query: unused SQL engine, parquet scanner, and type functions
- weed/shell: unused EC proportional rebalance functions
- weed/storage/erasure_coding/distribution: unused distribution analysis functions
- Individual unreachable functions removed from 150+ files across admin,
  credential, filer, iam, kms, mount, mq, operation, pb, s3api, server,
  shell, storage, topology, and util packages

* fix(s3): reset shared memory store in IAM test to prevent flaky failure

TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because
the MemoryStore credential backend is a singleton registered via init().
Earlier tests that create anonymous identities pollute the shared store,
causing LookupAnonymous() to unexpectedly return true.

Fix by calling Reset() on the memory store before the test runs.

* style: run gofmt on changed files

* fix: restore KMS functions used by integration tests

* fix(plugin): prevent panic on send to closed worker session channel

The Plugin.sendToWorker method could panic with "send on closed channel"
when a worker disconnected while a message was being sent. The race was
between streamSession.close() closing the outgoing channel and sendToWorker
writing to it concurrently.

Add a done channel to streamSession that is closed before the outgoing
channel, and check it in sendToWorker's select to safely detect closed
sessions without panicking.
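
A minimal sketch of a done-channel guard follows, in a variant that never closes the data channel at all, so a concurrent send cannot hit a closed channel. The names `session`, `send`, and `errSessionClosed` are hypothetical and are not taken from the SeaweedFS plugin code; the real `streamSession` may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSessionClosed is a hypothetical sentinel used only in this sketch.
var errSessionClosed = errors.New("session closed")

// session is a hypothetical stand-in for a worker stream session.
type session struct {
	outgoing  chan string
	done      chan struct{}
	closeOnce sync.Once
}

func newSession(buffer int) *session {
	return &session{
		outgoing: make(chan string, buffer),
		done:     make(chan struct{}),
	}
}

// close signals shutdown by closing done. The outgoing channel is left open
// on purpose (an assumption of this sketch), so senders never panic.
func (s *session) close() {
	s.closeOnce.Do(func() { close(s.done) })
}

// send delivers msg unless the session has been closed.
func (s *session) send(msg string) error {
	// Check done first so a closed session is rejected even when the
	// buffer still has room.
	select {
	case <-s.done:
		return errSessionClosed
	default:
	}
	select {
	case s.outgoing <- msg:
		return nil
	case <-s.done:
		return errSessionClosed
	}
}

func main() {
	s := newSession(1)
	fmt.Println(s.send("hello")) // <nil>
	s.close()
	fmt.Println(s.send("late")) // session closed
}
```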
2026-04-03 16:04:27 -07:00
Chris Lu
ced2236cc6 Adjust rename events metadata format (#8854)
* rename metadata events

* fix subscription filter to use NewEntry.Name for rename path matching

The server-side subscription filter constructed the new path using
OldEntry.Name instead of NewEntry.Name when checking if a rename
event's destination matches the subscriber's path prefix. This could
cause events to be incorrectly filtered when a rename changes the
file name.

* fix bucket events to handle rename of bucket directories

onBucketEvents only checked IsCreate and IsDelete. A bucket directory
rename via AtomicRenameEntry now emits a single rename event (both
OldEntry and NewEntry non-nil), which matched neither check. Handle
IsRename by deleting the old bucket and creating the new one.

* fix replicator to handle rename events across directory boundaries

Two issues fixed:

1. The replicator filtered events by checking if the key (old path)
   was under the source directory. Rename events now use the old path
   as key, so renames from outside into the watched directory were
   silently dropped. Now both old and new paths are checked, and
   cross-boundary renames are converted to create or delete.

2. NewParentPath was passed to the sink without remapping to the
   sink's target directory structure, causing the sink to write
   entries at the wrong location. Now NewParentPath is remapped
   alongside the key.

* fix filer sync to handle rename events crossing directory boundaries

The early directory-prefix filter only checked resp.Directory (old
parent). Rename events now carry the old parent as Directory, so
renames from outside the source path into it were dropped before
reaching the existing cross-boundary handling logic. Check both old
and new directories against sourcePath and excludePaths so the
downstream old-key/new-key logic can properly convert these to
create or delete operations.

* fix metadata event path matching

* fix metadata event consumers for rename targets

* Fix replication rename target keys

Logical rename events now reach replication sinks with distinct source and target paths.

Handle non-filer sinks as delete-plus-create on the translated target key, and make the rename fallback path create at the translated target key too.

Add focused tests covering non-filer renames, filer rename updates, and the fallback path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix filer sync rename path scoping

Use directory-boundary matching instead of raw prefix checks when classifying source and target paths during filer sync.

Also apply excludePaths per side so renames across excluded boundaries downgrade cleanly to create/delete instead of being misclassified as in-scope updates.

Add focused tests for boundary matching and rename classification.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix replicator directory boundary checks

Use directory-boundary matching instead of raw prefix checks when deciding whether a source or target path is inside the watched tree or an excluded subtree.

This prevents sibling paths such as /foo and /foobar from being misclassified during rename handling, and preserves the earlier rename-target-key fix.

Add focused tests for boundary matching and rename classification across sibling/excluded directories.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix etc-remote rename-out handling

Use boundary-safe source/target directory membership when classifying metadata events under DirectoryEtcRemote.

This prevents rename-out events from being processed as config updates, while still treating them as removals where appropriate for the remote sync and remote gateway command paths.

Add focused tests for update/removal classification and sibling-prefix handling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Defer rename events until commit

Queue logical rename metadata events during atomic and streaming renames and publish them only after the transaction commits successfully.

This prevents subscribers from seeing delete or logical rename events for operations that later fail during delete or commit.

Also serialize notification.Queue swaps in rename tests and add failure-path coverage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Skip descendant rename target lookups

Avoid redundant target lookups during recursive directory renames once the destination subtree is known absent.

The recursive move path now inserts known-absent descendants directly, and the test harness exercises prefixed directory listing so the optimization is covered by a directory rename regression test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Tighten rename review tests

Return filer_pb.ErrNotFound from the bucket tracking store test stub so it follows the FilerStore contract, and add a webhook filter case for same-name renames across parent directories.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix HardLinkId format verb in InsertEntryKnownAbsent error

HardLinkId is a byte slice. %d prints each byte as a decimal number
which is not useful for an identifier. Use %x to match the log line
two lines above.
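
The difference between the two verbs on a byte slice can be seen in a small standalone example. The helper names `asDecimal` and `asHex` are made up for illustration; only the `fmt` verbs themselves are standard Go behavior.

```go
package main

import "fmt"

// asDecimal formats a byte slice with %d, printing each byte as a number.
func asDecimal(id []byte) string { return fmt.Sprintf("%d", id) }

// asHex formats a byte slice with %x, printing one compact hex string.
func asHex(id []byte) string { return fmt.Sprintf("%x", id) }

func main() {
	id := []byte{0x01, 0xab, 0xff}
	fmt.Println(asDecimal(id)) // [1 171 255]
	fmt.Println(asHex(id))     // 01abff
}
```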

* only skip descendant target lookup when source and dest use same store

moveFolderSubEntries unconditionally passed skipTargetLookup=true for
every descendant. This is safe when all paths resolve to the same
underlying store, but with path-specific store configuration a child's
destination may map to a different backend that already holds an entry
at that path. Use FilerStoreWrapper.SameActualStore to check per-child
and fall back to the full CreateEntry path when stores differ.

* add nil and create edge-case tests for metadata event scope helpers

* extract pathIsEqualOrUnder into util.IsEqualOrUnder

Identical implementations existed in both replication/replicator.go and
command/filer_sync.go. Move to util.IsEqualOrUnder (alongside the
existing FullPath.IsUnder) and remove the duplicates.
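
A boundary-safe check of this kind might look like the sketch below. This is an illustrative reimplementation under assumed semantics (path equals the directory, or continues with a "/" right after it), not the actual body of util.IsEqualOrUnder; the lowercase name `isEqualOrUnder` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// isEqualOrUnder reports whether path is dir itself or lies inside dir.
// A raw strings.HasPrefix(path, dir) would wrongly accept /foobar for /foo.
func isEqualOrUnder(path, dir string) bool {
	if dir == "/" {
		return strings.HasPrefix(path, "/")
	}
	dir = strings.TrimSuffix(dir, "/")
	return path == dir || strings.HasPrefix(path, dir+"/")
}

func main() {
	fmt.Println(isEqualOrUnder("/foo", "/foo"))     // true
	fmt.Println(isEqualOrUnder("/foo/bar", "/foo")) // true
	fmt.Println(isEqualOrUnder("/foobar", "/foo"))  // false
}
```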

* use MetadataEventTargetDirectory for new-side directory in filer sync

The new-side directory checks and sourceNewKey computation used
message.NewParentPath directly. If NewParentPath were empty (legacy
events, older filer versions during rolling upgrades), sourceNewKey
would be wrong (/filename instead of /dir/filename) and the
UpdateEntry parent path rewrite would panic on slice bounds.

Derive targetDir once from MetadataEventTargetDirectory, which falls
back to resp.Directory when NewParentPath is empty, and use it
consistently for all new-side checks and the sink parent path.
2026-03-30 18:25:11 -07:00
Chris Lu
0b3867dca3 filer: add structured error codes to CreateEntryResponse (#8767)
* filer: add FilerError enum and error_code field to CreateEntryResponse

Add a machine-readable error code alongside the existing string error
field. This follows the precedent set by PublishMessageResponse in the
MQ broker proto. The string field is kept for human readability and
backward compatibility.

Defined codes: OK, ENTRY_NAME_TOO_LONG, PARENT_IS_FILE,
EXISTING_IS_DIRECTORY, EXISTING_IS_FILE, ENTRY_ALREADY_EXISTS.

* filer: add sentinel errors and error code mapping in filer_pb

Define sentinel errors (ErrEntryNameTooLong, ErrParentIsFile, etc.) in
the filer_pb package so both the filer and consumers can reference them
without circular imports.

Add FilerErrorToSentinel() to map proto error codes to sentinels, and
update CreateEntryWithResponse() to check error_code first, falling back
to the string-based path for backward compatibility with old servers.

* filer: return wrapped sentinel errors and set proto error codes

Replace fmt.Errorf string errors in filer.CreateEntry, UpdateEntry, and
ensureParentDirectoryEntry with wrapped filer_pb sentinel errors (using
%w). This preserves errors.Is() traversal on the server side.

In the gRPC CreateEntry handler, map sentinel errors to the
corresponding FilerError proto codes using errors.Is(), setting both
resp.Error (string, for backward compat) and resp.ErrorCode (enum).
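
The wrap-then-map pattern described here can be sketched as follows. The sentinel variables, the `errorCode` type, and both functions are hypothetical stand-ins, not the real filer_pb definitions; only `errors.Is` and `%w` are standard library behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// errorCode is a hypothetical stand-in for a proto enum of error codes.
type errorCode int

const (
	codeOK errorCode = iota
	codeEntryAlreadyExists
	codeParentIsFile
	codeUnknown
)

// Hypothetical sentinels; the real ones live in filer_pb.
var (
	errEntryAlreadyExists = errors.New("entry already exists")
	errParentIsFile       = errors.New("parent is a file")
)

// createEntry wraps the sentinel with %w so errors.Is can still find it.
func createEntry(path string, exists bool) error {
	if exists {
		return fmt.Errorf("create %s: %w", path, errEntryAlreadyExists)
	}
	return nil
}

// toCode maps a (possibly wrapped) error to a machine-readable code.
func toCode(err error) errorCode {
	switch {
	case err == nil:
		return codeOK
	case errors.Is(err, errEntryAlreadyExists):
		return codeEntryAlreadyExists
	case errors.Is(err, errParentIsFile):
		return codeParentIsFile
	default:
		return codeUnknown
	}
}

func main() {
	err := createEntry("/buckets/b/key", true)
	fmt.Println(err)                                   // create /buckets/b/key: entry already exists
	fmt.Println(toCode(err) == codeEntryAlreadyExists) // true
}
```

Wrapping with `%v` instead of `%w` would flatten the sentinel into text, and `toCode` would then fall through to `codeUnknown`.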

* S3: use errors.Is() with filer sentinels instead of string matching

Replace fragile string-based error matching in filerErrorToS3Error and
other S3 API consumers with errors.Is() checks against filer_pb sentinel
errors. This works because the updated CreateEntryWithResponse helper
reconstructs sentinel errors from the proto FilerError code.

Update iceberg stage_create and metadata_files to check resp.ErrorCode
instead of parsing resp.Error strings. Update SSE-S3 to use errors.Is()
for the already-exists check.

String matching is retained only for non-filer errors (gRPC transport
errors, checksum validation) that don't go through CreateEntryResponse.

* filer: remove backward-compat string fallbacks for error codes

Clients and servers are always deployed together, so there is no need
for backward-compatibility fallback paths that parse resp.Error strings
when resp.ErrorCode is unset. Simplify all consumers to rely solely on
the structured error code.

* iceberg: ensure unknown non-OK error codes are not silently ignored

When FilerErrorToSentinel returns nil for an unrecognized error code,
return an error including the code and message rather than falling
through to return nil.

* filer: fix redundant error message and restore error wrapping in helper

Use request path instead of resp.Error in the sentinel error format
string to avoid duplicating the sentinel message (e.g. "entry already
exists: entry already exists"). Restore %w wrapping with errors.New()
in the fallback paths so callers can use errors.Is()/errors.As().

* filer: promote file to directory on path conflict instead of erroring

S3 allows both "foo/bar" (object) and "foo/bar/xyzzy" (another object)
to coexist because S3 has a flat key space. When ensureParentDirectoryEntry
finds a parent path that is a file instead of a directory, promote it to
a directory by setting ModeDir while preserving the original content and
chunks. Use Store.UpdateEntry directly to bypass the Filer.UpdateEntry
type-change guard.

This fixes the S3 compatibility test failures where creating overlapping
keys (e.g. "foo/bar" then "foo/bar/xyzzy") returned ExistingObjectIsFile.
2026-03-24 17:08:22 -07:00
Chris Lu
3f946fc0c0 mount: make metadata cache rebuilds snapshot-consistent (#8531)
* filer: expose metadata events and list snapshots

* mount: invalidate hot directory caches

* mount: read hot directories directly from filer

* mount: add sequenced metadata cache applier

* mount: apply metadata responses through cache applier

* mount: replay snapshot-consistent directory builds

* mount: dedupe self metadata events

* mount: factor directory build cleanup

* mount: replace proto marshal dedup with composite key and ring buffer

The dedup logic was doing a full deterministic proto.Marshal on every
metadata event just to produce a dedup key. Replace with a cheap
composite string key (TsNs|Directory|OldName|NewName).

Also replace the sliding-window slice (which leaked the backing array
unboundedly) with a fixed-size ring buffer that reuses the same array.
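
A fixed-size dedup window keyed by a composite string could be sketched like this. The type `dedupRing`, its map-backed lookup, and the `eventKey` helper are assumptions made for illustration; the mount code may organize its ring buffer differently.

```go
package main

import "fmt"

// dedupRing remembers the most recent keys in a fixed-size array that is
// reused forever, so memory stays bounded.
type dedupRing struct {
	keys []string
	seen map[string]struct{}
	next int
}

// newDedupRing creates a ring with room for size keys (size must be > 0).
func newDedupRing(size int) *dedupRing {
	return &dedupRing{
		keys: make([]string, size),
		seen: make(map[string]struct{}, size),
	}
}

// add records key and reports whether it was already in the window.
// Empty strings mark unused slots, so keys must be non-empty.
func (r *dedupRing) add(key string) bool {
	if _, ok := r.seen[key]; ok {
		return true
	}
	if old := r.keys[r.next]; old != "" {
		delete(r.seen, old) // evict the oldest key in this slot
	}
	r.keys[r.next] = key
	r.seen[key] = struct{}{}
	r.next = (r.next + 1) % len(r.keys)
	return false
}

// eventKey builds a cheap composite key instead of marshaling the event.
func eventKey(tsNs int64, dir, oldName, newName string) string {
	return fmt.Sprintf("%d|%s|%s|%s", tsNs, dir, oldName, newName)
}

func main() {
	r := newDedupRing(2)
	k := eventKey(42, "/dir", "a.txt", "b.txt")
	fmt.Println(r.add(k)) // false
	fmt.Println(r.add(k)) // true
}
```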

* filer: remove mutex and proto.Clone from request-scoped MetadataEventSink

MetadataEventSink is created per-request and only accessed by the
goroutine handling the gRPC call. The mutex and double proto.Clone
(once in Record, once in Last) were unnecessary overhead on every
filer write operation. Store the pointer directly instead.

* mount: skip proto.Clone for caller-owned metadata events

Add ApplyMetadataResponseOwned that takes ownership of the response
without cloning. Local metadata events (mkdir, create, flush, etc.)
are freshly constructed and never shared, so the clone is unnecessary.

* filer: only populate MetadataEvent on successful DeleteEntry

Avoid calling eventSink.Last() on error paths where the sink may
contain a partial event from an intermediate child deletion during
recursive deletes.

* mount: avoid map allocation in collectDirectoryNotifications

Replace the map with a fixed-size array and linear dedup. There are
at most 3 directories to notify (old parent, new parent, new child
if directory), so a 3-element array avoids the heap allocation on
every metadata event.
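
As a rough illustration of the array-plus-linear-scan idea, the sketch below collects at most three distinct directories. The `dirSet` type is hypothetical and not the structure used in collectDirectoryNotifications.

```go
package main

import "fmt"

// dirSet holds up to three distinct directory paths in a fixed array,
// using a linear scan instead of a map for deduplication.
type dirSet struct {
	dirs [3]string
	n    int
}

// add inserts dir unless it is empty, already present, or the set is full.
func (s *dirSet) add(dir string) {
	if dir == "" {
		return
	}
	for i := 0; i < s.n; i++ {
		if s.dirs[i] == dir {
			return
		}
	}
	if s.n < len(s.dirs) {
		s.dirs[s.n] = dir
		s.n++
	}
}

// list returns the collected directories in insertion order.
func (s *dirSet) list() []string { return s.dirs[:s.n] }

func main() {
	var s dirSet
	s.add("/old")
	s.add("/new")
	s.add("/old") // duplicate, ignored
	fmt.Println(s.list()) // [/old /new]
}
```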

* mount: fix potential deadlock in enqueueApplyRequest

Release applyStateMu before the blocking channel send. Previously,
if the channel was full (cap 128), the send would block while holding
the mutex, preventing Shutdown from acquiring it to set applyClosed.
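
The lock-release ordering can be sketched as below. The `applier` type and its fields are hypothetical; the select on a done channel is borrowed from a later bullet in this same commit message and is included here as an assumption about how the two fixes fit together.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errClosed is a hypothetical sentinel for this sketch.
var errClosed = errors.New("applier closed")

// applier is a hypothetical queue guarded by a mutex and a done channel.
type applier struct {
	mu     sync.Mutex
	closed bool
	ch     chan int
	done   chan struct{}
}

func newApplier(capacity int) *applier {
	return &applier{ch: make(chan int, capacity), done: make(chan struct{})}
}

// enqueue reads the closed flag under the lock, then releases the lock
// before the potentially blocking send so shutdown can always proceed.
func (a *applier) enqueue(v int) error {
	a.mu.Lock()
	closed := a.closed
	a.mu.Unlock()
	if closed {
		return errClosed
	}
	select {
	case a.ch <- v:
		return nil
	case <-a.done: // unblocks senders stuck on a full channel
		return errClosed
	}
}

// shutdown marks the applier closed and wakes any blocked senders.
func (a *applier) shutdown() {
	a.mu.Lock()
	defer a.mu.Unlock()
	if !a.closed {
		a.closed = true
		close(a.done)
	}
}

func main() {
	a := newApplier(1)
	fmt.Println(a.enqueue(1)) // <nil>
	a.shutdown()
	fmt.Println(a.enqueue(2)) // applier closed
}
```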

* mount: restore signature-based self-event filtering as fast path

Re-add the signature check that was removed when content-based dedup
was introduced. Checking signatures is O(1) on a small slice and
avoids enqueuing and processing events that originated from this
mount instance. The content-based dedup remains as a fallback.

* filer: send snapshotTsNs only in first ListEntries response

The snapshot timestamp is identical for every entry in a single
ListEntries stream. Sending it in every response message wastes
wire bandwidth for large directories. The client already reads
it only from the first response.

* mount: exit read-through mode after successful full directory listing

MarkDirectoryRefreshed was defined but never called, so directories
that entered read-through mode (hot invalidation threshold) stayed
there permanently, hitting the filer on every readdir even when cold.
Call it after a complete read-through listing finishes.

* mount: include event shape and full paths in dedup key

The previous dedup key only used Names, which could collapse distinct
rename targets. Include the event shape (C/D/U/R), source directory,
new parent path, and both entry names so structurally different events
are never treated as duplicates.

* mount: drain pending requests on shutdown in runApplyLoop

After receiving the shutdown sentinel, drain any remaining requests
from applyCh non-blockingly and signal each with errMetaCacheClosed
so callers waiting on req.done are released.

* mount: include IsDirectory in synthetic delete events

metadataDeleteEvent now accepts an isDirectory parameter so the
applier can distinguish directory deletes from file deletes. Rmdir
passes true, Unlink passes false.

* mount: fall back to synthetic event when MetadataEvent is nil

In mknod and mkdir, if the filer response omits MetadataEvent (e.g.
older filer without the field), synthesize an equivalent local
metadata event so the cache is always updated.

* mount: make Flush metadata apply best-effort after successful commit

After filer_pb.CreateEntryWithResponse succeeds, the entry is
persisted. Don't fail the Flush syscall if the local metadata cache
apply fails — log and invalidate the directory cache instead.
Also fall back to a synthetic event when MetadataEvent is nil.

* mount: make Rename metadata apply best-effort

The rename has already succeeded on the filer by the time we apply
the local metadata event. Log failures instead of returning errors
that would be dropped by the caller anyway.

* mount: make saveEntry metadata apply best-effort with fallback

After UpdateEntryWithResponse succeeds, treat local metadata apply
as non-fatal. Log and invalidate the directory cache on failure.
Also fall back to a synthetic event when MetadataEvent is nil.

* filer_pb: preserve snapshotTsNs on error in ReadDirAllEntriesWithSnapshot

Return the snapshot timestamp even when the first page fails, so
callers receive the snapshot boundary when partial data was received.

* filer: send snapshot token for empty directory listings

When no entries are streamed, send a final ListEntriesResponse with
only SnapshotTsNs so clients always receive the snapshot boundary.

* mount: distinguish not-found vs transient errors in lookupEntry

Return fuse.EIO for non-not-found filer errors instead of
unconditionally returning ENOENT, so transient failures don't
masquerade as missing entries.

* mount: make CacheRemoteObject metadata apply best-effort

The file content has already been cached successfully. Don't fail
the read if the local metadata cache update fails.

* mount: use consistent snapshot for readdir in direct mode

Capture the SnapshotTsNs from the first loadDirectoryEntriesDirect
call and store it on the DirectoryHandle. Subsequent batch loads
pass this stored timestamp so all batches use the same snapshot.

Also export DoSeaweedListWithSnapshot so mount can use it directly
with snapshot passthrough.

* filer_pb: fix test fake to send SnapshotTsNs only on first response

Match the server behavior: only the first ListEntriesResponse in a
page carries the snapshot timestamp, subsequent entries leave it zero.

* Fix nil pointer dereference in ListEntries stream consumers

Remove the empty-directory snapshot-only response from ListEntries
that sent a ListEntriesResponse with Entry==nil, which crashed every
raw stream consumer that assumed resp.Entry is always non-nil.

Also add defensive nil checks for resp.Entry in all raw ListEntries
stream consumers across: S3 listing, broker topic lookup, broker
topic config, admin dashboard, topic retention, hybrid message
scanner, Kafka integration, and consumer offset storage.

* Add nil guards for resp.Entry in remaining ListEntries stream consumers

Covers: S3 object lock check, MQ management dashboard (version/
partition/offset loops), and topic retention version loop.

* Make applyLocalMetadataEvent best-effort in Link and Symlink

The filer operations already succeeded; failing the syscall because
the local cache apply failed is wrong. Log a warning and invalidate
the parent directory cache instead.

* Make applyLocalMetadataEvent best-effort in Mkdir/Rmdir/Mknod/Unlink

The filer RPC already committed; don't fail the syscall when the
local metadata cache apply fails. Log a warning and invalidate the
parent directory cache to force a re-fetch on next access.

* flushFileMetadata: add nil-fallback for metadata event and best-effort apply

Synthesize a metadata event when resp.GetMetadataEvent() is nil
(matching doFlush), and make the apply best-effort with cache
invalidation on failure.

* Prevent double-invocation of cleanupBuild in doEnsureVisited

Add a cleanupDone guard so the deferred cleanup and inline error-path
cleanup don't both call DeleteFolderChildren/AbortDirectoryBuild.

* Fix comment: signature check is O(n) not O(1)

* Prevent deferred cleanup after successful CompleteDirectoryBuild

Set cleanupDone before returning from the success path so the
deferred context-cancellation check cannot undo a published build.

* Invalidate parent directory caches on rename metadata apply failure

When applyLocalMetadataEvent fails during rename, invalidate the
source and destination parent directory caches so subsequent accesses
trigger a re-fetch from the filer.

* Add event nil-fallback and cache invalidation to Link and Symlink

Synthesize metadata events when the server doesn't return one, and
invalidate parent directory caches on apply failure.

* Match requested partition when scanning partition directories

Parse the partition range format (NNNN-NNNN) and match against the
requested partition parameter instead of using the first directory.
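
One way to parse and match such directory names is sketched below. The half-open interpretation (start inclusive, stop exclusive) and the function names are assumptions for illustration; the commit message only states that the NNNN-NNNN format is parsed and matched against the requested partition.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRange splits a "NNNN-NNNN" directory name into its two numbers.
func parseRange(name string) (start, stop int, err error) {
	left, right, ok := strings.Cut(name, "-")
	if !ok {
		return 0, 0, fmt.Errorf("partition dir %q: missing '-'", name)
	}
	if start, err = strconv.Atoi(left); err != nil {
		return 0, 0, fmt.Errorf("partition dir %q: %w", name, err)
	}
	if stop, err = strconv.Atoi(right); err != nil {
		return 0, 0, fmt.Errorf("partition dir %q: %w", name, err)
	}
	return start, stop, nil
}

// findPartitionDir returns the first directory whose range contains
// partition, assuming [start, stop) semantics. Unparseable names are skipped.
func findPartitionDir(names []string, partition int) (string, bool) {
	for _, name := range names {
		start, stop, err := parseRange(name)
		if err != nil {
			continue
		}
		if partition >= start && partition < stop {
			return name, true
		}
	}
	return "", false
}

func main() {
	dirs := []string{"0000-0630", "0630-1260", "notes"}
	fmt.Println(findPartitionDir(dirs, 700)) // 0630-1260 true
}
```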

* Preserve snapshot timestamp across empty directory listings

Initialize actualSnapshotTsNs from the caller-requested value so it
isn't lost when the server returns no entries. Re-add the server-side
snapshot-only response for empty directories (all raw stream consumers
now have nil guards for Entry).

* Fix CreateEntry error wrapping to support errors.Is/errors.As

Use errors.New + %w instead of %v for resp.Error so callers can
unwrap the underlying error.

* Fix object lock pagination: only advance on non-nil entries

Move entriesReceived inside the nil check so nil entries don't
cause repeated ListEntries calls with the same lastFileName.

* Guard Attributes nil check before accessing Mtime in MQ management

* Do not send nil-Entry response for empty directory listings

The snapshot-only ListEntriesResponse (with Entry == nil) for empty
directories breaks consumers that treat any received response as an
entry (Java FilerClient, S3 listing). The Go client-side
DoSeaweedListWithSnapshot already preserves the caller-requested
snapshot via actualSnapshotTsNs initialization, so the server-side
send is unnecessary.

* Fix review findings: subscriber dedup, invalidation normalization, nil guards, shutdown race

- Remove self-signature early-return in processEventFn so all events
  flow through the applier (directory-build buffering sees self-originated
  events that arrive after a snapshot)
- Normalize NewParentPath in collectEntryInvalidations to avoid duplicate
  invalidations when NewParentPath is empty (same-directory update)
- Guard resp.Entry.Attributes for nil in admin_server.go and
  topic_retention.go to prevent panics on entries without attributes
- Fix enqueueApplyRequest race with shutdown by using select on both
  applyCh and applyDone, preventing sends after the apply loop exits
- Add cleanupDone check to deferred cleanup in meta_cache_init.go for
  clarity alongside the existing guard in cleanupBuild
- Add empty directory test case for snapshot consistency

* Propagate authoritative metadata event from CacheRemoteObjectToLocalCluster and generate client-side snapshot for empty directories

- Add metadata_event field to CacheRemoteObjectToLocalClusterResponse
  proto so the filer-emitted event is available to callers
- Use WithMetadataEventSink in the server handler to capture the event
  from NotifyUpdateEvent and return it on the response
- Update filehandle_read.go to prefer the RPC's metadata event over
  a locally fabricated one, falling back to metadataUpdateEvent when
  the server doesn't provide one (e.g., older filers)
- Generate a client-side snapshot cutoff in DoSeaweedListWithSnapshot
  when the server sends no snapshot (empty directory), so callers like
  CompleteDirectoryBuild get a meaningful boundary for filtering
  buffered events

* Skip directory notifications for dirs being built to prevent mid-build cache wipe

When a metadata event is buffered during a directory build,
applyMetadataSideEffects was still firing noteDirectoryUpdate for the
building directory. If the directory accumulated enough updates to
become "hot", markDirectoryReadThrough would call DeleteFolderChildren,
wiping entries that EnsureVisited had already inserted. The build would
then complete and mark the directory cached with incomplete data.

Fix by using applyMetadataSideEffectsSkippingBuildingDirs for buffered
events, which suppresses directory notifications for dirs currently in
buildingDirs while still applying entry invalidations.

* Add test for directory notification suppression during active build

TestDirectoryNotificationsSuppressedDuringBuild verifies that metadata
events targeting a directory under active EnsureVisited build do NOT
fire onDirectoryUpdate for that directory. In production, this prevents
markDirectoryReadThrough from calling DeleteFolderChildren mid-build,
which would wipe entries already inserted by the listing.

The test inserts an entry during a build, sends multiple metadata events
for the building directory, asserts no notifications fired for it,
verifies the entry survives, and confirms buffered events are replayed
after CompleteDirectoryBuild.

* Fix create invalidations, build guard, event shape, context, and snapshot error path

- collectEntryInvalidations: invalidate FUSE kernel cache on pure
  create events (OldEntry==nil && NewEntry!=nil), not just updates
  and deletes
- completeDirectoryBuildNow: only call markCachedFn when an active
  build existed (state != nil), preventing an unpopulated directory
  from being marked as cached
- Add metadataCreateEvent helper that produces a create-shaped event
  (NewEntry only, no OldEntry) and use it in mkdir, mknod, symlink,
  and hardlink create fallback paths instead of metadataUpdateEvent
  which incorrectly set both OldEntry and NewEntry
- applyMetadataResponseEnqueue: use context.Background() for the
  queued mutation so a cancelled caller context cannot abort the
  apply loop mid-write
- DoSeaweedListWithSnapshot: move snapshot initialization before
  ListEntries call so the error path returns the preserved snapshot
  instead of 0

* Fix review findings: test loop, cache race, context safety, snapshot consistency

- Fix build test loop starting at i=1 instead of i=0, missing new-0.txt verification
- Re-check IsDirectoryCached after cache miss to avoid ENOENT race with markDirectoryReadThrough
- Use context.Background() in enqueueAndWait so caller cancellation can't abort build/complete mid-way
- Pass dh.snapshotTsNs in skip-batch loadDirectoryEntriesDirect for snapshot consistency
- Prefer resp.MetadataEvent over fallback in Unlink event derivation
- Add comment on MetadataEventSink.Record single-event assumption

* Fix empty-directory snapshot clock skew and build cancellation race

Empty-directory snapshot: Remove client-side time.Now() synthesis when
the server returns no entries. Instead return snapshotTsNs=0, and in
completeDirectoryBuildNow replay ALL buffered events when snapshot is 0.
This eliminates the clock-skew bug where a client ahead of the filer
would filter out legitimate post-list events.

Build cancellation: Use context.Background() for BeginDirectoryBuild
and CompleteDirectoryBuild calls in doEnsureVisited, so errgroup
cancellation doesn't cause enqueueAndWait to return early and trigger
cleanupBuild while the operation is still queued.

* Add tests for empty-directory build replay and cancellation resilience

TestEmptyDirectoryBuildReplaysAllBufferedEvents: verifies that when
CompleteDirectoryBuild receives snapshotTsNs=0 (empty directory, no
server snapshot), ALL buffered events are replayed regardless of their
TsNs values — no clock-skew-sensitive filtering occurs.

TestBuildCompletionSurvivesCallerCancellation: verifies that once
CompleteDirectoryBuild is enqueued, a cancelled caller context does not
prevent the build from completing. The apply loop runs with
context.Background(), so the directory becomes cached and buffered
events are replayed even when the caller gives up waiting.

* Fix directory subtree cleanup, Link rollback, test robustness

- applyMetadataResponseLocked: when a directory entry is deleted or
  moved, call DeleteFolderChildren on the old path so cached descendants
  don't leak as stale entries.

- Link: save original HardLinkId/Counter before mutation. If
  CreateEntryWithResponse fails after the source was already updated,
  rollback the source entry to its original state via UpdateEntry.

- TestBuildCompletionSurvivesCallerCancellation: replace fixed
  time.Sleep(50ms) with a deadline-based poll that checks
  IsDirectoryCached in a loop, failing only after 2s timeout.

- TestReadDirAllEntriesWithSnapshotEmptyDirectory: assert that
  ListEntries was actually invoked on the mock client so the test
  exercises the RPC path.

- newMetadataEvent: add early return when both oldEntry and newEntry are
  nil to avoid emitting events with empty Directory.
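
The deadline-based poll mentioned for TestBuildCompletionSurvivesCallerCancellation follows a common test pattern, sketched here in isolation. The helper `waitFor` and the `cached` flag are hypothetical; the real test polls IsDirectoryCached.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// waitFor polls cond every interval until it returns true or the timeout
// elapses. It reports whether cond became true in time.
func waitFor(cond func() bool, timeout, interval time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if !time.Now().Before(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

func main() {
	var cached atomic.Bool
	go func() {
		time.Sleep(20 * time.Millisecond) // simulated background work
		cached.Store(true)
	}()
	// Succeeds as soon as the flag flips instead of sleeping a fixed time.
	fmt.Println(waitFor(cached.Load, 2*time.Second, 5*time.Millisecond)) // true
}
```

Compared with a fixed sleep, the poll returns early on fast machines and tolerates slow ones up to the timeout.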

---------

Co-authored-by: Copilot <copilot@github.com>
2026-03-07 09:19:40 -08:00
Chris Lu
ca84a8a713 S3: Directly read write volume servers (#7481)
* Lazy Versioning Check, Conditional SSE Entry Fetch, HEAD Request Optimization

* revert

Reverted the conditional versioning check to always check versioning status
Reverted the conditional SSE entry fetch to always fetch entry metadata

* Lazy Entry Fetch for SSE, Skip Conditional Header Check

* SSE-KMS headers are present, this is not an SSE-C request (mutually exclusive)

* SSE-C is mutually exclusive with SSE-S3 and SSE-KMS

* refactor

* Removed Premature Mutual Exclusivity Check

* check for the presence of the X-Amz-Server-Side-Encryption header

* not used

* fmt

* directly read write volume servers

* HTTP Range Request Support

* set header

* md5

* copy object

* fix sse

* fmt

* implement sse

* sse continue

* fixed the suffix range bug (bytes=-N for "last N bytes")

* debug logs

* Missing PartsCount Header

* profiling

* url encoding

* test_multipart_get_part

* headers

* debug

* adjust log level

* handle part number

* Update s3api_object_handlers.go

* nil safety

* set ModifiedTsNs

* remove

* nil check

* fix sse header

* same logic as filer

* decode values

* decode ivBase64

* s3: Fix SSE decryption JWT authentication and streaming errors

Critical fix for SSE (Server-Side Encryption) test failures:

1. **JWT Authentication Bug** (Root Cause):
   - Changed from GenJwtForFilerServer to GenJwtForVolumeServer
   - S3 API now uses correct JWT when directly reading from volume servers
   - Matches filer's authentication pattern for direct volume access
   - Fixes 'unexpected EOF' and 500 errors in SSE tests

2. **Streaming Error Handling**:
   - Added error propagation in getEncryptedStreamFromVolumes goroutine
   - Use CloseWithError() to properly communicate stream failures
   - Added debug logging for streaming errors

3. **Response Header Timing**:
   - Removed premature WriteHeader(http.StatusOK) call
   - Let Go's http package write status automatically on first write
   - Prevents header lock when errors occur during streaming

4. **Enhanced SSE Decryption Debugging**:
   - Added IV/Key validation and logging for SSE-C, SSE-KMS, SSE-S3
   - Better error messages for missing or invalid encryption metadata
   - Added glog.V(2) debugging for decryption setup

This fixes SSE integration test failures where encrypted objects
could not be retrieved due to volume server authentication failures.
The JWT bug was causing volume servers to reject requests, resulting
in truncated/empty streams (EOF) or internal errors.
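
The streaming error handling in item 2 can be sketched as follows. This is a minimal illustration and not the actual getEncryptedStreamFromVolumes code: streamChunks, failingReader, errChunkFetch and demo are hypothetical names, and only io.Pipe and CloseWithError come from the Go standard library.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errChunkFetch is a hypothetical error standing in for a failed volume read.
var errChunkFetch = errors.New("chunk fetch failed")

// streamChunks copies every source into a pipe from a goroutine.
// On failure it closes the pipe with the error so the reader sees the
// real cause instead of a silent, truncated stream.
func streamChunks(sources []io.Reader) io.Reader {
	pr, pw := io.Pipe()
	go func() {
		for _, src := range sources {
			if _, err := io.Copy(pw, src); err != nil {
				pw.CloseWithError(err) // propagate the failure to the reader
				return
			}
		}
		pw.Close() // normal end of stream: reader gets io.EOF
	}()
	return pr
}

// failingReader always fails, simulating a rejected volume server request.
type failingReader struct{}

func (failingReader) Read([]byte) (int, error) { return 0, errChunkFetch }

// demo reads one good chunk followed by one failing chunk.
func demo() (string, error) {
	r := streamChunks([]io.Reader{strings.NewReader("abc"), failingReader{}})
	data, err := io.ReadAll(r)
	return string(data), err
}

func main() {
	data, err := demo()
	fmt.Printf("read %q, err=%v\n", data, err)
}
```

The reader receives the bytes written before the failure and then the original error, which is why the status line should not be committed before the first successful write.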

* s3: Fix SSE multipart upload metadata preservation

Critical fix for SSE multipart upload test failures (SSE-C and SSE-KMS):

**Root Cause - Incomplete SSE Metadata Copying**:
The old code only tried to copy 'SeaweedFSSSEKMSKey' from the first
part to the completed object. This had TWO bugs:

1. **Wrong Constant Name** (Key Mismatch Bug):
   - Storage uses: SeaweedFSSSEKMSKeyHeader = 'X-SeaweedFS-SSE-KMS-Key'
   - Old code read: SeaweedFSSSEKMSKey = 'x-seaweedfs-sse-kms-key'
   - Result: SSE-KMS metadata was NEVER copied → 500 errors

2. **Missing SSE-C and SSE-S3 Headers**:
   - SSE-C requires: IV, Algorithm, KeyMD5
   - SSE-S3 requires: encrypted key data + standard headers
   - Old code: copied nothing for SSE-C/SSE-S3 → decryption failures

**Fix - Complete SSE Header Preservation**:
Now copies ALL SSE headers from first part to completed object:

- SSE-C: SeaweedFSSSEIV, CustomerAlgorithm, CustomerKeyMD5
- SSE-KMS: SeaweedFSSSEKMSKeyHeader, AwsKmsKeyId, ServerSideEncryption
- SSE-S3: SeaweedFSSSES3Key, ServerSideEncryption

Applied consistently to all 3 code paths:
1. Versioned buckets (creates version file)
2. Suspended versioning (creates main object with null versionId)
3. Non-versioned buckets (creates main object)

**Why This Is Correct**:
The headers copied EXACTLY match what putToFiler stores during part
upload (lines 496-521 in s3api_object_handlers_put.go). This ensures
detectPrimarySSEType() can correctly identify encrypted multipart
objects and trigger inline decryption with proper metadata.

Fixes: TestSSEMultipartUploadIntegration (SSE-C and SSE-KMS subtests)
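
The key mismatch bug can be reproduced in a few lines. This is only a sketch: copyHeaders is a hypothetical helper, and the maps stand in for the extended attributes of a part and of the completed object. The two key strings are the ones quoted above.

```go
package main

import "fmt"

const (
	// Key under which the metadata is stored, as quoted in this commit.
	storedKey = "X-SeaweedFS-SSE-KMS-Key"
	// Key the old code looked up; map lookups are case-sensitive.
	oldLookupKey = "x-seaweedfs-sse-kms-key"
)

// copyHeaders copies the named keys that exist in src into dst and
// returns how many entries were copied.
func copyHeaders(dst, src map[string][]byte, keys ...string) int {
	copied := 0
	for _, k := range keys {
		if v, ok := src[k]; ok {
			dst[k] = v
			copied++
		}
	}
	return copied
}

func main() {
	firstPart := map[string][]byte{storedKey: []byte("serialized-kms-metadata")}

	// Old behaviour: the lookup key never matches, so nothing is copied.
	fmt.Println(copyHeaders(map[string][]byte{}, firstPart, oldLookupKey))

	// Fixed behaviour: the stored key is found and copied.
	fmt.Println(copyHeaders(map[string][]byte{}, firstPart, storedKey))
}
```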

* s3: Add debug logging for versioning state diagnosis

Temporary debug logging to diagnose test_versioning_obj_plain_null_version_overwrite_suspended failure.

Added glog.V(0) logging to show:
1. setBucketVersioningStatus: when versioning status is changed
2. PutObjectHandler: what versioning state is detected (Enabled/Suspended/none)
3. PutObjectHandler: which code path is taken (putVersionedObject vs putSuspendedVersioningObject)

This will help identify if:
- The versioning status is being set correctly in bucket config
- The cache is returning stale/incorrect versioning state
- The switch statement is correctly routing to suspended vs enabled handlers

* s3: Enhanced versioning state tracing for suspended versioning diagnosis

Added comprehensive logging across the entire versioning state flow:

PutBucketVersioningHandler:
- Log requested status (Enabled/Suspended)
- Log when calling setBucketVersioningStatus
- Log success/failure of status change

setBucketVersioningStatus:
- Log bucket and status being set
- Log when config is updated
- Log completion with error code

updateBucketConfig:
- Log versioning state being written to cache
- Immediate cache verification after Set
- Log if cache verification fails

getVersioningState:
- Log bucket name and state being returned
- Log if object lock forces VersioningEnabled
- Log errors

This will reveal:
1. If PutBucketVersioning(Suspended) is reaching the handler
2. If the cache update succeeds
3. What state getVersioningState returns during PUT
4. Any cache consistency issues

Expected to show why bucket still reports 'Enabled' after 'Suspended' call.

* s3: Add SSE chunk detection debugging for multipart uploads

Added comprehensive logging to diagnose why TestSSEMultipartUploadIntegration fails:

detectPrimarySSEType now logs:
1. Total chunk count and extended header count
2. All extended headers with 'sse'/'SSE'/'encryption' in the name
3. For each chunk: index, SseType, and whether it has metadata
4. Final SSE type counts (SSE-C, SSE-KMS, SSE-S3)

This will reveal if:
- Chunks are missing SSE metadata after multipart completion
- Extended headers are copied correctly from first part
- The SSE detection logic is working correctly

Expected to show if chunks have SseType=0 (none) or proper SSE types set.

* s3: Trace SSE chunk metadata through multipart completion and retrieval

Added end-to-end logging to track SSE chunk metadata lifecycle:

**During Multipart Completion (filer_multipart.go)**:
1. Log finalParts chunks BEFORE mkFile - shows SseType and metadata
2. Log versionEntry.Chunks INSIDE mkFile callback - shows if mkFile preserves SSE info
3. Log success after mkFile completes

**During GET Retrieval (s3api_object_handlers.go)**:
1. Log retrieved entry chunks - shows SseType and metadata after retrieval
2. Log detected SSE type result

This will reveal at which point SSE chunk metadata is lost:
- If finalParts have SSE metadata but versionEntry.Chunks don't → mkFile bug
- If versionEntry.Chunks have SSE metadata but retrieved chunks don't → storage/retrieval bug
- If chunks never have SSE metadata → multipart completion SSE processing bug

Expected to show chunks with SseType=NONE during retrieval even though
they were created with proper SseType during multipart completion.

* s3: Fix SSE-C multipart IV base64 decoding bug

**Critical Bug Found**: SSE-C multipart uploads were failing because:

Root Cause:
- entry.Extended[SeaweedFSSSEIV] stores base64-encoded IV (24 bytes for 16-byte IV)
- SerializeSSECMetadata expects raw IV bytes (16 bytes)
- During multipart completion, we were passing base64 IV directly → serialization error

Error Message:
"Failed to serialize SSE-C metadata for chunk in part X: invalid IV length: expected 16 bytes, got 24"

Fix:
- Base64-decode IV before passing to SerializeSSECMetadata
- Added error handling for decode failures

Impact:
- SSE-C multipart uploads will now correctly serialize chunk metadata
- Chunks will have proper SSE metadata for decryption during GET

This fixes the SSE-C subtest of TestSSEMultipartUploadIntegration.
SSE-KMS still has a separate issue (error code 23) being investigated.
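
A minimal sketch of the decode-then-validate step, assuming standard base64 encoding. decodeStoredIV and encodeIV are hypothetical helpers, not the functions changed in this commit.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

const ivLength = 16 // AES block size in bytes

// encodeIV mimics how the IV is kept in extended metadata (assumed: std base64).
func encodeIV(raw []byte) string {
	return base64.StdEncoding.EncodeToString(raw)
}

// decodeStoredIV turns the stored string back into raw IV bytes and
// rejects anything that is not exactly one AES block long.
func decodeStoredIV(stored string) ([]byte, error) {
	iv, err := base64.StdEncoding.DecodeString(stored)
	if err != nil {
		return nil, fmt.Errorf("decode IV: %w", err)
	}
	if len(iv) != ivLength {
		return nil, fmt.Errorf("invalid IV length: expected %d bytes, got %d", ivLength, len(iv))
	}
	return iv, nil
}

func main() {
	stored := encodeIV(make([]byte, ivLength))
	fmt.Println(len(stored)) // base64 text is longer than the raw IV

	iv, err := decodeStoredIV(stored)
	fmt.Println(len(iv), err)
}
```

Passing the 24-character string straight to a function that expects 16 raw bytes is exactly the length mismatch reported in the error message above.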

* fixes

* kms sse

* handle retry: if not found in the .versions folder, read the normal object

* quick check (no retries) to see if the .versions/ directory exists

* skip retry if object is not found

* explicit update to avoid sync delay

* fix map update lock

* Remove fmt.Printf debug statements

* Fix SSE-KMS multipart base IV fallback to fail instead of regenerating

* fmt

* Fix ACL grants storage logic

* header handling

* nil handling

* range read for sse content

* test range requests for sse objects

* fmt

* unused code

* upload in chunks

* header case

* fix url

* bucket policy error vs bucket not found

* jwt handling

* fmt

* jwt in request header

* Optimize Case-Insensitive Prefix Check

* dead code

* Eliminated Unnecessary Stream Prefetch for Multipart SSE

* range sse

* sse

* refactor

* context

* fmt

* fix type

* fix SSE-C IV Mismatch

* Fix Headers Being Set After WriteHeader

* fix url parsing

* propagate sse headers

* multipart sse-s3

* aws sig v4 authentication

* sse kms

* set content range

* better errors

* Update s3api_object_handlers_copy.go

* Update s3api_object_handlers.go

* Update s3api_object_handlers.go

* avoid magic number

* clean up

* Update s3api_bucket_policy_handlers.go

* fix url parsing

* context

* data and metadata both use background context

* adjust the offset

* SSE Range Request IV Calculation

* adjust logs

* IV relative to offset in each part, not the whole file

* collect logs

* offset

* fix offset

* fix url

* logs

* variable

* jwt

* Multipart ETag semantics: conditionally set object-level Md5 for single-chunk uploads only.

* sse

* adjust IV and offset

* multipart boundaries

* ensures PUT and GET operations return consistent ETags

* Metadata Header Case

* CommonPrefixes Sorting with URL Encoding

* always sort

* remove the extra PathUnescape call

* fix the multipart get part ETag

* the FileChunk is created without setting ModifiedTsNs

* Sort CommonPrefixes lexicographically to match AWS S3 behavior

* set md5 for multipart uploads

* prevents any potential data loss or corruption in the small-file inline storage path

* compiles correctly

* decryptedReader will now be properly closed after use

* Fixed URL encoding and sort order for CommonPrefixes

* Update s3api_object_handlers_list.go

* SSE-x Chunk View Decryption

* Different IV offset calculations for single-part vs multipart objects

* still too verbose in logs

* less logs

* ensure correct conversion

* fix listing

* nil check

* minor fixes

* nil check

* single character delimiter

* optimize

* range on empty object or zero-length

* correct IV based on its position within that part, not its position in the entire object

* adjust offset

* offset

Fetch FULL encrypted chunk (not just the range)
Adjust IV by PartOffset/ChunkOffset only
Decrypt full chunk
Skip in the DECRYPTED stream to reach OffsetInChunk
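
The four steps above can be sketched as follows, assuming AES in CTR mode with a big-endian counter. ivAtOffset, decryptChunkRange and rangeDemoOK are hypothetical names for illustration and are not the functions in s3api.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"errors"
	"fmt"
)

// ivAtOffset advances a big-endian CTR counter block by the number of whole
// AES blocks before offset and reports the leftover bytes inside that block.
func ivAtOffset(baseIV []byte, offset int64) (iv []byte, skip int) {
	iv = append([]byte(nil), baseIV...)
	blocks := uint64(offset / aes.BlockSize)
	for i := len(iv) - 1; i >= 0 && blocks > 0; i-- {
		sum := uint64(iv[i]) + (blocks & 0xff)
		iv[i] = byte(sum)
		blocks = (blocks >> 8) + (sum >> 8) // carry into the next byte
	}
	return iv, int(offset % aes.BlockSize)
}

// decryptChunkRange decrypts a full encrypted chunk that starts at
// chunkOffset within its part, then skips offsetInChunk decrypted bytes.
func decryptChunkRange(key, baseIV, chunk []byte, chunkOffset, offsetInChunk int64) ([]byte, error) {
	if chunkOffset < 0 || offsetInChunk < 0 || offsetInChunk > int64(len(chunk)) {
		return nil, errors.New("offset out of range")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	iv, skip := ivAtOffset(baseIV, chunkOffset)
	stream := cipher.NewCTR(block, iv)
	if skip > 0 {
		// Chunk does not start on a block boundary: burn the keystream
		// bytes that belong to the earlier part of the block.
		discard := make([]byte, skip)
		stream.XORKeyStream(discard, discard)
	}
	plain := make([]byte, len(chunk))
	stream.XORKeyStream(plain, chunk)
	return plain[offsetInChunk:], nil
}

// rangeDemoOK encrypts 100 bytes as one part, then reads bytes 45..89 by
// decrypting only the chunk stored at part offsets 40..89.
func rangeDemoOK() bool {
	key := bytes.Repeat([]byte{0x42}, 32)
	baseIV := []byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 0xff, 0xff}
	plain := make([]byte, 100)
	for i := range plain {
		plain[i] = byte(i)
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return false
	}
	cipherText := make([]byte, len(plain))
	cipher.NewCTR(block, baseIV).XORKeyStream(cipherText, plain)

	got, err := decryptChunkRange(key, baseIV, cipherText[40:90], 40, 5)
	return err == nil && bytes.Equal(got, plain[45:90])
}

func main() {
	fmt.Println(rangeDemoOK())
}
```

The IV is derived only from where the chunk sits inside its part. The requested range is applied afterwards, on the decrypted bytes.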

* loop breaking

* refactor

* error on no content

* handle intra-block byte skipping

* Incomplete HTTP Response Error Handling

* multipart SSE

* Update s3api_object_handlers.go

* address comments

* less logs

* handling directory

* Optimized rejectDirectoryObjectWithoutSlash() to avoid unnecessary lookups

* Revert "handling directory"

This reverts commit 3a335f0ac33c63f51975abc63c40e5328857a74b.

* constant

* Consolidate nil entry checks in GetObjectHandler

* add range tests

* Consolidate redundant nil entry checks in HeadObjectHandler

* adjust logs

* SSE type

* large files

* large files

Reverted the plain-object range test

* ErrNoEncryptionConfig

* Fixed SSERangeReader Infinite Loop Vulnerability

* Fixed SSE-KMS Multipart ChunkReader HTTP Body Leak

* handle empty directory in S3, added PyArrow tests

* purge unused code

* Update s3_parquet_test.py

* Update requirements.txt

* According to S3 specifications, when both partNumber and Range are present, the Range should apply within the selected part's boundaries, not to the full object.

* handle errors

* errors after writing header

* https

* fix: Wait for volume assignment readiness before running Parquet tests

The test-implicit-dir-with-server test was failing with an Internal Error
because volume assignment was not ready when tests started. This fix adds
a check that attempts a volume assignment and waits for it to succeed
before proceeding with tests.

This ensures that:
1. Volume servers are registered with the master
2. Volume growth is triggered if needed
3. The system can successfully assign volumes for writes

Fixes the timeout issue where boto3 would retry 4 times and fail with
'We encountered an internal error, please try again.'
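
The readiness check follows a plain poll-until-success pattern. The sketch below shows that pattern in Go for illustration only. The commit does not state how the check is implemented, and waitUntilReady, demoAttempts and the simulated check are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitUntilReady calls check up to attempts times, sleeping delay between
// failures, and returns nil as soon as one call succeeds.
func waitUntilReady(check func() error, attempts int, delay time.Duration) error {
	if attempts < 1 {
		return errors.New("attempts must be at least 1")
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = check(); lastErr == nil {
			return nil
		}
		time.Sleep(delay)
	}
	return fmt.Errorf("not ready after %d attempts: %w", attempts, lastErr)
}

// demoAttempts simulates a volume assignment that succeeds on the third try.
func demoAttempts() (int, error) {
	calls := 0
	check := func() error {
		calls++
		if calls < 3 {
			return errors.New("no writable volume yet")
		}
		return nil
	}
	err := waitUntilReady(check, 5, time.Millisecond)
	return calls, err
}

func main() {
	calls, err := demoAttempts()
	fmt.Println(calls, err)
}
```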

* sse tests

* store derived IV

* fix: Clean up gRPC ports between tests to prevent port conflicts

The second test (test-implicit-dir-with-server) was failing because the
volume server's gRPC port (18080 = VOLUME_PORT + 10000) was still in use
from the first test. The cleanup code only killed HTTP port processes,
not gRPC port processes.

Added cleanup for gRPC ports in all stop targets:
- Master gRPC: MASTER_PORT + 10000 (19333)
- Volume gRPC: VOLUME_PORT + 10000 (18080)
- Filer gRPC: FILER_PORT + 10000 (18888)

This ensures clean state between test runs in CI.

* add import

* address comments

* docs: Add placeholder documentation files for Parquet test suite

Added three missing documentation files referenced in test/s3/parquet/README.md:

1. TEST_COVERAGE.md - Documents 43 total test cases (17 Go unit tests,
   6 Python integration tests, 20 Python end-to-end tests)

2. FINAL_ROOT_CAUSE_ANALYSIS.md - Explains the s3fs compatibility issue
   with PyArrow, the implicit directory problem, and how the fix works

3. MINIO_DIRECTORY_HANDLING.md - Compares MinIO's directory handling
   approach with SeaweedFS's implementation

Each file contains:
- Title and overview
- Key technical details relevant to the topic
- TODO sections for future expansion

These placeholder files resolve the broken README links and provide
structure for future detailed documentation.

* clean up if metadata operation failed

* Update s3_parquet_test.py

* clean up

* Update Makefile

* Update s3_parquet_test.py

* Update Makefile

* Handle ivSkip for non-block-aligned offsets

* Update README.md

* stop volume server faster

* stop volume server in 1 second

* different IV for each chunk in SSE-S3 and SSE-KMS

* clean up if fails

* testing upload

* error propagation

* fmt

* simplify

* fix copying

* less logs

* endian

* Added marshaling error handling

* handling invalid ranges

* error handling for adding to log buffer

* fix logging

* avoid returning too quickly and ensure proper cleaning up

* Activity Tracking for Disk Reads

* Cleanup Unused Parameters

* Activity Tracking for Kafka Publishers

* Proper Test Error Reporting

* refactoring

* less logs

* less logs

* go fmt

* guard it with if entry.Attributes.TtlSec > 0 to match the pattern used elsewhere.

* Handle bucket-default encryption config errors explicitly for multipart

* consistent activity tracking

* obsolete code for s3 on filer read/write handlers

* Update weed/s3api/s3api_object_handlers_list.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 23:18:35 -08:00
Konstantin Lebedev
084b377f87 do delete expired entries on s3 list request (#7426)
* do delete expired entries on s3 list request
https://github.com/seaweedfs/seaweedfs/issues/6837

* disable deleting expired s3 entries in filer

* pass opt allowDeleteObjectsByTTL to all servers

* delete on get and head

* add lifecycle expiration s3 tests

* fix opt allowDeleteObjectsByTTL for server

* fix test lifecycle expiration

* fix IsExpired

* fix locationPrefix for updateEntriesTTL

* fix s3tests

* resolve coderabbitai

* GetS3ExpireTime on filer

* go mod

* clear TtlSeconds for volume

* move s3 delete expired entry to filer

* filer delete meta and data

* delete unused func removeExpiredObject

* test s3 put

* test s3 put multipart

* allowDeleteObjectsByTTL by default

* fix pipeline tests

* rm duplicate SeaweedFSExpiresS3

* revert expiration tests

* fix updateTTL

* rm log

* resolve comment

* fix delete version object

* fix S3Versioning

* fix delete on FindEntry

* fix delete chunks

* fix sqlite not support concurrent writes/reads

* move deletion out of listing transaction; delete entries and empty folders

* Revert "fix sqlite not support concurrent writes/reads"

This reverts commit 5d5da14e0ed91c613fe5c0ed058f58bb04fba6f0.

* clearer handling on recursive empty directory deletion

* handle listing errors

* struct copying

* reuse code to delete empty folders

* use iterative approach with a queue to avoid recursive WithFilerClient calls

* to stop a gRPC stream from the client-side callback, return a specific error, e.g., io.EOF

* still issue UpdateEntry when the flag must be added

* errors join

* join path

* cleaner

* add context, sort directories by depth (deepest first) to avoid redundant checks

* batched operation, refactoring

* prevent deleting bucket

* constant

* reuse code

* more logging

* refactoring

* s3 TTL time

* Safety check

---------

Co-authored-by: chrislu <chris.lu@gmail.com>
2025-11-05 22:05:54 -08:00
Chris Lu
69553e5ba6 convert error formatting to %w everywhere (#6995) 2025-07-16 23:39:27 -07:00
Aleksey Kosov
4511c2cc1f Changes logging function (#6919)
* updated logging methods for stores

* updated logging methods for stores

* updated logging methods for filer

* updated logging methods for uploader and http_util

* updated logging methods for weed server

---------

Co-authored-by: akosov <a.kosov@kryptonite.ru>
2025-06-24 08:44:06 -07:00
Aleksey Kosov
165af32d6b added context to filer_client method calls (#6808)
Co-authored-by: akosov <a.kosov@kryptonite.ru>
2025-05-22 09:46:49 -07:00
Konstantin Lebedev
d75a7b7f62 allow deleting only older empty dir without recursion (#4430) 2023-04-25 08:31:14 -07:00
chrislu
70a4c98b00 refactor filer_pb.Entry and filer.Entry to use GetChunks()
for later locking on reading chunks
2022-11-15 06:33:36 -08:00
chrislu
eaeb141b09 move proto package 2022-08-17 12:05:07 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
Konstantin Lebedev
c07820178f fix s3 tests
bucket_list_delimiter_prefix
bucket_list_delimiter_prefix_underscore
bucket_list_delimiter_prefix_ends_with_delimiter
2022-06-07 14:43:10 +05:00
chrislu
202a29d014 refactoring 2022-02-25 01:17:26 -08:00
chrislu
91d6785cf3 define metadata action types 2022-02-25 00:54:16 -08:00
chrislu
fec8428fd8 POSIX: different inode for same named different file types 2022-01-12 11:51:13 -08:00
Chris Lu
05a648bb96 refactor: separating out remote.proto 2021-08-26 15:18:34 -07:00
Chris Lu
69655ba8e5 mount: cache on reading remote storage 2021-08-09 22:11:57 -07:00
Chris Lu
9df7d16791 read <- remote_storage 2021-07-31 22:39:38 -07:00
Chris Lu
9acda432fe fix import cycle 2020-12-06 20:12:52 -08:00
Chris Lu
ae5eb85a06 refactoring 2020-12-06 20:05:06 -08:00
Chris Lu
ee2fa14dbe filer conf: delete location specific configuration 2020-11-15 20:15:47 -08:00
Chris Lu
0ea5c087ce go fmt 2020-11-15 16:59:28 -08:00
Chris Lu
0a406f652e load filer conf and match by prefix 2020-11-15 00:26:05 -08:00
Chris Lu
68043cfcac add reference implementation to detect create/update/delete/rename events 2020-11-14 21:21:58 -08:00
Chris Lu
5e239afdfc hardlink works now 2020-09-24 03:06:48 -07:00
Chris Lu
387ab6796f filer: cross cluster synchronization 2020-09-09 11:21:23 -07:00
Chris Lu
208849702d logs 2020-08-18 12:52:54 -07:00
Chris Lu
003d48da21 adjust logs 2020-08-15 19:55:28 -07:00
Chris Lu
afb20de14c breaks dependency loop 2020-03-07 17:01:39 -08:00
Chris Lu
8645283a7b fuse mount: avoid lookup nil entry
fix https://github.com/chrislusf/seaweedfs/issues/1221
2020-03-07 16:51:46 -08:00
Chris Lu
892e726eb9 avoid reusing context object
fix https://github.com/chrislusf/seaweedfs/issues/1182
2020-02-25 21:50:12 -08:00
Chris Lu
bd3254b53f adjust logging 2020-02-25 17:24:08 -08:00
Chris Lu
0841bedb15 move filer assign volume grpc error to response 2020-02-25 17:15:09 -08:00
Chris Lu
c48fc8b4de grpc send error via response instead of grpc error 2020-01-25 09:17:19 -08:00
Chris Lu
1babec00e7 check deleted chunks faster 2019-06-22 13:22:22 -07:00
Chris Lu
cd45ab072a fix compilation error 2019-06-22 12:30:08 -07:00
Chris Lu
a111f26fe6 avoid nil
fix https://github.com/chrislusf/seaweedfs/issues/988
2019-06-21 20:56:27 -07:00
Chris Lu
059ef879a8 fix issue 986
2019-06-21 13:06:04 -07:00
Chris Lu
82b0759493 filer: migrating filer store to persisting a shorter structured file id instead of a string 2019-05-17 02:03:23 -07:00