* dlm: replace modulo hashing with consistent hash ring
Introduce HashRing with virtual nodes (CRC32-based consistent hashing)
to replace the modulo-based hashKeyToServer. When a filer node is
removed, only keys that hashed to that node are remapped to the next
server on the ring, leaving all other mappings stable. This is the
foundation for backup replication — the successor on the ring is
always the natural takeover node.
* dlm: add Generation and IsBackup fields to Lock
Lock now carries IsBackup (whether this node holds the lock as a backup
replica) and Generation (a monotonic fencing token that increments on
each fresh acquisition, stays the same on renewal). Add helper methods:
AllLocks, PromoteLock, DemoteLock, InsertBackupLock, RemoveLock, GetLock.
* dlm: add ReplicateLock RPC and generation/is_backup proto fields
Add generation field to LockResponse for fencing tokens.
Add generation and is_backup fields to Lock message.
Add ReplicateLock RPC for primary-to-backup lock replication.
Add ReplicateLockRequest/ReplicateLockResponse messages.
* dlm: add async backup replication to DistributedLockManager
Route lock/unlock via consistent hash ring's GetPrimaryAndBackup().
After a successful lock or unlock on the primary, asynchronously
replicate the operation to the backup server via ReplicateFunc
callback. Single-server deployments skip replication.
* dlm: add ReplicateLock handler and backup-aware topology changes
Add ReplicateLock gRPC handler for primary-to-backup replication.
Revise OnDlmChangeSnapshot to handle three cases on topology change:
- Promote backup locks when this node becomes primary
- Demote primary locks when this node becomes backup
- Transfer locks when this node is neither primary nor backup
Wire up SetupDlmReplication during filer server initialization.
* dlm: expose generation fencing token in lock client
LiveLock now captures the generation from LockResponse and exposes it
via Generation() method. Consumers can use this as a fencing token to
detect stale lock holders.
* dlm: update empty folder cleaner to use consistent hash ring
Replace local modulo-based hashKeyToServer with LockRing.GetPrimary()
which uses the shared consistent hash ring for folder ownership.
* dlm: add unit tests for consistent hash ring
Test basic operations, consistency on server removal (only keys from
removed server move), backup-is-successor property (backup becomes
new primary when primary is removed), and key distribution balance.
* dlm: add integration tests for lock replication failure scenarios
Test cases:
- Primary crash with backup promotion (backup has valid token)
- Backup crash with primary continuing
- Both primary and backup crash (lock lost, re-acquirable)
- Rolling restart across all nodes
- Generation fencing token increments on new acquisition
- Replication failure (primary still works independently)
- Unlock replicates deletion to backup
- Lock survives server addition (topology change)
- Consistent hashing minimal disruption (only removed server's keys move)
* dlm: address PR review findings
1. Causal replication ordering: Add per-lock sequence number (Seq) that
increments on every mutation. Backup rejects incoming mutations with
seq <= current seq, preventing stale async replications from
overwriting newer state. Unlock replication also carries seq and is
rejected if stale.
2. Demote-after-handoff: OnDlmChangeSnapshot now transfers the lock to
the new primary first and only demotes to backup after a successful
TransferLocks RPC. If the transfer fails, the lock stays as primary
on this node.
3. SetSnapshot candidateServers leak: Replace the candidateServers map
entirely instead of appending, so removed servers don't linger.
4. TransferLocks preserves Generation and Seq: InsertLock now accepts
generation and seq parameters. After accepting a transferred lock,
the receiving node re-replicates to its backup.
5. Rolling restart test: Add re-replication step after promotion and
assert survivedCount > 0. Add TestDLM_StaleReplicationRejected.
6. Mixed-version upgrade note: Add comment on HashRing documenting that
all filer nodes must be upgraded together.
* dlm: serve renewals locally during transfer window on node join
When a new node joins and steals hash ranges from surviving nodes,
there's a window between ring update and lock transfer where the
client gets redirected to a node that doesn't have the lock yet.
Fix: if the ring says primary != self but we still hold the lock
locally (non-backup, matching token), serve the renewal/unlock here
rather than redirecting. The lock will be transferred by
OnDlmChangeSnapshot, and subsequent requests will go to the new
primary once the transfer completes.
Add tests:
- TestDLM_NodeDropAndJoin_OwnershipDisruption: measures disruption
when a node drops and a new one joins (14/100 surviving-node locks
disrupted, all handled by transfer logic)
- TestDLM_RenewalDuringTransferWindow: verifies renewal succeeds on
old primary during the transfer window
* dlm: master-managed lock ring with stabilization batching
The master now owns the lock ring membership. Instead of filers
independently reacting to individual ClusterNodeUpdate add/remove
events, the master:
1. Tracks filer membership in LockRingManager
2. Batches rapid changes with a 1-second stabilization timer
(e.g., a node drop + join within 1 second → single ring update)
3. Broadcasts the complete ring snapshot atomically via the new
LockRingUpdate message in KeepConnectedResponse
Filers receive the ring as a complete snapshot and apply it via
SetSnapshot, ensuring all filers converge to the same ring state
without intermediate churn.
This eliminates the double-churn problem where a rapid drop+join
would fire two separate ring mutations, each triggering lock
transfers and disrupting ownership on surviving nodes.
* dlm: track ring version, reject stale updates, remove dead code
SetSnapshot now takes a version parameter from the master. Stale
updates (version < current) are rejected, preventing reordered
messages from overwriting a newer ring state. Version 0 is always
accepted for bootstrap.
Remove AddServer/RemoveServer from LockRing — the ring is now
exclusively managed by the master via SetSnapshot. Remove the
candidateServers map that was only used by those methods.
* dlm: fix SelectLocks data race, advance generation on backup insert
- SelectLocks: change RLock to Lock since the function deletes map
entries, which is a write operation and causes a data race under RLock.
- InsertBackupLock: advance nextGeneration to at least the incoming
generation so that after failover promotion, new lock acquisitions
get a generation strictly greater than any replicated lock.
- Bump replication failure log from V(1) to Warningf for production
visibility.
* dlm: fix SetSnapshot race, test reliability, timer edge cases
- SetSnapshot: hold LockRing lock through both version update and
Ring.SetServers() so they're atomic. Prevents a concurrent caller
from seeing the new version but applying stale servers.
- Transfer window test: search for a key that actually moves primary
when filer4 joins, instead of relying on a fixed key that may not.
- renewLock redirect: pass the existing token to the new primary
instead of empty string, so redirected renewals work correctly.
- scheduleBroadcast: check timer.Stop() return value. If the timer
already fired, the callback picks up latest state.
- FlushPending: only broadcast if timer.Stop() returns true (timer
was still pending). If false, the callback is already running.
- Fix test comment: "idempotent" → "accepted, state-changing".
* dlm: use wall-clock nanoseconds for lock ring version
The lock ring version was an in-memory counter that reset to 0 on
master restart. A filer that had seen version 5 would reject version 1
from the restarted master.
Fix: use time.Now().UnixNano() as the version. This survives master
restarts without persistence — the restarted master produces a
version greater than any pre-restart value.
* dlm: treat expired lock owners as missing
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* dlm: reject stale lock transfers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* dlm: order replication by generation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* dlm: bootstrap lock ring on reconnect
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* filer: add StreamMutateEntry bidi streaming RPC
Add a bidirectional streaming RPC that carries all filer mutation
types (create, update, delete, rename) over a single ordered stream.
This eliminates per-request connection overhead for pipelined
operations and guarantees mutation ordering within a stream.
The server handler delegates each request to the existing unary
handlers (CreateEntry, UpdateEntry, DeleteEntry) and uses a proxy
stream adapter for rename operations to reuse StreamRenameEntry logic.
The is_last field signals completion for multi-response operations
(rename sends multiple events per request; create/update/delete
always send exactly one response with is_last=true).
* mount: add streaming mutation multiplexer (streamMutateMux)
Implement a client-side multiplexer that routes all filer mutation RPCs
(create, update, delete, rename) over a single bidirectional gRPC
stream. Multiple goroutines submit requests through a send channel;
a dedicated sendLoop serializes them on the stream; a recvLoop
dispatches responses to waiting callers via per-request channels.
Key features:
- Lazy stream opening on first use
- Automatic reconnection on stream failure
- Permanent fallback to unary RPCs if filer returns Unimplemented
- Monotonic request_id for response correlation
- Multi-response support for rename operations (is_last signaling)
The mux is initialized on WFS and closed during unmount cleanup.
No call sites use it yet — wiring comes in subsequent commits.
* mount: route CreateEntry and UpdateEntry through streaming mux
Wire all CreateEntry call sites to use wfs.streamCreateEntry() which
routes through the StreamMutateEntry stream when available, falling
back to unary RPCs otherwise. Also wire Link's UpdateEntry calls
through wfs.streamUpdateEntry().
Updated call sites:
- flushMetadataToFiler (file flush after write)
- Mkdir (directory creation)
- Symlink (symbolic link creation)
- createRegularFile non-deferred path (Mknod)
- flushFileMetadata (periodic metadata flush)
- Link (hard link: update source + create link + rollback)
* mount: route UpdateEntry and DeleteEntry through streaming mux
Wire remaining mutation call sites through the streaming mux:
- saveEntry (Setattr/chmod/chown/utimes) → streamUpdateEntry
- Unlink → streamDeleteEntry (replaces RemoveWithResponse)
- Rmdir → streamDeleteEntry (replaces RemoveWithResponse)
All filer mutations except Rename now go through StreamMutateEntry
when the filer supports it, with automatic unary RPC fallback.
* mount: route Rename through streaming mux
Wire Rename to use streamMutate.Rename() when available, with
fallback to the existing StreamRenameEntry unary stream.
The streaming mux sends rename as a StreamRenameEntryRequest oneof
variant. The server processes it through the existing rename logic
and sends multiple StreamRenameEntryResponse events (one per moved
entry), with is_last=true on the final response.
All filer mutations now go through a single ordered stream.
* mount: fix stream mux connection ownership
WithGrpcClient(streamingMode=true) closes the gRPC connection when
the callback returns, destroying the stream. Own the connection
directly via pb.GrpcDial so it stays alive for the stream's lifetime.
Close it explicitly in recvLoop on stream failure and in Close on
shutdown.
* mount: fix rename failure for deferred-create files
Three fixes for rename operations over the streaming mux:
1. lookupEntry: fall back to local metadata store when filer returns
"not found" for entries in uncached directories. Files created with
deferFilerCreate=true exist only in the local leveldb store until
flushed; lookupEntry skipped the local store when the parent
directory had never been readdir'd, causing rename to fail with
ENOENT.
2. Rename: wait for pending async flushes and force synchronous flush
of dirty metadata before sending rename to the filer. Covers the
writebackCache case where close() defers the flush to a background
worker that may not complete before rename fires.
3. StreamMutateEntry: propagate rename errors from server to client.
Add error/errno fields to StreamMutateEntryResponse so the mount
can map filer errors to correct FUSE status codes instead of
silently returning OK. Also fix the existing Rename error handler
which could return fuse.OK on unrecognized errors.
* mount: fix streaming mux error handling, sendLoop lifecycle, and fallback
Address PR review comments:
1. Server: populate top-level Error/Errno on StreamMutateEntryResponse for
create/update/delete errors, not just rename. Previously update errors
were silently dropped and create/delete errors were only in nested
response fields that the client didn't check.
2. Client: check nested error fields in CreateEntry (ErrorCode, Error)
and DeleteEntry (Error) responses, matching CreateEntryWithResponse
behavior.
3. Fix sendLoop lifecycle: give each stream generation a stopSend channel.
recvLoop closes it on error to stop the paired sendLoop. Previously a
reconnect left the old sendLoop draining sendCh, breaking ordering.
4. Transparent fallback: stream helpers and doRename fall back to unary
RPCs on transport errors (ErrStreamTransport), including the first
Unimplemented from ensureStream. Previously the first call failed
instead of degrading.
5. Filer rotation in openStream: try all filer addresses on dial failure,
matching WithFilerClient behavior. Stop early on Unimplemented.
6. Pass metadata-bearing context to StreamMutateEntry RPC call so
sw-client-id header is actually sent.
7. Gate lookupEntry local-cache fallback on open dirty handle or pending
async flush to avoid resurrecting deleted/renamed entries.
8. Remove dead code in flushFileMetadata (err=nil followed by if err!=nil).
9. Use string matching for rename error-to-errno mapping in the mount to
stay portable across Linux/macOS (numeric errno values differ).
* mount: make failAllPending idempotent with delete-before-close
Change failAllPending to collect pending entries into a local slice
(deleting from the sync.Map first) before closing channels. This
prevents double-close panics if called concurrently. Also remove the
unused err parameter.
* mount: add stream generation tracking and teardownStream
Introduce a generation counter on streamMutateMux that increments each
time a new stream is created. Requests carry the generation they were
enqueued for so sendLoop can reject stale requests after reconnect.
Add teardownStream(gen) which is idempotent (only acts when gen matches
current generation and stream is non-nil). Both sendLoop and recvLoop
call it on error, replacing the inline cleanup in recvLoop. sendLoop now
actively triggers teardown on send errors instead of silently exiting.
ensureStream waits for the prior generation's recvDone before creating a
new stream, ensuring all old pending waiters are failed before reconnect.
recvLoop now takes the stream, generation, and recvDone channel as
parameters to avoid accessing shared fields without the lock.
* mount: harden Close to prevent races with teardownStream
Nil out stream, cancel, and grpcConn under the lock so that any
concurrent teardownStream call from recvLoop/sendLoop becomes a no-op.
Call failAllPending before closing sendCh to unblock waiters promptly.
Guard recvDone with a nil check for the case where Close is called
before any stream was ever opened.
* mount: make errCh receive ctx-aware in doUnary and Rename
Replace the blocking <-sendReq.errCh with a select that also observes
ctx.Done(). If sendLoop exits via stopSend without consuming a buffered
request, the caller now returns ctx.Err() instead of blocking forever.
The buffered errCh (capacity 1) ensures late acknowledgements from
sendLoop don't block the sender.
* mount: fix sendLoop/Close race and recvLoop/teardown pending channel race
Three related fixes:
1. Stop closing sendCh in Close(). Closing the shared producer channel
races with callers who passed ensureStream() but haven't sent yet,
causing send-on-closed-channel panics. sendCh is now left open;
ensureStream checks m.closed to reject new callers.
2. Drain buffered sendCh items on shutdown. sendLoop defers drainSendCh()
on exit so buffered requests get an ErrStreamTransport on their errCh
instead of blocking forever. Close() drains again for any stragglers
enqueued between sendLoop's drain and the final shutdown.
3. Move failAllPending from teardownStream into recvLoop's defer.
teardownStream (called from sendLoop on send error) was closing
pending response channels while recvLoop could be between
pending.Load and the channel send — a send-on-closed-channel panic.
recvLoop is now the sole closer of pending channels, eliminating
the race. Close() waits on recvDone (with cancel() to guarantee
Recv unblocks) so pending cleanup always completes.
* filer/mount: add debug logging for hardlink lifecycle
Add V(0) logging at every point where a HardLinkId is created, stored,
read, or deleted to trace orphaned hardlink references. Logging covers:
- gRPC server: CreateEntry/UpdateEntry when request carries HardLinkId
- FilerStoreWrapper: InsertEntry/UpdateEntry when entry has HardLinkId
- handleUpdateToHardLinks: entry path, HardLinkId, counter, chunk count
- setHardLink: KvPut with blob size
- maybeReadHardLink: V(1) on read attempt and successful decode
- DeleteHardLink: counter decrement/deletion events
- Mount Link(): when NewHardLinkId is generated and link is created
This helps diagnose how a git pack .rev file ended up with a
HardLinkId during a clone (no hard links should be involved).
* test: add git clone/pull integration test for FUSE mount
Shell script that exercises git operations on a SeaweedFS mount:
1. Creates a bare repo on the mount
2. Clones locally, makes 3 commits, pushes to mount
3. Clones from mount bare repo into an on-mount working dir
4. Verifies clone integrity (files, content, commit hashes)
5. Pushes 2 more commits with renames and deletes
6. Checks out an older revision on the mount clone
7. Returns to branch and pulls with real changes
8. Verifies file content, renames, deletes after pull
9. Checks git log integrity and clean status
27 assertions covering file existence, content, commit hashes,
file counts, renames, deletes, and git status. Run against any
existing mount: bash test-git-on-mount.sh /path/to/mount
* test: add git clone/pull FUSE integration test to CI suite
Add TestGitOperations to the existing fuse_integration test framework.
The test exercises git's full file operation surface on the mount:
1. Creates a bare repo on the mount (acts as remote)
2. Clones locally, makes 3 commits (files, bulk data, renames), pushes
3. Clones from mount bare repo into an on-mount working dir
4. Verifies clone integrity (content, commit hash, file count)
5. Pushes 2 more commits with new files, renames, and deletes
6. Checks out an older revision on the mount clone
7. Returns to branch and pulls with real fast-forward changes
8. Verifies post-pull state: content, renames, deletes, file counts
9. Checks git log integrity (5 commits) and clean status
Runs automatically in the existing fuse-integration.yml CI workflow.
* mount: fix permission check with uid/gid mapping
The permission checks in createRegularFile() and Access() compared
the caller's local uid/gid against the entry's filer-side uid/gid
without applying the uid/gid mapper. With -map.uid 501:0, a directory
created as uid 0 on the filer would not match the local caller uid
501, causing hasAccess() to fall through to "other" permission bits
and reject write access (0755 → other has r-x, no w).
Fix: map entry uid/gid from filer-space to local-space before the
hasAccess() call so both sides are in the same namespace.
This fixes rsync -a failing with "Permission denied" on mkstempat
when using uid/gid mapping.
* mount: fix Mkdir/Symlink returning filer-side uid/gid to kernel
Mkdir and Symlink used `defer wfs.mapPbIdFromFilerToLocal(entry)` to
restore local uid/gid, but `outputPbEntry` writes the kernel response
before the function returns — so the kernel received filer-side
uid/gid (e.g., 0:0). macFUSE then caches these and rejects subsequent
child operations (mkdir, create) because the caller uid (501) doesn't
match the directory owner (0), and "other" bits (0755 → r-x) lack
write permission.
Fix: replace the defer with an explicit call to mapPbIdFromFilerToLocal
before outputPbEntry, so the kernel gets local uid/gid.
Also add nil guards for UidGidMapper in Access and createRegularFile
to prevent panics in tests that don't configure a mapper.
This fixes rsync -a "Permission denied" on mkpathat for nested
directories when using uid/gid mapping.
* mount: fix Link outputting filer-side uid/gid to kernel, add nil guards
Link had the same defer-before-outputPbEntry bug as Mkdir and Symlink:
the kernel received filer-side uid/gid because the defer hadn't run
yet when outputPbEntry wrote the response.
Also add nil guards for UidGidMapper in Access and createRegularFile
so tests without a mapper don't panic.
Audit of all outputPbEntry/outputFilerEntry call sites:
- Mkdir: fixed in prior commit (explicit map before output)
- Symlink: fixed in prior commit (explicit map before output)
- Link: fixed here (explicit map before output)
- Create (existing file): entry from maybeLoadEntry (already mapped)
- Create (deferred): entry has local uid/gid (never mapped to filer)
- Create (non-deferred): createRegularFile defer runs before return
- Mknod: createRegularFile defer runs before return
- Lookup: entry from lookupEntry (already mapped)
- GetAttr: entry from maybeReadEntry/maybeLoadEntry (already mapped)
- readdir: entry from cache (mapIdFromFilerToLocal) or filer (mapped)
- saveEntry: no kernel output
- flushMetadataToFiler: no kernel output
- flushFileMetadata: no kernel output
* test: fix git test for same-filesystem FUSE clone
When both the bare repo and working clone live on the same FUSE mount,
git's local transport uses hardlinks and cross-repo stat calls that
fail on FUSE. Fix:
- Use --no-local on clone to disable local transport optimizations
- Use reset --hard instead of checkout to stay on branch
- Use fetch + reset --hard origin/<branch> instead of git pull to
avoid local transport stat failures during fetch
* adjust logging
* test: use plain git clone/pull to exercise real FUSE behavior
Remove --no-local and fetch+reset workarounds. The test should use
the same git commands users run (clone, reset --hard, pull) so it
reveals real FUSE issues rather than hiding them.
* test: enable V(1) logging for filer/mount and collect logs on failure
- Run filer and mount with -v=1 so hardlink lifecycle logs (V(0):
create/delete/insert, V(1): read attempts) are captured
- On test failure, automatically dump last 16KB of all process logs
(master, volume, filer, mount) to test output
- Copy process logs to /tmp/seaweedfs-fuse-logs/ for CI artifact upload
- Update CI workflow to upload SeaweedFS process logs alongside test output
* mount: clone entry for filer flush to prevent uid/gid race
flushMetadataToFiler and flushFileMetadata used entry.GetEntry() which
returns the file handle's live proto entry pointer, then mutated it
in-place via mapPbIdFromLocalToFiler. During the gRPC call window, a
concurrent Lookup (which takes entryLock.RLock but NOT fhLockTable)
could observe filer-side uid/gid (e.g., 0:0) on the file handle entry
and return it to the kernel. The kernel caches these attributes, so
subsequent opens by the local user (uid 501) fail with EACCES.
Fix: proto.Clone the entry before mapping uid/gid for the filer request.
The file handle's live entry is never mutated, so concurrent Lookup
always sees local uid/gid.
This fixes the intermittent "Permission denied" on .git/FETCH_HEAD
after the first git pull on a mount with uid/gid mapping.
* mount: add debug logging for stale lock file investigation
Add V(0) logging to trace the HEAD.lock recreation issue:
- Create: log when O_EXCL fails (file already exists) with uid/gid/mode
- completeAsyncFlush: log resolved path, saved path, dirtyMetadata,
isDeleted at entry to trace whether async flush fires after rename
- flushMetadataToFiler: log the dir/name/fullpath being flushed
This will show whether the async flush is recreating the lock file
after git renames HEAD.lock → HEAD.
* mount: prevent async flush from recreating renamed .lock files
When git renames HEAD.lock → HEAD, the async flush from the prior
close() can run AFTER the rename and re-insert HEAD.lock into the
meta cache via its CreateEntryRequest response event. The next git
pull then sees HEAD.lock and fails with "File exists".
Fix: add isRenamed flag on FileHandle, set by Rename before waiting
for the pending async flush. The async flush checks this flag and
skips the metadata flush for renamed files (same pattern as isDeleted
for unlinked files). The data pages still flush normally.
The Rename handler flushes deferred metadata synchronously (Case 1)
before setting isRenamed, ensuring the entry exists on the filer for
the rename to proceed. For already-released handles (Case 2), the
entry was created by a prior flush.
* mount: also mark renamed inodes via entry.Attributes.Inode fallback
When GetInode fails (Forget already removed the inode mapping), the
Rename handler couldn't find the pending async flush to set isRenamed.
The async flush then recreated the .lock file on the filer.
Fix: fall back to oldEntry.Attributes.Inode to find the pending async
flush when the inode-to-path mapping is gone. Also extract
MarkInodeRenamed into a method on FileHandleToInode for clarity.
* mount: skip async metadata flush when saved path no longer maps to inode
The isRenamed flag approach failed for refs/remotes/origin/HEAD.lock
because neither GetInode nor oldEntry.Attributes.Inode could find the
inode (Forget already evicted the mapping, and the entry's stored
inode was 0).
Add a direct check in completeAsyncFlush: before flushing metadata,
verify that the saved path still maps to this inode in the inode-to-path
table. If the path was renamed or removed (inode mismatch or not found),
skip the metadata flush to avoid recreating a stale entry.
This catches all rename cases regardless of whether the Rename handler
could set the isRenamed flag.
* mount: wait for pending async flush in Unlink before filer delete
Unlink was deleting the filer entry first, then marking the draining
async-flush handle as deleted. The async flush worker could race between
these two operations and recreate the just-unlinked entry on the filer.
This caused git's .lock files (e.g. refs/remotes/origin/HEAD.lock) to
persist after git pull, breaking subsequent git operations.
Move the isDeleted marking and add waitForPendingAsyncFlush() before
the filer delete so any in-flight flush completes first. Even if the
worker raced past the isDeleted check, the wait ensures it finishes
before the filer delete cleans up any recreated entry.
* mount: reduce async flush and metadata flush log verbosity
Raise completeAsyncFlush entry log, saved-path-mismatch skip log, and
flushMetadataToFiler entry log from V(0) to V(3)/V(4). These fire for
every file close with writebackCache and are too noisy for normal use.
* filer: reduce hardlink debug log verbosity from V(0) to V(4)
HardLinkId logs in filerstore_wrapper, filerstore_hardlink, and
filer_grpc_server fire on every hardlinked file operation (git pack
files use hardlinks extensively) and produce excessive noise.
* mount/filer: reduce noisy V(0) logs for link, rmdir, and empty folder check
- weedfs_link.go: hardlink creation logs V(0) → V(4)
- weedfs_dir_mkrm.go: non-empty folder rmdir error V(0) → V(1)
- empty_folder_cleaner.go: "not empty" check log V(0) → V(4)
* filer: handle missing hardlink KV as expected, not error
A "kv: not found" on hardlink read is normal when the link blob was
already cleaned up but a stale entry still references it. Log at V(1)
for not-found; keep Error level for actual KV failures.
* test: add waitForDir before git pull in FUSE git operations test
After git reset --hard, the FUSE mount's metadata cache may need a
moment to settle on slow CI. The git pull subprocess (unpack-objects)
could fail to stat the working directory. Poll for up to 5s.
* Update git_operations_test.go
* wait
* test: simplify FUSE test framework to use weed mini
Replace the 4-process setup (master + volume + filer + mount) with
2 processes: "weed mini" (all-in-one) + "weed mount". This simplifies
startup, reduces port allocation, and is faster on CI.
* test: fix mini flag -admin → -admin.ui
* Add IAM gRPC service definition
- Add GetConfiguration/PutConfiguration for config management
- Add CreateUser/GetUser/UpdateUser/DeleteUser/ListUsers for user management
- Add CreateAccessKey/DeleteAccessKey/GetUserByAccessKey for access key management
- Methods mirror existing IAM HTTP API functionality
* Add IAM gRPC handlers on filer server
- Implement IamGrpcServer with CredentialManager integration
- Handle configuration get/put operations
- Handle user CRUD operations
- Handle access key create/delete operations
- All methods delegate to CredentialManager for actual storage
* Wire IAM gRPC service to filer server
- Add CredentialManager field to FilerOption and FilerServer
- Import credential store implementations in filer command
- Initialize CredentialManager from credential.toml if available
- Register IAM gRPC service on filer gRPC server
- Enable credential management via gRPC alongside existing filer services
* Regenerate IAM protobuf with gRPC service methods
* iam_pb: add Policy Management to protobuf definitions
* credential: implement PolicyManager in credential stores
* filer: implement IAM Policy Management RPCs
* shell: add s3.policy command
* test: add integration test for s3.policy
* test: fix compilation errors in policy_test
* pb
* fmt
* test
* weed shell: add -policies flag to s3.configure
This allows linking/unlinking IAM policies to/from identities
directly from the s3.configure command.
* test: verify s3.configure policy linking and fix port allocation
- Added test case for linking policies to users via s3.configure
- Implemented findAvailablePortPair to ensure HTTP and gRPC ports
are both available, avoiding conflicts with randomized port assignments.
- Updated assertion to match jsonpb output (policyNames)
* credential: add StoreTypeGrpc constant
* credential: add IAM gRPC store boilerplate
* credential: implement identity methods in gRPC store
* credential: implement policy methods in gRPC store
* admin: use gRPC credential store for AdminServer
This ensures that all IAM and policy changes made through the Admin UI
are persisted via the Filer's IAM gRPC service instead of direct file manipulation.
* shell: s3.configure use granular IAM gRPC APIs instead of full config patching
* shell: s3.configure use granular IAM gRPC APIs
* shell: replace deprecated ioutil with os in s3.policy
* filer: use gRPC FailedPrecondition for unconfigured credential manager
* test: improve s3.policy integration tests and fix error checks
* ci: add s3 policy shell integration tests to github workflow
* filer: fix LoadCredentialConfiguration error handling
* credential/grpc: propagate unmarshal errors in GetPolicies
* filer/grpc: improve error handling and validation
* shell: use gRPC status codes in s3.configure
* credential: document PutPolicy as create-or-replace
* credential/postgres: reuse CreatePolicy in PutPolicy to deduplicate logic
* shell: add timeout context and strictly enforce flags in s3.policy
* iam: standardize policy content field naming in gRPC and proto
* shell: extract slice helper functions in s3.configure
* filer: map credential store errors to gRPC status codes
* filer: add input validation for UpdateUser and CreateAccessKey
* iam: improve validation in policy and config handlers
* filer: ensure IAM service registration by defaulting credential manager
* credential: add GetStoreName method to manager
* test: verify policy deletion in integration test
* Implement Policy Attachment support for Object Store Users
- Added policy_names field to iam.proto and regenerated protos.
- Updated S3 API and IAM integration to support direct policy evaluation for users.
- Enhanced Admin UI to allow attaching policies to users via modals.
- Renamed 'policies' to 'policy_names' to clarify that it stores identifiers.
- Fixed syntax error in user_management.go.
* Fix policy dropdown not populating
The API returns {policies: [...]} but JavaScript was treating response as direct array.
Updated loadPolicies() to correctly access data.policies property.
* Add null safety checks for policy dropdowns
Added checks to prevent "undefined" errors when:
- Policy select elements don't exist
- Policy dropdowns haven't loaded yet
- User is being edited before policies are loaded
* Fix policy dropdown by using correct JSON field name
JSON response has lowercase 'name' field but JavaScript was accessing 'Name'.
Changed policy.Name to policy.name to match the IAMPolicy JSON structure.
* Fix policy names not being saved on user update
Changed condition from len(req.PolicyNames) > 0 to req.PolicyNames != nil
to ensure policy names are always updated when present in the request,
even if it's an empty array (to allow clearing policies).
* Add debug logging for policy names update flow
Added console.log in frontend and glog in backend to trace
policy_names data through the update process.
* Temporarily disable auto-reload for debugging
Commented out window.location.reload() so console logs are visible
when updating a user.
* Add detailed debug logging and alert for policy selection
Added console.log for each step and an alert to show policy_names value
to help diagnose why it's not being included in the request.
* Regenerate templ files for object_store_users
Ran templ generate to ensure _templ.go files are up to date with
the latest .templ changes including debug logging.
* Remove debug logging and restore normal functionality
Cleaned up temporary debug code (console.log and alert statements)
and re-enabled automatic page reload after user update.
* Add step-by-step alert debugging for policy update
Added 5 alert checkpoints to trace policy data through the update flow:
1. Check if policiesSelect element exists
2. Show selected policy values
3. Show userData.policy_names
4. Show full request body
5. Confirm server response
Temporarily disabled auto-reload to see alerts.
* Add version check alert on page load
Added alert on DOMContentLoaded to verify new JavaScript is being executed
and not cached by the browser.
* Compile templates using make
Ran make to compile all template files and install the weed binary.
* Add button click detection and make handleUpdateUser global
- Added inline alert on button click to verify click is detected
- Made handleUpdateUser a window-level function to ensure it's accessible
- Added alert at start of handleUpdateUser function
* Fix handleUpdateUser scope issue - remove duplicate definition
Removed duplicate function definition that was inside DOMContentLoaded.
Now handleUpdateUser is defined only once in global scope (line 383)
making it accessible when button onclick fires.
* Remove all duplicate handleUpdateUser definitions
Now handleUpdateUser is defined only once at the very top of the script
block (line 352), before DOMContentLoaded, ensuring it's available when
the button onclick fires.
* Add function existence check and error catching
Added alerts to check if handleUpdateUser is defined and wrapped
the function call in try-catch to capture any JavaScript errors.
Also added console.log statements to verify function definition.
* Simplify handleUpdateUser to non-async for testing
Removed async/await and added early return to test if function
can be called at all. This will help identify if async is causing
the issue.
* Add cache-control headers to prevent browser caching
Added no-cache headers to ShowObjectStoreUsers handler to prevent
aggressive browser caching of inline JavaScript in the HTML page.
* Fix syntax error - make handleUpdateUser async
Changed function back to async to fix 'await is only valid in async functions' error.
The cache-control headers are working - browser is now loading new code.
* Update version check to v3 to verify cache busting
Changed version alert to 'v3 - WITH EARLY RETURN' to confirm
the new code with early return statement is being loaded.
* Remove all debug code - clean implementation
Removed all alerts, console.logs, and test code.
Implemented clean policy update functionality with proper error handling.
* Add ETag header for cache-busting and update walkthrough
* Fix policy pre-selection in Edit User modal
- Updated admin.js editUser function to pre-select policies
- Root cause: duplicate editUser in admin.js overwrote inline version
- Added policy pre-selection logic to match inline template
- Verified working in browser: policies now pre-select correctly
* Fix policy persistence in handleUpdateUser
- Added policy_names field to userData payload in handleUpdateUser
- Policies were being lost because handleUpdateUser only sent email and actions
- Now collects selected policies from editPolicies dropdown
- Verified working: policies persist correctly across updates
* Fix XSS vulnerability in access keys display
- Escape HTML in access key display using escapeHtml utility
- Replace inline onclick handlers with data attributes
- Add event delegation for delete access key buttons
- Prevents script injection via malicious access key values
* Fix additional XSS vulnerabilities in user details display
- Escape HTML in actions badges (line 626)
- Escape HTML in policy_names badges (line 636)
- Prevents script injection via malicious action or policy names
* Fix XSS vulnerability in loadPolicies function
- Replace innerHTML string concatenation with DOM API
- Use createElement and textContent for safe policy name insertion
- Prevents script injection via malicious policy names
- Apply same pattern to both create and edit select elements
* Remove debug logging from UpdateObjectStoreUser
- Removed glog.V(0) debug statements
- Clean up temporary debugging code before production
* Remove duplicate handleUpdateUser function
- Removed inline handleUpdateUser that duplicated admin.js logic
- Removed debug console.log statement
- admin.js version is now the single source of truth
- Eliminates maintenance burden of keeping two versions in sync
* Refine user management and address code review feedback
- Preserve PolicyNames in UpdateUserPolicies
- Allow clearing actions in UpdateObjectStoreUser by checking for nil
- Remove version comment from object_store_users.templ
- Refactor loadPolicies for DRYness using cloneNode while keeping DOM API security
* IAM Authorization for Static Access Keys
* verified XSS Fixes in Templates
* fix div
* fix: disallow file name too long when writing a file
* bool LongerName to MaxFilenameLength
---------
Co-authored-by: Konstantin Lebedev <9497591+kmlebedev@users.noreply.github.co>
* fix nomore writables volumes while disk free space is sufficient by time delay
* reset
---------
Co-authored-by: wang wusong <wangwusong@virtaitech.com>