Commit Graph

13355 Commits

Author SHA1 Message Date
Chris Lu
5208c7c727 fix(s3api): improve PutBucketHandler comment for orphaned collection recovery
Clarify the comment and log message for the case where a collection
exists but the bucket directory is missing, explaining the root cause
(partial deletion) more precisely.
2026-03-11 13:49:11 -07:00
Chris Lu
12b360f499 fix(s3api): delete bucket directory before collection to prevent inconsistent state
Reorder DeleteBucketHandler to remove the bucket directory first, then
delete the collection. If collection deletion fails, the bucket is still
effectively deleted and can be recreated. Previously, if directory
deletion succeeded but collection deletion failed, the bucket was left
in an unrecoverable state.
2026-03-11 13:49:03 -07:00
Chris Lu
d1a631123f fix(s3api): allow bucket recreation when orphaned collection exists (#8605)
* fix(s3api): allow bucket recreation when orphaned collection exists (#8601)

When a bucket is deleted, its filer directory is removed but the
underlying collection/volumes may not be fully cleaned up yet. If the
bucket is immediately recreated, PutBucketHandler was returning
ErrBucketAlreadyExists due to the orphaned collection, blocking bucket
recreation and causing subsequent uploads to fail with InternalError.

Allow bucket creation to proceed when a collection exists without a
corresponding bucket directory, since this is a transient orphaned state
from a previous deletion.

* fix(s3api): handle concurrent bucket creation race in mkdir

On mkdir failure, re-check whether the bucket directory now exists
and return BucketAlreadyExists instead of InternalError when another
request created the bucket concurrently.
2026-03-11 13:42:43 -07:00
Chris Lu
b799650357 fix(shell): set LastLocalSyncTsNs in remote.copy.local so remote.uncache works (#8604)
remote.uncache checks LastLocalSyncTsNs to determine if a file has been
synced to remote. remote.copy.local was not setting this field, leaving
it at 0, which caused uncache to skip all files uploaded via
remote.copy.local.

Fixes #8602
2026-03-11 12:55:45 -07:00
Chris Lu
2ff4a07544 Reduce task logger glog noise and remove per-write fsync (#8603)
* Reduce task logger noise: stop duplicating every log entry to glog and stderr

Every task log entry was being tripled: written to the task log file,
forwarded to glog (which writes to /tmp by default with no rotation),
and echoed to stderr. This caused glog files to fill /tmp on long-running
workers.

- Remove INFO/DEBUG forwarding to glog (only ERROR/WARNING remain)
- Remove stderr echo of every log line
- Remove fsync on every single log write (unnecessary for log files)

* Fix glog call depth for correct source file attribution

The call stack is: caller → Error() → log() → writeLogEntry() →
glog.ErrorDepth(), so depth=4 is needed for glog to report the
original caller's file and line number.
2026-03-11 12:42:18 -07:00
Mohamed Sekour
1df6821ec6 Fix topologySpreadConstraints key in sftp-deployment.yaml (#8600) 2026-03-11 03:05:23 -07:00
Chris Lu
4a5243886a 4.17 2026-03-11 02:29:24 -07:00
Chris Lu
e1e4c9437a fix(s3api): ListObjects with trailing-slash prefix matches sibling directories (#8599)
fix(s3api): ListObjects with trailing-slash prefix returns wrong results

When ListObjectsV2 is called with a prefix ending in "/" (e.g., "foo/"),
normalizePrefixMarker strips the trailing slash and splits into
dir="parent" and prefix="foo". The filer then lists entries matching
prefix "foo", which returns both directory "foo" and "foo1000".

The prefixEndsOnDelimiter guard correctly identifies directory "foo" as
the target and recurses into it, but then resets the guard to false.
The loop continues and incorrectly recurses into "foo1000" as well,
causing the listing to return objects from unrelated directories.

Fix: after recursing into the exact directory targeted by the
trailing-slash prefix, return immediately from the listing loop.
There is no reason to process sibling entries since the original
prefix specifically targeted one directory.
2026-03-11 02:28:34 -07:00
Chris Lu
f950a941e3 Fix trust policy validation for specific AWS user principals (#8597)
* Add tests for AWS user principal in AssumeRole trust policies

Add test cases that verify trust policy validation when using specific
AWS user principals (e.g., "arn:aws:iam::000000000000:user/backend")
in the Principal field of trust policies for AssumeRole.

Covers single user, multiple users (array), wildcard, and plain string
principal formats. These tests demonstrate the bug reported in #8588
where specific user principals always fail validation.

* Populate RequestContext in ValidateTrustPolicyForPrincipal

ValidateTrustPolicyForPrincipal was creating an EvaluationContext with
a nil RequestContext. The policy engine's principal matching logic looks
up "aws:PrincipalArn" in RequestContext for non-wildcard principals,
so specific user ARNs like "arn:aws:iam::000000000000:user/backend"
always failed to match, while wildcard "*" worked because it
short-circuits before the lookup.

Populate RequestContext with both "principal" and "aws:PrincipalArn"
keys, consistent with how IsActionAllowed already does it.

Fixes #8588

* Remove GitHub discussion URL from source code comments

* Add specific error message assertions in trust policy tests
2026-03-10 21:19:40 -07:00
Chris Lu
ac579c1746 Fix plugin configuration tab layout overflow (#8596)
Fix plugin configuration tab layout overflow (#8587)

Remove h-100 from Job Scheduling Settings card, which caused it to
stretch to 100% of the row height and push the Next Run card below
the row boundary, overflowing into the Detection Results section.
2026-03-10 19:19:42 -07:00
Chris Lu
0a5c5ed4ce Persist S3 bucket counter metrics across idle periods (#8595)
* Stop deleting counter metrics during bucket TTL cleanup

Counter metrics (traffic bytes, request counts, object counts) are
monotonically increasing by design. Deleting them after 10 minutes of
bucket inactivity causes them to vanish from /metrics output and reset
to zero when traffic resumes, breaking Prometheus rate()/increase()
queries and making historical traffic reporting impossible.

Only delete gauges and histograms in the TTL cleanup loop, as these
represent current state and are safely re-populated on next activity.

Fixes https://github.com/seaweedfs/seaweedfs/issues/8521

* Clean up all bucket metrics on bucket deletion

Add DeleteBucketMetrics() to delete all metrics (including counters)
for a bucket when it is explicitly deleted. This prevents unbounded
label cardinality from accumulating for buckets that no longer exist.

Called from DeleteBucketHandler after successful bucket deletion.

* Reduce mutex scope in bucket metrics TTL sweep

Collect expired bucket names under the lock, then release before
calling DeletePartialMatch on Prometheus metrics. This prevents
RecordBucketActiveTime from blocking during the expensive cleanup.
2026-03-10 19:00:40 -07:00
Chris Lu
0a2dac1e56 Reduce mutex scope in bucket metrics TTL sweep
Collect expired bucket names under the lock, then release before
calling DeletePartialMatch on Prometheus metrics. This prevents
RecordBucketActiveTime from blocking during the expensive cleanup.
2026-03-10 18:43:35 -07:00
Chris Lu
737116e83c fix port probing 2026-03-10 18:30:19 -07:00
Chris Lu
b20eae697e Merge branch 'master' of https://github.com/seaweedfs/seaweedfs 2026-03-10 18:03:39 -07:00
Chris Lu
07f3f5eec5 remove worker links 2026-03-10 18:03:38 -07:00
Chris Lu
47cad59c70 Remove misleading Workers sub-menu items from admin sidebar (#8594)
* Remove misleading Workers sub-menu items from admin sidebar

The sidebar sub-items (Job Detection, Job Queue, Job Execution,
Configuration) always navigated to the first job type's tabs
(typically EC Encoding) rather than showing cross-job-type views.
This was confusing as noted in #8590. Since the in-page tabs already
provide this navigation, remove the redundant sidebar sub-items and
keep only the top-level Workers link.

Fixes #8590

* Update layout_templ.go
2026-03-10 15:55:14 -07:00
Chris Lu
b17e2b411a Add dynamic timeouts to plugin worker vacuum gRPC calls (#8593)
* add dynamic timeouts to plugin worker vacuum gRPC calls

All vacuum gRPC calls used context.Background() with no deadline,
so the plugin scheduler's execution timeout could kill a job while
a large volume compact was still in progress. Use volume-size-scaled
timeouts matching the topology vacuum approach: 3 min/GB for compact,
1 min/GB for check, commit, and cleanup.

Fixes #8591

* scale scheduler execution timeout by volume size

The scheduler's per-job execution timeout (default 240s) would kill
vacuum jobs on large volumes before they finish. Three changes:

1. Vacuum detection now includes estimated_runtime_seconds in job
   proposals, computed as 5 min/GB of volume size.

2. The scheduler checks for estimated_runtime_seconds in job
   parameters and uses it as the execution timeout when larger than
   the default — a generic mechanism any handler can use.

3. Vacuum task gRPC calls now use the passed-in ctx as parent
   instead of context.Background(), so scheduler cancellation
   propagates to in-flight RPCs.

* extend job type runtime when proposals need more time

The JobTypeMaxRuntime (default 30 min) wraps both detection and
execution. Its context is the parent of all per-job execution
contexts, so even with per-job estimated_runtime_seconds, jobCtx
would cancel everything when it expires.

After detection, scan proposals for the maximum
estimated_runtime_seconds. If any proposal needs more time than
the remaining JobTypeMaxRuntime, create a new execution context
with enough headroom. This lets large vacuum jobs complete without
being killed by the job type deadline while still respecting the
configured limit for normal-sized jobs.

* log missing volume size metric, remove dead minimum runtime guard

Add a debug log in vacuumTimeout when t.volumeSize is 0 so
operators can investigate why metrics are missing for a volume.

Remove the unreachable estimatedRuntimeSeconds < 180 check in
buildVacuumProposal — volumeSizeGB always >= 1 (due to +1 floor),
so estimatedRuntimeSeconds is always >= 300.

* cap estimated runtime and fix status check context

- Cap maxEstimatedRuntime and per-job timeout overrides to 8 hours
  to prevent unbounded timeouts from bad metrics.
- Check execCtx.Err() instead of jobCtx.Err() for status reporting,
  since dispatch runs under execCtx which may have a longer deadline.
  A successful dispatch under execCtx was misreported as "timeout"
  when jobCtx had expired.
2026-03-10 13:48:42 -07:00
Chris Lu
4c88fbfd5e Fix nil pointer crash during concurrent vacuum compaction (#8592)
* check for nil needle map before compaction sync

When CommitCompact runs concurrently, it sets v.nm = nil under
dataFileAccessLock. CompactByIndex does not hold that lock, so
v.nm.Sync() can hit a nil pointer. Add an early nil check to
return an error instead of crashing.

Fixes #8591

* guard copyDataBasedOnIndexFile size check against nil needle map

The post-compaction size validation at line 538 accesses
v.nm.ContentSize() and v.nm.DeletedSize(). If CommitCompact has
concurrently set v.nm to nil, this causes a SIGSEGV. Skip the
validation when v.nm is nil since the actual data copy uses local
needle maps (oldNm/newNm) and is unaffected.

Fixes #8591

* use atomic.Bool for compaction flags to prevent concurrent vacuum races

The isCompacting and isCommitCompacting flags were plain bools
read and written from multiple goroutines without synchronization.
This allowed concurrent vacuums on the same volume to pass the
guard checks and run simultaneously, leading to the nil pointer
crash. Using atomic.Bool with CompareAndSwap ensures only one
compaction or commit can run per volume at a time.

Fixes #8591

* use go-version-file in CI workflows instead of hardcoded versions

Use go-version-file: 'go.mod' so CI automatically picks up the Go
version from go.mod, avoiding future version drift. Reordered
checkout before setup-go in go.yml and e2e.yml so go.mod is
available. Removed the now-unused GO_VERSION env vars.

* capture v.nm locally in CompactByIndex to close TOCTOU race

A bare nil check on v.nm followed by v.nm.Sync() has a race window
where CommitCompact can set v.nm = nil between the two. Snapshot
the pointer into a local variable so the nil check and Sync operate
on the same reference.

* add dynamic timeouts to plugin worker vacuum gRPC calls

All vacuum gRPC calls used context.Background() with no deadline,
so the plugin scheduler's execution timeout could kill a job while
a large volume compact was still in progress. Use volume-size-scaled
timeouts matching the topology vacuum approach: 3 min/GB for compact,
1 min/GB for check, commit, and cleanup.

Fixes #8591

* Revert "add dynamic timeouts to plugin worker vacuum gRPC calls"

This reverts commit 80951934c37416bc4f6c1472a5d3f8d204a637d9.

* unify compaction lifecycle into single atomic flag

Replace separate isCompacting and isCommitCompacting flags with a
single isCompactionInProgress atomic.Bool. This ensures CompactBy*,
CommitCompact, Close, and Destroy are mutually exclusive — only one
can run at a time per volume.

Key changes:
- All entry points use CompareAndSwap(false, true) to claim exclusive
  access. CompactByVolumeData and CompactByIndex now also guard v.nm
  and v.DataBackend with local captures.
- Close() waits for the flag outside dataFileAccessLock to avoid
  deadlocking with CommitCompact (which holds the flag while waiting
  for the lock). It claims the flag before acquiring the lock so no
  new compaction can start.
- Destroy() uses CAS instead of a racy Load check, preventing
  concurrent compaction from racing with volume teardown.
- unmountVolumeByCollection no longer deletes from the map;
  DeleteCollectionFromDiskLocation removes entries only after
  successful Destroy, preventing orphaned volumes on failure.

Fixes #8591
2026-03-10 13:31:45 -07:00
Chris Lu
d4d2e511ed for mini, default to bind all 2026-03-10 00:56:40 -07:00
Chris Lu
3d9f7f6f81 go 1.25 2026-03-09 23:10:27 -07:00
Chris Lu
d89a78d9e3 reduce logs 2026-03-09 22:42:03 -07:00
Chris Lu
00000ec006 Update s3_buckets_templ.go 2026-03-09 22:41:07 -07:00
Chris Lu
1bd7a98a4a simplify plugin scheduler: remove configurable IdleSleepSeconds, use constant 61s
The SchedulerConfig struct and its persistence/API were unnecessary
indirection. Replace with a simple constant (reduced from 613s to 61s)
so the scheduler re-checks for detectable job types promptly after
going idle, improving the clean-install experience.
2026-03-09 22:41:03 -07:00
Chris Lu
8ad58e7002 4.16 2026-03-09 21:52:43 -07:00
Chris Lu
f220328ae4 test: assert ReadFileStatusCount in batch execution test
Verify that pre-delete verification called ReadVolumeFileStatus on
both source and target for each volume move.
2026-03-09 19:33:06 -07:00
Chris Lu
cf3693651c fix: add IdxFileSize check to pre-delete volume verification
The verification step checked DatFileSize and FileCount but not
IdxFileSize, leaving a gap in the copy validation before source
deletion.
2026-03-09 19:33:02 -07:00
Chris Lu
5f85bf5e8a Batch volume balance: run multiple moves per job (#8561)
* proto: add BalanceMoveSpec and batch fields to BalanceTaskParams

Add BalanceMoveSpec message for encoding individual volume moves,
and max_concurrent_moves + repeated moves fields to BalanceTaskParams
to support batching multiple volume moves in a single job.

* balance handler: add batch execution with concurrent volume moves

Refactor Execute() into executeSingleMove() (backward compatible) and
executeBatchMoves() which runs multiple volume moves concurrently using
a semaphore-bounded goroutine pool. When BalanceTaskParams.Moves is
populated, the batch path is taken; otherwise the single-move path.

Includes aggregate progress reporting across concurrent moves,
per-move error collection, and partial failure support.

* balance handler: add batch config fields to Descriptor and worker config

Add max_concurrent_moves and batch_size fields to the worker config
form and deriveBalanceWorkerConfig(). These control how many volume
moves run concurrently within a batch job and the maximum batch size.

* balance handler: group detection proposals into batch jobs

When batch_size > 1, the Detect method groups detection results into
batch proposals where each proposal encodes multiple BalanceMoveSpec
entries in BalanceTaskParams.Moves. Single-result batches fall back
to the existing single-move proposal format for backward compatibility.

* admin UI: add volume balance execution plan and batch badge

Add renderBalanceExecutionPlan() for rich rendering of volume balance
jobs in the job detail modal. Single-move jobs show source/target/volume
info; batch jobs show a moves table with all volume moves.

Add batch badge (e.g., "5 moves") next to job type in the execution
jobs table when the job has batch=true label.

* Update plugin_templ.go

* fix: detection algorithm uses greedy target instead of divergent topology scores

The detection loop tracked effective volume counts via an adjustments map,
but createBalanceTask independently called planBalanceDestination which used
the topology's LoadCount — a separate, unadjusted source of truth. This
divergence caused multiple moves to pile onto the same server.

Changes:
- Add resolveBalanceDestination to resolve the detection loop's greedy
  target (minServer) rather than independently picking a destination
- Add oscillation guard: stop when max-min <= 1 since no single move
  can improve the balance beyond that point
- Track unseeded destinations: if a target server wasn't in the initial
  serverVolumeCounts, add it so subsequent iterations include it
- Add TestDetection_UnseededDestinationDoesNotOverload

* fix: handler force_move propagation, partial failure, deterministic dedupe

- Propagate ForceMove from outer BalanceTaskParams to individual move
  TaskParams so batch moves respect the force_move flag
- Fix partial failure: mark job successful if at least one move
  succeeded (succeeded > 0 || failed == 0) to avoid re-running
  already-completed moves on retry
- Use SHA-256 hash for deterministic dedupe key fallback instead of
  time.Now().UnixNano() which is non-deterministic
- Remove unused successDetails variable
- Extract maxProposalStringLength constant to replace magic number 200

* admin UI: use template literals in balance execution plan rendering

* fix: integration test handles batch proposals from batched detection

With batch_size=20, all moves are grouped into a single proposal
containing BalanceParams.Moves instead of top-level Sources/Targets.
Update assertions to handle both batch and single-move proposal formats.

* fix: verify volume size on target before deleting source during balance

Add a pre-delete safety check that reads the volume file status on both
source and target, then compares .dat file size and file count. If they
don't match, the move is aborted — leaving the source intact rather than
risking irreversible data loss.

Also removes the redundant mountVolume call since VolumeCopy already
mounts the volume on the target server.

* fix: clamp maxConcurrent, serialize progress sends, validate config as int64

- Clamp maxConcurrentMoves to defaultMaxConcurrentMoves before creating
  the semaphore so a stale or malicious job cannot request unbounded
  concurrent volume moves
- Extend progressMu to cover sender.SendProgress calls since the
  underlying gRPC stream is not safe for concurrent writes
- Perform bounds checks on max_concurrent_moves and batch_size in int64
  space before casting to int, avoiding potential overflow on 32-bit

* fix: check disk capacity in resolveBalanceDestination

Skip disks where VolumeCount >= MaxVolumeCount so the detection loop
does not propose moves to a full disk that would fail at execution time.

* test: rename unseeded destination test to match actual behavior

The test exercises a server with 0 volumes that IS seeded from topology
(matching disk type), not an unseeded destination. Rename to
TestDetection_ZeroVolumeServerIncludedInBalance and fix comments.

* test: tighten integration test to assert exactly one batch proposal

With default batch_size=20, all moves should be grouped into a single
batch proposal. Assert len(proposals)==1 and require BalanceParams with
Moves, removing the legacy single-move else branch.

* fix: propagate ctx to RPCs and restore source writability on abort

- All helper methods (markVolumeReadonly, copyVolume, tailVolume,
  readVolumeFileStatus, deleteVolume) now accept a context parameter
  instead of using context.Background(), so Execute's ctx propagates
  cancellation and timeouts into every volume server RPC
- Add deferred cleanup that restores the source volume to writable if
  any step after markVolumeReadonly fails, preventing the source from
  being left permanently readonly on abort
- Add markVolumeWritable helper using VolumeMarkWritableRequest

* fix: deep-copy protobuf messages in test recording sender

Use proto.Clone in recordingExecutionSender to store immutable snapshots
of JobProgressUpdate and JobCompleted, preventing assertions from
observing mutations if the handler reuses message pointers.

* fix: add VolumeMarkWritable and ReadVolumeFileStatus to fake volume server

The balance task now calls ReadVolumeFileStatus for pre-delete
verification and VolumeMarkWritable to restore writability on abort.
Add both RPCs to the test fake, and drop the mountCalls assertion since
BalanceTask no longer calls VolumeMount directly (VolumeCopy handles it).

* fix: use maxConcurrentMovesLimit (50) for clamp, not defaultMaxConcurrentMoves

defaultMaxConcurrentMoves (5) is the fallback when the field is unset,
not an upper bound. Clamping to it silently overrides valid config
values like 10/20/50. Introduce maxConcurrentMovesLimit (50) matching
the descriptor's MaxValue and clamp to that instead.

* fix: cancel batch moves on progress stream failure

Derive a cancellable batchCtx from the caller's ctx. If
sender.SendProgress returns an error (client disconnect, context
cancelled), capture it, skip further sends, and cancel batchCtx so
in-flight moves abort via their propagated context rather than running
blind to completion.

* fix: bound cleanup timeout and validate batch move fields

- Use a 30-second timeout for the deferred markVolumeWritable cleanup
  instead of context.Background() which can block indefinitely if the
  volume server is unreachable
- Validate required fields (VolumeID, SourceNode, TargetNode) before
  appending moves to a batch proposal, skipping invalid entries
- Fall back to a single-move proposal when filtering leaves only one
  valid move in a batch

* fix: cancel task execution on SendProgress stream failure

All handler progress callbacks previously ignored SendProgress errors,
allowing tasks to continue executing after the client disconnected.
Now each handler creates a derived cancellable context and cancels it
on the first SendProgress error, stopping the in-flight task promptly.

Handlers fixed: erasure_coding, vacuum, volume_balance (single-move),
and admin_script (breaks command loop on send failure).

* fix: validate batch moves before scheduling in executeBatchMoves

Reject empty batches, enforce a hard upper bound (100 moves), and
filter out nil or incomplete move specs (missing source/target/volume)
before allocating progress tracking and launching goroutines.

* test: add batch balance execution integration test

Tests the batch move path with 3 volumes, max concurrency 2, using
fake volume servers. Verifies all moves complete with correct readonly,
copy, tail, and delete RPC counts.

* test: add MarkWritableCount and ReadFileStatusCount accessors

Expose the markWritableCalls and readFileStatusCalls counters on the
fake volume server, following the existing MarkReadonlyCount pattern.

* fix: oscillation guard uses global effective counts for heterogeneous capacity

The oscillation guard (max-min <= 1) previously used maxServer/minServer
which are determined by utilization ratio. With heterogeneous capacity,
maxServer by utilization can have fewer raw volumes than minServer,
producing a negative diff and incorrectly triggering the guard.

Now scans all servers' effective counts to find the true global max/min
volume counts, so the guard works correctly regardless of whether
utilization-based or raw-count balancing is used.

* fix: admin script handler breaks outer loop on SendProgress failure

The break on SendProgress error inside the shell.Commands scan only
exited the inner loop, letting the outer command loop continue
executing commands on a broken stream. Use a sendBroken flag to
propagate the break to the outer execCommands loop.
2026-03-09 19:30:08 -07:00
Chris Lu
b991acf634 fix: paginate bucket listing in Admin UI to show all buckets (#8585)
* fix: paginate bucket listing in Admin UI to show all buckets

The Admin UI's GetS3Buckets() had a hardcoded Limit of 1000 in the
ListEntries request, causing the Total Buckets count to cap at 1000
even when more buckets exist. This adds pagination to iterate through
all buckets by continuing from the last entry name when a full page
is returned.

Fixes seaweedfs/seaweedfs#8564

* feat: add server-side pagination and sorting to S3 buckets page

Add pagination controls, page size selector, and sortable column
headers to the Admin UI's Object Store buckets page, following the
same pattern used by the Cluster Volumes page. This ensures the UI
remains responsive with thousands of buckets.

- Add CurrentPage, TotalPages, PageSize, SortBy, SortOrder to S3BucketsData
- Accept page/pageSize/sortBy/sortOrder query params in ShowS3Buckets handler
- Sort buckets by name, owner, created, objects, logical/physical size
- Paginate results server-side (default 100 per page)
- Add pagination nav, page size dropdown, and sort indicators to template

* Update s3_buckets_templ.go

* Update object_store_users_templ.go

* fix: use errors.Is(err, io.EOF) instead of string comparison

Replace brittle err.Error() == "EOF" string comparison with idiomatic
errors.Is(err, io.EOF) for checking stream end in bucket listing.

* fix: address PR review findings for bucket pagination

- Clamp page to totalPages when page exceeds total, preventing empty
  results with misleading pagination state
- Fix sort comparator to use explicit ascending/descending comparisons
  with a name tie-breaker, satisfying strict weak ordering for sort.Slice
- Capture SnapshotTsNs from first ListEntries response and pass it to
  subsequent requests for consistent pagination across pages
- Replace non-focusable <th onclick> sort headers with <a> tags and
  reuse getSortIcon, matching the cluster_volumes accessibility pattern
- Change exportBucketList() to fetch all buckets from /api/s3/buckets
  instead of scraping DOM rows (which now only contain the current page)
2026-03-09 18:55:47 -07:00
Chris Lu
02d3e3195c Update object_store_users_templ.go 2026-03-09 18:34:58 -07:00
Chris Lu
470075dd90 admin/balance: fix Max Volumes display and balancer source selection (#8583)
* admin: fix Max Volumes column always showing 0

GetClusterVolumeServers() computed DiskCapacity from
diskInfo.MaxVolumeCount but never populated the MaxVolumes field
on the VolumeServer struct, causing the column to always display 0.

* balance: use utilization ratio for source server selection

The balancer selected the source server (to move volumes FROM) by raw
volume count. In clusters with heterogeneous MaxVolumeCount settings,
the server with the highest capacity naturally holds the most volumes
and was always picked as the source, even when it had the lowest
utilization ratio.

Change source selection and imbalance calculation to use utilization
ratio (effectiveCount / maxVolumeCount) so servers are compared by how
full they are relative to their capacity, not by absolute volume count.

This matches how destination scoring already works via
calculateBalanceScore().
2026-03-09 18:34:11 -07:00
Lars Lehtonen
f8b7357350 weed/server: fix dropped error (#8584)
* weed/server: fix dropped error

* Removed the redundant check.

---------

Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-03-09 18:04:12 -07:00
dependabot[bot]
e1c4faba38 build(deps): bump org.apache.zookeeper:zookeeper from 3.9.4 to 3.9.5 in /test/java/spark (#8580)
* build(deps): bump org.apache.zookeeper:zookeeper in /test/java/spark

Bumps org.apache.zookeeper:zookeeper from 3.9.4 to 3.9.5.

---
updated-dependencies:
- dependency-name: org.apache.zookeeper:zookeeper
  dependency-version: 3.9.5
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix: use go-version-file instead of hardcoded Go version in CI workflows

The hardcoded go-version '1.24' is too old for go.mod which requires
go >= 1.25.0, causing build failures in Spark integration tests.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-03-09 17:29:53 -07:00
Chris Lu
6c7fe87a72 helm: add s3.tlsSecret for custom S3 HTTPS certificate (#8582)
* helm: add s3.tlsSecret to allow custom TLS certificate for S3 HTTPS endpoint

Allow users to specify an external Kubernetes TLS secret for the S3
HTTPS endpoint instead of using the internal self-signed client
certificate. This enables using publicly trusted certificates (e.g.
from Let's Encrypt) so S3 clients don't need to trust the internal CA.

The new s3.tlsSecret value is supported in the standalone S3 gateway,
filer with embedded S3, and all-in-one deployment templates.

Closes #8581

* refactor: extract S3 TLS helpers to reduce duplication

Move repeated S3 TLS cert/key logic into shared helper templates
(seaweedfs.s3.tlsArgs, seaweedfs.s3.tlsVolumeMount, seaweedfs.s3.tlsVolume)
in _helpers.tpl, and use them across all three deployment templates.

* helm: add allInOne.s3.trafficDistribution support

Add the missing allInOne.s3.trafficDistribution branch to the
seaweedfs.trafficDistribution helper and wire it into the all-in-one
service template, mirroring the existing s3-service.yaml behavior.
PreferClose is auto-converted to PreferSameZone on k8s >=1.35.

* fix: scope S3 TLS mounts to S3-enabled pods and simplify trafficDistribution helper

- Wrap S3 TLS volume/volumeMount includes in allInOne.s3.enabled and
  filer.s3.enabled guards so the custom TLS secret is only mounted
  when S3 is actually enabled in that deployment mode.
- Refactor seaweedfs.trafficDistribution helper to accept an explicit
  value+Capabilities dict instead of walking multiple .Values paths,
  making each call site responsible for passing its own setting.
2026-03-09 14:24:42 -07:00
Chris Lu
b3d32fe73b fix go version 2026-03-09 14:18:46 -07:00
dependabot[bot]
f439c84d01 build(deps): bump github.com/aws/aws-sdk-go-v2 from 1.41.1 to 1.41.3 (#8576)
Bumps [github.com/aws/aws-sdk-go-v2](https://github.com/aws/aws-sdk-go-v2) from 1.41.1 to 1.41.3.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/v1.41.1...v1.41.3)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2
  dependency-version: 1.41.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
2026-03-09 14:14:29 -07:00
Chris Lu
89f1096c0e Update ec-integration.yml 2026-03-09 14:05:39 -07:00
Chris Lu
6dab90472b admin: fix access key creation UX (#8579)
* admin: remove misleading "secret key only shown once" warning

The access key details modal already allows viewing both the access key
and secret key at any time, so the warning about the secret key only
being displayed once is incorrect and misleading.

* admin: allow specifying custom access key and secret key

Add optional access_key and secret_key fields to the create access key
API. When provided, the specified keys are used instead of generating
random ones. The UI now shows a form with optional fields when creating
a new key, with a note that leaving them blank auto-generates keys.

* admin: check access key uniqueness before creating

Access keys must be globally unique across all users since S3 auth
looks them up in a single global map. Add an explicit check using
GetUserByAccessKey before creating, so the user gets a clear error
("access key is already in use") rather than a generic store error.

* Update object_store_users_templ.go

* admin: address review feedback for access key creation

Handler:
- Use decodeJSONBody/newJSONMaxReader instead of raw json.Decode to
  enforce request size limits and handle malformed JSON properly
- Return 409 Conflict for duplicate access keys, 400 Bad Request for
  validation errors, instead of generic 500

Backend:
- Validate access key length (4-128 chars) and secret key length
  (8-128 chars) when user-provided

Frontend:
- Extract resetCreateKeyForm() helper to avoid duplicated cleanup logic
- Wire resetCreateKeyForm to accessKeysModal hidden.bs.modal event so
  form state is always cleared when modal is dismissed
- Change secret key input to type="password" with a visibility toggle

* admin: guard against nil request and handle GetUserByAccessKey errors

- Add nil check for the CreateAccessKeyRequest pointer before
  dereferencing, defaulting to an empty request (auto-generate both
  keys).
- Handle non-"not found" errors from GetUserByAccessKey explicitly
  instead of silently proceeding, so store errors (e.g. db connection
  failures) surface rather than being swallowed.

* Update object_store_users_templ.go

* admin: fix access key uniqueness check with gRPC store

GetUserByAccessKey returns a gRPC NotFound status error (not the
sentinel credential.ErrAccessKeyNotFound) when using the gRPC store,
causing the uniqueness check to fail with a spurious error.

Treat the lookup as best-effort: only reject when a user is found
(err == nil). Any error (not-found via any store, connectivity issues)
falls through to the store's own CreateAccessKey which enforces
uniqueness definitively.

* admin: fix error handling and input validation for access key creation

Backend:
- Remove access key value from the duplicate-key error message to avoid
  logging the caller-supplied identifier.

Handler:
- Handle empty POST body (io.EOF) as a valid request that auto-generates
  both keys, instead of rejecting it as malformed JSON.
- Return 404 for "not found" errors (e.g. non-existent user) instead of
  collapsing them into a 500.

Frontend:
- Add minlength/maxlength attributes matching backend constraints
  (access key 4-128, secret key 8-128).
- Call reportValidity() before submitting so invalid lengths are caught
  client-side without a round trip.

* admin: use sentinel errors and fix GetUserByAccessKey error handling

Backend (user_management.go):
- Define sentinel errors (ErrAccessKeyInUse, ErrUserNotFound,
  ErrInvalidInput) and wrap them in returned errors so callers can use
  errors.Is.
- Handle GetUserByAccessKey errors properly: check the sentinel
  credential.ErrAccessKeyNotFound first, then fall back to string
  matching for stores (gRPC) that return non-sentinel not-found errors.
  Surface unexpected errors instead of silently proceeding.

Handler (user_handlers.go):
- Replace fragile strings.Contains error matching with errors.Is
  against the new dash sentinels.

Frontend (object_store_users.templ):
- Add double-submit guard (isCreatingKey flag + button disabling) to
  prevent duplicate access key creation requests.
2026-03-09 14:03:41 -07:00
dependabot[bot]
a00d38d8d4 build(deps): bump go.mongodb.org/mongo-driver from 1.17.6 to 1.17.9 (#8575)
Bumps [go.mongodb.org/mongo-driver](https://github.com/mongodb/mongo-go-driver) from 1.17.6 to 1.17.9.
- [Release notes](https://github.com/mongodb/mongo-go-driver/releases)
- [Commits](https://github.com/mongodb/mongo-go-driver/compare/v1.17.6...v1.17.9)

---
updated-dependencies:
- dependency-name: go.mongodb.org/mongo-driver
  dependency-version: 1.17.9
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
2026-03-09 13:11:58 -07:00
Chris Lu
f8d783f80e fix: ListObjectVersions interleave Version and DeleteMarker in sort order (#8567)
* fix: ListObjectVersions interleave Version and DeleteMarker in sort order

Go's default xml.Marshal serializes struct fields in definition order,
causing all <Version> elements to appear before all <DeleteMarker>
elements. The S3 API contract requires these elements to be interleaved
in the correct global sort order (by key ascending, then newest version
first within each key).

This broke clients that validate version list ordering within a single
key — an older Version would appear before a newer DeleteMarker for the
same object.

Fix: Replace the separate Versions/DeleteMarkers/CommonPrefixes arrays
with a single Entries []VersionListEntry slice. Each VersionListEntry
uses a per-element MarshalXML that outputs the correct XML tag name
(<Version>, <DeleteMarker>, or <CommonPrefixes>) based on which field
is populated. Since the entries are already in their correct sorted
order from buildSortedCombinedList, the XML output is automatically
interleaved correctly.

Also removes the unused ListObjectVersionsResult struct.

Note: The reporter also mentioned a cross-key timestamp ordering issue
when paginating with max-keys=1, but that is correct S3 behavior —
ListObjectVersions sorts by key name (ascending), not by timestamp.
Different keys having non-monotonic timestamps is expected.

* test: add CommonPrefixes XML marshaling coverage for ListObjectVersions

* fix: validate VersionListEntry has exactly one field set in MarshalXML

Return an error instead of silently emitting an empty <Version> element
when no field (or multiple fields) are populated. Also clean up the
misleading xml:"Version" struct tag on the Entries field.
2026-03-09 12:37:59 -07:00
dependabot[bot]
120d38176f build(deps): bump golang.org/x/sys from 0.41.0 to 0.42.0 (#8573)
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.41.0 to 0.42.0.
- [Commits](https://github.com/golang/sys/compare/v0.41.0...v0.42.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-version: 0.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
2026-03-09 12:24:28 -07:00
Chris Lu
55bce53953 reduce logs 2026-03-09 12:14:25 -07:00
Chris Lu
992db11d2b iam: add IAM group management (#8560)
* iam: add Group message to protobuf schema

Add Group message (name, members, policy_names, disabled) and
add groups field to S3ApiConfiguration for IAM group management
support (issue #7742).

* iam: add group CRUD to CredentialStore interface and all backends

Add group management methods (CreateGroup, GetGroup, DeleteGroup,
ListGroups, UpdateGroup) to the CredentialStore interface with
implementations for memory, filer_etc, postgres, and grpc stores.
Wire group loading/saving into filer_etc LoadConfiguration and
SaveConfiguration.

* iam: add group IAM response types

Add XML response types for group management IAM actions:
CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup,
RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy,
ListAttachedGroupPolicies, ListGroupsForUser.

* iam: add group management handlers to embedded IAM API

Add CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup,
RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy,
ListAttachedGroupPolicies, and ListGroupsForUser handlers with
dispatch in ExecuteAction.

* iam: add group management handlers to standalone IAM API

Add group handlers (CreateGroup, DeleteGroup, GetGroup, ListGroups,
AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy,
ListAttachedGroupPolicies, ListGroupsForUser) and wire into DoActions
dispatch. Also add helper functions for user/policy side effects.

* iam: integrate group policies into authorization

Add groups and userGroups reverse index to IdentityAccessManagement.
Populate both maps during ReplaceS3ApiConfiguration and
MergeS3ApiConfiguration. Modify evaluateIAMPolicies to evaluate
policies from user's enabled groups in addition to user policies.
Update VerifyActionPermission to consider group policies when
checking hasAttachedPolicies.

* iam: add group side effects on user deletion and rename

When a user is deleted, remove them from all groups they belong to.
When a user is renamed, update group membership references. Applied
to both embedded and standalone IAM handlers.

* iam: watch /etc/iam/groups directory for config changes

Add groups directory to the filer subscription watcher so group
file changes trigger IAM configuration reloads.

* admin: add group management page to admin UI

Add groups page with CRUD operations, member management, policy
attachment, and enable/disable toggle. Register routes in admin
handlers and add Groups entry to sidebar navigation.

* test: add IAM group management integration tests

Add comprehensive integration tests for group CRUD, membership,
policy attachment, policy enforcement, disabled group behavior,
user deletion side effects, and multi-group membership. Add
"group" test type to CI matrix in s3-iam-tests workflow.

* iam: address PR review comments for group management

- Fix XSS vulnerability in groups.templ: replace innerHTML string
  concatenation with DOM APIs (createElement/textContent) for rendering
  member and policy lists
- Use userGroups reverse index in embedded IAM ListGroupsForUser for
  O(1) lookup instead of iterating all groups
- Add buildUserGroupsIndex helper in standalone IAM handlers; use it
  in ListGroupsForUser and removeUserFromAllGroups for efficient lookup
- Add note about gRPC store load-modify-save race condition limitation

* iam: add defensive copies, validation, and XSS fixes for group management

- Memory store: clone groups on store/retrieve to prevent mutation
- Admin dash: deep copy groups before mutation, validate user/policy exists
- HTTP handlers: translate credential errors to proper HTTP status codes,
  use *bool for Enabled field to distinguish missing vs false
- Groups templ: use data attributes + event delegation instead of inline
  onclick for XSS safety, prevent stale async responses

* iam: add explicit group methods to PropagatingCredentialStore

Add CreateGroup, GetGroup, DeleteGroup, ListGroups, and UpdateGroup
methods instead of relying on embedded interface fallthrough. Group
changes propagate via filer subscription so no RPC propagation needed.

* iam: detect postgres unique constraint violation and add groups index

Return ErrGroupAlreadyExists when INSERT hits SQLState 23505 instead of
a generic error. Add index on groups(disabled) for filtered queries.

* iam: add Marker field to group list response types

Add Marker string field to GetGroupResult, ListGroupsResult,
ListAttachedGroupPoliciesResult, and ListGroupsForUserResult to
match AWS IAM pagination response format.

* iam: check group attachment before policy deletion

Reject DeletePolicy if the policy is attached to any group, matching
AWS IAM behavior. Add PolicyArn to ListAttachedGroupPolicies response.

* iam: include group policies in IAM authorization

Merge policy names from user's enabled groups into the IAMIdentity
used for authorization, so group-attached policies are evaluated
alongside user-attached policies.

* iam: check for name collision before renaming user in UpdateUser

Scan identities and inline policies for newUserName before mutating,
returning EntityAlreadyExists if a collision is found. Reuse the
already-loaded policies instead of loading them again inside the loop.

* test: use t.Cleanup for bucket cleanup in group policy test

* iam: wrap ErrUserNotInGroup sentinel in RemoveGroupMember error

Wrap credential.ErrUserNotInGroup so errors.Is works in
groupErrorToHTTPStatus, returning proper 400 instead of 500.

* admin: regenerate groups_templ.go with XSS-safe data attributes

Regenerated from groups.templ which uses data-group-name attributes
instead of inline onclick with string interpolation.

* iam: add input validation and persist groups during migration

- Validate nil/empty group name in CreateGroup and UpdateGroup
- Save groups in migrateToMultiFile so they survive legacy migration

* admin: use groupErrorToHTTPStatus in GetGroupMembers and GetGroupPolicies

* iam: short-circuit UpdateUser when newUserName equals current name

* iam: require empty PolicyNames before group deletion

Reject DeleteGroup when group has attached policies, matching the
existing members check. Also fix GetGroup error handling in
DeletePolicy to only skip ErrGroupNotFound, not all errors.

* ci: add weed/pb/** to S3 IAM test trigger paths

* test: replace time.Sleep with require.Eventually for propagation waits

Use polling with timeout instead of fixed sleeps to reduce flakiness
in integration tests waiting for IAM policy propagation.

* fix: use credentialManager.GetPolicy for AttachGroupPolicy validation

Policies created via CreatePolicy through credentialManager are stored
in the credential store, not in s3cfg.Policies (which only has static
config policies). Change AttachGroupPolicy to use credentialManager.GetPolicy()
for policy existence validation.

* feat: add UpdateGroup handler to embedded IAM API

Add UpdateGroup action to enable/disable groups and rename groups
via the IAM API. This is a SeaweedFS extension (not in AWS SDK) used
by tests to toggle group disabled status.

* fix: authenticate raw IAM API calls in group tests

The embedded IAM endpoint rejects anonymous requests. Replace
callIAMAPI with callIAMAPIAuthenticated that uses JWT bearer token
authentication via the test framework.

* feat: add UpdateGroup handler to standalone IAM API

Mirror the embedded IAM UpdateGroup handler in the standalone IAM API
for parity.

* fix: add omitempty to Marker XML tags in group responses

Non-truncated responses should not emit an empty <Marker/> element.

* fix: distinguish backend errors from missing policies in AttachGroupPolicy

Return ServiceFailure for credential manager errors instead of masking
them as NoSuchEntity. Also switch ListGroupsForUser to use s3cfg.Groups
instead of in-memory reverse index to avoid stale data. Add duplicate
name check to UpdateGroup rename.

* fix: standalone IAM AttachGroupPolicy uses persisted policy store

Check managed policies from GetPolicies() instead of s3cfg.Policies
so dynamically created policies are found. Also add duplicate name
check to UpdateGroup rename.

* fix: rollback inline policies on UpdateUser PutPolicies failure

If PutPolicies fails after moving inline policies to the new username,
restore both the identity name and the inline policies map to their
original state to avoid a partial-write window.

* fix: correct test cleanup ordering for group tests

Replace scattered defers with single ordered t.Cleanup in each test
to ensure resources are torn down in reverse-creation order:
remove membership, detach policies, delete access keys, delete users,
delete groups, delete policies. Move bucket cleanup to parent test
scope and delete objects before bucket.

* fix: move identity nil check before map lookup and refine hasAttachedPolicies

Move the nil check on identity before accessing identity.Name to
prevent panic. Also refine hasAttachedPolicies to only consider groups
that are enabled and have actual policies attached, so membership in
a no-policy group doesn't incorrectly trigger IAM authorization.

* fix: fail group reload on unreadable or corrupt group files

Return errors instead of logging and continuing when group files
cannot be read or unmarshaled. This prevents silently applying a
partial IAM config with missing group memberships or policies.

* fix: use errors.Is for sql.ErrNoRows comparison in postgres group store

* docs: explain why group methods skip propagateChange

Group changes propagate to S3 servers via filer subscription
(watching /etc/iam/groups/) rather than gRPC RPCs, since there
are no group-specific RPCs in the S3 cache protocol.

* fix: remove unused policyNameFromArn and strings import

* fix: update service account ParentUser on user rename

When renaming a user via UpdateUser, also update ParentUser references
in service accounts to prevent them from becoming orphaned after the
next configuration reload.

* fix: wrap DetachGroupPolicy error with ErrPolicyNotAttached sentinel

Use credential.ErrPolicyNotAttached so groupErrorToHTTPStatus maps
it to 400 instead of falling back to 500.

* fix: use admin S3 client for bucket cleanup in enforcement test

The user S3 client may lack permissions by cleanup time since the
user is removed from the group in an earlier subtest. Use the admin
S3 client to ensure bucket and object cleanup always succeeds.

* fix: add nil guard for group param in propagating store log calls

Prevent potential nil dereference when logging group.Name in
CreateGroup and UpdateGroup of PropagatingCredentialStore.

* fix: validate Disabled field in UpdateGroup handlers

Reject values other than "true" or "false" with InvalidInputException
instead of silently treating them as false.

* fix: seed mergedGroups from existing groups in MergeS3ApiConfiguration

Previously the merge started with empty group maps, dropping any
static-file groups. Now seeds from existing iam.groups before
overlaying dynamic config, and builds the reverse index after
merging to avoid stale entries from overridden groups.

* fix: use errors.Is for filer_pb.ErrNotFound comparison in group loading

Replace direct equality (==) with errors.Is() to correctly match
wrapped errors, consistent with the rest of the codebase.

* fix: add ErrUserNotFound and ErrPolicyNotFound to groupErrorToHTTPStatus

Map these sentinel errors to 404 so AddGroupMember and
AttachGroupPolicy return proper HTTP status codes.

* fix: log cleanup errors in group integration tests

Replace fire-and-forget cleanup calls with error-checked versions
that log failures via t.Logf for debugging visibility.

* fix: prevent duplicate group test runs in CI matrix

The basic lane's -run "TestIAM" regex also matched TestIAMGroup*
tests, causing them to run in both the basic and group lanes.
Replace with explicit test function names.

* fix: add GIN index on groups.members JSONB for membership lookups

Without this index, ListGroupsForUser and membership queries
require full table scans on the groups table.

* fix: handle cross-directory moves in IAM config subscription

When a file is moved out of an IAM directory (e.g., /etc/iam/groups),
the dir variable was overwritten with NewParentPath, causing the
source directory change to be missed. Now also notifies handlers
about the source directory for cross-directory moves.

* fix: validate members/policies before deleting group in admin handler

AdminServer.DeleteGroup now checks for attached members and policies
before delegating to credentialManager, matching the IAM handler guards.

* fix: merge groups by name instead of blind append during filer load

Match the identity loader's merge behavior: find existing group
by name and replace, only append when no match exists. Prevents
duplicates when legacy and multi-file configs overlap.

* fix: check DeleteEntry response error when cleaning obsolete group files

Capture and log resp.Error from filer DeleteEntry calls during
group file cleanup, matching the pattern used in deleteGroupFile.

* fix: verify source user exists before no-op check in UpdateUser

Reorder UpdateUser to find the source identity first and return
NoSuchEntityException if not found, before checking if the rename
is a no-op. Previously a non-existent user renamed to itself
would incorrectly return success.

* fix: update service account parent refs on user rename in embedded IAM

The embedded IAM UpdateUser handler updated group membership but
not service account ParentUser fields, unlike the standalone handler.

* fix: replay source-side events for all handlers on cross-dir moves

Pass nil newEntry to bucket, IAM, and circuit-breaker handlers for
the source directory during cross-directory moves, so all watchers
can clear caches for the moved-away resource.

* fix: don't seed mergedGroups from existing iam.groups in merge

Groups are always dynamic (from filer), never static (from s3.config).
Seeding from iam.groups caused stale deleted groups to persist.
Now only uses config.Groups from the dynamic filer config.

* fix: add deferred user cleanup in TestIAMGroupUserDeletionSideEffect

Register t.Cleanup for the created user so it gets cleaned up
even if the test fails before the inline DeleteUser call.

* fix: assert UpdateGroup HTTP status in disabled group tests

Add require.Equal checks for 200 status after UpdateGroup calls
so the test fails immediately on API errors rather than relying
on the subsequent Eventually timeout.

* fix: trim whitespace from group name in filer store operations

Trim leading/trailing whitespace from group.Name before validation
in CreateGroup and UpdateGroup to prevent whitespace-only filenames.
Also merge groups by name during multi-file load to prevent duplicates.

* fix: add nil/empty group validation in gRPC store

Guard CreateGroup and UpdateGroup against nil group or empty name
to prevent panics and invalid persistence.

* fix: add nil/empty group validation in postgres store

Guard CreateGroup and UpdateGroup against nil group or empty name
to prevent panics from nil member access and empty-name row inserts.

* fix: add name collision check in embedded IAM UpdateUser

The embedded IAM handler renamed users without checking if the
target name already existed, unlike the standalone handler.

* fix: add ErrGroupNotEmpty sentinel and map to HTTP 409

AdminServer.DeleteGroup now wraps conflict errors with
ErrGroupNotEmpty, and groupErrorToHTTPStatus maps it to
409 Conflict instead of 500.

* fix: use appropriate error message in GetGroupDetails based on status

Return "Group not found" only for 404, use "Failed to retrieve group"
for other error statuses instead of always saying "Group not found".

* fix: use backend-normalized group.Name in CreateGroup response

After credentialManager.CreateGroup may normalize the name (e.g.,
trim whitespace), use group.Name instead of the raw input for
the returned GroupData to ensure consistency.

* fix: add nil/empty group validation in memory store

Guard CreateGroup and UpdateGroup against nil group or empty name
to prevent panics from nil pointer dereference on map access.

* fix: reorder embedded IAM UpdateUser to verify source first

Find the source identity before checking for collisions, matching
the standalone handler's logic. Previously a non-existent user
renamed to an existing name would get EntityAlreadyExists instead
of NoSuchEntity.

* fix: handle same-directory renames in metadata subscription

Replay a delete event for the old entry name during same-directory
renames so handlers like onBucketMetadataChange can clean up stale
state for the old name.

* fix: abort GetGroups on non-ErrGroupNotFound errors

Only skip groups that return ErrGroupNotFound. Other errors (e.g.,
transient backend failures) now abort the handler and return the
error to the caller instead of silently producing partial results.

* fix: add aria-label and title to icon-only group action buttons

Add accessible labels to View and Delete buttons so screen readers
and tooltips provide meaningful context.

* fix: validate group name in saveGroup to prevent invalid filenames

Trim whitespace and reject empty names before writing group JSON
files, preventing creation of files like ".json".

* fix: add /etc/iam/groups to filer subscription watched directories

The groups directory was missing from the watched directories list,
so S3 servers in a cluster would not detect group changes made by
other servers via filer. The onIamConfigChange handler already had
code to handle group directory changes but it was never triggered.

* add direct gRPC propagation for group changes to S3 servers

Groups now have the same dual propagation as identities and policies:
direct gRPC push via propagateChange + async filer subscription.

- Add PutGroup/RemoveGroup proto messages and RPCs
- Add PutGroup/RemoveGroup in-memory cache methods on IAM
- Add PutGroup/RemoveGroup gRPC server handlers
- Update PropagatingCredentialStore to call propagateChange on group mutations

* reduce log verbosity for config load summary

Change ReplaceS3ApiConfiguration log from Infof to V(1).Infof
to avoid noisy output on every config reload.

* admin: show user groups in view and edit user modals

- Add Groups field to UserDetails and populate from credential manager
- Show groups as badges in user details view modal
- Add group management to edit user modal: display current groups,
  add to group via dropdown, remove from group via badge x button

* fix: remove duplicate showAlert that broke modal-alerts.js

admin.js defined showAlert(type, message) which overwrote the
modal-alerts.js version showAlert(message, type), causing broken
unstyled alert boxes. Remove the duplicate and swap all callers
in admin.js to use the correct (message, type) argument order.

* fix: unwrap groups API response in edit user modal

The /api/groups endpoint returns {"groups": [...]}, not a bare array.

* Update object_store_users_templ.go

* test: assert AccessDenied error code in group denial tests

Replace plain assert.Error checks with awserr.Error type assertion
and AccessDenied code verification, matching the pattern used in
other IAM integration tests.

* fix: propagate GetGroups errors in ShowGroups handler

getGroupsPageData was swallowing errors and returning an empty page
with 200 status. Now returns the error so ShowGroups can respond
with a proper error status.

* fix: reject AttachGroupPolicy when credential manager is nil

Previously skipped policy existence validation when credentialManager
was nil, allowing attachment of nonexistent policies. Now returns
a ServiceFailureException error.

* fix: preserve groups during partial MergeS3ApiConfiguration updates

UpsertIdentity calls MergeS3ApiConfiguration with a partial config
containing only the updated identity (nil Groups). This was wiping
all in-memory group state. Now only replaces groups when
config.Groups is non-nil (full config reload).

* fix: propagate errors from group lookup in GetObjectStoreUserDetails

ListGroups and GetGroup errors were silently ignored, potentially
showing incomplete group data in the UI.

* fix: use DOM APIs for group badge remove button to prevent XSS

Replace innerHTML with onclick string interpolation with DOM
createElement + addEventListener pattern. Also add aria-label
and title to the add-to-group button.

* fix: snapshot group policies under RLock to prevent concurrent map access

evaluateIAMPolicies was copying the map reference via groupMap :=
iam.groups under RLock then iterating after RUnlock, while PutGroup
mutates the map in-place. Now copies the needed policy names into
a slice while holding the lock.

* fix: add nil IAM check to PutGroup and RemoveGroup gRPC handlers

Match the nil guard pattern used by PutPolicy/DeletePolicy to
prevent nil pointer dereference when IAM is not initialized.
2026-03-09 11:54:32 -07:00
dependabot[bot]
115dcb5ada build(deps): bump github.com/prometheus/procfs from 0.19.2 to 0.20.1 (#8578)
Bumps [github.com/prometheus/procfs](https://github.com/prometheus/procfs) from 0.19.2 to 0.20.1.
- [Release notes](https://github.com/prometheus/procfs/releases)
- [Commits](https://github.com/prometheus/procfs/compare/v0.19.2...v0.20.1)

---
updated-dependencies:
- dependency-name: github.com/prometheus/procfs
  dependency-version: 0.20.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 11:37:32 -07:00
dependabot[bot]
7be2d1ecfb build(deps): bump github.com/getsentry/sentry-go from 0.42.0 to 0.43.0 (#8577)
Bumps [github.com/getsentry/sentry-go](https://github.com/getsentry/sentry-go) from 0.42.0 to 0.43.0.
- [Release notes](https://github.com/getsentry/sentry-go/releases)
- [Changelog](https://github.com/getsentry/sentry-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/getsentry/sentry-go/compare/v0.42.0...v0.43.0)

---
updated-dependencies:
- dependency-name: github.com/getsentry/sentry-go
  dependency-version: 0.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 11:37:24 -07:00
dependabot[bot]
1272612bbd build(deps): bump docker/setup-qemu-action from 3 to 4 (#8574)
Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 3 to 4.
- [Release notes](https://github.com/docker/setup-qemu-action/releases)
- [Commits](https://github.com/docker/setup-qemu-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: docker/setup-qemu-action
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 11:33:38 -07:00
dependabot[bot]
e568d85a5c build(deps): bump docker/build-push-action from 6 to 7 (#8572)
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6 to 7.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/v6...v7)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 11:33:17 -07:00
dependabot[bot]
f79ba1eb37 build(deps): bump docker/login-action from 3 to 4 (#8569)
Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4.
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](https://github.com/docker/login-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 11:13:06 -07:00
dependabot[bot]
b132232895 build(deps): bump docker/setup-buildx-action from 3 to 4 (#8570)
Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3 to 4.
- [Release notes](https://github.com/docker/setup-buildx-action/releases)
- [Commits](https://github.com/docker/setup-buildx-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: docker/setup-buildx-action
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 11:12:56 -07:00
dependabot[bot]
d765ff50e6 build(deps): bump actions/dependency-review-action from 4.8.3 to 4.9.0 (#8571)
Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.8.3 to 4.9.0.
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](05fe457637...2031cfc080)

---
updated-dependencies:
- dependency-name: actions/dependency-review-action
  dependency-version: 4.9.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 11:12:40 -07:00
Chris Lu
bff084ff6a update go version 2026-03-09 11:12:05 -07:00