Commit Graph

119 Commits

Author SHA1 Message Date
Chris Lu
8fad85aed7 feat(s3): support WEED_S3_SSE_KEY env var for SSE-S3 KEK (#8904)
* feat(s3): support WEED_S3_SSE_KEY env var for SSE-S3 KEK

Add support for providing the SSE-S3 Key Encryption Key (KEK) via the
WEED_S3_SSE_KEY environment variable (hex-encoded 256-bit key). This
avoids storing the master key in plaintext on the filer at /etc/s3/sse_kek.

Key source priority:
1. WEED_S3_SSE_KEY environment variable (recommended)
2. Existing filer KEK at /etc/s3/sse_kek (backward compatible)
3. Auto-generate and save to filer (deprecated for new deployments)

Existing deployments with a filer-stored KEK continue to work unchanged.
A deprecation warning is logged when auto-generating a new filer KEK.

* refactor(s3): derive KEK from any string via HKDF instead of requiring hex

Accept any secret string in WEED_S3_SSE_KEY and derive a 256-bit key
using HKDF-SHA256 instead of requiring a hex-encoded key. This is
simpler for users — no need to generate hex, just set a passphrase.

* feat(s3): add WEED_S3_SSE_KEK and WEED_S3_SSE_KEY env vars for KEK

Two env vars for providing the SSE-S3 Key Encryption Key:

- WEED_S3_SSE_KEK: hex-encoded, same format as /etc/s3/sse_kek.
  If the filer file also exists, they must match.
- WEED_S3_SSE_KEY: any string, 256-bit key derived via HKDF-SHA256.
  Refuses to start if /etc/s3/sse_kek exists (must delete first).

Only one may be set. Existing filer-stored KEKs continue to work.
Auto-generating and storing new KEKs on filer is deprecated.

* fix(s3): stop auto-generating KEK, fail only when SSE-S3 is used

Instead of auto-generating a KEK and storing it on the filer when no
key source is configured, simply leave SSE-S3 disabled. Encrypt and
decrypt operations return a clear error directing the user to set
WEED_S3_SSE_KEK or WEED_S3_SSE_KEY.

* refactor(s3): move SSE-S3 KEK config to security.toml

Move KEK configuration from standalone env vars to security.toml's new
[sse_s3] section, following the same pattern as JWT keys and TLS certs.

  [sse_s3]
  kek = ""   # hex-encoded 256-bit key (same format as /etc/s3/sse_kek)
  key = ""   # any string, HKDF-derived

Viper's WEED_ prefix auto-mapping provides env var support:
WEED_SSE_S3_KEK and WEED_SSE_S3_KEY.

All existing behavior is preserved: filer KEK fallback, mismatch
detection, and HKDF derivation.

* refactor(s3): rename SSE-S3 config keys to s3.sse.kek / s3.sse.key

Use [s3.sse] section in security.toml, matching the existing naming
convention (e.g. [s3.*]). Env vars: WEED_S3_SSE_KEK, WEED_S3_SSE_KEY.

* fix(s3): address code review findings for SSE-S3 KEK

- Don't hold mutex during filer retry loop (up to 20s of sleep).
  Lock only to write filerClient and superKey.
- Remove dead generateAndSaveSuperKeyToFiler and unused constants.
- Return error from deriveKeyFromSecret instead of ignoring it.
- Fix outdated doc comment on InitializeWithFiler.
- Use t.Setenv in tests instead of manual os.Setenv/Unsetenv.

* fix(s3): don't block startup on filer errors when KEK is configured

- When s3.sse.kek is set, a temporarily unreachable filer no longer
  prevents startup. The filer consistency check becomes best-effort
  with a warning.
- Same treatment for s3.sse.key: filer unreachable logs a warning
  instead of failing.
- Rewrite error messages to suggest migration instead of file deletion,
  avoiding the risk of orphaning encrypted data.

Finding 3 (restore auto-generation) intentionally skipped — auto-gen
was removed by design to avoid storing plaintext KEK on filer.

* fix(test): set WEED_S3_SSE_KEY in SSE integration test server startup

SSE-S3 no longer auto-generates a KEK, so integration tests must
provide one. Set WEED_S3_SSE_KEY=test-sse-s3-key in all weed mini
invocations in the test Makefile.
2026-04-03 13:01:21 -07:00
Chris Lu
059bee683f feat(s3): add STS GetFederationToken support (#8891)
* feat(s3): add STS GetFederationToken support

Implement the AWS STS GetFederationToken API, which allows long-term IAM
users to obtain temporary credentials scoped down by an optional inline
session policy. This is useful for server-side applications that mint
per-user temporary credentials.

Key behaviors:
- Requires SigV4 authentication from a long-term IAM user
- Rejects calls from temporary credentials (session tokens)
- Name parameter (2-64 chars) identifies the federated user
- DurationSeconds supports 900-129600 (15 min to 36 hours, default 12h)
- Optional inline session policy for permission scoping
- Caller's attached policies are embedded in the JWT token
- Returns federated user ARN: arn:aws:sts::<account>:federated-user/<Name>

No performance impact on the S3 hot path — credential vending is a
separate control-plane operation, and all policy data is embedded in
the stateless JWT token.

* fix(s3): address GetFederationToken PR review feedback

- Fix Name validation: max 32 chars (not 64) per AWS spec, add regex
  validation for [\w+=,.@-]+ character whitelist
- Refactor parseDurationSeconds into parseDurationSecondsWithBounds to
  eliminate duplicated duration parsing logic
- Add sts:GetFederationToken permission check via VerifyActionPermission
  mirroring the AssumeRole authorization pattern
- Change GetPoliciesForUser to return ([]string, error) so callers fail
  closed on policy-resolution failures instead of silently returning nil
- Move temporary-credentials rejection before SigV4 verification for
  early rejection and proper test coverage
- Update tests: verify specific error message for temp cred rejection,
  add regex validation test cases (spaces, slashes rejected)

* refactor(s3): use sts.Action* constants instead of hard-coded strings

Replace hard-coded "sts:AssumeRole" and "sts:GetFederationToken" strings
in VerifyActionPermission calls with sts.ActionAssumeRole and
sts.ActionGetFederationToken package constants.

* fix(s3): pass through sts: prefix in action resolver and merge policies

Two fixes:

1. mapBaseActionToS3Format now passes through "sts:" prefix alongside
   "s3:" and "iam:", preventing sts:GetFederationToken from being
   rewritten to s3:sts:GetFederationToken in VerifyActionPermission.
   This also fixes the existing sts:AssumeRole permission checks.

2. GetFederationToken policy embedding now merges identity.PolicyNames
   (from SigV4 identity) with policies from the IAM manager (which may
   include group-attached policies), deduplicated via a map. Previously
   the IAM manager lookup was skipped when identity.PolicyNames was
   non-empty, causing group policies to be omitted from the token.

* test(s3): add integration tests for sts: action passthrough and policy merge

Action resolver tests:
- TestMapBaseActionToS3Format_ServicePrefixPassthrough: verifies s3:, iam:,
  and sts: prefixed actions pass through unchanged while coarse actions
  (Read, Write) are mapped to S3 format
- TestResolveS3Action_STSActionsPassthrough: verifies sts:AssumeRole,
  sts:GetFederationToken, sts:GetCallerIdentity pass through ResolveS3Action
  unchanged with both nil and real HTTP requests

Policy merge tests:
- TestGetFederationToken_GetPoliciesForUser: tests IAMManager.GetPoliciesForUser
  with no user store (error), missing user, user with policies, user without
- TestGetFederationToken_PolicyMergeAndDedup: tests that identity.PolicyNames
  and IAM-manager-resolved policies are merged and deduplicated (SharedPolicy
  appears in both sources, result has 3 unique policies)
- TestGetFederationToken_PolicyMergeNoManager: tests that when IAM manager is
  unavailable, identity.PolicyNames alone are embedded

* test(s3): add end-to-end integration tests for GetFederationToken

Add integration tests that call GetFederationToken using real AWS SigV4
signed HTTP requests against a running SeaweedFS instance, following the
existing pattern in test/s3/iam/s3_sts_assume_role_test.go.

Tests:
- TestSTSGetFederationTokenValidation: missing name, name too short/long,
  invalid characters, duration too short/long, malformed policy, anonymous
  rejection (7 subtests)
- TestSTSGetFederationTokenRejectTemporaryCredentials: obtains temp creds
  via AssumeRole then verifies GetFederationToken rejects them
- TestSTSGetFederationTokenSuccess: basic success, custom 1h duration,
  36h max duration with expiration time verification
- TestSTSGetFederationTokenWithSessionPolicy: creates a bucket, obtains
  federated creds with GetObject-only session policy, verifies GetObject
  succeeds and PutObject is denied using the AWS SDK S3 client
2026-04-02 17:37:05 -07:00
Chris Lu
a4b896a224 fix(s3): skip directories before marker in ListObjectVersions pagination (#8890)
* fix(s3): skip directories before marker in ListObjectVersions pagination

ListObjectVersions was re-traversing the entire directory tree from the
beginning on every paginated request, only skipping entries at the leaf
level. For buckets with millions of objects in deep hierarchies, this
caused exponentially slower responses as pagination progressed.

Two optimizations:
1. Use keyMarker to compute a startFrom position at each directory level,
   skipping directly to the relevant entry instead of scanning from the
   beginning (mirroring how ListObjects uses marker descent).
2. Skip recursing into subdirectories whose keys are entirely before the
   keyMarker.

Changes per-page cost from O(entries_before_marker) to O(tree_depth).

* test(s3): add integration test for deep-hierarchy version listing pagination

Adds TestVersioningPaginationDeepDirectoryHierarchy which creates objects
across 20 subdirectories at depth 6 (mimicking Veeam 365 backup layout)
and paginates through them with small maxKeys. Verifies correctness
(no duplicates, sorted order, all objects found) and checks that later
pages don't take dramatically longer than earlier ones — the symptom
of the pre-fix re-traversal bug. Also tests delimiter+pagination
interaction across subdirectories.

* test(s3): strengthen deep-hierarchy pagination assertions

- Replace timing warning (t.Logf) with a failing assertion (t.Errorf)
  so pagination regressions actually fail the test.
- Replace generic count/uniqueness/sort checks on CommonPrefixes with
  exact equality against the expected prefix slice, catching wrong-but-
  sorted results.

* test(s3): use allKeys for exact assertion in deep-hierarchy pagination test

Wire the allKeys slice (previously unused dead code) into the version
listing assertion, replacing generic count/uniqueness/sort checks with
an exact equality comparison against the keys that were created.
2026-04-02 15:59:52 -07:00
Chris Lu
98714b9f70 fix(test): address flaky S3 distributed lock integration test (#8888)
* fix(test): address flaky S3 distributed lock integration test

Two root causes:

1. Lock ring convergence race: After waitForFilerCount(2) confirms the
   master sees both filers, there's a window where filer0's lock ring
   still only contains itself (master's LockRingUpdate broadcast is
   delayed by the 1s stabilization timer). During this window filer0
   considers itself primary for ALL keys, so both filers can
   independently grant the same lock.

   Fix: Add waitForLockRingConverged() that acquires the same lock
   through both filers and verifies mutual exclusion before proceeding.

2. Hash function mismatch: ownerForObjectLock used util.HashStringToLong
   (MD5 + modulo) to predict lock owners, but the production DLM uses
   CRC32 consistent hashing via HashRing. This meant the test could
   pick keys that route to the same filer, not exercising the
   cross-filer coordination it intended to test.

   Fix: Use lock_manager.NewHashRing + GetPrimary() to match production
   routing exactly.

* fix(test): verify lock denial reason in convergence check

Ensure the convergence check only returns true when the second lock
attempt is denied specifically because the lock is already owned,
avoiding false positives from transient errors.

* fix(test): check one key per primary filer in convergence wait

A single arbitrary key can false-pass: if its real primary is the filer
with the stale ring, mutual exclusion holds trivially because that filer
IS the correct primary. Generate one test key per distinct primary using
the same consistent-hash ring as production, so a stale ring on any
filer is caught deterministically.
2026-04-02 13:11:04 -07:00
Chris Lu
d5068b3ee6 test: harden weed mini readiness checks 2026-03-30 16:21:38 -07:00
Chris Lu
0884acd70c ci: add S3 mutation regression coverage (#8804)
* test/s3: stabilize distributed lock regression

* ci: add S3 mutation regression workflow

* test: fix delete regression readiness probe

* test: address mutation regression review comments
2026-03-28 20:17:20 -07:00
Chris Lu
0adb78bc6b s3api: make conditional mutations atomic and AWS-compatible (#8802)
* s3api: serialize conditional write finalization

* s3api: add conditional delete mutation checks

* s3api: enforce destination conditions for copy

* s3api: revalidate multipart completion under lock

* s3api: rollback failed put finalization hooks

* s3api: report delete-marker version deletions

* s3api: fix copy destination versioning edge cases

* s3api: make versioned multipart completion idempotent

* test/s3: cover conditional mutation regressions

* s3api: rollback failed copy version finalization

* s3api: resolve suspended delete conditions via latest entry

* s3api: remove copy test null-version injection

* s3api: reject out-of-order multipart completions

* s3api: preserve multipart replay version metadata

* s3api: surface copy destination existence errors

* s3api: simplify delete condition target resolution

* test/s3: make conditional delete assertions order independent

* test/s3: add distributed lock gateway integration

* s3api: fail closed multipart versioned completion

* s3api: harden copy metadata and overwrite paths

* s3api: create delete markers for suspended deletes

* s3api: allow duplicate multipart completion parts
2026-03-27 19:22:26 -07:00
Chris Lu
ba624f1f34 Rust volume server implementation with CI (#8539)
* Match Go gRPC client transport defaults

* Honor Go HTTP idle timeout

* Honor maintenanceMBps during volume copy

* Honor images.fix.orientation on uploads

* Honor cpuprofile when pprof is disabled

* Match Go memory status payloads

* Propagate request IDs across gRPC calls

* Format pending Rust source updates

* Match Go stats endpoint payloads

* Serve Go volume server UI assets

* Enforce Go HTTP whitelist guards

* Align Rust metrics admin-port test with Go behavior

* Format pending Rust server updates

* Honor access.ui without per-request JWT checks

* Honor keepLocalDatFile in tier upload shortcut

* Honor Go remote volume write mode

* Load tier backends from master config

* Check master config before loading volumes

* Remove vif files on volume destroy

* Delete remote tier data on volume destroy

* Honor vif version defaults and overrides

* Reject mismatched vif bytes offsets

* Load remote-only tiered volumes

* Report Go tail offsets in sync status

* Stream remote dat in incremental copy

* Honor collection vif for EC shard config

* Persist EC expireAtSec in vif metadata

* Stream remote volume reads through HTTP

* Serve HTTP ranges from backend source

* Match Go ReadAllNeedles scan order

* Match Go CopyFile zero-stop metadata

* Delete EC volumes with collection cleanup

* Drop deleted collection metrics

* Match Go tombstone ReadNeedleMeta

* Match Go TTL parsing: all-digit default to minutes, two-pass fit algorithm

* Match Go needle ID/cookie formatting and name size computation

* Match Go image ext checks: webp resize only, no crop; empty healthz body

* Match Go Prometheus metric names and add missing handler counter constants

* Match Go ReplicaPlacement short string parsing with zero-padding

* Add missing EC constants MAX_SHARD_COUNT and MIN_TOTAL_DISKS

* Add walk_ecx_stats for accurate EC volume file counts and size

* Match Go VolumeStatus dat file size, EC shard stats, and disk pct precision

* Match Go needle map: unconditional delete counter, fix redb idx walk offset

* Add CompactMapSegment overflow panic guard matching Go

* Match Go volume: vif creation, version from superblock, TTL expiry, dedup data_size, garbage_level fallback

* Match Go 304 Not Modified: return bare status with no headers

* Match Go JWT error message: use "wrong jwt" instead of detailed error

* Match Go read handler bare 400, delete error prefix, download throttle timeout

* Match Go pretty JSON 1-space indent and "Deletion Failed:" error prefix

* Match Go heartbeat: keep is_heartbeating on error, add EC shard identification

* Match Go needle ReadBytes V2: tolerate EOF on truncated body

* Match Go volume: cookie check on any existing needle, return DataSize, 128KB meta guard

* Match Go DeleteCollection: propagate destroy errors

* Match Go gRPC: BatchDelete no flag, IncrementalCopy error, FetchAndWrite concurrent, VolumeUnmount/DeleteCollection errors, tail draining, query error code

* Match Go Content-Disposition RFC 6266 formatting with RFC 2231 encoding

* Match Go Guard isWriteActive: combine whitelist and signing key check

* Match Go DeleteCollectionMetrics: use partial label matching

* Match Go heartbeat: send state-only delta on volume state changes

* Match Go ReadNeedleMeta paged I/O: read header+tail only, skip data; add EIO tracking

* Match Go ScrubVolume INDEX mode dispatch; add VolumeCopy preallocation and EC NeedleStatus TODOs

* Add read_ec_shard_needle for full needle reconstruction from local EC shards

* Make heartbeat master config helpers pub for VolumeCopy preallocation

* Match Go gRPC: VolumeCopy preallocation, EC NeedleStatus full read, error message wording

* Match Go HTTP responses: omitempty fields, 2-space JSON indent, JWT JSON error, delete pretty/JSONP, 304 Last-Modified, raw write error

* Match Go WriteNeedleBlob V3 timestamp patching, fix makeup_diff double padding, count==0 read handling

* Add rebuild_ecx_file for EC index reconstruction from data shards

* Match Go gRPC: tail header first-chunk-only, EC cleanup on failure, copy append mode, ecx rebuild, compact cancellation

* Add EC volume read and delete support in HTTP handlers

* Add per-shard EC mount/unmount, location predicate search, idx directory for EC

* Add CheckVolumeDataIntegrity on volume load matching Go

* Match Go gRPC: EC multi-disk placement, per-shard mount/unmount, no auto-mount on reconstruct, streaming ReadAll/EcShardRead, ReceiveFile cleanup, version check, proxy streaming, redirect Content-Type

* Match Go heartbeat metric accounting

* Match Go duplicate UUID heartbeat retries

* Delete expired EC volumes during heartbeat

* Match Go volume heartbeat pruning

* Honor master preallocate in volume max

* Report remote storage info in heartbeats

* Emit EC heartbeat deltas on shard changes

* Match Go throttle boundary: use <= instead of <, fix pretty JSON to 1-space

* Match Go write_needle_blob monotonic appendAtNs via get_append_at_ns

* Match Go VolumeUnmount: idempotent success when volume not found

* Match Go TTL Display: return empty string when unit is Empty

Go checks `t.Unit == Empty` separately and returns "" for TTLs
with nonzero count but Empty unit. Rust only checked is_empty()
(count==0 && unit==0), so count>0 with unit=0 would format as
"5 " instead of "".

* Match Go error behavior for truncated needle data in read_body_v2

Go's readNeedleDataVersion2 returns "index out of range %d" errors
(indices 1-7) when needle body or metadata fields are truncated.
Rust was silently tolerating truncation and returning Ok. Now returns
NeedleError::IndexOutOfRange with the matching index for each field.

* Match Go download throttle: return JSON error instead of plain text

* Match Go crop params: default x1/y1 to 0 when not provided

* Match Go ScrubEcVolume: accumulate total_files from EC shards

* Match Go ScrubVolume: count total_files even on scrub error

* Match Go VolumeEcShardsCopy: set ignore_source_file_not_found for .vif

* Match Go VolumeTailSender: send needle_header on every chunk

* Match Go read_super_block: apply replication override from .vif

* Match Go check_volume_data_integrity: verify all 10 entries, detect trailing corruption

* Match Go WriteNeedleBlob: dedup check before writing during replication

* handlers: use meta-only reads for HEAD

* handlers: align range parsing and responses with Go

* handlers: align upload parsing with Go

* deps: enable webp support

* Make 5bytes the default feature for idx entry compatibility

* Match Go TTL: preserve original unit when count fits in byte

* Fix EC locate_needle: use get_actual_size for full needle size

* Fix raw body POST: only parse multipart when Content-Type contains form-data

* Match Go ReceiveFile: return protocol errors in response body, not gRPC status

* add docs

* Match Go VolumeEcShardsCopy: append to .ecj file instead of truncating

* Match Go ParsePath: support _delta suffix on file IDs for sub-file addressing

* Match Go chunk manifest: add Accept-Ranges, Content-Disposition, filename fallback, MIME detection

* Match Go privateStoreHandler: use proper JSON error for unsupported methods

* Match Go Destroy: add only_empty parameter to reject non-empty volume deletion

* Fix compilation: set_read_only_persist and set_writable return ()

These methods fire-and-forget save_vif internally, so gRPC callers
should not try to chain .map_err() on the unit return type.

* Match Go SaveVolumeInfo: check writability and propagate errors in save_vif

* Match Go VolumeDelete: propagate only_empty to delete_volume for defense in depth

The gRPC VolumeDelete handler had a pre-check for only_empty but then
passed false to store.delete_volume(), bypassing the store-level check.
Go passes req.OnlyEmpty directly to DeleteVolume. Now Rust does the same
for defense in depth against TOCTOU races (though the store write lock
makes this unlikely).

* Match Go ProcessRangeRequest: return full content for empty/oversized ranges

Go returns nil from ProcessRangeRequest when ranges are empty or total
range size exceeds content length, causing the caller to serve the full
content as a normal 200 response. Rust was returning an empty 200 body.

* Match Go Query: quote JSON keys in output records

Go's ToJson produces valid JSON with quoted keys like {"name":"Alice"}.
Rust was producing invalid JSON with unquoted keys like {name:"Alice"}.

* Match Go VolumeCopy: reject when no suitable disk location exists

Go returns ErrVolumeNoSpaceLeft when no location matches the disk type
and has sufficient space. Rust had an unsafe fallback that silently
picked the first location regardless of type or available space.

* Match Go DeleteVolumeNeedle: check noWriteOrDelete before allowing delete

Go checks v.noWriteOrDelete before proceeding with needle deletion,
returning "volume is read only" if true. Rust was skipping this check.

* Match Go ReceiveFile: prefer HardDrive location for EC and use response-level write errors

Two fixes: (1) Go prefers HardDriveType disk location for EC volumes,
falling back to first location. Returns "no storage location available"
when no locations exist. (2) Write failures are now response-level
errors (in response body) instead of gRPC status errors, matching Go.

* Match Go CopyFile: sync EC volume journal to disk before copying

Go calls ecVolume.Sync() before copying EC volume files to ensure the
.ecj journal is flushed to disk. Added sync_to_disk() to EcVolume and
call it in the CopyFile EC branch.

* Match Go readSuperBlock: propagate replication parse errors

Go returns an error when parsing the replication string from the .vif
file fails. Rust was silently ignoring the parse failure and using the
super block's replication as-is.

* Match Go TTL expiry: remove append_at_ns > 0 guard

Go computes TTL expiry from AppendAtNs without guarding against zero.
When append_at_ns is 0, the expiry is epoch + TTL which is in the past,
correctly returning NotFound. Rust's extra guard skipped the check,
incorrectly returning success for such needles.

* Match Go delete_collection: skip volumes with compaction in progress

Go checks !v.isCompactionInProgress.Load() before destroying a volume
during collection deletion, skipping compacting volumes. Also changed
destroy errors to log instead of aborting the entire collection delete.

* Match Go MarkReadonly/MarkWritable: always notify master even on local error

Go always notifies the master regardless of whether the local
set_read_only_persist or set_writable step fails. The Rust code was
using `?` which short-circuited on error, skipping the final master
notification. Save the result and defer the `?` until after the
notify call.

* Match Go PostHandler: return 500 for all write errors

Go returns 500 (InternalServerError) for all write failures. Rust was
returning 404 for volume-not-found and 403 for read-only volumes.

* Match Go makeupDiff: validate .cpd compaction revision is old + 1

Go reads the new .cpd file's super block and verifies the compaction
revision is exactly old + 1. Rust only validated the old revision.

* Match Go VolumeStatus: check data backend before returning status

Go checks v.DataBackend != nil before building the status response,
returning an error if missing. Rust was silently returning size 0.

* Match Go PostHandler: always include mime field in upload response JSON

Go always serializes the mime field even when empty ("mime":""). Rust was
omitting it when empty due to Option<String> with skip_serializing_if.

* Match Go FindFreeLocation: account for EC shards in free slot calculation

Go subtracts EC shard equivalents when computing available volume slots.
Rust was only comparing volume count, potentially over-counting free
slots on locations with many EC shards.

* Match Go privateStoreHandler: use INVALID as metrics label for unsupported methods

Go records the method as INVALID in metrics for unsupported HTTP methods.
Rust was using the actual method name.

* Match Go volume: add commit_compact guard and scrub data size validation

Two fixes: (1) commit_compact now checks/sets is_compacting flag to
prevent concurrent commits, matching Go's CompareAndSwap guard.
(2) scrub now validates total needle sizes against .dat file size.

* Match Go gRPC: fix TailSender error propagation, EcShardsInfo all slots, EcShardRead .ecx check

Three fixes: (1) VolumeTailSender now propagates binary search errors
instead of silently falling back to start. (2) VolumeEcShardsInfo
returns entries for all shard slots including unmounted. (3)
VolumeEcShardRead checks .ecx index for deletions instead of .ecj.

* Match Go metrics: add BuildInfo gauge and connection tracking functions

Go exposes a BuildInfo Prometheus metric with version labels, and tracks
open connections via stats.ConnectionOpen/Close. Added both to Rust.

* Match Go NeedleMap.Delete: use !is_deleted() instead of is_valid()

Go's CompactMap.Delete checks !IsDeleted() not IsValid(), so needles
with size==0 (live but anomalous) can still be deleted. The Rust code
was using is_valid() which returns false for size==0, preventing
deletion of such needles.

* Match Go fitTtlCount: always normalize TTL to coarsest unit

Go's fitTtlCount always converts to seconds first, then finds the
coarsest unit that fits in one byte (e.g., 120m → 2h). Rust had an
early return for count<=255 that skipped normalization, producing
different binary encodings for the same duration.

* Match Go BuildInfo metric: correct name and add missing labels

Go uses SeaweedFS_build_info (Namespace=SeaweedFS, Subsystem=build,
Name=info) with labels [version, commit, sizelimit, goos, goarch].
Rust had SeaweedFS_volumeServer_buildInfo with only [version].

* Match Go HTTP handlers: fix UploadResult fields, DiskStatus JSON, chunk manifest ETag

- UploadResult.mime: add skip_serializing_if to omit empty MIME (Go uses omitempty)
- UploadResult.contentMd5: only include when request provided Content-MD5 header
- Content-MD5 response header: only set when request provided it
- DiskStatuses: use camelCase field names (percentFree, percentUsed, diskType)
  to match Go's protobuf JSON marshaling
- Chunk manifest: preserve needle ETag in expanded response headers

* Match Go volume: fix version(), integrity check, scrub, and commit_compact

- version(): use self.version() instead of self.super_block.version in
  read_all_needles, check_volume_data_integrity, scan_raw_needles_from
  to respect volumeInfo.version override
- check_volume_data_integrity: initialize healthy_index_size to idx_size
  (matching Go) and continue on EOF instead of returning error
- scrub(): count deleted needles in total_read since they still occupy
  space in the .dat file (matches Go's totalRead += actualSize for deleted)
- commit_compact: clean up .cpd/.cpx files on makeup_diff failure
  (matches Go's error path cleanup)

* Match Go write queue: add 4MB batch byte limit

Go's startWorker breaks the batch at either 128 requests or 4MB of
accumulated write data. Rust only had the 128-request limit, allowing
large writes to accumulate unbounded latency.

* Add TTL normalization tests for Go parity verification

Test that fit_ttl_count normalizes 120m→2h, 24h→1d, 7d→1w even
when count fits in a byte, matching Go's fitTtlCount behavior.

* Match Go FindFreeLocation: account for EC shards in free slot calculation

Go's free volume count subtracts both regular volumes and EC volumes
from max_volume_count. Rust was only counting regular volumes, which
could over-report available slots when EC shards are mounted.

* Match Go EC volume: mark deletions in .ecx and replay .ecj at startup

Go's DeleteNeedleFromEcx marks needles as deleted in the .ecx index
in-place (writing TOMBSTONE_FILE_SIZE at the size field) in addition
to appending to the .ecj journal. Go's RebuildEcxFile replays .ecj
entries into .ecx on startup, then removes the .ecj file.

Rust was only appending to .ecj without marking .ecx, which meant
deleted EC needles remained readable via .ecx binary search. This
fix:
- Opens .ecx in read/write mode (was read-only)
- Adds mark_needle_deleted_in_ecx: binary search + in-place write
- Calls it from journal_delete before appending to .ecj
- Adds rebuild_ecx_from_journal: replays .ecj into .ecx on startup

* Match Go check_all_ec_shards_deleted: use MAX_SHARD_COUNT instead of hardcoded 14

Go's TotalShardsCount is DataShardsCount + ParityShardsCount = 14 by
default, but custom EC configs via .vif can have more shards (up to
MaxShardCount = 32). Using MAX_SHARD_COUNT ensures all shard files
are checked regardless of EC configuration.

* Match Go EC locate: subtract 1 from shard size and use datFileSize override

Go's LocateEcShardNeedleInterval passes shard.ecdFileSize-1 to
LocateData (shards are padded, -1 avoids overcounting large block
rows). When datFileSize is known, Go uses datFileSize/DataShards
instead. Rust was passing the raw shard file size without adjustment.

* Fix TTL parsing and DiskStatus field names to match Go exactly

TTL::read: Go's ReadTTL preserves the original unit (7d stays 7d,
not 1w) and errors on count > 255. The previous normalization change
was incorrect — Go only normalizes internally via fitTtlCount, not
during string parsing.

DiskStatus: Go uses encoding/json on protobuf structs, which reads
the json struct tags (snake_case: percent_free, percent_used,
disk_type), not the protobuf JSON names (camelCase). Revert to
snake_case to match Go's actual output.

* Fix heartbeat: check leader != current master before redirect, process duplicated UUIDs first

Match Go's volume_grpc_client_to_master.go behavior:
1. Only trigger leader redirect when the leader address differs from the
   current master (prevents unnecessary reconnect loops when master confirms
   its own address).
2. Process duplicated_uuids before leader redirect check, matching Go's
   ordering where duplicate UUID detection takes priority.

* Remove SetState version check to match Go behavior

Go's SetState unconditionally applies the state without any version
mismatch check. The Rust version had an extra optimistic concurrency
check that would reject valid requests from Go clients that don't
track versions.

* Fix TTL::read() to normalize via fit_ttl_count matching Go's ReadTTL

Go's ReadTTL calls fitTtlCount which converts to seconds and normalizes
to the coarsest unit that fits in a byte count (e.g. 120m->2h, 7d->1w,
24h->1d). The Rust version was preserving the original unit, producing
different binary encodings on disk and in heartbeat messages.

* Always return Content-MD5 header and JSON field on successful writes

Go always sets Content-MD5 in the response regardless of whether the
request included it. The Rust version was conditionally including it
only when the request provided Content-MD5.

* Include name and size in UploadResult JSON even when empty/zero

Go's encoding/json always includes empty strings and zero values in
the upload response. The Rust version was using skip_serializing_if
to omit them, causing JSON structure differences.

* Include deleted needles in scan_raw_needles_from to match Go

Go's ScanVolumeFileFrom visits ALL needles including deleted ones.
Skipping deleted entries during incremental copy would cause tombstones
to not be propagated, making deleted files reappear on the receiving side.

* Match Go NeedleMap.Delete: always write tombstone to idx file

Go's NeedleMap.Delete unconditionally writes a tombstone entry to the
idx file and updates metrics, even if the needle doesn't exist or is
already deleted. This is important for replication where every delete
operation must produce an idx write. The Rust version was skipping the
tombstone write for non-existent or already-deleted needles.

* Limit MIME type to 255 bytes matching Go's CreateNeedleFromRequest

* Title-case Seaweed-* pair keys to match Go HTTP header canonicalization

* Unify DiskType::Hdd into HardDrive to match Go's single HardDriveType

* Skip tombstone entries in walk_ecx_stats total_size matching Go's Raw()

* Return EMPTY TTL when computed seconds is zero matching Go's fitTtlCount

* Include disk-space-low in Volume.is_read_only() matching Go

* Log error on CIDR parse failure in whitelist matching Go's glog.Errorf

* Log cookie mismatch in gRPC Query matching Go's V(0).Infof

* Fix is_expired volume_size comparison to use < matching Go

Go checks `volumeSize < super_block.SuperBlockSize` (strict less-than),
but Rust used `<=`. This meant Rust would fail to expire a volume that
is exactly SUPER_BLOCK_SIZE bytes.

* Apply Go's JWT expiry defaults: 10s write, 60s read

Go calls v.SetDefault("jwt.signing.expires_after_seconds", 10) and
v.SetDefault("jwt.signing.read.expires_after_seconds", 60). Rust
defaulted to 0 for both, which meant tokens would never expire when
security.toml has a signing key but omits expires_after_seconds.

* Stop [grpc.volume].ca from overriding [grpc].ca matching Go

Go reads the gRPC CA file only from config.GetString("grpc.ca"), i.e.
the [grpc] section. The [grpc.volume] section only provides cert and
key. Rust was also reading ca from [grpc.volume] which would silently
override the [grpc].ca value when both were present.

* Fix free_volume_count to use EC shard count matching Go

Was counting EC volumes instead of EC shards, which underestimates EC
space usage. One EC volume with 14 shards uses ~1.4 volume slots, not 1.
Now uses Go's formula: ((max - volumes) * DataShardsCount - ecShardCount) / DataShardsCount.

* Include preallocate in compaction space check matching Go

Go uses max(preallocate, estimatedCompactSize) for the free space check.
Rust was only using the estimated volume size, which could start a
compaction that fails mid-way if preallocate exceeds the volume size.

* Check gzip magic bytes before setting Content-Encoding matching Go

Go checks both Accept-Encoding contains "gzip" AND IsGzippedContent
(data starts with 0x1f 0x8b) before setting Content-Encoding: gzip.
Rust only checked Accept-Encoding, which could incorrectly declare
gzip encoding for non-gzip compressed data.

* Only set upload response name when needle HasName matching Go

Go checks reqNeedle.HasName() before setting ret.Name. Rust always set
the name from the filename variable, which could return the fid portion
of the path as the name for raw PUT requests without a filename.

* Treat MaxVolumeCount==0 as unlimited matching Go's hasFreeDiskLocation

Go's hasFreeDiskLocation returns true immediately when MaxVolumeCount
is 0, treating it as unlimited. Rust was computing effective_free as
<= 0 for max==0, rejecting the location. This could fail volume
creation during early startup before the first heartbeat adjusts max.

* Read lastAppendAtNs from deleted V3 entries in integrity check

Go's doCheckAndFixVolumeData reads AppendAtNs from both live entries
(verifyNeedleIntegrity) and deleted tombstones (verifyDeletedNeedleIntegrity).
Rust was skipping deleted entries, which could result in a stale
last_append_at_ns if the last index entry is a deletion.

* Return empty body for empty/oversized range requests matching Go

Go's ProcessRangeRequest returns nil (empty body, 200 OK) when
parsed ranges are empty or combined range size exceeds total content
size. The Rust buffered path incorrectly returned the full file data
for both cases. The streaming path already handled this correctly.

* Dispatch ScrubEcVolume by mode matching Go's INDEX/LOCAL/FULL

Go's ScrubEcVolume switches on mode: INDEX calls v.ScrubIndex()
(ecx integrity only), LOCAL calls v.ScrubLocal(), FULL calls
vs.store.ScrubEcVolume(). Rust was ignoring the mode and always
running verify_ec_shards. Now INDEX mode checks ecx index integrity
(sorted overlap detection + file size validation) without shard I/O,
while LOCAL/FULL modes run the existing shard verification.

* Fix TTL test expectation: 7d normalizes to 1w matching Go's fitTtlCount

Go's ReadTTL calls fitTtlCount which normalizes to the coarsest unit
that fits: 7 days = 1 week, so "7d" becomes {Count:1, Unit:Week}
which displays as "1w". Both Go and Rust normalize identically.

* Add version mismatch check to SetState matching Go's State.Update

Go's State.Update compares the incoming version with the stored
version and returns "version mismatch" error if they differ. This
provides optimistic concurrency control. The Rust implementation
was accepting any version unconditionally.

* Use unquoted keys in Query JSON output matching Go's json.ToJson

Go's json.ToJson produces records with unquoted keys like
{score:12} not {"score":12}. This is a custom format used
internally by SeaweedFS for query results.

* Fix TTL test expectation in VolumeNeedleStatus: 7d normalizes to 1w

Same normalization as the HTTP test: Go's ReadTTL calls fitTtlCount
which converts 7 days to 1 week.

* Include ETag header in 304 Not Modified responses matching Go behavior

Go sets ETag on the response writer (via SetEtag) before the
If-Modified-Since and If-None-Match conditional checks, so both
304 response paths include the ETag header. The Rust implementation
was only adding ETag to 200 responses.

* Remove needle-name fallback in chunk manifest filename resolution

Go's tryHandleChunkedFile only falls back from URL filename to
manifest name. Rust had an extra fallback to needle.name that
Go does not perform, which could produce different
Content-Disposition filenames for chunk manifests.

* Validate JWT nbf (Not Before) claim matching Go's jwt-go/v5

Go's jwt.ParseWithClaims validates the nbf claim when present,
rejecting tokens whose nbf is in the future. The Rust jsonwebtoken
crate defaults validate_nbf to false, so tokens with future nbf
were incorrectly accepted.

* Set isHeartbeating to true at startup matching Go's VolumeServer init

Go unconditionally sets isHeartbeating: true in the VolumeServer
struct literal. Rust was starting with false when masters are
configured, causing /healthz to return 503 until the first
heartbeat succeeds.

* Call store.close() on shutdown matching Go's Shutdown()

Go's Shutdown() calls vs.store.Close() which closes all volumes
and flushes file handles. The Rust server was relying on process
exit for cleanup, which could leave data unflushed.

* Include server ID in maintenance mode error matching Go's format

Go returns "volume server %s is in maintenance mode" with the
store ID. Rust was returning a generic "maintenance mode" message.

* Fix DiskType test: use HardDrive variant matching Go's HddType=""

Go maps both "" and "hdd" to HardDriveType (empty string). The
Rust enum variant is HardDrive, not Hdd. The test referenced a
nonexistent Hdd variant causing compilation failure.

* Do not include ETag in 304 responses matching Go's GetOrHeadHandler

Go sets ETag at L235 AFTER the If-Modified-Since and If-None-Match
304 return paths, so Go's 304 responses do not include the ETag header.
The Rust code was incorrectly including ETag in both 304 response paths.

* Return 400 on malformed query strings in PostHandler matching Go's ParseForm

Go's r.ParseForm() returns HTTP 400 with "form parse error: ..." when
the query string is malformed. Rust was silently falling back to empty
query params via unwrap_or_default().

* Load EC volume version from .vif matching Go's NewEcVolume

Go sets ev.Version = needle.Version(volumeInfo.Version) from the .vif
file. Rust was always using Version::current() (V3), which would produce
wrong needle actual size calculations for volumes created with V1 or V2.

* Sync .ecx file before close matching Go's EcVolume.Close

Go calls ev.ecxFile.Sync() before closing to ensure in-place deletion
marks are flushed to disk. Without this, deletion marks written via
MarkNeedleDeleted could be lost on crash.

* Validate SuperBlock extra data size matching Go's Bytes() guard

Go checks extraSize > 256*256-2 and calls glog.Fatalf to prevent
corrupt super block headers. Rust was silently truncating via u16 cast,
which would write an incorrect extra_size field.

* Update quinn-proto 0.11.13 -> 0.11.14 to fix GHSA-6xvm-j4wr-6v98

Fixes Dependency Review CI failure: quinn-proto < 0.11.14 is vulnerable
to unauthenticated remote DoS via panic in QUIC transport parameter
parsing.

* Skip TestMultipartUploadUsesFormFieldsForTimestampAndTTL for Go server

Go's r.FormValue() cannot read multipart text fields after
r.MultipartReader() consumes the body, so ts/ttl sent as multipart
form fields only work with the Rust volume server. Skip this test
when VOLUME_SERVER_IMPL != "rust" to fix CI failure.

* Flush .ecx in EC volume sync_to_disk matching Go's Sync()

Go's EcVolume.Sync() flushes both the .ecj journal and the .ecx index
to disk. The Rust version only flushed .ecj, leaving in-place deletion
marks in .ecx unpersisted until close(). This could cause data
inconsistency if the server crashes after marking a needle deleted in
.ecx but before close().

* Remove .vif file in EC volume destroy matching Go's Destroy()

Go's EcVolume.Destroy() removes .ecx, .ecj, and .vif files. The Rust
version only removed .ecx and .ecj, leaving orphaned .vif files on
disk after EC volume destruction (e.g., after TTL expiry).

* Fix is_expired to use <= for SuperBlockSize check matching Go

Go checks contentSize <= SuperBlockSize to detect empty volumes (no
needles). Rust used < which would incorrectly allow a volume with
exactly SuperBlockSize bytes (header only, no data) to proceed to
the TTL expiry check and potentially be marked as expired.

* Fix read_append_at_ns to read timestamps from tombstone entries

Go reads the full needle body for all entries including tombstones
(deleted needles with size=0) to extract the actual AppendAtNs
timestamp. The Rust version returned 0 early for size <= 0 entries,
which would cause the binary search in incremental copy to produce
incorrect results for positions containing deleted needles.

Now uses get_actual_size to compute the on-disk size (which handles
tombstones correctly) and only returns 0 when the actual size is 0.

* Add X-Request-Id response header matching Go's requestIDMiddleware

Go sets both X-Request-Id and x-amz-request-id response headers.
The Rust server only set x-amz-request-id, missing X-Request-Id.

* Add skip_serializing_if for UploadResult name and size fields

Go's UploadResult uses json:"name,omitempty" and json:"size,omitempty",
omitting these fields from JSON when they are zero values (empty
string / 0). The Rust struct always serialized them, producing
"name":"" and "size":0 where Go would omit them.

* Support JSONP/pretty-print for write success responses

Go's writeJsonQuiet checks for callback (JSONP) and pretty query
parameters on all JSON responses including write success. The Rust
write success path used axum::Json directly, bypassing JSONP and
pretty-print support. Now uses json_result_with_query to match Go.

* Include actual limit in file size limit error message

Go returns "file over the limited %d bytes" with the actual limit
value included. Rust returned a generic "file size limit exceeded"
without the limit value, making it harder to debug.

* Extract extension from 2-segment URL paths for image operations

Go's parseURLPath extracts the file extension from all URL formats
including 2-segment paths like /vid,fid.jpg. The Rust version only
handled 3-segment paths (/vid/fid/filename.ext), so extensions in
2-segment paths were lost. This caused image resize/crop operations
requested via query params to be silently skipped for those paths.

* Add size_hint to TrackedBody so throttled downloads get Content-Length

TrackedBody (used for download throttling) did not implement
size_hint(), causing HTTP/1.1 to fall back to chunked transfer
encoding instead of setting Content-Length. Go always sets
Content-Length explicitly for non-range responses.

* Add Last-Modified, pairs, and S3 headers to chunk manifest responses

Go sets Last-Modified, needle pairs, and S3 pass-through headers on
the response writer BEFORE calling tryHandleChunkedFile. Since the
Rust chunk manifest handler created fresh response headers and
returned early, these headers were missing from chunk manifest
responses. Now passes last_modified_str into the chunk manifest
handler and applies pairs and S3 pass-through query params
(response-cache-control, response-content-encoding, etc.) to the
chunk manifest response headers.

* Fix multipart fallback to use first part data when no filename

Go reads the first part's data unconditionally, then looks for a
part with a filename. If none found, Go uses the first part's data
(with empty filename). Rust only captured parts with filenames, so
when no part had a filename it fell back to the raw multipart body
bytes (including boundary delimiters), producing corrupt needle data.

* Set HasName and HasMime flags for empty values matching Go

Go's CreateNeedleFromRequest sets HasName and HasMime flags even when
the filename or MIME type is empty (len < 256 is true for len 0).
Rust skipped empty values, causing the on-disk needle format to
differ: Go-written needles include extra bytes for the empty name/mime
size fields, changing the serialized needle size in the idx entry.
This ensures binary format compatibility between Go and Rust servers.

* Add is_stopping guard to vacuum_volume_commit matching Go

Go's CommitCompactVolume (store_vacuum.go L53-54) checks
s.isStopping before committing compaction to prevent file
swaps during shutdown. The Rust handler was missing this
check, which could allow compaction commits while the
server is stopping.

* Remove disk_type from required status fields since Go omits it

Go's default DiskType is "" (HardDriveType), and protobuf's omitempty
tag causes empty strings to be dropped from JSON output.

* test: honor rust env in dual volume harness

* grpc: notify master after volume lifecycle changes

* http: proxy to replicas before download-limit timeout

* test: pass readMode to rust volume harnesses

* fix store free-location predicate selection

* fix volume copy disk placement and heartbeat notification

* fix chunk manifest delete replication

* fix write replication to survive client disconnects

* fix download limit proxy and wait flow

* fix crop gating for streamed reads

* fix upload limit wait counter behavior

* fix chunk manifest image transforms

* fix has_resize_ops to check width/height > 0 instead of is_some()

Go's shouldResizeImages condition is `width > 0 || height > 0`, so
`?width=0` correctly evaluates to false. Rust was using `is_some()`
which made `?width=0` evaluate to true, unnecessarily disabling
streaming reads for those requests.

* fix Content-MD5 to only compute and return when provided by client

Go only computes the MD5 of uncompressed data when a Content-MD5
header or multipart field is provided. Rust was always computing and
returning it. Also fix the mismatch error message to include size,
matching Go's format.

* fix save_vif to compute ExpireAtSec from TTL

Go's SaveVolumeInfo always computes ExpireAtSec = now + ttlSeconds
when the volume has a TTL. The save_vif path (used by set_read_only
and set_writable) was missing this computation, causing .vif files
to be written without the correct expiration timestamp for TTL volumes.

* fix set_writable to not modify no_write_can_delete

Go's MarkVolumeWritable only sets noWriteOrDelete=false and persists.
Rust was additionally setting no_write_can_delete=has_remote_file,
which could incorrectly change the write mode for remote-file volumes
when the master explicitly asks to make the volume writable.

* fix write_needle_blob_and_index to error on too-small V3 blob

Go returns an error when the needle blob is too small for timestamp
patching. Rust was silently skipping the patch and writing the blob
with a stale/zero timestamp, which could cause data integrity issues
during incremental replication that relies on AppendAtNs ordering.

* fix VolumeEcShardsToVolume to validate dataShards range

Go validates that dataShards is > 0 and <= MaxShardCount before
proceeding with EC-to-volume reconstruction. Without this check,
a zero or excessively large data_shards value could cause confusing
downstream failures.

* fix destroy to use VolumeError::NotEmpty instead of generic Io error

The dedicated NotEmpty variant exists in the enum but was not being
used. This makes error matching consistent with Go's ErrVolumeNotEmpty.

* fix SetState to persist state to disk with rollback on failure

Go's State.Update saves VolumeServerState to a state.pb file after
each SetState call, and rolls back the in-memory state if persistence
fails. Rust was only updating in-memory atomics, so maintenance mode
would be lost on server restart. Now saves protobuf-encoded state.pb
and loads it on startup.

* fix VolumeTierMoveDatToRemote to close local dat backend after upload

Go calls v.LoadRemoteFile() after saving volume info, which closes
the local DataBackend before transitioning to remote storage. Without
this, the volume holds a stale file handle to the deleted local .dat
file, causing reads to fail until server restart.

* fix VolumeTierMoveDatFromRemote to close remote dat backend after download

Go calls v.DataBackend.Close() and sets DataBackend=nil after removing
the remote file reference. Without this, the stale remote backend
state lingers and reads may not discover the newly downloaded local
.dat file until server restart.

* fix redirect to use internal url instead of public_url

Go's proxyReqToTargetServer builds the redirect Location header from
loc.Url (the internal URL), not publicUrl. Using public_url could
cause redirect failures when internal and external URLs differ.

* fix redirect test and add state_file_path to integration test

Update redirect unit test to expect internal url (matching the
previous fix). Add missing state_file_path field to the integration
test VolumeServerState constructor.

* fix FetchAndWriteNeedle to await all writes before checking errors

Go uses a WaitGroup to await all writes (local + replicas) before
checking errors. Rust was short-circuiting on local write failure,
which could leave replica writes in-flight without waiting for
completion.

* fix shutdown to send deregister heartbeat before pre_stop delay

Go's StopHeartbeat() closes stopChan immediately on interrupt, causing
the heartbeat goroutine to send the deregister heartbeat right away,
before the preStopSeconds delay. Rust was only setting is_stopping=true
without waking the heartbeat loop, so the deregister was delayed until
after the pre_stop sleep. Now we call volume_state_notify.notify_one()
to wake the heartbeat immediately.

* fix heartbeat response ordering to check duplicate UUIDs first

Go processes heartbeat responses in this order: DuplicatedUuids first,
then volume options (prealloc/size limit), then leader redirect. Rust
was applying volume options before checking for duplicate UUIDs, which
meant volume option changes would take effect even when the response
contained a duplicate UUID error that should cause an immediate return.

* the test thread was blocked

* fix(deps): update aws-lc-sys 0.38.0 → 0.39.0 to resolve security advisories

Bumps aws-lc-rs 1.16.1 → 1.16.2, pulling in aws-lc-sys 0.39.0 which
fixes GHSA-394x-vwmw-crm3 (X.509 Name Constraints wildcard/unicode
bypass) and GHSA-9f94-5g5w-gf6r (CRL Distribution Point scope check
logic error).

* fix: match Go Content-MD5 mismatch error message format

Go uses "Content-MD5 did not match md5 of file data expected [X]
received [Y] size Z" while Rust had a shorter format. Match the
exact Go error string so clients see identical messages.

* fix: match Go Bearer token length check (> 7, not >= 7)

Go requires len(bearer) > 7 ensuring at least one char after
"Bearer ". Rust used >= 7 which would accept an empty token.

* fix(deps): drop legacy rustls 0.21 to resolve rustls-webpki GHSA-pwjx-qhcg-rvj4

aws-sdk-s3's default "rustls" feature enables tls-rustls in
aws-smithy-runtime, which pulls in legacy-rustls-ring (rustls 0.21
→ rustls-webpki 0.101.7, moderate CRL advisory). Replace with
explicit default-https-client which uses only rustls 0.23 /
rustls-webpki 0.103.9.

* fix: use uploaded filename for auto-compression extension detection

Go extracts the file extension from pu.FileName (the uploaded
filename) for auto-compression decisions. Rust was using the URL
path, which typically has no extension for SeaweedFS file IDs.

* fix: add CRC legacy Value() backward-compat check on needle read

Go double-checks CRC: n.Checksum != crc && uint32(n.Checksum) !=
crc.Value(). The Value() path is a deprecated transform for compat
with seaweed versions prior to commit 056c480eb. Rust had the
legacy_value() method but wasn't using it in validation.

* fix: remove /stats/* endpoints to match Go (commented out since L130)

Go's volume_server.go has the /stats/counter, /stats/memory, and
/stats/disk endpoints commented out (lines 130-134). Remove them
from the Rust router along with the now-unused whitelist_guard
middleware.

* fix: filter application/octet-stream MIME for chunk manifests

Go's tryHandleChunkedFile (L334) filters out application/octet-stream
from chunk manifest MIME types, falling back to extension-based
detection. Rust was returning the stored MIME as-is for manifests.

* fix: VolumeMarkWritable returns error before notifying master

Go returns early at L200 if MarkVolumeWritable fails, before
reaching the master notification at L206. Rust was notifying master
even on failure, creating inconsistent state where master thinks
the volume is writable but local marking failed.

* fix: check volume existence before maintenance in MarkReadonly/Writable

Go's VolumeMarkReadonly (L239-241) and VolumeMarkWritable (L253-255)
look up the volume first, then call makeVolumeReadonly/Writable which
checks maintenance. Rust was checking maintenance first, returning
"maintenance mode" instead of "not found" for missing volumes.

* feat: implement ScrubVolume mark_broken_volumes_readonly (PR #8360)

Add the mark_broken_volumes_readonly flag from PR #8360:
- Sync proto field (tag 3) to local volume_server.proto
- After scrubbing, if flag is set, call makeVolumeReadonly on each
  broken volume (notify master, mark local readonly, notify again)
- Collect errors via joined error semantics matching Go's errors.Join
- Factor out make_volume_readonly helper reused by both
  VolumeMarkReadonly and ScrubVolume

Also refactors VolumeMarkReadonly to use the shared helper.

* fix(deps): update rustls-webpki 0.103.9 → 0.103.10 (GHSA-pwjx-qhcg-rvj4)

CRL Distribution Point matching logic fix for moderate severity
advisory about CRLs not considered authoritative.

* test: update integration tests for removed /stats/* endpoints

Replace tests that expected /stats/* routes to return 200/401 with
tests confirming they now fall through to the store handler (400),
matching Go's commented-out stats endpoints.

* docs: fix misleading comment about default offset feature

The comment said "4-byte offsets unless explicitly built with 5-byte
support" but the default feature enables 5bytes. This is intentional
for production parity with Go -tags 5BytesOffset builds. Fix the
comment to match reality.
2026-03-26 17:24:35 -07:00
Chris Lu
e5f72077ee fix: resolve CORS cache race condition causing stale 404 responses (#8748)
The metadata subscription handler (updateBucketConfigCacheFromEntry) was
making a separate RPC call via loadCORSFromBucketContent to load CORS
configuration. This created a race window where a slow CreateBucket
subscription event could re-cache stale data after PutBucketCors had
already cleared the cache, causing subsequent GetBucketCors to return
404 NoSuchCORSConfiguration.

Parse CORS directly from the subscription entry's Content field instead
of making a separate RPC. Also fix getBucketConfig to parse CORS from
the already-fetched entry, eliminating a redundant RPC call.

Fix TestCORSCaching to use require.NoError to prevent nil pointer
dereference panics when GetBucketCors fails.
2026-03-23 19:33:20 -07:00
Chris Lu
d5ee35c8df Fix S3 delete for non-empty directory markers (#8740)
* Fix S3 delete for non-empty directory markers

* Address review feedback on directory marker deletes

* Stabilize FUSE concurrent directory operations
2026-03-23 13:35:16 -07:00
Chris Lu
d6a872c4b9 Preserve explicit directory markers with octet-stream MIME (#8726)
* Preserve octet-stream MIME on explicit directory markers

* Run empty directory marker regression in CI

* Run S3 Spark workflow for filer changes
2026-03-21 19:31:56 -07:00
Chris Lu
80f3079d2a fix(s3): include directory markers in ListObjects without delimiter (#8704)
* fix(s3): include directory markers in ListObjects without delimiter (#8698)

Directory key objects (zero-byte objects with keys ending in "/") created
via PutObject were omitted from ListObjects/ListObjectsV2 results when no
delimiter was specified. AWS S3 includes these as regular keys in Contents.

The issue was in doListFilerEntries: when recursing into directories in
non-delimiter mode, directory key objects were only emitted when
prefixEndsOnDelimiter was true. Added an else branch to emit them in the
general recursive case as well.

* remove issue reference from inline comment

* test: add child-under-marker and paginated listing coverage

Extend test 6 to place a child object under the directory marker
and paginate with MaxKeys=1 so the emit-then-recurse truncation
path is exercised.

* fix(test): skip directory markers in Spark temporary artifacts check

The listing check now correctly shows directory markers (keys ending
in "/") after the ListObjects fix. These 0-byte metadata objects are
not data artifacts — filter them from the listing check since the
HeadObject-based check already verifies their cleanup with a timeout.
2026-03-19 15:36:11 -07:00
Chris Lu
9984ce7dcb fix(s3): omit NotResource:null from bucket policy JSON response (#8658)
* fix(s3): omit NotResource:null from bucket policy JSON response (#8657)

Change NotResource from value type to pointer (*StringOrStringSlice) so
that omitempty properly omits it when unset, matching the existing
Principal field pattern. This prevents IaC tools (Terraform, Ansible)
from detecting false configuration drift.

Add bucket policy round-trip idempotency integration tests.

* simplify JSON comparison in bucket policy idempotency test

Use require.JSONEq directly on the raw JSON strings instead of
round-tripping through unmarshal/marshal, since JSONEq already
handles normalization internally.

* fix bucket policy test cases that locked out the admin user

The Deny+NotResource test cases used Action:"s3:*" which denied the
admin's own GetBucketPolicy call. Scope deny to s3:GetObject only,
and add an Allow+NotResource variant instead.

* fix(s3): also make Resource a pointer to fix empty string in JSON

Apply the same omitempty pointer fix to the Resource field, which was
emitting "Resource":"" when only NotResource was set. Add
NewStringOrStringSlicePtr helper, make Strings() nil-safe, and handle
*StringOrStringSlice in normalizeToStringSliceWithError.

* improve bucket policy integration tests per review feedback

- Replace time.Sleep with waitForClusterReady using ListBuckets
- Use structural hasKey check instead of brittle substring NotContains
- Assert specific NoSuchBucketPolicy error code after delete
- Handle single-statement policies in hasKey helper
2026-03-16 12:58:26 -07:00
Chris Lu
8cde3d4486 Add data file compaction to iceberg maintenance (Phase 2) (#8503)
* Add iceberg_maintenance plugin worker handler (Phase 1)

Implement automated Iceberg table maintenance as a new plugin worker job
type. The handler scans S3 table buckets for tables needing maintenance
and executes operations in the correct Iceberg order: expire snapshots,
remove orphan files, and rewrite manifests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add data file compaction to iceberg maintenance handler (Phase 2)

Implement bin-packing compaction for small Parquet data files:
- Enumerate data files from manifests, group by partition
- Merge small files using parquet-go (read rows, write merged output)
- Create new manifest with ADDED/DELETED/EXISTING entries
- Commit new snapshot with compaction metadata

Add 'compact' operation to maintenance order (runs before expire_snapshots),
configurable via target_file_size_bytes and min_input_files thresholds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix memory exhaustion in mergeParquetFiles by processing files sequentially

Previously all source Parquet files were loaded into memory simultaneously,
risking OOM when a compaction bin contained many small files. Now each file
is loaded, its rows are streamed into the output writer, and its data is
released before the next file is loaded — keeping peak memory proportional
to one input file plus the output buffer.

* Validate bucket/namespace/table names against path traversal

Reject names containing '..', '/', or '\' in Execute to prevent
directory traversal via crafted job parameters.

* Add filer address failover in iceberg maintenance handler

Try each filer address from cluster context in order instead of only
using the first one. This improves resilience when the primary filer
is temporarily unreachable.

* Add separate MinManifestsToRewrite config for manifest rewrite threshold

The rewrite_manifests operation was reusing MinInputFiles (meant for
compaction bin file counts) as its manifest count threshold. Add a
dedicated MinManifestsToRewrite field with its own config UI section
and default value (5) so the two thresholds can be tuned independently.

* Fix risky mtime fallback in orphan removal that could delete new files

When entry.Attributes is nil, mtime defaulted to Unix epoch (1970),
which would always be older than the safety threshold, causing the
file to be treated as eligible for deletion. Skip entries with nil
Attributes instead, matching the safer logic in operations.go.

* Fix undefined function references in iceberg_maintenance_handler.go

Use the exported function names (ShouldSkipDetectionByInterval,
BuildDetectorActivity, BuildExecutorActivity) matching their
definitions in vacuum_handler.go.

* Remove duplicated iceberg maintenance handler in favor of iceberg/ subpackage

The IcebergMaintenanceHandler and its compaction code in the parent
pluginworker package duplicated the logic already present in the
iceberg/ subpackage (which self-registers via init()). The old code
lacked stale-plan guards, proper path normalization, CAS-based xattr
updates, and error-returning parseOperations.

Since the registry pattern (default "all") makes the old handler
unreachable, remove it entirely. All functionality is provided by
iceberg.Handler with the reviewed improvements.

* Fix MinManifestsToRewrite clamping to match UI minimum of 2

The clamp reset values below 2 to the default of 5, contradicting the
UI's advertised MinValue of 2. Clamp to 2 instead.

* Sort entries by size descending in splitOversizedBin for better packing

Entries were processed in insertion order which is non-deterministic
from map iteration. Sorting largest-first before the splitting loop
improves bin packing efficiency by filling bins more evenly.

* Add context cancellation check to drainReader loop

The row-streaming loop in drainReader did not check ctx between
iterations, making long compaction merges uncancellable. Check
ctx.Done() at the top of each iteration.

* Fix splitOversizedBin to always respect targetSize limit

The minFiles check in the split condition allowed bins to grow past
targetSize when they had fewer than minFiles entries, defeating the
OOM protection. Now bins always split at targetSize, and a trailing
runt with fewer than minFiles entries is merged into the previous bin.

* Add integration tests for iceberg table maintenance plugin worker

Tests start a real weed mini cluster, create S3 buckets and Iceberg
table metadata via filer gRPC, then exercise the iceberg.Handler
operations (ExpireSnapshots, RemoveOrphans, RewriteManifests) against
the live filer. A full maintenance cycle test runs all operations in
sequence and verifies metadata consistency.

Also adds exported method wrappers (testing_api.go) so the integration
test package can call the unexported handler methods.

* Fix splitOversizedBin dropping files and add source path to drainReader errors

The runt-merge step could leave leading bins with fewer than minFiles
entries (e.g. [80,80,10,10] with targetSize=100, minFiles=2 would drop
the first 80-byte file). Replace the filter-based approach with an
iterative merge that folds any sub-minFiles bin into its smallest
neighbor, preserving all eligible files.

Also add the source file path to drainReader error messages so callers
can identify which Parquet file caused a read/write failure.

* Harden integration test error handling

- s3put: fail immediately on HTTP 4xx/5xx instead of logging and
  continuing
- lookupEntry: distinguish NotFound (return nil) from unexpected RPC
  errors (fail the test)
- writeOrphan and orphan creation in FullMaintenanceCycle: check
  CreateEntryResponse.Error in addition to the RPC error

* go fmt

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 11:27:42 -07:00
Chris Lu
3d9f7f6f81 go 1.25 2026-03-09 23:10:27 -07:00
Chris Lu
992db11d2b iam: add IAM group management (#8560)
* iam: add Group message to protobuf schema

Add Group message (name, members, policy_names, disabled) and
add groups field to S3ApiConfiguration for IAM group management
support (issue #7742).

* iam: add group CRUD to CredentialStore interface and all backends

Add group management methods (CreateGroup, GetGroup, DeleteGroup,
ListGroups, UpdateGroup) to the CredentialStore interface with
implementations for memory, filer_etc, postgres, and grpc stores.
Wire group loading/saving into filer_etc LoadConfiguration and
SaveConfiguration.

* iam: add group IAM response types

Add XML response types for group management IAM actions:
CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup,
RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy,
ListAttachedGroupPolicies, ListGroupsForUser.

* iam: add group management handlers to embedded IAM API

Add CreateGroup, DeleteGroup, GetGroup, ListGroups, AddUserToGroup,
RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy,
ListAttachedGroupPolicies, and ListGroupsForUser handlers with
dispatch in ExecuteAction.

* iam: add group management handlers to standalone IAM API

Add group handlers (CreateGroup, DeleteGroup, GetGroup, ListGroups,
AddUserToGroup, RemoveUserFromGroup, AttachGroupPolicy, DetachGroupPolicy,
ListAttachedGroupPolicies, ListGroupsForUser) and wire into DoActions
dispatch. Also add helper functions for user/policy side effects.

* iam: integrate group policies into authorization

Add groups and userGroups reverse index to IdentityAccessManagement.
Populate both maps during ReplaceS3ApiConfiguration and
MergeS3ApiConfiguration. Modify evaluateIAMPolicies to evaluate
policies from user's enabled groups in addition to user policies.
Update VerifyActionPermission to consider group policies when
checking hasAttachedPolicies.

* iam: add group side effects on user deletion and rename

When a user is deleted, remove them from all groups they belong to.
When a user is renamed, update group membership references. Applied
to both embedded and standalone IAM handlers.

* iam: watch /etc/iam/groups directory for config changes

Add groups directory to the filer subscription watcher so group
file changes trigger IAM configuration reloads.

* admin: add group management page to admin UI

Add groups page with CRUD operations, member management, policy
attachment, and enable/disable toggle. Register routes in admin
handlers and add Groups entry to sidebar navigation.

* test: add IAM group management integration tests

Add comprehensive integration tests for group CRUD, membership,
policy attachment, policy enforcement, disabled group behavior,
user deletion side effects, and multi-group membership. Add
"group" test type to CI matrix in s3-iam-tests workflow.

* iam: address PR review comments for group management

- Fix XSS vulnerability in groups.templ: replace innerHTML string
  concatenation with DOM APIs (createElement/textContent) for rendering
  member and policy lists
- Use userGroups reverse index in embedded IAM ListGroupsForUser for
  O(1) lookup instead of iterating all groups
- Add buildUserGroupsIndex helper in standalone IAM handlers; use it
  in ListGroupsForUser and removeUserFromAllGroups for efficient lookup
- Add note about gRPC store load-modify-save race condition limitation

* iam: add defensive copies, validation, and XSS fixes for group management

- Memory store: clone groups on store/retrieve to prevent mutation
- Admin dash: deep copy groups before mutation, validate user/policy exists
- HTTP handlers: translate credential errors to proper HTTP status codes,
  use *bool for Enabled field to distinguish missing vs false
- Groups templ: use data attributes + event delegation instead of inline
  onclick for XSS safety, prevent stale async responses

* iam: add explicit group methods to PropagatingCredentialStore

Add CreateGroup, GetGroup, DeleteGroup, ListGroups, and UpdateGroup
methods instead of relying on embedded interface fallthrough. Group
changes propagate via filer subscription so no RPC propagation needed.

* iam: detect postgres unique constraint violation and add groups index

Return ErrGroupAlreadyExists when INSERT hits SQLState 23505 instead of
a generic error. Add index on groups(disabled) for filtered queries.

* iam: add Marker field to group list response types

Add Marker string field to GetGroupResult, ListGroupsResult,
ListAttachedGroupPoliciesResult, and ListGroupsForUserResult to
match AWS IAM pagination response format.

* iam: check group attachment before policy deletion

Reject DeletePolicy if the policy is attached to any group, matching
AWS IAM behavior. Add PolicyArn to ListAttachedGroupPolicies response.

* iam: include group policies in IAM authorization

Merge policy names from user's enabled groups into the IAMIdentity
used for authorization, so group-attached policies are evaluated
alongside user-attached policies.

* iam: check for name collision before renaming user in UpdateUser

Scan identities and inline policies for newUserName before mutating,
returning EntityAlreadyExists if a collision is found. Reuse the
already-loaded policies instead of loading them again inside the loop.

* test: use t.Cleanup for bucket cleanup in group policy test

* iam: wrap ErrUserNotInGroup sentinel in RemoveGroupMember error

Wrap credential.ErrUserNotInGroup so errors.Is works in
groupErrorToHTTPStatus, returning proper 400 instead of 500.

* admin: regenerate groups_templ.go with XSS-safe data attributes

Regenerated from groups.templ which uses data-group-name attributes
instead of inline onclick with string interpolation.

* iam: add input validation and persist groups during migration

- Validate nil/empty group name in CreateGroup and UpdateGroup
- Save groups in migrateToMultiFile so they survive legacy migration

* admin: use groupErrorToHTTPStatus in GetGroupMembers and GetGroupPolicies

* iam: short-circuit UpdateUser when newUserName equals current name

* iam: require empty PolicyNames before group deletion

Reject DeleteGroup when group has attached policies, matching the
existing members check. Also fix GetGroup error handling in
DeletePolicy to only skip ErrGroupNotFound, not all errors.

* ci: add weed/pb/** to S3 IAM test trigger paths

* test: replace time.Sleep with require.Eventually for propagation waits

Use polling with timeout instead of fixed sleeps to reduce flakiness
in integration tests waiting for IAM policy propagation.

* fix: use credentialManager.GetPolicy for AttachGroupPolicy validation

Policies created via CreatePolicy through credentialManager are stored
in the credential store, not in s3cfg.Policies (which only has static
config policies). Change AttachGroupPolicy to use credentialManager.GetPolicy()
for policy existence validation.

* feat: add UpdateGroup handler to embedded IAM API

Add UpdateGroup action to enable/disable groups and rename groups
via the IAM API. This is a SeaweedFS extension (not in AWS SDK) used
by tests to toggle group disabled status.

* fix: authenticate raw IAM API calls in group tests

The embedded IAM endpoint rejects anonymous requests. Replace
callIAMAPI with callIAMAPIAuthenticated that uses JWT bearer token
authentication via the test framework.

* feat: add UpdateGroup handler to standalone IAM API

Mirror the embedded IAM UpdateGroup handler in the standalone IAM API
for parity.

* fix: add omitempty to Marker XML tags in group responses

Non-truncated responses should not emit an empty <Marker/> element.

* fix: distinguish backend errors from missing policies in AttachGroupPolicy

Return ServiceFailure for credential manager errors instead of masking
them as NoSuchEntity. Also switch ListGroupsForUser to use s3cfg.Groups
instead of in-memory reverse index to avoid stale data. Add duplicate
name check to UpdateGroup rename.

* fix: standalone IAM AttachGroupPolicy uses persisted policy store

Check managed policies from GetPolicies() instead of s3cfg.Policies
so dynamically created policies are found. Also add duplicate name
check to UpdateGroup rename.

* fix: rollback inline policies on UpdateUser PutPolicies failure

If PutPolicies fails after moving inline policies to the new username,
restore both the identity name and the inline policies map to their
original state to avoid a partial-write window.

* fix: correct test cleanup ordering for group tests

Replace scattered defers with single ordered t.Cleanup in each test
to ensure resources are torn down in reverse-creation order:
remove membership, detach policies, delete access keys, delete users,
delete groups, delete policies. Move bucket cleanup to parent test
scope and delete objects before bucket.

* fix: move identity nil check before map lookup and refine hasAttachedPolicies

Move the nil check on identity before accessing identity.Name to
prevent panic. Also refine hasAttachedPolicies to only consider groups
that are enabled and have actual policies attached, so membership in
a no-policy group doesn't incorrectly trigger IAM authorization.

* fix: fail group reload on unreadable or corrupt group files

Return errors instead of logging and continuing when group files
cannot be read or unmarshaled. This prevents silently applying a
partial IAM config with missing group memberships or policies.

* fix: use errors.Is for sql.ErrNoRows comparison in postgres group store

* docs: explain why group methods skip propagateChange

Group changes propagate to S3 servers via filer subscription
(watching /etc/iam/groups/) rather than gRPC RPCs, since there
are no group-specific RPCs in the S3 cache protocol.

* fix: remove unused policyNameFromArn and strings import

* fix: update service account ParentUser on user rename

When renaming a user via UpdateUser, also update ParentUser references
in service accounts to prevent them from becoming orphaned after the
next configuration reload.

* fix: wrap DetachGroupPolicy error with ErrPolicyNotAttached sentinel

Use credential.ErrPolicyNotAttached so groupErrorToHTTPStatus maps
it to 400 instead of falling back to 500.

* fix: use admin S3 client for bucket cleanup in enforcement test

The user S3 client may lack permissions by cleanup time since the
user is removed from the group in an earlier subtest. Use the admin
S3 client to ensure bucket and object cleanup always succeeds.

* fix: add nil guard for group param in propagating store log calls

Prevent potential nil dereference when logging group.Name in
CreateGroup and UpdateGroup of PropagatingCredentialStore.

* fix: validate Disabled field in UpdateGroup handlers

Reject values other than "true" or "false" with InvalidInputException
instead of silently treating them as false.

* fix: seed mergedGroups from existing groups in MergeS3ApiConfiguration

Previously the merge started with empty group maps, dropping any
static-file groups. Now seeds from existing iam.groups before
overlaying dynamic config, and builds the reverse index after
merging to avoid stale entries from overridden groups.

* fix: use errors.Is for filer_pb.ErrNotFound comparison in group loading

Replace direct equality (==) with errors.Is() to correctly match
wrapped errors, consistent with the rest of the codebase.

* fix: add ErrUserNotFound and ErrPolicyNotFound to groupErrorToHTTPStatus

Map these sentinel errors to 404 so AddGroupMember and
AttachGroupPolicy return proper HTTP status codes.

* fix: log cleanup errors in group integration tests

Replace fire-and-forget cleanup calls with error-checked versions
that log failures via t.Logf for debugging visibility.

* fix: prevent duplicate group test runs in CI matrix

The basic lane's -run "TestIAM" regex also matched TestIAMGroup*
tests, causing them to run in both the basic and group lanes.
Replace with explicit test function names.

* fix: add GIN index on groups.members JSONB for membership lookups

Without this index, ListGroupsForUser and membership queries
require full table scans on the groups table.

* fix: handle cross-directory moves in IAM config subscription

When a file is moved out of an IAM directory (e.g., /etc/iam/groups),
the dir variable was overwritten with NewParentPath, causing the
source directory change to be missed. Now also notifies handlers
about the source directory for cross-directory moves.

* fix: validate members/policies before deleting group in admin handler

AdminServer.DeleteGroup now checks for attached members and policies
before delegating to credentialManager, matching the IAM handler guards.

* fix: merge groups by name instead of blind append during filer load

Match the identity loader's merge behavior: find existing group
by name and replace, only append when no match exists. Prevents
duplicates when legacy and multi-file configs overlap.

* fix: check DeleteEntry response error when cleaning obsolete group files

Capture and log resp.Error from filer DeleteEntry calls during
group file cleanup, matching the pattern used in deleteGroupFile.

* fix: verify source user exists before no-op check in UpdateUser

Reorder UpdateUser to find the source identity first and return
NoSuchEntityException if not found, before checking if the rename
is a no-op. Previously a non-existent user renamed to itself
would incorrectly return success.

* fix: update service account parent refs on user rename in embedded IAM

The embedded IAM UpdateUser handler updated group membership but
not service account ParentUser fields, unlike the standalone handler.

* fix: replay source-side events for all handlers on cross-dir moves

Pass nil newEntry to bucket, IAM, and circuit-breaker handlers for
the source directory during cross-directory moves, so all watchers
can clear caches for the moved-away resource.

* fix: don't seed mergedGroups from existing iam.groups in merge

Groups are always dynamic (from filer), never static (from s3.config).
Seeding from iam.groups caused stale deleted groups to persist.
Now only uses config.Groups from the dynamic filer config.

* fix: add deferred user cleanup in TestIAMGroupUserDeletionSideEffect

Register t.Cleanup for the created user so it gets cleaned up
even if the test fails before the inline DeleteUser call.

* fix: assert UpdateGroup HTTP status in disabled group tests

Add require.Equal checks for 200 status after UpdateGroup calls
so the test fails immediately on API errors rather than relying
on the subsequent Eventually timeout.

* fix: trim whitespace from group name in filer store operations

Trim leading/trailing whitespace from group.Name before validation
in CreateGroup and UpdateGroup to prevent whitespace-only filenames.
Also merge groups by name during multi-file load to prevent duplicates.

* fix: add nil/empty group validation in gRPC store

Guard CreateGroup and UpdateGroup against nil group or empty name
to prevent panics and invalid persistence.

* fix: add nil/empty group validation in postgres store

Guard CreateGroup and UpdateGroup against nil group or empty name
to prevent panics from nil member access and empty-name row inserts.

* fix: add name collision check in embedded IAM UpdateUser

The embedded IAM handler renamed users without checking if the
target name already existed, unlike the standalone handler.

* fix: add ErrGroupNotEmpty sentinel and map to HTTP 409

AdminServer.DeleteGroup now wraps conflict errors with
ErrGroupNotEmpty, and groupErrorToHTTPStatus maps it to
409 Conflict instead of 500.

* fix: use appropriate error message in GetGroupDetails based on status

Return "Group not found" only for 404, use "Failed to retrieve group"
for other error statuses instead of always saying "Group not found".

* fix: use backend-normalized group.Name in CreateGroup response

After credentialManager.CreateGroup may normalize the name (e.g.,
trim whitespace), use group.Name instead of the raw input for
the returned GroupData to ensure consistency.

* fix: add nil/empty group validation in memory store

Guard CreateGroup and UpdateGroup against nil group or empty name
to prevent panics from nil pointer dereference on map access.

* fix: reorder embedded IAM UpdateUser to verify source first

Find the source identity before checking for collisions, matching
the standalone handler's logic. Previously a non-existent user
renamed to an existing name would get EntityAlreadyExists instead
of NoSuchEntity.

* fix: handle same-directory renames in metadata subscription

Replay a delete event for the old entry name during same-directory
renames so handlers like onBucketMetadataChange can clean up stale
state for the old name.

* fix: abort GetGroups on non-ErrGroupNotFound errors

Only skip groups that return ErrGroupNotFound. Other errors (e.g.,
transient backend failures) now abort the handler and return the
error to the caller instead of silently producing partial results.

* fix: add aria-label and title to icon-only group action buttons

Add accessible labels to View and Delete buttons so screen readers
and tooltips provide meaningful context.

* fix: validate group name in saveGroup to prevent invalid filenames

Trim whitespace and reject empty names before writing group JSON
files, preventing creation of files like ".json".

* fix: add /etc/iam/groups to filer subscription watched directories

The groups directory was missing from the watched directories list,
so S3 servers in a cluster would not detect group changes made by
other servers via filer. The onIamConfigChange handler already had
code to handle group directory changes but it was never triggered.

* add direct gRPC propagation for group changes to S3 servers

Groups now have the same dual propagation as identities and policies:
direct gRPC push via propagateChange + async filer subscription.

- Add PutGroup/RemoveGroup proto messages and RPCs
- Add PutGroup/RemoveGroup in-memory cache methods on IAM
- Add PutGroup/RemoveGroup gRPC server handlers
- Update PropagatingCredentialStore to call propagateChange on group mutations

* reduce log verbosity for config load summary

Change ReplaceS3ApiConfiguration log from Infof to V(1).Infof
to avoid noisy output on every config reload.

* admin: show user groups in view and edit user modals

- Add Groups field to UserDetails and populate from credential manager
- Show groups as badges in user details view modal
- Add group management to edit user modal: display current groups,
  add to group via dropdown, remove from group via badge x button

* fix: remove duplicate showAlert that broke modal-alerts.js

admin.js defined showAlert(type, message) which overwrote the
modal-alerts.js version showAlert(message, type), causing broken
unstyled alert boxes. Remove the duplicate and swap all callers
in admin.js to use the correct (message, type) argument order.

* fix: unwrap groups API response in edit user modal

The /api/groups endpoint returns {"groups": [...]}, not a bare array.

* Update object_store_users_templ.go

* test: assert AccessDenied error code in group denial tests

Replace plain assert.Error checks with awserr.Error type assertion
and AccessDenied code verification, matching the pattern used in
other IAM integration tests.

* fix: propagate GetGroups errors in ShowGroups handler

getGroupsPageData was swallowing errors and returning an empty page
with 200 status. Now returns the error so ShowGroups can respond
with a proper error status.

* fix: reject AttachGroupPolicy when credential manager is nil

Previously skipped policy existence validation when credentialManager
was nil, allowing attachment of nonexistent policies. Now returns
a ServiceFailureException error.

* fix: preserve groups during partial MergeS3ApiConfiguration updates

UpsertIdentity calls MergeS3ApiConfiguration with a partial config
containing only the updated identity (nil Groups). This was wiping
all in-memory group state. Now only replaces groups when
config.Groups is non-nil (full config reload).

* fix: propagate errors from group lookup in GetObjectStoreUserDetails

ListGroups and GetGroup errors were silently ignored, potentially
showing incomplete group data in the UI.

* fix: use DOM APIs for group badge remove button to prevent XSS

Replace innerHTML with onclick string interpolation with DOM
createElement + addEventListener pattern. Also add aria-label
and title to the add-to-group button.

* fix: snapshot group policies under RLock to prevent concurrent map access

evaluateIAMPolicies was copying the map reference via groupMap :=
iam.groups under RLock then iterating after RUnlock, while PutGroup
mutates the map in-place. Now copies the needed policy names into
a slice while holding the lock.

* fix: add nil IAM check to PutGroup and RemoveGroup gRPC handlers

Match the nil guard pattern used by PutPolicy/DeletePolicy to
prevent nil pointer dereference when IAM is not initialized.
2026-03-09 11:54:32 -07:00
Chris Lu
78a3441b30 fix: volume balance detection returns multiple tasks per run (#8559)
* fix: volume balance detection now returns multiple tasks per run (#8551)

Previously, detectForDiskType() returned at most 1 balance task per disk
type, making the MaxJobsPerDetection setting ineffective. The detection
loop now iterates within each disk type, planning multiple moves until
the imbalance drops below threshold or maxResults is reached. Effective
volume counts are adjusted after each planned move so the algorithm
correctly re-evaluates which server is overloaded.

* fix: factor pending tasks into destination scoring and use UnixNano for task IDs

- Use UnixNano instead of Unix for task IDs to avoid collisions when
  multiple tasks are created within the same second
- Adjust calculateBalanceScore to include LoadCount (pending + assigned
  tasks) in the utilization estimate, so the destination picker avoids
  stacking multiple planned moves onto the same target disk

* test: add comprehensive balance detection tests for complex scenarios

Cover multi-server convergence, max-server shifting, destination
spreading, pre-existing pending task skipping, no-duplicate-volume
invariant, and parameterized convergence verification across different
cluster shapes and thresholds.

* fix: address PR review findings in balance detection

- hasMore flag: compute from len(results) >= maxResults so the scheduler
  knows more pages may exist, matching vacuum/EC handler pattern
- Exhausted server fallthrough: when no eligible volumes remain on the
  current maxServer (all have pending tasks) or destination planning
  fails, mark the server as exhausted and continue to the next
  overloaded server instead of stopping the entire detection loop
- Return canonical destination server ID directly from createBalanceTask
  instead of resolving via findServerIDByAddress, eliminating the
  fragile address→ID lookup for adjustment tracking
- Fix bestScore sentinel: use math.Inf(-1) instead of -1.0 so disks
  with negative scores (high pending load, same rack/DC) are still
  selected as the best available destination
- Add TestDetection_ExhaustedServerFallsThrough covering the scenario
  where the top server's volumes are all blocked by pre-existing tasks

* test: fix computeEffectiveCounts and add len guard in no-duplicate test

- computeEffectiveCounts now takes a servers slice to seed counts for all
  known servers (including empty ones) and uses an address→ID map from
  the topology spec instead of scanning metrics, so destination servers
  with zero initial volumes are tracked correctly
- TestDetection_NoDuplicateVolumesAcrossIterations now asserts len > 1
  before checking duplicates, so the test actually fails if Detection
  regresses to returning a single task

* fix: remove redundant HasAnyTask check in createBalanceTask

The HasAnyTask check in createBalanceTask duplicated the same check
already performed in detectForDiskType's volume selection loop.
Since detection runs single-threaded (MaxDetectionConcurrency: 1),
no race can occur between the two points.

* fix: consistent hasMore pattern and remove double-counted LoadCount in scoring

- Adopt vacuum_handler's hasMore pattern: over-fetch by 1, check
  len > maxResults, and truncate — consistent truncation semantics
- Remove direct LoadCount penalty in calculateBalanceScore since
  LoadCount is already factored into effectiveVolumeCount for
  utilization scoring; bump utilization weight from 40 to 50 to
  compensate for the removed 10-point load penalty

* fix: handle zero maxResults as no-cap, emit trace after trim, seed empty servers

- When MaxResults is 0 (omitted), treat as no explicit cap instead of
  defaulting to 1; only apply the +1 over-fetch probe when caller
  supplies a positive limit
- Move decision trace emission after hasMore/trim so the trace
  accurately reflects the returned proposals
- Seed serverVolumeCounts from ActiveTopology so servers that have a
  matching disk type but zero volumes are included in the imbalance
  calculation and MinServerCount check

* fix: nil-guard clusterInfo, uncap legacy DetectionFunc, deterministic disk type order

- Add early nil guard for clusterInfo in Detection to prevent panics
  in downstream helpers (detectForDiskType, createBalanceTask)
- Change register.go DetectionFunc wrapper from maxResults=1 to 0
  (no cap) so the legacy code path returns all detected tasks
- Sort disk type keys before iteration so results are deterministic
  when maxResults spans multiple disk types (HDD/SSD)

* fix: don't over-fetch in stateful detection to avoid orphaned pending tasks

Detection registers planned moves in ActiveTopology via AddPendingTask,
so requesting maxResults+1 would create an extra pending task that gets
discarded during trim. Use len(results) >= maxResults as the hasMore
signal instead, which is correct since Detection already caps internally.

* fix: return explicit truncated flag from Detection instead of approximating

Detection now returns (results, truncated, error) where truncated is true
only when the loop stopped because it hit maxResults, not when it ran out
of work naturally. This eliminates false hasMore signals when detection
happens to produce exactly maxResults results by resolving the imbalance.

* cleanup: simplify detection logic and remove redundancies

- Remove redundant clusterInfo nil check in detectForDiskType since
  Detection already guards against nil clusterInfo
- Remove adjustments loop for destination servers not in
  serverVolumeCounts — topology seeding ensures all servers with
  matching disk type are already present
- Merge two-loop min/max calculation into a single loop: min across
  all servers, max only among non-exhausted servers
- Replace magic number 100 with len(metrics) for minC initialization
  in convergence test

* fix: accurate truncation flag, deterministic server order, indexed volume lookup

- Track balanced flag to distinguish "hit maxResults cap" from "cluster
  balanced at exactly maxResults" — truncated is only true when there's
  genuinely more work to do
- Sort servers for deterministic iteration and tie-breaking when
  multiple servers have equal volume counts
- Pre-index volumes by server with per-server cursors to avoid
  O(maxResults * volumes) rescanning on each iteration
- Add truncation flag assertions to RespectsMaxResults test: true when
  capped, false when detection finishes naturally

* fix: seed trace server counts from ActiveTopology to match detection logic

The decision trace was building serverVolumeCounts only from metrics,
missing zero-volume servers seeded from ActiveTopology by Detection.
This could cause the trace to report wrong server counts, incorrect
imbalance ratios, or spurious "too few servers" messages. Pass
activeTopology into the trace function and seed server counts the
same way Detection does.

* fix: don't exhaust server on per-volume planning failure, sort volumes by ID

- When createBalanceTask returns nil, continue to the next volume on
  the same server instead of marking the entire server as exhausted.
  The failure may be volume-specific (not found in topology, pending
  task registration failed) and other volumes on the server may still
  be viable candidates.
- Sort each server's volume slice by VolumeID after pre-indexing so
  volume selection is fully deterministic regardless of input order.

* fix: use require instead of assert to prevent nil dereference panic in CORS test

The test used assert.NoError (non-fatal) for GetBucketCors, then
immediately accessed getResp.CORSRules. When the API returns an error,
getResp is nil causing a panic. Switch to require.NoError/NotNil/Len
so the test stops before dereferencing a nil response.

* fix: deterministic disk tie-breaking and stronger pre-existing task test

- Sort available disks by NodeID then DiskID before scoring so
  destination selection is deterministic when two disks score equally
- Add task count bounds assertion to SkipsPreExistingPendingTasks test:
  with 15 of 20 volumes already having pending tasks, at most 5 new
  tasks should be created and at least 1 (imbalance still exists)

* fix: seed adjustments from existing pending/assigned tasks to prevent over-scheduling

Detection now calls ActiveTopology.GetTaskServerAdjustments() to
initialize the adjustments map with source/destination deltas from
existing pending and assigned balance tasks. This ensures
effectiveCounts reflects in-flight moves, preventing the algorithm
from planning additional moves in the same direction when prior
moves already address the imbalance.

Added GetTaskServerAdjustments(taskType) to ActiveTopology which
iterates pending and assigned tasks, decrementing source servers
and incrementing destination servers for the given task type.
2026-03-08 21:34:03 -07:00
Chris Lu
df5e8210df Implement IAM managed policy operations (#8507)
* feat: Implement IAM managed policy operations (GetPolicy, ListPolicies, DeletePolicy, AttachUserPolicy, DetachUserPolicy)

- Add response type aliases in iamapi_response.go for managed policy operations
- Implement 6 handler methods in iamapi_management_handlers.go:
  - GetPolicy: Lookup managed policy by ARN
  - DeletePolicy: Remove managed policy
  - ListPolicies: List all managed policies
  - AttachUserPolicy: Attach managed policy to user, aggregating inline + managed actions
  - DetachUserPolicy: Detach managed policy from user
  - ListAttachedUserPolicies: List user's attached managed policies
- Add computeAllActionsForUser() to aggregate actions from both inline and managed policies
- Wire 6 new DoActions switch cases for policy operations
- Add comprehensive tests for all new handlers
- Fixes #8506

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback for IAM managed policy operations

- Add parsePolicyArn() helper with proper ARN prefix validation, replacing
  fragile strings.Split parsing in GetPolicy, DeletePolicy, AttachUserPolicy,
  and DetachUserPolicy
- DeletePolicy now detaches the policy from all users and recomputes their
  aggregated actions, preventing stale permissions after deletion
- Set changed=true for DeletePolicy DoActions case so identity updates persist
- Make PolicyId consistent: CreatePolicy now uses Hash(&policyName) matching
  GetPolicy and ListPolicies
- Remove redundant nil map checks (Go handles nil map lookups safely)
- DRY up action deduplication in computeAllActionsForUser with addUniqueActions
  closure
- Add tests for invalid/empty ARN rejection and DeletePolicy identity cleanup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add integration tests for managed policy lifecycle (#8506)

Add two integration tests covering the user-reported use case where
managed policy operations returned 500 errors:

- TestS3IAMManagedPolicyLifecycle: end-to-end workflow matching the
  issue report — CreatePolicy, ListPolicies, GetPolicy, AttachUserPolicy,
  ListAttachedUserPolicies, idempotent re-attach, DeletePolicy while
  attached (expects DeleteConflict), DetachUserPolicy, DeletePolicy,
  and verification that deleted policy is gone

- TestS3IAMManagedPolicyErrorCases: covers error paths — nonexistent
  policy/user for GetPolicy, DeletePolicy, AttachUserPolicy,
  DetachUserPolicy, and ListAttachedUserPolicies

Also fixes DeletePolicy to reject deletion when policy is still attached
to a user (AWS-compatible DeleteConflictException), and adds the 409
status code mapping for DeleteConflictException in the error response
handler.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: nil map panic in CreatePolicy, add PolicyId test assertions

- Initialize policies.Policies map in CreatePolicy if nil (prevents panic
  when no policies exist yet); also handle filer_pb.ErrNotFound like other
  callers
- Add PolicyId assertions in TestGetPolicy and TestListPolicies to lock in
  the consistent Hash(&policyName) behavior
- Remove redundant time.Sleep calls from new integration tests (startMiniCluster
  already blocks on waitForS3Ready)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: PutUserPolicy and DeleteUserPolicy now preserve managed policy actions

PutUserPolicy and DeleteUserPolicy were calling computeAggregatedActionsForUser
(inline-only), overwriting ident.Actions and dropping managed policy actions.
Both now call computeAllActionsForUser which unions inline + managed actions.

Add TestManagedPolicyActionsPreservedAcrossInlineMutations regression test:
attaches a managed policy, adds an inline policy (verifies both actions present),
deletes the inline policy, then asserts managed policy actions still persist.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: PutUserPolicy verifies user exists before persisting inline policy

Previously the inline policy was written to storage before checking if the
target user exists in s3cfg.Identities, leaving orphaned policy data when
the user was absent. Now validates the user first, returning
NoSuchEntityException immediately if not found.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prevent stale/lost actions on computeAllActionsForUser failure

- PutUserPolicy: on recomputation failure, preserve existing ident.Actions
  instead of falling back to only the current inline policy's actions
- DeleteUserPolicy: on recomputation failure, preserve existing ident.Actions
  instead of assigning nil (which wiped all permissions)
- AttachUserPolicy: roll back ident.PolicyNames and return error if
  action recomputation fails, keeping identity consistent
- DetachUserPolicy: roll back ident.PolicyNames and return error if
  GetPolicies or action recomputation fails
- Add doc comment on newTestIamApiServer noting it only sets s3ApiConfig

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 14:18:07 -08:00
Chris Lu
10a30a83e1 s3api: add GetObjectAttributes API support (#8504)
* s3api: add error code and header constants for GetObjectAttributes

Add ErrInvalidAttributeName error code and header constants
(X-Amz-Object-Attributes, X-Amz-Max-Parts, X-Amz-Part-Number-Marker,
X-Amz-Delete-Marker) needed by the S3 GetObjectAttributes API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* s3api: implement GetObjectAttributes handler

Add GetObjectAttributesHandler that returns selected object metadata
(ETag, Checksum, StorageClass, ObjectSize, ObjectParts) without
returning the object body. Follows the same versioning and conditional
header patterns as HeadObjectHandler.

The handler parses the X-Amz-Object-Attributes header to determine
which attributes to include in the XML response, and supports
ObjectParts pagination via X-Amz-Max-Parts and X-Amz-Part-Number-Marker.

Ref: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectAttributes.html

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* s3api: register GetObjectAttributes route

Register the GET /{object}?attributes route for the
GetObjectAttributes API, placed before other object query
routes to ensure proper matching.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* s3api: add integration tests for GetObjectAttributes

Test coverage:
- Basic: simple object with all attribute types
- MultipartObject: multipart upload with parts pagination
- SelectiveAttributes: requesting only specific attributes
- InvalidAttribute: server rejects invalid attribute names
- NonExistentObject: returns NoSuchKey for missing objects

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* s3api: add versioned object test for GetObjectAttributes

Test puts two versions of the same object and verifies that:
- GetObjectAttributes returns the latest version by default
- GetObjectAttributes with versionId returns the specific version
- ObjectSize and VersionId are correct for each version

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* s3api: fix combined conditional header evaluation per RFC 7232

Per RFC 7232:
- Section 3.4: If-Unmodified-Since MUST be ignored when If-Match is
  present (If-Match is the more accurate replacement)
- Section 3.3: If-Modified-Since MUST be ignored when If-None-Match is
  present (If-None-Match is the more accurate replacement)

Previously, all four conditional headers were evaluated independently.
This caused incorrect 412 responses when If-Match succeeded but
If-Unmodified-Since failed (should return 200 per AWS S3 behavior).

Fix applied to both validateConditionalHeadersForReads (GET/HEAD) and
validateConditionalHeaders (PUT) paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* s3api: add conditional header combination tests for GetObjectAttributes

Test the RFC 7232 combined conditional header semantics:
- If-Match=true + If-Unmodified-Since=false => 200 (If-Unmodified-Since ignored)
- If-None-Match=false + If-Modified-Since=true => 304 (If-Modified-Since ignored)
- If-None-Match=true + If-Modified-Since=false => 200 (If-Modified-Since ignored)
- If-Match=true + If-Unmodified-Since=true => 200
- If-Match=false => 412 regardless

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* s3api: document Checksum attribute as not yet populated

Checksum is accepted in validation (so clients requesting it don't get
a 400 error, matching AWS behavior for objects without checksums) but
SeaweedFS does not yet store S3 checksums. Add a comment explaining
this and noting where to populate it when checksum storage is added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* s3api: add s3:GetObjectAttributes IAM action for ?attributes query

Previously, GET /{object}?attributes resolved to s3:GetObject via the
fallback path since resolveFromQueryParameters had no case for the
"attributes" query parameter.

Add S3_ACTION_GET_OBJECT_ATTRIBUTES constant ("s3:GetObjectAttributes")
and a branch in resolveFromQueryParameters to return it for GET requests
with the "attributes" query parameter, so IAM policies can distinguish
GetObjectAttributes from GetObject.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* s3api: evaluate conditional headers after version resolution

Move conditional header evaluation (If-Match, If-None-Match, etc.) to
after the version resolution step in GetObjectAttributesHandler. This
ensures that when a specific versionId is requested, conditions are
checked against the correct version entry rather than always against
the latest version.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* s3api: use bounded HTTP client in GetObjectAttributes tests

Replace http.DefaultClient with a timeout-aware http.Client (10s) in
the signedGetObjectAttributes helper and testGetObjectAttributesInvalid
to prevent tests from hanging indefinitely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* s3api: check attributes query before versionId in action resolver

Move the GetObjectAttributes action check before the versionId check
in resolveFromQueryParameters. This fixes GET /bucket/key?attributes&versionId=xyz
being incorrectly classified as s3:GetObjectVersion instead of
s3:GetObjectAttributes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* s3api: add tests for versioned conditional headers and action resolver

Add integration test that verifies conditional headers (If-Match,
If-None-Match) are evaluated against the requested version entry, not
the latest version. This covers the fix in 55c409dec.

Add unit test for ResolveS3Action verifying that the attributes query
parameter takes precedence over versionId, so GET ?attributes&versionId
resolves to s3:GetObjectAttributes. This covers the fix in b92c61c95.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* s3api: guard negative chunk indices and rename PartsCount field

Add bounds checks for b.StartChunk >= 0 and b.EndChunk >= 0 in
buildObjectAttributesParts to prevent panics from corrupted metadata
with negative index values.

Rename ObjectAttributesParts.PartsCount to TotalPartsCount to match
the AWS SDK v2 Go field naming convention, while preserving the XML
element name "PartsCount" via the struct tag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* s3api: reject malformed max-parts and part-number-marker headers

Return ErrInvalidMaxParts and ErrInvalidPartNumberMarker when the
X-Amz-Max-Parts or X-Amz-Part-Number-Marker headers contain
non-integer or negative values, matching ListObjectPartsHandler
behavior. Previously these were silently ignored with defaults.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 12:52:09 -08:00
blitt001
3d81d5bef7 Fix S3 signature verification behind reverse proxies (#8444)
* Fix S3 signature verification behind reverse proxies

When SeaweedFS is deployed behind a reverse proxy (e.g. nginx, Kong,
Traefik), AWS S3 Signature V4 verification fails because the Host header
the client signed with (e.g. "localhost:9000") differs from the Host
header SeaweedFS receives on the backend (e.g. "seaweedfs:8333").

This commit adds a new -s3.externalUrl parameter (and S3_EXTERNAL_URL
environment variable) that tells SeaweedFS what public-facing URL clients
use to connect. When set, SeaweedFS uses this host value for signature
verification instead of the Host header from the incoming request.

New parameter:
  -s3.externalUrl  (flag) or S3_EXTERNAL_URL (environment variable)
  Example: -s3.externalUrl=http://localhost:9000
  Example: S3_EXTERNAL_URL=https://s3.example.com

The environment variable is particularly useful in Docker/Kubernetes
deployments where the external URL is injected via container config.
The flag takes precedence over the environment variable when both are set.

At startup, the URL is parsed and default ports are stripped to match
AWS SDK behavior (port 80 for HTTP, port 443 for HTTPS), so
"http://s3.example.com:80" and "http://s3.example.com" are equivalent.

Bugs fixed:
- Default port stripping was removed by a prior PR, causing signature
  mismatches when clients connect on standard ports (80/443)
- X-Forwarded-Port was ignored when X-Forwarded-Host was not present
- Scheme detection now uses proper precedence: X-Forwarded-Proto >
  TLS connection > URL scheme > "http"
- Test expectations for standard port stripping were incorrect
- expectedHost field in TestSignatureV4WithForwardedPort was declared
  but never actually checked (self-referential test)

* Add Docker integration test for S3 proxy signature verification

Docker Compose setup with nginx reverse proxy to validate that the
-s3.externalUrl parameter (or S3_EXTERNAL_URL env var) correctly
resolves S3 signature verification when SeaweedFS runs behind a proxy.

The test uses nginx proxying port 9000 to SeaweedFS on port 8333,
with X-Forwarded-Host/Port/Proto headers set. SeaweedFS is configured
with -s3.externalUrl=http://localhost:9000 so it uses "localhost:9000"
for signature verification, matching what the AWS CLI signs with.

The test can be run with aws CLI on the host or without it by using
the amazon/aws-cli Docker image with --network host.

Test covers: create-bucket, list-buckets, put-object, head-object,
list-objects-v2, get-object, content round-trip integrity,
delete-object, and delete-bucket — all through the reverse proxy.

* Create s3-proxy-signature-tests.yml

* fix CLI

* fix CI

* Update s3-proxy-signature-tests.yml

* address comments

* Update Dockerfile

* add user

* no need for fuse

* Update s3-proxy-signature-tests.yml

* debug

* weed mini

* fix health check

* health check

* fix health checking

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-02-26 14:20:42 -08:00
Chris Lu
a92e9baddf Add integration test for multipart operations inheriting s3:PutObject permissions
TestS3MultipartOperationsInheritPutObjectPermissions verifies that multipart
upload operations (CreateMultipartUpload, UploadPart, ListParts,
CompleteMultipartUpload, AbortMultipartUpload, ListMultipartUploads) work
correctly when a user has only s3:PutObject permission granted.

This test validates the behavior where multipart operations are implicitly
granted when s3:PutObject is authorized, as multipart upload is an
implementation detail of putting objects in S3.
2026-02-25 13:38:14 -08:00
Chris Lu
b565a0cc86 Adds volume.merge command with deduplication and disk-based backend (#8441)
* Enhance volume.merge command with deduplication and disk-based backend

* Fix copyVolume function call with correct argument order and missing bool parameter

* Revert "Fix copyVolume function call with correct argument order and missing bool parameter"

This reverts commit 7b4a190643576fec11f896b26bcad03dd02da2f7.

* Fix critical issues: per-replica writable tracking, tail goroutine cancellation via done channel, and debug logging for allocation failures

* Optimize memory usage with watermark approach for duplicate detection

* Fix critical issues: swap copyVolume arguments, increase idle timeout, remove file double-close, use glog for logging

* Replace temporary file with in-memory buffer for needle blob serialization

* test(volume.merge): Add comprehensive unit and integration tests

Add 7 unit tests covering:
- Ordering by timestamp
- Cross-stream duplicate deduplication
- Empty stream handling
- Complex multi-stream deduplication
- Single stream passthrough
- Large needle ID support
- LastModified fallback when timestamp unavailable

Add 2 integration validation tests:
- TestMergeWorkflowValidation: Documents 9-stage merge workflow
- TestMergeEdgeCaseHandling: Validates 10 edge case handling

All tests passing (9/9)

* fix(volume.merge): Use time window for deduplication to handle clock skew

The same needle ID can have different timestamps on different servers due to
clock skew and replication lag. Needles with the same ID within a 5-second
time window are now treated as duplicates (same write with timestamp variance).

Key changes:
- Add mergeDeduplicationWindowNs constant (5 seconds)
- Replace exact timestamp matching with time window comparison
- Use windowInitialized flag to properly detect window transitions
- Add TestMergeNeedleStreamsTimeWindowDeduplication test

This ensures that replicated writes with slight timestamp differences are
properly deduplicated during merge, while separate updates to the same file
ID (outside the window) are preserved.

All tests passing (10/10)

* test: Add volume.merge integration tests with 5 comprehensive test cases

* test: integration tests for volume.merge command

* Fix integration tests: use TripleVolumeCluster for volume.merge testing

- Created new TripleVolumeCluster framework (cluster_triple.go) with 3 volume servers
- Rebuilt weed binary with volume.merge command compiled in
- Updated all 5 integration tests to use TripleVolumeCluster instead of DualVolumeCluster
- Tests now properly allocate volumes on 2 servers and let merge allocate on 3rd
- All 5 integration tests now pass:
  - TestVolumeMergeBasic
  - TestVolumeMergeReadonly
  - TestVolumeMergeRestore
  - TestVolumeMergeTailNeedles
  - TestVolumeMergeDivergentReplicas

* Refactor test framework: use parameterized server count instead of hardcoded

- Renamed TripleVolumeCluster to MultiVolumeCluster with serverCount parameter
- Replaced hardcoded volumePort0/1/2 with slices for flexible server count
- Updated StartTripleVolumeCluster as backward-compatible wrapper calling StartMultiVolumeCluster(t, profile, 3)
- Made directory creation, port allocation, and server startup loop-based
- Updated accessor methods (VolumeAdminAddress, VolumeGRPCAddress, etc.) to support any server count
- All 5 integration tests continue to pass with new parameterized cluster framework
- Enables future testing with 2, 4, 5+ volume servers by calling StartMultiVolumeCluster directly

* Consolidate cluster frameworks: StartDualVolumeCluster now uses MultiVolumeCluster

- Made DualVolumeCluster a type alias for MultiVolumeCluster
- Updated StartDualVolumeCluster to call StartMultiVolumeCluster(t, profile, 2)
- Removed duplicate code from cluster_dual.go (now just 17 lines)
- All existing tests using StartDualVolumeCluster continue to work without changes
- Backward compatible: existing code continues to use the old function signatures
- Added wrapper functions in cluster_multi.go for StartTripleVolumeCluster
- Enables unified cluster management across all test suites

* Address PR review comments: improve error handling and clean up code

- Replace parse error swallow with proper error return
- Log cleanup and restoration errors instead of silently discarding them
- Remove unused offset field from memoryBackendFile struct
- Fix WriteAt buffer truncation bug to preserve trailing bytes
- All unit tests passing (10/10)
- Code compiles successfully

* Fix PR review findings: test improvements and code quality

- Add timeout to runWeedShell to prevent hanging
- Add server 1 readonly status verification in tests
- Assert merge fails when replicas writable (not just log output)
- Replace sleep with polling for writable restoration check
- Fix WriteAt stale data snapshot bug in memoryBackendFile
- Fix startVolume error logging to show current server log
- Fix volumePubPorts double assignment in port allocation
- Rename test to reflect behavior: DoesNotDeduplicateAcrossWindows
- Fix misleading dedup window comment

Unit tests: 10/10 passing
Binary: Compiles successfully

* Fix test assumption: merge command marks volumes readonly automatically

TestVolumeMergeReadonly was expecting merge to fail on writable volumes, but the
merge command is designed to mark volumes readonly as part of its operation. Fixed
test to verify merge succeeds on writable volumes and properly restores writable
state afterward. Removed redundant Test 2 code that duplicated the new behavior.

* fmt

* Fix deduplication logic to correctly handle same-stream vs cross-stream duplicates

The dedup map previously used only NeedleId as key, causing same-stream
overwrites to be incorrectly skipped as duplicates. Changed to track which
stream first processed each needle ID in the current window:

- Cross-stream duplicates (same ID from different streams, within window) are skipped
- Same-stream duplicates (overwrites from same stream) are kept
- Map now stores: needleId -> streamIndex of first occurrence in window

Added TestMergeNeedleStreamsSameStreamDuplicates to verify same-stream
overwrites are preserved while cross-stream duplicates are skipped.

All unit tests passing (11/11)
Binary compiles successfully
2026-02-25 10:12:09 -08:00
Chris Lu
e4b70c2521 go fix 2026-02-20 18:42:00 -08:00
Chris Lu
bd0b1fe9d5 S3 IAM: Added ListPolicyVersions and GetPolicyVersion support (#8395)
* test(s3/iam): add managed policy CRUD lifecycle integration coverage

* s3/iam: add ListPolicyVersions and GetPolicyVersion support

* test(s3/iam): cover ListPolicyVersions and GetPolicyVersion
2026-02-20 11:04:18 -08:00
Michał Szynkiewicz
2f837c4780 Fix error on deleting non-empty bucket (#8376)
* Move check for non-empty bucket deletion out of `WithFilerClient` call

* Added proper checking if a bucket has "user" objects
2026-02-19 22:56:50 -08:00
Chris Lu
e9c45144cf Implement managed policy storage (#8385)
* Persist managed IAM policies

* Add IAM list/get policy integration test

* Faster marker lookup and cleanup

* Handle delete conflict and improve listing

* Add delete-in-use policy integration test

* Stabilize policy ID and guard path prefix

* Tighten CreatePolicy guard and reload

* Add ListPolicyNames to credential store
2026-02-19 14:21:19 -08:00
Chris Lu
7b8df39cf7 s3api: add AttachUserPolicy/DetachUserPolicy/ListAttachedUserPolicies (#8379)
* iam: add XML responses for managed user policy APIs

* s3api: implement attach/detach/list attached user policies

* s3api: add embedded IAM tests for managed user policies

* iam: update CredentialStore interface and Manager for managed policies

Updated the `CredentialStore` interface to include `AttachUserPolicy`,
`DetachUserPolicy`, and `ListAttachedUserPolicies` methods.
The `CredentialManager` was updated to delegate these calls to the store.
Added common error variables for policy management.

* iam: implement managed policy methods in MemoryStore

Implemented `AttachUserPolicy`, `DetachUserPolicy`, and
`ListAttachedUserPolicies` in the MemoryStore.
Also ensured deep copying of identities includes PolicyNames.

* iam: implement managed policy methods in PostgresStore

Modified Postgres schema to include `policy_names` JSONB column in `users`.
Implemented `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies`.
Updated user CRUD operations to handle policy names persistence.

* iam: implement managed policy methods in remaining stores

Implemented user policy management in:
- `FilerEtcStore` (partial implementation)
- `IamGrpcStore` (delegated via GetUser/UpdateUser)
- `PropagatingCredentialStore` (to broadcast updates)
Ensures cluster-wide consistency for policy attachments.

* s3api: refactor EmbeddedIamApi to use managed policy APIs

- Refactored `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies`
  to use `e.credentialManager` directly.
- Fixed a critical error suppression bug in `ExecuteAction` that always
  returned success even on failure.
- Implemented robust error matching using string comparison fallbacks.
- Improved consistency by reloading configuration after policy changes.

* s3api: update and refine IAM integration tests

- Updated tests to use a real `MemoryStore`-backed `CredentialManager`.
- Refined test configuration synchronization using `sync.Once` and
  manual deep-copying to prevent state corruption.
- Improved `extractEmbeddedIamErrorCodeAndMessage` to handle more XML
  formats robustly.
- Adjusted test expectations to match current AWS IAM behavior.

* fix compilation

* visibility

* ensure 10 policies

* reload

* add integration tests

* Guard raft command registration

* Allow IAM actions in policy tests

* Validate gRPC policy attachments

* Revert Validate gRPC policy attachments

* Tighten gRPC policy attach/detach

* Improve IAM managed policy handling

* Improve managed policy filters
2026-02-19 12:26:27 -08:00
Michał Szynkiewicz
53048ffffb Add md5 checksum validation support on PutObject and UploadPart (#8367)
* Add md5 checksum validation support on PutObject and UploadPart

Per the S3 specification, when a client sends a Content-MD5 header, the server must compare it against the MD5 of the received body and return BadDigest (HTTP 400) if they don't match.

SeaweedFS was silently accepting objects with incorrect Content-MD5 headers, which breaks data integrity verification for clients that rely on this feature (e.g. boto3). The error infrastructure (ErrBadDigest, ErrMsgBadDigest) already existed from PR #7306 but was never wired to an actual check.

This commit adds MD5 verification in putToFiler after the body is streamed and the MD5 is computed, and adds Content-MD5 header validation to PutObjectPartHandler (matching PutObjectHandler). Orphaned chunks are cleaned up on mismatch.

Refs: https://github.com/seaweedfs/seaweedfs/discussions/3908

* handle SSE, add uploadpart test

* s3 integration test: fix typo and add multipart upload checksum test

* s3api: move validateContentMd5 after GetBucketAndObject in PutObjectPartHandler

* s3api: move validateContentMd5 after GetBucketAndObject in PutObjectHandler

* s3api: fix MD5 validation for SSE uploads and logging in putToFiler

* add SSE test with checksum validation - mostly ai-generated

* Update s3_integration_test.go

* Address S3 integration test feedback: fix typos, rename variables, add verification steps, and clean up comments.

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-02-18 15:40:08 -08:00
Chris Lu
0d8588e3ae S3: Implement IAM defaults and STS signing key fallback (#8348)
* S3: Implement IAM defaults and STS signing key fallback logic

* S3: Refactor startup order to init SSE-S3 key manager before IAM

* S3: Derive STS signing key from KEK using HKDF for security isolation

* S3: Document STS signing key fallback in security.toml

* fix(s3api): refine anonymous access logic and secure-by-default behavior

- Initialize anonymous identity by default in `NewIdentityAccessManagement` to prevent nil pointer exceptions.
- Ensure `ReplaceS3ApiConfiguration` preserves the anonymous identity if not present in the new configuration.
- Update `NewIdentityAccessManagement` signature to accept `filerClient`.
- In legacy mode (no policy engine), anonymous defaults to Deny (no actions), preserving secure-by-default behavior.
- Use specific `LookupAnonymous` method instead of generic map lookup.
- Update tests to accommodate signature changes and verify improved anonymous handling.

* feat(s3api): make IAM configuration optional

- Start S3 API server without a configuration file if `EnableIam` option is set.
- Default to `Allow` effect for policy engine when no configuration is provided (Zero-Config mode).
- Handle empty configuration path gracefully in `loadIAMManagerFromConfig`.
- Add integration test `iam_optional_test.go` to verify empty config behavior.

* fix(iamapi): fix signature mismatch in NewIdentityAccessManagementWithStore

* fix(iamapi): properly initialize FilerClient instead of passing nil

* fix(iamapi): properly initialize filer client for IAM management

- Instead of passing `nil`, construct a `wdclient.FilerClient` using the provided `Filers` addresses.
- Ensure `NewIdentityAccessManagementWithStore` receives a valid `filerClient` to avoid potential nil pointer dereferences or limited functionality.

* clean: remove dead code in s3api_server.go

* refactor(s3api): improve IAM initialization, safety and anonymous access security

* fix(s3api): ensure IAM config loads from filer after client init

* fix(s3): resolve test failures in integration, CORS, and tagging tests

- Fix CORS tests by providing explicit anonymous permissions config
- Fix S3 integration tests by setting admin credentials in init
- Align tagging test credentials in CI with IAM defaults
- Added goroutine to retry IAM config load in iamapi server

* fix(s3): allow anonymous access to health targets and S3 Tables when identities are present

* fix(ci): use /healthz for Caddy health check in awscli tests

* iam, s3api: expose DefaultAllow from IAM and Policy Engine

This allows checking the global "Open by Default" configuration from
other components like S3 Tables.

* s3api/s3tables: support DefaultAllow in permission logic and handler

Updated CheckPermissionWithContext to respect the DefaultAllow flag
in PolicyContext. This enables "Open by Default" behavior for
unauthenticated access in zero-config environments. Added a targeted
unit test to verify the logic.

* s3api/s3tables: propagate DefaultAllow through handlers

Propagated the DefaultAllow flag to individual handlers for
namespaces, buckets, tables, policies, and tagging. This ensures
consistent "Open by Default" behavior across all S3 Tables API
endpoints.

* s3api: wire up DefaultAllow for S3 Tables API initialization

Updated registerS3TablesRoutes to query the global IAM configuration
and set the DefaultAllow flag on the S3 Tables API server. This
completes the end-to-end propagation required for anonymous access in
zero-config environments. Added a SetDefaultAllow method to
S3TablesApiServer to facilitate this.

* s3api: fix tests by adding DefaultAllow to mock IAM integrations

The IAMIntegration interface was updated to include DefaultAllow(),
breaking several mock implementations in tests. This commit fixes
the build errors by adding the missing method to the mocks.

* env

* ensure ports

* env

* env

* fix default allow

* add one more test using non-anonymous user

* debug

* add more debug

* less logs
2026-02-16 13:59:13 -08:00
Chris Lu
25ea48227f Fix STS temporary credentials to use ASIA prefix instead of AKIA (#8326)
Temporary credentials from STS AssumeRole were using "AKIA" prefix
(permanent IAM user credentials) instead of "ASIA" prefix (temporary
security credentials). This violates AWS conventions and may cause
compatibility issues with AWS SDKs that validate credential types.

Changes:
- Rename generateAccessKeyId to generateTemporaryAccessKeyId for clarity
- Update function to use ASIA prefix for temporary credentials
- Add unit tests to verify ASIA prefix format (weed/iam/sts/credential_prefix_test.go)
- Add integration test to verify ASIA prefix in S3 API (test/s3/iam/s3_sts_credential_prefix_test.go)
- Ensure AWS-compatible credential format (ASIA + 16 hex chars)

The credentials are already deterministic (SHA256-based from session ID)
and the SessionToken is correctly set to the JWT token, so this is just
a prefix fix to follow AWS standards.

Fixes #8312
2026-02-12 14:47:20 -08:00
Chris Lu
b57429ef2e Switch empty-folder cleanup to bucket policy (#8292)
* Fix Spark _temporary cleanup and add issue #8285 regression test

* Generalize empty folder cleanup for Spark temp artifacts

* Revert synchronous folder pruning and add cleanup diagnostics

* Add actionable empty-folder cleanup diagnostics

* Fix Spark temp marker cleanup in async folder cleaner

* Fix Spark temp cleanup with implicit directory markers

* Keep explicit directory markers non-implicit

* logging

* more logs

* Switch empty-folder cleanup to bucket policy

* Seaweed-X-Amz-Allow-Empty-Folders

* less logs

* go vet

* less logs

* refactoring
2026-02-10 18:38:38 -08:00
Chris Lu
458c12fb99 test: add Spark S3 integration regression for issue #8234 (#8249)
* test: add Spark S3 regression integration test for issue 8234

* Update test/s3/spark/setup_test.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-08 21:13:31 -08:00
Chris Lu
c284e51d20 fix: multipart upload ETag calculation (#8238)
* fix multipart etag

* address comments

* clean up

* clean up

* optimization

* address comments

* unquoted etag

* dedup

* upgrade

* clean

* etag

* return quoted tag

* quoted etag

* debug

* s3api: unify ETag retrieval and quoting across handlers

Refactor newListEntry to take *S3ApiServer and use getObjectETag,
and update setResponseHeaders to use the same logic. This ensures
consistent ETags are returned for both listing and direct access.

* s3api: implement ListObjects deduplication for versioned buckets

Handle duplicate entries between the main path and the .versions
directory by prioritizing the latest version when bucket versioning
is enabled.

* s3api: cleanup stale main file entries during versioned uploads

Add explicit deletion of pre-existing "main" files when creating new
versions in versioned buckets. This prevents stale entries from
appearing in bucket listings and ensures consistency.

* s3api: fix cleanup code placement in versioned uploads

Correct the placement of rm calls in completeMultipartUpload and
putVersionedObject to ensure stale main files are properly deleted
during versioned uploads.

* s3api: improve getObjectETag fallback for empty ExtETagKey

Ensure that when ExtETagKey exists but contains an empty value,
the function falls through to MD5/chunk-based calculation instead
of returning an empty string.

* s3api: fix test files for new newListEntry signature

Update test files to use the new newListEntry signature where the
first parameter is *S3ApiServer. Created mockS3ApiServer to properly
test owner display name lookup functionality.

* s3api: use filer.ETag for consistent Md5 handling in getEtagFromEntry

Change getEtagFromEntry fallback to use filer.ETag(entry) instead of
filer.ETagChunks to ensure legacy entries with Attributes.Md5 are
handled consistently with the rest of the codebase.

* s3api: optimize list logic and fix conditional header logging

- Hoist bucket versioning check out of per-entry callback to avoid
  repeated getVersioningState calls
- Extract appendOrDedup helper function to eliminate duplicate
  dedup/append logic across multiple code paths
- Change If-Match mismatch logging from glog.Errorf to glog.V(3).Infof
  and remove DEBUG prefix for consistency

* s3api: fix test mock to properly initialize IAM accounts

Fixed nil pointer dereference in TestNewListEntryOwnerDisplayName by
directly initializing the IdentityAccessManagement.accounts map in the
test setup. This ensures newListEntry can properly look up account
display names without panicking.

* cleanup

* s3api: remove premature main file cleanup in versioned uploads

Removed incorrect cleanup logic that was deleting main files during
versioned uploads. This was causing test failures because it deleted
objects that should have been preserved as null versions when
versioning was first enabled. The deduplication logic in listing is
sufficient to handle duplicate entries without deleting files during
upload.

* s3api: add empty-value guard to getEtagFromEntry

Added the same empty-value guard used in getObjectETag to prevent
returning quoted empty strings. When ExtETagKey exists but is empty,
the function now falls through to filer.ETag calculation instead of
returning "".

* s3api: fix listing of directory key objects with matching prefix

Revert prefix handling logic to use strings.TrimPrefix instead of
checking HasPrefix with empty string result. This ensures that when a
directory key object exactly matches the prefix (e.g. prefix="dir/",
object="dir/"), it is correctly handled as a regular entry instead of
being skipped or incorrectly processed as a common prefix. Also fixed
missing variable definition.

* s3api: refactor list inline dedup to use appendOrDedup helper

Refactored the inline deduplication logic in listFilerEntries to use the
shared appendOrDedup helper function. This ensures consistent behavior
and reduces code duplication.

* test: fix port allocation race in s3tables integration test

Updated startMiniCluster to find all required ports simultaneously using
findAvailablePorts instead of sequentially. This prevents race conditions
where the OS reallocates a port that was just released, causing multiple
services (e.g. Filer and Volume) to be assigned the same port and fail
to start.
2026-02-06 21:54:43 -08:00
Chris Lu
217d977579 Merge branch 'master' of https://github.com/seaweedfs/seaweedfs 2026-02-06 13:14:30 -08:00
Chris Lu
a3b83f8808 test: add Trino Iceberg catalog integration test (#8228)
* test: add Trino Iceberg catalog integration test

- Create test/s3/catalog_trino/trino_catalog_test.go with TestTrinoIcebergCatalog
- Tests integration between Trino SQL engine and SeaweedFS Iceberg REST catalog
- Starts weed mini with all services and Trino in Docker container
- Validates Iceberg catalog schema creation and listing operations
- Uses native S3 filesystem support in Trino with path-style access
- Add workflow job to s3-tables-tests.yml for CI execution

* fix: preserve AWS environment credentials when replacing S3 configuration

When S3 configuration is loaded from filer/db, it replaces the identities list
and inadvertently removes AWS_ACCESS_KEY_ID credentials that were added from
environment variables. This caused auth to remain disabled even though valid
credentials were present.

Fix by preserving environment-based identities when replacing the configuration
and re-adding them after the replacement. This ensures environment credentials
persist across configuration reloads and properly enable authentication.

* fix: use correct ServerAddress format with gRPC port encoding

The admin server couldn't connect to master because the master address
was missing the gRPC port information. Use pb.NewServerAddress() which
properly encodes both HTTP and gRPC ports in the address string.

Changes:
- weed/command/mini.go: Use pb.NewServerAddress for master address in admin
- test/s3/policy/policy_test.go: Store and use gRPC ports for master/filer addresses

This fix applies to:
1. Admin server connection to master (mini.go)
2. Test shell commands that need master/filer addresses (policy_test.go)

* move

* move

* fix: always include gRPC port in server address encoding

The NewServerAddress() function was omitting the gRPC port from the address
string when it matched the port+10000 convention. However, gRPC port allocation
doesn't always follow this convention - when the calculated port is busy, an
alternative port is allocated.

This caused a bug where:
1. Master's gRPC port was allocated as 50661 (sequential, not port+10000)
2. Address was encoded as '192.168.1.66:50660' (gRPC port omitted)
3. Admin client called ToGrpcAddress() which assumed port+10000 offset
4. Admin tried to connect to 60660 but master was on 50661 → connection failed

Fix: Always include explicit gRPC port in address format (host:httpPort.grpcPort)
unless gRPC port is 0. This makes addresses unambiguous and works regardless of
the port allocation strategy used.

Impacts: All server-to-server gRPC connections now use properly formatted addresses.

* test: fix Iceberg REST API readiness check

The Iceberg REST API endpoints require authentication. When checked without
credentials, the API returns 403 Forbidden (not 401 Unauthorized).  The
readiness check now accepts both auth error codes (401/403) as indicators
that the service is up and ready, it just needs credentials.

This fixes the 'Iceberg REST API did not become ready' test failure.

* Fix AWS SigV4 signature verification for base64-encoded payload hashes

   AWS SigV4 canonical requests must use hex-encoded SHA256 hashes,
   but the X-Amz-Content-Sha256 header may be transmitted as base64.

   Changes:
   - Added normalizePayloadHash() function to convert base64 to hex
   - Call normalizePayloadHash() in extractV4AuthInfoFromHeader()
   - Added encoding/base64 import

   Fixes 403 Forbidden errors on POST requests to Iceberg REST API
   when clients send base64-encoded content hashes in the header.

   Impacted services: Iceberg REST API, S3Tables

* Fix AWS SigV4 signature verification for base64-encoded payload hashes

   AWS SigV4 canonical requests must use hex-encoded SHA256 hashes,
   but the X-Amz-Content-Sha256 header may be transmitted as base64.

   Changes:
   - Added normalizePayloadHash() function to convert base64 to hex
   - Call normalizePayloadHash() in extractV4AuthInfoFromHeader()
   - Added encoding/base64 import
   - Removed unused fmt import

   Fixes 403 Forbidden errors on POST requests to Iceberg REST API
   when clients send base64-encoded content hashes in the header.

   Impacted services: Iceberg REST API, S3Tables

* pass sigv4

* s3api: fix identity preservation and logging levels

- Ensure environment-based identities are preserved during config replacement
- Update accessKeyIdent and nameToIdentity maps correctly
- Downgrade informational logs to V(2) to reduce noise

* test: fix trino integration test and s3 policy test

- Pin Trino image version to 479
- Fix port binding to 0.0.0.0 for Docker connectivity
- Fix S3 policy test hang by correctly assigning MiniClusterCtx
- Improve port finding robustness in policy tests

* ci: pre-pull trino image to avoid timeouts

- Pull trinodb/trino:479 after Docker setup
- Ensure image is ready before integration tests start

* iceberg: remove unused checkAuth and improve logging

- Remove unused checkAuth method
- Downgrade informational logs to V(2)
- Ensure loggingMiddleware uses a status writer for accurate reported codes
- Narrow catch-all route to avoid interfering with other subsystems

* iceberg: fix build failure by removing unused s3api import

* Update iceberg.go

* use warehouse

* Update trino_catalog_test.go
2026-02-06 13:12:25 -08:00
Chris Lu
833bcde9f3 test: add Trino Iceberg catalog integration test
- Create test/s3/catalog_trino/trino_catalog_test.go with TestTrinoIcebergCatalog
- Tests integration between Trino SQL engine and SeaweedFS Iceberg REST catalog
- Starts weed mini with all services and Trino in Docker container
- Validates Iceberg catalog schema creation and listing operations
- Uses native S3 filesystem support in Trino with path-style access
- Add workflow job to s3-tables-tests.yml for CI execution
2026-02-05 16:10:31 -08:00
Chris Lu
dfdace9a13 s3tables: enhance test robustness and resilience
Updated random string generation to use crypto/rand in s3tables tests.
Increased resilience of IAM distributed tests by adding "connection refused"
to retryable errors.
2026-01-28 13:25:32 -08:00
Chris Lu
551a31e156 Implement IAM propagation to S3 servers (#8130)
* Implement IAM propagation to S3 servers

- Add PropagatingCredentialStore to propagate IAM changes to S3 servers via gRPC
- Add Policy management RPCs to S3 proto and S3ApiServer
- Update CredentialManager to use PropagatingCredentialStore when MasterClient is available
- Wire FilerServer to enable propagation

* Implement parallel IAM propagation and fix S3 cluster registration

- Parallelized IAM change propagation with 10s timeout.
- Refined context usage in PropagatingCredentialStore.
- Added S3Type support to cluster node management.
- Enabled S3 servers to register with gRPC address to the master.
- Ensured IAM configuration reload after policy updates via gRPC.

* Optimize IAM propagation with direct in-memory cache updates

* Secure IAM propagation: Use metadata to skip persistence only on propagation

* pb: refactor IAM and S3 services for unidirectional IAM propagation

- Move SeaweedS3IamCache service from iam.proto to s3.proto.
- Remove legacy IAM management RPCs and empty SeaweedS3 service from s3.proto.
- Enforce that S3 servers only use the synchronization interface.

* pb: regenerate Go code for IAM and S3 services

Updated generated code following the proto refactoring of IAM synchronization services.

* s3api: implement read-only mode for Embedded IAM API

- Add readOnly flag to EmbeddedIamApi to reject write operations via HTTP.
- Enable read-only mode by default in S3ApiServer.
- Handle AccessDenied error in writeIamErrorResponse.
- Embed SeaweedS3IamCacheServer in S3ApiServer.

* credential: refactor PropagatingCredentialStore for unidirectional IAM flow

- Update to use s3_pb.SeaweedS3IamCacheClient for propagation to S3 servers.
- Propagate full Identity object via PutIdentity for consistency.
- Remove redundant propagation of specific user/account/policy management RPCs.
- Add timeout context for propagation calls.

* s3api: implement SeaweedS3IamCacheServer for unidirectional sync

- Update S3ApiServer to implement the cache synchronization gRPC interface.
- Methods (PutIdentity, RemoveIdentity, etc.) now perform direct in-memory cache updates.
- Register SeaweedS3IamCacheServer in command/s3.go.
- Remove registration for the legacy and now empty SeaweedS3 service.

* s3api: update tests for read-only IAM and propagation

- Added TestEmbeddedIamReadOnly to verify rejection of write operations in read-only mode.
- Update test setup to pass readOnly=false to NewEmbeddedIamApi in routing tests.
- Updated EmbeddedIamApiForTest helper with read-only checks matching production behavior.

* s3api: add back temporary debug logs for IAM updates

Log IAM updates received via:
- gRPC propagation (PutIdentity, PutPolicy, etc.)
- Metadata configuration reloads (LoadS3ApiConfigurationFromCredentialManager)
- Core identity management (UpsertIdentity, RemoveIdentity)

* IAM: finalize propagation fix with reduced logging and clarified architecture

* Allow configuring IAM read-only mode for S3 server integration tests

* s3api: add defensive validation to UpsertIdentity

* s3api: fix log message to reference correct IAM read-only flag

* test/s3/iam: ensure WaitForS3Service checks for IAM write permissions

* test: enable writable IAM in Makefile for integration tests

* IAM: add GetPolicy/ListPolicies RPCs to s3.proto

* S3: add GetBucketPolicy and ListBucketPolicies helpers

* S3: support storing generic IAM policies in IdentityAccessManagement

* S3: implement IAM policy RPCs using IdentityAccessManagement

* IAM: fix stale user identity on rename propagation
2026-01-26 22:59:43 -08:00
Chris Lu
43229b05ce Explicit IAM gRPC APIs for S3 Server (#8126)
* Update IAM and S3 protobuf definitions for explicit IAM gRPC APIs

* Refactor s3api: Extract generic ExecuteAction method for IAM operations

* Implement explicit IAM gRPC APIs in S3 server

* iam: remove deprecated GetConfiguration and PutConfiguration RPCs

* iamapi: refactor handlers to use CredentialManager directly

* s3api: refactor embedded IAM to use CredentialManager directly

* server: remove deprecated configuration gRPC handlers

* credential/grpc: refactor configuration calls to return error

* shell: update s3.configure to list users instead of full config

* s3api: fix CreateServiceAccount gRPC handler to map required fields

* s3api: fix UpdateServiceAccount gRPC handler to map fields and safe status

* s3api: enforce UserName in embedded IAM ListAccessKeys

* test: fix test_config.json structure to match proto definition

* Revert "credential/grpc: refactor configuration calls to return error"

This reverts commit cde707dd8b88c7d1bd730271518542eceb5ed069.

* Revert "server: remove deprecated configuration gRPC handlers"

This reverts commit 7307e205a083c8315cf84ddc2614b3e50eda2e33.

* Revert "s3api: enforce UserName in embedded IAM ListAccessKeys"

This reverts commit adf727ba52b4f3ffb911f0d0df85db858412ff83.

* Revert "s3api: fix UpdateServiceAccount gRPC handler to map fields and safe status"

This reverts commit 6a4be3314d43b6c8fda8d5e0558e83e87a19df3f.

* Revert "s3api: fix CreateServiceAccount gRPC handler to map required fields"

This reverts commit 9bb4425f07fbad38fb68d33e5c0aa573d8912a37.

* Revert "shell: update s3.configure to list users instead of full config"

This reverts commit f3304ead537b3e6be03d46df4cb55983ab931726.

* Revert "s3api: refactor embedded IAM to use CredentialManager directly"

This reverts commit 9012f27af82d11f0e824877712a5ae2505a65f86.

* Revert "iamapi: refactor handlers to use CredentialManager directly"

This reverts commit 3a148212236576b0a3aa4d991c2abb014fb46091.

* Revert "iam: remove deprecated GetConfiguration and PutConfiguration RPCs"

This reverts commit e16e08aa0099699338d3155bc7428e1051ce0a6a.

* s3api: address IAM code review comments (error handling, logging, gRPC response mapping)

* s3api: add robustness to startup by retrying KEK and IAM config loading from Filer

* s3api: address IAM gRPC code review comments (safety, validation, status logic)

* fix return
2026-01-26 13:38:15 -08:00
Chris Lu
6bf088cec9 IAM Policy Management via gRPC (#8109)
* Add IAM gRPC service definition

- Add GetConfiguration/PutConfiguration for config management
- Add CreateUser/GetUser/UpdateUser/DeleteUser/ListUsers for user management
- Add CreateAccessKey/DeleteAccessKey/GetUserByAccessKey for access key management
- Methods mirror existing IAM HTTP API functionality

* Add IAM gRPC handlers on filer server

- Implement IamGrpcServer with CredentialManager integration
- Handle configuration get/put operations
- Handle user CRUD operations
- Handle access key create/delete operations
- All methods delegate to CredentialManager for actual storage

* Wire IAM gRPC service to filer server

- Add CredentialManager field to FilerOption and FilerServer
- Import credential store implementations in filer command
- Initialize CredentialManager from credential.toml if available
- Register IAM gRPC service on filer gRPC server
- Enable credential management via gRPC alongside existing filer services

* Regenerate IAM protobuf with gRPC service methods

* iam_pb: add Policy Management to protobuf definitions

* credential: implement PolicyManager in credential stores

* filer: implement IAM Policy Management RPCs

* shell: add s3.policy command

* test: add integration test for s3.policy

* test: fix compilation errors in policy_test

* pb

* fmt

* test

* weed shell: add -policies flag to s3.configure

This allows linking/unlinking IAM policies to/from identities
directly from the s3.configure command.

* test: verify s3.configure policy linking and fix port allocation

- Added test case for linking policies to users via s3.configure
- Implemented findAvailablePortPair to ensure HTTP and gRPC ports
  are both available, avoiding conflicts with randomized port assignments.
- Updated assertion to match jsonpb output (policyNames)

* credential: add StoreTypeGrpc constant

* credential: add IAM gRPC store boilerplate

* credential: implement identity methods in gRPC store

* credential: implement policy methods in gRPC store

* admin: use gRPC credential store for AdminServer

This ensures that all IAM and policy changes made through the Admin UI
are persisted via the Filer's IAM gRPC service instead of direct file manipulation.

* shell: s3.configure use granular IAM gRPC APIs instead of full config patching

* shell: s3.configure use granular IAM gRPC APIs

* shell: replace deprecated ioutil with os in s3.policy

* filer: use gRPC FailedPrecondition for unconfigured credential manager

* test: improve s3.policy integration tests and fix error checks

* ci: add s3 policy shell integration tests to github workflow

* filer: fix LoadCredentialConfiguration error handling

* credential/grpc: propagate unmarshal errors in GetPolicies

* filer/grpc: improve error handling and validation

* shell: use gRPC status codes in s3.configure

* credential: document PutPolicy as create-or-replace

* credential/postgres: reuse CreatePolicy in PutPolicy to deduplicate logic

* shell: add timeout context and strictly enforce flags in s3.policy

* iam: standardize policy content field naming in gRPC and proto

* shell: extract slice helper functions in s3.configure

* filer: map credential store errors to gRPC status codes

* filer: add input validation for UpdateUser and CreateAccessKey

* iam: improve validation in policy and config handlers

* filer: ensure IAM service registration by defaulting credential manager

* credential: add GetStoreName method to manager

* test: verify policy deletion in integration test
2026-01-25 13:39:30 -08:00
Chris Lu
535be3096b Add AWS IAM integration tests and refactor admin authorization (#8098)
* Add AWS IAM integration tests and refactor admin authorization
- Added AWS IAM management integration tests (User, AccessKey, Policy)
- Updated test framework to support IAM client creation with JWT/OIDC
- Refactored s3api authorization to be policy-driven for IAM actions
- Removed hardcoded role name checks for admin privileges
- Added new tests to GitHub Actions basic test matrix

* test(s3/iam): add UpdateUser and UpdateAccessKey tests and fix nil pointer dereference

* feat(s3api): add DeletePolicy and update tests with cleanup logic

* test(s3/iam): use t.Cleanup for managed policy deletion in CreatePolicy test
2026-01-23 16:41:51 -08:00
Chris Lu
d664ca5ed3 fix: IAM authentication with AWS Signature V4 and environment credentials (#8099)
* fix: IAM authentication with AWS Signature V4 and environment credentials

Three key fixes for authenticated IAM requests to work:

1. Fix request body consumption before signature verification
   - iamMatcher was calling r.ParseForm() which consumed POST body
   - This broke AWS Signature V4 verification on subsequent reads
   - Now only check query string in matcher, preserving body for verification
   - File: weed/s3api/s3api_server.go

2. Preserve environment variable credentials across config reloads
   - After IAM mutations, config reload overwrote env var credentials
   - Extract env var loading into loadEnvironmentVariableCredentials()
   - Call after every config reload to persist credentials
   - File: weed/s3api/auth_credentials.go

3. Add authenticated IAM tests and test infrastructure
   - New TestIAMAuthenticated suite with AWS SDK + Signature V4
   - Dynamic port allocation for independent test execution
   - Flag reset to prevent state leakage between tests
   - CI workflow to run S3 and IAM tests separately
   - Files: test/s3/example/*, .github/workflows/s3-example-integration-tests.yml

All tests pass:
- TestIAMCreateUser (unauthenticated)
- TestIAMAuthenticated (with AWS Signature V4)
- S3 integration tests

* fmt

* chore: rename test/s3/example to test/s3/normal

* simplify: CI runs all integration tests in single job

* Update s3-example-integration-tests.yml

* ci: run each test group separately to avoid raft registry conflicts
2026-01-23 16:27:42 -08:00
Chris Lu
14f44379cb test: fix flaky S3 volume encryption test (#8083)
Specifically:
- Use bytes.NewReader for binary data instead of strings.NewReader
- Increase binary test data from 8 bytes to 1KB to avoid edge cases
- Add 50ms delay between subtests to prevent overwhelming the server
2026-01-21 19:15:53 -08:00
Chris Lu
51735e667c Fix S3 conditional writes with versioning (Issue #8073) (#8080)
* Fix S3 conditional writes with versioning (Issue #8073)

Refactors conditional header checks to properly resolve the latest object version when versioning is enabled. This prevents incorrect validation against non-versioned root objects.

* Add integration test for S3 conditional writes with versioning (Issue #8073)

* Refactor: Propagate internal errors in conditional header checks

- Make resolveObjectEntry return errors from isVersioningConfigured
- Update checkConditionalHeaders checks to return 500 on internal resolve errors

* Refactor: Stricter error handling and test assertions

- Propagate internal errors in checkConditionalHeaders*WithGetter functions
- Enforce strict 412 PreconditionFailed check in integration test

* Perf: Add early return for conditional headers + safety improvements

- Add fast path to skip resolveObjectEntry when no conditional headers present
- Avoids expensive getLatestObjectVersion retries in common case
- Add nil checks before dereferencing pointers in integration test
- Fix grammar in test comments
- Remove duplicate comment in resolveObjectEntry

* Refactor: Use errors.Is for robust ErrNotFound checking

- Update checkConditionalHeaders* to use errors.Is(err, filer_pb.ErrNotFound)
- Update resolveObjectEntry to use errors.Is for wrapped error compatibility
- Remove duplicate comment lines in s3api handlers

* Perf: Optimize resolveObjectEntry for conditional checks

- Refactor getLatestObjectVersion to doGetLatestObjectVersion supporting variable retries
- Use 1-retry path in resolveObjectEntry to avoid exponential backoff latency

* Test: Enhance integration test with content verification

- Verify actual object content equals expected content after successful conditional write
- Add missing io and errors imports to test file

* Refactor: Final refinements based on feedback

- Optimize header validation by passing parsed headers to avoid redundant parsing
- Simplify integration test assertions using require.Error and assert.True
- Fix build errors in s3api handler and test imports

* Test: Use smithy.APIError for robust error code checking

- Replace string-based error checking with structured API error
- Add smithy-go import for AWS SDK v2 error handling

* Test: Use types.PreconditionFailed and handle io.ReadAll error

- Replace smithy.APIError with more specific types.PreconditionFailed
- Add proper error handling for io.ReadAll in content verification

* Refactor: Use combined error checking and add nil guards

- Use smithy.APIError with ErrorCode() for robust error checking
- Add nil guards for entry.Attributes before accessing Mtime
- Prevents potential panics when Attributes is uninitialized
2026-01-21 16:36:18 -08:00
madavic
86c61e86c9 fix: S3 copying test Makefile syntax and add S3_ENDPOINT env support (#8042)
* fix: S3 copying test Makefile syntax and add S3_ENDPOINT env support

* fix: add weed mini to stop-seaweedfs target

Ensure weed mini process is properly killed when stopping SeaweedFS,
matching the process started in start-seaweedfs target.

* Clean up PID file in stop-seaweedfs and clean targets

Address review feedback to ensure /tmp/weed-mini.pid is removed
for a clean state after tests.
2026-01-16 12:23:12 -08:00
Chris Lu
ee3813787e feat(s3api): Implement S3 Policy Variables (#8039)
* feat: Add AWS IAM Policy Variables support to S3 API

Implements policy variables for dynamic access control in bucket policies.

Supported variables:
- aws:username - Extracted from principal ARN
- aws:userid - User identifier (same as username in SeaweedFS)
- aws:principaltype - IAMUser, IAMRole, or AssumedRole
- jwt:* - Any JWT claim (e.g., jwt:preferred_username, jwt:sub)

Key changes:
- Added PolicyVariableRegex to detect ${...} patterns
- Extended CompiledStatement with DynamicResourcePatterns, DynamicPrincipalPatterns, DynamicActionPatterns
- Added Claims field to PolicyEvaluationArgs for JWT claim access
- Implemented SubstituteVariables() for variable replacement from context and JWT claims
- Implemented extractPrincipalVariables() for ARN parsing
- Updated EvaluateConditions() to support variable substitution
- Comprehensive unit and integration tests

Resolves #8037

* feat: Add LDAP and PrincipalAccount variable support

Completes future enhancements for policy variables:

- Added ldap:* variable support for LDAP claims
  - ldap:username - LDAP username from claims
  - ldap:dn - LDAP distinguished name from claims
  - ldap:* - Any LDAP claim

- Added aws:PrincipalAccount extraction from ARN
  - Extracts account ID from principal ARN
  - Available as ${aws:PrincipalAccount} in policies

Updated SubstituteVariables() to check LDAP claims
Updated extractPrincipalVariables() to extract account ID
Added comprehensive tests for new variables

* feat(s3api): implement IAM policy variables core logic and optimization

* feat(s3api): integrate policy variables with S3 authentication and handlers

* test(s3api): add integration tests for policy variables

* cleanup: remove unused policy conversion files

* Add S3 policy variables integration tests and path support

- Add comprehensive integration tests for policy variables
- Test username isolation, JWT claims, LDAP claims
- Add support for IAM paths in principal ARN parsing
- Add tests for principals with paths

* Fix IAM Role principal variable extraction

IAM Roles should not have aws:userid or aws:PrincipalAccount
according to AWS behavior. Only IAM Users and Assumed Roles
should have these variables.

Fixes TestExtractPrincipalVariables test failures.

* Security fixes and bug fixes for S3 policy variables

SECURITY FIXES:
- Prevent X-SeaweedFS-Principal header spoofing by clearing internal
  headers at start of authentication (auth_credentials.go)
- Restrict policy variable substitution to safe allowlist to prevent
  client header injection (iam/policy/policy_engine.go)
- Add core policy validation before storing bucket policies

BUG FIXES:
- Remove unused sid variable in evaluateStatement
- Fix LDAP claim lookup to check both prefixed and unprefixed keys
- Add ValidatePolicy call in PutBucketPolicyHandler

These fixes prevent privilege escalation via header injection and
ensure only validated identity claims are used in policy evaluation.

* Additional security fixes and code cleanup

SECURITY FIXES:
- Fixed X-Forwarded-For spoofing by only trusting proxy headers from
  private/localhost IPs (s3_iam_middleware.go)
- Changed context key from "sourceIP" to "aws:SourceIp" for proper
  policy variable substitution

CODE IMPROVEMENTS:
- Kept aws:PrincipalAccount for IAM Roles to support condition evaluations
- Removed redundant STS principaltype override
- Removed unused service variable
- Cleaned up commented-out debug logging statements
- Updated tests to reflect new IAM Role behavior

These changes prevent IP spoofing attacks and ensure policy variables
work correctly with the safe allowlist.

* Add security documentation for ParseJWTToken

Added comprehensive security comments explaining that ParseJWTToken
is safe despite parsing without verification because:
- It's only used for routing to the correct verification method
- All code paths perform cryptographic verification before trusting claims
- OIDC tokens: validated via validateExternalOIDCToken
- STS tokens: validated via ValidateSessionToken

Enhanced function documentation with clear security warnings about
proper usage to prevent future misuse.

* Fix IP condition evaluation to use aws:SourceIp key

Fixed evaluateIPCondition in IAM policy engine to use "aws:SourceIp"
instead of "sourceIP" to match the updated extractRequestContext.

This fixes the failing IP-restricted role test where IP-based policy
conditions were not being evaluated correctly.

Updated all test cases to use the correct "aws:SourceIp" key.

* Address code review feedback: optimize and clarify

PERFORMANCE IMPROVEMENT:
- Optimized expandPolicyVariables to use regexp.ReplaceAllStringFunc
  for single-pass variable substitution instead of iterating through
  all safe variables. This improves performance from O(n*m) to O(m)
  where n is the number of safe variables and m is the pattern length.

CODE CLARITY:
- Added detailed comment explaining LDAP claim fallback mechanism
  (checks both prefixed and unprefixed keys for compatibility)
- Enhanced TODO comment for trusted proxy configuration with rationale
  and recommendations for supporting cloud load balancers, CDNs, and
  complex network topologies

All tests passing.

* Address Copilot code review feedback

BUG FIXES:
- Fixed type switch for int/int32/int64 - separated into individual cases
  since interface type switches only match the first type in multi-type cases
- Fixed grammatically incorrect error message in types.go

CODE QUALITY:
- Removed duplicate Resource/NotResource validation (already in ValidateStatement)
- Added comprehensive comment explaining isEnabled() logic and security implications
- Improved trusted proxy NOTE comment to be more concise while noting limitations

All tests passing.

* Fix test failures after extractSourceIP security changes

Updated tests to work with the security fix that only trusts
X-Forwarded-For/X-Real-IP headers from private IP addresses:

- Set RemoteAddr to 127.0.0.1 in tests to simulate trusted proxy
- Changed context key from "sourceIP" to "aws:SourceIp"
- Added test case for untrusted proxy (public RemoteAddr)
- Removed invalid ValidateStatement call (validation happens in ValidatePolicy)

All tests now passing.

* Address remaining Gemini code review feedback

CODE SAFETY:
- Deep clone Action field in CompileStatement to prevent potential data races
  if the original policy document is modified after compilation

TEST CLEANUP:
- Remove debug logging (fmt.Fprintf) from engine_notresource_test.go
- Remove unused imports in engine_notresource_test.go

All tests passing.

* Fix insecure JWT parsing in IAM auth flow

SECURITY FIX:
- Renamed ParseJWTToken to ParseUnverifiedJWTToken with explicit security warnings.
- Refactored AuthenticateJWT to use the trusted SessionInfo returned by ValidateSessionToken
  instead of relying on unverified claims from the initial parse.
- Refactored ValidatePresignedURLWithIAM to reuse the robust AuthenticateJWT logic, removing
  duplicated and insecure manual token parsing.

This ensures all identity information (Role, Principal, Subject) used for authorization
decisions is derived solely from cryptographically verified tokens.

* Security: Fix insecure JWT claim extraction in policy engine

- Refactored EvaluatePolicy to accept trusted claims from verified Identity instead of parsing unverified tokens
- Updated AuthenticateJWT to populate Claims in IAMIdentity from verified sources (SessionInfo/ExternalIdentity)
- Updated s3api_server and handlers to pass claims correctly
- Improved isPrivateIP to support IPv6 loopback, link-local, and ULA
- Fixed flaky distributed_session_consistency test with retry logic

* fix(iam): populate Subject in STSSessionInfo to ensure correct identity propagation

This fixes the TestS3IAMAuthentication/valid_jwt_token_authentication failure by ensuring the session subject (sub) is correctly mapped to the internal SessionInfo struct, allowing bucket ownership validation to succeed.

* Optimized isPrivateIP

* Create s3-policy-tests.yml

* fix tests

* fix tests

* tests(s3/iam): simplify policy to resource-based \ (step 1)

* tests(s3/iam): add explicit Deny NotResource for isolation (step 2)

* fixes

* policy: skip resource matching for STS trust policies to allow AssumeRole evaluation

* refactor: remove debug logging and hoist policy variables for performance

* test: fix TestS3IAMBucketPolicyIntegration cleanup to handle per-subtest object lifecycle

* test: fix bucket name generation to comply with S3 63-char limit

* test: skip TestS3IAMPolicyEnforcement until role setup is implemented

* test: use weed mini for simpler test server deployment

Replace 'weed server' with 'weed mini' for IAM tests to avoid port binding issues
and simplify the all-in-one server deployment. This improves test reliability
and execution time.

* security: prevent allocation overflow in policy evaluation

Add maxPoliciesForEvaluation constant to cap the number of policies evaluated
in a single request. This prevents potential integer overflow when allocating
slices for policy lists that may be influenced by untrusted input.

Changes:
- Add const maxPoliciesForEvaluation = 1024 to set an upper bound
- Validate len(policies) < maxPoliciesForEvaluation before appending bucket policy
- Use append() instead of make([]string, len+1) to avoid arithmetic overflow
- Apply fix to both IsActionAllowed policy evaluation paths
2026-01-16 11:12:28 -08:00
Chris Lu
905e7e72d9 Add remote.copy.local command to copy local files to remote storage (#8033)
* Add remote.copy.local command to copy local files to remote storage

This new command solves the issue described in GitHub Discussion #8031 where
files exist locally but are not synced to remote storage due to missing filer logs.

Features:
- Copies local-only files to remote storage
- Supports file filtering (include/exclude patterns)
- Dry run mode to preview actions
- Configurable concurrency for performance
- Force update option for existing remote files
- Comprehensive error handling with retry logic

Usage:
  remote.copy.local -dir=/path/to/mount/dir [options]

This addresses the need to manually sync files when filer logs were
deleted or when local files were never synced to remote storage.

* shell: rename commandRemoteLocalSync to commandRemoteCopyLocal

* test: add comprehensive remote cache integration tests

* shell: fix forceUpdate logic in remote.copy.local

The previous logic only allowed force updates when localEntry.RemoteEntry
was not nil, which defeated the purpose of using -forceUpdate to fix
inconsistencies where local metadata might be missing.

Now -forceUpdate will overwrite remote files whenever they exist,
regardless of local metadata state.

* shell: fix code review issues in remote.copy.local

- Return actual error from flag parsing instead of swallowing it
- Use sync.Once to safely capture first error in concurrent operations
- Add atomic counter to track actual successful copies
- Protect concurrent writes to output with mutex to prevent interleaving
- Fix path matching to prevent false positives with sibling directories
  (e.g., /mnt/remote2 no longer matches /mnt/remote)

* test: address code review nitpicks in integration tests

- Improve create_bucket error handling to fail on real errors
- Fix test assertions to properly verify expected failures
- Use case-insensitive string matching for error detection
- Replace weak logging-only tests with proper assertions
- Remove extra blank line in Makefile

* test: remove redundant edge case tests

Removed 5 tests that were either duplicates or didn't assert meaningful behavior:
- TestEdgeCaseEmptyDirectory (duplicate of TestRemoteCopyLocalEmptyDirectory)
- TestEdgeCaseRapidCacheUncache (no meaningful assertions)
- TestEdgeCaseConcurrentCommands (only logs errors, no assertions)
- TestEdgeCaseInvalidPaths (no security assertions)
- TestEdgeCaseFileNamePatterns (duplicate of pattern tests in cache tests)

Kept valuable stress tests: nested directories, special characters,
very large files (100MB), many small files (100), and zero-byte files.

* test: fix CI failures by forcing localhost IP advertising

Added -ip=127.0.0.1 flag to both primary and remote weed mini commands
to prevent IP auto-detection issues in CI environments. Without this flag,
the master would advertise itself using the actual IP (e.g., 10.1.0.17)
while binding to 127.0.0.1, causing connection refused errors when other
services tried to connect to the gRPC port.

* test: address final code review issues

- Add proper error assertions for concurrent commands test
- Require errors for invalid path tests instead of just logging
- Remove unused 'match' field from pattern test struct
- Add dry-run output assertion to verify expected behavior
- Simplify redundant condition in remote.copy.local (remove entry.RemoteEntry check)

* test: fix remote.configure tests to match actual validation rules

- Use only letters in remote names (no numbers) to match validation
- Relax missing parameter test expectations since validation may not be strict
- Generate unique names using letter suffix instead of numbers

* shell: rename pathToCopyCopy to localPath for clarity

Improved variable naming in concurrent copy loop to make the code
more readable and less repetitive.

* test: fix remaining test failures

- Remove strict error requirement for invalid paths (commands handle gracefully)
- Fix TestRemoteUncacheBasic to actually test uncache instead of cache
- Use simple numeric names for remote.configure tests (testcfg1234 format)
  to avoid validation issues with letter-only or complex name generation

* test: use only letters in remote.configure test names

The validation regex ^[A-Za-z][A-Za-z0-9]*$ requires names to start with
a letter, but using static letter-only names avoids any potential issues
with the validation.

* test: remove quotes from -name parameter in remote.configure tests

Single quotes were being included as part of the name value, causing
validation failures. Changed from -name='testremote' to -name=testremote.

* test: fix remote.configure assertion to be flexible about JSON formatting

Changed from checking exact JSON format with specific spacing to just
checking if the name appears in the output, since JSON formatting
may vary (e.g., "name":  "value" vs "name": "value").
2026-01-15 00:52:57 -08:00
Chris Lu
06391701ed Add AssumeRole and AssumeRoleWithLDAPIdentity STS actions (#8003)
* test: add integration tests for AssumeRole and AssumeRoleWithLDAPIdentity STS actions

- Add s3_sts_assume_role_test.go with comprehensive tests for AssumeRole:
  * Parameter validation (missing RoleArn, RoleSessionName, invalid duration)
  * AWS SigV4 authentication with valid/invalid credentials
  * Temporary credential generation and usage

- Add s3_sts_ldap_test.go with tests for AssumeRoleWithLDAPIdentity:
  * Parameter validation (missing LDAP credentials, RoleArn)
  * LDAP authentication scenarios (valid/invalid credentials)
  * Integration with LDAP server (when configured)

- Update Makefile with new test targets:
  * test-sts: run all STS tests
  * test-sts-assume-role: run AssumeRole tests only
  * test-sts-ldap: run LDAP STS tests only
  * test-sts-suite: run tests with full service lifecycle

- Enhance setup_all_tests.sh:
  * Add OpenLDAP container setup for LDAP testing
  * Create test LDAP users (testuser, ldapadmin)
  * Set LDAP environment variables for tests
  * Update cleanup to remove LDAP container

- Fix setup_keycloak.sh:
  * Enable verbose error logging for realm creation
  * Improve error diagnostics

Tests use fail-fast approach (t.Fatal) when server not configured,
ensuring clear feedback when infrastructure is missing.

* feat: implement AssumeRole and AssumeRoleWithLDAPIdentity STS actions

Implement two new STS actions to match MinIO's STS feature set:

**AssumeRole Implementation:**
- Add handleAssumeRole with full AWS SigV4 authentication
- Integrate with existing IAM infrastructure via verifyV4Signature
- Validate required parameters (RoleArn, RoleSessionName)
- Validate DurationSeconds (900-43200 seconds range)
- Generate temporary credentials with expiration
- Return AWS-compatible XML response

**AssumeRoleWithLDAPIdentity Implementation:**
- Add handleAssumeRoleWithLDAPIdentity handler (stub)
- Validate LDAP-specific parameters (LDAPUsername, LDAPPassword)
- Validate common STS parameters (RoleArn, RoleSessionName, DurationSeconds)
- Return proper error messages for missing LDAP provider
- Ready for LDAP provider integration

**Routing Fixes:**
- Add explicit routes for AssumeRole and AssumeRoleWithLDAPIdentity
- Prevent IAM handler from intercepting authenticated STS requests
- Ensure proper request routing priority

**Handler Infrastructure:**
- Add IAM field to STSHandlers for SigV4 verification
- Update NewSTSHandlers to accept IAM reference
- Add STS-specific error codes and response types
- Implement writeSTSErrorResponse for AWS-compatible errors

The AssumeRole action is fully functional and tested.
AssumeRoleWithLDAPIdentity requires LDAP provider implementation.

* fix: update IAM matcher to exclude STS actions from interception

Update the IAM handler matcher to check for STS actions (AssumeRole,
AssumeRoleWithWebIdentity, AssumeRoleWithLDAPIdentity) and exclude them
from IAM handler processing. This allows STS requests to be handled by
the STS fallback handler even when they include AWS SigV4 authentication.

The matcher now parses the form data to check the Action parameter and
returns false for STS actions, ensuring they are routed to the correct
handler.

Note: This is a work-in-progress fix. Tests are still showing some
routing issues that need further investigation.

* fix: address PR review security issues for STS handlers

This commit addresses all critical security issues from PR review:

Security Fixes:
- Use crypto/rand for cryptographically secure credential generation
  instead of time.Now().UnixNano() (fixes predictable credentials)
- Add sts:AssumeRole permission check via VerifyActionPermission to
  prevent unauthorized role assumption
- Generate proper session tokens using crypto/rand instead of
  placeholder strings

Code Quality Improvements:
- Refactor DurationSeconds parsing into reusable parseDurationSeconds()
  helper function used by all three STS handlers
- Create generateSecureCredentials() helper for consistent and secure
  temporary credential generation
- Fix iamMatcher to check query string as fallback when Action not
  found in form data

LDAP Provider Implementation:
- Add go-ldap/ldap/v3 dependency
- Create LDAPProvider implementing IdentityProvider interface with
  full LDAP authentication support (connect, bind, search, groups)
- Update ProviderFactory to create real LDAP providers
- Wire LDAP provider into AssumeRoleWithLDAPIdentity handler

Test Infrastructure:
- Add LDAP user creation verification step in setup_all_tests.sh

* fix: address PR feedback (Round 2) - config validation & provider improvements

- Implement `validateLDAPConfig` in `ProviderFactory`
- Improve `LDAPProvider.Initialize`:
  - Support `connectionTimeout` parsing (string/int/float) from config map
  - Warn if `BindDN` is present but `BindPassword` is empty
- Improve `LDAPProvider.GetUserInfo`:
  - Add fallback to `searchUserGroups` if `memberOf` returns no groups (consistent with Authenticate)

* fix: address PR feedback (Round 3) - LDAP connection improvements & build fix

- Improve `LDAPProvider` connection handling:
  - Use `net.Dialer` with configured timeout for connection establishment
  - Enforce TLS 1.2+ (`MinVersion: tls.VersionTLS12`) for both LDAPS and StartTLS
- Fix build error in `s3api_sts.go` (format verb for ErrorCode)

* fix: address PR feedback (Round 4) - LDAP hardening, Authz check & Routing fix

- LDAP Provider Hardening:
  - Prevent re-initialization
  - Enforce single user match in `GetUserInfo` (was explicit only in Authenticate)
  - Ensure connection closure if StartTLS fails
- STS Handlers:
  - Add robust provider detection using type assertion
  - **Security**: Implement authorization check (`VerifyActionPermission`) after LDAP authentication
- Routing:
  - Update tests to reflect that STS actions are handled by STS handler, not generic IAM

* fix: address PR feedback (Round 5) - JWT tokens, ARN formatting, PrincipalArn

CRITICAL FIXES:
- Replace standalone credential generation with STS service JWT tokens
  - handleAssumeRole now generates proper JWT session tokens
  - handleAssumeRoleWithLDAPIdentity now generates proper JWT session tokens
  - Session tokens can be validated across distributed instances

- Fix ARN formatting in responses
  - Extract role name from ARN using utils.ExtractRoleNameFromArn()
  - Prevents malformed ARNs like "arn:aws:sts::assumed-role/arn:aws:iam::..."

- Add configurable AccountId for federated users
  - Add AccountId field to STSConfig (defaults to "111122223333")
  - PrincipalArn now uses configured account ID instead of hardcoded "aws"
  - Enables proper trust policy validation

IMPROVEMENTS:
- Sanitize LDAP authentication error messages (don't leak internal details)
- Remove duplicate comment in provider detection
- Add utils import for ARN parsing utilities

* feat: implement LDAP connection pooling to prevent resource exhaustion

PERFORMANCE IMPROVEMENT:
- Add connection pool to LDAPProvider (default size: 10 connections)
- Reuse LDAP connections across authentication requests
- Prevent file descriptor exhaustion under high load

IMPLEMENTATION:
- connectionPool struct with channel-based connection management
- getConnection(): retrieves from pool or creates new connection
- returnConnection(): returns healthy connections to pool
- createConnection(): establishes new LDAP connection with TLS support
- Close(): cleanup method to close all pooled connections
- Connection health checking (IsClosing()) before reuse

BENEFITS:
- Reduced connection overhead (no TCP handshake per request)
- Better resource utilization under load
- Prevents "too many open files" errors
- Non-blocking pool operations (creates new conn if pool empty)

* fix: correct TokenGenerator access in STS handlers

CRITICAL FIX:
- Make TokenGenerator public in STSService (was private tokenGenerator)
- Update all references from Config.TokenGenerator to TokenGenerator
- Remove TokenGenerator from STSConfig (it belongs in STSService)

This fixes the "NotImplemented" errors in distributed and Keycloak tests.
The issue was that Round 5 changes tried to access Config.TokenGenerator
which didn't exist - TokenGenerator is a field in STSService, not STSConfig.

The TokenGenerator is properly initialized in STSService.Initialize() and
is now accessible for JWT token generation in AssumeRole handlers.

* fix: update tests to use public TokenGenerator field

Following the change to make TokenGenerator public in STSService,
this commit updates the test files to reference the correct public field name.
This resolves compilation errors in the IAM STS test suite.

* fix: update distributed tests to use valid Keycloak users

Updated s3_iam_distributed_test.go to use 'admin-user' and 'read-user'
which exist in the standard Keycloak setup provided by setup_keycloak.sh.
This resolves 'unknown test user' errors in distributed integration tests.

* fix: ensure iam_config.json exists in setup target for CI

The GitHub Actions workflow calls 'make setup' which was not creating
iam_config.json, causing the server to start without IAM integration
enabled (iamIntegration = nil), resulting in NotImplemented errors.

Now 'make setup' copies iam_config.local.json to iam_config.json if
it doesn't exist, ensuring IAM is properly configured in CI.

* fix(iam/ldap): fix connection pool race and rebind corruption

- Add atomic 'closed' flag to connection pool to prevent racing on Close()
- Rebind authenticated user connections back to service account before returning to pool
- Close connections on error instead of returning potentially corrupted state to pool

* fix(iam/ldap): populate standard TokenClaims fields in ValidateToken

- Set Subject, Issuer, Audience, IssuedAt, and ExpiresAt to satisfy the interface
- Use time.Time for timestamps as required by TokenClaims struct
- Default to 1 hour TTL for LDAP tokens

* fix(s3api): include account ID in STS AssumedRoleUser ARN

- Consistent with AWS, include the account ID in the assumed-role ARN
- Use the configured account ID from STS service if available, otherwise default to '111122223333'
- Apply to both AssumeRole and AssumeRoleWithLDAPIdentity handlers
- Also update .gitignore to ignore IAM test environment files

* refactor(s3api): extract shared STS credential generation logic

- Move common logic for session claims and credential generation to prepareSTSCredentials
- Update handleAssumeRole and handleAssumeRoleWithLDAPIdentity to use the helper
- Remove stale comments referencing outdated line numbers

* feat(iam/ldap): make pool size configurable and add audience support

- Add PoolSize to LDAPConfig (default 10)
- Add Audience to LDAPConfig to align with OIDC validation
- Update initialization and ValidateToken to use new fields

* update tests

* debug

* chore(iam): cleanup debug prints and fix test config port

* refactor(iam): use mapstructure for LDAP config parsing

* feat(sts): implement strict trust policy validation for AssumeRole

* test(iam): refactor STS tests to use AWS SDK signer

* test(s3api): implement ValidateTrustPolicyForPrincipal in MockIAMIntegration

* fix(s3api): ensure IAM matcher checks query string on ParseForm error

* fix(sts): use crypto/rand for secure credentials and extract constants

* fix(iam): fix ldap connection leaks and add insecure warning

* chore(iam): improved error wrapping and test parameterization

* feat(sts): add support for LDAPProviderName parameter

* Update weed/iam/ldap/ldap_provider.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/s3api/s3api_sts.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix(sts): use STSErrSTSNotReady when LDAP provider is missing

* fix(sts): encapsulate TokenGenerator in STSService and add getter

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-12 10:45:24 -08:00
Chris Lu
217d8b9e0e Fix: ListObjectVersions delimiter support (#7987)
* Fix: Add delimiter support to ListObjectVersions with proper truncation

- Implemented delimiter support to group keys into CommonPrefixes
- Fixed critical truncation bug: now merges versions and common prefixes into single sorted list before truncation
- Ensures total items never exceed MaxKeys (prevents infinite pagination loops)
- Properly sets NextKeyMarker and NextVersionIdMarker for pagination
- Added integration tests in test/s3/versioning/s3_versioning_delimiter_test.go
- Verified behavior matches S3 API specification

* Fix: Add delimiter support to ListObjectVersions with proper truncation

- Implemented delimiter support to group keys into CommonPrefixes
- Fixed critical truncation bug: now merges versions and common prefixes before truncation
- Added safety guard for maxKeys=0 to prevent panics
- Condensed verbose comments for better readability
- Added robust Go integration tests with nil checks for AWS SDK pointers
- Verified behavior matches S3 API specification
- Resolved compilation error in integration tests
- Refined pagination comments and ensured exclusive KeyMarker behavior
- Refactored listObjectVersions into helper methods for better maintainability
2026-01-08 14:02:19 -08:00
promalert
9012069bd7 chore: execute goimports to format the code (#7983)
* chore: execute goimports to format the code

Signed-off-by: promalert <promalert@outlook.com>

* goimports -w .

---------

Signed-off-by: promalert <promalert@outlook.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-01-07 13:06:08 -08:00