Commit Graph

12272 Commits

Chris Lu
d6d893c8c3 s3: add s3:ExistingObjectTag condition support for bucket policies (#7677)
* s3: add s3:ExistingObjectTag condition support in policy engine

Add support for s3:ExistingObjectTag/<tag-key> condition keys in bucket
policies, allowing access control based on object tags.

Changes:
- Add ObjectEntry field to PolicyEvaluationArgs (entry.Extended metadata)
- Update EvaluateConditions to handle s3:ExistingObjectTag/<key> format
- Extract tag value from entry metadata using X-Amz-Tagging-<key> prefix

This enables policies like:
{
  "Condition": {
    "StringEquals": {
      "s3:ExistingObjectTag/status": ["public"]
    }
  }
}

Fixes: https://github.com/seaweedfs/seaweedfs/issues/7447
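The tag lookup described above (a `X-Amz-Tagging-<key>` prefix in the entry's extended metadata) can be sketched as follows; the helper name and map layout are illustrative, not the actual SeaweedFS code:

```go
package main

import "fmt"

// existingObjectTag looks up a tag value in an entry's extended metadata,
// where tags are stored under keys prefixed "X-Amz-Tagging-".
// (Sketch only: in the real code this is entry.Extended, a map[string][]byte.)
func existingObjectTag(extended map[string][]byte, tagKey string) (string, bool) {
	if tagKey == "" { // reject malformed "s3:ExistingObjectTag/" with an empty key
		return "", false
	}
	v, ok := extended["X-Amz-Tagging-"+tagKey]
	if !ok {
		return "", false
	}
	return string(v), true
}

func main() {
	meta := map[string][]byte{"X-Amz-Tagging-status": []byte("public")}
	v, ok := existingObjectTag(meta, "status")
	fmt.Println(v, ok) // public true
}
```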

* s3: update EvaluatePolicy to accept object entry for tag conditions

Update BucketPolicyEngine.EvaluatePolicy to accept objectEntry parameter
(entry.Extended metadata) for evaluating tag-based policy conditions.

Changes:
- Add objectEntry parameter to EvaluatePolicy method
- Update callers in auth_credentials.go and s3api_bucket_handlers.go
- Pass nil for objectEntry in auth layer (entry fetched later in handlers)

For tag-based conditions to work, handlers should call EvaluatePolicy
with the object's entry.Extended after fetching the entry from filer.

* s3: add tests for s3:ExistingObjectTag policy conditions

Add comprehensive tests for object tag-based policy conditions:

- TestExistingObjectTagCondition: Basic tag matching scenarios
  - Matching/non-matching tag values
  - Missing tags, no tags, empty tags
  - Multiple tags with one matching

- TestExistingObjectTagConditionMultipleTags: Multiple tag conditions
  - Both tags match
  - Only one tag matches

- TestExistingObjectTagDenyPolicy: Deny policies with tag conditions
  - Default allow without tag
  - Deny when specific tag present

* s3: document s3:ExistingObjectTag support and feature status

Update policy engine documentation:

- Add s3:ExistingObjectTag/<tag-key> to supported condition keys
- Add 'Object Tag-Based Access Control' section with examples
- Add 'Feature Status' section with implemented and planned features

Planned features for future implementation:
- s3:RequestObjectTag/<key>
- s3:RequestObjectTagKeys
- s3:x-amz-server-side-encryption
- Cross-account access

* Implement tag-based policy re-check in handlers

- Add checkPolicyWithEntry helper to S3ApiServer for handlers to re-check
  policy after fetching object entry (for s3:ExistingObjectTag conditions)
- Add HasPolicyForBucket method to policy engine for efficient check
- Integrate policy re-check in GetObjectHandler after entry is fetched
- Integrate policy re-check in HeadObjectHandler after entry is fetched
- Update auth_credentials.go comments to explain two-phase evaluation
- Update documentation with supported operations for tag-based conditions

This implements 'Approach 1' where handlers re-check the policy with
the object entry after fetching it, allowing tag-based conditions to
be properly evaluated.

* Add integration tests for s3:ExistingObjectTag conditions

- Add TestCheckPolicyWithEntry: tests checkPolicyWithEntry helper with various
  tag scenarios (matching tags, non-matching tags, empty entry, nil entry)
- Add TestCheckPolicyWithEntryNoPolicyForBucket: tests early return when no policy
- Add TestCheckPolicyWithEntryNilPolicyEngine: tests nil engine handling
- Add TestCheckPolicyWithEntryDenyPolicy: tests deny policies with tag conditions
- Add TestHasPolicyForBucket: tests HasPolicyForBucket method

These tests cover the Phase 2 policy evaluation with object entry metadata,
ensuring tag-based conditions are properly evaluated.

* Address code review nitpicks

- Remove unused extractObjectTags placeholder function (engine.go)
- Add clarifying comment about s3:ExistingObjectTag/<key> evaluation
- Consolidate duplicate tag-based examples in README
- Factor out tagsToEntry helper to package level in tests

* Address code review feedback

- Fix unsafe type assertions in GetObjectHandler and HeadObjectHandler
  when getting identity from context (properly handle type assertion failure)
- Extract getConditionContextValue helper to eliminate duplicated logic
  between EvaluateConditions and EvaluateConditionsLegacy
- Ensure consistent handling of missing condition keys (always return
  empty slice)

* Fix GetObjectHandler to match HeadObjectHandler pattern

Add safety check for nil objectEntryForSSE before tag-based policy
evaluation, ensuring tag-based conditions are always evaluated rather
than silently skipped if entry is unexpectedly nil.

Addresses review comment from Copilot.

* Fix HeadObject action name in docs for consistency

Change 'HeadObject' to 's3:HeadObject' to match other action names.

* Extract recheckPolicyWithObjectEntry helper to reduce duplication

Move the repeated identity extraction and policy re-check logic from
GetObjectHandler and HeadObjectHandler into a shared helper method.

* Add validation for empty tag key in s3:ExistingObjectTag condition

Prevent potential issues with malformed policies containing
s3:ExistingObjectTag/ (empty tag key after slash).
2025-12-09 09:48:13 -08:00
Chris Lu
d5f21fd8ba fix: add missing backslash for volume extraArgs in helm chart (#7676)
Fixes #7467

The -mserver argument line in volume-statefulset.yaml was missing a
trailing backslash, which prevented extraArgs from being passed to
the weed volume process.

Also:
- Extracted master server list generation logic into shared helper
  templates in _helpers.tpl for better maintainability
- Updated all occurrences of deprecated -mserver flag to -master
  across docker-compose files, test files, and documentation
2025-12-08 23:21:02 -08:00
Chris Lu
cea12ba3c4 fix: prevent makeslice panic in ReadNeedleMeta with corrupted needle (#7675)
* fix: prevent makeslice panic in ReadNeedleMeta with corrupted needle

When a needle's DataSize in the .dat file is corrupted to a very large
value, the calculation of metaSize can become negative, causing a panic
with 'makeslice: len out of range' when creating the metadata slice.

This fix adds validation to check if metaSize is negative before
creating the slice, returning a descriptive error instead of panicking.

Fixes #7475
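The guard can be sketched as below; the surrounding size arithmetic is illustrative (in the real code metaSize is derived from the needle's on-disk DataSize):

```go
package main

import "fmt"

// safeMetaSlice validates a computed metadata size before allocating,
// so a corrupted DataSize yields an error instead of a
// "makeslice: len out of range" panic.
func safeMetaSlice(metaSize int64) ([]byte, error) {
	if metaSize < 0 {
		return nil, fmt.Errorf("invalid meta size %d: corrupted needle data", metaSize)
	}
	return make([]byte, metaSize), nil
}

func main() {
	if _, err := safeMetaSlice(-8); err != nil {
		fmt.Println("rejected:", err)
	}
}
```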

* Update weed/storage/needle/needle_read_page.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-08 22:26:03 -08:00
Chris Lu
9196696278 mount: add mutex to DirectoryHandle to fix race condition (#7674)
* mount: add mutex to DirectoryHandle to fix race condition

When using Ganesha NFS on top of FUSE mount, ls operations would hang
forever on directories with hundreds of files. This was caused by a
race condition in DirectoryHandle where multiple concurrent readdir
operations could modify shared state (entryStream, entryStreamOffset,
isFinished) without synchronization.

The fix adds a mutex to DirectoryHandle and holds it for the entire
duration of doReadDirectory. This serializes concurrent readdir calls
on the same handle, which is the correct behavior for a directory
handle and fixes the race condition.

Key changes:
- Added sync.Mutex to DirectoryHandle struct
- Lock the mutex at the start of doReadDirectory
- This ensures thread-safe access to entryStream and other state

The lock is per-handle (not global), so different directories can
still be listed concurrently. Only concurrent operations on the
same directory handle are serialized.

Fixes: https://github.com/seaweedfs/seaweedfs/issues/7672

* mount: add mutex to DirectoryHandle to fix race condition

Same fix as above, plus:
- Optimized reset() to reuse slice capacity and allow GC of old entries

Fixes: https://github.com/seaweedfs/seaweedfs/issues/7672
2025-12-08 21:38:09 -08:00
Chris Lu
ff4855dcbe sts: limit session duration to incoming token's exp claim (#7670)
* sts: limit session duration to incoming token's exp claim

This fixes the issue where AssumeRoleWithWebIdentity would issue sessions
that outlive the source identity token's expiration.

For use cases like GitLab CI Jobs where the ID Token has an exp claim
limited to the CI job's timeout, the STS session should not exceed that
expiration.

Changes:
- Add TokenExpiration field to ExternalIdentity struct
- Extract exp/iat/nbf claims in OIDC provider's ValidateToken
- Pass token expiration from Authenticate to ExternalIdentity
- Modify calculateSessionDuration to cap at source token's exp
- Add comprehensive tests for the new behavior

Fixes: https://github.com/seaweedfs/seaweedfs/discussions/7653

* refactor: reduce duplication in time claim extraction

Use a loop over claim names instead of repeating the same
extraction logic three times for exp, iat, and nbf claims.

* address review: add defense-in-depth for expired tokens

- Handle already-expired tokens defensively with 1 minute minimum duration
- Enforce MaxSessionLength from config as additional cap
- Fix potential nil dereference in test mock
- Add test case for expired token scenario

* remove issue reference from test

* fix: remove early return to ensure MaxSessionLength is always checked
2025-12-08 17:38:35 -08:00
Chris Lu
772459f93c fix: restore volume mount when VolumeConfigure fails (#7669)
* fix: restore volume mount when VolumeConfigure fails

When volume.configure.replication command fails (e.g., due to corrupted
.vif file), the volume was left unmounted and the master was already
notified that the volume was deleted, causing the volume to disappear.

This fix attempts to re-mount the volume when ConfigureVolume fails,
restoring the volume state and preventing data loss.

Fixes #7666

* include mount restore error in response message
2025-12-08 16:43:35 -08:00
Chris Lu
086ab3e28c Fix webhook duplicate deliveries and POST to GET conversion (#7668)
* Fix webhook duplicate deliveries and POST to GET conversion

Fixes #7667

This commit addresses two critical issues with the webhook notification system:

1. Duplicate webhook deliveries based on worker count
2. POST requests being converted to GET when following redirects

Issue 1: Multiple webhook deliveries
------------------------------------
Problem: The webhook queue was creating multiple handlers (one per worker)
that all subscribed to the same topic. With Watermill's gochannel, each
handler creates a separate subscription, and all subscriptions receive
their own copy of every message, resulting in duplicate webhook calls
equal to the worker count.

Solution: Use a single handler instead of multiple handlers to ensure
each webhook event is sent only once, regardless of worker configuration.

Issue 2: POST to GET conversion with intelligent redirect handling
------------------------------------------------------------------
Problem: When webhook endpoints returned redirects (301/302/303), Go's
default HTTP client would automatically follow them and convert POST
requests to GET requests per HTTP specification.

Solution: Implement intelligent redirect handling that:
- Prevents automatic redirects to preserve POST method
- Manually follows redirects by recreating POST requests
- Caches the final redirect destination for performance
- Invalidates cache and retries on failures (network or HTTP errors)
- Provides automatic recovery from cached endpoint failures

Benefits:
- Webhooks are now sent exactly once per event
- POST method is always preserved through redirects
- Reduced latency through redirect destination caching
- Automatic failover when cached destinations become unavailable
- Thread-safe concurrent webhook delivery

Testing:
- Added TestQueueNoDuplicateWebhooks to verify single delivery
- Added TestHttpClientFollowsRedirectAsPost for redirect handling
- Added TestHttpClientUsesCachedRedirect for caching behavior
- Added cache invalidation tests for error scenarios
- All 18 webhook tests pass successfully

* Address code review comments

- Add maxWebhookRetryDepth constant to avoid magic number
- Extract cache invalidation logic into invalidateCache() helper method
- Fix redirect handling to properly follow redirects even on retry attempts
- Remove misleading comment about nWorkers controlling handler parallelism
- Fix test assertions to match actual execution flow
- Remove trailing whitespace in test file

All tests passing.

* Refactor: use setFinalURL() instead of invalidateCache()

Replace invalidateCache() with more explicit setFinalURL() function.
This is cleaner as it makes the intent clear - we're setting the URL
(either to a value or to empty string to clear it), rather than having
a separate function just for clearing.

No functional changes, all tests passing.

* Add concurrent webhook delivery using nWorkers configuration

Webhooks were previously sent sequentially (one-by-one), which could be
a performance bottleneck for high-throughput scenarios. Now nWorkers
configuration is properly used to control concurrent webhook delivery.

Implementation:
- Added semaphore channel (buffered to nWorkers capacity)
- handleWebhook acquires semaphore slot before sending (blocks if at capacity)
- Releases slot after webhook completes
- Allows up to nWorkers concurrent webhook HTTP requests

Benefits:
- Improved throughput for slow webhook endpoints
- nWorkers config now has actual purpose (was validated but unused)
- Default 5 workers provides good balance
- Configurable from 1-100 workers based on needs

Example performance improvement:
- Before: 500ms webhook latency = ~2 webhooks/sec max
- After (5 workers): 500ms latency = ~10 webhooks/sec
- After (10 workers): 500ms latency = ~20 webhooks/sec

All tests passing.
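The semaphore pattern described above is the standard buffered-channel idiom; a minimal sketch (names illustrative) that measures the concurrency bound:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// maxConcurrency runs `jobs` tasks through a buffered-channel semaphore
// of capacity nWorkers and returns the peak number running at once.
func maxConcurrency(nWorkers, jobs int) int64 {
	sem := make(chan struct{}, nWorkers)
	var inFlight, peak int64
	var wg sync.WaitGroup
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire: blocks while nWorkers are busy
			defer func() { <-sem }() // release when the delivery finishes
			n := atomic.AddInt64(&inFlight, 1)
			for { // record the peak concurrency observed
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			atomic.AddInt64(&inFlight, -1) // the webhook HTTP POST would go here
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println(maxConcurrency(5, 50) <= 5) // true: never exceeds nWorkers
}
```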

* Replace deprecated AddNoPublisherHandler with AddConsumerHandler

AddNoPublisherHandler is deprecated in Watermill.
Use AddConsumerHandler instead, which is the current recommended API
for handlers that only consume messages without publishing.

No functional changes, all tests passing.

* Drain response bodies to enable HTTP connection reuse

Added drainBody() calls in all code paths to ensure response bodies
are consumed before returning. This is critical for HTTP keep-alive
connection reuse.

Without draining:
- Connections are closed after each request
- New TCP handshake + TLS handshake for every webhook
- Higher latency and resource usage

With draining:
- Connections are reused via HTTP keep-alive
- Significant performance improvement for repeated webhooks
- Lower latency (no handshake overhead)
- Reduced resource usage

Implementation:
- Added drainBody() helper that reads up to 1MB (prevents memory issues)
- Drain on success path (line 161)
- Drain on error responses before retry (lines 119, 152)
- Drain on redirect responses before following (line 118)
- Already had drainResponse() for network errors (line 99)

All tests passing.

* Use existing CloseResponse utility instead of custom drainBody

Replaced custom drainBody() function with the existing util_http.CloseResponse()
utility which is already used throughout the codebase. This provides:

- Consistent behavior with rest of the codebase
- Better logging (logs bytes drained via CountingReader)
- Full body drainage (not limited to 1MB)
- Cleaner code (no duplication)

CloseResponse properly drains and closes the response body to enable
HTTP keep-alive connection reuse.

All tests passing.

* Fix: Don't overwrite original error when draining response

Before: err was being overwritten by drainResponse() result
After: Use drainErr to avoid losing the original client.Do() error

This was a subtle bug where if drainResponse() succeeded (returned nil),
we would lose the original network error and potentially return a
confusing error message.

All tests passing.

* Optimize HTTP client: reuse client and remove redundant timeout

1. Reuse single http.Client instance instead of creating new one per request
   - Reduces allocation overhead
   - More efficient for high-volume webhooks

2. Remove redundant timeout configuration
   - Before: timeout set on both context AND http.Client
   - After: timeout only on context (cleaner, context fires first anyway)

Performance benefits:
- Reduced GC pressure (fewer client allocations)
- Better connection pooling (single transport instance)
- Cleaner code (no redundancy)

All tests passing.
2025-12-08 15:39:35 -08:00
Lisandro Pin
ca1ad9c4c2 Nit: have ec.encode exit immediately if no volumes are processed. (#7654)
* Nit: have `ec.encode` exit immediately if no volumes are processed.

* Update weed/shell/command_ec_encode.go

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

---------

Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-12-08 11:12:36 -08:00
chrislu
1856eaca03 Update notification.toml 2025-12-08 11:09:59 -08:00
chrislu
95f6a2bc13 Added a complete webhook configuration example 2025-12-08 11:03:41 -08:00
dependabot[bot]
51c630b5ff chore(deps): bump golang.org/x/sync from 0.18.0 to 0.19.0 (#7664)
* chore(deps): bump golang.org/x/sync from 0.18.0 to 0.19.0

Bumps [golang.org/x/sync](https://github.com/golang/sync) from 0.18.0 to 0.19.0.
- [Commits](https://github.com/golang/sync/compare/v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sync
  dependency-version: 0.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* tidy

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: chrislu <chris.lu@gmail.com>
2025-12-08 10:52:58 -08:00
dependabot[bot]
21ec3a51c1 chore(deps): bump github.com/aws/aws-sdk-go-v2/credentials from 1.18.20 to 1.19.3 (#7663)
chore(deps): bump github.com/aws/aws-sdk-go-v2/credentials

Bumps [github.com/aws/aws-sdk-go-v2/credentials](https://github.com/aws/aws-sdk-go-v2) from 1.18.20 to 1.19.3.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/config/v1.18.20...service/pi/v1.19.3)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/credentials
  dependency-version: 1.19.3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 10:52:48 -08:00
dependabot[bot]
4709dbf4f8 chore(deps): bump github.com/klauspost/reedsolomon from 1.12.5 to 1.12.6 (#7662)
* chore(deps): bump github.com/klauspost/reedsolomon from 1.12.5 to 1.12.6

Bumps [github.com/klauspost/reedsolomon](https://github.com/klauspost/reedsolomon) from 1.12.5 to 1.12.6.
- [Release notes](https://github.com/klauspost/reedsolomon/releases)
- [Commits](https://github.com/klauspost/reedsolomon/compare/v1.12.5...v1.12.6)

---
updated-dependencies:
- dependency-name: github.com/klauspost/reedsolomon
  dependency-version: 1.12.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* tidy

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: chrislu <chris.lu@gmail.com>
2025-12-08 10:52:38 -08:00
dependabot[bot]
1a22bdbac1 chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.89.1 to 1.93.0 (#7661)
chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3

Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.89.1 to 1.93.0.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/changelog-template.json)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/service/s3/v1.89.1...service/s3/v1.93.0)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/s3
  dependency-version: 1.93.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 09:27:28 -08:00
dependabot[bot]
a912371f2f chore(deps): bump wangyoucao577/go-release-action from 1.53 to 1.54 (#7660)
Bumps [wangyoucao577/go-release-action](https://github.com/wangyoucao577/go-release-action) from 1.53 to 1.54.
- [Release notes](https://github.com/wangyoucao577/go-release-action/releases)
- [Commits](481a2c1a0f...ec4e3151b3)

---
updated-dependencies:
- dependency-name: wangyoucao577/go-release-action
  dependency-version: '1.54'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 09:27:20 -08:00
dependabot[bot]
52a7ed03bc chore(deps): bump actions/checkout from 4 to 6 (#7659)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 09:27:12 -08:00
dependabot[bot]
0c8e8fb411 chore(deps): bump cloud.google.com/go/kms from 1.23.1 to 1.23.2 (#7658)
Bumps [cloud.google.com/go/kms](https://github.com/googleapis/google-cloud-go) from 1.23.1 to 1.23.2.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/documentai/CHANGES.md)
- [Commits](https://github.com/googleapis/google-cloud-go/compare/kms/v1.23.1...kms/v1.23.2)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/kms
  dependency-version: 1.23.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 09:27:04 -08:00
dependabot[bot]
2f19409c32 chore(deps): bump actions/upload-artifact from 4 to 5 (#7657)
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 5.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 09:26:55 -08:00
dependabot[bot]
887c9ee97a chore(deps): bump actions/setup-go from 5 to 6 (#7656)
Bumps [actions/setup-go](https://github.com/actions/setup-go) from 5 to 6.
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](https://github.com/actions/setup-go/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/setup-go
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 09:26:45 -08:00
Trim21
0a0eb21b86 fix random volume ids in master.html (#7655) 2025-12-08 07:55:42 -08:00
chrislu
805950b401 4.02 2025-12-08 01:38:16 -08:00
chrislu
e28629bb5f reduce minFreeSpacePercent to 1
addressing https://github.com/seaweedfs/seaweedfs/issues/7110#issuecomment-3622472545
2025-12-08 01:34:53 -08:00
Chris Lu
cadb2eeb05 fix: ARM v7 alignment issue for 64-bit atomic operations (#7652)
Fixes #7643

Reordered filerHealth struct fields to ensure int64 field comes first,
guaranteeing 8-byte alignment required for atomic operations on 32-bit
ARM architectures (ARMv7, as used in OpenWRT).
2025-12-08 01:32:50 -08:00
Chris Lu
982aae6d53 SFTP: support reloading user store on HUP signal (#7651)
Fixes #7650

This change enables the SFTP server to reload the user store configuration
(sftp_userstore.json) when a HUP signal is sent to the process, without
requiring a service restart.

Changes:
- Add Reload() method to FileStore to re-read users from disk
- Add Reload() method to SFTPService to handle reload requests
- Register reload hook with grace.OnReload() in sftp command

This allows administrators to add users or change access policies
dynamically by editing the user store file and sending a HUP signal
(e.g., 'systemctl reload seaweedfs' or 'kill -HUP <pid>').
2025-12-08 01:24:42 -08:00
Chris Lu
f5c0bcafa3 s3: fix ListBuckets not showing buckets created by authenticated users (#7648)
* s3: fix ListBuckets not showing buckets created by authenticated users

Fixes #7647

## Problem
Users with proper Admin permissions could create buckets but couldn't
list them. The issue occurred because ListBucketsHandler was not wrapped
with the Auth middleware, so the authenticated identity was never set in
the request context.

## Root Cause
- PutBucketHandler uses iam.Auth() middleware which sets identity in context
- ListBucketsHandler did NOT use iam.Auth() middleware
- Without the middleware, GetIdentityNameFromContext() returned empty string
- Bucket ownership checks failed because no identity was present

## Changes
1. Wrap ListBucketsHandler with iam.Auth() middleware (s3api_server.go)
2. Update ListBucketsHandler to get identity from context (s3api_bucket_handlers.go)
3. Add lookupByIdentityName() helper method (auth_credentials.go)
4. Add comprehensive test TestListBucketsIssue7647 (s3api_bucket_handlers_test.go)

## Testing
- All existing tests pass (1348 tests in s3api package)
- New test TestListBucketsIssue7647 validates the fix
- Verified admin users can see their created buckets
- Verified admin users can see all buckets
- Verified backward compatibility maintained

* s3: fix ListBuckets for JWT/Keycloak authentication

The previous fix broke JWT/Keycloak authentication because JWT identities
are created on-the-fly and not stored in the iam.identities list.
The lookupByIdentityName() would return nil for JWT users.

Solution: Store the full Identity object in the request context, not just
the name. This allows ListBucketsHandler to retrieve the complete identity
for all authentication types (SigV2, SigV4, JWT, Anonymous).

Changes:
- Add SetIdentityInContext/GetIdentityFromContext in s3_constants/header.go
- Update Auth middleware to store full identity in context
- Update ListBucketsHandler to retrieve identity from context first,
  with fallback to lookup for backward compatibility

* s3: optimize lookupByIdentityName to O(1) using map

Address code review feedback: Use a map for O(1) lookups instead of
O(N) linear scan through identities list.

Changes:
- Add nameToIdentity map to IdentityAccessManagement struct
- Populate map in loadS3ApiConfiguration (consistent with accessKeyIdent pattern)
- Update lookupByIdentityName to use map lookup instead of loop

This improves performance when many identities are configured and
aligns with the existing pattern used for accessKeyIdent lookups.
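The map-based lookup amounts to the following sketch (the real Identity struct has more fields):

```go
package main

import "fmt"

// Identity sketch for the O(1) name lookup.
type Identity struct{ Name string }

type IdentityAccessManagement struct {
	nameToIdentity map[string]*Identity // populated in loadS3ApiConfiguration
}

// lookupByIdentityName is an O(1) map lookup instead of an O(N) scan
// through the identities list; returns nil when the name is unknown.
func (iam *IdentityAccessManagement) lookupByIdentityName(name string) *Identity {
	return iam.nameToIdentity[name]
}

func main() {
	iam := &IdentityAccessManagement{nameToIdentity: map[string]*Identity{
		"admin": {Name: "admin"},
	}}
	fmt.Println(iam.lookupByIdentityName("admin").Name) // admin
}
```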

* s3: address code review feedback on nameToIdentity and logging

Address two code review points:

1. Wire nameToIdentity into env-var fallback path
   - The AWS env-var fallback in NewIdentityAccessManagementWithStore now
     populates nameToIdentity map along with accessKeyIdent
   - Keeps all identity lookup maps in sync
   - Avoids potential issues if handlers rely on lookupByIdentityName

2. Improve access key lookup logging
   - Reduce log verbosity: V(1) -> V(2) for failed lookups
   - Truncate access keys in logs (show first 4 chars + ***)
   - Include key length for debugging
   - Prevents credential exposure in production logs
   - Reduces log noise from misconfigured clients

* fmt

* s3: refactor truncation logic and improve error handling

Address additional code review feedback:

1. DRY principle: Extract key truncation logic into local function
   - Define truncate() helper at function start
   - Reuse throughout lookupByAccessKey
   - Eliminates code duplication

2. Enhanced security: Mask very short access keys
   - Keys <= 4 chars now show as '***' instead of full key
   - Prevents any credential exposure even for short keys
   - Consistent masking across all log statements

3. Improved robustness: Add warning log for type assertion failure
   - Log unexpected type when identity context object is wrong type
   - Helps debug potential middleware or context issues
   - Better production diagnostics

4. Documentation: Add comment about future optimization opportunity
   - Note potential for lightweight identity view in context
   - Suggests credential-free view for better data minimization
   - Documents design decision for future maintainers
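The truncation helper described in points 1 and 2 above can be sketched as:

```go
package main

import "fmt"

// truncate masks an access key for logging: keys of 4 characters or
// fewer become "***"; longer keys keep only the first 4 characters.
func truncate(key string) string {
	if len(key) <= 4 {
		return "***"
	}
	return key[:4] + "***"
}

func main() {
	fmt.Println(truncate("AKIAIOSFODNN7EXAMPLE")) // AKIA***
	fmt.Println(truncate("abc"))                  // ***
}
```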
2025-12-08 01:24:12 -08:00
Chris Lu
a9b3be416b fix: initialize missing S3 options in filer to prevent nil pointer dereference (#7646)
* fix: initialize missing S3 options in filer to prevent nil pointer dereference

Fixes #7644

When starting the S3 gateway from the filer, several S3Options fields
were not being initialized, which could cause nil pointer dereferences
during startup.

This commit adds initialization for:
- iamConfig: for advanced IAM configuration
- metricsHttpPort: for Prometheus metrics endpoint
- metricsHttpIp: for binding the metrics endpoint

Also ensures metricsHttpIp defaults to bindIp when not explicitly set,
matching the behavior of the standalone S3 server.

This prevents the panic that was occurring in the s3.go:226 area when
these pointer fields were accessed but never initialized.

* fix: copy value instead of pointer for metricsHttpIp default

Address review comment to avoid pointer aliasing. Copy the value
instead of the pointer to prevent unexpected side effects if the
bindIp value is modified later.
2025-12-07 19:56:05 -08:00
Chris Lu
5d53edb93b Optimize database connection pool settings for MySQL and PostgreSQL (#7645)
- Set connection_max_idle to 10 (down from 100 for PostgreSQL, up from 2 for MySQL)
- Reduce connection_max_open from 100 to 50 for all database stores
- Set connection_max_lifetime_seconds to 300 (5 minutes) to force connection recycling

This prevents 'cannot assign requested address' errors under high load by:
1. Limiting the number of concurrent connections to reduce ephemeral port usage
2. Forcing connection recycling to prevent stale connections and port exhaustion
3. Reducing idle connections to minimize resource consumption

Fixes #6887
2025-12-07 13:26:47 -08:00
chrislu
dcc200fec0 Remove allowEmptyFolder from s3tests.yml workflow 2025-12-06 21:59:00 -08:00
chrislu
5167bbd2a9 Remove deprecated allowEmptyFolder CLI option
The allowEmptyFolder option is no longer functional because:
1. The code that used it was already commented out
2. Empty folder cleanup is now handled asynchronously by EmptyFolderCleaner

The CLI flags are kept for backward compatibility but marked as deprecated
and ignored. This removes:
- S3ApiServerOption.AllowEmptyFolder field
- The actual usage in s3api_object_handlers_list.go
- Helm chart values and template references
- References in test Makefiles and docker-compose files
2025-12-06 21:54:12 -08:00
Chris Lu
55f0fbf364 s3: optimize DELETE by skipping lock check for buckets without Object Lock (#7642)
This optimization avoids an expensive filer gRPC call for every DELETE
operation on buckets that don't have Object Lock enabled.

Before this change, enforceObjectLockProtections() would always call
getObjectEntry() to fetch object metadata to check for retention/legal
hold, even for buckets that never had Object Lock configured.

Changes:
1. Add early return in enforceObjectLockProtections() if bucket has no
   Object Lock config or bucket doesn't exist
2. Add isObjectLockEnabled() helper function to check if a bucket has
   Object Lock configured
3. Fix validateObjectLockHeaders() to check ObjectLockConfig instead of
   just versioningEnabled - this ensures object-lock headers are properly
   rejected on buckets without Object Lock enabled, which aligns with
   AWS S3 semantics
4. Make bucket creation with Object Lock atomic - set Object Lock config
   in the same CreateEntry call as bucket creation, preventing race
   conditions where bucket exists without Object Lock enabled
5. Properly handle Object Lock setup failures during bucket creation -
   if StoreObjectLockConfigurationInExtended fails, roll back the bucket
   creation and return an error instead of leaving a bucket without
   the requested Object Lock configuration

This significantly improves DELETE latency for non-Object-Lock buckets,
which is the common case (lockCheck time reduced from 1-10ms to ~1µs).
2025-12-06 21:37:25 -08:00
Chris Lu
62a83ed469 helm: enhance all-in-one deployment configuration (#7639)
* helm: enhance all-in-one deployment configuration

Fixes #7110

This PR addresses multiple issues with the all-in-one Helm chart configuration:

## New Features

### Configurable Replicas
- Added `allInOne.replicas` (was hardcoded to 1)

### S3 Gateway Configuration
- Added full S3 config under `allInOne.s3`:
  - port, httpsPort, domainName, allowEmptyFolder
  - enableAuth, existingConfigSecret, auditLogConfig
  - createBuckets for declarative bucket creation

### SFTP Server Configuration
- Added full SFTP config under `allInOne.sftp`:
  - port, sshPrivateKey, hostKeysFolder, authMethods
  - maxAuthTries, bannerMessage, loginGraceTime
  - clientAliveInterval, clientAliveCountMax, enableAuth

### Command Line Arguments
- Added `allInOne.extraArgs` for custom CLI arguments

### Update Strategy
- Added `allInOne.updateStrategy.type` (Recreate/RollingUpdate)

### Secret Environment Variables
- Added `allInOne.secretExtraEnvironmentVars` for injecting secrets

### Ingress Support
- Added `allInOne.ingress` with S3, filer, and master sub-configs

### Storage Options
- Enhanced `allInOne.data` with existingClaim support
- Added PVC template for persistentVolumeClaim type

## CI Enhancements
- Added comprehensive tests for all-in-one configurations
- Tests cover replicas, S3, SFTP, extraArgs, strategies, PVC, ingress

* helm: add real cluster deployment tests to CI

- Deploy all-in-one cluster with S3 enabled on kind cluster
- Test Master API (/cluster/status endpoint)
- Test Filer API (file upload/download)
- Test S3 API (/status endpoint)
- Test S3 operations with AWS CLI:
  - Create/delete buckets
  - Upload/download/delete objects
  - Verify file content integrity

* helm: simplify CI and remove all-in-one ingress

Address review comments:
- Remove detailed all-in-one template rendering tests from CI
- Remove real cluster deployment tests from CI
- Remove all-in-one ingress template and values configuration

Keep the core improvements:
- allInOne.replicas configuration
- allInOne.s3.* full configuration
- allInOne.sftp.* full configuration
- allInOne.extraArgs support
- allInOne.updateStrategy configuration
- allInOne.secretExtraEnvironmentVars support

* helm: address review comments

- Fix post-install-bucket-hook.yaml: add filer.s3.enableAuth and
  filer.s3.existingConfigSecret to or statements for consistency
- Fix all-in-one-deployment.yaml: use default function for s3.domainName
- Fix all-in-one-deployment.yaml: use hasKey function for s3.allowEmptyFolder

* helm: clarify updateStrategy multi-replica behavior

Expand comment to warn users that RollingUpdate with multiple replicas
requires shared storage (ReadWriteMany) to avoid data loss.

* helm: address gemini-code-assist review comments

- Make PVC accessModes configurable to support ReadWriteMany for
  multi-replica deployments (defaults to ReadWriteOnce)
- Use configured readiness probe paths in post-install bucket hook
  instead of hardcoded paths, respecting custom configurations

* helm: simplify allowEmptyFolder logic using coalesce

Use coalesce function for cleaner template code as suggested in review.

* helm: fix extraArgs trailing backslash issue

Remove trailing backslash after the last extraArgs argument to avoid
shell syntax error. Use counter to only add backslash between arguments.

* helm: fix fallback logic for allInOne s3/sftp configuration

Changes:
- Set allInOne.s3.* and allInOne.sftp.* override parameters to null by default
  This allows proper inheritance from global s3.* and sftp.* settings
- Fix allowEmptyFolder logic to use explicit nil checking instead of coalesce
  The coalesce/default functions treat 'false' as empty, causing incorrect
  fallback behavior when users want to explicitly set false values

Addresses review feedback about default value conflicts with fallback logic.

* helm: fix exec in bucket creation loop causing premature termination

Remove 'exec' from the range loops that create and configure S3 buckets.
The exec command replaces the current shell process, causing the script
to terminate after the first bucket, preventing creation/configuration
of subsequent buckets.

* helm: quote extraArgs to handle arguments with spaces

Use the quote function to ensure each item in extraArgs is treated as
a single, complete argument even if it contains spaces.

* helm: make s3/filer ingress work for both normal and all-in-one modes

Modified s3-ingress.yaml and filer-ingress.yaml to dynamically select
the service name based on deployment mode:
- Normal mode: points to seaweedfs-s3 / seaweedfs-filer services
- All-in-one mode: points to seaweedfs-all-in-one service

This eliminates the need for separate all-in-one ingress templates.
Users can now use the standard s3.ingress and filer.ingress settings
for both deployment modes.

* helm: fix allInOne.data.size and storageClass to use null defaults

Change size and storageClass from empty strings to null so the template
defaults (10Gi for size, cluster default for storageClass) will apply
correctly. Empty strings prevent the Helm | default function from working.

* helm: fix S3 ingress to include standalone S3 gateway case

Add s3.enabled check to the $s3Enabled logic so the ingress works for:
1. Standalone S3 gateway (s3.enabled)
2. S3 on Filer (filer.s3.enabled) when not in all-in-one mode
3. S3 in all-in-one mode (allInOne.s3.enabled)
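An illustrative values.yaml fragment for the configuration surface this PR adds; the key names follow the commit text, the sample values are assumptions:

```yaml
allInOne:
  enabled: true
  replicas: 1              # configurable (was hardcoded to 1)
  updateStrategy:
    type: Recreate         # RollingUpdate with >1 replica needs ReadWriteMany storage
  s3:
    enabled: true
    port: 8333
    allowEmptyFolder: null # null inherits the global s3.* setting
  sftp:
    enabled: false
  extraArgs:
    - "-v=2"               # each item is quoted in the template
  data:
    type: persistentVolumeClaim
    size: null             # null lets the 10Gi template default apply
```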
2025-12-06 18:54:28 -08:00
Chris Lu
9c266fac29 fix: CompleteMultipartUpload fails for uploads with more than 1000 parts (#7641)
When completing a multipart upload, the code was listing parts with limit=0,
which relies on the server's DirListingLimit default. In 'weed server' mode,
this defaults to 1000, causing uploads with more than 1000 parts to fail
with InvalidPart error.

For a 38GB file with 8MB parts (AWS CLI default), this results in ~4564 parts,
far exceeding the 1000 limit.

Fix: Use explicit limit of MaxS3MultipartParts+1 (10001) to ensure all parts
are listed regardless of server configuration.

Fixes #7638
2025-12-06 11:25:27 -08:00
chrislu
5dd2d44858 Update README.md 2025-12-05 19:52:43 -08:00
Chris Lu
28ac536280 fix: normalize Windows backslash paths in weed admin file uploads (#7636)
fix: normalize Windows backslash paths in file uploads

When uploading files from a Windows client to a Linux server,
file paths containing backslashes were not being properly interpreted as
directory separators. This caused files intended for subdirectories to be
created in the root directory with backslashes in their filenames.

Changes:
- Add util.CleanWindowsPath and util.CleanWindowsPathBase helper functions
  in weed/util/fullpath.go for reusable path normalization
- Use path.Join/path.Clean/path.Base instead of filepath equivalents
  for URL path semantics (filepath is OS-specific)
- Apply normalization in weed admin handlers and filer upload parsing

Fixes #7628
2025-12-05 17:40:32 -08:00
Chris Lu
89b6deaefa fix: Use mime.FormatMediaType for RFC 6266 compliant Content-Disposition (#7635)
Updated Content-Disposition header generation to use mime.FormatMediaType
from the standard library, which properly handles non-ASCII characters
and special characters per RFC 6266.

Changes:
- weed/server/common.go: Updated adjustHeaderContentDisposition to use
  mime.FormatMediaType instead of manual escaping with fileNameEscaper
- weed/operation/upload_content.go: Updated multipart form Content-Disposition
  to use mime.FormatMediaType
- weed/server/volume_server_handlers_read.go: Removed unused fileNameEscaper

This ensures correct filename display for international users across
filer downloads and file uploads.

Fixes #7634
2025-12-05 15:59:12 -08:00
Chris Lu
f1384108e8 fix: Admin UI file browser uses https.client TLS config for filer communication (#7633)
* fix: Admin UI file browser uses https.client TLS config for filer communication

When filer is configured with HTTPS (https.filer section in security.toml),
the Admin UI file browser was still using plain HTTP for file uploads,
downloads, and viewing. This caused TLS handshake errors:
'http: TLS handshake error: client sent an HTTP request to an HTTPS server'

This fix:
- Updates FileBrowserHandlers to use the HTTPClient from weed/util/http/client
  which properly loads TLS configuration from https.client section
- The HTTPClient automatically uses HTTPS when https.client.enabled=true
- All file operations (upload, download, view) now respect TLS configuration
- Falls back to plain HTTP if TLS client creation fails

Fixes #7631

* fix: Address code review comments

- Fix fallback client Transport wiring (properly assign transport to http.Client)
- Use per-operation timeouts instead of unified 60s timeout:
  - uploadFileToFiler: 60s (for large file uploads)
  - ViewFile: 30s (original timeout)
  - isLikelyTextFile: 10s (original timeout)

* fix: Proxy file downloads through Admin UI for mTLS support

The DownloadFile function previously used browser redirect, which would
fail when filer requires mutual TLS (client certificates) since the
browser doesn't have these certificates.

Now the Admin UI server proxies the download, using its TLS-aware HTTP
client with the configured client certificates, then streams the
response to the browser.

* fix: Ensure HTTP response body is closed on non-200 responses

In ViewFile, the response body was only closed on 200 OK paths,
which could leak connections on non-200 responses. Now the body
is always closed via defer immediately after checking err == nil,
before checking the status code.

* refactor: Extract fetchFileContent helper to reduce nesting in ViewFile

Extracted the deeply nested file fetch logic (7+ levels) into a
separate fetchFileContent helper method. This improves readability
while maintaining the same TLS-aware behavior and error handling.

* refactor: Use idiomatic Go error handling in fetchFileContent

Changed fetchFileContent to return (string, error) instead of
(content string, reason string) for idiomatic Go error handling.
This enables error wrapping and standard 'if err != nil' checks.

Also improved error messages to be more descriptive for debugging,
including the HTTP status code and response body on non-200 responses.

* refactor: Extract newClientWithTimeout helper to reduce code duplication

- Added newClientWithTimeout() helper method that creates a temporary
  http.Client with the specified timeout, reusing the TLS transport
- Updated uploadFileToFiler, fetchFileContent, DownloadFile, and
  isLikelyTextFile to use the new helper
- Improved error message in DownloadFile to include response body
  for better debuggability (consistent with fetchFileContent)

* fix: Address CodeRabbit review comments

- Fix connection leak in isLikelyTextFile: ensure resp.Body.Close()
  is called even when status code is not 200
- Use http.NewRequestWithContext in DownloadFile so the filer request
  is cancelled when the client disconnects, improving resource cleanup

* fix: Escape Content-Disposition filename per RFC 2616

Filenames containing quotes, backslashes, or special characters could
break the Content-Disposition header or cause client-side parsing issues.
Now properly escapes these characters before including in the header.

* fix: Handle io.ReadAll errors when reading error response bodies

In fetchFileContent and DownloadFile, the error from io.ReadAll was
ignored when reading the filer's error response body. Now properly
handles these errors to provide complete error messages.

* fix: Fail fast when TLS client creation fails

If TLS is enabled (https.client.enabled=true) but misconfigured,
fail immediately with glog.Fatalf rather than silently falling back
to plain HTTP. This prevents confusing runtime errors when the filer
only accepts HTTPS connections.

* fix: Use mime.FormatMediaType for RFC 6266 compliant Content-Disposition

Replace manual escaping with mime.FormatMediaType which properly handles
non-ASCII characters and special characters per RFC 6266, ensuring
correct filename display for international users.
2025-12-05 15:39:26 -08:00
msementsov
c0dad091f1 Separate vacuum speed from replication speed (#7632) 2025-12-05 12:24:38 -08:00
Chris Lu
4cc6a2a4e5 fix: Admin UI user creation fails before filer discovery (#7624) (#7625)
* fix: Admin UI user creation fails before filer discovery (#7624)

The credential manager's filer address function was not configured quickly
enough after admin server startup, causing 'filer address function not
configured' errors when users tried to create users immediately.

Changes:
- Use exponential backoff (200ms -> 5s) instead of fixed 5s polling for
  faster filer discovery on startup
- Improve error messages to be more user-friendly and actionable

Fixes #7624

* Add more debug logging to help diagnose filer discovery issues

* fix: Use dynamic filer address function to eliminate race condition

Instead of using a goroutine to wait for filer discovery before setting
the filer address function, we now set a dynamic function immediately
that returns the current filer address whenever it's called.

This eliminates the race condition where users could create users before
the goroutine completed, and provides clearer error messages when no
filer is available.

The dynamic function is HA-aware - it automatically returns whatever
filer is currently available, adapting to filer failovers.
2025-12-05 12:19:06 -08:00
Chris Lu
5c1de633cb mount: improve read throughput with parallel chunk fetching (#7627)
* filer: remove lock contention during chunk download

This addresses issue #7504 where a single weed mount FUSE instance
does not fully utilize node network bandwidth when reading large files.

The SingleChunkCacher was holding a mutex during the entire HTTP download,
causing readers to block until the download completed. This serialized
chunk reads even when multiple goroutines were downloading in parallel.

Changes:
- Add sync.Cond to SingleChunkCacher for efficient waiting
- Move HTTP download outside the critical section in startCaching()
- Use condition variable in readChunkAt() to wait for download completion
- Add isComplete flag to track download state

Now multiple chunk downloads can proceed truly in parallel, and readers
wait efficiently using the condition variable instead of blocking on
a mutex held during I/O operations.

Ref: #7504

* filer: parallel chunk fetching within doReadAt

This addresses issue #7504 by enabling parallel chunk downloads within
a single read operation.

Previously, doReadAt() processed chunks sequentially in a loop, meaning
each chunk had to be fully downloaded before the next one started.
This left significant network bandwidth unused when chunks resided on
different volume servers.

Changes:
- Collect all chunk read tasks upfront
- Use errgroup to fetch multiple chunks in parallel
- Each chunk reads directly into its correct buffer position
- Limit concurrency to prefetchCount (min 4) to avoid overwhelming the system
- Handle gaps and zero-filling before parallel fetch
- Trigger prefetch after parallel reads complete

For a read spanning N chunks on different volume servers, this can
now utilize up to N times the bandwidth of a single connection.

Ref: #7504

* http: direct buffer read to reduce memory copies

This addresses issue #7504 by reducing memory copy overhead during
chunk downloads.

Previously, RetriedFetchChunkData used ReadUrlAsStream which:
1. Allocated a 64KB intermediate buffer
2. Read data in 64KB chunks
3. Called a callback to copy each chunk to the destination

For a 16MB chunk, this meant 256 copy operations plus the callback
overhead. Profiling showed significant time spent in memmove.

Changes:
- Add readUrlDirectToBuffer() that reads directly into the destination
- Add retriedFetchChunkDataDirect() for unencrypted, non-gzipped chunks
- Automatically use direct read path when possible (cipher=nil, gzip=false)
- Use http.NewRequestWithContext for proper cancellation

For unencrypted chunks (the common case), this eliminates the
intermediate buffer entirely, reading HTTP response bytes directly
into the final destination buffer.

Ref: #7504

* address review comments

- Use channel (done) instead of sync.Cond for download completion signaling
  This integrates better with context cancellation patterns
- Remove redundant groupErr check in reader_at.go (errors are already captured in task.err)
- Remove buggy URL encoding logic from retriedFetchChunkDataDirect
  (The existing url.PathEscape on full URL is a pre-existing bug that should be fixed separately)

* address review comments (round 2)

- Return io.ErrUnexpectedEOF when HTTP response is truncated
  This prevents silent data corruption from incomplete reads
- Simplify errgroup error handling by using g.Wait() error directly
  Remove redundant task.err field and manual error aggregation loop
- Define minReadConcurrency constant instead of magic number 4
  Improves code readability and maintainability

Note: Context propagation to startCaching() is intentionally NOT changed.
The downloaded chunk is a shared resource that may be used by multiple
readers. Using context.Background() ensures the download completes even
if one reader cancels, preventing data loss for other waiting readers.

* http: inject request ID for observability in direct read path

Add request_id.InjectToRequest() call to readUrlDirectToBuffer() for
consistency with ReadUrlAsStream path. This ensures full-chunk reads
carry the same tracing/correlation headers for server logs and metrics.

* filer: consistent timestamp handling in sequential read path

Use max(ts, task.chunk.ModifiedTsNs) in sequential path to match
parallel path behavior. Also update ts before error check so that
on failure, the returned timestamp reflects the max of all chunks
processed so far.

* filer: document why context.Background() is used in startCaching

Add comment explaining the intentional design decision: the downloaded
chunk is a shared resource that may be used by multiple concurrent
readers. Using context.Background() ensures the download completes
even if one reader cancels, preventing errors for other waiting readers.

* filer: propagate context for reader cancellation

Address review comment: pass context through ReadChunkAt call chain so
that a reader can cancel its wait for a download. The key distinction is:

- Download uses context.Background() - shared resource, always completes
- Reader wait uses request context - can be cancelled individually

If a reader cancels, it stops waiting and returns ctx.Err(), but the
download continues to completion for other readers waiting on the same
chunk. This properly handles the shared resource semantics while still
allowing individual reader cancellation.

* filer: use defer for close(done) to guarantee signal on panic

Move close(s.done) to a defer statement at the start of startCaching()
to ensure the completion signal is always sent, even if an unexpected
panic occurs. This prevents readers from blocking indefinitely.

* filer: remove unnecessary code

- Remove close(s.cacheStartedCh) in destroy() - the channel is only used
  for one-time synchronization, closing it provides no benefit
- Remove task := task loop variable capture - Go 1.22+ fixed loop variable
  semantics, this capture is no longer necessary (go.mod specifies Go 1.24.0)

* filer: restore fallback to chunkCache when cacher returns no data

Fix critical issue where ReadChunkAt would return 0,nil immediately
if SingleChunkCacher couldn't provide data for the requested offset,
without trying the chunkCache fallback. Now if cacher.readChunkAt
returns n=0 and err=nil, we fall through to try chunkCache.

* filer: add comprehensive tests for ReaderCache

Tests cover:
- Context cancellation while waiting for download
- Fallback to chunkCache when cacher returns n=0, err=nil
- Multiple concurrent readers waiting for same chunk
- Partial reads at different offsets
- Downloader cleanup when exceeding cache limit
- Done channel signaling (no hangs on completion)

* filer: prioritize done channel over context cancellation

If data is already available (done channel closed), return it even if
the reader's context is also cancelled. This avoids unnecessary errors
when the download has already completed.

* filer: add lookup error test and document test limitations

Add TestSingleChunkCacherLookupError to test error handling when lookup
fails. Document that full HTTP integration tests for SingleChunkCacher
require global HTTP client initialization which is complex in unit tests.
The download path is tested via FUSE integration tests.

* filer: add tests that exercise SingleChunkCacher concurrency logic

Add tests that use blocking lookupFileIdFn to exercise the actual
SingleChunkCacher wait/cancellation logic:

- TestSingleChunkCacherContextCancellationDuringLookup: tests reader
  cancellation while lookup is blocked
- TestSingleChunkCacherMultipleReadersWaitForDownload: tests multiple
  readers waiting on the same download
- TestSingleChunkCacherOneReaderCancelsOthersContinue: tests that when
  one reader cancels, other readers continue waiting

These tests properly exercise the done channel wait/cancel logic without
requiring HTTP calls - the blocking lookup simulates a slow download.
2025-12-04 23:40:56 -08:00
Chris Lu
3183a49698 fix: S3 downloads failing after idle timeout (#7626)
* fix: S3 downloads failing after idle timeout (#7618)

The idle timeout was incorrectly terminating active downloads because
read and write deadlines were managed independently. During a download,
the server writes data but rarely reads, so the read deadline would
expire even though the connection was actively being used.

Changes:
1. Simplify to single Timeout field - since this is a 'no activity timeout'
   where any activity extends the deadline, separate read/write timeouts
   are unnecessary. Now uses SetDeadline() which sets both at once.

2. Implement proper 'no activity timeout' - any activity (read or write)
   now extends the deadline. The connection only times out when there's
   genuinely no activity in either direction.

3. Increase default S3 idleTimeout from 10s to 120s for additional safety
   margin when fetching chunks from slow storage backends.

Fixes #7618

* Update weed/util/net_timeout.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 18:31:46 -08:00
Chris Lu
f9b4a4c396 fix: check freeEcSlot before evacuating EC shards to prevent data loss (#7621)
* fix: check freeEcSlot before evacuating EC shards to prevent data loss

Related to #7619

The moveAwayOneEcVolume function was missing the freeEcSlot check that
exists in other EC shard placement functions. This could cause EC shards
to be moved to volume servers that have no capacity, resulting in:
1. 0-byte shard files when disk is full
2. Data loss when source shards are deleted after 'successful' copy

Changes:
- Add freeEcSlot check before attempting to move EC shards
- Sort destinations by both shard count and free slots
- Refresh topology during evacuation to get updated slot counts
- Log when nodes are skipped due to no free slots
- Update freeEcSlot count after successful moves

* fix: clarify comment wording per CodeRabbit review

The comment stated 'after each move' but the code executes before
calling moveAwayOneEcVolume. Updated to 'before moving each EC volume'
for accuracy.

* fix: collect topology once and track capacity changes locally

Remove the topology refresh within the loop as it gives a false sense
of correctness - the refreshed topology could still be stale (minutes old).

Instead, we:
1. Collect topology once at the start
2. Track capacity changes ourselves via freeEcSlot decrement after each move

This is more accurate because we know exactly what moves we've made,
rather than relying on potentially stale topology refreshes.

* fix: ensure partial EC volume moves are reported as failures

Set hasMoved=false when a shard fails to move, even if previous shards
succeeded. This prevents the caller from incorrectly assuming the entire
volume was evacuated, which could lead to data loss if the source server
is decommissioned based on this incorrect status.

* fix: also reset hasMoved on moveMountedShardToEcNode error

Same issue as the previous fix: if moveMountedShardToEcNode fails
after some shards succeeded, hasMoved would incorrectly be true.
Ensure partial moves are always reported as failures.
2025-12-04 16:05:06 -08:00
Chris Lu
fdb888729b fix: properly handle errors in writeToFile to prevent 0-byte EC shards (#7620)
Fixes #7619

The writeToFile function had two critical bugs that could cause data loss
during EC shard evacuation when the destination disk is full:

Bug 1: When os.OpenFile fails (e.g., disk full), the error was silently
ignored and nil was returned. This caused the caller to think the copy
succeeded.

Bug 2: When dst.Write fails (e.g., 'no space left on device'), the error
was completely ignored because the return value was not checked.

When evacuating EC shards to a full volume server (especially on BTRFS):
1. OpenFile may succeed (creates 0-byte file inode)
2. Write fails with 'no space left on device'
3. Errors were ignored, function returned nil
4. Caller thinks copy succeeded and deletes source shard
5. Result: 0-byte shard on destination, data loss!

This fix ensures both errors are properly returned, preventing data loss.
Added unit tests to verify the fix.
2025-12-04 14:52:03 -08:00
Chris Lu
716f21fbd3 s3: support STREAMING-AWS4-HMAC-SHA256-PAYLOAD-TRAILER for signed chunked uploads with checksums (#7623)
* s3: support STREAMING-AWS4-HMAC-SHA256-PAYLOAD-TRAILER for signed chunked uploads with checksums

When AWS SDK v2 clients upload with both chunked encoding and checksum
validation enabled, they use the x-amz-content-sha256 header value of
STREAMING-AWS4-HMAC-SHA256-PAYLOAD-TRAILER instead of the simpler
STREAMING-AWS4-HMAC-SHA256-PAYLOAD.

This caused the chunked reader to not be properly activated, resulting
in chunk-signature metadata being stored as part of the file content.

Changes:
- Add streamingSignedPayloadTrailer constant for the new header value
- Update isRequestSignStreamingV4() to recognize this header
- Update newChunkedReader() to handle this streaming type
- Update calculateSeedSignature() to accept this header
- Add unit test for signed streaming upload with trailer

Fixes issue where Quarkus/AWS SDK v2 uploads with checksum validation
resulted in corrupted file content containing chunk-signature data.

* address review comments: add trailer signature to test, fix constant alignment

* test: separate canonical trailer text (\n) from on-wire format (\r\n)

* test: add negative test for invalid trailer signature

* refactor: check HTTP method first in streaming auth checks (fail-fast)

* test: handle crc32 Write error return for completeness

* refactor: extract createTrailerStreamingRequest helper to reduce test duplication

* fmt

* docs: clarify test comment about trailer signature validation status

* refactor: calculate chunk data length dynamically instead of hardcoding

* Update weed/s3api/chunked_reader_v4_test.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix: use current time for signatures instead of hardcoded past date

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 14:51:37 -08:00
Chris Lu
a5ab05ec03 fix: S3 GetObject/HeadObject with PartNumber should return object ETag, not part ETag (#7622)
AWS S3 behavior: when calling GetObject or HeadObject with the PartNumber
query parameter, the ETag header should still return the complete object's
ETag (e.g., 'abc123-4' for a 4-part multipart upload), not the individual
part's ETag.

The previous implementation incorrectly overrode the ETag with the part's
ETag, causing test_multipart_get_part to fail.

This fix removes the ETag override logic while keeping:
- x-amz-mp-parts-count header (correct)
- Content-Length adjusted to part size (correct)
- Range calculation for part boundaries (correct)
2025-12-04 12:18:57 -08:00
chrislu
8d110b29dd fmt 2025-12-04 10:40:01 -08:00
Chris Lu
39ba19eea6 filer: async empty folder cleanup via metadata events (#7614)
* filer: async empty folder cleanup via metadata events

Implements asynchronous empty folder cleanup when files are deleted in S3.

Key changes:

1. EmptyFolderCleaner - New component that handles folder cleanup:
   - Uses consistent hashing (LockRing) to determine folder ownership
   - Each filer owns specific folders, avoiding duplicate cleanup work
   - Debounces delete events (10s delay) to batch multiple deletes
   - Caches rough folder counts to skip unnecessary checks
   - Cancels pending cleanup when new files are created
   - Handles both file and subdirectory deletions

2. Integration with metadata events:
   - Listens to both local and remote filer metadata events
   - Processes create/delete/rename events to track folder state
   - Only processes folders under /buckets/<bucket>/...

3. Removed synchronous empty folder cleanup from S3 handlers:
   - DeleteObjectHandler no longer calls DoDeleteEmptyParentDirectories
   - DeleteMultipleObjectsHandler no longer tracks/cleans directories
   - Cleanup now happens asynchronously via metadata events

Benefits:
- Non-blocking: S3 delete requests return immediately
- Coordinated: Only one filer (the owner) cleans each folder
- Efficient: Batching and caching reduce unnecessary checks
- Event-driven: Folder deletion triggers parent folder check automatically
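The ownership idea above can be sketched with a toy hash-based picker; this is a simplified stand-in for SeaweedFS's LockRing (the function and peer names are illustrative), showing how every filer independently agrees on a single owner per folder:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// owner picks which filer should clean a given folder. This is a simplified
// stand-in for the LockRing-based ownership described above: hash the folder
// path onto the sorted peer list, so all filers deterministically agree on
// exactly one owner per folder and duplicate cleanup work is avoided.
func owner(peers []string, folder string) string {
	sorted := append([]string(nil), peers...)
	sort.Strings(sorted) // order-independent view of the cluster
	h := fnv.New32a()
	h.Write([]byte(folder))
	return sorted[int(h.Sum32()%uint32(len(sorted)))]
}

func main() {
	folder := "/buckets/photos/2024/"
	// Two filers with differently ordered peer lists still agree on the owner.
	a := owner([]string{"filer-a:8888", "filer-b:8888", "filer-c:8888"}, folder)
	b := owner([]string{"filer-c:8888", "filer-b:8888", "filer-a:8888"}, folder)
	fmt.Println(a == b) // true
}
```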

* filer: add CleanupQueue data structure for deduplicated folder cleanup

CleanupQueue uses a linked list for FIFO ordering and a hashmap for O(1)
deduplication. Processing is triggered when:
- Queue size reaches maxSize (default 1000), OR
- Oldest item exceeds maxAge (default 10 minutes)

Key features:
- O(1) Add, Remove, Pop, Contains operations
- Duplicate folders are ignored (keeps original position/time)
- Testable with injectable time function
- Thread-safe with mutex protection

* filer: use CleanupQueue for empty folder cleanup

Replace timer-per-folder approach with queue-based processing:
- Use CleanupQueue for deduplication and ordered processing
- Process queue when full (1000 items) or oldest item exceeds 10 minutes
- Background processor checks queue every 10 seconds
- Remove from queue on create events to cancel pending cleanup

Benefits:
- Bounded memory: queue has max size, not unlimited timers
- Efficient: O(1) add/remove/contains operations
- Batch processing: handle many folders efficiently
- Better for high-volume delete scenarios

* filer: CleanupQueue.Add moves duplicate to back with updated time

When adding a folder that already exists in the queue:
- Remove it from its current position
- Add it to the back of the queue
- Update the queue time to current time

This ensures that folders with recent delete activity are processed
later, giving more time for additional deletes to occur.

* filer: CleanupQueue uses event time and inserts in sorted order

Changes:
- Add() now takes eventTime parameter instead of using current time
- Insert items in time-sorted order (oldest at front) to handle out-of-order events
- When updating duplicate with newer time, reposition to maintain sort order
- Ignore updates with older time (keep existing later time)

This ensures proper ordering when processing events from distributed filers
where event arrival order may not match event occurrence order.

* filer: remove unused CleanupQueue functions (SetNowFunc, GetAll)

Removed test-only functions:
- SetNowFunc: tests now use real time with past event times
- GetAll: tests now use Pop() to verify order

Kept functions used in production:
- Peek: used in filer_notify_read.go
- OldestAge: used in empty_folder_cleaner.go logging

* filer: initialize cache entry on first delete/create event

Previously, roughCount was only updated if the cache entry already
existed, but entries were only created during executeCleanup. This
meant delete/create events before the first cleanup didn't track
the count.

Now create the cache entry on first event, so roughCount properly
tracks all changes from the start.

* filer: skip adding to cleanup queue if roughCount > 0

If the cached roughCount indicates there are still items in the
folder, don't bother adding it to the cleanup queue. This avoids
unnecessary queue entries and reduces wasted cleanup checks.

* filer: don't create cache entry on create event

Only update roughCount if the folder is already being tracked.
New folders don't need tracking until we see a delete event.

* filer: move empty folder cleanup to its own package

- Created weed/filer/empty_folder_cleanup package
- Defined FilerOperations interface to break circular dependency
- Added CountDirectoryEntries method to Filer
- Exported IsUnderPath and IsUnderBucketPath helper functions

* filer: make isUnderPath and isUnderBucketPath private

These helpers are only used within the empty_folder_cleanup package.
2025-12-03 21:12:19 -08:00
Chris Lu
268cc84e8c [helm] Fix liveness/readiness probe scheme path in templates (#7616)
Fix the templates to read scheme from httpGet.scheme instead of the
probe level, matching the structure defined in values.yaml.

This ensures that changing *.livenessProbe.httpGet.scheme or
*.readinessProbe.httpGet.scheme in values.yaml now correctly affects
the rendered manifests.

Affected components: master, filer, volume, s3, all-in-one

Fixes #7615
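A hedged fragment of the values.yaml shape the templates now read from (key names follow the pattern described above and are not verified against the chart):

```yaml
filer:
  livenessProbe:
    httpGet:
      path: /
      port: 8888
      scheme: HTTPS   # read from httpGet.scheme, now honored in rendered manifests
```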
2025-12-03 18:53:06 -08:00
Chris Lu
e361daa754 fix: SFTP HomeDir path translation for user operations (#7611)
* fix: SFTP HomeDir path translation for user operations

When users have a non-root HomeDir (e.g., '/sftp/user'), their SFTP
operations should be relative to that directory. Previously, when a
user uploaded to '/' via SFTP, the path was not translated to their
home directory, causing 'permission denied for / for permission write'.

This fix adds a toAbsolutePath() method that implements chroot-like
behavior where the user's HomeDir becomes their root. All file and
directory operations now translate paths through this method.

Example: User with HomeDir='/sftp/user' uploading to '/' now correctly
maps to '/sftp/user'.

Fixes: https://github.com/seaweedfs/seaweedfs/issues/7470

* test: add SFTP integration tests

Add comprehensive integration tests for the SFTP server including:
- HomeDir path translation tests (verifies fix for issue #7470)
- Basic file upload/download operations
- Directory operations (mkdir, rmdir, list)
- Large file handling (1MB test)
- File rename operations
- Stat/Lstat operations
- Path edge cases (trailing slashes, .., unicode filenames)
- Admin root access verification

The test framework starts a complete SeaweedFS cluster with:
- Master server
- Volume server
- Filer server
- SFTP server with test user credentials

Test users are configured in testdata/userstore.json:
- admin: HomeDir=/ with full access
- testuser: HomeDir=/sftp/testuser with access to home
- readonly: HomeDir=/public with read-only access

* fix: correct SFTP HomeDir path translation and add CI

Fix path.Join issue where paths starting with '/' weren't joined correctly.
path.Join('/sftp/user', '/file') returns '/file' instead of '/sftp/user/file'.
Now we strip the leading '/' before joining.

Test improvements:
- Update go.mod to Go 1.24
- Fix weed binary discovery to prefer local build over PATH
- Add stabilization delay after service startup
- All 8 SFTP integration tests pass locally

Add GitHub Actions workflow for SFTP tests:
- Runs on push/PR affecting sftpd code or tests
- Tests HomeDir path translation, file ops, directory ops
- Covers issue #7470 fix verification

* security: update golang.org/x/crypto to v0.45.0

Addresses security vulnerability in golang.org/x/crypto < 0.45.0

* security: use proper SSH host key verification in tests

Replace ssh.InsecureIgnoreHostKey() with ssh.FixedHostKey() that
verifies the server's host key matches the known test key we generated.
This addresses CodeQL warning go/insecure-hostkeycallback.

Also updates go.mod to specify go 1.24.0 explicitly.

* security: fix path traversal vulnerability in SFTP toAbsolutePath

The previous implementation had a critical security vulnerability:
- Path traversal via '../..' could escape the HomeDir chroot jail
- Absolute paths were not correctly prefixed with HomeDir

The fix:
1. Concatenate HomeDir with userPath directly, then clean
2. Add security check to ensure final path stays within HomeDir
3. If traversal detected, safely return HomeDir instead

Also adds path traversal prevention tests to verify the fix.

* fix: address PR review comments

1. Fix SkipCleanup check to use actual test config instead of default
   - Added skipCleanup field to SftpTestFramework struct
   - Store config.SkipCleanup during Setup()
   - Use f.skipCleanup in Cleanup() instead of DefaultTestConfig()

2. Fix path prefix check false positive in mkdir
   - Changed from strings.HasPrefix(absPath, fs.user.HomeDir)
   - To: absPath == fs.user.HomeDir || strings.HasPrefix(absPath, fs.user.HomeDir+"/")
   - Prevents matching partial directory names (e.g., /sftp/username when HomeDir is /sftp/user)

* fix: check write permission on parent dir for mkdir

Aligns makeDir's permission check with newFileWriter for consistency.
To create a directory, a user needs write permission on the parent
directory, not mkdir permission on the new directory path.

* fix: refine SFTP path traversal logic and tests

1. Refine toAbsolutePath:
   - Use path.Join with strings.TrimPrefix for idiomatic path construction
   - Return explicit error on path traversal attempt instead of clamping
   - Updated all call sites to handle the error

2. Add Unit Tests:
   - Added sftp_server_test.go to verify toAbsolutePath logic
   - Covers normal paths, root path, and various traversal attempts

3. Update Integration Tests:
   - Updated PathTraversalPrevention test to reflect that standard SFTP clients
     sanitize paths before sending. The test now verifies successful containment
     within the jail rather than blocking (since the server receives a clean path).
   - The server-side blocking is verified by the new unit tests.

4. Makefile:
   - Removed -v from default test target
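The refined translation and containment check can be sketched as follows; this is a minimal illustration of the approach described above (function and error names are hypothetical, not the actual sftpd code), including the `home+"/"` prefix guard that avoids false positives like `/sftp/username` matching HomeDir `/sftp/user`:

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

var errPathTraversal = errors.New("path traversal attempt")

// toAbsolutePath sketches the chroot-style translation described above:
// the user's HomeDir becomes their virtual root "/", and any result that
// escapes HomeDir is rejected with an explicit error instead of clamped.
func toAbsolutePath(homeDir, userPath string) (string, error) {
	home := path.Clean(homeDir) // HomeDir is cleaned at config load time
	abs := path.Join(home, strings.TrimPrefix(userPath, "/"))
	// Containment check; comparing against home+"/" prevents partial-name
	// matches (e.g. /sftp/username when HomeDir is /sftp/user).
	if abs != home && !strings.HasPrefix(abs, home+"/") {
		return "", errPathTraversal
	}
	return abs, nil
}

func main() {
	p, _ := toAbsolutePath("/sftp/user", "/")
	fmt.Println(p) // /sftp/user
	_, err := toAbsolutePath("/sftp/user", "../../etc/passwd")
	fmt.Println(err != nil) // true
}
```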

* fix: address PR comments on tests and makefile

1. Enhanced Unit Tests:
   - Added edge cases (empty path, multiple slashes, trailing slash) to sftp_server_test.go

2. Makefile Improvements:
   - Added 'all' target as default entry point

3. Code Clarity:
   - Added comment to mkdir permission check explaining defensive nature of HomeDir check

* fix: address PR review comments on permissions and tests

1. Security:
   - Added write permission check on target directory in renameEntry

2. Logging:
   - Changed dispatch log verbosity from V(0) to V(1)

3. Testing:
   - Updated Makefile .PHONY targets
   - Added unit test cases for empty/root HomeDir behavior in toAbsolutePath

* fix: set SFTP starting directory to virtual root

1. Critical Fix:
   - Changed sftp.WithStartDirectory from fs.user.HomeDir to '/'
   - Prevents double-prefixing when toAbsolutePath translates paths
   - Users now correctly start at their virtual root which maps to HomeDir

2. Test Improvements:
   - Use pointer for homeDir in tests for clearer nil vs empty distinction

* fix: clean HomeDir at config load time

Clean HomeDir path when loading users from JSON config.
This handles trailing slashes and other path anomalies at the source,
ensuring consistency throughout the codebase and avoiding repeated
cleaning on every toAbsolutePath call.

* test: strengthen assertions and add error checking in SFTP tests

1. Add error checking for cleanup operations in TestWalk
2. Strengthen cwd assertion to expect '/' explicitly in TestCurrentWorkingDirectory
3. Add error checking for cleanup in PathTraversalPrevention test
2025-12-03 13:42:05 -08:00
Lisandro Pin
d59cc1b09f Fix handling of fixed read-only volumes for volume.check.disk. (#7612)
There's unfortunately no way to tell whether a volume is flagged read-only
because it got full, or because it is faulty. To address this, modify the
check logic so all read-only volumes are processed; if no changes are written
(i.e. the volume is healthy) it is kept as read-only.

Volumes which are modified in this process are deemed fixed, and switched to writable.
2025-12-03 11:33:35 -08:00
Xiao Wei
3bcadc9f90 fix: update getVersioningState to signal non-existent buckets with Er… (#7613)
* fix: update getVersioningState to signal non-existent buckets with ErrNotFound

This change makes getVersioningState return filer_pb.ErrNotFound when the requested bucket does not exist, so callers can handle that case appropriately, e.g. by auto-creating the bucket in PUT handlers. This clarifies the API's error handling around bucket existence.
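The caller-side pattern this enables can be sketched as below; the sentinel and functions are self-contained stand-ins (`errNotFound` plays the role of filer_pb.ErrNotFound, and `putObject` is illustrative, not the actual handler):

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for filer_pb.ErrNotFound; the real lookup lives in
// weed/s3api/s3api_bucket_config.go.
var errNotFound = errors.New("not found")

func getVersioningState(bucket string) (string, error) {
	if bucket != "existing-bucket" { // pretend only this bucket exists
		return "", errNotFound
	}
	return "Enabled", nil
}

// putObject sketches the caller behavior the commit enables: a PUT handler
// can distinguish "bucket missing" from other errors and auto-create it.
func putObject(bucket string) string {
	_, err := getVersioningState(bucket)
	switch {
	case errors.Is(err, errNotFound):
		return "auto-create bucket, then proceed"
	case err != nil:
		return "fail request"
	default:
		return "proceed"
	}
}

func main() {
	fmt.Println(putObject("existing-bucket")) // proceed
	fmt.Println(putObject("new-bucket"))      // auto-create bucket, then proceed
}
```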

* Update weed/s3api/s3api_bucket_config.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: 洪晓威 <xiaoweihong@deepglint.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-03 10:23:59 -08:00