Commit Graph

12508 Commits

Author SHA1 Message Date
Vladimir Shishkaryov
b49f3ce6d3 fix(chart): place backoffLimit correctly in resize hook (#8036)
Signed-off-by: Vladimir Shishkaryov <vladimir@jckls.com>
2026-01-15 12:45:49 -08:00
Chris Lu
7eb90fdfd7 Enhance EC balancing to separate parity and data shards (#8038)
* Enhance EC balancing to separate parity and data shards across racks

* Rename avoidRacks to antiAffinityRacks for clarity

* Implement server-level EC separation for parity/data shards

* Optimize EC balancing: consolidate helpers and extract two-pass selection logic

* Add comprehensive edge case tests for EC balancing logic

* Apply code review feedback: rename select_(), add divide-by-zero guard, fix comment

* Remove unused parameters from doBalanceEcShardsWithinOneRack and add explicit anti-affinity check

* Add disk-level anti-affinity for data/parity shard separation

- Modified pickBestDiskOnNode to accept shardId and dataShardCount
- Implemented explicit anti-affinity: 1000-point penalty for placing data shards on disks with parity (and vice versa)
- Updated all call sites including balancing and evacuation
- For evacuation, disabled anti-affinity by passing dataShardCount=0
2026-01-15 12:43:44 -08:00
Chris Lu
905e7e72d9 Add remote.copy.local command to copy local files to remote storage (#8033)
* Add remote.copy.local command to copy local files to remote storage

This new command solves the issue described in GitHub Discussion #8031 where
files exist locally but are not synced to remote storage due to missing filer logs.

Features:
- Copies local-only files to remote storage
- Supports file filtering (include/exclude patterns)
- Dry run mode to preview actions
- Configurable concurrency for performance
- Force update option for existing remote files
- Comprehensive error handling with retry logic

Usage:
  remote.copy.local -dir=/path/to/mount/dir [options]

This addresses the need to manually sync files when filer logs were
deleted or when local files were never synced to remote storage.

* shell: rename commandRemoteLocalSync to commandRemoteCopyLocal

* test: add comprehensive remote cache integration tests

* shell: fix forceUpdate logic in remote.copy.local

The previous logic only allowed force updates when localEntry.RemoteEntry
was not nil, which defeated the purpose of using -forceUpdate to fix
inconsistencies where local metadata might be missing.

Now -forceUpdate will overwrite remote files whenever they exist,
regardless of local metadata state.

* shell: fix code review issues in remote.copy.local

- Return actual error from flag parsing instead of swallowing it
- Use sync.Once to safely capture first error in concurrent operations
- Add atomic counter to track actual successful copies
- Protect concurrent writes to output with mutex to prevent interleaving
- Fix path matching to prevent false positives with sibling directories
  (e.g., /mnt/remote2 no longer matches /mnt/remote)

* test: address code review nitpicks in integration tests

- Improve create_bucket error handling to fail on real errors
- Fix test assertions to properly verify expected failures
- Use case-insensitive string matching for error detection
- Replace weak logging-only tests with proper assertions
- Remove extra blank line in Makefile

* test: remove redundant edge case tests

Removed 5 tests that were either duplicates or didn't assert meaningful behavior:
- TestEdgeCaseEmptyDirectory (duplicate of TestRemoteCopyLocalEmptyDirectory)
- TestEdgeCaseRapidCacheUncache (no meaningful assertions)
- TestEdgeCaseConcurrentCommands (only logs errors, no assertions)
- TestEdgeCaseInvalidPaths (no security assertions)
- TestEdgeCaseFileNamePatterns (duplicate of pattern tests in cache tests)

Kept valuable stress tests: nested directories, special characters,
very large files (100MB), many small files (100), and zero-byte files.

* test: fix CI failures by forcing localhost IP advertising

Added -ip=127.0.0.1 flag to both primary and remote weed mini commands
to prevent IP auto-detection issues in CI environments. Without this flag,
the master would advertise itself using the actual IP (e.g., 10.1.0.17)
while binding to 127.0.0.1, causing connection refused errors when other
services tried to connect to the gRPC port.

* test: address final code review issues

- Add proper error assertions for concurrent commands test
- Require errors for invalid path tests instead of just logging
- Remove unused 'match' field from pattern test struct
- Add dry-run output assertion to verify expected behavior
- Simplify redundant condition in remote.copy.local (remove entry.RemoteEntry check)

* test: fix remote.configure tests to match actual validation rules

- Use only letters in remote names (no numbers) to match validation
- Relax missing parameter test expectations since validation may not be strict
- Generate unique names using letter suffix instead of numbers

* shell: rename pathToCopyCopy to localPath for clarity

Improved variable naming in concurrent copy loop to make the code
more readable and less repetitive.

* test: fix remaining test failures

- Remove strict error requirement for invalid paths (commands handle gracefully)
- Fix TestRemoteUncacheBasic to actually test uncache instead of cache
- Use simple numeric names for remote.configure tests (testcfg1234 format)
  to avoid validation issues with letter-only or complex name generation

* test: use only letters in remote.configure test names

The validation regex ^[A-Za-z][A-Za-z0-9]*$ requires names to start with
a letter, but using static letter-only names avoids any potential issues
with the validation.

* test: remove quotes from -name parameter in remote.configure tests

Single quotes were being included as part of the name value, causing
validation failures. Changed from -name='testremote' to -name=testremote.

* test: fix remote.configure assertion to be flexible about JSON formatting

Changed from checking exact JSON format with specific spacing to just
checking if the name appears in the output, since JSON formatting
may vary (e.g., "name":  "value" vs "name": "value").
2026-01-15 00:52:57 -08:00
Jaehoon Kim
f2e7af257d Fix volume.fsck -forcePurging -reallyDeleteFromVolume to fail fast on filer traversal errors (#8015)
* Add TraverseBfsWithContext and fix race conditions in error handling

- Add TraverseBfsWithContext function to support context cancellation
- Fix race condition in doTraverseBfsAndSaving using atomic.Bool and sync.Once
- Improve error handling with fail-fast behavior and proper error propagation
- Update command_volume_fsck to use error-returning saveFn callback
- Enhance error messages in readFilerFileIdFile with detailed context

* refactoring

* fix error format

* atomic

* filer_pb: make enqueue return void

* shell: simplify fs.meta.save error handling

* filer_pb: handle enqueue return value

* Revert "atomic"

This reverts commit 712648bc354b186d6654fdb8a46fd4848fdc4e00.

* shell: refine fs.meta.save logic

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-01-14 21:37:50 -08:00
Walnuts
691aea84c3 feat: add TLS configuration options for Cassandra2 store (#7998)
* feat: add TLS configuration options for Cassandra2 store

Signed-off-by: walnuts1018 <r.juglans.1018@gmail.com>

* fix: use 9142 port in tls connection

Signed-off-by: walnuts1018 <r.juglans.1018@gmail.com>

* Align the setting field names with gocql's SSLOpts.

Signed-off-by: walnuts1018 <r.juglans.1018@gmail.com>

* Removed: store.cluster.Port = 9142

* chore: update gocql dependency to v2

* refactor: improve Cassandra TLS configuration and port logic

* docs: update filer.toml scaffold with ssl_enable_host_verification

---------

Signed-off-by: walnuts1018 <r.juglans.1018@gmail.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-01-14 17:59:59 -08:00
Chris Lu
f47bc8c539 Fix remote.meta.sync TTL issue (#8021) (#8030)
* Fix remote.meta.sync TTL issue (#8021)

Remote entries should not have TTL applied because they represent files
in remote storage, not local SeaweedFS files. When TTL was configured on
a prefix, remote.meta.sync would create entries that immediately expired,
causing them to be deleted and recreated on each sync.

Changes:
- Set TtlSec=0 explicitly when creating remote entries in remote.meta.sync
- Skip TTL application in CreateEntry handler for entries with Remote field set

Fixes #8021

* Add TTL protection for remote entries in update path

- Set TtlSec=0 in doSaveRemoteEntry before calling UpdateEntry
- Add server-side TTL protection in UpdateEntry handler for remote entries
- Ensures remote entries don't inherit or preserve TTL when updated
2026-01-14 14:45:52 -08:00
Chris Lu
ba74185700 fix: CompactMap race condition causing runtime panic (#8029)
Fixed critical race condition in CompactMap where Set(), Delete(), and
Get() methods had issues with concurrent map access.

Root cause: segmentForKey() can create new map segments, which modifies
the cm.segments map. Calling this under a read lock caused concurrent
map write panics when multiple goroutines accessed the map simultaneously
(e.g., during VolumeNeedleStatus gRPC calls).

Changes:
- Set() method: Changed RLock/RUnlock to Lock/Unlock
- Delete() method: Changed RLock/RUnlock to Lock/Unlock, optimized to
  avoid creating empty segments when key doesn't exist
- Get() method: Removed segmentForKey() call to avoid race condition,
  now checks segment existence directly and returns early if segment
  doesn't exist (optimization: avoids unnecessary segment creation)

This fix resolves the runtime/maps.fatal panic that occurred under
concurrent load.

Tested with race detector: go test -v -race ./weed/storage/needle_map/...
2026-01-14 14:12:49 -08:00
Feng Shao
8abcdc6d00 use "s" flag of regexp to let . match \n (#8024)
* use "s" flag of regexp to let . match \n

the partten "/{object:.+}" cause the mux failed to match URI of object
with new line char, and the request fall thru into bucket handlers.

* refactor

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-01-14 13:49:49 -08:00
Chris Lu
df3f308740 s3api: use updateAuthenticationState helper and clarified log message 2026-01-14 13:14:14 -08:00
Chris Lu
e11c0425f8 s3api: extract updateAuthenticationState helper method 2026-01-14 13:13:08 -08:00
Chris Lu
39c4155ba6 s3api: remove redundant isAuthEnabled assignment in constructor 2026-01-14 13:12:49 -08:00
Chris Lu
1950c31786 Remove AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY exports
Removed AWS environment variable exports from README.
2026-01-14 13:09:11 -08:00
Chris Lu
12a1a131c9 s3api: allow-all default when no credentials are configured (#8027)
* s3api: allow-all default for weed mini and handle dynamic credential updates

* s3api: refactor authentication initialization for clarity

* s3api: reduce lock contention in NewIdentityAccessManagementWithStore

* s3api: reduce lock contention and enforce one-way auth in replaceS3ApiConfiguration

* s3api: reduce lock contention in mergeS3ApiConfiguration

* s3api: simplify auth initialization and remove redundant variables
2026-01-14 13:06:27 -08:00
Chris Lu
2388a2b036 Update Stargazers image link to adaptive variant 2026-01-12 16:02:04 -08:00
Chris Lu
c023eed842 Update Stargazers image link in README 2026-01-12 16:00:57 -08:00
Chris Lu
da83a790c7 Fix link to Google's Colossus File System 2026-01-12 15:59:39 -08:00
Chris Lu
851b92fe35 Implement optional path-prefix and method scoping for Filer HTTP JWT 2026-01-12 15:57:18 -08:00
Chris Lu
ba97f3cc8e Update README.md 2026-01-12 15:53:27 -08:00
Chris Lu
1046bd009a feat: Optional path-prefix and method scoping for Filer HTTP JWT (#8014)
* Implement optional path-prefix and method scoping for Filer HTTP JWT

* Fix security vulnerability and improve test error handling

* Address PR feedback: replace debug logging and improve tests

* Use URL.Path in logs to avoid leaking query params
2026-01-12 13:21:48 -08:00
dependabot[bot]
60f7dbec4d chore(deps): bump github.com/mattn/go-sqlite3 from 1.14.32 to 1.14.33 (#8012)
Bumps [github.com/mattn/go-sqlite3](https://github.com/mattn/go-sqlite3) from 1.14.32 to 1.14.33.
- [Release notes](https://github.com/mattn/go-sqlite3/releases)
- [Commits](https://github.com/mattn/go-sqlite3/compare/v1.14.32...v1.14.33)

---
updated-dependencies:
- dependency-name: github.com/mattn/go-sqlite3
  dependency-version: 1.14.33
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-12 12:40:42 -08:00
Chris Lu
269092c8c3 fix(gcs): resolve credential conflict in remote storage mount (#8013)
* fix(gcs): resolve credential conflict in remote storage mount

Manually handle GCS credentials to avoid conflict with automatic discovery.
Fixes #8007

* fix(gcs): use %w for error wrapping in gcs_storage_client.go

Address review feedback to use idiomatic error wrapping.
2026-01-12 12:22:42 -08:00
dependabot[bot]
64a34ff69b chore(deps): bump github.com/shirou/gopsutil/v4 from 4.25.11 to 4.25.12 (#8011)
Bumps [github.com/shirou/gopsutil/v4](https://github.com/shirou/gopsutil) from 4.25.11 to 4.25.12.
- [Release notes](https://github.com/shirou/gopsutil/releases)
- [Commits](https://github.com/shirou/gopsutil/compare/v4.25.11...v4.25.12)

---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v4
  dependency-version: 4.25.12
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-12 12:20:52 -08:00
dependabot[bot]
9ccc844df0 chore(deps): bump github.com/klauspost/reedsolomon from 1.12.6 to 1.13.0 (#8010)
Bumps [github.com/klauspost/reedsolomon](https://github.com/klauspost/reedsolomon) from 1.12.6 to 1.13.0.
- [Release notes](https://github.com/klauspost/reedsolomon/releases)
- [Commits](https://github.com/klauspost/reedsolomon/compare/v1.12.6...v1.13.0)

---
updated-dependencies:
- dependency-name: github.com/klauspost/reedsolomon
  dependency-version: 1.13.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-12 12:20:23 -08:00
dependabot[bot]
138371ce4a chore(deps): bump google.golang.org/grpc from 1.77.0 to 1.78.0 (#8009)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.77.0 to 1.78.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.77.0...v1.78.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.78.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-12 12:20:10 -08:00
dependabot[bot]
d6417c9167 chore(deps): bump github.com/parquet-go/parquet-go from 0.26.3 to 0.26.4 (#8008)
Bumps [github.com/parquet-go/parquet-go](https://github.com/parquet-go/parquet-go) from 0.26.3 to 0.26.4.
- [Release notes](https://github.com/parquet-go/parquet-go/releases)
- [Changelog](https://github.com/parquet-go/parquet-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/parquet-go/parquet-go/compare/v0.26.3...v0.26.4)

---
updated-dependencies:
- dependency-name: github.com/parquet-go/parquet-go
  dependency-version: 0.26.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-12 11:56:56 -08:00
Chris Lu
6b0eade6d4 storage: upgrade protobuf API in store_state.go 2026-01-12 10:50:47 -08:00
Chris Lu
587e782feb storage: use non-blocking send to StateUpdateChan 2026-01-12 10:50:46 -08:00
Lisandro Pin
2af293ce60 Boostrap persistent state for volume servers. (#7984)
This PR implements logic load/save persistent state information for storages
associated with volume servers, and reporting state changes back to masters
via heartbeat messages.

More work ensues!

See https://github.com/seaweedfs/seaweedfs/issues/7977 for details.
2026-01-12 10:49:59 -08:00
Chris Lu
06391701ed Add AssumeRole and AssumeRoleWithLDAPIdentity STS actions (#8003)
* test: add integration tests for AssumeRole and AssumeRoleWithLDAPIdentity STS actions

- Add s3_sts_assume_role_test.go with comprehensive tests for AssumeRole:
  * Parameter validation (missing RoleArn, RoleSessionName, invalid duration)
  * AWS SigV4 authentication with valid/invalid credentials
  * Temporary credential generation and usage

- Add s3_sts_ldap_test.go with tests for AssumeRoleWithLDAPIdentity:
  * Parameter validation (missing LDAP credentials, RoleArn)
  * LDAP authentication scenarios (valid/invalid credentials)
  * Integration with LDAP server (when configured)

- Update Makefile with new test targets:
  * test-sts: run all STS tests
  * test-sts-assume-role: run AssumeRole tests only
  * test-sts-ldap: run LDAP STS tests only
  * test-sts-suite: run tests with full service lifecycle

- Enhance setup_all_tests.sh:
  * Add OpenLDAP container setup for LDAP testing
  * Create test LDAP users (testuser, ldapadmin)
  * Set LDAP environment variables for tests
  * Update cleanup to remove LDAP container

- Fix setup_keycloak.sh:
  * Enable verbose error logging for realm creation
  * Improve error diagnostics

Tests use fail-fast approach (t.Fatal) when server not configured,
ensuring clear feedback when infrastructure is missing.

* feat: implement AssumeRole and AssumeRoleWithLDAPIdentity STS actions

Implement two new STS actions to match MinIO's STS feature set:

**AssumeRole Implementation:**
- Add handleAssumeRole with full AWS SigV4 authentication
- Integrate with existing IAM infrastructure via verifyV4Signature
- Validate required parameters (RoleArn, RoleSessionName)
- Validate DurationSeconds (900-43200 seconds range)
- Generate temporary credentials with expiration
- Return AWS-compatible XML response

**AssumeRoleWithLDAPIdentity Implementation:**
- Add handleAssumeRoleWithLDAPIdentity handler (stub)
- Validate LDAP-specific parameters (LDAPUsername, LDAPPassword)
- Validate common STS parameters (RoleArn, RoleSessionName, DurationSeconds)
- Return proper error messages for missing LDAP provider
- Ready for LDAP provider integration

**Routing Fixes:**
- Add explicit routes for AssumeRole and AssumeRoleWithLDAPIdentity
- Prevent IAM handler from intercepting authenticated STS requests
- Ensure proper request routing priority

**Handler Infrastructure:**
- Add IAM field to STSHandlers for SigV4 verification
- Update NewSTSHandlers to accept IAM reference
- Add STS-specific error codes and response types
- Implement writeSTSErrorResponse for AWS-compatible errors

The AssumeRole action is fully functional and tested.
AssumeRoleWithLDAPIdentity requires LDAP provider implementation.

* fix: update IAM matcher to exclude STS actions from interception

Update the IAM handler matcher to check for STS actions (AssumeRole,
AssumeRoleWithWebIdentity, AssumeRoleWithLDAPIdentity) and exclude them
from IAM handler processing. This allows STS requests to be handled by
the STS fallback handler even when they include AWS SigV4 authentication.

The matcher now parses the form data to check the Action parameter and
returns false for STS actions, ensuring they are routed to the correct
handler.

Note: This is a work-in-progress fix. Tests are still showing some
routing issues that need further investigation.

* fix: address PR review security issues for STS handlers

This commit addresses all critical security issues from PR review:

Security Fixes:
- Use crypto/rand for cryptographically secure credential generation
  instead of time.Now().UnixNano() (fixes predictable credentials)
- Add sts:AssumeRole permission check via VerifyActionPermission to
  prevent unauthorized role assumption
- Generate proper session tokens using crypto/rand instead of
  placeholder strings

Code Quality Improvements:
- Refactor DurationSeconds parsing into reusable parseDurationSeconds()
  helper function used by all three STS handlers
- Create generateSecureCredentials() helper for consistent and secure
  temporary credential generation
- Fix iamMatcher to check query string as fallback when Action not
  found in form data

LDAP Provider Implementation:
- Add go-ldap/ldap/v3 dependency
- Create LDAPProvider implementing IdentityProvider interface with
  full LDAP authentication support (connect, bind, search, groups)
- Update ProviderFactory to create real LDAP providers
- Wire LDAP provider into AssumeRoleWithLDAPIdentity handler

Test Infrastructure:
- Add LDAP user creation verification step in setup_all_tests.sh

* fix: address PR feedback (Round 2) - config validation & provider improvements

- Implement `validateLDAPConfig` in `ProviderFactory`
- Improve `LDAPProvider.Initialize`:
  - Support `connectionTimeout` parsing (string/int/float) from config map
  - Warn if `BindDN` is present but `BindPassword` is empty
- Improve `LDAPProvider.GetUserInfo`:
  - Add fallback to `searchUserGroups` if `memberOf` returns no groups (consistent with Authenticate)

* fix: address PR feedback (Round 3) - LDAP connection improvements & build fix

- Improve `LDAPProvider` connection handling:
  - Use `net.Dialer` with configured timeout for connection establishment
  - Enforce TLS 1.2+ (`MinVersion: tls.VersionTLS12`) for both LDAPS and StartTLS
- Fix build error in `s3api_sts.go` (format verb for ErrorCode)

* fix: address PR feedback (Round 4) - LDAP hardening, Authz check & Routing fix

- LDAP Provider Hardening:
  - Prevent re-initialization
  - Enforce single user match in `GetUserInfo` (was explicit only in Authenticate)
  - Ensure connection closure if StartTLS fails
- STS Handlers:
  - Add robust provider detection using type assertion
  - **Security**: Implement authorization check (`VerifyActionPermission`) after LDAP authentication
- Routing:
  - Update tests to reflect that STS actions are handled by STS handler, not generic IAM

* fix: address PR feedback (Round 5) - JWT tokens, ARN formatting, PrincipalArn

CRITICAL FIXES:
- Replace standalone credential generation with STS service JWT tokens
  - handleAssumeRole now generates proper JWT session tokens
  - handleAssumeRoleWithLDAPIdentity now generates proper JWT session tokens
  - Session tokens can be validated across distributed instances

- Fix ARN formatting in responses
  - Extract role name from ARN using utils.ExtractRoleNameFromArn()
  - Prevents malformed ARNs like "arn:aws:sts::assumed-role/arn:aws:iam::..."

- Add configurable AccountId for federated users
  - Add AccountId field to STSConfig (defaults to "111122223333")
  - PrincipalArn now uses configured account ID instead of hardcoded "aws"
  - Enables proper trust policy validation

IMPROVEMENTS:
- Sanitize LDAP authentication error messages (don't leak internal details)
- Remove duplicate comment in provider detection
- Add utils import for ARN parsing utilities

* feat: implement LDAP connection pooling to prevent resource exhaustion

PERFORMANCE IMPROVEMENT:
- Add connection pool to LDAPProvider (default size: 10 connections)
- Reuse LDAP connections across authentication requests
- Prevent file descriptor exhaustion under high load

IMPLEMENTATION:
- connectionPool struct with channel-based connection management
- getConnection(): retrieves from pool or creates new connection
- returnConnection(): returns healthy connections to pool
- createConnection(): establishes new LDAP connection with TLS support
- Close(): cleanup method to close all pooled connections
- Connection health checking (IsClosing()) before reuse

BENEFITS:
- Reduced connection overhead (no TCP handshake per request)
- Better resource utilization under load
- Prevents "too many open files" errors
- Non-blocking pool operations (creates new conn if pool empty)

* fix: correct TokenGenerator access in STS handlers

CRITICAL FIX:
- Make TokenGenerator public in STSService (was private tokenGenerator)
- Update all references from Config.TokenGenerator to TokenGenerator
- Remove TokenGenerator from STSConfig (it belongs in STSService)

This fixes the "NotImplemented" errors in distributed and Keycloak tests.
The issue was that Round 5 changes tried to access Config.TokenGenerator
which didn't exist - TokenGenerator is a field in STSService, not STSConfig.

The TokenGenerator is properly initialized in STSService.Initialize() and
is now accessible for JWT token generation in AssumeRole handlers.

* fix: update tests to use public TokenGenerator field

Following the change to make TokenGenerator public in STSService,
this commit updates the test files to reference the correct public field name.
This resolves compilation errors in the IAM STS test suite.

* fix: update distributed tests to use valid Keycloak users

Updated s3_iam_distributed_test.go to use 'admin-user' and 'read-user'
which exist in the standard Keycloak setup provided by setup_keycloak.sh.
This resolves 'unknown test user' errors in distributed integration tests.

* fix: ensure iam_config.json exists in setup target for CI

The GitHub Actions workflow calls 'make setup' which was not creating
iam_config.json, causing the server to start without IAM integration
enabled (iamIntegration = nil), resulting in NotImplemented errors.

Now 'make setup' copies iam_config.local.json to iam_config.json if
it doesn't exist, ensuring IAM is properly configured in CI.

* fix(iam/ldap): fix connection pool race and rebind corruption

- Add atomic 'closed' flag to connection pool to prevent racing on Close()
- Rebind authenticated user connections back to service account before returning to pool
- Close connections on error instead of returning potentially corrupted state to pool

* fix(iam/ldap): populate standard TokenClaims fields in ValidateToken

- Set Subject, Issuer, Audience, IssuedAt, and ExpiresAt to satisfy the interface
- Use time.Time for timestamps as required by TokenClaims struct
- Default to 1 hour TTL for LDAP tokens

* fix(s3api): include account ID in STS AssumedRoleUser ARN

- Consistent with AWS, include the account ID in the assumed-role ARN
- Use the configured account ID from STS service if available, otherwise default to '111122223333'
- Apply to both AssumeRole and AssumeRoleWithLDAPIdentity handlers
- Also update .gitignore to ignore IAM test environment files

* refactor(s3api): extract shared STS credential generation logic

- Move common logic for session claims and credential generation to prepareSTSCredentials
- Update handleAssumeRole and handleAssumeRoleWithLDAPIdentity to use the helper
- Remove stale comments referencing outdated line numbers

* feat(iam/ldap): make pool size configurable and add audience support

- Add PoolSize to LDAPConfig (default 10)
- Add Audience to LDAPConfig to align with OIDC validation
- Update initialization and ValidateToken to use new fields

* update tests

* debug

* chore(iam): cleanup debug prints and fix test config port

* refactor(iam): use mapstructure for LDAP config parsing

* feat(sts): implement strict trust policy validation for AssumeRole

* test(iam): refactor STS tests to use AWS SDK signer

* test(s3api): implement ValidateTrustPolicyForPrincipal in MockIAMIntegration

* fix(s3api): ensure IAM matcher checks query string on ParseForm error

* fix(sts): use crypto/rand for secure credentials and extract constants

* fix(iam): fix ldap connection leaks and add insecure warning

* chore(iam): improved error wrapping and test parameterization

* feat(sts): add support for LDAPProviderName parameter

* Update weed/iam/ldap/ldap_provider.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/s3api/s3api_sts.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix(sts): use STSErrSTSNotReady when LDAP provider is missing

* fix(sts): encapsulate TokenGenerator in STSService and add getter

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-12 10:45:24 -08:00
Chris Lu
d7c30fdb2b fix: admin does not show all master servers #7999 (#8002) 2026-01-11 12:31:46 -08:00
Sheya Bernstein
844859de7f fix: add filer fallback after consecutive connection failures (#8000) 2026-01-11 12:14:27 -08:00
Sheya Bernstein
8740a087b9 fix: apply tpl function to all component extraEnvironmentVars (#8001) 2026-01-11 12:14:16 -08:00
Chris Lu
2b5e951390 use context.WithoutCancel to avoid context cancellation when the client connection is closed 2026-01-11 11:59:44 -08:00
Chris Lu
ce6e9be66b 4.06 2026-01-10 12:08:16 -08:00
Chris Lu
379c032868 Fix chown Input/output error on large file sets (#7996)
* Fix chown Input/output error on large file sets (Fixes #7911)

Implemented retry logic for MySQL/MariaDB backend to handle transient errors like deadlocks and timeouts.

* Fix syntax error: missing closing brace

* Refactor: Use %w for error wrapping and errors.As for extraction

* Fix: Disable retry logic inside transactions
2026-01-09 18:02:59 -08:00
Nicholas Boyd Isacsson
88e9e2c471 fix: Invalid volume mount conditional in filer template (#7992)
There is a mistmatch in the conditionals for the definition and mounting of the `config-users` volume in the filer's template. 

Volume definition:
```
        {{- if and .Values.filer.s3.enabled .Values.filer.s3.enableAuth }}
```
Mount:
```
            {{- if .Values.filer.s3.enableAuth }}
```

This leads to an invalid specification in the case where s3 is disabled but the enableAuth value is set to true, as it tries to mount in an undefined volume. I've fixed it here by adding the extra check to the latter conditional.
2026-01-09 12:10:40 -08:00
Chris Lu
ad76487e9d Fix special characters in admin-generated secret keys (#7994)
Fixes #7990

The issue was that the Charset constant used for generating secret keys
included the '/' character, which is URL-unsafe. When secret keys
containing '/' were used in HTTP requests, they would be URL-encoded,
causing a mismatch during signature verification.

Changes:
- Removed '/' from the Charset constant in weed/iam/constants.go
- Added TestGenerateSecretAccessKey_URLSafe to verify generated keys
  don't contain URL-unsafe characters like '/' or '+'

This ensures all newly generated secret keys are URL-safe and will
work correctly with S3 authentication. Existing keys continue to work.
2026-01-09 11:55:17 -08:00
Chris Lu
1ea6b0c0d9 cleanup: deduplicate environment variable credential loading
Previously, `weed mini` logic duplicated the credential loading process
by creating a temporary IAM config file from environment variables.
`auth_credentials.go` also had fallback logic to load these variables.

This change:
1. Updates `auth_credentials.go` to *always* check for and merge
   AWS environment variable credentials (`AWS_ACCESS_KEY_ID`, etc.)
   into the identity list. This ensures they are available regardless
   of whether other configurations (static file or filer) are loaded.
2. Removes the redundant file creation logic from `weed/command/mini.go`.
3. Updates `weed mini` user messages to accurately reflect that
   credentials are loaded from environment variables in-memory.

This results in a cleaner implementation where `weed/s3api` manages
all credential loading logic, and `weed mini` simply relies on it.
2026-01-08 20:35:37 -08:00
Chris Lu
7f1182472a fix: enable dual loading of static and dynamic IAM configuration
Refactored `NewIdentityAccessManagementWithStore` to remove mutual
exclusivity between static (file-based) and dynamic (filer-based)
configuration loading.

Previously, if a static config configuration was present (including the
legacy `IamConfig` option used by `weed mini`), it prevented loading
users from the filer.

Now, the system loads the static configuration first (if present), and
then *always* attempts to merge in the dynamic configuration from the
filer. This ensures that:
1. Static users (e.g. from `weed mini` env vars or `-s3.config`) are loaded and protected.
2. Dynamic users (e.g. created via Admin UI and stored in Filer) are also loaded and available.
2026-01-08 20:22:04 -08:00
Chris Lu
451b897d56 fix: support loading static config from IamConfig option for mini mode
`weed mini` sets the `-s3.iam.config` flag instead of `-s3.config`,
which populates `S3ApiServerOption.IamConfig`.

Previously, `NewIdentityAccessManagementWithStore` only checked
`option.Config`. This caused `weed mini` generated credentials (written
to a temp file passed via IamConfig) to be ignored, breaking S3 access
in mini mode even when environment variables were provided.

This change ensures we try to load the configuration from `IamConfig`
if `Config` is empty, restoring functionality for `weed mini`.
2026-01-08 20:17:33 -08:00
Chris Lu
48ded6b965 fix: allow environment variable fallback when filer config is empty
Fixed regression where AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
environment variables were not being loaded as fallback credentials.

The issue was that configLoaded was set to true when filer call
succeeded, even if it returned an empty configuration. This blocked
the environment variable fallback logic.

Now only set configLoaded = true when we actually have loaded
identities, allowing env vars to work correctly in mini mode and
other scenarios where filer config is empty.
2026-01-08 20:11:57 -08:00
Chris Lu
4e835a1d81 fix(s3api): ensure S3 configuration persistence and refactor authorization tests (#7989)
* fix(s3api): ensure static config file takes precedence over dynamic updates

When a static S3 configuration file is provided, avoid overwriting
the configuration from dynamic filer updates. This ensures the
documented "Highest Priority" for the configuration file is respected.

* refactor(s3api): implement merge-based static config with immutable identities

Static identities from config file are now immutable and protected from
dynamic updates. Dynamic identities (from admin panel) can be added and
updated without affecting static entries.

- Track identity names loaded from static config file
- Implement merge logic that preserves static identities
- Allow dynamic identities to be added or updated
- Remove blanket block on config file updates

* fix: address PR review comments for static config merge logic

Critical Bugs:
- Fix existingIdx always-false condition causing duplicate identities
- Fix race condition in static config initialization (move useStaticConfig inside mutex)

Security & Robustness:
- Add nil identity check in VerifyActionPermission to fail closed
- Mask access keys in STS validation logs to avoid exposing credentials
- Add nil guard for s3a.iam in subscription handler

Test Improvements:
- Add authCalled tracking to MockIAMIntegration for explicit verification
- Lower log level for static config messages to reduce noise

* fix: prevent duplicates and race conditions in merge logic

Data Integrity:
- Prevent service account credential duplicates on repeated merges
- Clean up stale accessKeyIdent entries when replacing identities
- Check existing credentials before appending

Concurrency Safety:
- Add synchronization to IsStaticConfig method

Test Improvements:
- Add mux route vars for proper GetBucketAndObject extraction
- Add STS session token header to trigger correct auth path
2026-01-08 19:29:54 -08:00
Chris Lu
6bf0c16862 fix admin copy text functions 2026-01-08 18:44:36 -08:00
Chris Lu
abfa64456b Fix STS authorization in streaming/chunked uploads (#7988)
* Fix STS authorization in streaming/chunked uploads

During streaming/chunked uploads (SigV4 streaming), authorization happens
twice:
1. Initial authorization in authRequestWithAuthType() - works correctly
2. Second authorization in verifyV4Signature() - was failing for STS

The issue was that verifyV4Signature() only used identity.canDo() for
permission checks, which always denies STS identities (they have empty
Actions). This bypassed IAM authorization completely.

This commit makes verifyV4Signature() IAM-aware by adding the same
fallback logic used in authRequestWithAuthType():
- Traditional identities (with Actions) use legacy canDo() check
- STS/JWT identities (empty Actions) fall back to IAM authorization

Fixes: https://github.com/seaweedfs/seaweedfs/pull/7986#issuecomment-3723196038

* Add comprehensive unit tests for STS authorization in streaming uploads

Created test suite to verify that verifyV4Signature properly handles STS
identities by falling back to IAM authorization when shouldCheckPermissions
is true.

Tests cover:
- STS identities with IAM integration (allow and deny cases)
- STS identities without IAM integration (should deny)
- Traditional identities with Actions (canDo check)
- Permission check bypass when shouldCheckPermissions=false
- Specific streaming upload scenario from bug report
- Action determination based on HTTP method

All tests pass successfully.

* Refactor authorization logic to avoid duplication

Centralized the authorization logic into IdentityAccessManagement.VerifyActionPermission.
Updated auth_signature_v4.go and auth_credentials.go to use this new helper.
Updated tests to clarify that they mirror the centralized logic.

* Refactor tests to use VerifyActionPermission directly

Introduced IAMIntegration interface to facilitate mocking of internal IAM integration logic.
Updated IdentityAccessManagement to use the interface.
Updated tests to directy call VerifyActionPermission using a mocked IAM integration, eliminating duplicated logic in tests.

* fix(s3api): ensure static config file takes precedence and refactor tests

- Track if configuration was loaded from a static file using `useStaticConfig`.
- Ignore filer-based IAM updates when a static configuration is in use to respect "Highest Priority" rule.
- Refactor `TestVerifyV4SignatureWithSTSIdentity` to use `VerifyActionPermission` directly.
- Fix typed nil interface panic in authorization test.
2026-01-08 17:06:56 -08:00
Chris Lu
217d8b9e0e Fix: ListObjectVersions delimiter support (#7987)
* Fix: Add delimiter support to ListObjectVersions with proper truncation

- Implemented delimiter support to group keys into CommonPrefixes
- Fixed critical truncation bug: now merges versions and common prefixes into single sorted list before truncation
- Ensures total items never exceed MaxKeys (prevents infinite pagination loops)
- Properly sets NextKeyMarker and NextVersionIdMarker for pagination
- Added integration tests in test/s3/versioning/s3_versioning_delimiter_test.go
- Verified behavior matches S3 API specification

* Fix: Add delimiter support to ListObjectVersions with proper truncation

- Implemented delimiter support to group keys into CommonPrefixes
- Fixed critical truncation bug: now merges versions and common prefixes before truncation
- Added safety guard for maxKeys=0 to prevent panics
- Condensed verbose comments for better readability
- Added robust Go integration tests with nil checks for AWS SDK pointers
- Verified behavior matches S3 API specification
- Resolved compilation error in integration tests
- Refined pagination comments and ensured exclusive KeyMarker behavior
- Refactored listObjectVersions into helper methods for better maintainability
2026-01-08 14:02:19 -08:00
Chris Lu
4ba89bf73b adjust log level 2026-01-08 10:57:59 -08:00
Chris Lu
bd237999bb weed mini can optionally skip s3 2026-01-08 10:05:42 -08:00
Chris Lu
5a3aade445 less logs 2026-01-08 10:00:22 -08:00
Chris Lu
f02e283ad2 add a nginx with ssl for easier testing 2026-01-08 09:59:31 -08:00
promalert
9012069bd7 chore: execute goimports to format the code (#7983)
* chore: execute goimports to format the code

Signed-off-by: promalert <promalert@outlook.com>

* goimports -w .

---------

Signed-off-by: promalert <promalert@outlook.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-01-07 13:06:08 -08:00