* s3tables: extract utility and filer operations to separate modules
- Move ARN parsing, path helpers, and metadata structures to utils.go
- Extract all extended attribute and filer operations to filer_ops.go
- Reduces code duplication and improves modularity
- Improves code organization and maintainability
* s3tables: split table bucket operations into focused modules
- Create bucket_create.go for CreateTableBucket operation
- Create bucket_get_list_delete.go for Get, List, Delete operations
- Related operations grouped for better maintainability
- Each file has a single, clear responsibility
- Improves code clarity and makes it easier to test
* s3tables: simplify handler by removing duplicate utilities
- Reduce handler.go from 370 to 195 lines (47% reduction)
- Remove duplicate ARN parsing and path helper functions
- Remove filer operation methods moved to filer_ops.go
- Remove metadata structure definitions moved to utils.go
- Keep handler focused on request routing and response formatting
- Maintains all functionality with improved code organization
* s3tables: complete s3tables package implementation
- namespace.go: namespace CRUD operations (310 lines)
- table.go: table CRUD operations with Iceberg schema support (409 lines)
- policy.go: resource policies and tagging operations (419 lines)
- types.go: request/response types and error definitions (290 lines)
- All handlers updated to use standalone utilities from utils.go
- All files follow single responsibility principle
* s3api: add S3 Tables integration layer
- Create s3api_tables.go to integrate S3 Tables with S3 API server
- Implement S3 Tables route matcher for X-Amz-Target header
- Register S3 Tables routes with API router
- Provide gRPC filer client interface for S3 Tables handlers
- All S3 Tables operations accessible via S3 API endpoint
* s3api: register S3 Tables routes in API server
- Add S3 Tables route registration in s3api_server.go registerRouter method
- Enable S3 Tables API operations to be routed through S3 API server
- Routes handled by s3api_tables.go integration layer
- Minimal changes to existing S3 API structure
* test: add S3 Tables test infrastructure
- Create setup.go with TestCluster and S3TablesClient definitions
- Create client.go with HTTP client methods for all operations
- Test utilities and client methods organized for reusability
- Foundation for S3 Tables integration tests
* test: add S3 Tables integration tests
- Comprehensive integration tests for all 23 S3 Tables operations
- Test cluster setup based on existing S3 integration tests
- Tests cover:
* Table bucket lifecycle (create, get, list, delete)
* Namespace operations
* Table CRUD with Iceberg schema
* Table bucket and table policies
* Resource tagging operations
- Ready for CI/CD pipeline integration
* ci: add S3 Tables integration tests to GitHub Actions
- Create new workflow for S3 Tables integration testing
- Add build verification job for s3tables package and s3api integration
- Add format checking for S3 Tables code
- Add go vet checks for code quality
- Workflow runs on all pull requests
- Includes test output logging and artifact upload on failure
* s3tables: add handler_ prefix to operation handler files
- Rename bucket_create.go → handler_bucket_create.go
- Rename bucket_get_list_delete.go → handler_bucket_get_list_delete.go
- Rename namespace.go → handler_namespace.go
- Rename table.go → handler_table.go
- Rename policy.go → handler_policy.go
Improves file organization by clearly identifying handler implementations.
No code changes, refactoring only.
* s3tables test: refactor to eliminate duplicate definitions
- Move all client methods to client.go
- Remove duplicate types/constants from s3tables_integration_test.go
- Keep setup.go for test infrastructure
- Keep integration test logic in s3tables_integration_test.go
- Clean up unused imports
- Test compiles successfully
* Delete client_methods.go
* s3tables: add bucket name validation and fix error handling
- Add isValidBucketName validation function for [a-z0-9_-] characters
- Validate bucket name characters match ARN parsing regex
- Fix error handling in WithFilerClient closure - properly check for lookup errors
- Add error handling for json.Marshal calls (metadata and tags)
- Improve error messages and logging
* s3tables: add error handling for json.Marshal calls
- Add error handling in handler_namespace.go (metadata marshaling)
- Add error handling in handler_table.go (metadata and tags marshaling)
- Add error handling in handler_policy.go (tag marshaling in TagResource and UntagResource)
- Return proper errors with context instead of silently ignoring failures
* s3tables: replace custom splitPath with stdlib functions
- Remove custom splitPath implementation (23 lines)
- Use filepath.Dir and filepath.Base from stdlib
- More robust and handles edge cases correctly
- Reduces code duplication
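For illustration, a minimal sketch of the stdlib-based splitting (the filer works with slash-separated paths, which is why a later commit standardizes on the path package rather than filepath):

```go
package main

import (
	"fmt"
	"path"
)

func main() {
	// Splitting a filer path into parent directory and entry name,
	// replacing the hand-rolled splitPath helper.
	full := "/tables/my-bucket/my_namespace/my_table"
	fmt.Println(path.Dir(full))  // /tables/my-bucket/my_namespace
	fmt.Println(path.Base(full)) // my_table
}
```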
* s3tables: improve error handling specificity in ListTableBuckets
- Specifically check for 'not found' errors instead of catching all errors
- Return empty list only when directory doesn't exist
- Propagate other errors (network, permission) with context
- Prevents masking real errors
* s3api_tables: optimize action validation with map lookup
- Replace O(n) slice iteration with O(1) map lookup
- Move s3TablesActionsMap to package level
- Avoid recreating the map on every function call
- Improves performance for request validation
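The pattern looks roughly like this sketch; the actual map contents and names live in s3api_tables.go and may differ:

```go
package s3api

// Hypothetical subset of the package-level action set; the real map in
// s3api_tables.go lists the actual X-Amz-Target operation names.
var s3TablesActionsMap = map[string]struct{}{
	"CreateTableBucket": {},
	"GetTableBucket":    {},
	"ListTableBuckets":  {},
	"DeleteTableBucket": {},
}

// isS3TablesAction is O(1) per call and never rebuilds the map.
func isS3TablesAction(action string) bool {
	_, ok := s3TablesActionsMap[action]
	return ok
}
```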
* s3tables: implement permission checking and authorization
- Add permissions.go with permission definitions and checks
- Define permissions for all 21 S3 Tables operations
- Add permission checking helper functions
- Add getPrincipalFromRequest to extract caller identity
- Implement access control in CreateTableBucket, GetTableBucket, DeleteTableBucket
- Return 403 Forbidden for unauthorized operations
- Only bucket owner can perform operations (extensible for future policies)
- Add AuthError type for authorization failures
* workflow: fix s3 tables tests path and working directory
The workflow was failing because it was running inside 'weed' directory,
but the tests are at the repository root. Removed working-directory
default and updated relative paths to weed source.
* workflow: remove emojis from echo statements
* test: format s3tables client.go
* workflow: fix go install path to ./weed
* ci: fail s3 tables tests if any command in pipeline fails
* s3tables: use path.Join for path construction and align namespace paths
* s3tables: improve integration test stability and error reporting
* s3tables: propagate request context to filer operations
* s3tables: clean up unused code and improve error response formatting
* Refine S3 Tables implementation to address code review feedback
- Standardize namespace representation to []string
- Improve listing logic with pagination and StartFromFileName
- Enhance error handling with sentinel errors and robust checks
- Add JSON encoding error logging
- Fix CI workflow to use gofmt -l
- Standardize timestamps in directory creation
- Validate single-level namespaces
* s3tables: further refinements to filer operations and utilities
- Add multi-segment namespace support to ARN parsing
- Refactor permission checking to use map lookup
- Wrap lookup errors with ErrNotFound in filer operations
- Standardize splitPath to use path package
* test: improve S3 Tables client error handling and cleanup
- Add detailed error reporting when decoding failure responses
- Remove orphaned comments and unused sections
* command: implement graceful shutdown for mini cluster
- Introduce MiniClusterCtx to coordinate shutdown across mini services
- Update Master, Volume, Filer, S3, and WebDAV servers to respect context cancellation
- Ensure all resources are cleaned up properly during test teardown
- Integrate MiniClusterCtx in s3tables integration tests
* s3tables: fix pagination and enhance error handling in list/delete operations
- Fix InclusiveStartFrom logic to ensure exclusive start on continued pages
- Prevent duplicates in bucket, namespace, and table listings
- Fail fast on listing errors during bucket and namespace deletion
- Stop swallowing errors in handleListTables and return proper HTTP error responses
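A sketch of the exclusive-start continuation, with illustrative names standing in for the filer's ListEntries parameters:

```go
package s3tables

type entry struct{ Name string }

// listPage stands in for the filer's ListEntries call; the parameter
// names and types here are illustrative, not the real API.
func listPage(dir, startFrom string, inclusive bool, limit int) ([]entry, error) {
	return nil, nil
}

// listAll shows the continuation logic that avoids returning the
// page-boundary entry twice.
func listAll(dir string, limit int) ([]entry, error) {
	var all []entry
	startFrom, inclusive := "", true // first page starts at the very beginning
	for {
		page, err := listPage(dir, startFrom, inclusive, limit)
		if err != nil {
			return nil, err
		}
		all = append(all, page...)
		if len(page) < limit {
			return all, nil // short page means no more entries
		}
		// Resume after the last returned name; exclusive start prevents duplicates.
		startFrom, inclusive = page[len(page)-1].Name, false
	}
}
```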
* s3tables: align ARN formatting and optimize resource handling
- Update generateTableARN to match AWS S3 Tables specification
- Move defer r.Body.Close() to follow standard Go patterns
- Remove unused generateNamespaceARN helper
* command: fix stale error variable logging in filer serving goroutines
- Use local 'err' variable instead of stale 'e' from outer scope
- Applied to both TLS and non-TLS paths for local listener
* s3tables: implement granular authorization and refine error responses
- Remove mandatory ACTION_ADMIN at the router level
- Enforce granular permissions in bucket and namespace handlers
- Prioritize AccountID in ExtractPrincipalFromContext for ARN matching
- Distinguish between 404 (NoSuchBucket) and 500 (InternalError) in metadata lookups
- Clean up unused imports in s3api_tables.go
* test: refactor S3 Tables client for DRYness and multi-segment namespaces
- Implement doRequestAndDecode to eliminate HTTP boilerplate
- Update client API to accept []string for namespaces to support hierarchy
- Standardize error response decoding across all client methods
* test: update integration tests to match refactored S3 Tables client
- Pass namespaces as []string to support hierarchical structures
- Adapt test calls to new client API signatures
* s3tables: normalize filer errors and use standard helpers
- Migrate from custom ErrNotFound to filer_pb.ErrNotFound
- Use filer_pb.LookupEntry for automatic error normalization
- Normalize entryExists and attribute lookups
* s3tables: harden namespace validation and correct ARN parsing
- Prohibit path traversal (".", "..") and "/" in namespaces
- Restrict namespace characters to [a-z0-9_] for consistency
- Switch to url.PathUnescape for correct decoding of ARN path components
- Align ARN parsing regex with single-segment namespace validation
* s3tables: improve robustness, security, and error propagation in handlers
- Implement strict table name validation (preventing path traversal and enforcing allowed characters)
- Add nil checks for entry.Entry in all listing loops to prevent panics
- Propagate backend errors instead of swallowing them or assuming 404
- Correctly map filer_pb.ErrNotFound to appropriate S3 error codes
- Standardize existence checks across bucket, namespace, and table handlers
* test: add miniClusterMutex to prevent race conditions
- Introduce sync.Mutex to protect global state (os.Args, os.Chdir)
- Ensure serialized initialization of the mini cluster runner
- Fix intermittent race conditions during parallel test execution
* s3tables: improve error handling and permission logic
- Update handleGetNamespace to distinguish between 404 and 500 errors
- Refactor CanManagePolicy to use CheckPermission for consistent enforcement
- Ensure empty identities are correctly handled in policy management checks
* s3tables: optimize regex usage and improve version token uniqueness
- Pre-compile regex patterns as package-level variables to avoid re-compilation overhead on every call
- Add a random component to version token generation to reduce collision probability under high concurrency
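For example (illustrative patterns, not the exact ones in utils.go):

```go
package s3tables

import "regexp"

// Compiled once at package init instead of on every call.
var (
	namespaceNameRegex = regexp.MustCompile(`^[a-z0-9_]+$`)
	tableNameRegex     = regexp.MustCompile(`^[a-z0-9_]+$`)
)

func isValidNamespaceName(s string) bool { return namespaceNameRegex.MatchString(s) }
func isValidTableName(s string) bool     { return tableNameRegex.MatchString(s) }
```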
* s3tables: harden auth and error handling
- Add authorization checks to all S3 Tables handlers (policy, table ops) to enforce security
- Improve error handling to distinguish between NotFound (404) and InternalError (500)
- Fix directory FileMode usage in filer_ops
- Improve test randomness for version tokens
- Update permissions comments to acknowledge IAM gaps
* S3 Tables: fix gRPC stream loop handling for list operations
- Correctly handle io.EOF to terminate loops gracefully.
- Propagate other errors to prevent silent failures.
- Ensure all list results are processed completely.
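The loop shape this converges on, sketched against a generic Recv-style stream:

```go
package s3tables

import "io"

// recvStream abstracts the Recv method of a generated gRPC stream client.
type recvStream[T any] interface {
	Recv() (T, error)
}

// drainStream shows the pattern: io.EOF ends the stream normally,
// while any other error is propagated instead of being swallowed.
func drainStream[T any](s recvStream[T], handle func(T) error) error {
	for {
		resp, err := s.Recv()
		if err == io.EOF {
			return nil // stream finished cleanly
		}
		if err != nil {
			return err // real failure: surface it to the caller
		}
		if err := handle(resp); err != nil {
			return err
		}
	}
}
```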
* S3 Tables: validate ARN namespace to prevent path traversal
- Enforce validation on decoded namespace in parseTableFromARN.
- Ensures path components are safe after URL unescaping.
* S3 Tables: secure API router with IAM authentication
- Wrap S3 Tables handler with authenticateS3Tables.
- Use AuthSignatureOnly to enforce valid credentials while delegating granular authorization to handlers.
- Prevent anonymous access to all S3 Tables endpoints.
* S3 Tables: fix gRPC stream loop handling in namespace handlers
- Correctly handle io.EOF in handleListNamespaces and handleDeleteNamespace.
- Propagate other errors to prevent silent failures or accidental data loss.
- Added necessary io import.
* S3 Tables: use os.ModeDir constant in filer_ops.go
- Replace magic number 1<<31 with os.ModeDir for better readability.
- Added necessary os import.
* s3tables: improve principal extraction using identity context
* s3tables: remove duplicate comment in permissions.go
* s3tables test: improve error reporting on decoding failure
* s3tables: implement validateTableName helper
* s3tables: add table name validation and 404 propagation to policy handlers
* s3tables: add table name validation and cleanup duplicated logic in table handlers
* s3tables: ensure root tables directory exists before bucket creation
* s3tables: implement token-based pagination for table buckets listing
* s3tables: implement token-based pagination for namespace listing
* s3tables: refine permission helpers to align with operation names
* s3tables: return 404 in handleDeleteNamespace if namespace not found
* s3tables: fix cross-namespace pagination in listTablesInAllNamespaces
* s3tables test: expose pagination parameters in client list methods
* s3tables test: update integration tests for new client API
* s3tables: use crypto/rand for secure version token generation
Replaced math/rand with crypto/rand to ensure version tokens are
cryptographically secure and unpredictable for optimistic concurrency control.
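A minimal sketch of such a generator, assuming a hex-encoded 128-bit token (the production token format may differ):

```go
package s3tables

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newVersionToken returns a cryptographically random token suitable for
// optimistic concurrency checks.
func newVersionToken() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", fmt.Errorf("generate version token: %w", err)
	}
	return hex.EncodeToString(b), nil
}
```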
* s3tables: improve account ID handling and define missing error codes
Updated getPrincipalFromRequest to prioritize X-Amz-Account-ID header and
added getAccountID helper. Defined ErrVersionTokenMismatch and ErrCodeConflict
for better optimistic concurrency support.
* s3tables: update bucket handlers for multi-account support
Ensured bucket ownership is correctly attributed to the authenticated
account ID and updated ARNs to use the request-derived account ID. Added
standard S3 existence checks for bucket deletion.
* s3tables: update namespace handlers for multi-account support
Updated namespace creation to use authenticated account ID for ownership
and unified permission checks across all namespace operations to use the
correct account principal.
* s3tables: implement optimistic concurrency for table deletion
Added VersionToken validation to handleDeleteTable. Refactored table
listing to use request context for accurate ARN generation and fixed
cross-namespace pagination issues.
* s3tables: improve resource resolution and error mapping for policies and tagging
Refactored resolveResourcePath to return resource type, enabling accurate
NoSuchBucket vs NoSuchTable error codes. Added existence checks before
deleting policies.
* s3tables: enhance test robustness and resilience
Updated random string generation to use crypto/rand in s3tables tests.
Increased resilience of IAM distributed tests by adding "connection refused"
to retryable errors.
* s3tables: remove legacy principal fallback header
Removed the fallback to X-Amz-Principal in getPrincipalFromRequest as
S3 Tables is a new feature and does not require legacy header support.
* s3tables: remove unused ExtractPrincipalFromContext function
Removed the unused ExtractPrincipalFromContext utility and its
accompanying iam/utils import to keep the new s3tables codebase clean.
* s3tables: allow hyphens in namespace and table names
Relaxed regex validation in utils.go to support hyphens in S3 Tables
namespaces and table names, improving consistency with S3 bucket naming
and allowing derived names from services like S3 Storage Lens.
* s3tables: add isAuthError helper to handler.go
* s3tables: refactor permission checks to use resource owner in bucket handlers
* s3tables: refactor permission checks to use resource owner in namespace handlers
* s3tables: refactor permission checks to use resource owner in table handlers
* s3tables: refactor permission checks to use resource owner in policy and tagging handlers
* ownerAccountID
* s3tables: implement strict AWS-aligned name validation for buckets, namespaces, and tables
* s3tables: enforce strict resource ownership and implement result filtering for buckets
* s3tables: enforce strict resource ownership and implement result filtering for namespaces
* s3tables: enforce strict resource ownership and implement result filtering for tables
* s3tables: align getPrincipalFromRequest with account ID for IAM compatibility
* s3tables: fix inconsistent permission check in handleCreateTableBucket
* s3tables: improve pagination robustness and error handling in table listing handlers
* s3tables: refactor handleDeleteTableBucket to use strongly typed AuthError
* s3tables: align ARN regex patterns with S3 standards and refactor to constants
* s3tables: standardize access denied errors using ErrAccessDenied constant
* go fmt
* s3tables: fix double-write issue in handleListTables
Remove premature HTTP error writes from within WithFilerClient closure
to prevent duplicate status code responses. Error handling is now
consistently performed at the top level using isAuthError.
* s3tables: update bucket name validation message
Remove "underscores" from error message to accurately reflect that
bucket names only allow lowercase letters, numbers, and hyphens.
* s3tables: add table policy test coverage
Add comprehensive test coverage for table policy operations:
- Added PutTablePolicy, GetTablePolicy, DeleteTablePolicy methods to test client
- Implemented testTablePolicy lifecycle test validating Put/Get/Delete operations
- Verified error handling for missing policies
* follow aws spec
* s3tables: add request body size limiting
Add request body size limiting (10MB) to readRequestBody method:
- Define maxRequestBodySize constant to prevent unbounded reads
- Use io.LimitReader to enforce size limit
- Add explicit error handling for oversized requests
- Prevents potential DoS attacks via large request bodies
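A sketch of the limiting pattern, reading one byte past the cap so oversized bodies are rejected explicitly rather than silently truncated (names are illustrative):

```go
package s3tables

import (
	"errors"
	"io"
	"net/http"
)

const maxRequestBodySize = 10 << 20 // 10 MB

var errBodyTooLarge = errors.New("request body exceeds 10MB limit")

// readRequestBody caps the read at maxRequestBodySize+1 so an oversized
// body is detected and surfaced as an explicit error.
func readRequestBody(r *http.Request) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r.Body, maxRequestBodySize+1))
	if err != nil {
		return nil, err
	}
	if len(data) > maxRequestBodySize {
		return nil, errBodyTooLarge
	}
	return data, nil
}
```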
* S3 Tables API now properly enforces resource policies
Addresses the critical security gap where policies were created but never evaluated.
* s3tables: Add upper bound validation for MaxTables parameter
MaxTables is user-controlled and influences gRPC ListEntries limits via
uint32(maxTables*2). Without an upper bound, very large values can overflow
uint32 or cause excessively large directory scans. Cap MaxTables to 1000 and
return InvalidRequest for out-of-range values, consistent with S3 MaxKeys
handling.
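The bound check might look like this sketch (the default-value behavior is an assumption):

```go
package s3tables

import "fmt"

const maxListLimit = 1000

// validateMaxTables rejects out-of-range values before they are scaled
// (e.g. via uint32(maxTables*2)) for the filer ListEntries call.
func validateMaxTables(maxTables int) (int, error) {
	if maxTables == 0 {
		return maxListLimit, nil // assumed default when the parameter is absent
	}
	if maxTables < 1 || maxTables > maxListLimit {
		return 0, fmt.Errorf("InvalidRequest: MaxTables must be between 1 and %d", maxListLimit)
	}
	return maxTables, nil
}
```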
* s3tables: Add upper bound validation for MaxBuckets parameter
MaxBuckets is user-controlled and used in uint32(maxBuckets*2) for ListEntries.
Very large values can overflow uint32 or trigger overly expensive scans. Cap
MaxBuckets to 1000 and reject out-of-range values, consistent with MaxTables
handling and S3 MaxKeys validation elsewhere in the codebase.
* s3tables: Validate bucket name in parseBucketNameFromARN()
Enforce the same bucket name validation rules (length, characters, reserved
prefixes/suffixes) when extracting from ARN. This prevents accepting ARNs
that the system would never create and ensures consistency with
CreateTableBucket validation.
* s3tables: Fix parseTableFromARN() namespace and table name validation
- Remove dead URL unescape for namespace (regex [a-z0-9_]+ cannot contain
percent-escapes)
- Add URL decoding and validation of extracted table name via
validateTableName() to prevent callers from bypassing request validation
done in other paths
* s3tables: Rename tableMetadataInternal.Schema to Metadata
The field name 'Schema' was confusing given it holds a *TableMetadata struct
and serializes as 'metadata' in JSON. Rename to 'Metadata' for clarity and
consistency with the JSON tag and intended meaning.
* s3tables: Improve bucket name validation error message
Replace misleading character-only error message with generic 'invalid bucket
name'. The isValidBucketName() function checks multiple constraints beyond
character set (length, reserved prefixes/suffixes, start/end rules), so a
specific character message is inaccurate.
* s3tables: Separate permission checks for tagging and untagging
- Add CanTagResource() to check TagResource permission
- Add CanUntagResource() to check UntagResource permission
- Update CanManageTags() to check both operations (OR logic)
This prevents UntagResource from incorrectly checking 'ManageTags' permission
and ensures each operation validates the correct permission when per-operation
permissions are enforced.
* s3tables: Consolidate getPrincipalFromRequest and getAccountID into single method
Both methods had identical implementations - they return the account ID from
request header or fall back to handler's default. Remove the duplicate
getPrincipalFromRequest and use getAccountID throughout, with updated comment
explaining its dual role as both caller identity and principal for permission
checks.
* s3tables: Fetch bucket policy in handleListTagsForResource for permission evaluation
Update handleListTagsForResource to fetch and pass bucket policy to
CheckPermission, matching the behavior of handleTagResource/handleUntagResource.
This enables bucket-policy-based permission grants to be evaluated for
ListTagsForResource, not just ownership-based checks.
* s3tables: Extract resource owner and bucket extraction into helper method
Create extractResourceOwnerAndBucket() helper to consolidate the repeated pattern
of unmarshaling metadata and extracting bucket name from resource path. This
pattern was duplicated in handleTagResource, handleListTagsForResource, and
handleUntagResource. Update all three handlers to use the helper.
Also update remaining uses of getPrincipalFromRequest() (in handler_bucket_create,
handler_bucket_get_list_delete, handler_namespace) to use getAccountID() after
consolidating the two identical methods.
* s3tables: Add log message when cluster shutdown times out
The timeout path (2 second wait for graceful shutdown) was silent. Add a
warning log message when it occurs to help diagnose flaky test issues and
indicate when the mini cluster didn't shut down cleanly.
* s3tables: Use policy_engine wildcard matcher for complete IAM compatibility
Replace the custom suffix-only wildcard implementation in matchesActionPattern
and matchesPrincipal with the policy_engine.MatchesWildcard function from
PR #8052. This enables full wildcard support including:
- Middle wildcards: s3tables:Get*Table matches GetTable
- Question mark wildcards: Get? matches any single character
- Combined patterns: s3tables:*Table* matches any action containing 'Table'
Benefits:
- Code reuse: eliminates duplicate wildcard logic
- Complete IAM compatibility: supports all AWS wildcard patterns
- Performance: uses efficient O(n) backtracking algorithm
- Consistency: same wildcard behavior across S3 API and S3 Tables
Add comprehensive unit tests covering exact matches, suffix wildcards,
middle wildcards, question marks, and combined patterns for both action
and principal matching.
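For reference, a classic iterative O(n) backtracking wildcard matcher of the kind described; this is a sketch of the semantics, not the policy_engine implementation itself:

```go
package s3tables

// matchWildcard reports whether s matches pattern, where '*' matches any
// sequence and '?' matches exactly one character.
func matchWildcard(pattern, s string) bool {
	pi, si := 0, 0
	star, mark := -1, 0
	for si < len(s) {
		switch {
		case pi < len(pattern) && (pattern[pi] == '?' || pattern[pi] == s[si]):
			pi, si = pi+1, si+1
		case pi < len(pattern) && pattern[pi] == '*':
			star, mark = pi, si // remember the star position for backtracking
			pi++
		case star >= 0:
			pi, mark = star+1, mark+1 // let '*' absorb one more character
			si = mark
		default:
			return false
		}
	}
	for pi < len(pattern) && pattern[pi] == '*' {
		pi++ // trailing stars match the empty suffix
	}
	return pi == len(pattern)
}

// Example: matchWildcard("s3tables:Get*Table", "s3tables:GetTable") == true
```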
* go fmt
* s3tables: Fix vet error - remove undefined c.t reference in Stop()
The TestCluster.Stop() method doesn't have access to testing.T object.
Remove the log statement and keep the timeout handling comment for clarity.
The original intent (warning about shutdown timeout) is still captured in
the code comment explaining potential issues.
* clean up
* s3tables: Add t field to TestCluster for logging
Add *testing.T field to TestCluster struct and initialize it in
startMiniCluster. This allows Stop() to properly log warnings when
cluster shutdown times out. Includes the t field in the test cluster
initialization and restores the logging statement in Stop().
* s3tables: Fix bucket policy error handling in permission checks
Replace error-swallowing pattern where all errors from getExtendedAttribute
were ignored for bucket policy reads. Now properly distinguish between:
- ErrAttributeNotFound: Policy not found is expected; continue with empty policy
- Other errors: Return internal server error and stop processing
Applied fix to all bucket policy reads in:
- handleDeleteTableBucketPolicy (line 220)
- handleTagResource (line 313)
- handleUntagResource (line 405)
- handleListTagsForResource (line 488)
- And additional occurrences in closures
This prevents silent failures and ensures policy-related errors are surfaced
to callers rather than being silently ignored.
* s3tables: Pre-validate namespace to return 400 instead of 500
Move validateNamespace call outside of filerClient.WithFilerClient closure
so that validation errors return HTTP 400 (InvalidRequest) instead of 500
(InternalError).
Before: Validation error inside closure → treated as internal error → 500
After: Validation error before closure → handled as bad request → 400
This provides correct error semantics: namespace validation is an input
validation issue, not a server error.
* Update weed/s3api/s3tables/handler.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* s3tables: Normalize action names to include service prefix
Add automatic normalization of operations to full IAM-style action names
(e.g., 's3tables:CreateTableBucket') in CheckPermission(). This ensures
policy statements using prefixed actions (s3tables:*) correctly match
operations evaluated by permission helpers.
Also fixes incorrect r.Context() passed to GetIdentityNameFromContext
which expects *http.Request. Now passes r directly.
* s3tables: Use policy framework for table creation authorization
Replace strict ownership check in CreateTable with policy-based authorization.
Now checks both namespace and bucket policies for CreateTable permission,
allowing delegation via resource policies while still respecting owner bypass.
Authorization logic:
- Namespace policy grants CreateTable → allowed
- Bucket policy grants CreateTable → allowed
- Otherwise → denied (even if same owner)
This enables cross-principal table creation via policies while maintaining
security through explicit allow/deny semantics.
* s3tables: Use policy framework for GetTable authorization
Replace strict ownership check with policy-based authorization in GetTable.
Now checks both table and bucket policies for GetTable permission, allowing
authorized non-owners to read table metadata.
Authorization logic:
- Table policy grants GetTable → allowed
- Bucket policy grants GetTable → allowed
- Otherwise → 404 NotFound (no access disclosed)
Maintains security through policy evaluation while enabling read delegation.
* s3tables: Generate ARNs using resource owner account ID
Change ARN generation to use resource OwnerAccountID instead of caller
identity (h.getAccountID(r)). This ensures ARNs are stable and consistent
regardless of which principal accesses the resource.
Updated generateTableBucketARN and generateTableARN function signatures
to accept ownerAccountID parameter. All call sites updated to pass the
resource owner's account ID from metadata.
This prevents ARN inconsistency issues when multiple principals have
access to the same resource via policies.
* s3tables: Fix remaining policy error handling in namespace and bucket handlers
Replace silent error swallowing (err == nil) with proper error distinction
for bucket policy reads. Now properly checks ErrAttributeNotFound and
propagates other errors as internal server errors.
Fixed 5 locations:
- handleCreateNamespace (policy fetch)
- handleDeleteNamespace (policy fetch)
- handleListNamespaces (policy fetch)
- handleGetNamespace (policy fetch)
- handleGetTableBucket (policy fetch)
This prevents masking of filer issues when policies cannot be read due
to I/O errors or other transient failures.
* ci: Pin GitHub Actions to commit SHAs for s3-tables-tests
Update all action refs to use pinned commit SHAs instead of floating tags:
- actions/checkout: @v6 → @8e8c483 (v4)
- actions/setup-go: @v6 → @0c52d54 (v5)
- actions/upload-artifact: @v6 → @65d8626 (v4)
Pinned SHAs improve reproducibility and reduce supply chain risk by
preventing accidental or malicious changes in action releases. Aligns
with repository conventions used in other workflows (e.g., go.yml).
* s3tables: Add resource ARN validation to policy evaluation
Implement resource-specific policy validation to prevent over-broad
permission grants. Add matchesResource and matchesResourcePattern functions
to validate statement Resource fields against specific resource ARNs.
Add new CheckPermissionWithResource function that includes resource ARN
validation, while keeping CheckPermission unchanged for backward compatibility.
This enables policies to grant access to specific resources only:
- statements with Resource: "arn:aws:s3tables:...:bucket/specific-bucket/*"
will only match when accessing that specific bucket
- statements without Resource field match all resources (implicit *)
- resource patterns support wildcards (* for any sequence, ? for single char)
For future use: Handlers can call CheckPermissionWithResource with the
target resource ARN to enforce resource-level access control.
* Revert "ci: Pin GitHub Actions to commit SHAs for s3-tables-tests"
This reverts commit 01da26fbcb.
* s3tables: Remove duplicate bucket extraction logic in helper
Move bucket name extraction outside the if/else block in
extractResourceOwnerAndBucket since the logic is identical for both
ResourceTypeTable and ResourceTypeBucket cases. This reduces code
duplication and improves maintainability.
The extraction pattern (parts[1] from /tables/{bucket}/...) works for
both resource types, so it's now performed once before the type-specific
metadata unmarshaling.
* go fmt
* s3tables: Fix ownership consistency across handlers
Address three related ownership consistency issues:
1. CreateNamespace now sets OwnerAccountID to bucketMetadata.OwnerAccountID
instead of request principal. This prevents namespaces created by
delegated callers (via bucket policy) from becoming unmanageable, since
ListNamespaces filters by bucket owner.
2. CreateTable now:
- Fetches bucket metadata to use correct owner for bucket policy evaluation
- Uses namespaceMetadata.OwnerAccountID for namespace policy checks
- Uses bucketMetadata.OwnerAccountID for bucket policy checks
- Sets table OwnerAccountID to namespaceMetadata.OwnerAccountID (inherited)
3. GetTable now:
- Fetches bucket metadata to use correct owner for bucket policy evaluation
- Uses metadata.OwnerAccountID for table policy checks
- Uses bucketMetadata.OwnerAccountID for bucket policy checks
This ensures:
- Bucket owner retains implicit "owner always allowed" behavior even when
evaluating bucket policies
- Ownership hierarchy is consistent (namespace owned by bucket, table owned by namespace)
- Cross-principal delegation via policies doesn't break ownership chains
* s3tables: Fix ListTables authorization and policy parsing
Make ListTables authorization consistent with GetTable/CreateTable:
1. ListTables authorization now evaluates policies instead of owner-only checks:
- For namespace listing: checks namespace policy AND bucket policy
- For bucket-wide listing: checks bucket policy
- Uses CanListTables permission framework
2. Remove owner-only filter in listTablesWithClient that prevented policy-based
sharing of tables. Authorization is now enforced at the handler level, so all
tables in the namespace/bucket are returned to authorized callers (who have
access either via ownership or policy).
3. Add flexible PolicyDocument.UnmarshalJSON to support both single-object and
array forms of Statement field:
- Handles: {"Statement": {...}}
- Handles: {"Statement": [{...}, {...}]}
- Improves AWS IAM compatibility
This ensures cross-account table listing works when delegated via bucket/namespace
policies, consistent with the authorization model for other operations.
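The flexible Statement decoding can be sketched as follows (field set trimmed; the production PolicyDocument has more fields):

```go
package s3tables

import "encoding/json"

type Statement struct {
	Effect string `json:"Effect"`
	// Action, Principal, Resource omitted for brevity.
}

// PolicyDocument accepts Statement as either a single object or an array,
// mirroring AWS IAM's flexible JSON.
type PolicyDocument struct {
	Version    string
	Statements []Statement
}

func (p *PolicyDocument) UnmarshalJSON(data []byte) error {
	var raw struct {
		Version   string          `json:"Version"`
		Statement json.RawMessage `json:"Statement"`
	}
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	p.Version = raw.Version
	if len(raw.Statement) == 0 {
		return nil
	}
	// Try the array form first; fall back to a single object.
	if err := json.Unmarshal(raw.Statement, &p.Statements); err == nil {
		return nil
	}
	var single Statement
	if err := json.Unmarshal(raw.Statement, &single); err != nil {
		return err
	}
	p.Statements = []Statement{single}
	return nil
}
```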
* go fmt
* s3tables: Separate table name pattern constant for clarity
Define a separate tableNamePatternStr constant for the table name component in
the ARN regex, even though it currently has the same value as
tableNamespacePatternStr. This improves code clarity and maintainability, making
it easier to modify if the naming rules for tables and namespaces diverge in the
future.
* refactor
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
SeaweedFS
Sponsor SeaweedFS via Patreon
SeaweedFS is an independent Apache-licensed open source project with its ongoing development made possible entirely thanks to the support of these awesome backers. If you'd like to grow SeaweedFS even stronger, please consider joining our sponsors on Patreon.
Your support will be really appreciated by me and other supporters!
- Download Binaries for different platforms
- SeaweedFS on Slack
- SeaweedFS on Twitter
- SeaweedFS on Telegram
- SeaweedFS on Reddit
- SeaweedFS Mailing List
- Wiki Documentation
- SeaweedFS White Paper
- SeaweedFS Introduction Slides 2025.5
- SeaweedFS Introduction Slides 2021.5
- SeaweedFS Introduction Slides 2019.3
Table of Contents
- Quick Start
- Introduction
- Features
- Example: Using Seaweed Object Store
- Architecture
- Compared to Other File Systems
- Dev Plan
- Installation Guide
- Disk Related Topics
- Benchmark
- Enterprise
- License
Quick Start
Quick Start with weed mini
The easiest way to get started with SeaweedFS for development and testing:
- Download the latest binary from https://github.com/seaweedfs/seaweedfs/releases and unzip a single binary file weed or weed.exe.
Example:
# remove quarantine on macOS
# xattr -d com.apple.quarantine ./weed
./weed mini -dir=/data
This single command starts a complete SeaweedFS setup with:
- Master UI: http://localhost:9333
- Volume Server: http://localhost:9340
- Filer UI: http://localhost:8888
- S3 Endpoint: http://localhost:8333
- WebDAV: http://localhost:7333
- Admin UI: http://localhost:23646
Perfect for development, testing, learning SeaweedFS, and single node deployments!
Quick Start for S3 API on Docker
docker run -p 8333:8333 chrislusf/seaweedfs server -s3
Quick Start with Single Binary
- Download the latest binary from https://github.com/seaweedfs/seaweedfs/releases and unzip a single binary file weed or weed.exe. Or run go install github.com/seaweedfs/seaweedfs/weed@latest.
- export AWS_ACCESS_KEY_ID=admin ; export AWS_SECRET_ACCESS_KEY=key as the admin credentials to access the object store.
- Run weed server -dir=/some/data/dir -s3 to start one master, one volume server, one filer, and one S3 gateway. The difference with weed mini is that weed mini can auto-configure based on the single-host environment, while weed server requires manual configuration and is designed for production use.
Also, to increase capacity, just add more volume servers by running weed volume -dir="/some/data/dir2" -master="<master_host>:9333" -port=8081 locally, or on a different machine, or on thousands of machines. That is it!
Introduction
SeaweedFS is a simple and highly scalable distributed file system. There are two objectives:
- to store billions of files!
- to serve the files fast!
SeaweedFS started as a blob store to handle small files efficiently. Instead of managing all file metadata in a central master, the central master only manages volumes on volume servers, and these volume servers manage files and their metadata. This relieves concurrency pressure from the central master and spreads file metadata into volume servers, allowing faster file access (O(1), usually just one disk read operation).
There is only 40 bytes of disk storage overhead for each file's metadata. It is so simple with O(1) disk reads that you are welcome to challenge the performance with your actual use cases.
SeaweedFS started by implementing Facebook's Haystack design paper. Also, SeaweedFS implements erasure coding with ideas from f4: Facebook’s Warm BLOB Storage System, and has a lot of similarities with Facebook’s Tectonic Filesystem and Google's Colossus File System
On top of the blob store, optional Filer can support directories and POSIX attributes. Filer is a separate linearly-scalable stateless server with customizable metadata stores, e.g., MySql, Postgres, Redis, Cassandra, HBase, Mongodb, Elastic Search, LevelDB, RocksDB, Sqlite, MemSql, TiDB, Etcd, CockroachDB, YDB, etc.
SeaweedFS can transparently integrate with the cloud. With hot data on local cluster, and warm data on the cloud with O(1) access time, SeaweedFS can achieve both fast local access time and elastic cloud storage capacity. What's more, the cloud storage access API cost is minimized. Faster and cheaper than direct cloud storage!
Features
Additional Blob Store Features
- Support different replication levels, with rack and data center aware.
- Automatic master servers failover - no single point of failure (SPOF).
- Automatic compression depending on file MIME type.
- Automatic compaction to reclaim disk space after deletion or update.
- Automatic entry TTL expiration.
- Flexible Capacity Expansion: Any server with some disk space can add to the total storage space.
- Adding/Removing servers does not cause any data re-balancing unless triggered by admin commands.
- Optional picture resizing.
- Support ETag, Accept-Range, Last-Modified, etc.
- Support in-memory/leveldb/readonly mode tuning for memory/performance balance.
- Support rebalancing the writable and readonly volumes.
- Customizable Multiple Storage Tiers: Customizable storage disk types to balance performance and cost.
- Transparent cloud integration: unlimited capacity via tiered cloud storage for warm data.
- Erasure Coding for warm storage: Rack-Aware 10.4 erasure coding reduces storage cost and increases availability. Enterprise version can customize EC ratio.
Filer Features
- Filer server provides "normal" directories and files via HTTP.
- File TTL automatically expires file metadata and actual file data.
- Mount filer reads and writes files directly as a local directory via FUSE.
- Filer Store Replication enables HA for filer meta data stores.
- Active-Active Replication enables asynchronous one-way or two-way cross cluster continuous replication.
- Amazon S3 compatible API accesses files with S3 tooling.
- Hadoop Compatible File System accesses files from Hadoop/Spark/Flink/etc or even runs HBase.
- Async Replication To Cloud has extremely fast local access and backups to Amazon S3, Google Cloud Storage, Azure, BackBlaze.
- WebDAV accesses as a mapped drive on Mac and Windows, or from mobile devices.
- AES256-GCM Encrypted Storage safely stores the encrypted data.
- Super Large Files stores large or super large files in tens of TB.
- Cloud Drive mounts cloud storage to local cluster, cached for fast read and write with asynchronous write back.
- Gateway to Remote Object Store mirrors bucket operations to remote object storage, in addition to Cloud Drive.
Kubernetes
- Kubernetes CSI Driver A Container Storage Interface (CSI) Driver.
- SeaweedFS Operator
Example: Using Seaweed Blob Store
By default, the master node runs on port 9333, and the volume nodes run on port 8080. Let's start one master node, and two volume nodes on port 8080 and 8081. Ideally, they should be started from different machines. We'll use localhost as an example.
SeaweedFS uses HTTP REST operations to read, write, and delete. The responses are in JSON or JSONP format.
Start Master Server
> ./weed master
Start Volume Servers
> weed volume -dir="/tmp/data1" -max=5 -master="localhost:9333" -port=8080 &
> weed volume -dir="/tmp/data2" -max=10 -master="localhost:9333" -port=8081 &
Write A Blob
A blob, also referred to as a needle, a chunk, or mistakenly as a file, is just a byte array. It can have attributes, such as name, mime type, create or update time, etc. But basically it is just a byte array of a relatively small size, such as 2 MB ~ 64 MB. The size is not fixed.
To upload a blob: first, send an HTTP POST, PUT, or GET request to /dir/assign to get an fid and a volume server URL:
> curl http://localhost:9333/dir/assign
{"count":1,"fid":"3,01637037d6","url":"127.0.0.1:8080","publicUrl":"localhost:8080"}
Second, to store the blob content, send an HTTP multipart POST request to url + '/' + fid from the response:
> curl -F file=@/home/chris/myphoto.jpg http://127.0.0.1:8080/3,01637037d6
{"name":"myphoto.jpg","size":43234,"eTag":"1cc0118e"}
To update, send another POST request with updated blob content.
For deletion, send an HTTP DELETE request to the same url + '/' + fid URL:
> curl -X DELETE http://127.0.0.1:8080/3,01637037d6
Save Blob Id
Now, you can save the fid, 3,01637037d6 in this case, to a database field.
The number 3 at the start represents a volume id. After the comma, it's one file key, 01, and a file cookie, 637037d6.
The volume id is an unsigned 32-bit integer. The file key is an unsigned 64-bit integer. The file cookie is an unsigned 32-bit integer, used to prevent URL guessing.
The file key and file cookie are both coded in hex. You can store the <volume id, file key, file cookie> tuple in your own format, or simply store the fid as a string.
If stored as a string, in theory, you would need 8+1+16+8=33 bytes. A char(33) would be enough, if not more than enough, since most uses will not need 2^32 volumes.
If space is really a concern, you can store the file id in binary format. You would need a 4-byte integer for the volume id, an 8-byte integer for the file key, and a 4-byte integer for the file cookie. So 16 bytes are more than enough.
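As an illustration, here is one way to pack such an fid string into the 16-byte binary form (a sketch; SeaweedFS client libraries provide their own fid helpers):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
	"strings"
)

// packFid packs "3,01637037d6" into 16 bytes: a 4-byte volume id,
// an 8-byte file key, and a 4-byte cookie.
func packFid(fid string) ([16]byte, error) {
	var out [16]byte
	volStr, rest, ok := strings.Cut(fid, ",")
	if !ok || len(rest) <= 8 {
		return out, fmt.Errorf("malformed fid %q", fid)
	}
	vol, err := strconv.ParseUint(volStr, 10, 32)
	if err != nil {
		return out, err
	}
	// The last 8 hex chars are the cookie; everything before is the key.
	key, err := strconv.ParseUint(rest[:len(rest)-8], 16, 64)
	if err != nil {
		return out, err
	}
	cookie, err := strconv.ParseUint(rest[len(rest)-8:], 16, 32)
	if err != nil {
		return out, err
	}
	binary.BigEndian.PutUint32(out[0:4], uint32(vol))
	binary.BigEndian.PutUint64(out[4:12], key)
	binary.BigEndian.PutUint32(out[12:16], uint32(cookie))
	return out, nil
}

func main() {
	b, _ := packFid("3,01637037d6")
	fmt.Printf("%x\n", b)
}
```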
Read a Blob
Here is an example of how to render the URL.
First look up the volume server's URLs by the file's volumeId:
> curl http://localhost:9333/dir/lookup?volumeId=3
{"volumeId":"3","locations":[{"publicUrl":"localhost:8080","url":"localhost:8080"}]}
Since (usually) there are not too many volume servers, and volumes don't move often, you can cache the results most of the time. Depending on the replication type, one volume can have multiple replica locations. Just randomly pick one location to read.
Now you can take the public URL, render the URL or directly read from the volume server via URL:
http://localhost:8080/3,01637037d6.jpg
Notice we add a file extension ".jpg" here. It's optional and just one way for the client to specify the file content type.
If you want a nicer URL, you can use one of these alternative URL formats:
http://localhost:8080/3/01637037d6/my_preferred_name.jpg
http://localhost:8080/3/01637037d6.jpg
http://localhost:8080/3,01637037d6.jpg
http://localhost:8080/3/01637037d6
http://localhost:8080/3,01637037d6
If you want to get a scaled version of an image, you can add some params:
http://localhost:8080/3/01637037d6.jpg?height=200&width=200
http://localhost:8080/3/01637037d6.jpg?height=200&width=200&mode=fit
http://localhost:8080/3/01637037d6.jpg?height=200&width=200&mode=fill
Rack-Aware and Data Center-Aware Replication
SeaweedFS applies the replication strategy at a volume level. So, when you are getting a blob id, you can specify the replication strategy. For example:
curl http://localhost:9333/dir/assign?replication=001
The replication parameter options are:
000: no replication
001: replicate once on the same rack
010: replicate once on a different rack, but same data center
100: replicate once on a different data center
200: replicate twice on two different data center
110: replicate once on a different rack, and once on a different data center
More details about replication can be found on the wiki.
You can also set the default replication strategy when starting the master server.
Allocate Blob Key on Specific Data Center
Volume servers can be started with a specific data center name:
weed volume -dir=/tmp/1 -port=8080 -dataCenter=dc1
weed volume -dir=/tmp/2 -port=8081 -dataCenter=dc2
When requesting a blob key, an optional "dataCenter" parameter can limit the assigned volume to the specific data center. For example, this specifies that the assigned volume should be limited to 'dc1':
http://localhost:9333/dir/assign?dataCenter=dc1
Other Features
- No Single Point of Failure
- Insert with your own keys
- Chunking large files
- Collection as a Simple Name Space
Blob Store Architecture
Usually distributed file systems split each file into chunks. A central server keeps a mapping of filenames to chunks, and also which chunks each chunk server has.
The main drawback is that the central server can't handle many small files efficiently, and since all read requests need to go through the central master, it might not scale well for many concurrent users.
Instead of managing chunks, SeaweedFS manages data volumes in the master server. Each data volume is 32GB in size, and can hold a lot of blobs. And each storage node can have many data volumes. So the master node only needs to store the metadata about the volumes, which is a fairly small amount of data and is generally stable.
The actual blob metadata, namely each blob's volume, offset, and size, is stored in each volume on volume servers. Since each volume server only manages metadata of blobs on its own disk, with only 16 bytes per blob, all access can read the metadata from memory and needs only one disk operation to actually read file data.
For comparison, consider that an xfs inode structure in Linux is 536 bytes.
Master Server and Volume Server
The architecture is fairly simple. The actual data is stored in volumes on storage nodes. One volume server can have multiple volumes, and can support both read and write access with basic authentication.
All volumes are managed by a master server. The master server contains the volume id to volume server mapping. This is fairly static information, and can be easily cached.
On each write request, the master server also generates a file key, which is a growing 64-bit unsigned integer. Since write requests are not generally as frequent as read requests, one master server should be able to handle the concurrency well.
Write and Read files
When a client sends a write request, the master server returns (volume id, file key, file cookie, volume node URL) for the blob. The client then contacts the volume node and POSTs the blob content.
When a client needs to read a blob based on (volume id, file key, file cookie), it asks the master server by the volume id for the (volume node URL, volume node public URL), or retrieves this from a cache. Then the client can GET the content, or just render the URL on web pages and let browsers fetch the content.
Saving memory
All blob metadata stored on a volume server is readable from memory without disk access. Each file takes just a 16-byte map entry of <64bit key, 32bit offset, 32bit size>. Of course, each map entry has its own space cost for the map. But usually the disk space runs out before the memory does.
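Conceptually, each entry is just this 16-byte record (a sketch; the real needle map uses its own compact, memory-optimized types):

```go
package needle

// mapEntry mirrors the <64bit key, 32bit offset, 32bit size> layout
// described above.
type mapEntry struct {
	Key    uint64 // 64-bit file key
	Offset uint32 // 32-bit offset into the volume file
	Size   uint32 // 32-bit blob size
}

// 8 + 4 + 4 = 16 bytes per blob, so even 100 million blobs fit in
// roughly 1.6 GB of RAM, keeping metadata lookups off the disk.
```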
Tiered Storage to the cloud
The local volume servers are much faster, while cloud storages have elastic capacity and are actually more cost-efficient if not accessed often (usually free to upload, but relatively costly to access). With the append-only structure and O(1) access time, SeaweedFS can take advantage of both local and cloud storage by offloading the warm data to the cloud.
Usually hot data are fresh and warm data are old. SeaweedFS puts the newly created volumes on local servers, and optionally uploads the older volumes to the cloud. If the older data are accessed less often, this literally gives you unlimited capacity with limited local servers, while staying fast for new data.
With the O(1) access time, the network latency cost is kept at minimum.
If the hot/warm data is split as 20/80, with 20 servers, you can achieve storage capacity of 100 servers. That's a cost saving of 80%! Or you can repurpose the 80 servers to store new data also, and get 5X storage throughput.
SeaweedFS Filer
Built on top of the blob store, SeaweedFS Filer adds a directory structure to create a file system. The directory structure is an interface that is implemented in many key-value stores or databases.
The content of a file is mapped to one or many blobs, distributed to multiple volumes on multiple volume servers.
Compared to Other File Systems
Most other distributed file systems seem more complicated than necessary.
SeaweedFS is meant to be fast and simple, in both setup and operation. If you do not understand how it works when you reach here, we've failed! Please raise an issue with any questions or update this file with clarifications.
SeaweedFS is constantly moving forward. Same with other systems. These comparisons can be outdated quickly. Please help to keep them updated.
Compared to HDFS
HDFS uses the chunk approach for each file, and is ideal for storing large files.
SeaweedFS is ideal for serving relatively smaller files quickly and concurrently.
SeaweedFS can also store extra large files by splitting them into manageable data chunks, and storing the file ids of the data chunks into a meta chunk. This is managed by the "weed upload/download" tool, and the weed master or volume servers are agnostic about it.
Compared to GlusterFS, Ceph
The architectures are mostly the same. SeaweedFS aims to store and read files fast, with a simple and flat architecture. The main differences are
- SeaweedFS optimizes for small files, ensuring O(1) disk seek operation, and can also handle large files.
- SeaweedFS statically assigns a volume id for a file. Locating file content becomes just a lookup of the volume id, which can be easily cached.
- SeaweedFS Filer metadata store can be any well-known and proven data store, e.g., Redis, Cassandra, HBase, Mongodb, Elastic Search, MySql, Postgres, Sqlite, MemSql, TiDB, CockroachDB, Etcd, YDB etc, and is easy to customize.
- SeaweedFS Volume server also communicates directly with clients via HTTP, supporting range queries, direct uploads, etc.
| System | File Metadata | File Content Read | POSIX | REST API | Optimized for large number of small files |
|---|---|---|---|---|---|
| SeaweedFS | lookup volume id, cacheable | O(1) disk seek | | Yes | Yes |
| SeaweedFS Filer | Linearly Scalable, Customizable | O(1) disk seek | FUSE | Yes | Yes |
| GlusterFS | hashing | | FUSE, NFS | | |
| Ceph | hashing + rules | | FUSE | Yes | |
| MooseFS | in memory | | FUSE | | No |
| MinIO | separate meta file for each file | | | Yes | No |
Compared to GlusterFS
GlusterFS stores files, both directories and content, in configurable volumes called "bricks".
GlusterFS hashes the path and filename into ids, assigns them to virtual volumes, and then maps these to "bricks".
Compared to MooseFS
MooseFS chooses to neglect the small file issue. From the moosefs 3.0 manual: "even a small file will occupy 64KiB plus additionally 4KiB of checksums and 1KiB for the header", because it "was initially designed for keeping large amounts (like several thousands) of very big files".
MooseFS Master Server keeps all meta data in memory. Same issue as HDFS namenode.
Compared to Ceph
Ceph can be set up similarly to SeaweedFS as a key->blob store. It is much more complicated, with the need to support layers on top of it. Here is a more detailed comparison.
SeaweedFS has a centralized master group to look up free volumes, while Ceph uses hashing and metadata servers to locate its objects. Having a centralized master makes it easy to code and manage.
Ceph, like SeaweedFS, is based on the object store RADOS. Ceph is rather complicated with mixed reviews.
Ceph uses CRUSH hashing to automatically manage data placement, which makes it efficient to locate the data. But the data has to be placed according to the CRUSH algorithm. Any wrong configuration would cause data loss. Topology changes, such as adding new servers to increase capacity, will cause data migration with high IO cost to fit the CRUSH algorithm. SeaweedFS places data by assigning it to any writable volume. If a write to one volume fails, just pick another volume to write to. Adding more volumes is also as simple as it can be.
SeaweedFS is optimized for small files. Small files are stored as one continuous block of content, with at most 8 unused bytes between files. Small file access is O(1) disk read.
SeaweedFS Filer uses off-the-shelf stores, such as MySql, Postgres, Sqlite, Mongodb, Redis, Elastic Search, Cassandra, HBase, MemSql, TiDB, CockroachDB, Etcd, YDB, to manage file directories. These stores are proven, scalable, and easier to manage.
| SeaweedFS | comparable to Ceph | advantage |
|---|---|---|
| Master | MDS | simpler |
| Volume | OSD | optimized for small files |
| Filer | Ceph FS | linearly scalable, Customizable, O(1) or O(logN) |
Compared to MinIO
MinIO follows AWS S3 closely and is ideal for testing the S3 API. It has a good UI, policies, versioning, etc. SeaweedFS is trying to catch up here. It is also possible to put MinIO as a gateway in front of SeaweedFS later.
MinIO metadata are in simple files. Each file write will incur extra writes to the corresponding meta file.
MinIO does not have optimization for lots of small files. The files are simply stored as-is on local disks. Together with the extra meta file and erasure-coding shards, this only amplifies the LOSF problem.
MinIO has multiple disk IO to read one file. SeaweedFS has O(1) disk reads, even for erasure coded files.
MinIO has full-time erasure coding. SeaweedFS uses replication on hot data for faster speed and optionally applies erasure coding on warm data.
MinIO does not have POSIX-like API support.
MinIO has specific requirements on storage layout. It is not flexible to adjust capacity. In SeaweedFS, just start one volume server pointing to the master. That's all.
Dev Plan
- More tools and documentation, on how to manage and scale the system.
- Read and write stream data.
- Support structured data.
This is a super exciting project! And we need helpers and support!
Installation Guide
Installation guide for users who are not familiar with golang
Step 1: install go on your machine and setup the environment by following the instructions at:
https://golang.org/doc/install
make sure to define your $GOPATH
Step 2: checkout this repo:
git clone https://github.com/seaweedfs/seaweedfs.git
Step 3: download, compile, and install the project by executing the following command
cd seaweedfs/weed && make install
Once this is done, you will find the executable "weed" in your $GOPATH/bin directory
For more installation options, including how to run with Docker, see the Getting Started guide.
Disk Related Topics
Hard Drive Performance
When testing read performance on SeaweedFS, it basically becomes a performance test of your hard drive's random read speed. Hard drives usually get 100MB/s~200MB/s.
Solid State Disk
To modify or delete small files, an SSD must delete a whole block at a time and move content in existing blocks to a new block. An SSD is fast when brand new, but will get fragmented over time, and you have to garbage collect, compacting blocks. SeaweedFS is friendly to SSDs since it is append-only. Deletion and compaction are done at the volume level in the background, neither slowing reads nor causing fragmentation.
Benchmark
My Own Unscientific Single Machine Results on MacBook with Solid State Disk, CPU: 1 Intel Core i7 2.6GHz.
Write 1 million 1KB files:
Concurrency Level: 16
Time taken for tests: 66.753 seconds
Completed requests: 1048576
Failed requests: 0
Total transferred: 1106789009 bytes
Requests per second: 15708.23 [#/sec]
Transfer rate: 16191.69 [Kbytes/sec]
Connection Times (ms)
min avg max std
Total: 0.3 1.0 84.3 0.9
Percentage of the requests served within a certain time (ms)
50% 0.8 ms
66% 1.0 ms
75% 1.1 ms
80% 1.2 ms
90% 1.4 ms
95% 1.7 ms
98% 2.1 ms
99% 2.6 ms
100% 84.3 ms
Randomly read 1 million files:
Concurrency Level: 16
Time taken for tests: 22.301 seconds
Completed requests: 1048576
Failed requests: 0
Total transferred: 1106812873 bytes
Requests per second: 47019.38 [#/sec]
Transfer rate: 48467.57 [Kbytes/sec]
Connection Times (ms)
min avg max std
Total: 0.0 0.3 54.1 0.2
Percentage of the requests served within a certain time (ms)
50% 0.3 ms
90% 0.4 ms
98% 0.6 ms
99% 0.7 ms
100% 54.1 ms
Run WARP and launch a mixed benchmark.
make benchmark
warp: Benchmark data written to "warp-mixed-2025-12-05[194844]-kBpU.csv.zst"
Mixed operations.
Operation: DELETE, 10%, Concurrency: 20, Ran 42s.
* Throughput: 55.13 obj/s
Operation: GET, 45%, Concurrency: 20, Ran 42s.
* Throughput: 2477.45 MiB/s, 247.75 obj/s
Operation: PUT, 15%, Concurrency: 20, Ran 42s.
* Throughput: 825.85 MiB/s, 82.59 obj/s
Operation: STAT, 30%, Concurrency: 20, Ran 42s.
* Throughput: 165.27 obj/s
Cluster Total: 3302.88 MiB/s, 550.51 obj/s over 43s.
Enterprise
For enterprise users, please visit seaweedfs.com for the SeaweedFS Enterprise Edition, which has a self-healing storage format with better data protection.
License
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
The text of this page is available for modification and reuse under the terms of the Creative Commons Attribution-Sharealike 3.0 Unported License and the GNU Free Documentation License (unversioned, with no invariant sections, front-cover texts, or back-cover texts).



