Commit Graph

12790 Commits

Author SHA1 Message Date
Yalın Doğu Şahin
ef3b5f7efa helm/add iceberg rest catalog ingress for s3 (#8205)
* helm: add Iceberg REST catalog support to S3 service

* helm: add Iceberg REST catalog support to S3 service

* add ingress for iceberg catalog endpoint

* helm: conditionally render ingressClassName in s3-iceberg-ingress.yaml

* helm: refactor s3-iceberg-ingress.yaml to use named template for paths

* helm: remove unused $serviceName variable in s3-iceberg-ingress.yaml

---------

Co-authored-by: yalin.sahin <yalin.sahin@tradition.ch>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-02-04 12:00:59 -08:00
Chris Lu
bd4e7ff14e command: fix s3 panic in filer command (#8208)
command: fix s3 panic in filer command due to uninitialized options

This fixes a nil pointer dereference panic when starting the S3 server
via the `weed filer` command by correctly initializing `iamReadOnly`
and `portIceberg` flags.

Relates to #8200
2026-02-04 10:37:20 -08:00
Chris Lu
72a8f598f2 Fix Maintenance Task Sorting and Refactor Log Persistence (#8199)
* fix float stepping

* do not auto refresh

* only logs when non 200 status

* fix maintenance task sorting and cleanup redundant handler logic

* Refactor log retrieval to persist to disk and fix slowness

- Move log retrieval to disk-based persistence in GetMaintenanceTaskDetail
- Implement background log fetching on task completion in worker_grpc_server.go
- Implement async background refresh for in-progress tasks
- Completely remove blocking gRPC calls from the UI path to fix 10s timeouts
- Cleanup debug logs and performance profiling code

* Ensure consistent deterministic sorting in config_persistence cleanup

* Replace magic numbers with constants and remove debug logs

- Added descriptive constants for truncation limits and timeouts in admin_server.go and worker_grpc_server.go
- Replaced magic numbers with these constants throughout the codebase
- Verified removal of stdout debug printing
- Ensured consistent truncation logic during log persistence

* Address code review feedback on history truncation and logging logic

- Fix AssignmentHistory double-serialization by copying task in GetMaintenanceTaskDetail
- Fix handleTaskCompletion logging logic (mutually exclusive success/failure logs)
- Remove unused Timeout field from LogRequestContext and sync select timeouts with constants
- Ensure AssignmentHistory is only provided in the top-level field for better JSON structure

* Implement goroutine leak protection and request deduplication

- Add request deduplication in RequestTaskLogs to prevent multiple concurrent fetches for the same task
- Implement safe cleanup in timeout handlers to avoid race conditions in pendingLogRequests map
- Add a 10s cooldown for background log refreshes in GetMaintenanceTaskDetail to prevent spamming
- Ensure all persistent log-fetching goroutines are bounded and efficiently managed

* Fix potential nil pointer panics in maintenance handlers

- Add nil checks for adminServer in ShowTaskDetail, ShowMaintenanceWorkers, and UpdateTaskConfig
- Update getMaintenanceQueueData to return a descriptive error instead of nil when adminServer is uninitialized
- Ensure internal helper methods consistently check for adminServer initialization before use

* Strictly enforce disk-only log reading

- Remove background log fetching from GetMaintenanceTaskDetail to prevent timeouts and network calls during page view
- Remove unused lastLogFetch tracking fields to clean up dead code
- Ensure logs are only updated upon task completion via handleTaskCompletion

* Refactor GetWorkerLogs to read from disk

- Update /api/maintenance/workers/:id/logs endpoint to use configPersistence.LoadTaskExecutionLogs
- Remove synchronous gRPC call RequestTaskLogs to prevent timeouts and bad gateway errors
- Ensure consistent log retrieval behavior across the application (disk-only)

* Fix timestamp parsing in log viewer

- Update task_detail.templ JS to handle both ISO 8601 strings and Unix timestamps
- Fix "Invalid time value" error when displaying logs fetched from disk
- Regenerate templates

* master: fallback to HDD if SSD volumes are full in Assign

* worker: improve EC detection logging and fix skip counters

* worker: add Sync method to TaskLogger interface

* worker: implement Sync and ensure logs are flushed before task completion

* admin: improve task log retrieval with retries and better timeouts

* admin: robust timestamp parsing in task detail view
2026-02-04 08:48:55 -08:00
Chris Lu
2ff1cd9fc9 format 2026-02-03 18:39:01 -08:00
Chris Lu
5a5cc38692 4.09 2026-02-03 17:56:25 -08:00
Chris Lu
f66a23b472 Fix: filer not yet available in s3.configure (#8198)
* Fix: Initialize filer CredentialManager with filer address

* The fix involves checking for directory existence before creation.

* adjust error message

* Fix: Implement FilerAddressSetter in PropagatingCredentialStore

* Refactor: Reorder credential manager initialization in filer server

* refactor
2026-02-03 17:43:58 -08:00
Chris Lu
b244bb58aa s3tables: redesign Iceberg REST Catalog using iceberg-go and automate integration tests (#8197)
* full integration with iceberg-go

* Table Commit Operations (handleUpdateTable)

* s3tables: fix Iceberg v2 compliance and namespace properties

This commit ensures SeaweedFS Iceberg REST Catalog is compliant with
Iceberg Format Version 2 by:
- Using iceberg-go's table.NewMetadataWithUUID for strict v2 compliance.
- Explicitly initializing namespace properties to empty maps.
- Removing omitempty from required Iceberg response fields.
- Fixing CommitTableRequest unmarshaling using table.Requirements and table.Updates.

* s3tables: automate Iceberg integration tests

- Added Makefile for local test execution and cluster management.
- Added docker-compose for PyIceberg compatibility kit.
- Added Go integration test harness for PyIceberg.
- Updated GitHub CI to run Iceberg catalog tests automatically.

* s3tables: update PyIceberg test suite for compatibility

- Updated test_rest_catalog.py to use latest PyIceberg transaction APIs.
- Updated Dockerfile to include pyarrow and pandas dependencies.
- Improved namespace and table handling in integration tests.

* s3tables: address review feedback on Iceberg Catalog

- Implemented robust metadata version parsing and incrementing.
- Ensured table metadata changes are persisted during commit (handleUpdateTable).
- Standardized namespace property initialization for consistency.
- Fixed unused variable and incorrect struct field build errors.

* s3tables: finalize Iceberg REST Catalog and optimize tests

- Implemented robust metadata versioning and persistence.
- Standardized namespace property initialization.
- Optimized integration tests using pre-built Docker image.
- Added strict property persistence validation to test suite.
- Fixed build errors from previous partial updates.

* Address PR review: fix Table UUID stability, implement S3Tables UpdateTable, and support full metadata persistence individually

* fix: Iceberg catalog stable UUIDs, metadata persistence, and file writing

- Ensure table UUIDs are stable (do not regenerate on load).
- Persist full table metadata (Iceberg JSON) in s3tables extended attributes.
- Add `MetadataVersion` to explicitly track version numbers, replacing regex parsing.
- Implement `saveMetadataFile` to persist metadata JSON files to the Filer on commit.
- Update `CreateTable` and `UpdateTable` handlers to use the new logic.

* test: bind weed mini to 0.0.0.0 in integration tests to fix Docker connectivity

* Iceberg: fix metadata handling in REST catalog

- Add nil guard in createTable
- Fix updateTable to correctly load existing metadata from storage
- Ensure full metadata persistence on updates
- Populate loadTable result with parsed metadata

* S3Tables: add auth checks and fix response fields in UpdateTable

- Add CheckPermissionWithContext to UpdateTable handler
- Include TableARN and MetadataLocation in UpdateTable response
- Use ErrCodeConflict (409) for version token mismatches

* Tests: improve Iceberg catalog test infrastructure and cleanup

- Makefile: use PID file for precise process killing
- test_rest_catalog.py: remove unused variables and fix f-strings

* Iceberg: fix variable shadowing in UpdateTable

- Rename inner loop variable `req` to `requirement` to avoid shadowing outer request variable

* S3Tables: simplify MetadataVersion initialization

- Use `max(req.MetadataVersion, 1)` instead of anonymous function

* Tests: remove unicode characters from S3 tables integration test logs

- Remove unicode checkmarks from test output for cleaner logs

* Iceberg: improve metadata persistence robustness

- Fix MetadataLocation in LoadTableResult to fallback to generated location
- Improve saveMetadataFile to ensure directory hierarchy existence and robust error handling
2026-02-03 15:30:04 -08:00
Yalın Doğu Şahin
47fc9e771f helm: add Iceberg REST catalog support to S3 service (#8193)
* helm: add Iceberg REST catalog support to S3 service

* helm: add Iceberg REST catalog support to S3 service

---------

Co-authored-by: yalin.sahin <yalin.sahin@tradition.ch>
2026-02-03 13:44:52 -08:00
dependabot[bot]
f01876e051 build(deps): bump bytes from 1.10.1 to 1.11.1 in /seaweedfs-rdma-sidecar/rdma-engine (#8195)
build(deps): bump bytes in /seaweedfs-rdma-sidecar/rdma-engine

Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.10.1 to 1.11.1.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](https://github.com/tokio-rs/bytes/compare/v1.10.1...v1.11.1)

---
updated-dependencies:
- dependency-name: bytes
  dependency-version: 1.11.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-03 13:40:42 -08:00
Chris Lu
1274cf038c s3: enforce authentication and JSON error format for Iceberg REST Catalog (#8192)
* s3: enforce authentication and JSON error format for Iceberg REST Catalog

* s3/iceberg: align error exception types with OpenAPI spec examples

* s3api: refactor AuthenticateRequest to return identity object

* s3/iceberg: propagate full identity object to request context

* s3/iceberg: differentiate NotAuthorizedException and ForbiddenException

* s3/iceberg: reject requests if authenticator is nil to prevent auth bypass

* s3/iceberg: refactor Auth middleware to build context incrementally and use switch for error mapping

* s3api: update misleading comment for authRequestWithAuthType

* s3api: return ErrAccessDenied if IAM is not configured to prevent auth bypass

* s3/iceberg: optimize context update in Auth middleware

* s3api: export CanDo for external authorization use

* s3/iceberg: enforce identity-based authorization in all API handlers

* s3api: fix compilation errors by updating internal CanDo references

* s3/iceberg: robust identity validation and consistent action usage in handlers

* s3api: complete CanDo rename across tests and policy engine integration

* s3api: fix integration tests by allowing admin access when auth is disabled and explicit gRPC ports

* duckdb

* create test bucket
2026-02-03 11:55:12 -08:00
Chris Lu
746df25164 fix formatting 2026-02-03 00:27:20 -08:00
Chris Lu
7169fe585d Delete verify_gc_empty_test.go 2026-02-03 00:20:38 -08:00
Chris Lu
1c1f80358f templ 2026-02-03 00:10:43 -08:00
Chris Lu
2bb21ea276 feat: Add Iceberg REST Catalog server and admin UI (#8175)
* feat: Add Iceberg REST Catalog server

Implement Iceberg REST Catalog API on a separate port (default 8181)
that exposes S3 Tables metadata through the Apache Iceberg REST protocol.

- Add new weed/s3api/iceberg package with REST handlers
- Implement /v1/config endpoint returning catalog configuration
- Implement namespace endpoints (list/create/get/head/delete)
- Implement table endpoints (list/create/load/head/delete/update)
- Add -port.iceberg flag to S3 standalone server (s3.go)
- Add -s3.port.iceberg flag to combined server mode (server.go)
- Add -s3.port.iceberg flag to mini cluster mode (mini.go)
- Support prefix-based routing for multiple catalogs

The Iceberg REST server reuses S3 Tables metadata storage under
/table-buckets and enables DuckDB, Spark, and other Iceberg clients
to connect to SeaweedFS as a catalog.

* feat: Add Iceberg Catalog pages to admin UI

Add admin UI pages to browse Iceberg catalogs, namespaces, and tables.

- Add Iceberg Catalog menu item under Object Store navigation
- Create iceberg_catalog.templ showing catalog overview with REST info
- Create iceberg_namespaces.templ listing namespaces in a catalog
- Create iceberg_tables.templ listing tables in a namespace
- Add handlers and routes in admin_handlers.go
- Add Iceberg data provider methods in s3tables_management.go
- Add Iceberg data types in types.go

The Iceberg Catalog pages provide visibility into the same S3 Tables
data through an Iceberg-centric lens, including REST endpoint examples
for DuckDB and PyIceberg.

* test: Add Iceberg catalog integration tests and reorg s3tables tests

- Reorganize existing s3tables tests to test/s3tables/table-buckets/
- Add new test/s3tables/catalog/ for Iceberg REST catalog tests
- Add TestIcebergConfig to verify /v1/config endpoint
- Add TestIcebergNamespaces to verify namespace listing
- Add TestDuckDBIntegration for DuckDB connectivity (requires Docker)
- Update CI workflow to use new test paths

* fix: Generate proper random UUIDs for Iceberg tables

Address code review feedback:
- Replace placeholder UUID with crypto/rand-based UUID v4 generation
- Add detailed TODO comments for handleUpdateTable stub explaining
  the required atomic metadata swap implementation

* fix: Serve Iceberg on localhost listener when binding to different interface

Address code review feedback: properly serve the localhost listener
when the Iceberg server is bound to a non-localhost interface.

* ci: Add Iceberg catalog integration tests to CI

Add new job to run Iceberg catalog tests in CI, along with:
- Iceberg package build verification
- Iceberg unit tests
- Iceberg go vet checks
- Iceberg format checks

* fix: Address code review feedback for Iceberg implementation

- fix: Replace hardcoded account ID with s3_constants.AccountAdminId in buildTableBucketARN()
- fix: Improve UUID generation error handling with deterministic fallback (timestamp + PID + counter)
- fix: Update handleUpdateTable to return HTTP 501 Not Implemented instead of fake success
- fix: Better error handling in handleNamespaceExists to distinguish 404 from 500 errors
- fix: Use relative URL in template instead of hardcoded localhost:8181
- fix: Add HTTP timeout to test's waitForService function to avoid hangs
- fix: Use dynamic ephemeral ports in integration tests to avoid flaky parallel failures
- fix: Add Iceberg port to final port configuration logging in mini.go

* fix: Address critical issues in Iceberg implementation

- fix: Cache table UUIDs to ensure persistence across LoadTable calls
  The UUID now remains stable for the lifetime of the server session.
  TODO: For production, UUIDs should be persisted in S3 Tables metadata.

- fix: Remove redundant URL-encoded namespace parsing
  mux router already decodes %1F to \x1F before passing to handlers.
  Redundant ReplaceAll call could cause bugs with literal %1F in namespace.

* fix: Improve test robustness and reduce code duplication

- fix: Make DuckDB test more robust by failing on unexpected errors
  Instead of silently logging errors, now explicitly check for expected
  conditions (extension not available) and skip the test appropriately.

- fix: Extract username helper method to reduce duplication
  Created getUsername() helper in AdminHandlers to avoid duplicating
  the username retrieval logic across Iceberg page handlers.

* fix: Add mutex protection to table UUID cache

Protects concurrent access to the tableUUIDs map with sync.RWMutex.
Uses read-lock for fast path when UUID already cached, and write-lock
for generating new UUIDs. Includes double-check pattern to handle race
condition between read-unlock and write-lock.

* style: fix go fmt errors

* feat(iceberg): persist table UUID in S3 Tables metadata

* feat(admin): configure Iceberg port in Admin UI and commands

* refactor: address review comments (flags, tests, handlers)

- command/mini: fix tracking of explicit s3.port.iceberg flag
- command/admin: add explicit -iceberg.port flag
- admin/handlers: reuse getUsername helper
- tests: use 127.0.0.1 for ephemeral ports and os.Stat for file size check

* test: check error from FileStat in verify_gc_empty_test
2026-02-02 23:12:13 -08:00
Chris Lu
330bd92ddc 4.08 2026-02-02 20:44:13 -08:00
Chris Lu
ba8816e2e1 4.08 2026-02-02 20:36:03 -08:00
Chris Lu
3b9e367c1a templ 2026-02-02 20:34:37 -08:00
Lisandro Pin
ff5a8f0579 Implement RPC skeleton for regular/EC volumes scrubbing. (#8187)
* Implement RPC skeleton for regular/EC volumes scrubbing.

See https://github.com/seaweedfs/seaweedfs/issues/8018 for details.

* Minor proto improvements for `ScrubVolume()`, `ScrubEcVolume()`:

  - Add fields for scrubbing details in `ScrubVolumeResponse` and `ScrubEcVolumeResponse`,
    instead of reporting these through RPC errors.
  - Return a list of broken shards when scrubbing EC volumes, via `EcShardInfo'.
2026-02-02 17:55:04 -08:00
Lisandro Pin
345ac950b6 Add volume server RPCs to read and update state flags. (#8186)
* Boostrap persistent state for volume servers.

This PR implements logic load/save persistent state information for storages
associated with volume servers, and reporting state changes back to masters
via heartbeat messages.

More work ensues!

See https://github.com/seaweedfs/seaweedfs/issues/7977 for details.

* Add volume server RPCs to read and update state flags.
2026-02-02 16:22:17 -08:00
Lisandro Pin
9638d37fe2 Block RPC write operations on volume servers when maintenance mode is enabled (#8115)
* Boostrap persistent state for volume servers.

This PR implements logic load/save persistent state information for storages
associated with volume servers, and reporting state changes back to masters
via heartbeat messages.

More work ensues!

See https://github.com/seaweedfs/seaweedfs/issues/7977 for details.

* Block RPC operations writing to volume servers when maintenance mode is on.
2026-02-02 13:21:02 -08:00
dependabot[bot]
fca1216f6d build(deps): bump github.com/klauspost/compress from 1.18.2 to 1.18.3 (#8181)
Bumps [github.com/klauspost/compress](https://github.com/klauspost/compress) from 1.18.2 to 1.18.3.
- [Release notes](https://github.com/klauspost/compress/releases)
- [Commits](https://github.com/klauspost/compress/compare/v1.18.2...v1.18.3)

---
updated-dependencies:
- dependency-name: github.com/klauspost/compress
  dependency-version: 1.18.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-02 11:14:00 -08:00
dependabot[bot]
97e33b3dbd build(deps): bump cloud.google.com/go/storage from 1.59.1 to 1.59.2 (#8182)
Bumps [cloud.google.com/go/storage](https://github.com/googleapis/google-cloud-go) from 1.59.1 to 1.59.2.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](https://github.com/googleapis/google-cloud-go/compare/storage/v1.59.1...storage/v1.59.2)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/storage
  dependency-version: 1.59.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-02 11:13:50 -08:00
dependabot[bot]
c50aaa9fbd build(deps): bump github.com/lib/pq from 1.10.9 to 1.11.1 (#8183)
Bumps [github.com/lib/pq](https://github.com/lib/pq) from 1.10.9 to 1.11.1.
- [Release notes](https://github.com/lib/pq/releases)
- [Changelog](https://github.com/lib/pq/blob/master/CHANGELOG.md)
- [Commits](https://github.com/lib/pq/compare/v1.10.9...v1.11.1)

---
updated-dependencies:
- dependency-name: github.com/lib/pq
  dependency-version: 1.11.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-02 11:13:37 -08:00
dependabot[bot]
a992417058 build(deps): bump github.com/golang-jwt/jwt/v5 from 5.3.0 to 5.3.1 (#8184)
Bumps [github.com/golang-jwt/jwt/v5](https://github.com/golang-jwt/jwt) from 5.3.0 to 5.3.1.
- [Release notes](https://github.com/golang-jwt/jwt/releases)
- [Commits](https://github.com/golang-jwt/jwt/compare/v5.3.0...v5.3.1)

---
updated-dependencies:
- dependency-name: github.com/golang-jwt/jwt/v5
  dependency-version: 5.3.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-02 11:13:28 -08:00
dependabot[bot]
723fca35d4 build(deps): bump github.com/shirou/gopsutil/v4 from 4.25.12 to 4.26.1 (#8185)
Bumps [github.com/shirou/gopsutil/v4](https://github.com/shirou/gopsutil) from 4.25.12 to 4.26.1.
- [Release notes](https://github.com/shirou/gopsutil/releases)
- [Commits](https://github.com/shirou/gopsutil/compare/v4.25.12...v4.26.1)

---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v4
  dependency-version: 4.26.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-02 11:13:16 -08:00
Chris Lu
f23e09f58b fix: skip exhausted blocks before creating an interval (#8180)
* fix: skip exhausted blocks before creating an interval

* refactor: optimize interval creation and fix logic duplication

* docs: add docstring for LocateData

* refactor: extract moveToNextBlock helper to deduplicate logic

* fix: use int64 for block index comparison to prevent overflow

* test: add unit test for LocateData boundary crossing (issue #8179)

* fix: skip exhausted blocks to prevent negative interval size and panics (issue #8179)

* refactor: apply review suggestions for test maintainability and code style
2026-02-02 11:12:31 -08:00
Chris Lu
621834d96a s3tables: add Iceberg file layout validation for table buckets (#8176)
* s3tables: add Iceberg file layout validation for table buckets

This PR adds file layout validation for table buckets to enforce Apache
Iceberg table structure. Files uploaded to table buckets must conform
to the expected Iceberg layout:

- metadata/ directory: contains metadata files (*.json, *.avro)
  - v*.metadata.json (table metadata)
  - snap-*.avro (snapshot manifests)
  - *-m*.avro (manifest files)
  - version-hint.text

- data/ directory: contains data files (*.parquet, *.orc, *.avro)
  - Supports partition paths (e.g., year=2024/month=01/)
  - Supports bucket subdirectories

The validator exports functions for use by the S3 API:
- IsTableBucketPath: checks if a path is under /table-buckets/
- GetTableInfoFromPath: extracts bucket/namespace/table from path
- ValidateTableBucketUpload: validates file layout for table bucket uploads
- ValidateTableBucketUploadWithClient: validates with filer client access

Invalid uploads receive InvalidIcebergLayout error response.

* Address review comments: regex performance, error handling, stricter patterns

* Fix validateMetadataFile and validateDataFile to handle subdirectories and directory creation

* Fix error handling, metadata validation, reduce code duplication

* Fix empty remainingPath handling for directory paths

* Refactor: unify validateMetadataFile and validateDataFile

* Refactor: extract UUID pattern constant

* fix: allow Iceberg partition and directory paths without trailing slashes

Modified validateFile to correctly handle directory paths that do not end with a trailing slash.
This ensures that paths like 'data/year=2024' are validated as directories if they match
partition or subdirectory patterns, rather than being incorrectly rejected as invalid files.
Added comprehensive test cases for various directory and partition path combinations.

* refactor: use standard path package and idiomatic returns

Simplified directory and filename extraction in validateFile by using the standard
path package (aliased as pathpkg). This improves readability and avoids manual string
manipulation. Also updated GetTableInfoFromPath to use naked returns for named
return values, aligning with Go conventions for short functions.

* feat: enforce strict Iceberg top-level directories and metadata restrictions

Implemented strict validation for Iceberg layout:
- Bare top-level keys like 'metadata' and 'data' are now rejected; they must have
  a trailing slash or a subpath.
- Subdirectories under 'metadata/' are now prohibited to enforce the flat structure
  required by Iceberg.
- Updated the test suite with negative test cases and ensured proper formatting.

* feat: allow table root directory markers in ValidateTableBucketUpload

Modified ValidateTableBucketUpload to short-circuit and return nil when the
relative path within a table is empty. This occurs when a trailing slash is
used on the table directory (e.g., /table-buckets/mybucket/myns/mytable/).
Added a test case 'table dir with slash' to verify this behavior.

* test: add regression cases for metadata subdirs and table markers

Enforced a strictly flat structure for the metadata directory by removing the
"directory without trailing slash" fallback in validateFile for metadata.
Added regression test cases:
- metadata/nested (must fail)
- /table-buckets/.../mytable/ (must pass)
Verified all tests pass.

* feat: reject double slashes in Iceberg table paths

Modified validateDirectoryPath to return an error when encountering empty path
segments, effectively rejecting double slashes like 'data//file.parquet'.
Updated validateFile to use manual path splitting instead of the 'path' package
for intermediate directories to ensure redundant slashes are not auto-cleaned
before validation. Added regression tests for various double slash scenarios.

* refactor: separate isMetadata logic in validateDirectoryPath

Following reviewer feedback, refactored validateDirectoryPath to explicitly
separate the handling of metadata and data paths. This improves readability
and clarifies the function's intent while maintaining the strict validation rules
and double-slash rejection previously implemented.

* feat: validate bucket, namespace, and table path segments

Updated ValidateTableBucketUpload to ensure that bucket, namespace, and table
segments in the path are non-empty. This prevents invalid paths like
'/table-buckets//myns/mytable/...' from being accepted during upload.
Added regression tests for various empty segment scenarios.

* Update weed/s3api/s3tables/iceberg_layout.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* feat: block double-slash bypass in table relative paths

Added a guard in ValidateTableBucketUpload to reject tableRelativePath if it
starts with a '/' or contains '//'. This ensures that paths like
'/table-buckets/b/ns/t//data/file.parquet' are properly rejected and cannot
bypass the layout validation. Added regression tests to verify.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-02 10:05:04 -08:00
Chris Lu
2ee6e4f391 mount: refresh and evict hot dir cache (#8174)
* mount: refresh and evict hot dir cache

* mount: guard dir update window and extend TTL

* mount: reuse timestamp for cache mark

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* mount: make dir cache tuning configurable

* mount: dedupe dir update notices

* mount: restore invalidate-all cache helper

* mount: keep hot dir tuning constants

* mount: centralize cache state reset

* mount: mark refresh completion time

* mount: allow disabling idle eviction

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-31 13:46:37 -08:00
Chris Lu
fe6f8d737d mount: invalidate meta cache on follow retry (#8173)
* mount: invalidate meta cache on follow retry

* mount: clear cache under mount root on retry
2026-01-31 11:18:26 -08:00
Chris Lu
79722bcf30 Add s3tables shell and admin UI (#8172)
* Add shared s3tables manager

* Add s3tables shell commands

* Add s3tables admin API

* Add s3tables admin UI

* Fix admin s3tables namespace create

* Rename table buckets menu

* Centralize s3tables tag validation

* Reuse s3tables manager in admin

* Extract s3tables list limit

* Add s3tables bucket ARN helper

* Remove write middleware from s3tables APIs

* Fix bucket link and policy hint

* Fix table tag parsing and nav link

* Disable namespace table link on invalid ARN

* Improve s3tables error decode

* Return flag parse errors for s3tables tag

* Accept query params for namespace create

* Bind namespace create form data

* Read s3tables JS data from DOM

* s3tables: allow empty region ARN

* shell: pass s3tables account id

* shell: require account for table buckets

* shell: use bucket name for namespaces

* shell: use bucket name for tables

* shell: use bucket name for tags

* admin: add table buckets links in file browser

* s3api: reuse s3tables tag validation

* admin: harden s3tables UI handlers

* fix admin list table buckets

* allow admin s3tables access

* validate s3tables bucket tags

* log s3tables bucket metadata errors

* rollback table bucket on owner failure

* show s3tables bucket owner

* add s3tables iam conditions

* Add s3tables user permissions UI

* Authorize s3tables using identity actions

* Add s3tables permissions to user modal

* Disambiguate bucket scope in user permissions

* Block table bucket names that match S3 buckets

* Pretty-print IAM identity JSON

* Include tags in s3tables permission context

* admin: refactor S3 Tables inline JavaScript into a separate file

* s3tables: extend IAM policy condition operators support

* shell: use LookupEntry wrapper for s3tables bucket conflict check

* admin: handle buildBucketPermissions validation in create/update flows
2026-01-30 22:57:05 -08:00
Chris Lu
b2b0a38e71 s3api: allow empty region and account id in s3tables ARN (#8171)
* s3api: allow empty region and account id in s3tables ARN

* s3api: refactor S3 Tables ARN regex into a constant
2026-01-30 13:15:39 -08:00
Chris Lu
6a9e7360df s3api: fix S3 Tables auth to allow auto-hashing of body (#8170)
* s3api: allow auto-hashing of request body for s3tables

* s3api: add unit test for s3tables body hashing
2026-01-30 12:02:18 -08:00
Chris Lu
f1e27b8f30 s3: change s3 tables to use RESTful API (#8169)
* s3: refactor s3 tables to use RESTful API

* test/s3tables: guard empty namespaces

* s3api: document tag parsing and validate get-table

* s3api: limit S3Tables REST body size

* Update weed/s3api/s3api_tables.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/s3api/s3tables/handler.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* s3api: accept encoded table bucket ARNs

* s3api: validate namespaces and close body

* s3api: match encoded table bucket ARNs

* s3api: scope table bucket ARN routes

* s3api: dedupe table bucket request builders

* test/s3tables: allow list tables without namespace

* s3api: validate table params and tag ARN

* s3api: tighten tag handling and get-table params

* s3api: loosen tag ARN route matching

* Fix S3 Tables REST routing and tests

* Adjust S3 Tables request parsing

* Gate S3 Tables target routing

* Avoid double decoding namespaces

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-30 10:37:34 -08:00
Lisandro Pin
9e15823855 Have masters update DataNode details based on state heartbeats from volume servers. (#8017) 2026-01-29 21:51:46 -08:00
Chris Lu
6940b7d06e fix Filer startup failure due to JWT on / path #8149 (#8167)
* fix Filer startup failure due to JWT on / path #8149

- Comment out JWT keys in security.toml.example
- Revert Dockerfile.local change that enabled security by default
- Exempt GET/HEAD on / from JWT check for health checks

* refactor: simplify JWT bypass condition as per PR feedback
2026-01-29 21:45:15 -08:00
Chris Lu
23c25379ca iam: add ECDSA support for OIDC token validation (#8166)
* iam: add ECDSA support for OIDC token validation

Fixes seaweedfs/seaweedfs#8148

* iam: refactor OIDC ECDSA tests and add failure cases

- Refactored TestOIDCProviderJWTValidationECDSA to use t.Run
- Added sub-tests for expired token, wrong key, invalid issuer, and invalid audience

* Update weed/iam/oidc/oidc_provider_test.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* iam: improve error type assertions for OIDC invalid signature tests

- Updated both RSA and ECDSA tests to specifically check for ErrProviderInvalidToken

* iam: pad EC coordinates in OIDC tests to comply with RFC 7518

- Coordinates are now zero-padded to the full field size (e.g., 32 bytes for P-256)
- Ensures interoperability with strict OIDC providers

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-29 20:03:43 -08:00
Chris Lu
c7c2d8d606 Merge branch 'master' of https://github.com/seaweedfs/seaweedfs 2026-01-29 20:03:26 -08:00
Chris Lu
88c27615c4 /table-buckets 2026-01-29 20:03:17 -08:00
Chris Lu
49c66bbb2e shell: allow spaces in arguments via quoting (#8157) (#8165)
* shell: allow spaces in arguments via quoting (#8157)

- updated argument splitting regex to handle quoted segments
- added robust quote stripping to remove matching quotes from flags
- added unit tests for regex splitting and flag parsing

* shell: use robust state machine parser for command line arguments

- replaced regex-based splitter with splitCommandLine state machine
- added escape character support in splitCommandLine and stripQuotes
- updated unit tests to include escaped quotes and single-quote literals
- addressed feedback regarding escaped quotes handling (#8157)

* shell: detect unbalanced quotes in stripQuotes

- modified stripQuotes to return the original string if quotes are unbalanced
- added test cases for unbalanced quotes in shell_liner_test.go

* shell: refactor shared parsing logic into parseShellInput helper

- unified splitting and unquoting logic into a single state machine
- splitCommandLine now returns unquoted tokens directly
- simplified processEachCmd by removing redundant unquoting loop
- improved maintainability by eliminating code duplication

* shell: detect trailing backslash in stripQuotes

- updated parseShellInput to include escaped state in unbalanced flag
- stripQuotes now returns original string if it ends with an unescaped backslash
- added test case for trailing backslash in shell_liner_test.go
2026-01-29 19:06:17 -08:00
Chris Lu
94e0b902f9 shell: update fs.verify and volume.fsck for new BFS signature
Updated dependent commands to match the refactored
doTraverseBfsAndSaving signature and use context for channel sends.
2026-01-29 14:42:10 -08:00
Chris Lu
3b05efbdbc shell: fix potential deadlock in fs.meta.save BFS traversal
Refactored doTraverseBfsAndSaving to use context cancellation.
If the saving process fails, the traversal is stopped immediately
to prevent workers from blocking on the output channel.
2026-01-29 14:42:10 -08:00
Chris Lu
b91427c30f filer: preserve existing TTL during CreateEntry/UpdateEntry gRPC calls
This ensures that metadata load correctly restores entries with their
original TTL instead of overwriting with the default from filer.conf.
Fixes #8159.
2026-01-29 14:42:09 -08:00
Chris Lu
550a4ff761 Fix inconsistent TTL reporting in volume.list #8158 (#8164)
* fix inconsistent TTL reporting in volume.list #8158

* simplify volume.list output using vi.String()
2026-01-29 14:16:42 -08:00
Chris Lu
8b61fd77b5 s3api: ensure MD5 is calculated or reused during CopyObject (#8163)
* s3api: ensure MD5 is calculated or reused during CopyObject

Fixes #8155
- Capture and reuse source MD5 for direct copies
- Calculate MD5 for small inline objects during copy

* s3api: refactor encryption logic and safe MD5 copying

- Extract duplicated bucket default encryption logic into helper
- Use safe append copy for MD5 slice to avoid shared modifications

* refactor

* avoids unnecessary MD5 recalculations for small files
2026-01-29 12:53:38 -08:00
Peter Dodd
4d513a2b3d feat(gcs): add application default credentials fallback support (#8161)
* feat(gcs): add application default credentials fallback support

* refactor

* Update weed/remote_storage/gcs/gcs_storage_client.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-29 09:57:49 -08:00
Chris Lu
c52d3d1229 fix: correct chunk size in encrypted uploads (fixes #8151) (#8154)
* fix: correct chunk size in encrypted uploads

When uploading with encryption enabled (-encryptVolumeData flag),
the chunk metadata was incorrectly storing the encrypted data size
instead of the original uncompressed size.

This caused data corruption when reading files back because:
1. Encrypted data is larger than original data
2. After decryption and decompression, the actual data is smaller
3. Size validation in readEncryptedUrl would fail
4. Files appeared truncated with null bytes

The fix ensures uploadResult.Size uses clearDataLen (the original
uncompressed size) in both cipher and non-cipher code paths,
matching the expected behavior.

Fixes: #8151

* test: add comprehensive tests for encrypted upload size fix

Add unit tests to verify that encrypted uploads correctly store the
original uncompressed data size, not the encrypted size.

Tests cover:
- Cipher key generation and encryption/decryption roundtrip
- Compression + encryption roundtrip with size preservation
- Different data sizes
- Compression detection and effectiveness
- Size behavior demonstration showing the bug and fix

All tests verify that metadata stores the original size (19 bytes in
the key example) not the encrypted size (75 bytes), which is critical
for preventing data corruption on read operations.

Issue: #8151

* remove emojis

* Update upload_content_test.go

* Delete upload_content_test.go
2026-01-28 20:46:03 -08:00
Chris Lu
9e575822a3 Merge master into s3tables-by-claude (resolve conflicts, keep s3tables changes) 2026-01-28 19:47:42 -08:00
Chris Lu
d399113e0c test: fix duplicate subtest names in permissions_test.go
Rename duplicate 'combined * and ?' test cases to include singular/plural suffix
for clarity and to support targeted test runs.
2026-01-28 19:42:19 -08:00
Chris Lu
c106532b79 fix: prevent MiniClusterCtx race conditions in command shutdown
Capture global MiniClusterCtx into local variables before goroutine/select
evaluation to prevent nil dereference/data race when context is reset to nil
after nil check. Applied to filer, master, volume, and s3 commands.
2026-01-28 19:42:16 -08:00
Chris Lu
a4217dff5f s3tables: enhance DeleteTable authorization with policy checking
Fetch and evaluate table policies in DeleteTable handler to support policy-based
delegation. Aligns authorization behavior with GetTable and ListTables handlers
instead of only checking ownership.
2026-01-28 19:42:12 -08:00