* Add multi-partition-spec compaction and delete-aware compaction (Phase 3)
Multi-partition-spec compaction:
- Add SpecID to compactionBin struct and group by spec+partition key
- Remove the len(specIDs) > 1 skip that blocked spec-evolved tables
- Write per-spec manifests in compaction commit using specByID map
- Use per-bin PartitionSpec when calling NewDataFileBuilder
Delete-aware compaction:
- Add ApplyDeletes config (default: true) with readBoolConfig helper
- Implement position delete collection (file_path + pos Parquet columns)
- Implement equality delete collection (field ID to column mapping)
- Update mergeParquetFiles to filter rows via position deletes (binary
search) and equality deletes (hash set lookup)
- Smart delete manifest carry-forward: drop when all data files compacted
- Fix EXISTING/DELETED entries to include sequence numbers
Tests for multi-spec bins, delete collection, merge filtering, and
end-to-end compaction with position/equality/mixed deletes.
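A minimal sketch of the row filtering described above, assuming position deletes are kept as a sorted slice of row positions per data file and equality deletes as a set of pre-built keys. The names `isPositionDeleted` and `filterRows` are hypothetical and are not the actual mergeParquetFiles internals.

```go
package main

import (
	"fmt"
	"sort"
)

// isPositionDeleted reports whether pos is present in sortedPos,
// using binary search over the sorted delete positions.
func isPositionDeleted(sortedPos []int64, pos int64) bool {
	i := sort.Search(len(sortedPos), func(i int) bool { return sortedPos[i] >= pos })
	return i < len(sortedPos) && sortedPos[i] == pos
}

// filterRows keeps rows that are neither position-deleted nor
// equality-deleted. Each row is represented by its equality key here.
func filterRows(rowKeys []string, sortedPos []int64, eqDeleted map[string]struct{}) []string {
	kept := make([]string, 0, len(rowKeys))
	for pos, key := range rowKeys {
		if isPositionDeleted(sortedPos, int64(pos)) {
			continue
		}
		if _, deleted := eqDeleted[key]; deleted {
			continue
		}
		kept = append(kept, key)
	}
	return kept
}

func main() {
	rows := []string{"a", "b", "c", "d", "e"}
	kept := filterRows(rows, []int64{1, 3}, map[string]struct{}{"e": {}})
	fmt.Println(kept)
}
```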
* Add structured metrics and per-bin progress to iceberg maintenance
- Change return type of all four operations from (string, error) to
(string, map[string]int64, error) with structured metric counts
(files_merged, snapshots_expired, orphans_removed, duration_ms, etc.)
- Add onProgress callback to compactDataFiles for per-bin progress
- In Execute, pass progress callback that sends JobProgressUpdate with
per-bin stage messages
- Accumulate per-operation metrics with dot-prefixed keys
(e.g. compact.files_merged) into OutputValues on completion
- Update testing_api.go wrappers and integration test call sites
- Add tests: TestCompactDataFilesMetrics, TestExpireSnapshotsMetrics,
TestExecuteCompletionOutputValues
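A rough sketch of how per-operation metrics could be folded into one flat map with dot-prefixed keys. `mergeMetrics` is a hypothetical helper, and summing repeated keys is an assumption; the real code populates OutputValues, whose type is not shown here.

```go
package main

import "fmt"

// mergeMetrics copies one operation's metrics into out, prefixing each key
// with the operation name, e.g. "compact" + "files_merged" -> "compact.files_merged".
func mergeMetrics(out map[string]int64, op string, metrics map[string]int64) {
	for k, v := range metrics {
		out[op+"."+k] += v
	}
}

func main() {
	out := map[string]int64{}
	mergeMetrics(out, "compact", map[string]int64{"files_merged": 4, "duration_ms": 120})
	mergeMetrics(out, "expire_snapshots", map[string]int64{"snapshots_expired": 2})
	fmt.Println(out)
}
```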
* Address review feedback: group equality deletes by field IDs, use metric constants
- Group equality deletes by distinct equality_ids sets so different
delete files with different equality columns are handled correctly
- Use length-prefixed type-aware encoding in buildEqualityKey to avoid
ambiguity between types and collisions from null bytes
- Extract metric key strings into package-level constants
* Fix buildEqualityKey to use length-prefixed type-aware encoding
The previous implementation used plain String() concatenation with null
byte separators, which caused type ambiguity (int 123 vs string "123")
and separator collisions when values contain null bytes. Now each value
is serialized as "kind:length:value" for unambiguous composite keys.
This fix was missed in the prior cherry-pick due to a merge conflict.
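For illustration, a minimal sketch of a "kind:length:value" composite key. `equalityKey` and `kindOf` are hypothetical names, and the kind tags are assumptions; the actual buildEqualityKey may tag and format values differently.

```go
package main

import (
	"fmt"
	"strings"
)

// kindOf returns a short type tag for a value (illustrative tag names).
func kindOf(v any) string {
	switch v.(type) {
	case nil:
		return "null"
	case int, int32, int64:
		return "int"
	case string:
		return "string"
	default:
		return fmt.Sprintf("%T", v)
	}
}

// equalityKey serializes each value as "kind:length:value" and concatenates
// the parts. The length prefix makes the key unambiguous even when a value
// contains separator characters or null bytes.
func equalityKey(values ...any) string {
	var b strings.Builder
	for _, v := range values {
		s := ""
		if v != nil {
			s = fmt.Sprint(v)
		}
		fmt.Fprintf(&b, "%s:%d:%s", kindOf(v), len(s), s)
	}
	return b.String()
}

func main() {
	fmt.Println(equalityKey(123))   // int 123
	fmt.Println(equalityKey("123")) // string "123" yields a different key
}
```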
* Address nitpick review comments
- Document patchManifestContentToDeletes workaround: explain that
iceberg-go WriteManifest cannot create delete manifests, and note
the fail-fast validation on pattern match
- Document makeTestEntries: note that specID field is ignored and
callers should use makeTestEntriesWithSpec for multi-spec testing
* fmt
* Fix path normalization, manifest threshold, and artifact filename collisions
- Normalize file paths in position delete collection and lookup so that
absolute S3 URLs and relative paths match correctly
- Fix rewriteManifests threshold check to count only data manifests
(was including delete manifests in the count and metric)
- Add random suffix to artifact filenames in compactDataFiles and
rewriteManifests to prevent collisions between concurrent runs
- Sort compaction bins by SpecID then PartitionKey for deterministic
ordering across specs
* Fix pos delete read, deduplicate column resolution, minor cleanups
- Remove broken Column() guard in position delete reading that silently
defaulted pos to 0; unconditionally extract Int64() instead
- Deduplicate column resolution in readEqualityDeleteFile by calling
resolveEqualityColIndices instead of inlining the same logic
- Add warning log in readBoolConfig for unrecognized string values
- Fix CompactDataFiles call site in integration test to capture 3 return
values
* Advance progress on all bins, deterministic manifest order, assert metrics
- Call onProgress for every bin iteration including skipped/failed bins
so progress reporting never appears stalled
- Sort spec IDs before iterating specEntriesMap to produce deterministic
manifest list ordering across runs
- Assert expected metric keys in CompactDataFiles integration test
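A small sketch of the deterministic-ordering idea: collect the map keys, sort them, then iterate in that order. `sortedSpecIDs` and the map's value type are hypothetical stand-ins for specEntriesMap.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedSpecIDs returns the map's keys in ascending order so that callers
// iterate in the same order on every run (Go map iteration is randomized).
func sortedSpecIDs(entries map[int32][]string) []int32 {
	ids := make([]int32, 0, len(entries))
	for id := range entries {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	entries := map[int32][]string{2: {"c"}, 0: {"a"}, 1: {"b"}}
	for _, id := range sortedSpecIDs(entries) {
		fmt.Println(id, entries[id])
	}
}
```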
---------
Co-authored-by: Copilot <copilot@github.com>
* Add iceberg_maintenance plugin worker handler (Phase 1)
Implement automated Iceberg table maintenance as a new plugin worker job
type. The handler scans S3 table buckets for tables needing maintenance
and executes operations in the correct Iceberg order: expire snapshots,
remove orphan files, and rewrite manifests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add data file compaction to iceberg maintenance handler (Phase 2)
Implement bin-packing compaction for small Parquet data files:
- Enumerate data files from manifests, group by partition
- Merge small files using parquet-go (read rows, write merged output)
- Create new manifest with ADDED/DELETED/EXISTING entries
- Commit new snapshot with compaction metadata
Add 'compact' operation to maintenance order (runs before expire_snapshots),
configurable via target_file_size_bytes and min_input_files thresholds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix memory exhaustion in mergeParquetFiles by processing files sequentially
Previously all source Parquet files were loaded into memory simultaneously,
risking OOM when a compaction bin contained many small files. Now each file
is loaded, its rows are streamed into the output writer, and its data is
released before the next file is loaded — keeping peak memory proportional
to one input file plus the output buffer.
* Validate bucket/namespace/table names against path traversal
Reject names containing '..', '/', or '\' in Execute to prevent
directory traversal via crafted job parameters.
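A minimal sketch of such a check, assuming only the three patterns named above are rejected. `validateName` is a hypothetical helper, not the function used in Execute.

```go
package main

import (
	"fmt"
	"strings"
)

// validateName rejects names containing "..", "/", or "\".
func validateName(kind, name string) error {
	if strings.Contains(name, "..") || strings.ContainsAny(name, "/\\") {
		return fmt.Errorf("invalid %s name %q: must not contain '..', '/', or '\\'", kind, name)
	}
	return nil
}

func main() {
	for _, n := range []string{"sales", "../etc", "a/b", `a\b`} {
		fmt.Printf("%q -> %v\n", n, validateName("table", n))
	}
}
```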
* Add filer address failover in iceberg maintenance handler
Try each filer address from cluster context in order instead of only
using the first one. This improves resilience when the primary filer
is temporarily unreachable.
* Add separate MinManifestsToRewrite config for manifest rewrite threshold
The rewrite_manifests operation was reusing MinInputFiles (meant for
compaction bin file counts) as its manifest count threshold. Add a
dedicated MinManifestsToRewrite field with its own config UI section
and default value (5) so the two thresholds can be tuned independently.
* Fix risky mtime fallback in orphan removal that could delete new files
When entry.Attributes is nil, mtime defaulted to Unix epoch (1970),
which would always be older than the safety threshold, causing the
file to be treated as eligible for deletion. Skip entries with nil
Attributes instead, matching the safer logic in operations.go.
* Fix undefined function references in iceberg_maintenance_handler.go
Use the exported function names (ShouldSkipDetectionByInterval,
BuildDetectorActivity, BuildExecutorActivity) matching their
definitions in vacuum_handler.go.
* Remove duplicated iceberg maintenance handler in favor of iceberg/ subpackage
The IcebergMaintenanceHandler and its compaction code in the parent
pluginworker package duplicated the logic already present in the
iceberg/ subpackage (which self-registers via init()). The old code
lacked stale-plan guards, proper path normalization, CAS-based xattr
updates, and error-returning parseOperations.
Since the registry pattern (default "all") makes the old handler
unreachable, remove it entirely. All functionality is provided by
iceberg.Handler with the reviewed improvements.
* Fix MinManifestsToRewrite clamping to match UI minimum of 2
The clamp reset values below 2 to the default of 5, contradicting the
UI's advertised MinValue of 2. Clamp to 2 instead.
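The corrected clamp amounts to something like the following. The constant and function names are hypothetical.

```go
package main

import "fmt"

// minManifestsFloor mirrors the UI's advertised minimum of 2.
const minManifestsFloor = 2

// clampMinManifests raises values below the floor to the floor itself
// instead of resetting them to the default.
func clampMinManifests(v int64) int64 {
	if v < minManifestsFloor {
		return minManifestsFloor
	}
	return v
}

func main() {
	fmt.Println(clampMinManifests(1), clampMinManifests(2), clampMinManifests(7))
}
```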
* Sort entries by size descending in splitOversizedBin for better packing
Entries were processed in insertion order, which is non-deterministic
because it comes from map iteration. Sorting largest-first before the
splitting loop improves bin-packing efficiency by filling bins more evenly.
* Add context cancellation check to drainReader loop
The row-streaming loop in drainReader did not check ctx between
iterations, making long compaction merges uncancellable. Check
ctx.Done() at the top of each iteration.
* Fix splitOversizedBin to always respect targetSize limit
The minFiles check in the split condition allowed bins to grow past
targetSize when they had fewer than minFiles entries, defeating the
OOM protection. Now bins always split at targetSize, and a trailing
runt with fewer than minFiles entries is merged into the previous bin.
* Add integration tests for iceberg table maintenance plugin worker
Tests start a real weed mini cluster, create S3 buckets and Iceberg
table metadata via filer gRPC, then exercise the iceberg.Handler
operations (ExpireSnapshots, RemoveOrphans, RewriteManifests) against
the live filer. A full maintenance cycle test runs all operations in
sequence and verifies metadata consistency.
Also adds exported method wrappers (testing_api.go) so the integration
test package can call the unexported handler methods.
* Fix splitOversizedBin dropping files and add source path to drainReader errors
The runt-merge step could leave leading bins with fewer than minFiles
entries (e.g. [80,80,10,10] with targetSize=100, minFiles=2 would drop
the first 80-byte file). Replace the filter-based approach with an
iterative merge that folds any sub-minFiles bin into its smallest
neighbor, preserving all eligible files.
Also add the source file path to drainReader error messages so callers
can identify which Parquet file caused a read/write failure.
* Harden integration test error handling
- s3put: fail immediately on HTTP 4xx/5xx instead of logging and
continuing
- lookupEntry: distinguish NotFound (return nil) from unexpected RPC
errors (fail the test)
- writeOrphan and orphan creation in FullMaintenanceCycle: check
CreateEntryResponse.Error in addition to the RPC error
* go fmt
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Enforce IAM for s3tables bucket creation
* Prefer IAM path when policies exist
* Ensure IAM enforcement honors default allow
* address comments
* Reused the precomputed principal when setting tableBucketMetadata.OwnerAccountID, avoiding the redundant getAccountID call.
* get identity
* fix
* dedup
* fix
* comments
* fix tests
* update iam config
* go fmt
* fix ports
* fix flags
* mini clean shutdown
* Revert "update iam config"
This reverts commit ca48fdbb0afa45657823d98657556c0bbf24f239.
Revert "mini clean shutdown"
This reverts commit 9e17f6baffd5dd7cc404d831d18dd618b9fe5049.
Revert "fix flags"
This reverts commit e9e7b29d2f77ee5cb82147d50621255410695ee3.
Revert "go fmt"
This reverts commit bd3241960b1d9484b7900190773b0ecb3f762c9a.
* test/s3tables: share single weed mini per test package via TestMain
Previously each top-level test function in the catalog and s3tables
package started and stopped its own weed mini instance. This caused
failures when a prior instance wasn't cleanly stopped before the next
one started (port conflicts, leaked global state).
Changes:
- catalog/iceberg_catalog_test.go: introduce TestMain that starts one
shared TestEnvironment (external weed binary) before all tests and
tears it down after. All individual test functions now use sharedEnv.
Added randomSuffix() for unique resource names across tests.
- catalog/pyiceberg_test.go: updated to use sharedEnv instead of
per-test environments.
- catalog/pyiceberg_test_helpers.go -> pyiceberg_test_helpers_test.go:
renamed to a _test.go file so it can access TestEnvironment which is
defined in a test file.
- table-buckets/setup.go: add package-level sharedCluster variable.
- table-buckets/s3tables_integration_test.go: introduce TestMain that
starts one shared TestCluster before all tests. TestS3TablesIntegration
now uses sharedCluster. Extract startMiniClusterInDir (no *testing.T)
for TestMain use. TestS3TablesCreateBucketIAMPolicy keeps its own
cluster (different IAM config). Remove miniClusterMutex (no longer
needed). Fix Stop() to not panic when t is nil.
* delete
* parse
* default allow should work with anonymous
* fix port
* iceberg route
The failures came from Iceberg REST falling back to the default `warehouse` bucket when no prefix is provided. The tests create random buckets, so /v1/namespaces looked in `warehouse` and failed. Update the tests to use the prefixed Iceberg routes (/v1/{bucket}/...) via a small helper.
* test(s3tables): fix port conflicts and IAM ARN matching in integration tests
- Pass -master.dir explicitly to prevent filer store directory collision
between shared cluster and per-test clusters running in the same process
- Pass -volume.port.public and -volume.publicUrl to prevent the global
publicPort flag (mutated from 0 → concrete port by first cluster) from
being reused by a second cluster, causing 'address already in use'
- Remove the flag-reset loop in Stop() that reset global flag values while
other goroutines were reading them (race → panic)
- Fix IAM policy Resource ARN in TestS3TablesCreateBucketIAMPolicy to use
wildcards (arn:aws:s3tables:*:*:bucket/<name>) because the handler
generates ARNs with its own DefaultRegion (us-east-1) and principal name
('admin'), not the test constants testRegion/testAccountID
* S3: Implement IAM defaults and STS signing key fallback logic
* S3: Refactor startup order to init SSE-S3 key manager before IAM
* S3: Derive STS signing key from KEK using HKDF for security isolation
* S3: Document STS signing key fallback in security.toml
* fix(s3api): refine anonymous access logic and secure-by-default behavior
- Initialize anonymous identity by default in `NewIdentityAccessManagement` to prevent nil pointer dereferences.
- Ensure `ReplaceS3ApiConfiguration` preserves the anonymous identity if not present in the new configuration.
- Update `NewIdentityAccessManagement` signature to accept `filerClient`.
- In legacy mode (no policy engine), anonymous defaults to Deny (no actions), preserving secure-by-default behavior.
- Use specific `LookupAnonymous` method instead of generic map lookup.
- Update tests to accommodate signature changes and verify improved anonymous handling.
* feat(s3api): make IAM configuration optional
- Start S3 API server without a configuration file if `EnableIam` option is set.
- Default to `Allow` effect for policy engine when no configuration is provided (Zero-Config mode).
- Handle empty configuration path gracefully in `loadIAMManagerFromConfig`.
- Add integration test `iam_optional_test.go` to verify empty config behavior.
* fix(iamapi): fix signature mismatch in NewIdentityAccessManagementWithStore
* fix(iamapi): properly initialize FilerClient instead of passing nil
* fix(iamapi): properly initialize filer client for IAM management
- Instead of passing `nil`, construct a `wdclient.FilerClient` using the provided `Filers` addresses.
- Ensure `NewIdentityAccessManagementWithStore` receives a valid `filerClient` to avoid potential nil pointer dereferences or limited functionality.
* clean: remove dead code in s3api_server.go
* refactor(s3api): improve IAM initialization, safety and anonymous access security
* fix(s3api): ensure IAM config loads from filer after client init
* fix(s3): resolve test failures in integration, CORS, and tagging tests
- Fix CORS tests by providing explicit anonymous permissions config
- Fix S3 integration tests by setting admin credentials in init
- Align tagging test credentials in CI with IAM defaults
- Add goroutine to retry IAM config load in iamapi server
* fix(s3): allow anonymous access to health targets and S3 Tables when identities are present
* fix(ci): use /healthz for Caddy health check in awscli tests
* iam, s3api: expose DefaultAllow from IAM and Policy Engine
This allows checking the global "Open by Default" configuration from
other components like S3 Tables.
* s3api/s3tables: support DefaultAllow in permission logic and handler
Updated CheckPermissionWithContext to respect the DefaultAllow flag
in PolicyContext. This enables "Open by Default" behavior for
unauthenticated access in zero-config environments. Added a targeted
unit test to verify the logic.
* s3api/s3tables: propagate DefaultAllow through handlers
Propagated the DefaultAllow flag to individual handlers for
namespaces, buckets, tables, policies, and tagging. This ensures
consistent "Open by Default" behavior across all S3 Tables API
endpoints.
* s3api: wire up DefaultAllow for S3 Tables API initialization
Updated registerS3TablesRoutes to query the global IAM configuration
and set the DefaultAllow flag on the S3 Tables API server. This
completes the end-to-end propagation required for anonymous access in
zero-config environments. Added a SetDefaultAllow method to
S3TablesApiServer to facilitate this.
* s3api: fix tests by adding DefaultAllow to mock IAM integrations
The IAMIntegration interface was updated to include DefaultAllow(),
breaking several mock implementations in tests. This commit fixes
the build errors by adding the missing method to the mocks.
* env
* ensure ports
* env
* env
* fix default allow
* add one more test using non-anonymous user
* debug
* add more debug
* less logs
* Fix STS InvalidAccessKeyId and request body consumption in Lakekeeper integration test
* Remove debug prints
* Add Lakekeeper integration tests to CI
* Fix connection refused in CI by binding to 0.0.0.0
* Add timeout to docker run in Lakekeeper integration test
* Update weed/s3api/auth_credentials.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fix STS AssumeRole with POST body param and add integration test
* Add STS integration test to CI workflow
* Address code review feedback: fix HPP vulnerability and style issues
* Refactor: address code review feedback
- Fix HTTP Parameter Pollution vulnerability in UnifiedPostHandler
- Refactor permission check logic for better readability
- Extract test helpers to testutil/docker.go to reduce duplication
- Clean up imports and simplify context setting
* Add SigV4-style test variant for AssumeRole POST body routing
- Added ActionInBodyWithSigV4Style test case to validate real-world scenario
- Test confirms routing works correctly for AWS SigV4-signed requests
- Addresses code review feedback about testing with SigV4 signatures
* Fix: always set identity in context when non-nil
- Ensure UnifiedPostHandler always calls SetIdentityInContext when identity is non-nil
- Only call SetIdentityNameInContext when identity.Name is non-empty
- This ensures downstream handlers (embeddedIam.DoActions) always have access to identity
- Addresses potential issue where empty identity.Name would skip context setting
* Add Iceberg table details view
* Enhance Iceberg catalog browsing UI
* Fix Iceberg UI security and logic issues
- Fix selectSchema() and partitionFieldsFromFullMetadata() to always search for matching IDs instead of checking != 0
- Fix snapshotsFromFullMetadata() to make a defensive copy before sorting so the caller's slice is not mutated
- Fix XSS vulnerabilities in s3tables.js: replace innerHTML with textContent/createElement for user-controlled data
- Fix deleteIcebergTable() to redirect to namespace tables list on details page instead of reloading
- Fix data-bs-target in iceberg_namespaces.templ: remove templ.SafeURL for CSS selector
- Add catalogName to delete modal data attributes for proper redirect
- Remove unused hidden inputs from create table form (icebergTableBucketArn, icebergTableNamespace)
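The defensive-copy fix for snapshot sorting follows a common Go pattern, sketched here with a hypothetical `snapshot` type; the newest-first ordering is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshot is a simplified stand-in for the UI's snapshot info type.
type snapshot struct {
	ID          int64
	TimestampMs int64
}

// sortedSnapshots returns a sorted copy (newest first) and leaves the
// caller's slice untouched.
func sortedSnapshots(in []snapshot) []snapshot {
	out := make([]snapshot, len(in))
	copy(out, in)
	sort.Slice(out, func(i, j int) bool { return out[i].TimestampMs > out[j].TimestampMs })
	return out
}

func main() {
	orig := []snapshot{{ID: 1, TimestampMs: 100}, {ID: 2, TimestampMs: 300}, {ID: 3, TimestampMs: 200}}
	fmt.Println(sortedSnapshots(orig))
	fmt.Println(orig) // original order is preserved
}
```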
* Regenerate templ files for Iceberg UI updates
* Support complex Iceberg type objects in schema
Change Type field from string to json.RawMessage in both IcebergSchemaFieldInfo
and internal icebergSchemaField to properly handle Iceberg spec's complex type
objects (e.g. {"type": "struct", "fields": [...]}). Currently test data
only shows primitive string types, but this change makes the implementation
defensively robust for future complex types by preserving the exact JSON
representation. Add typeToString() helper and update schema extraction
functions to marshal string types as JSON. Update template to convert
json.RawMessage to string for display.
* Regenerate templ files for Type field changes
* templ
* Fix additional Iceberg UI issues from code review
- Fix lazy-load flag that was set before async operation completed, preventing retries
on error; now sets loaded flag only after successful load and throws error to caller
for proper error handling and UI updates
- Add zero-time guards for CreatedAt and ModifiedAt fields in table details to avoid
displaying Go zero-time values; render dash when time is zero
- Add URL path escaping for all catalog/namespace/table names in URLs to prevent
malformed URLs when names contain special characters like /, ?, or #
- Remove redundant innerHTML clear in loadIcebergNamespaceTables that cleared twice
before appending the table list
- Fix selectSnapshotForMetrics to remove != 0 guard for consistency with selectSchema
fix; now always searches for CurrentSnapshotID without zero-value gate
- Enhance typeToString() helper to display '(complex)' for non-primitive JSON types
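To illustrate the URL escaping item above: `url.PathEscape` from the standard library encodes `/`, `?`, and `#` inside a single path segment. The `tableURL` helper and its URL layout are hypothetical and do not reflect the admin UI's actual routes.

```go
package main

import (
	"fmt"
	"net/url"
)

// tableURL builds a path from individually escaped segments so that special
// characters in names cannot change the URL structure.
func tableURL(catalog, namespace, table string) string {
	return "/" + url.PathEscape(catalog) +
		"/" + url.PathEscape(namespace) +
		"/" + url.PathEscape(table)
}

func main() {
	fmt.Println(tableURL("cat", "ns/a", "t?x#y"))
}
```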
* Regenerate templ files for Phase 3 updates
* Fix template generation to use correct file paths
Run templ generate from repo root instead of weed/admin directory to ensure
generated _templ.go files have correct repo-relative paths in error messages
(e.g., 'weed/admin/view/app/iceberg_table_details.templ' instead of
'app/iceberg_table_details.templ'). This ensures both 'make admin-generate'
at repo root and 'make generate' in weed/admin directory produce identical
output with consistent file path references.
* Regenerate template files with correct path references
* Validate S3 Tables names in UI
- Add client-side validation for table bucket and namespace names to surface
errors for invalid characters (dots/underscores) before submission
- Use HTML validity messages with reportValidity for immediate feedback
- Update namespace helper text to reflect actual constraints (single-level,
lowercase letters, numbers, and underscores)
* Regenerate templ files for namespace helper text
* Fix Iceberg catalog REST link and actions
* Disallow S3 object access on table buckets
* Validate Iceberg layout for table bucket objects
* Fix REST API link to /v1/config
* merge iceberg page with table bucket page
* Allowed Trino/Iceberg stats files in metadata validation
* fixes
- Backend/data handling:
  - Normalized Iceberg type display and fallback handling in weed/admin/dash/s3tables_management.go.
  - Fixed snapshot fallback pointer semantics in weed/admin/dash/s3tables_management.go.
  - Added CSRF token generation/propagation/validation for namespace create/delete in:
    - weed/admin/dash/csrf.go
    - weed/admin/dash/auth_middleware.go
    - weed/admin/dash/middleware.go
    - weed/admin/dash/s3tables_management.go
    - weed/admin/view/layout/layout.templ
    - weed/admin/static/js/s3tables.js
- UI/template fixes:
  - Zero-time guards for CreatedAt fields in:
    - weed/admin/view/app/iceberg_namespaces.templ
    - weed/admin/view/app/iceberg_tables.templ
  - Fixed invalid templ-in-script interpolation and host/port rendering in:
    - weed/admin/view/app/iceberg_catalog.templ
    - weed/admin/view/app/s3tables_buckets.templ
  - Added data-catalog-name consistency on Iceberg delete action in weed/admin/view/app/iceberg_tables.templ.
  - Updated retry wording in weed/admin/static/js/s3tables.js.
  - Regenerated all affected _templ.go files.
- S3 API/comment follow-ups:
  - Reused cached table-bucket validator in weed/s3api/bucket_paths.go.
  - Added validation-failure debug logging in weed/s3api/s3api_object_handlers_tagging.go.
  - Added multipart path-validation design comment in weed/s3api/s3api_object_handlers_multipart.go.
- Build tooling:
  - Fixed templ generate working directory issues in weed/admin/Makefile (watch + pattern rule).
* populate data
* test/s3tables: harden populate service checks
* admin: skip table buckets in object-store bucket list
* admin sidebar: move object store to top-level links
* admin iceberg catalog: guard zero times and escape links
* admin forms: add csrf/error handling and client-side name validation
* admin s3tables: fix namespace delete modal redeclaration
* admin: replace native confirm dialogs with modal helpers
* admin modal-alerts: remove noisy confirm usage console log
* reduce logs
* test/s3tables: use partitioned tables in trino and spark populate
* admin file browser: normalize filer ServerAddress for HTTP parsing
* Add Spark Iceberg catalog integration tests and CI support
Implement comprehensive integration tests for Spark with SeaweedFS Iceberg REST catalog:
- Basic CRUD operations (Create, Read, Update, Delete) on Iceberg tables
- Namespace (database) management
- Data insertion, querying, and deletion
- Time travel capabilities via snapshot versioning
- Compatible with SeaweedFS S3 and Iceberg REST endpoints
Tests mirror the structure of existing Trino integration tests but use Spark's
Python SQL API and PySpark for testing.
Add GitHub Actions CI job for spark-iceberg-catalog-tests in s3-tables-tests.yml
to automatically run Spark integration tests on pull requests.
* fmt
* Fix Spark integration tests - code review feedback
* go mod tidy
* Add go mod tidy step to integration test jobs
Add 'go mod tidy' step before test runs for all integration test jobs:
- s3-tables-tests
- iceberg-catalog-tests
- trino-iceberg-catalog-tests
- spark-iceberg-catalog-tests
This ensures dependencies are clean before running tests.
* Fix remaining Spark operations test issues
Address final code review comments:
Setup & Initialization:
- Add waitForSparkReady() helper function that polls Spark readiness
with backoff instead of hardcoded 10-second sleep
- Extract setupSparkTestEnv() helper to reduce boilerplate duplication
between TestSparkCatalogBasicOperations and TestSparkTimeTravel
- Both tests now use helpers for consistent, reliable setup
Assertions & Validation:
- Make setup-critical operations (namespace, table creation, initial
insert) use t.Fatalf instead of t.Errorf to fail fast
- Validate setupSQL output in TestSparkTimeTravel and fail if not
'Setup complete'
- Add validation after second INSERT in TestSparkTimeTravel:
verify row count increased to 2 before time travel test
- Add context to error messages with namespace and tableName params
Code Quality:
- Remove code duplication between test functions
- All critical paths now properly validated
- Consistent error handling throughout
* Fix go vet errors in S3 Tables tests
Fixes:
1. setup_test.go (Spark):
- Add missing import: github.com/testcontainers/testcontainers-go/wait
- Use wait.ForLog instead of undefined testcontainers.NewLogStrategy
- Remove unused strings import
2. trino_catalog_test.go:
- Use net.JoinHostPort instead of fmt.Sprintf for address formatting
- Properly handles IPv6 addresses by wrapping them in brackets
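The difference between the two address-formatting approaches is easy to see with an IPv6 literal. The `hostPort` wrapper below is a hypothetical helper around the standard `net.JoinHostPort`.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// hostPort formats an address with net.JoinHostPort, which wraps IPv6
// literals in brackets.
func hostPort(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(hostPort("127.0.0.1", 8080)) // 127.0.0.1:8080
	fmt.Println(hostPort("::1", 8080))       // [::1]:8080
	// Plain concatenation gives "::1:8080", which is ambiguous for IPv6.
	fmt.Println("::1" + ":" + strconv.Itoa(8080))
}
```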
* Use weed mini for simpler SeaweedFS startup
Replace complex multi-process startup (master, volume, filer, s3)
with single 'weed mini' command that starts all services together.
Benefits:
- Simpler, more reliable startup
- Single weed mini process vs 4 separate processes
- Automatic coordination between components
- Better port management with no manual coordination
Changes:
- Remove separate master, volume, filer process startup
- Use weed mini with -master.port, -filer.port, -s3.port flags
- Keep Iceberg REST as separate service (still needed)
- Increase timeout to 15s for port readiness (weed mini startup)
- Remove volumePort and filerProcess fields from TestEnvironment
- Simplify cleanup to only handle two processes (mini, iceberg rest)
* Clean up dead code and temp directory leaks
Fixes:
1. Remove dead s3Process field and cleanup:
- weed mini bundles S3 gateway, no separate process needed
- Removed s3Process field from TestEnvironment
- Removed unnecessary s3Process cleanup code
2. Fix temp config directory leak:
- Add sparkConfigDir field to TestEnvironment
- Store returned configDir in writeSparkConfig
- Clean up sparkConfigDir in Cleanup() with os.RemoveAll
- Prevents accumulation of temp directories in test runs
3. Simplify Cleanup:
- Now handles only necessary processes (weed mini, iceberg rest)
- Removes both seaweedfsDataDir and sparkConfigDir
- Cleaner shutdown sequence
* Use weed mini's built-in Iceberg REST and fix python binary
Changes:
- Add -s3.port.iceberg flag to weed mini for built-in Iceberg REST Catalog
- Remove separate 'weed server' process for Iceberg REST
- Remove icebergRestProcess field from TestEnvironment
- Simplify Cleanup() to only manage weed mini + Spark
- Add port readiness check for iceberg REST from weed mini
- Set Spark container Cmd to '/bin/sh -c sleep 3600' to keep it running
- Change python to python3 in container.Exec calls
This reduces the setup to a single all-in-one weed mini process (master, filer, s3,
iceberg-rest) plus the Spark container.
* go fmt
* clean up
* Bind on a non-loopback IP for container access, align Iceberg metadata saves/locations with table locations, and rework Spark time travel to use TIMESTAMP AS OF with safe timestamp extraction
* shared mini start
* Fixed internal directory creation under /buckets so .objects paths can auto-create without failing bucket-name validation, which restores table bucket object writes
* fix path
Updated table bucket objects to write under `/buckets/<bucket>` and saved Iceberg metadata there, adjusting Spark time-travel timestamp to committed_at +1s. Rebuilt the weed binary (`go install ./weed`) and confirmed passing tests for Spark and Trino with focused test commands.
* Updated table bucket creation to stop creating /buckets/.objects and switched Trino REST warehouse to s3://<bucket> to match Iceberg layout.
* Stabilize S3Tables integration tests
* Fix timestamp extraction and remove dead code in bucketDir
* Use table bucket as warehouse in s3tables tests
* Update trino_blog_operations_test.go
* Add the CASCADE option to handle any remaining table metadata/files in the schema directory
* skip namespace not empty
* Add Trino blog operations test
* Update test/s3tables/catalog_trino/trino_blog_operations_test.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* feat: add table bucket path helpers and filer operations
- Add table object root and table location mapping directories
- Implement ensureDirectory, upsertFile, deleteEntryIfExists helpers
- Support table location bucket mapping for S3 access
* feat: manage table bucket object roots on creation/deletion
- Create .objects directory for table buckets on creation
- Clean up table object bucket paths on deletion
- Enable S3 operations on table bucket object roots
* feat: add table location mapping for Iceberg REST
- Track table location bucket mappings when tables are created/updated/deleted
- Enable location-based routing for S3 operations on table data
* feat: route S3 operations to table bucket object roots
- Route table-s3 bucket names to mapped table paths
- Route table buckets to object root directories
- Support table location bucket mapping lookup
* feat: emit table-s3 locations from Iceberg REST
- Generate unique table-s3 bucket names with UUID suffix
- Store table metadata under table bucket paths
- Return table-s3 locations for Trino compatibility
* fix: handle missing directories in S3 list operations
- Propagate ErrNotFound from ListEntries for non-existent directories
- Treat missing directories as empty results for list operations
- Fixes Trino non-empty location checks on table creation
* test: improve Trino CSV parsing for single-value results
- Sanitize Trino output to skip jline warnings
- Handle single-value CSV results without header rows
- Strip quotes from numeric values in tests
* refactor: use bucket path helpers throughout S3 API
- Replace direct bucket path operations with helper functions
- Leverage centralized table bucket routing logic
- Improve maintainability with consistent path resolution
* fix: add table bucket cache and improve filer error handling
- Cache table bucket lookups to reduce filer overhead on repeated checks
- Use filer_pb.CreateEntry and filer_pb.UpdateEntry helpers to check resp.Error
- Fix delete order in handler_bucket_get_list_delete: delete table object before directory
- Make location mapping errors best-effort: log and continue, don't fail API
- Update table location mappings to delete stale prior bucket mappings on update
- Add 1-second sleep before timestamp time travel query to ensure timestamps are in the past
- Fix CSV parsing: examine all lines instead of skipping the first; handle single-value rows
* fix: properly handle stale metadata location mapping cleanup
- Capture oldMetadataLocation before mutation in handleUpdateTable
- Update updateTableLocationMapping to accept both old and new locations
- Use passed-in oldMetadataLocation to detect location changes
- Delete stale mapping only when location actually changes
- Pass empty string for oldLocation in handleCreateTable (new tables have no prior mapping)
- Improve logging to show old -> new location transitions
* refactor: cleanup imports and cache design
- Remove unused 'sync' import from bucket_paths.go
- Use filer_pb.UpdateEntry helper in setExtendedAttribute and deleteExtendedAttribute for consistent error handling
- Add dedicated tableBucketCache map[string]bool to BucketRegistry instead of mixing concerns with metadataCache
- Improve cache separation: table buckets cache is now separate from bucket metadata cache
* fix: improve cache invalidation and add transient error handling
Cache invalidation (critical fix):
- Add tableLocationCache to BucketRegistry for location mapping lookups
- Clear tableBucketCache and tableLocationCache in RemoveBucketMetadata
- Prevents stale cache entries when buckets are deleted/recreated
Transient error handling:
- Only cache table bucket lookups when conclusive (found or ErrNotFound)
- Skip caching on transient errors (network, permission, etc)
- Prevents marking real table buckets as non-table due to transient failures
Performance optimization:
- Cache tableLocationDir results to avoid repeated filer RPCs on hot paths
- tableLocationDir now checks cache before making expensive filer lookups
- Cache stores empty string for 'not found' to avoid redundant lookups
Code clarity:
- Add comment to deleteDirectory explaining DeleteEntry response lacks Error field
* go fmt
* fix: mirror transient error handling in tableLocationDir and optimize bucketDir
Transient error handling:
- tableLocationDir now only caches definitive results
- Mirrors isTableBucket behavior to prevent treating transient errors as permanent misses
- Improves reliability on flaky systems or during recovery
Performance optimization:
- bucketDir avoids redundant isTableBucket call via bucketRoot
- Directly use s3a.option.BucketsPath for regular buckets
- Saves one cache lookup for every non-table bucket operation
* fix: revert bucketDir optimization to preserve bucketRoot logic
The optimization to directly use BucketsPath bypassed bucketRoot's logic
and caused issues with S3 list operations on delimiter+prefix cases.
Revert to using path.Join(s3a.bucketRoot(bucket), bucket) which properly
handles all bucket types and ensures consistent path resolution across
the codebase.
The slight performance cost of an extra cache lookup is worth the correctness
and consistency benefits.
* feat: move table buckets under /buckets
Add a table-bucket marker attribute, reuse bucket metadata cache for table bucket detection, and update list/validation/UI/test paths to treat table buckets as /buckets entries.
* Fix S3 Tables code review issues
- handler_bucket_create.go: Fix bucket existence check to properly validate
entryResp.Entry before setting s3BucketExists flag (nil Entry should not
indicate existing bucket)
- bucket_paths.go: Add clarifying comment to bucketRoot() explaining unified
buckets root path for all bucket types
- file_browser_data.go: Optimize by extracting table bucket check early to
avoid redundant WithFilerClient call
* Fix list prefix delimiter handling
* Handle list errors conservatively
* Fix Trino FOR TIMESTAMP query - use past timestamp
Iceberg requires the timestamp to be strictly in the past.
Use current_timestamp - interval '1' second instead of current_timestamp.
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* fix multipart etag
* address comments
* clean up
* clean up
* optimization
* address comments
* unquoted etag
* dedup
* upgrade
* clean
* etag
* return quoted tag
* quoted etag
* debug
* s3api: unify ETag retrieval and quoting across handlers
Refactor newListEntry to take *S3ApiServer and use getObjectETag,
and update setResponseHeaders to use the same logic. This ensures
consistent ETags are returned for both listing and direct access.
* s3api: implement ListObjects deduplication for versioned buckets
Handle duplicate entries between the main path and the .versions
directory by prioritizing the latest version when bucket versioning
is enabled.
* s3api: cleanup stale main file entries during versioned uploads
Add explicit deletion of pre-existing "main" files when creating new
versions in versioned buckets. This prevents stale entries from
appearing in bucket listings and ensures consistency.
* s3api: fix cleanup code placement in versioned uploads
Correct the placement of rm calls in completeMultipartUpload and
putVersionedObject to ensure stale main files are properly deleted
during versioned uploads.
* s3api: improve getObjectETag fallback for empty ExtETagKey
Ensure that when ExtETagKey exists but contains an empty value,
the function falls through to MD5/chunk-based calculation instead
of returning an empty string.
* s3api: fix test files for new newListEntry signature
Update test files to use the new newListEntry signature where the
first parameter is *S3ApiServer. Created mockS3ApiServer to properly
test owner display name lookup functionality.
* s3api: use filer.ETag for consistent Md5 handling in getEtagFromEntry
Change getEtagFromEntry fallback to use filer.ETag(entry) instead of
filer.ETagChunks to ensure legacy entries with Attributes.Md5 are
handled consistently with the rest of the codebase.
* s3api: optimize list logic and fix conditional header logging
- Hoist bucket versioning check out of per-entry callback to avoid
repeated getVersioningState calls
- Extract appendOrDedup helper function to eliminate duplicate
dedup/append logic across multiple code paths
- Change If-Match mismatch logging from glog.Errorf to glog.V(3).Infof
and remove DEBUG prefix for consistency
* s3api: fix test mock to properly initialize IAM accounts
Fixed nil pointer dereference in TestNewListEntryOwnerDisplayName by
directly initializing the IdentityAccessManagement.accounts map in the
test setup. This ensures newListEntry can properly look up account
display names without panicking.
* cleanup
* s3api: remove premature main file cleanup in versioned uploads
Removed incorrect cleanup logic that was deleting main files during
versioned uploads. This was causing test failures because it deleted
objects that should have been preserved as null versions when
versioning was first enabled. The deduplication logic in listing is
sufficient to handle duplicate entries without deleting files during
upload.
* s3api: add empty-value guard to getEtagFromEntry
Added the same empty-value guard used in getObjectETag to prevent
returning quoted empty strings. When ExtETagKey exists but is empty,
the function now falls through to filer.ETag calculation instead of
returning "".
* s3api: fix listing of directory key objects with matching prefix
Revert prefix handling logic to use strings.TrimPrefix instead of
checking HasPrefix with empty string result. This ensures that when a
directory key object exactly matches the prefix (e.g. prefix="dir/",
object="dir/"), it is correctly handled as a regular entry instead of
being skipped or incorrectly processed as a common prefix. Also fixed
missing variable definition.
* s3api: refactor list inline dedup to use appendOrDedup helper
Refactored the inline deduplication logic in listFilerEntries to use the
shared appendOrDedup helper function. This ensures consistent behavior
and reduces code duplication.
* test: fix port allocation race in s3tables integration test
Updated startMiniCluster to find all required ports simultaneously using
findAvailablePorts instead of sequentially. This prevents race conditions
where the OS reallocates a port that was just released, causing multiple
services (e.g. Filer and Volume) to be assigned the same port and fail
to start.
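One way to reserve several ports at once, sketched under the assumption that holding every listener open until all ports are chosen is what prevents the duplicate assignment. The body below is illustrative and may differ from the real `findAvailablePorts` helper:

```go
package main

import (
	"fmt"
	"net"
)

// findAvailablePorts asks the OS for n free TCP ports on loopback.
// All listeners stay open until every port has been picked, so the OS
// cannot hand the same port out twice within one call.
func findAvailablePorts(n int) ([]int, error) {
	listeners := make([]net.Listener, 0, n)
	defer func() {
		// Release the ports only after the full set is known.
		for _, l := range listeners {
			l.Close()
		}
	}()

	ports := make([]int, 0, n)
	for i := 0; i < n; i++ {
		l, err := net.Listen("tcp", "127.0.0.1:0")
		if err != nil {
			return nil, err
		}
		listeners = append(listeners, l)
		ports = append(ports, l.Addr().(*net.TCPAddr).Port)
	}
	return ports, nil
}

func main() {
	ports, err := findAvailablePorts(4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("reserved ports:", ports)
}
```

A small window remains between closing the listeners and the services binding the ports, so this reduces the race rather than removing it entirely.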
* test: add comprehensive CRUD tests for S3 Tables Catalog Trino integration
- Add TestNamespaceCRUD: Tests complete Create-Read-Update-Delete lifecycle for namespaces
- Add TestNamespaceListingPagination: Tests listing multiple namespaces with verification
- Add TestNamespaceErrorHandling: Tests error handling for edge cases (IF EXISTS, IF NOT EXISTS)
- Add TestSchemaIntegrationWithCatalog: Tests integration between Trino SQL and Iceberg REST Catalog
All tests pass successfully and use Trino SQL interface for practical integration testing.
Tests properly skip when Docker is unavailable.
Use randomized namespace names to avoid conflicts in parallel execution.
The tests provide comprehensive coverage of namespace/schema CRUD operations which form the
foundation of the Iceberg catalog integration with Trino.
* test: Address code review feedback for S3 Tables Catalog Trino CRUD tests
- Extract common test setup into setupTrinoTest() helper function
- Replace all fmt.Printf calls with idiomatic t.Logf
- Change namespace deletion verification from t.Logf to t.Errorf for proper test failures
- Enhance TestNamespaceErrorHandling with persistence verification test
- Remove unnecessary fmt import
- Improve test documentation with clarifying comments
* test: Fix schema naming and remove verbose output logging
- Fix TestNamespaceListingPagination schema name generation: use fmt.Sprintf instead of string(rune())
- Remove verbose logging of SHOW SCHEMAS output to reduce noise in test logs
- Keep high-level operation logging while removing detailed result output
* test: add Trino Iceberg catalog integration test
- Create test/s3/catalog_trino/trino_catalog_test.go with TestTrinoIcebergCatalog
- Tests integration between Trino SQL engine and SeaweedFS Iceberg REST catalog
- Starts weed mini with all services and Trino in Docker container
- Validates Iceberg catalog schema creation and listing operations
- Uses native S3 filesystem support in Trino with path-style access
- Add workflow job to s3-tables-tests.yml for CI execution
* fix: preserve AWS environment credentials when replacing S3 configuration
When S3 configuration is loaded from filer/db, it replaces the identities list
and inadvertently removes AWS_ACCESS_KEY_ID credentials that were added from
environment variables. This caused auth to remain disabled even though valid
credentials were present.
Fix by preserving environment-based identities when replacing the configuration
and re-adding them after the replacement. This ensures environment credentials
persist across configuration reloads and properly enable authentication.
* fix: use correct ServerAddress format with gRPC port encoding
The admin server couldn't connect to master because the master address
was missing the gRPC port information. Use pb.NewServerAddress() which
properly encodes both HTTP and gRPC ports in the address string.
Changes:
- weed/command/mini.go: Use pb.NewServerAddress for master address in admin
- test/s3/policy/policy_test.go: Store and use gRPC ports for master/filer addresses
This fix applies to:
1. Admin server connection to master (mini.go)
2. Test shell commands that need master/filer addresses (policy_test.go)
* move
* move
* fix: always include gRPC port in server address encoding
The NewServerAddress() function was omitting the gRPC port from the address
string when it matched the port+10000 convention. However, gRPC port allocation
doesn't always follow this convention - when the calculated port is busy, an
alternative port is allocated.
This caused a bug where:
1. Master's gRPC port was allocated as 50661 (sequential, not port+10000)
2. Address was encoded as '192.168.1.66:50660' (gRPC port omitted)
3. Admin client called ToGrpcAddress() which assumed port+10000 offset
4. Admin tried to connect to 60660 but master was on 50661 → connection failed
Fix: Always include explicit gRPC port in address format (host:httpPort.grpcPort)
unless gRPC port is 0. This makes addresses unambiguous and works regardless of
the port allocation strategy used.
Impacts: All server-to-server gRPC connections now use properly formatted addresses.
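The `host:httpPort.grpcPort` format can be illustrated with a small sketch. `encodeServerAddress` and `grpcAddress` are hypothetical names used only here; they are not the actual `pb.NewServerAddress` or `ToGrpcAddress` implementations:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// encodeServerAddress always appends the gRPC port as ".grpcPort",
// unless the gRPC port is 0 (hypothetical helper for illustration).
func encodeServerAddress(host string, httpPort, grpcPort int) string {
	addr := net.JoinHostPort(host, strconv.Itoa(httpPort))
	if grpcPort == 0 {
		return addr
	}
	return addr + "." + strconv.Itoa(grpcPort)
}

// grpcAddress returns host:grpcPort. It uses the explicit gRPC port when
// present and falls back to the httpPort+10000 convention otherwise.
func grpcAddress(encoded string) (string, error) {
	host, ports, err := net.SplitHostPort(encoded)
	if err != nil {
		return "", err
	}
	httpStr, grpcStr, explicit := strings.Cut(ports, ".")
	if explicit {
		if _, err := strconv.Atoi(grpcStr); err != nil {
			return "", fmt.Errorf("invalid gRPC port %q: %w", grpcStr, err)
		}
		return net.JoinHostPort(host, grpcStr), nil
	}
	httpPort, err := strconv.Atoi(httpStr)
	if err != nil {
		return "", fmt.Errorf("invalid HTTP port %q: %w", httpStr, err)
	}
	return net.JoinHostPort(host, strconv.Itoa(httpPort+10000)), nil
}

func main() {
	addr := encodeServerAddress("192.168.1.66", 50660, 50661)
	grpc, _ := grpcAddress(addr)
	fmt.Println(addr, "->", grpc)

	// Without the explicit port, the decoder can only guess via the offset.
	guessed, _ := grpcAddress("192.168.1.66:50660")
	fmt.Println("192.168.1.66:50660 ->", guessed)
}
```

With the numbers from the bug report above, the explicit form resolves to port 50661, while the implicit form resolves to 60660, which is where the failed connection went.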
* test: fix Iceberg REST API readiness check
The Iceberg REST API endpoints require authentication. When checked without
credentials, the API returns 403 Forbidden (not 401 Unauthorized). The
readiness check now accepts both auth error codes (401/403) as indicators
that the service is up and ready and only needs credentials.
This fixes the 'Iceberg REST API did not become ready' test failure.
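The readiness rule above amounts to a small status-code check. This is a sketch only: `isCatalogUp` is a hypothetical name, and treating 200 as ready is an assumption not stated in the commit:

```go
package main

import (
	"fmt"
	"net/http"
)

// isCatalogUp reports whether an unauthenticated probe response indicates
// that the server is running. 401 and 403 both mean "up, but credentials
// are required"; accepting 200 as well is an assumption of this sketch.
func isCatalogUp(status int) bool {
	switch status {
	case http.StatusOK, http.StatusUnauthorized, http.StatusForbidden:
		return true
	}
	return false
}

func main() {
	for _, code := range []int{200, 401, 403, 404, 503} {
		fmt.Printf("%d -> up=%v\n", code, isCatalogUp(code))
	}
}
```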
* Fix AWS SigV4 signature verification for base64-encoded payload hashes
AWS SigV4 canonical requests must use hex-encoded SHA256 hashes,
but the X-Amz-Content-Sha256 header may be transmitted as base64.
Changes:
- Added normalizePayloadHash() function to convert base64 to hex
- Call normalizePayloadHash() in extractV4AuthInfoFromHeader()
- Added encoding/base64 import
Fixes 403 Forbidden errors on POST requests to Iceberg REST API
when clients send base64-encoded content hashes in the header.
Impacted services: Iceberg REST API, S3Tables
* Fix AWS SigV4 signature verification for base64-encoded payload hashes
AWS SigV4 canonical requests must use hex-encoded SHA256 hashes,
but the X-Amz-Content-Sha256 header may be transmitted as base64.
Changes:
- Added normalizePayloadHash() function to convert base64 to hex
- Call normalizePayloadHash() in extractV4AuthInfoFromHeader()
- Added encoding/base64 import
- Removed unused fmt import
Fixes 403 Forbidden errors on POST requests to Iceberg REST API
when clients send base64-encoded content hashes in the header.
Impacted services: Iceberg REST API, S3Tables
* pass sigv4
* s3api: fix identity preservation and logging levels
- Ensure environment-based identities are preserved during config replacement
- Update accessKeyIdent and nameToIdentity maps correctly
- Downgrade informational logs to V(2) to reduce noise
* test: fix trino integration test and s3 policy test
- Pin Trino image version to 479
- Fix port binding to 0.0.0.0 for Docker connectivity
- Fix S3 policy test hang by correctly assigning MiniClusterCtx
- Improve port finding robustness in policy tests
* ci: pre-pull trino image to avoid timeouts
- Pull trinodb/trino:479 after Docker setup
- Ensure image is ready before integration tests start
* iceberg: remove unused checkAuth and improve logging
- Remove unused checkAuth method
- Downgrade informational logs to V(2)
- Ensure loggingMiddleware uses a status writer for accurate reported codes
- Narrow catch-all route to avoid interfering with other subsystems
* iceberg: fix build failure by removing unused s3api import
* Update iceberg.go
* use warehouse
* Update trino_catalog_test.go
* full integration with iceberg-go
* Table Commit Operations (handleUpdateTable)
* s3tables: fix Iceberg v2 compliance and namespace properties
This commit ensures SeaweedFS Iceberg REST Catalog is compliant with
Iceberg Format Version 2 by:
- Using iceberg-go's table.NewMetadataWithUUID for strict v2 compliance.
- Explicitly initializing namespace properties to empty maps.
- Removing omitempty from required Iceberg response fields.
- Fixing CommitTableRequest unmarshaling using table.Requirements and table.Updates.
* s3tables: automate Iceberg integration tests
- Added Makefile for local test execution and cluster management.
- Added docker-compose for PyIceberg compatibility kit.
- Added Go integration test harness for PyIceberg.
- Updated GitHub CI to run Iceberg catalog tests automatically.
* s3tables: update PyIceberg test suite for compatibility
- Updated test_rest_catalog.py to use latest PyIceberg transaction APIs.
- Updated Dockerfile to include pyarrow and pandas dependencies.
- Improved namespace and table handling in integration tests.
* s3tables: address review feedback on Iceberg Catalog
- Implemented robust metadata version parsing and incrementing.
- Ensured table metadata changes are persisted during commit (handleUpdateTable).
- Standardized namespace property initialization for consistency.
- Fixed unused variable and incorrect struct field build errors.
* s3tables: finalize Iceberg REST Catalog and optimize tests
- Implemented robust metadata versioning and persistence.
- Standardized namespace property initialization.
- Optimized integration tests using pre-built Docker image.
- Added strict property persistence validation to test suite.
- Fixed build errors from previous partial updates.
* Address PR review: fix Table UUID stability, implement S3Tables UpdateTable, and support full metadata persistence individually
* fix: Iceberg catalog stable UUIDs, metadata persistence, and file writing
- Ensure table UUIDs are stable (do not regenerate on load).
- Persist full table metadata (Iceberg JSON) in s3tables extended attributes.
- Add `MetadataVersion` to explicitly track version numbers, replacing regex parsing.
- Implement `saveMetadataFile` to persist metadata JSON files to the Filer on commit.
- Update `CreateTable` and `UpdateTable` handlers to use the new logic.
* test: bind weed mini to 0.0.0.0 in integration tests to fix Docker connectivity
* Iceberg: fix metadata handling in REST catalog
- Add nil guard in createTable
- Fix updateTable to correctly load existing metadata from storage
- Ensure full metadata persistence on updates
- Populate loadTable result with parsed metadata
* S3Tables: add auth checks and fix response fields in UpdateTable
- Add CheckPermissionWithContext to UpdateTable handler
- Include TableARN and MetadataLocation in UpdateTable response
- Use ErrCodeConflict (409) for version token mismatches
* Tests: improve Iceberg catalog test infrastructure and cleanup
- Makefile: use PID file for precise process killing
- test_rest_catalog.py: remove unused variables and fix f-strings
* Iceberg: fix variable shadowing in UpdateTable
- Rename inner loop variable `req` to `requirement` to avoid shadowing outer request variable
* S3Tables: simplify MetadataVersion initialization
- Use `max(req.MetadataVersion, 1)` instead of anonymous function
* Tests: remove unicode characters from S3 tables integration test logs
- Remove unicode checkmarks from test output for cleaner logs
* Iceberg: improve metadata persistence robustness
- Fix MetadataLocation in LoadTableResult to fall back to generated location
- Improve saveMetadataFile to ensure directory hierarchy existence and robust error handling
* s3: enforce authentication and JSON error format for Iceberg REST Catalog
* s3/iceberg: align error exception types with OpenAPI spec examples
* s3api: refactor AuthenticateRequest to return identity object
* s3/iceberg: propagate full identity object to request context
* s3/iceberg: differentiate NotAuthorizedException and ForbiddenException
* s3/iceberg: reject requests if authenticator is nil to prevent auth bypass
* s3/iceberg: refactor Auth middleware to build context incrementally and use switch for error mapping
* s3api: update misleading comment for authRequestWithAuthType
* s3api: return ErrAccessDenied if IAM is not configured to prevent auth bypass
* s3/iceberg: optimize context update in Auth middleware
* s3api: export CanDo for external authorization use
* s3/iceberg: enforce identity-based authorization in all API handlers
* s3api: fix compilation errors by updating internal CanDo references
* s3/iceberg: robust identity validation and consistent action usage in handlers
* s3api: complete CanDo rename across tests and policy engine integration
* s3api: fix integration tests by allowing admin access when auth is disabled and using explicit gRPC ports
* duckdb
* create test bucket
* feat: Add Iceberg REST Catalog server
Implement Iceberg REST Catalog API on a separate port (default 8181)
that exposes S3 Tables metadata through the Apache Iceberg REST protocol.
- Add new weed/s3api/iceberg package with REST handlers
- Implement /v1/config endpoint returning catalog configuration
- Implement namespace endpoints (list/create/get/head/delete)
- Implement table endpoints (list/create/load/head/delete/update)
- Add -port.iceberg flag to S3 standalone server (s3.go)
- Add -s3.port.iceberg flag to combined server mode (server.go)
- Add -s3.port.iceberg flag to mini cluster mode (mini.go)
- Support prefix-based routing for multiple catalogs
The Iceberg REST server reuses S3 Tables metadata storage under
/table-buckets and enables DuckDB, Spark, and other Iceberg clients
to connect to SeaweedFS as a catalog.
* feat: Add Iceberg Catalog pages to admin UI
Add admin UI pages to browse Iceberg catalogs, namespaces, and tables.
- Add Iceberg Catalog menu item under Object Store navigation
- Create iceberg_catalog.templ showing catalog overview with REST info
- Create iceberg_namespaces.templ listing namespaces in a catalog
- Create iceberg_tables.templ listing tables in a namespace
- Add handlers and routes in admin_handlers.go
- Add Iceberg data provider methods in s3tables_management.go
- Add Iceberg data types in types.go
The Iceberg Catalog pages provide visibility into the same S3 Tables
data through an Iceberg-centric lens, including REST endpoint examples
for DuckDB and PyIceberg.
* test: Add Iceberg catalog integration tests and reorg s3tables tests
- Reorganize existing s3tables tests to test/s3tables/table-buckets/
- Add new test/s3tables/catalog/ for Iceberg REST catalog tests
- Add TestIcebergConfig to verify /v1/config endpoint
- Add TestIcebergNamespaces to verify namespace listing
- Add TestDuckDBIntegration for DuckDB connectivity (requires Docker)
- Update CI workflow to use new test paths
* fix: Generate proper random UUIDs for Iceberg tables
Address code review feedback:
- Replace placeholder UUID with crypto/rand-based UUID v4 generation
- Add detailed TODO comments for handleUpdateTable stub explaining
the required atomic metadata swap implementation
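For reference, a UUID v4 can be built from `crypto/rand` as in the sketch below. `newUUIDv4` is a hypothetical name, and the actual code in this change may be structured differently:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 generates a random (version 4) UUID in canonical text form.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant (10xxxxxx)
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```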
* fix: Serve Iceberg on localhost listener when binding to different interface
Address code review feedback: properly serve the localhost listener
when the Iceberg server is bound to a non-localhost interface.
* ci: Add Iceberg catalog integration tests to CI
Add new job to run Iceberg catalog tests in CI, along with:
- Iceberg package build verification
- Iceberg unit tests
- Iceberg go vet checks
- Iceberg format checks
* fix: Address code review feedback for Iceberg implementation
- fix: Replace hardcoded account ID with s3_constants.AccountAdminId in buildTableBucketARN()
- fix: Improve UUID generation error handling with deterministic fallback (timestamp + PID + counter)
- fix: Update handleUpdateTable to return HTTP 501 Not Implemented instead of fake success
- fix: Better error handling in handleNamespaceExists to distinguish 404 from 500 errors
- fix: Use relative URL in template instead of hardcoded localhost:8181
- fix: Add HTTP timeout to test's waitForService function to avoid hangs
- fix: Use dynamic ephemeral ports in integration tests to avoid flaky parallel failures
- fix: Add Iceberg port to final port configuration logging in mini.go
* fix: Address critical issues in Iceberg implementation
- fix: Cache table UUIDs to ensure persistence across LoadTable calls
The UUID now remains stable for the lifetime of the server session.
TODO: For production, UUIDs should be persisted in S3 Tables metadata.
- fix: Remove redundant URL-encoded namespace parsing
mux router already decodes %1F to \x1F before passing to handlers.
Redundant ReplaceAll call could cause bugs with literal %1F in namespace.
* fix: Improve test robustness and reduce code duplication
- fix: Make DuckDB test more robust by failing on unexpected errors
Instead of silently logging errors, now explicitly check for expected
conditions (extension not available) and skip the test appropriately.
- fix: Extract username helper method to reduce duplication
Created getUsername() helper in AdminHandlers to avoid duplicating
the username retrieval logic across Iceberg page handlers.
* fix: Add mutex protection to table UUID cache
Protects concurrent access to the tableUUIDs map with sync.RWMutex.
Uses read-lock for fast path when UUID already cached, and write-lock
for generating new UUIDs. Includes double-check pattern to handle race
condition between read-unlock and write-lock.
* style: fix go fmt errors
* feat(iceberg): persist table UUID in S3 Tables metadata
* feat(admin): configure Iceberg port in Admin UI and commands
* refactor: address review comments (flags, tests, handlers)
- command/mini: fix tracking of explicit s3.port.iceberg flag
- command/admin: add explicit -iceberg.port flag
- admin/handlers: reuse getUsername helper
- tests: use 127.0.0.1 for ephemeral ports and os.Stat for file size check
* test: check error from FileStat in verify_gc_empty_test
Add *testing.T field to TestCluster struct and initialize it in
startMiniCluster. This allows Stop() to properly log warnings when
cluster shutdown times out. Includes the t field in the test cluster
initialization and restores the logging statement in Stop().
The TestCluster.Stop() method doesn't have access to testing.T object.
Remove the log statement and keep the timeout handling comment for clarity.
The original intent (warning about shutdown timeout) is still captured in
the code comment explaining potential issues.
The timeout path (2 second wait for graceful shutdown) was silent. Add a
warning log message when it occurs to help diagnose flaky test issues and
indicate when the mini cluster didn't shut down cleanly.
Updated random string generation to use crypto/rand in s3tables tests.
Increased resilience of IAM distributed tests by adding "connection refused"
to retryable errors.
- Add authorization checks to all S3 Tables handlers (policy, table ops) to enforce security
- Improve error handling to distinguish between NotFound (404) and InternalError (500)
- Fix directory FileMode usage in filer_ops
- Improve test randomness for version tokens
- Update permissions comments to acknowledge IAM gaps
- Introduce sync.Mutex to protect global state (os.Args, os.Chdir)
- Ensure serialized initialization of the mini cluster runner
- Fix intermittent race conditions during parallel test execution
- Implement doRequestAndDecode to eliminate HTTP boilerplate
- Update client API to accept []string for namespaces to support hierarchy
- Standardize error response decoding across all client methods
- Introduce MiniClusterCtx to coordinate shutdown across mini services
- Update Master, Volume, Filer, S3, and WebDAV servers to respect context cancellation
- Ensure all resources are cleaned up properly during test teardown
- Integrate MiniClusterCtx in s3tables integration tests
- Move all client methods to client.go
- Remove duplicate types/constants from s3tables_integration_test.go
- Keep setup.go for test infrastructure
- Keep integration test logic in s3tables_integration_test.go
- Clean up unused imports
- Test compiles successfully
- Create setup.go with TestCluster and S3TablesClient definitions
- Create client.go with HTTP client methods for all operations
- Test utilities and client methods organized for reusability
- Foundation for S3 Tables integration tests