feat: Add Iceberg REST Catalog server and admin UI (#8175)

* feat: Add Iceberg REST Catalog server

Implement Iceberg REST Catalog API on a separate port (default 8181)
that exposes S3 Tables metadata through the Apache Iceberg REST protocol.

- Add new weed/s3api/iceberg package with REST handlers
- Implement /v1/config endpoint returning catalog configuration
- Implement namespace endpoints (list/create/get/head/delete)
- Implement table endpoints (list/create/load/head/delete/update)
- Add -port.iceberg flag to S3 standalone server (s3.go)
- Add -s3.port.iceberg flag to combined server mode (server.go)
- Add -s3.port.iceberg flag to mini cluster mode (mini.go)
- Support prefix-based routing for multiple catalogs

The Iceberg REST server reuses S3 Tables metadata storage under
/table-buckets and enables DuckDB, Spark, and other Iceberg clients
to connect to SeaweedFS as a catalog.

* feat: Add Iceberg Catalog pages to admin UI

Add admin UI pages to browse Iceberg catalogs, namespaces, and tables.

- Add Iceberg Catalog menu item under Object Store navigation
- Create iceberg_catalog.templ showing catalog overview with REST info
- Create iceberg_namespaces.templ listing namespaces in a catalog
- Create iceberg_tables.templ listing tables in a namespace
- Add handlers and routes in admin_handlers.go
- Add Iceberg data provider methods in s3tables_management.go
- Add Iceberg data types in types.go

The Iceberg Catalog pages provide visibility into the same S3 Tables
data through an Iceberg-centric lens, including REST endpoint examples
for DuckDB and PyIceberg.

* test: Add Iceberg catalog integration tests and reorg s3tables tests

- Reorganize existing s3tables tests to test/s3tables/table-buckets/
- Add new test/s3tables/catalog/ for Iceberg REST catalog tests
- Add TestIcebergConfig to verify /v1/config endpoint
- Add TestIcebergNamespaces to verify namespace listing
- Add TestDuckDBIntegration for DuckDB connectivity (requires Docker)
- Update CI workflow to use new test paths

* fix: Generate proper random UUIDs for Iceberg tables

Address code review feedback:
- Replace placeholder UUID with crypto/rand-based UUID v4 generation
- Add detailed TODO comments for handleUpdateTable stub explaining
  the required atomic metadata swap implementation
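
A minimal sketch of crypto/rand-based UUID v4 generation, assuming the
standard RFC 4122 bit layout. `newUUIDv4` is a hypothetical name; the helper
in the iceberg package may be structured differently.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns a random (version 4) UUID in canonical text form.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version nibble to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits (10xxxxxx)
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```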

* fix: Serve Iceberg on localhost listener when binding to different interface

Address code review feedback: properly serve the localhost listener
when the Iceberg server is bound to a non-localhost interface.

* ci: Add Iceberg catalog integration tests to CI

Add new job to run Iceberg catalog tests in CI, along with:
- Iceberg package build verification
- Iceberg unit tests
- Iceberg go vet checks
- Iceberg format checks

* fix: Address code review feedback for Iceberg implementation

- fix: Replace hardcoded account ID with s3_constants.AccountAdminId in buildTableBucketARN()
- fix: Improve UUID generation error handling with deterministic fallback (timestamp + PID + counter)
- fix: Update handleUpdateTable to return HTTP 501 Not Implemented instead of fake success
- fix: Better error handling in handleNamespaceExists to distinguish 404 from 500 errors
- fix: Use relative URL in template instead of hardcoded localhost:8181
- fix: Add HTTP timeout to test's waitForService function to avoid hangs
- fix: Use dynamic ephemeral ports in integration tests to avoid flaky parallel failures
- fix: Add Iceberg port to final port configuration logging in mini.go

* fix: Address critical issues in Iceberg implementation

- fix: Cache table UUIDs so they stay stable across LoadTable calls
  The UUID now remains stable for the lifetime of the server session.
  TODO: For production, UUIDs should be persisted in S3 Tables metadata.

- fix: Remove redundant URL-encoded namespace parsing
  The mux router already decodes %1F to \x1F before passing the path to
  handlers, so the extra ReplaceAll call could corrupt a namespace that
  contains a literal %1F.

* fix: Improve test robustness and reduce code duplication

- fix: Make DuckDB test more robust by failing on unexpected errors
  Instead of silently logging errors, now explicitly check for expected
  conditions (extension not available) and skip the test appropriately.

- fix: Extract username helper method to reduce duplication
  Created getUsername() helper in AdminHandlers to avoid duplicating
  the username retrieval logic across Iceberg page handlers.

* fix: Add mutex protection to table UUID cache

Protects concurrent access to the tableUUIDs map with sync.RWMutex.
Uses read-lock for fast path when UUID already cached, and write-lock
for generating new UUIDs. Includes double-check pattern to handle race
condition between read-unlock and write-lock.

* style: fix go fmt errors

* feat(iceberg): persist table UUID in S3 Tables metadata

* feat(admin): configure Iceberg port in Admin UI and commands

* refactor: address review comments (flags, tests, handlers)

- command/mini: fix tracking of explicit s3.port.iceberg flag
- command/admin: add explicit -iceberg.port flag
- admin/handlers: reuse getUsername helper
- tests: use 127.0.0.1 for ephemeral ports and os.Stat for file size check

* test: check error from FileStat in verify_gc_empty_test
Chris Lu
2026-02-02 23:12:13 -08:00
committed by GitHub
parent 330bd92ddc
commit 2bb21ea276
59 changed files with 3436 additions and 818 deletions

@@ -104,11 +104,12 @@ type AdminServer struct {
collectionStatsCacheThreshold time.Duration
s3TablesManager *s3tables.Manager
icebergPort int
}
// Type definitions moved to types.go
-func NewAdminServer(masters string, templateFS http.FileSystem, dataDir string) *AdminServer {
+func NewAdminServer(masters string, templateFS http.FileSystem, dataDir string, icebergPort int) *AdminServer {
grpcDialOption := security.LoadClientTLS(util.GetViper(), "grpc.admin")
// Create master client with multiple master support
@@ -136,6 +137,7 @@ func NewAdminServer(masters string, templateFS http.FileSystem, dataDir string)
configPersistence: NewConfigPersistence(dataDir),
collectionStatsCacheThreshold: 30 * time.Second,
s3TablesManager: newS3TablesManager(),
icebergPort: icebergPort,
}
// Initialize topic retention purger

@@ -166,6 +166,85 @@ func (s *AdminServer) GetS3TablesTablesData(ctx context.Context, bucketArn, name
}, nil
}
// Iceberg Catalog data providers
// GetIcebergCatalogData returns the Iceberg catalog overview data.
// Each S3 Table Bucket is exposed as an Iceberg catalog.
func (s *AdminServer) GetIcebergCatalogData(ctx context.Context) (IcebergCatalogData, error) {
bucketsData, err := s.GetS3TablesBucketsData(ctx)
if err != nil {
return IcebergCatalogData{}, err
}
catalogs := make([]IcebergCatalogInfo, 0, len(bucketsData.Buckets))
for _, bucket := range bucketsData.Buckets {
catalogs = append(catalogs, IcebergCatalogInfo{
Name: bucket.Name,
ARN: bucket.ARN,
OwnerAccountID: bucket.OwnerAccountID,
CreatedAt: bucket.CreatedAt,
})
}
return IcebergCatalogData{
Catalogs: catalogs,
TotalCatalogs: len(catalogs),
IcebergPort: s.icebergPort, // Use the port passed to AdminServer
LastUpdated: time.Now(),
}, nil
}
// GetIcebergNamespacesData returns namespaces for an Iceberg catalog.
func (s *AdminServer) GetIcebergNamespacesData(ctx context.Context, catalogName, bucketArn string) (IcebergNamespacesData, error) {
nsData, err := s.GetS3TablesNamespacesData(ctx, bucketArn)
if err != nil {
return IcebergNamespacesData{}, err
}
namespaces := make([]IcebergNamespaceInfo, 0, len(nsData.Namespaces))
for _, ns := range nsData.Namespaces {
name := ""
if len(ns.Namespace) > 0 {
name = strings.Join(ns.Namespace, ".")
}
namespaces = append(namespaces, IcebergNamespaceInfo{
Name: name,
CreatedAt: ns.CreatedAt,
})
}
return IcebergNamespacesData{
CatalogName: catalogName,
Namespaces: namespaces,
TotalNamespaces: len(namespaces),
LastUpdated: time.Now(),
}, nil
}
// GetIcebergTablesData returns tables for an Iceberg namespace.
func (s *AdminServer) GetIcebergTablesData(ctx context.Context, catalogName, bucketArn, namespace string) (IcebergTablesData, error) {
tablesData, err := s.GetS3TablesTablesData(ctx, bucketArn, namespace)
if err != nil {
return IcebergTablesData{}, err
}
tables := make([]IcebergTableInfo, 0, len(tablesData.Tables))
for _, t := range tablesData.Tables {
tables = append(tables, IcebergTableInfo{
Name: t.Name,
CreatedAt: t.CreatedAt,
})
}
return IcebergTablesData{
CatalogName: catalogName,
NamespaceName: namespace,
Tables: tables,
TotalTables: len(tables),
LastUpdated: time.Now(),
}, nil
}
// API handlers
func (s *AdminServer) ListS3TablesBucketsAPI(c *gin.Context) {

@@ -596,3 +596,46 @@ type STSConfigData struct {
Providers []string `json:"providers,omitempty"`
LastUpdated time.Time `json:"last_updated"`
}
// Iceberg Catalog types
type IcebergCatalogInfo struct {
Name string `json:"name"`
ARN string `json:"arn"`
OwnerAccountID string `json:"owner_account_id"`
CreatedAt time.Time `json:"created_at"`
}
type IcebergCatalogData struct {
Username string `json:"username"`
Catalogs []IcebergCatalogInfo `json:"catalogs"`
TotalCatalogs int `json:"total_catalogs"`
IcebergPort int `json:"iceberg_port"`
LastUpdated time.Time `json:"last_updated"`
}
type IcebergNamespaceInfo struct {
Name string `json:"name"`
CreatedAt time.Time `json:"created_at"`
}
type IcebergNamespacesData struct {
Username string `json:"username"`
CatalogName string `json:"catalog_name"`
Namespaces []IcebergNamespaceInfo `json:"namespaces"`
TotalNamespaces int `json:"total_namespaces"`
LastUpdated time.Time `json:"last_updated"`
}
type IcebergTableInfo struct {
Name string `json:"name"`
CreatedAt time.Time `json:"created_at"`
}
type IcebergTablesData struct {
Username string `json:"username"`
CatalogName string `json:"catalog_name"`
NamespaceName string `json:"namespace_name"`
Tables []IcebergTableInfo `json:"tables"`
TotalTables int `json:"total_tables"`
LastUpdated time.Time `json:"last_updated"`
}