s3tables: redesign Iceberg REST Catalog using iceberg-go and automate integration tests (#8197)
* full integration with iceberg-go

* Table Commit Operations (handleUpdateTable)

* s3tables: fix Iceberg v2 compliance and namespace properties

This commit ensures SeaweedFS Iceberg REST Catalog is compliant with Iceberg Format Version 2 by:
- Using iceberg-go's table.NewMetadataWithUUID for strict v2 compliance.
- Explicitly initializing namespace properties to empty maps.
- Removing omitempty from required Iceberg response fields.
- Fixing CommitTableRequest unmarshaling using table.Requirements and table.Updates.

* s3tables: automate Iceberg integration tests

- Added Makefile for local test execution and cluster management.
- Added docker-compose for PyIceberg compatibility kit.
- Added Go integration test harness for PyIceberg.
- Updated GitHub CI to run Iceberg catalog tests automatically.

* s3tables: update PyIceberg test suite for compatibility

- Updated test_rest_catalog.py to use latest PyIceberg transaction APIs.
- Updated Dockerfile to include pyarrow and pandas dependencies.
- Improved namespace and table handling in integration tests.

* s3tables: address review feedback on Iceberg Catalog

- Implemented robust metadata version parsing and incrementing.
- Ensured table metadata changes are persisted during commit (handleUpdateTable).
- Standardized namespace property initialization for consistency.
- Fixed unused variable and incorrect struct field build errors.

* s3tables: finalize Iceberg REST Catalog and optimize tests

- Implemented robust metadata versioning and persistence.
- Standardized namespace property initialization.
- Optimized integration tests using pre-built Docker image.
- Added strict property persistence validation to test suite.
- Fixed build errors from previous partial updates.

* Address PR review: fix Table UUID stability, implement S3Tables UpdateTable, and support full metadata persistence individually

* fix: Iceberg catalog stable UUIDs, metadata persistence, and file writing

- Ensure table UUIDs are stable (do not regenerate on load).
- Persist full table metadata (Iceberg JSON) in s3tables extended attributes.
- Add `MetadataVersion` to explicitly track version numbers, replacing regex parsing.
- Implement `saveMetadataFile` to persist metadata JSON files to the Filer on commit.
- Update `CreateTable` and `UpdateTable` handlers to use the new logic.

* test: bind weed mini to 0.0.0.0 in integration tests to fix Docker connectivity

* Iceberg: fix metadata handling in REST catalog

- Add nil guard in createTable
- Fix updateTable to correctly load existing metadata from storage
- Ensure full metadata persistence on updates
- Populate loadTable result with parsed metadata

* S3Tables: add auth checks and fix response fields in UpdateTable

- Add CheckPermissionWithContext to UpdateTable handler
- Include TableARN and MetadataLocation in UpdateTable response
- Use ErrCodeConflict (409) for version token mismatches

* Tests: improve Iceberg catalog test infrastructure and cleanup

- Makefile: use PID file for precise process killing
- test_rest_catalog.py: remove unused variables and fix f-strings

* Iceberg: fix variable shadowing in UpdateTable

- Rename inner loop variable `req` to `requirement` to avoid shadowing outer request variable

* S3Tables: simplify MetadataVersion initialization

- Use `max(req.MetadataVersion, 1)` instead of anonymous function

* Tests: remove unicode characters from S3 tables integration test logs

- Remove unicode checkmarks from test output for cleaner logs

* Iceberg: improve metadata persistence robustness

- Fix MetadataLocation in LoadTableResult to fallback to generated location
- Improve saveMetadataFile to ensure directory hierarchy existence and robust error handling
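The commit message describes explicit metadata version tracking, stable table UUIDs, and a 409 conflict on version token mismatch. The following is a minimal Python sketch of that commit flow, not the SeaweedFS Go implementation: `TableRecord`, `ConflictError`, `commit`, and the `v<N>.metadata.json` file naming are all hypothetical names chosen for illustration.

```python
import uuid
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Hypothetical stand-in for an HTTP 409 Conflict response."""


@dataclass
class TableRecord:
    """Hypothetical in-memory model of a stored table entry."""
    location: str
    # Generated once at creation and never regenerated on load.
    table_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))
    metadata_version: int = 1
    version_token: str = field(default_factory=lambda: uuid.uuid4().hex)
    properties: dict = field(default_factory=dict)

    def metadata_location(self) -> str:
        # Assumed naming scheme; the real catalog may name files differently.
        base = self.location.rstrip("/")
        return f"{base}/metadata/v{self.metadata_version}.metadata.json"


def commit(record: TableRecord, expected_token: str, new_properties: dict) -> str:
    """Apply a property update if the caller holds the current version token."""
    if expected_token != record.version_token:
        raise ConflictError("version token mismatch")
    record.properties.update(new_properties)
    # Treat a missing or zero version as 1 before incrementing.
    record.metadata_version = max(record.metadata_version, 1) + 1
    record.version_token = uuid.uuid4().hex
    return record.metadata_location()
```

A caller that retries with a stale token gets `ConflictError` and has to reload the table before committing again, which is the behavior a client such as PyIceberg expects from a 409 response.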
13
test/s3tables/catalog/Dockerfile.pyiceberg
Normal file
@@ -0,0 +1,13 @@
# PyIceberg test container for Iceberg REST Catalog compatibility testing
FROM python:3.11-slim

WORKDIR /app

# Install PyIceberg with S3 support and dependencies
RUN pip install --no-cache-dir "pyiceberg[s3fs]" pyarrow pandas

# Copy the test script
COPY test_rest_catalog.py /app/

# Default command
CMD ["python3", "/app/test_rest_catalog.py", "--help"]
37
test/s3tables/catalog/Makefile
Normal file
@@ -0,0 +1,37 @@
# Makefile for Iceberg REST Catalog Integration Tests
#
# This Makefile provides easy-to-use targets for running integration tests
# and starting a local development environment.

.PHONY: build mini test test-pyiceberg clean

# Root directory of the SeaweedFS project
PROJECT_ROOT = ../../..
WEED_BIN = $(PROJECT_ROOT)/weed/weed

# Build the weed binary
build:
	@echo "Building SeaweedFS binary..."
	cd $(PROJECT_ROOT)/weed && go build -v

# Start SeaweedFS mini cluster for manual testing
# Uses default ports: S3=8333, Iceberg=8182
mini: build
	@echo "Starting SeaweedFS mini cluster..."
	$(WEED_BIN) mini -s3.port.iceberg=8182 & echo $$! > weed-mini.pid; wait

# Run all integration tests in this directory
test: build
	@echo "Running all Iceberg catalog integration tests..."
	go test -v .

# Run only the PyIceberg compatibility tests (requires Docker)
test-pyiceberg: build
	@echo "Running PyIceberg compatibility tests..."
	go test -v -run TestPyIcebergRestCatalog .

# Clean up temporary data and stop weed mini if running
clean:
	@echo "Cleaning up..."
	@test -f weed-mini.pid && kill $$(cat weed-mini.pid) && rm weed-mini.pid || true
	@rm -rf /tmp/seaweed-iceberg-test-*
33
test/s3tables/catalog/docker-compose.test.yaml
Normal file
@@ -0,0 +1,33 @@
# Iceberg REST Catalog Compatibility Test Docker Compose
#
# This compose file sets up a test environment for validating
# the SeaweedFS Iceberg REST Catalog implementation.
#
# Usage:
#   docker compose -f docker-compose.test.yaml up --build
#
# Note: SeaweedFS must be running on the host with the Iceberg REST port exposed.
#       Set the CATALOG_URL environment variable to point to your catalog.

services:
  # PyIceberg-based REST catalog test
  pyiceberg-test:
    build:
      context: .
      dockerfile: Dockerfile.pyiceberg
    environment:
      - CATALOG_URL=${CATALOG_URL:-http://host.docker.internal:8182}
      - WAREHOUSE=${WAREHOUSE:-s3://test-bucket/}
      - PREFIX=${PREFIX:-pyiceberg-test}
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID:-test}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY:-test}
      - AWS_REGION=${AWS_REGION:-us-east-1}
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - ./test_rest_catalog.py:/app/test_rest_catalog.py:ro
    command: >
      python3 /app/test_rest_catalog.py
      --catalog-url ${CATALOG_URL:-http://host.docker.internal:8182}
      --warehouse ${WAREHOUSE:-s3://test-bucket/}
      --prefix ${PREFIX:-pyiceberg-test}
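The compose file relies on `${VAR:-default}` interpolation, which substitutes the default when the variable is unset or empty. A rough Python sketch of how the container command resolves is shown here; `resolve` and `build_command` are hypothetical helpers written for illustration, and the defaults are copied from the compose file.

```python
import os

# Defaults copied from docker-compose.test.yaml.
DEFAULTS = {
    "CATALOG_URL": "http://host.docker.internal:8182",
    "WAREHOUSE": "s3://test-bucket/",
    "PREFIX": "pyiceberg-test",
}


def resolve(name, env=None):
    """Mimic ${NAME:-default}: use the default when unset or empty."""
    env = os.environ if env is None else env
    value = env.get(name)
    return value if value else DEFAULTS[name]


def build_command(env=None):
    """Assemble the argument list the container would run."""
    return [
        "python3", "/app/test_rest_catalog.py",
        "--catalog-url", resolve("CATALOG_URL", env),
        "--warehouse", resolve("WAREHOUSE", env),
        "--prefix", resolve("PREFIX", env),
    ]
```

For example, `build_command({"CATALOG_URL": "http://localhost:8182"})` overrides only the catalog URL and keeps the default warehouse and prefix.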
@@ -171,6 +171,7 @@ func (env *TestEnvironment) StartSeaweedFS(t *testing.T) {
		"-s3.port", fmt.Sprintf("%d", env.s3Port),
		"-s3.port.grpc", fmt.Sprintf("%d", env.s3GrpcPort),
		"-s3.port.iceberg", fmt.Sprintf("%d", env.icebergPort),
		"-ip.bind", "0.0.0.0",
		"-dir", env.dataDir,
	)
	cmd.Stdout = os.Stdout
80
test/s3tables/catalog/pyiceberg_test.go
Normal file
@@ -0,0 +1,80 @@
// Package catalog provides integration tests for the Iceberg REST Catalog API.
// This file adds PyIceberg-based compatibility tests using Docker.
package catalog

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"testing"
)

// TestPyIcebergRestCatalog tests the Iceberg REST Catalog using PyIceberg client in Docker.
// This provides a more comprehensive test than DuckDB as PyIceberg fully exercises the REST API.
//
// Prerequisites:
//   - Docker must be available
//   - SeaweedFS must be running with Iceberg REST enabled
//
// To run manually:
//
//	cd test/s3tables/catalog
//	docker compose -f docker-compose.test.yaml up --build
func TestPyIcebergRestCatalog(t *testing.T) {
	if testing.Short() {
		t.Skip("Skipping integration test in short mode")
	}

	env := NewTestEnvironment(t)
	defer env.Cleanup(t)

	if !env.dockerAvailable {
		t.Skip("Docker not available, skipping PyIceberg integration test")
	}

	env.StartSeaweedFS(t)

	// Create the test bucket first
	bucketName := "pyiceberg-compat-test"
	createTableBucket(t, env, bucketName)

	// Build the test working directory path
	testDir := filepath.Join(env.seaweedDir, "test", "s3tables", "catalog")

	// Run PyIceberg test using Docker
	catalogURL := fmt.Sprintf("http://host.docker.internal:%d", env.icebergPort)
	s3Endpoint := fmt.Sprintf("http://host.docker.internal:%d", env.s3Port)
	warehouse := fmt.Sprintf("s3://%s/", bucketName)

	// Build the test image first for faster repeated runs
	buildCmd := exec.Command("docker", "build", "-t", "iceberg-rest-test", "-f", "Dockerfile.pyiceberg", ".")
	buildCmd.Dir = testDir
	if out, err := buildCmd.CombinedOutput(); err != nil {
		t.Fatalf("Failed to build test image: %v\n%s", err, string(out))
	}

	cmd := exec.Command("docker", "run", "--rm",
		"--add-host", "host.docker.internal:host-gateway",
		"-e", fmt.Sprintf("AWS_ACCESS_KEY_ID=%s", "test"),
		"-e", fmt.Sprintf("AWS_SECRET_ACCESS_KEY=%s", "test"),
		"-e", fmt.Sprintf("AWS_ENDPOINT_URL=%s", s3Endpoint),
		"-v", fmt.Sprintf("%s:/app:ro", testDir),
		"iceberg-rest-test",
		"python3", "/app/test_rest_catalog.py",
		"--catalog-url", catalogURL,
		"--warehouse", warehouse,
		"--prefix", bucketName,
	)
	cmd.Dir = testDir
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	t.Logf("Running PyIceberg REST catalog test...")
	t.Logf(" Catalog URL: %s", catalogURL)
	t.Logf(" Warehouse: %s", warehouse)

	if err := cmd.Run(); err != nil {
		t.Errorf("PyIceberg test failed: %v", err)
	}
}
236
test/s3tables/catalog/test_rest_catalog.py
Normal file
@@ -0,0 +1,236 @@
#!/usr/bin/env python3
"""
Iceberg REST Catalog Compatibility Test for SeaweedFS

This script tests the Iceberg REST Catalog API compatibility of the
SeaweedFS Iceberg REST Catalog implementation.

Usage:
    python3 test_rest_catalog.py --catalog-url http://localhost:8182 --prefix pyiceberg-test

Requirements:
    pip install pyiceberg[s3fs]
"""

import argparse
import sys
from pyiceberg.catalog import load_catalog
from pyiceberg.schema import Schema
from pyiceberg.types import (
    IntegerType,
    LongType,
    StringType,
    NestedField,
)
from pyiceberg.exceptions import (
    NamespaceAlreadyExistsError,
    NoSuchNamespaceError,
    TableAlreadyExistsError,
    NoSuchTableError,
)


def test_config_endpoint(catalog):
    """Test that the catalog config endpoint returns valid configuration."""
    print("Testing /v1/config endpoint...")
    # The catalog is already loaded which means config endpoint worked
    print(" /v1/config endpoint working")
    return True


def test_namespace_operations(catalog, prefix):
    """Test namespace CRUD operations."""
    print("Testing namespace operations...")
    namespace = (f"{prefix.replace('-', '_')}_test_ns",)

    # List initial namespaces
    namespaces = catalog.list_namespaces()
    print(f" Initial namespaces: {namespaces}")

    # Create namespace
    try:
        catalog.create_namespace(namespace)
        print(f" Created namespace: {namespace}")
    except NamespaceAlreadyExistsError:
        print(f" ! Namespace already exists: {namespace}")

    # List namespaces (should include our new one)
    namespaces = catalog.list_namespaces()
    if namespace in namespaces:
        print(" Namespace appears in list")
    else:
        print(f" Namespace not found in list: {namespaces}")
        return False

    # Get namespace properties
    try:
        props = catalog.load_namespace_properties(namespace)
        print(f" Loaded namespace properties: {props}")
    except NoSuchNamespaceError:
        print(" Failed to load namespace properties")
        return False

    return True


def test_table_operations(catalog, prefix):
    """Test table CRUD operations."""
    print("Testing table operations...")
    namespace = (f"{prefix.replace('-', '_')}_test_ns",)
    table_name = "test_table"
    table_id = namespace + (table_name,)

    # Define a simple schema
    schema = Schema(
        NestedField(field_id=1, name="id", field_type=LongType(), required=True),
        NestedField(field_id=2, name="name", field_type=StringType(), required=False),
        NestedField(field_id=3, name="age", field_type=IntegerType(), required=False),
    )

    # Create table
    try:
        table = catalog.create_table(
            identifier=table_id,
            schema=schema,
        )
        print(f" Created table: {table_id}")
    except TableAlreadyExistsError:
        print(f" ! Table already exists: {table_id}")
        _ = catalog.load_table(table_id)

    # List tables
    tables = catalog.list_tables(namespace)
    if table_name in [t[1] for t in tables]:
        print(" Table appears in list")
    else:
        print(f" Table not found in list: {tables}")
        return False

    # Load table
    try:
        loaded_table = catalog.load_table(table_id)
        print(f" Loaded table: {loaded_table.name()}")
        print(f" Schema: {loaded_table.schema()}")
        print(f" Location: {loaded_table.location()}")
    except NoSuchTableError:
        print(" Failed to load table")
        return False

    return True


def test_table_update(catalog, prefix):
    """Test table update/commit operations."""
    print("Testing table update operations...")
    namespace = (f"{prefix.replace('-', '_')}_test_ns",)
    table_name = "test_table"
    table_id = namespace + (table_name,)

    try:
        table = catalog.load_table(table_id)

        # Update table properties
        with table.transaction() as transaction:
            transaction.set_properties({"test.property": "test.value"})

        print(" Updated table properties")

        # Reload and verify
        table = catalog.load_table(table_id)
        if table.properties.get("test.property") == "test.value":
            print(" Property update verified")
        else:
            print(" ! Property update failed or not persisted")
            return False

    except Exception as e:
        print(f" Table update failed: {e}")
        return False

    return True


def test_cleanup(catalog, prefix):
    """Test table and namespace deletion."""
    print("Testing cleanup operations...")
    namespace = (f"{prefix.replace('-', '_')}_test_ns",)
    table_id = namespace + ("test_table",)

    # Drop table
    try:
        catalog.drop_table(table_id)
        print(f" Dropped table: {table_id}")
    except NoSuchTableError:
        print(f" ! Table already deleted: {table_id}")

    # Drop namespace
    try:
        catalog.drop_namespace(namespace)
        print(f" Dropped namespace: {namespace}")
    except NoSuchNamespaceError:
        print(f" ! Namespace already deleted: {namespace}")
    except Exception as e:
        print(f" ? Namespace drop error (may be expected): {e}")

    return True


def main():
    parser = argparse.ArgumentParser(description="Test Iceberg REST Catalog compatibility")
    parser.add_argument("--catalog-url", required=True, help="Iceberg REST Catalog URL (e.g., http://localhost:8182)")
    parser.add_argument("--warehouse", default="s3://iceberg-test/", help="Warehouse location")
    parser.add_argument("--prefix", required=True, help="Table bucket prefix")
    parser.add_argument("--skip-cleanup", action="store_true", help="Skip cleanup at the end")
    args = parser.parse_args()

    print(f"Connecting to Iceberg REST Catalog at: {args.catalog_url}")
    print(f"Warehouse: {args.warehouse}")
    print(f"Prefix: {args.prefix}")
    print()

    # Load the REST catalog
    catalog = load_catalog(
        "rest",
        **{
            "type": "rest",
            "uri": args.catalog_url,
            "warehouse": args.warehouse,
            "prefix": args.prefix,
        }
    )

    # Run tests
    tests = [
        ("Config Endpoint", lambda: test_config_endpoint(catalog)),
        ("Namespace Operations", lambda: test_namespace_operations(catalog, args.prefix)),
        ("Table Operations", lambda: test_table_operations(catalog, args.prefix)),
        ("Table Update", lambda: test_table_update(catalog, args.prefix)),
    ]

    if not args.skip_cleanup:
        tests.append(("Cleanup", lambda: test_cleanup(catalog, args.prefix)))

    passed = 0
    failed = 0

    for name, test_fn in tests:
        print(f"\n{'='*50}")
        try:
            if test_fn():
                passed += 1
                print(f"PASSED: {name}")
            else:
                failed += 1
                print(f"FAILED: {name}")
        except Exception as e:
            failed += 1
            print(f"ERROR in {name}: {e}")

    print(f"\n{'='*50}")
    print(f"Results: {passed} passed, {failed} failed")

    return 0 if failed == 0 else 1


if __name__ == "__main__":
    sys.exit(main())
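Each test function in the script rebuilds the namespace tuple from the prefix by replacing hyphens with underscores and appending `_test_ns`. One possible way to centralize that derivation is sketched below; `namespace_for` and `table_identifier` are hypothetical helpers and are not part of this commit.

```python
def namespace_for(prefix):
    """Derive the single-level test namespace tuple from a bucket prefix."""
    # Hyphens are replaced with underscores, mirroring the expression
    # repeated in the test functions of test_rest_catalog.py.
    return (f"{prefix.replace('-', '_')}_test_ns",)


def table_identifier(prefix, table_name="test_table"):
    """Build the (namespace, table) identifier tuple used by the tests."""
    return namespace_for(prefix) + (table_name,)
```

With the bucket name used by the Go harness, `table_identifier("pyiceberg-compat-test")` yields `("pyiceberg_compat_test_test_ns", "test_table")`.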