* Add Trino blog operations test

* Update test/s3tables/catalog_trino/trino_blog_operations_test.go

  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* feat: add table bucket path helpers and filer operations
  - Add table object root and table location mapping directories
  - Implement ensureDirectory, upsertFile, deleteEntryIfExists helpers
  - Support table location bucket mapping for S3 access

* feat: manage table bucket object roots on creation/deletion
  - Create .objects directory for table buckets on creation
  - Clean up table object bucket paths on deletion
  - Enable S3 operations on table bucket object roots

* feat: add table location mapping for Iceberg REST
  - Track table location bucket mappings when tables are created/updated/deleted
  - Enable location-based routing for S3 operations on table data

* feat: route S3 operations to table bucket object roots
  - Route table-s3 bucket names to mapped table paths
  - Route table buckets to object root directories
  - Support table location bucket mapping lookup

* feat: emit table-s3 locations from Iceberg REST
  - Generate unique table-s3 bucket names with UUID suffix
  - Store table metadata under table bucket paths
  - Return table-s3 locations for Trino compatibility

* fix: handle missing directories in S3 list operations
  - Propagate ErrNotFound from ListEntries for non-existent directories
  - Treat missing directories as empty results for list operations
  - Fixes Trino non-empty location checks on table creation

* test: improve Trino CSV parsing for single-value results
  - Sanitize Trino output to skip jline warnings
  - Handle single-value CSV results without header rows
  - Strip quotes from numeric values in tests

* refactor: use bucket path helpers throughout S3 API
  - Replace direct bucket path operations with helper functions
  - Leverage centralized table bucket routing logic
  - Improve maintainability with consistent path resolution

* fix: add table bucket cache and improve filer error handling
  - Cache table bucket lookups to reduce filer overhead on repeated checks
  - Use filer_pb.CreateEntry and filer_pb.UpdateEntry helpers to check resp.Error
  - Fix delete order in handler_bucket_get_list_delete: delete table object before directory
  - Make location mapping errors best-effort: log and continue, don't fail API
  - Update table location mappings to delete stale prior bucket mappings on update
  - Add 1-second sleep before timestamp time travel query to ensure timestamps are in past
  - Fix CSV parsing: examine all lines, not skip first; handle single-value rows

* fix: properly handle stale metadata location mapping cleanup
  - Capture oldMetadataLocation before mutation in handleUpdateTable
  - Update updateTableLocationMapping to accept both old and new locations
  - Use passed-in oldMetadataLocation to detect location changes
  - Delete stale mapping only when location actually changes
  - Pass empty string for oldLocation in handleCreateTable (new tables have no prior mapping)
  - Improve logging to show old -> new location transitions

* refactor: cleanup imports and cache design
  - Remove unused 'sync' import from bucket_paths.go
  - Use filer_pb.UpdateEntry helper in setExtendedAttribute and deleteExtendedAttribute for consistent error handling
  - Add dedicated tableBucketCache map[string]bool to BucketRegistry instead of mixing concerns with metadataCache
  - Improve cache separation: table buckets cache is now separate from bucket metadata cache

* fix: improve cache invalidation and add transient error handling

  Cache invalidation (critical fix):
  - Add tableLocationCache to BucketRegistry for location mapping lookups
  - Clear tableBucketCache and tableLocationCache in RemoveBucketMetadata
  - Prevents stale cache entries when buckets are deleted/recreated

  Transient error handling:
  - Only cache table bucket lookups when conclusive (found or ErrNotFound)
  - Skip caching on transient errors (network, permission, etc)
  - Prevents marking real table buckets as non-table due to transient failures

  Performance optimization:
  - Cache tableLocationDir results to avoid repeated filer RPCs on hot paths
  - tableLocationDir now checks cache before making expensive filer lookups
  - Cache stores empty string for 'not found' to avoid redundant lookups

  Code clarity:
  - Add comment to deleteDirectory explaining DeleteEntry response lacks Error field

* go fmt

* fix: mirror transient error handling in tableLocationDir and optimize bucketDir

  Transient error handling:
  - tableLocationDir now only caches definitive results
  - Mirrors isTableBucket behavior to prevent treating transient errors as permanent misses
  - Improves reliability on flaky systems or during recovery

  Performance optimization:
  - bucketDir avoids redundant isTableBucket call via bucketRoot
  - Directly use s3a.option.BucketsPath for regular buckets
  - Saves one cache lookup for every non-table bucket operation

* fix: revert bucketDir optimization to preserve bucketRoot logic

  The optimization to directly use BucketsPath bypassed bucketRoot's logic and
  caused issues with S3 list operations on delimiter+prefix cases. Revert to
  using path.Join(s3a.bucketRoot(bucket), bucket) which properly handles all
  bucket types and ensures consistent path resolution across the codebase.
  The slight performance cost of an extra cache lookup is worth the correctness
  and consistency benefits.

* feat: move table buckets under /buckets

  Add a table-bucket marker attribute, reuse bucket metadata cache for table
  bucket detection, and update list/validation/UI/test paths to treat table
  buckets as /buckets entries.

* Fix S3 Tables code review issues
  - handler_bucket_create.go: Fix bucket existence check to properly validate entryResp.Entry before setting s3BucketExists flag (nil Entry should not indicate existing bucket)
  - bucket_paths.go: Add clarifying comment to bucketRoot() explaining unified buckets root path for all bucket types
  - file_browser_data.go: Optimize by extracting table bucket check early to avoid redundant WithFilerClient call

* Fix list prefix delimiter handling

* Handle list errors conservatively

* Fix Trino FOR TIMESTAMP query - use past timestamp

  Iceberg requires the timestamp to be strictly in the past. Use
  current_timestamp - interval '1' second instead of current_timestamp.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
436 lines
14 KiB
Go
package s3tables

import (
	"context"
	"encoding/json"
	"errors"
	pathpkg "path"
	"regexp"
	"strings"

	"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
)

// Iceberg file layout validation
// Apache Iceberg tables follow a specific file layout structure:
// - metadata/ directory containing metadata files (*.json, *.avro)
// - data/ directory containing data files (*.parquet, *.orc, *.avro)
//
// Valid file patterns include:
// - metadata/v*.metadata.json (table metadata)
// - metadata/snap-*.avro (snapshot manifest lists)
// - metadata/*.avro (manifest files)
// - data/*.parquet, data/*.orc, data/*.avro (data files)

const uuidPattern = `[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}`

var (
	// Allowed directories in an Iceberg table
	icebergAllowedDirs = map[string]bool{
		"metadata": true,
		"data":     true,
	}

	// Patterns for valid metadata files
	metadataFilePatterns = []*regexp.Regexp{
		regexp.MustCompile(`^v\d+\.metadata\.json$`), // Table metadata: v1.metadata.json, v2.metadata.json
		regexp.MustCompile(`^snap-\d+-\d+-` + uuidPattern + `\.avro$`), // Snapshot manifests: snap-123-1-uuid.avro
		regexp.MustCompile(`^` + uuidPattern + `-m\d+\.avro$`), // Manifest files: uuid-m0.avro
		regexp.MustCompile(`^` + uuidPattern + `\.avro$`), // General manifest files
		regexp.MustCompile(`^version-hint\.text$`), // Version hint file
		regexp.MustCompile(`^` + uuidPattern + `\.metadata\.json$`), // UUID-named metadata
	}

	// Patterns for valid data files
	dataFilePatterns = []*regexp.Regexp{
		regexp.MustCompile(`^[^/]+\.parquet$`), // Parquet files
		regexp.MustCompile(`^[^/]+\.orc$`), // ORC files
		regexp.MustCompile(`^[^/]+\.avro$`), // Avro files
	}

	// Data file partition path pattern (e.g., year=2024/month=01/)
	partitionPathPattern = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*=[^/]+$`)

	// Pattern for valid subdirectory names (alphanumeric, underscore, hyphen, and UUID-style directories)
	validSubdirectoryPattern = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)
)

// IcebergLayoutValidator validates that files conform to Iceberg table layout
type IcebergLayoutValidator struct{}

// NewIcebergLayoutValidator creates a new Iceberg layout validator
func NewIcebergLayoutValidator() *IcebergLayoutValidator {
	return &IcebergLayoutValidator{}
}

// ValidateFilePath validates that a file path conforms to Iceberg layout
// The path should be relative to the table root (e.g., "metadata/v1.metadata.json" or "data/file.parquet")
func (v *IcebergLayoutValidator) ValidateFilePath(relativePath string) error {
	// Strip a single leading slash so the path is relative to the table root
	relativePath = strings.TrimPrefix(relativePath, "/")
	if relativePath == "" {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "empty file path",
		}
	}

	parts := strings.SplitN(relativePath, "/", 2)

	topDir := parts[0]

	// Check if top-level directory is allowed
	if !icebergAllowedDirs[topDir] {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "files must be placed in 'metadata/' or 'data/' directories",
		}
	}

	// If it's just a bare top-level key (no trailing slash and no subpath), reject it
	if len(parts) == 1 {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "must be a directory (use trailing slash) or contain a subpath",
		}
	}

	remainingPath := parts[1]
	if remainingPath == "" {
		return nil // allow paths like "data/" or "metadata/"
	}

	switch topDir {
	case "metadata":
		return v.validateMetadataFile(remainingPath)
	case "data":
		return v.validateDataFile(remainingPath)
	}

	return nil
}

// validateDirectoryPath validates intermediate subdirectories in a path
// isMetadata indicates if we're in the metadata directory (true) or data directory (false)
func validateDirectoryPath(normalizedPath string, isMetadata bool) error {
	if isMetadata {
		// For metadata, reject any subdirectories (enforce flat structure under metadata/)
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "metadata directory does not support subdirectories",
		}
	}

	// For data, validate each partition or subdirectory segment
	subdirs := strings.Split(normalizedPath, "/")
	for _, subdir := range subdirs {
		if subdir == "" {
			return &IcebergLayoutError{
				Code:    ErrCodeInvalidIcebergLayout,
				Message: "invalid partition or subdirectory format in data path: empty segment",
			}
		}
		// For data, allow both partitions and valid subdirectories
		if !partitionPathPattern.MatchString(subdir) && !isValidSubdirectory(subdir) {
			return &IcebergLayoutError{
				Code:    ErrCodeInvalidIcebergLayout,
				Message: "invalid partition or subdirectory format in data path",
			}
		}
	}
	return nil
}

// validateFilePatterns validates a filename against allowed patterns
// isMetadata indicates if we're validating metadata files (true) or data files (false)
func validateFilePatterns(filename string, isMetadata bool) error {
	var patterns []*regexp.Regexp
	var errorMsg string

	if isMetadata {
		patterns = metadataFilePatterns
		errorMsg = "invalid metadata file format: must be a valid Iceberg metadata, manifest, or snapshot file"
	} else {
		patterns = dataFilePatterns
		errorMsg = "invalid data file format: must be .parquet, .orc, or .avro"
	}

	for _, pattern := range patterns {
		if pattern.MatchString(filename) {
			return nil
		}
	}

	return &IcebergLayoutError{
		Code:    ErrCodeInvalidIcebergLayout,
		Message: errorMsg,
	}
}

// validateFile validates files with a unified logic for metadata and data directories
// isMetadata indicates whether we're validating metadata files (true) or data files (false)
// The logic is:
// 1. If path ends with "/", it's a directory - validate all parts and return nil
// 2. Otherwise, validate intermediate parts, then check the filename against patterns
func (v *IcebergLayoutValidator) validateFile(path string, isMetadata bool) error {
	// Detect if it's a directory (path ends with "/")
	if strings.HasSuffix(path, "/") {
		// Normalize by removing trailing slash
		normalizedPath := strings.TrimSuffix(path, "/")
		return validateDirectoryPath(normalizedPath, isMetadata)
	}

	filename := pathpkg.Base(path)

	// Validate intermediate subdirectories if present
	// Find if there are intermediate directories by looking for the last slash
	lastSlash := strings.LastIndex(path, "/")
	if lastSlash != -1 {
		dir := path[:lastSlash]
		if err := validateDirectoryPath(dir, isMetadata); err != nil {
			return err
		}
	}

	// Check against allowed file patterns
	err := validateFilePatterns(filename, isMetadata)
	if err == nil {
		return nil
	}

	// Path could be for a directory without a trailing slash, e.g., "data/year=2024"
	if !isMetadata {
		if partitionPathPattern.MatchString(filename) || isValidSubdirectory(filename) {
			return nil
		}
	}

	return err
}

// validateMetadataFile validates files in the metadata/ directory
// This is a thin wrapper that calls validateFile with isMetadata=true
func (v *IcebergLayoutValidator) validateMetadataFile(path string) error {
	return v.validateFile(path, true)
}

// validateDataFile validates files in the data/ directory
// This is a thin wrapper that calls validateFile with isMetadata=false
func (v *IcebergLayoutValidator) validateDataFile(path string) error {
	return v.validateFile(path, false)
}

// isValidSubdirectory checks if a path component is a valid subdirectory name
func isValidSubdirectory(name string) bool {
	// Allow alphanumeric, underscore, hyphen, and UUID-style directories
	return validSubdirectoryPattern.MatchString(name)
}

// IcebergLayoutError represents an Iceberg layout validation error
type IcebergLayoutError struct {
	Code    string
	Message string
}

func (e *IcebergLayoutError) Error() string {
	return e.Message
}

// Error code for Iceberg layout violations
const (
	ErrCodeInvalidIcebergLayout = "InvalidIcebergLayout"
)

// TableBucketFileValidator validates file uploads to table buckets
type TableBucketFileValidator struct {
	layoutValidator *IcebergLayoutValidator
}

// NewTableBucketFileValidator creates a new table bucket file validator
func NewTableBucketFileValidator() *TableBucketFileValidator {
	return &TableBucketFileValidator{
		layoutValidator: NewIcebergLayoutValidator(),
	}
}

// ValidateTableBucketUpload checks if a file upload to a table bucket conforms to Iceberg layout
// fullPath is the complete filer path (e.g., /buckets/mybucket/mynamespace/mytable/data/file.parquet)
// Returns nil if the path is not a table bucket path or if validation passes
// Returns an error if the file doesn't conform to Iceberg layout
func (v *TableBucketFileValidator) ValidateTableBucketUpload(fullPath string) error {
	// Check if this is a table bucket path
	if !strings.HasPrefix(fullPath, TablesPath+"/") {
		return nil // Not a table bucket, no validation needed
	}

	// Extract the path relative to table bucket root
	// Format: /buckets/{bucket}/{namespace}/{table}/{relative-path}
	relativePath := strings.TrimPrefix(fullPath, TablesPath+"/")
	parts := strings.SplitN(relativePath, "/", 4)

	// Need at least bucket/namespace/table/file
	if len(parts) < 4 {
		// Creating bucket, namespace, or table directories - allow only if every segment present is non-empty
		for i := 0; i < len(parts); i++ {
			if parts[i] == "" {
				return &IcebergLayoutError{
					Code:    ErrCodeInvalidIcebergLayout,
					Message: "bucket, namespace, and table segments cannot be empty",
				}
			}
		}
		return nil
	}

	// For full paths, also verify bucket, namespace, and table segments are non-empty
	if parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "bucket, namespace, and table segments cannot be empty",
		}
	}

	// The last part is the path within the table (data/file.parquet or metadata/v1.metadata.json)
	tableRelativePath := parts[3]
	if tableRelativePath == "" {
		return nil
	}

	// Reject paths with empty segments (double slashes) within the table path
	if strings.HasPrefix(tableRelativePath, "/") || strings.Contains(tableRelativePath, "//") {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "bucket, namespace, and table segments cannot be empty",
		}
	}

	return v.layoutValidator.ValidateFilePath(tableRelativePath)
}

// IsTableBucketPath checks if a path is under the table buckets directory
func IsTableBucketPath(fullPath string) bool {
	return strings.HasPrefix(fullPath, TablesPath+"/")
}

// GetTableInfoFromPath extracts bucket, namespace, and table names from a table bucket path
// Returns empty strings if the path doesn't contain enough components
func GetTableInfoFromPath(fullPath string) (bucket, namespace, table string) {
	if !strings.HasPrefix(fullPath, TablesPath+"/") {
		return "", "", ""
	}

	relativePath := strings.TrimPrefix(fullPath, TablesPath+"/")
	parts := strings.SplitN(relativePath, "/", 4)

	if len(parts) >= 1 {
		bucket = parts[0]
	}
	if len(parts) >= 2 {
		namespace = parts[1]
	}
	if len(parts) >= 3 {
		table = parts[2]
	}

	return
}

// ValidateTableBucketUploadWithClient validates upload and checks that the table exists and is ICEBERG format
func (v *TableBucketFileValidator) ValidateTableBucketUploadWithClient(
	ctx context.Context,
	client filer_pb.SeaweedFilerClient,
	fullPath string,
) error {
	// If not a table bucket path, nothing more to check
	if !IsTableBucketPath(fullPath) {
		return nil
	}

	// Get table info and verify it exists
	bucket, namespace, table := GetTableInfoFromPath(fullPath)
	if bucket == "" || namespace == "" || table == "" {
		return nil // Not deep enough to need validation
	}

	if strings.HasPrefix(bucket, ".") {
		return nil
	}

	resp, err := filer_pb.LookupEntry(ctx, client, &filer_pb.LookupDirectoryEntryRequest{
		Directory: TablesPath,
		Name:      bucket,
	})
	if err != nil {
		if errors.Is(err, filer_pb.ErrNotFound) {
			return nil
		}
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "failed to verify table bucket: " + err.Error(),
		}
	}
	if resp == nil || !IsTableBucketEntry(resp.Entry) {
		return nil
	}

	// Now check basic layout once we know this is a table bucket path.
	if err := v.ValidateTableBucketUpload(fullPath); err != nil {
		return err
	}

	// Verify the table exists and has ICEBERG format by checking its metadata
	tablePath := GetTablePath(bucket, namespace, table)
	dir, name := splitPath(tablePath)

	resp, err = filer_pb.LookupEntry(ctx, client, &filer_pb.LookupDirectoryEntryRequest{
		Directory: dir,
		Name:      name,
	})
	if err != nil {
		// Distinguish between "not found" and other errors
		if errors.Is(err, filer_pb.ErrNotFound) {
			return &IcebergLayoutError{
				Code:    ErrCodeInvalidIcebergLayout,
				Message: "table does not exist",
			}
		}
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "failed to verify table existence: " + err.Error(),
		}
	}

	// Check if table has metadata indicating ICEBERG format
	if resp.Entry == nil || resp.Entry.Extended == nil {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "table is not a valid ICEBERG table (missing metadata)",
		}
	}

	metadataBytes, ok := resp.Entry.Extended[ExtendedKeyMetadata]
	if !ok {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "table is not in ICEBERG format (missing format metadata)",
		}
	}

	var metadata tableMetadataInternal
	if err := json.Unmarshal(metadataBytes, &metadata); err != nil {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "failed to parse table metadata: " + err.Error(),
		}
	}
	const TableFormatIceberg = "ICEBERG"
	if metadata.Format != TableFormatIceberg {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "table is not in " + TableFormatIceberg + " format",
		}
	}

	return nil
}