Add table operations test (#8241)
* Add Trino blog operations test

* Update test/s3tables/catalog_trino/trino_blog_operations_test.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* feat: add table bucket path helpers and filer operations

- Add table object root and table location mapping directories
- Implement ensureDirectory, upsertFile, deleteEntryIfExists helpers
- Support table location bucket mapping for S3 access

* feat: manage table bucket object roots on creation/deletion

- Create .objects directory for table buckets on creation
- Clean up table object bucket paths on deletion
- Enable S3 operations on table bucket object roots

* feat: add table location mapping for Iceberg REST

- Track table location bucket mappings when tables are created/updated/deleted
- Enable location-based routing for S3 operations on table data

* feat: route S3 operations to table bucket object roots

- Route table-s3 bucket names to mapped table paths
- Route table buckets to object root directories
- Support table location bucket mapping lookup

* feat: emit table-s3 locations from Iceberg REST

- Generate unique table-s3 bucket names with UUID suffix
- Store table metadata under table bucket paths
- Return table-s3 locations for Trino compatibility

* fix: handle missing directories in S3 list operations

- Propagate ErrNotFound from ListEntries for non-existent directories
- Treat missing directories as empty results for list operations
- Fixes Trino non-empty location checks on table creation

* test: improve Trino CSV parsing for single-value results

- Sanitize Trino output to skip jline warnings
- Handle single-value CSV results without header rows
- Strip quotes from numeric values in tests

* refactor: use bucket path helpers throughout S3 API

- Replace direct bucket path operations with helper functions
- Leverage centralized table bucket routing logic
- Improve maintainability with consistent path resolution

* fix: add table bucket cache and improve filer error handling

- Cache table bucket lookups to reduce filer overhead on repeated checks
- Use filer_pb.CreateEntry and filer_pb.UpdateEntry helpers to check resp.Error
- Fix delete order in handler_bucket_get_list_delete: delete the table object before the directory
- Make location mapping errors best-effort: log and continue, don't fail the API
- Update table location mappings to delete stale prior bucket mappings on update
- Add a 1-second sleep before the timestamp time travel query to ensure timestamps are in the past
- Fix CSV parsing: examine all lines instead of skipping the first; handle single-value rows

* fix: properly handle stale metadata location mapping cleanup

- Capture oldMetadataLocation before mutation in handleUpdateTable
- Update updateTableLocationMapping to accept both old and new locations
- Use the passed-in oldMetadataLocation to detect location changes
- Delete the stale mapping only when the location actually changes
- Pass an empty string for oldLocation in handleCreateTable (new tables have no prior mapping)
- Improve logging to show old -> new location transitions

* refactor: cleanup imports and cache design

- Remove unused 'sync' import from bucket_paths.go
- Use the filer_pb.UpdateEntry helper in setExtendedAttribute and deleteExtendedAttribute for consistent error handling
- Add a dedicated tableBucketCache map[string]bool to BucketRegistry instead of mixing concerns with metadataCache
- Improve cache separation: the table buckets cache is now separate from the bucket metadata cache

* fix: improve cache invalidation and add transient error handling

Cache invalidation (critical fix):
- Add tableLocationCache to BucketRegistry for location mapping lookups
- Clear tableBucketCache and tableLocationCache in RemoveBucketMetadata
- Prevents stale cache entries when buckets are deleted/recreated

Transient error handling:
- Only cache table bucket lookups when conclusive (found or ErrNotFound)
- Skip caching on transient errors (network, permission, etc.)
- Prevents marking real table buckets as non-table due to transient failures

Performance optimization:
- Cache tableLocationDir results to avoid repeated filer RPCs on hot paths
- tableLocationDir now checks the cache before making expensive filer lookups
- The cache stores an empty string for 'not found' to avoid redundant lookups

Code clarity:
- Add a comment to deleteDirectory explaining that the DeleteEntry response lacks an Error field

* go fmt

* fix: mirror transient error handling in tableLocationDir and optimize bucketDir

Transient error handling:
- tableLocationDir now only caches definitive results
- Mirrors isTableBucket behavior to prevent treating transient errors as permanent misses
- Improves reliability on flaky systems or during recovery

Performance optimization:
- bucketDir avoids a redundant isTableBucket call via bucketRoot
- Directly use s3a.option.BucketsPath for regular buckets
- Saves one cache lookup for every non-table bucket operation

* fix: revert bucketDir optimization to preserve bucketRoot logic

The optimization to directly use BucketsPath bypassed bucketRoot's logic and caused issues with S3 list operations on delimiter+prefix cases. Revert to using path.Join(s3a.bucketRoot(bucket), bucket), which properly handles all bucket types and ensures consistent path resolution across the codebase. The slight performance cost of an extra cache lookup is worth the correctness and consistency benefits.

* feat: move table buckets under /buckets

Add a table-bucket marker attribute, reuse the bucket metadata cache for table bucket detection, and update list/validation/UI/test paths to treat table buckets as /buckets entries.

* Fix S3 Tables code review issues

- handler_bucket_create.go: Fix the bucket existence check to properly validate entryResp.Entry before setting the s3BucketExists flag (a nil Entry should not indicate an existing bucket)
- bucket_paths.go: Add a clarifying comment to bucketRoot() explaining the unified buckets root path for all bucket types
- file_browser_data.go: Optimize by extracting the table bucket check early to avoid a redundant WithFilerClient call

* Fix list prefix delimiter handling

* Handle list errors conservatively

* Fix Trino FOR TIMESTAMP query - use past timestamp

Iceberg requires the timestamp to be strictly in the past. Use current_timestamp - interval '1' second instead of current_timestamp.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
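The final fix is easiest to see as a query shape. A minimal sketch, assuming hypothetical catalog, schema, and table names; only the FOR TIMESTAMP AS OF expression comes from the commit message:

package main

import "fmt"

func main() {
	// Iceberg rejects time travel to the current instant or the future, so the
	// test anchors the timestamp one second in the past instead of passing
	// current_timestamp directly.
	query := "SELECT count(*) FROM iceberg.test_ns.test_table " +
		"FOR TIMESTAMP AS OF current_timestamp - interval '1' second"
	fmt.Println(query)
}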
@@ -20,7 +20,7 @@ var (
 func (h *S3TablesHandler) createDirectory(ctx context.Context, client filer_pb.SeaweedFilerClient, path string) error {
 	dir, name := splitPath(path)
 	now := time.Now().Unix()
-	_, err := client.CreateEntry(ctx, &filer_pb.CreateEntryRequest{
+	return filer_pb.CreateEntry(ctx, client, &filer_pb.CreateEntryRequest{
 		Directory: dir,
 		Entry: &filer_pb.Entry{
 			Name: name,
@@ -28,13 +28,74 @@ func (h *S3TablesHandler) createDirectory(ctx context.Context, client filer_pb.S
 			Attributes: &filer_pb.FuseAttributes{
 				Mtime:    now,
 				Crtime:   now,
-				FileMode: uint32(0755 | os.ModeDir), // Directory mode
+				FileMode: uint32(0755 | os.ModeDir),
 			},
 		},
 	})
 }
 
+// ensureDirectory ensures a directory exists at the specified path
+func (h *S3TablesHandler) ensureDirectory(ctx context.Context, client filer_pb.SeaweedFilerClient, path string) error {
+	dir, name := splitPath(path)
+	_, err := filer_pb.LookupEntry(ctx, client, &filer_pb.LookupDirectoryEntryRequest{
+		Directory: dir,
+		Name:      name,
+	})
+	if err == nil {
+		return nil
+	}
+	if errors.Is(err, filer_pb.ErrNotFound) {
+		return h.createDirectory(ctx, client, path)
+	}
+	return err
+}
+
+// upsertFile creates or updates a small file with the given content
+func (h *S3TablesHandler) upsertFile(ctx context.Context, client filer_pb.SeaweedFilerClient, path string, data []byte) error {
+	dir, name := splitPath(path)
+	now := time.Now().Unix()
+	resp, err := filer_pb.LookupEntry(ctx, client, &filer_pb.LookupDirectoryEntryRequest{
+		Directory: dir,
+		Name:      name,
+	})
+	if err != nil {
+		if !errors.Is(err, filer_pb.ErrNotFound) {
+			return err
+		}
+		return filer_pb.CreateEntry(ctx, client, &filer_pb.CreateEntryRequest{
+			Directory: dir,
+			Entry: &filer_pb.Entry{
+				Name:    name,
+				Content: data,
+				Attributes: &filer_pb.FuseAttributes{
+					Mtime:    now,
+					Crtime:   now,
+					FileMode: uint32(0644),
+					FileSize: uint64(len(data)),
+				},
+			},
+		})
+	}
+
+	entry := resp.Entry
+	if entry.Attributes == nil {
+		entry.Attributes = &filer_pb.FuseAttributes{}
+	}
+	entry.Attributes.Mtime = now
+	entry.Attributes.FileSize = uint64(len(data))
+	entry.Content = data
+	return filer_pb.UpdateEntry(ctx, client, &filer_pb.UpdateEntryRequest{
+		Directory: dir,
+		Entry:     entry,
+	})
+}
+
+// deleteEntryIfExists removes an entry if it exists, ignoring missing errors
+func (h *S3TablesHandler) deleteEntryIfExists(ctx context.Context, client filer_pb.SeaweedFilerClient, path string) error {
+	dir, name := splitPath(path)
+	return filer_pb.DoRemove(ctx, client, dir, name, true, false, true, false, nil)
+}
+
 // setExtendedAttribute sets an extended attribute on an existing entry
 func (h *S3TablesHandler) setExtendedAttribute(ctx context.Context, client filer_pb.SeaweedFilerClient, path, key string, data []byte) error {
 	dir, name := splitPath(path)
@@ -57,11 +118,10 @@ func (h *S3TablesHandler) setExtendedAttribute(ctx context.Context, client filer
 	entry.Extended[key] = data
 
 	// Save the updated entry
-	_, err = client.UpdateEntry(ctx, &filer_pb.UpdateEntryRequest{
+	return filer_pb.UpdateEntry(ctx, client, &filer_pb.UpdateEntryRequest{
 		Directory: dir,
 		Entry:     entry,
 	})
-	return err
 }
 
 // getExtendedAttribute gets an extended attribute from an entry
@@ -108,14 +168,14 @@ func (h *S3TablesHandler) deleteExtendedAttribute(ctx context.Context, client fi
 	}
 
 	// Save the updated entry
-	_, err = client.UpdateEntry(ctx, &filer_pb.UpdateEntryRequest{
+	return filer_pb.UpdateEntry(ctx, client, &filer_pb.UpdateEntryRequest{
 		Directory: dir,
 		Entry:     entry,
 	})
-	return err
 }
 
 // deleteDirectory deletes a directory and all its contents
+// Note: DeleteEntry RPC response doesn't have an Error field, so we only check the RPC err
 func (h *S3TablesHandler) deleteDirectory(ctx context.Context, client filer_pb.SeaweedFilerClient, path string) error {
 	dir, name := splitPath(path)
 	_, err := client.DeleteEntry(ctx, &filer_pb.DeleteEntryRequest{
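Taken together, these helpers give the handlers idempotent building blocks. A minimal in-package sketch of how they compose; the function name and values below are hypothetical, and it assumes a client obtained via WithFilerClient:

// seedMapping is a hypothetical helper shown only for illustration.
func (h *S3TablesHandler) seedMapping(ctx context.Context, client filer_pb.SeaweedFilerClient) error {
	// ensureDirectory is a no-op when the directory already exists.
	if err := h.ensureDirectory(ctx, client, GetTableLocationMappingDir()); err != nil {
		return err
	}
	// upsertFile creates on first write and overwrites content, size, and
	// mtime on subsequent writes, so repeated calls are safe.
	return h.upsertFile(ctx, client, GetTableLocationMappingPath("demo--table-s3"), []byte("/buckets/demo/ns/tbl"))
}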
@@ -16,14 +16,15 @@ import (
 )
 
 const (
-	TablesPath       = "/table-buckets"
+	TablesPath       = s3_constants.DefaultBucketsPath
 	DefaultAccountID = "000000000000"
 	DefaultRegion    = "us-east-1"
 
 	// Extended entry attributes for metadata storage
-	ExtendedKeyMetadata = "s3tables.metadata"
-	ExtendedKeyPolicy   = "s3tables.policy"
-	ExtendedKeyTags     = "s3tables.tags"
+	ExtendedKeyTableBucket = "s3tables.tableBucket"
+	ExtendedKeyMetadata    = "s3tables.metadata"
+	ExtendedKeyPolicy      = "s3tables.policy"
+	ExtendedKeyTags        = "s3tables.tags"
 
 	// Maximum request body size (10MB)
 	maxRequestBodySize = 10 * 1024 * 1024
@@ -38,16 +38,16 @@ func (h *S3TablesHandler) handleCreateTableBucket(w http.ResponseWriter, r *http
 	// Check if bucket already exists and ensure no conflict with object store buckets
 	tableBucketExists := false
 	s3BucketExists := false
+	bucketsPath := s3_constants.DefaultBucketsPath
 	err := filerClient.WithFilerClient(false, func(client filer_pb.SeaweedFilerClient) error {
 		resp, err := client.GetFilerConfiguration(r.Context(), &filer_pb.GetFilerConfigurationRequest{})
 		if err != nil {
 			return err
 		}
-		bucketsPath := resp.DirBuckets
-		if bucketsPath == "" {
-			bucketsPath = s3_constants.DefaultBucketsPath
+		if resp.DirBuckets != "" {
+			bucketsPath = resp.DirBuckets
 		}
-		_, err = filer_pb.LookupEntry(r.Context(), client, &filer_pb.LookupDirectoryEntryRequest{
+		entryResp, err := filer_pb.LookupEntry(r.Context(), client, &filer_pb.LookupDirectoryEntryRequest{
 			Directory: bucketsPath,
 			Name:      req.Name,
 		})
@@ -55,20 +55,15 @@ func (h *S3TablesHandler) handleCreateTableBucket(w http.ResponseWriter, r *http
 			if !errors.Is(err, filer_pb.ErrNotFound) {
 				return err
 			}
-		} else {
-			s3BucketExists = true
-			return nil
-		}
-		_, err = filer_pb.LookupEntry(r.Context(), client, &filer_pb.LookupDirectoryEntryRequest{
-			Directory: TablesPath,
-			Name:      req.Name,
-		})
-		if err != nil {
-			if errors.Is(err, filer_pb.ErrNotFound) {
-				return nil
-			}
-			return err
+			return nil
 		}
-		tableBucketExists = true
+		if entryResp != nil && entryResp.Entry != nil {
+			if IsTableBucketEntry(entryResp.Entry) {
+				tableBucketExists = true
+			} else {
+				s3BucketExists = true
+			}
+		}
 		return nil
 	})
@@ -78,15 +73,14 @@ func (h *S3TablesHandler) handleCreateTableBucket(w http.ResponseWriter, r *http
 		return err
 	}
 
-	if s3BucketExists {
-		h.writeError(w, http.StatusConflict, ErrCodeBucketAlreadyExists, fmt.Sprintf("bucket name %s is already used by an object store bucket", req.Name))
-		return fmt.Errorf("bucket name conflicts with object store bucket")
-	}
-
 	if tableBucketExists {
 		h.writeError(w, http.StatusConflict, ErrCodeBucketAlreadyExists, fmt.Sprintf("table bucket %s already exists", req.Name))
 		return fmt.Errorf("bucket already exists")
 	}
+	if s3BucketExists {
+		h.writeError(w, http.StatusConflict, ErrCodeBucketAlreadyExists, fmt.Sprintf("bucket %s already exists", req.Name))
+		return fmt.Errorf("bucket already exists")
+	}
 
 	// Create the bucket directory and set metadata as extended attributes
 	now := time.Now()
@@ -111,11 +105,24 @@ func (h *S3TablesHandler) handleCreateTableBucket(w http.ResponseWriter, r *http
 		}
 	}
 
+		// Ensure object root directory exists for table bucket S3 operations
+		if err := h.ensureDirectory(r.Context(), client, GetTableObjectRootDir()); err != nil {
+			return fmt.Errorf("failed to create table object root directory: %w", err)
+		}
+		if err := h.ensureDirectory(r.Context(), client, GetTableObjectBucketPath(req.Name)); err != nil {
+			return fmt.Errorf("failed to create table object bucket directory: %w", err)
+		}
+
 		// Create bucket directory
 		if err := h.createDirectory(r.Context(), client, bucketPath); err != nil {
 			return err
 		}
 
+		// Mark as a table bucket
+		if err := h.setExtendedAttribute(r.Context(), client, bucketPath, ExtendedKeyTableBucket, []byte("true")); err != nil {
+			return err
+		}
+
 		// Set metadata as extended attribute
 		if err := h.setExtendedAttribute(r.Context(), client, bucketPath, ExtendedKeyMetadata, metadataBytes); err != nil {
 			return err
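For orientation, the filer layout these calls produce for a new table bucket looks roughly like the following; the bucket name is a hypothetical placeholder:

// After CreateTableBucket("demo") succeeds:
//
//   /buckets/.objects/        shared object root for all table buckets
//   /buckets/.objects/demo/   S3-visible object root for this bucket
//   /buckets/demo/            the table bucket directory, carrying the
//                             s3tables.tableBucket marker and
//                             s3tables.metadata extended attributes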
@@ -8,6 +8,7 @@ import (
 	"net/http"
 	"strings"
 
+	"github.com/seaweedfs/seaweedfs/weed/glog"
 	"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
 )
@@ -166,6 +167,10 @@ func (h *S3TablesHandler) handleListTableBuckets(w http.ResponseWriter, r *http.
 			continue
 		}
 
+		if !IsTableBucketEntry(entry.Entry) {
+			continue
+		}
+
 		// Read metadata from extended attribute
 		data, ok := entry.Entry.Extended[ExtendedKeyMetadata]
 		if !ok {
@@ -343,7 +348,22 @@ func (h *S3TablesHandler) handleDeleteTableBucket(w http.ResponseWriter, r *http
 
 	// Delete the bucket
 	err = filerClient.WithFilerClient(false, func(client filer_pb.SeaweedFilerClient) error {
-		return h.deleteDirectory(r.Context(), client, bucketPath)
+		// Delete table object entry first, then directory
+		// This ensures we clean up the leaf entry even if directory deletion fails
+		tableObjErr := h.deleteEntryIfExists(r.Context(), client, GetTableObjectBucketPath(bucketName))
+		dirErr := h.deleteDirectory(r.Context(), client, bucketPath)
+
+		// Log any errors but don't fail if one succeeds
+		if tableObjErr != nil && dirErr != nil {
+			return fmt.Errorf("delete table object failed: %w, delete directory failed: %w", tableObjErr, dirErr)
+		}
+		if tableObjErr != nil {
+			glog.V(1).Infof("failed to delete table object for %s: %v", bucketName, tableObjErr)
+		}
+		if dirErr != nil {
+			glog.V(1).Infof("failed to delete table bucket dir for %s: %v", bucketName, dirErr)
+		}
+		return nil
 	})
 
 	if err != nil {
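The combined error above relies on fmt.Errorf accepting multiple %w verbs, which Go supports since 1.20. A self-contained sketch of that pattern; the error values are hypothetical:

package main

import (
	"errors"
	"fmt"
)

func main() {
	tableObjErr := errors.New("leaf delete failed") // hypothetical
	dirErr := errors.New("dir delete failed")       // hypothetical

	// Both causes stay inspectable through errors.Is after wrapping.
	err := fmt.Errorf("delete table object failed: %w, delete directory failed: %w", tableObjErr, dirErr)
	fmt.Println(errors.Is(err, tableObjErr), errors.Is(err, dirErr)) // true true
}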
@@ -18,7 +18,7 @@ func (h *S3TablesHandler) extractResourceOwnerAndBucket(
 	resourcePath string,
 	rType ResourceType,
 ) (ownerAccountID, bucketName string, err error) {
-	// Extract bucket name from resource path (format: /table-buckets/{bucket}/... for both tables and buckets)
+	// Extract bucket name from resource path (format: /buckets/{bucket}/... for both tables and buckets)
 	parts := strings.Split(strings.Trim(resourcePath, "/"), "/")
 	if len(parts) >= 2 {
 		bucketName = parts[1]
@@ -1,6 +1,7 @@
 package s3tables
 
 import (
+	"context"
 	"encoding/json"
 	"errors"
 	"fmt"
@@ -227,6 +228,10 @@ func (h *S3TablesHandler) handleCreateTable(w http.ResponseWriter, r *http.Reque
 		}
 	}
 
+		if err := h.updateTableLocationMapping(r.Context(), client, "", req.MetadataLocation, tablePath); err != nil {
+			glog.V(1).Infof("failed to update table location mapping for %s: %v", req.MetadataLocation, err)
+		}
+
 		return nil
 	})
@@ -909,7 +914,13 @@ func (h *S3TablesHandler) handleDeleteTable(w http.ResponseWriter, r *http.Reque
 
 	// Delete the table
 	err = filerClient.WithFilerClient(false, func(client filer_pb.SeaweedFilerClient) error {
-		return h.deleteDirectory(r.Context(), client, tablePath)
+		if err := h.deleteDirectory(r.Context(), client, tablePath); err != nil {
+			return err
+		}
+		if err := h.deleteTableLocationMapping(r.Context(), client, metadata.MetadataLocation); err != nil {
+			glog.V(1).Infof("failed to delete table location mapping for %s: %v", metadata.MetadataLocation, err)
+		}
+		return nil
 	})
 
 	if err != nil {
@@ -1051,6 +1062,9 @@ func (h *S3TablesHandler) handleUpdateTable(w http.ResponseWriter, r *http.Reque
 		return ErrVersionTokenMismatch
 	}
 
+	// Capture old metadata location before mutation for stale mapping cleanup
+	oldMetadataLocation := metadata.MetadataLocation
+
 	// Update metadata
 	if req.Metadata != nil {
 		if metadata.Metadata == nil {
|
||||
}
|
||||
|
||||
err = filerClient.WithFilerClient(false, func(client filer_pb.SeaweedFilerClient) error {
|
||||
return h.setExtendedAttribute(r.Context(), client, tablePath, ExtendedKeyMetadata, metadataBytes)
|
||||
if err := h.setExtendedAttribute(r.Context(), client, tablePath, ExtendedKeyMetadata, metadataBytes); err != nil {
|
||||
return err
|
||||
}
|
||||
if err := h.updateTableLocationMapping(r.Context(), client, oldMetadataLocation, metadata.MetadataLocation, tablePath); err != nil {
|
||||
glog.V(1).Infof("failed to update table location mapping for %s -> %s: %v", oldMetadataLocation, metadata.MetadataLocation, err)
|
||||
}
|
||||
return nil
|
||||
})
|
||||
|
||||
if err != nil {
|
||||
@@ -1101,3 +1121,35 @@ func (h *S3TablesHandler) handleUpdateTable(w http.ResponseWriter, r *http.Reque
 	})
 	return nil
 }
+
+func (h *S3TablesHandler) updateTableLocationMapping(ctx context.Context, client filer_pb.SeaweedFilerClient, oldMetadataLocation, newMetadataLocation, tablePath string) error {
+	newTableLocationBucket, ok := parseTableLocationBucket(newMetadataLocation)
+	if !ok {
+		return nil
+	}
+
+	if err := h.ensureDirectory(ctx, client, GetTableLocationMappingDir()); err != nil {
+		return err
+	}
+
+	// If the metadata location changed, delete the stale mapping for the old bucket
+	if oldMetadataLocation != "" && oldMetadataLocation != newMetadataLocation {
+		oldTableLocationBucket, ok := parseTableLocationBucket(oldMetadataLocation)
+		if ok && oldTableLocationBucket != newTableLocationBucket {
+			oldMappingPath := GetTableLocationMappingPath(oldTableLocationBucket)
+			if err := h.deleteEntryIfExists(ctx, client, oldMappingPath); err != nil {
+				glog.V(1).Infof("failed to delete stale mapping for %s: %v", oldTableLocationBucket, err)
+			}
+		}
+	}
+
+	return h.upsertFile(ctx, client, GetTableLocationMappingPath(newTableLocationBucket), []byte(tablePath))
+}
+
+func (h *S3TablesHandler) deleteTableLocationMapping(ctx context.Context, client filer_pb.SeaweedFilerClient, metadataLocation string) error {
+	tableLocationBucket, ok := parseTableLocationBucket(metadataLocation)
+	if !ok {
+		return nil
+	}
+	return h.deleteEntryIfExists(ctx, client, GetTableLocationMappingPath(tableLocationBucket))
+}
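The read side of these mappings is not part of this diff, but the routing commits imply a lookup of the mapping file that upsertFile wrote. A hedged sketch under that assumption; the bucket name is hypothetical:

// Resolve a --table-s3 bucket name back to its filer table path. The mapping
// entry's Content holds the table path that updateTableLocationMapping stored.
resp, err := filer_pb.LookupEntry(ctx, client, &filer_pb.LookupDirectoryEntryRequest{
	Directory: GetTableLocationMappingDir(),
	Name:      "demo-1a2b3c--table-s3", // hypothetical
})
if err == nil && resp.Entry != nil {
	tablePath := string(resp.Entry.Content) // e.g. "/buckets/demo/ns/tbl"
	_ = tablePath
}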
@@ -254,7 +254,7 @@ func NewTableBucketFileValidator() *TableBucketFileValidator {
 }
 
 // ValidateTableBucketUpload checks if a file upload to a table bucket conforms to Iceberg layout
-// fullPath is the complete filer path (e.g., /table-buckets/mybucket/mynamespace/mytable/data/file.parquet)
+// fullPath is the complete filer path (e.g., /buckets/mybucket/mynamespace/mytable/data/file.parquet)
 // Returns nil if the path is not a table bucket path or if validation passes
 // Returns an error if the file doesn't conform to Iceberg layout
 func (v *TableBucketFileValidator) ValidateTableBucketUpload(fullPath string) error {
@@ -264,7 +264,7 @@ func (v *TableBucketFileValidator) ValidateTableBucketUpload(fullPath string) er
 	}
 
 	// Extract the path relative to table bucket root
-	// Format: /table-buckets/{bucket}/{namespace}/{table}/{relative-path}
+	// Format: /buckets/{bucket}/{namespace}/{table}/{relative-path}
 	relativePath := strings.TrimPrefix(fullPath, TablesPath+"/")
 	parts := strings.SplitN(relativePath, "/", 4)
@@ -307,7 +307,7 @@ func (v *TableBucketFileValidator) ValidateTableBucketUpload(fullPath string) er
 	return v.layoutValidator.ValidateFilePath(tableRelativePath)
 }
 
-// IsTableBucketPath checks if a path is under the table-buckets directory
+// IsTableBucketPath checks if a path is under the table buckets directory
 func IsTableBucketPath(fullPath string) bool {
 	return strings.HasPrefix(fullPath, TablesPath+"/")
 }
@@ -341,11 +341,6 @@ func (v *TableBucketFileValidator) ValidateTableBucketUploadWithClient(
 	client filer_pb.SeaweedFilerClient,
 	fullPath string,
 ) error {
-	// First check basic layout
-	if err := v.ValidateTableBucketUpload(fullPath); err != nil {
-		return err
-	}
-
 	// If not a table bucket path, nothing more to check
 	if !IsTableBucketPath(fullPath) {
 		return nil
@@ -357,11 +352,37 @@ func (v *TableBucketFileValidator) ValidateTableBucketUploadWithClient(
 		return nil // Not deep enough to need validation
 	}
 
+	if strings.HasPrefix(bucket, ".") {
+		return nil
+	}
+
+	resp, err := filer_pb.LookupEntry(ctx, client, &filer_pb.LookupDirectoryEntryRequest{
+		Directory: TablesPath,
+		Name:      bucket,
+	})
+	if err != nil {
+		if errors.Is(err, filer_pb.ErrNotFound) {
+			return nil
+		}
+		return &IcebergLayoutError{
+			Code:    ErrCodeInvalidIcebergLayout,
+			Message: "failed to verify table bucket: " + err.Error(),
+		}
+	}
+	if resp == nil || !IsTableBucketEntry(resp.Entry) {
+		return nil
+	}
+
+	// Now check basic layout once we know this is a table bucket path.
+	if err := v.ValidateTableBucketUpload(fullPath); err != nil {
+		return err
+	}
+
 	// Verify the table exists and has ICEBERG format by checking its metadata
 	tablePath := GetTablePath(bucket, namespace, table)
 	dir, name := splitPath(tablePath)
 
-	resp, err := filer_pb.LookupEntry(ctx, client, &filer_pb.LookupDirectoryEntryRequest{
+	resp, err = filer_pb.LookupEntry(ctx, client, &filer_pb.LookupDirectoryEntryRequest{
 		Directory: dir,
 		Name:      name,
 	})
@@ -104,28 +104,28 @@ func TestTableBucketFileValidator_ValidateTableBucketUpload(t *testing.T) {
 		{"filer path", "/home/user/file.txt", false},
 
 		// Table bucket structure paths (creating directories)
-		{"table bucket root", "/table-buckets/mybucket", false},
-		{"namespace dir", "/table-buckets/mybucket/myns", false},
-		{"table dir", "/table-buckets/mybucket/myns/mytable", false},
-		{"table dir trailing slash", "/table-buckets/mybucket/myns/mytable/", false},
+		{"table bucket root", "/buckets/mybucket", false},
+		{"namespace dir", "/buckets/mybucket/myns", false},
+		{"table dir", "/buckets/mybucket/myns/mytable", false},
+		{"table dir trailing slash", "/buckets/mybucket/myns/mytable/", false},
 
 		// Valid table bucket file uploads
-		{"valid parquet upload", "/table-buckets/mybucket/myns/mytable/data/file.parquet", false},
-		{"valid metadata upload", "/table-buckets/mybucket/myns/mytable/metadata/v1.metadata.json", false},
-		{"valid partitioned data", "/table-buckets/mybucket/myns/mytable/data/year=2024/file.parquet", false},
+		{"valid parquet upload", "/buckets/mybucket/myns/mytable/data/file.parquet", false},
+		{"valid metadata upload", "/buckets/mybucket/myns/mytable/metadata/v1.metadata.json", false},
+		{"valid partitioned data", "/buckets/mybucket/myns/mytable/data/year=2024/file.parquet", false},
 
 		// Invalid table bucket file uploads
-		{"invalid file type", "/table-buckets/mybucket/myns/mytable/data/file.csv", true},
-		{"invalid top-level dir", "/table-buckets/mybucket/myns/mytable/invalid/file.parquet", true},
-		{"root file in table", "/table-buckets/mybucket/myns/mytable/file.parquet", true},
+		{"invalid file type", "/buckets/mybucket/myns/mytable/data/file.csv", true},
+		{"invalid top-level dir", "/buckets/mybucket/myns/mytable/invalid/file.parquet", true},
+		{"root file in table", "/buckets/mybucket/myns/mytable/file.parquet", true},
 
 		// Empty segment cases
-		{"empty bucket", "/table-buckets//myns/mytable/data/file.parquet", true},
-		{"empty namespace", "/table-buckets/mybucket//mytable/data/file.parquet", true},
-		{"empty table", "/table-buckets/mybucket/myns//data/file.parquet", true},
-		{"empty bucket dir", "/table-buckets//", true},
-		{"empty namespace dir", "/table-buckets/mybucket//", true},
-		{"table double slash bypass", "/table-buckets/mybucket/myns/mytable//data/file.parquet", true},
+		{"empty bucket", "/buckets//myns/mytable/data/file.parquet", true},
+		{"empty namespace", "/buckets/mybucket//mytable/data/file.parquet", true},
+		{"empty table", "/buckets/mybucket/myns//data/file.parquet", true},
+		{"empty bucket dir", "/buckets//", true},
+		{"empty namespace dir", "/buckets/mybucket//", true},
+		{"table double slash bypass", "/buckets/mybucket/myns/mytable//data/file.parquet", true},
 	}
 
 	for _, tt := range tests {
@@ -143,11 +143,10 @@ func TestIsTableBucketPath(t *testing.T) {
 		path string
 		want bool
 	}{
-		{"/table-buckets/mybucket", true},
-		{"/table-buckets/mybucket/ns/table/data/file.parquet", true},
-		{"/buckets/mybucket", false},
+		{"/buckets/mybucket", true},
+		{"/buckets/mybucket/ns/table/data/file.parquet", true},
 		{"/home/user/file.txt", false},
-		{"table-buckets/mybucket", false}, // missing leading slash
+		{"buckets/mybucket", false}, // missing leading slash
 	}
 
 	for _, tt := range tests {
@@ -166,11 +165,11 @@ func TestGetTableInfoFromPath(t *testing.T) {
 		wantNamespace string
 		wantTable     string
 	}{
-		{"/table-buckets/mybucket/myns/mytable/data/file.parquet", "mybucket", "myns", "mytable"},
-		{"/table-buckets/mybucket/myns/mytable", "mybucket", "myns", "mytable"},
-		{"/table-buckets/mybucket/myns", "mybucket", "myns", ""},
-		{"/table-buckets/mybucket", "mybucket", "", ""},
-		{"/buckets/mybucket", "", "", ""},
+		{"/buckets/mybucket/myns/mytable/data/file.parquet", "mybucket", "myns", "mytable"},
+		{"/buckets/mybucket/myns/mytable", "mybucket", "myns", "mytable"},
+		{"/buckets/mybucket/myns", "mybucket", "myns", ""},
+		{"/buckets/mybucket", "mybucket", "", ""},
 		{"/home/user/file.txt", "", "", ""},
 	}
 
 	for _, tt := range tests {
@@ -9,6 +9,8 @@ import (
 	"regexp"
 	"strings"
 	"time"
+
+	"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
 )
 
 const (
@@ -17,6 +19,11 @@ const (
 	tableNamePatternStr = `[a-z0-9_]+`
 )
 
+const (
+	tableLocationMappingsDirName = ".table-location-mappings"
+	tableObjectRootDirName       = ".objects"
+)
+
 var (
 	bucketARNPattern = regexp.MustCompile(`^arn:aws:s3tables:[^:]*:[^:]*:bucket/(` + bucketNamePatternStr + `)$`)
 	tableARNPattern  = regexp.MustCompile(`^arn:aws:s3tables:[^:]*:[^:]*:bucket/(` + bucketNamePatternStr + `)/table/(` + tableNamespacePatternStr + `)/(` + tableNamePatternStr + `)$`)
@@ -94,6 +101,26 @@ func GetTablePath(bucketName, namespace, tableName string) string {
 	return path.Join(TablesPath, bucketName, namespace, tableName)
 }
 
+// GetTableObjectRootDir returns the root path for table bucket object storage
+func GetTableObjectRootDir() string {
+	return path.Join(TablesPath, tableObjectRootDirName)
+}
+
+// GetTableObjectBucketPath returns the filer path for table bucket object storage
+func GetTableObjectBucketPath(bucketName string) string {
+	return path.Join(GetTableObjectRootDir(), bucketName)
+}
+
+// GetTableLocationMappingDir returns the root path for table location bucket mappings
+func GetTableLocationMappingDir() string {
+	return path.Join(TablesPath, tableLocationMappingsDirName)
+}
+
+// GetTableLocationMappingPath returns the filer path for a table location bucket mapping
+func GetTableLocationMappingPath(tableLocationBucket string) string {
+	return path.Join(GetTableLocationMappingDir(), tableLocationBucket)
+}
+
 // Metadata structures
 
 type tableBucketMetadata struct {
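With TablesPath now equal to s3_constants.DefaultBucketsPath, these helpers resolve to paths like the following; the bucket names are hypothetical, and "/buckets" assumes the default filer configuration seen in the tests:

GetTableObjectRootDir()                       // "/buckets/.objects"
GetTableObjectBucketPath("demo")              // "/buckets/.objects/demo"
GetTableLocationMappingDir()                  // "/buckets/.table-location-mappings"
GetTableLocationMappingPath("demo--table-s3") // "/buckets/.table-location-mappings/demo--table-s3"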
@@ -123,6 +150,15 @@ type tableMetadataInternal struct {
 	Metadata *TableMetadata `json:"metadata,omitempty"`
 }
 
+// IsTableBucketEntry returns true when the entry is marked as a table bucket.
+func IsTableBucketEntry(entry *filer_pb.Entry) bool {
+	if entry == nil || entry.Extended == nil {
+		return false
+	}
+	_, ok := entry.Extended[ExtendedKeyTableBucket]
+	return ok
+}
+
 // Utility functions
 
 // validateBucketName validates bucket name and returns an error if invalid.
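An in-package sketch of the marker check; entry names are hypothetical. Note the check is presence-based, so the stored value itself is irrelevant:

marked := &filer_pb.Entry{
	Name:     "demo",
	Extended: map[string][]byte{ExtendedKeyTableBucket: []byte("true")},
}
plain := &filer_pb.Entry{Name: "plain"}

IsTableBucketEntry(marked) // true
IsTableBucketEntry(plain)  // false: no Extended map, hence no marker
IsTableBucketEntry(nil)    // false: nil entries are handled defensively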
@@ -182,6 +218,22 @@ func ValidateBucketName(name string) error {
 	return validateBucketName(name)
 }
 
+func parseTableLocationBucket(metadataLocation string) (string, bool) {
+	if !strings.HasPrefix(metadataLocation, "s3://") {
+		return "", false
+	}
+	trimmed := strings.TrimPrefix(metadataLocation, "s3://")
+	trimmed = strings.TrimSuffix(trimmed, "/")
+	if trimmed == "" {
+		return "", false
+	}
+	bucket, _, _ := strings.Cut(trimmed, "/")
+	if bucket == "" || !strings.HasSuffix(bucket, "--table-s3") {
+		return "", false
+	}
+	return bucket, true
+}
+
 // BuildBucketARN builds a bucket ARN with the provided region and account ID.
 // If region is empty, the ARN will omit the region field.
 func BuildBucketARN(region, accountID, bucketName string) (string, error) {
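An in-package sketch of parseTableLocationBucket on the location shapes involved; the bucket names are hypothetical:

bucket, ok := parseTableLocationBucket("s3://demo-1a2b3c--table-s3/metadata/v1.metadata.json")
// bucket == "demo-1a2b3c--table-s3", ok == true

_, ok = parseTableLocationBucket("s3://regular-bucket/key")
// ok == false: the bucket lacks the --table-s3 suffix

_, ok = parseTableLocationBucket("file:///tmp/metadata.json")
// ok == false: only s3:// locations participate in mapping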