* Add Iceberg table details view
* Enhance Iceberg catalog browsing UI
* Fix Iceberg UI security and logic issues
- Fix selectSchema() and partitionFieldsFromFullMetadata() to always search for matching IDs instead of checking != 0
- Fix snapshotsFromFullMetadata() to make a defensive copy before sorting so the caller's slice is not mutated
- Fix XSS vulnerabilities in s3tables.js: replace innerHTML with textContent/createElement for user-controlled data
- Fix deleteIcebergTable() to redirect to the namespace's tables list when deleting from the details page, instead of reloading
- Fix data-bs-target in iceberg_namespaces.templ: remove templ.SafeURL, since the value is a CSS selector rather than a URL
- Add catalogName to delete modal data attributes for proper redirect
- Remove unused hidden inputs from create table form (icebergTableBucketArn, icebergTableNamespace)
* Regenerate templ files for Iceberg UI updates
* Support complex Iceberg type objects in schema
Change the Type field from string to json.RawMessage in both IcebergSchemaFieldInfo
and the internal icebergSchemaField so that complex type objects from the Iceberg
spec (e.g. {"type": "struct", "fields": [...]}) are handled correctly. Current test
data only contains primitive string types, but preserving the exact JSON
representation keeps the implementation robust when complex types appear. Add a
typeToString() helper and update the schema extraction functions to marshal
string types as JSON. Update the template to convert json.RawMessage to a string
for display.
* Regenerate templ files for Type field changes
* templ
* Fix additional Iceberg UI issues from code review
- Fix the lazy-load flag being set before the async operation completed, which prevented
retries after an error; the loaded flag is now set only after a successful load, and the
error is thrown to the caller so it can handle the error and update the UI
- Add zero-time guards for CreatedAt and ModifiedAt fields in table details to avoid
displaying Go zero-time values; render a dash when the time is zero
- Add URL path escaping for all catalog/namespace/table names in URLs to prevent
malformed URLs when names contain special characters like /, ?, or #
- Remove a redundant innerHTML clear in loadIcebergNamespaceTables, which cleared the
container twice before appending the table list
- Remove the != 0 guard from selectSnapshotForMetrics for consistency with the selectSchema
fix; it now always searches for CurrentSnapshotID without a zero-value gate
- Enhance typeToString() helper to display '(complex)' for non-primitive JSON types
* Regenerate templ files for Phase 3 updates
* Fix template generation to use correct file paths
Run templ generate from the repo root instead of the weed/admin directory so that
generated _templ.go files reference repo-relative paths in error messages
(e.g., 'weed/admin/view/app/iceberg_table_details.templ' instead of
'app/iceberg_table_details.templ'). This ensures both 'make admin-generate'
at the repo root and 'make generate' in the weed/admin directory produce identical
output with consistent file path references.
* Regenerate template files with correct path references
* Validate S3 Tables names in UI
- Add client-side validation for table bucket and namespace names to surface
errors for invalid characters (dots/underscores) before submission
- Use HTML validity messages with reportValidity for immediate feedback
- Update namespace helper text to reflect actual constraints (single-level,
lowercase letters, numbers, and underscores)
* Regenerate templ files for namespace helper text
* Fix Iceberg catalog REST link and actions
* Disallow S3 object access on table buckets
* Validate Iceberg layout for table bucket objects
* Fix REST API link to /v1/config
* merge iceberg page with table bucket page
* Allow Trino/Iceberg stats files in metadata validation
* fixes
- Backend/data handling:
- Normalized Iceberg type display and fallback handling in weed/admin/dash/s3tables_management.go.
- Fixed snapshot fallback pointer semantics in weed/admin/dash/s3tables_management.go.
- Added CSRF token generation/propagation/validation for namespace create/delete in:
- weed/admin/dash/csrf.go
- weed/admin/dash/auth_middleware.go
- weed/admin/dash/middleware.go
- weed/admin/dash/s3tables_management.go
- weed/admin/view/layout/layout.templ
- weed/admin/static/js/s3tables.js
- UI/template fixes:
- Zero-time guards for CreatedAt fields in:
- weed/admin/view/app/iceberg_namespaces.templ
- weed/admin/view/app/iceberg_tables.templ
- Fixed invalid templ-in-script interpolation and host/port rendering in:
- weed/admin/view/app/iceberg_catalog.templ
- weed/admin/view/app/s3tables_buckets.templ
- Added data-catalog-name consistency on Iceberg delete action in weed/admin/view/app/iceberg_tables.templ.
- Updated retry wording in weed/admin/static/js/s3tables.js.
- Regenerated all affected _templ.go files.
- S3 API/comment follow-ups:
- Reused cached table-bucket validator in weed/s3api/bucket_paths.go.
- Added validation-failure debug logging in weed/s3api/s3api_object_handlers_tagging.go.
- Added multipart path-validation design comment in weed/s3api/s3api_object_handlers_multipart.go.
- Build tooling:
- Fixed templ generate working directory issues in weed/admin/Makefile (watch + pattern rule).
* populate data
* test/s3tables: harden populate service checks
* admin: skip table buckets in object-store bucket list
* admin sidebar: move object store to top-level links
* admin iceberg catalog: guard zero times and escape links
* admin forms: add csrf/error handling and client-side name validation
* admin s3tables: fix namespace delete modal redeclaration
* admin: replace native confirm dialogs with modal helpers
* admin modal-alerts: remove noisy confirm usage console log
* reduce logs
* test/s3tables: use partitioned tables in trino and spark populate
* admin file browser: normalize filer ServerAddress for HTTP parsing
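The selectSchema() and snapshotsFromFullMetadata() fixes follow two common Go patterns: look up an ID without treating 0 as "unset" (0 is a valid Iceberg schema ID), and copy a slice before sorting it. The sketch below uses hypothetical trimmed-down types; the fallback to the last schema and the newest-first sort order are assumptions, not confirmed behavior of the admin UI.

```go
package main

import (
	"fmt"
	"sort"
)

// schema and snapshot are hypothetical, trimmed-down stand-ins for the
// admin UI's Iceberg metadata structs, which this commit does not show.
type schema struct {
	SchemaID int
	Fields   []string
}

type snapshot struct {
	SnapshotID  int64
	TimestampMs int64
}

// selectSchema looks up currentID unconditionally, because 0 is a valid
// schema ID. Falling back to the last schema is an assumption.
func selectSchema(schemas []schema, currentID int) *schema {
	for i := range schemas {
		if schemas[i].SchemaID == currentID {
			// Point into the slice, not at a per-iteration copy.
			return &schemas[i]
		}
	}
	if len(schemas) > 0 {
		return &schemas[len(schemas)-1]
	}
	return nil
}

// sortedSnapshots copies the input before sorting (newest first, an assumed
// order) so the caller's slice is left untouched.
func sortedSnapshots(in []snapshot) []snapshot {
	out := make([]snapshot, len(in))
	copy(out, in)
	sort.Slice(out, func(i, j int) bool { return out[i].TimestampMs > out[j].TimestampMs })
	return out
}

func main() {
	schemas := []schema{{SchemaID: 0, Fields: []string{"id"}}, {SchemaID: 1, Fields: []string{"id", "ts"}}}
	fmt.Println(selectSchema(schemas, 0).Fields) // [id]

	snaps := []snapshot{{SnapshotID: 1, TimestampMs: 100}, {SnapshotID: 2, TimestampMs: 300}, {SnapshotID: 3, TimestampMs: 200}}
	sorted := sortedSnapshots(snaps)
	fmt.Println(sorted[0].SnapshotID, snaps[0].SnapshotID) // 2 1
}
```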
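The URL escaping fix relies on escaping each user-controlled name as a single path segment, so that /, ? and # inside a name cannot change the URL's structure. A minimal sketch using Go's net/url package; the tableDetailsURL helper and its "/iceberg/..." route layout are hypothetical, and the real templ code may build URLs differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// tableDetailsURL is a hypothetical helper; the route layout is illustrative
// only. Each name is escaped as one path segment via url.PathEscape.
func tableDetailsURL(catalog, namespace, table string) string {
	return "/iceberg/" + url.PathEscape(catalog) +
		"/namespaces/" + url.PathEscape(namespace) +
		"/tables/" + url.PathEscape(table)
}

func main() {
	u := tableDetailsURL("warehouse", "sales", "orders/2024?draft#1")
	fmt.Println(u) // /iceberg/warehouse/namespaces/sales/tables/orders%2F2024%3Fdraft%231

	// Nothing leaks into the query string or fragment when the URL is parsed.
	parsed, err := url.Parse(u)
	if err != nil {
		panic(err)
	}
	fmt.Println(parsed.RawQuery == "", parsed.Fragment == "") // true true
}
```

url.PathEscape is used rather than url.QueryEscape because the names end up in the path, where a space should become %20 rather than +.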
package s3tables

import (
	"context"
	"encoding/json"
	"errors"
	pathpkg "path"
	"regexp"
	"strings"

	"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
)

// Iceberg file layout validation
//
// Apache Iceberg tables follow a specific file layout:
//   - metadata/ contains table metadata, manifest lists, and manifests (*.json, *.avro)
//   - data/ contains data files (*.parquet, *.orc, *.avro), optionally nested in
//     partition (key=value) or plain subdirectories
//
// Accepted metadata file names (metadata/ must be flat):
//   - v<N>.metadata.json and <uuid>.metadata.json (table metadata)
//   - snap-<N>-<N>-<uuid>.avro (snapshot manifest lists)
//   - <uuid>-m<N>.avro and <uuid>.avro (manifest files)
//   - version-hint.text (version hint)
//   - *.stats (Trino/Iceberg statistics files)
//
// Accepted data file names: *.parquet, *.orc, *.avro

const uuidPattern = `[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}`

var (
	// Allowed top-level directories in an Iceberg table
	icebergAllowedDirs = map[string]bool{
		"metadata": true,
		"data":     true,
	}

	// Patterns for valid metadata files
	metadataFilePatterns = []*regexp.Regexp{
		regexp.MustCompile(`^v\d+\.metadata\.json$`),                   // Table metadata: v1.metadata.json, v2.metadata.json
		regexp.MustCompile(`^snap-\d+-\d+-` + uuidPattern + `\.avro$`), // Snapshot manifest lists: snap-123-1-uuid.avro
		regexp.MustCompile(`^` + uuidPattern + `-m\d+\.avro$`),         // Manifest files: uuid-m0.avro
		regexp.MustCompile(`^` + uuidPattern + `\.avro$`),              // General manifest files
		regexp.MustCompile(`^version-hint\.text$`),                     // Version hint file
		regexp.MustCompile(`^` + uuidPattern + `\.metadata\.json$`),    // UUID-named metadata
		regexp.MustCompile(`^[^/]+\.stats$`),                           // Trino/Iceberg stats files
	}

	// Patterns for valid data files
	dataFilePatterns = []*regexp.Regexp{
		regexp.MustCompile(`^[^/]+\.parquet$`), // Parquet files
		regexp.MustCompile(`^[^/]+\.orc$`),     // ORC files
		regexp.MustCompile(`^[^/]+\.avro$`),    // Avro files
	}

	// A single partition directory segment under data/ (e.g., year=2024 in data/year=2024/month=01/)
	partitionPathPattern = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*=[^/]+$`)

	// Pattern for valid subdirectory names (alphanumeric, underscore, hyphen, and UUID-style directories)
	validSubdirectoryPattern = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)
)

// IcebergLayoutValidator validates that files conform to the Iceberg table layout
type IcebergLayoutValidator struct{}

// NewIcebergLayoutValidator creates a new Iceberg layout validator
func NewIcebergLayoutValidator() *IcebergLayoutValidator {
	return &IcebergLayoutValidator{}
}

// ValidateFilePath validates that a file path conforms to the Iceberg layout.
// The path must be relative to the table root (e.g., "metadata/v1.metadata.json" or "data/file.parquet").
func (v *IcebergLayoutValidator) ValidateFilePath(relativePath string) error {
	// Strip a leading slash so "/data/x.parquet" and "data/x.parquet" are treated the same
	relativePath = strings.TrimPrefix(relativePath, "/")
	if relativePath == "" {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "empty file path",
		}
	}

	parts := strings.SplitN(relativePath, "/", 2)

	topDir := parts[0]

	// Check if the top-level directory is allowed
	if !icebergAllowedDirs[topDir] {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "files must be placed in 'metadata/' or 'data/' directories",
		}
	}

	// If it's just a bare top-level key (no trailing slash and no subpath), reject it
	if len(parts) == 1 {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "must be a directory (use trailing slash) or contain a subpath",
		}
	}

	remainingPath := parts[1]
	if remainingPath == "" {
		return nil // allow directory paths like "data/" or "metadata/"
	}

	switch topDir {
	case "metadata":
		return v.validateMetadataFile(remainingPath)
	case "data":
		return v.validateDataFile(remainingPath)
	}

	return nil
}

// validateDirectoryPath validates intermediate subdirectories in a path.
// isMetadata indicates whether we're in the metadata directory (true) or the data directory (false).
func validateDirectoryPath(normalizedPath string, isMetadata bool) error {
	if isMetadata {
		// For metadata, reject any subdirectories (enforce a flat structure under metadata/)
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "metadata directory does not support subdirectories",
		}
	}

	// For data, validate each partition or subdirectory segment
	subdirs := strings.Split(normalizedPath, "/")
	for _, subdir := range subdirs {
		if subdir == "" {
			return &IcebergLayoutError{
				Code:    ErrCodeInvalidIcebergLayout,
				Message: "invalid partition or subdirectory format in data path: empty segment",
			}
		}
		// For data, allow both partitions and valid subdirectories
		if !partitionPathPattern.MatchString(subdir) && !isValidSubdirectory(subdir) {
			return &IcebergLayoutError{
				Code:    ErrCodeInvalidIcebergLayout,
				Message: "invalid partition or subdirectory format in data path",
			}
		}
	}
	return nil
}

// validateFilePatterns validates a filename against the allowed patterns.
// isMetadata indicates whether we're validating metadata files (true) or data files (false).
func validateFilePatterns(filename string, isMetadata bool) error {
	var patterns []*regexp.Regexp
	var errorMsg string

	if isMetadata {
		patterns = metadataFilePatterns
		errorMsg = "invalid metadata file format: must be a valid Iceberg metadata, manifest, or snapshot file"
	} else {
		patterns = dataFilePatterns
		errorMsg = "invalid data file format: must be .parquet, .orc, or .avro"
	}

	for _, pattern := range patterns {
		if pattern.MatchString(filename) {
			return nil
		}
	}

	return &IcebergLayoutError{
		Code:    ErrCodeInvalidIcebergLayout,
		Message: errorMsg,
	}
}

// validateFile validates metadata and data paths with shared logic.
// isMetadata indicates whether we're validating metadata files (true) or data files (false).
// The logic is:
//  1. If the path ends with "/", it's a directory: validate every segment and return the result.
//  2. Otherwise, validate the intermediate segments, then check the filename against the allowed
//     patterns. Under data/, a final segment that is a valid partition or subdirectory name is
//     also accepted, since directories may be created without a trailing slash.
func (v *IcebergLayoutValidator) validateFile(path string, isMetadata bool) error {
	// Detect if it's a directory (path ends with "/")
	if strings.HasSuffix(path, "/") {
		// Normalize by removing the trailing slash
		normalizedPath := strings.TrimSuffix(path, "/")
		return validateDirectoryPath(normalizedPath, isMetadata)
	}

	filename := pathpkg.Base(path)

	// Validate intermediate subdirectories, if any, by splitting at the last slash
	lastSlash := strings.LastIndex(path, "/")
	if lastSlash != -1 {
		dir := path[:lastSlash]
		if err := validateDirectoryPath(dir, isMetadata); err != nil {
			return err
		}
	}

	// Check against allowed file patterns
	err := validateFilePatterns(filename, isMetadata)
	if err == nil {
		return nil
	}

	// The path could be a directory without a trailing slash, e.g., "data/year=2024"
	if !isMetadata {
		if partitionPathPattern.MatchString(filename) || isValidSubdirectory(filename) {
			return nil
		}
	}

	return err
}

// validateMetadataFile validates files in the metadata/ directory.
// This is a thin wrapper that calls validateFile with isMetadata=true.
func (v *IcebergLayoutValidator) validateMetadataFile(path string) error {
	return v.validateFile(path, true)
}

// validateDataFile validates files in the data/ directory.
// This is a thin wrapper that calls validateFile with isMetadata=false.
func (v *IcebergLayoutValidator) validateDataFile(path string) error {
	return v.validateFile(path, false)
}

// isValidSubdirectory checks if a path component is a valid subdirectory name
func isValidSubdirectory(name string) bool {
	// Allow alphanumeric, underscore, hyphen, and UUID-style directories
	return validSubdirectoryPattern.MatchString(name)
}

// IcebergLayoutError represents an Iceberg layout validation error
type IcebergLayoutError struct {
	Code    string
	Message string
}

func (e *IcebergLayoutError) Error() string {
	return e.Message
}

// Error code for Iceberg layout violations
const (
	ErrCodeInvalidIcebergLayout = "InvalidIcebergLayout"
)

// TableBucketFileValidator validates file uploads to table buckets
type TableBucketFileValidator struct {
	layoutValidator *IcebergLayoutValidator
}

// NewTableBucketFileValidator creates a new table bucket file validator
func NewTableBucketFileValidator() *TableBucketFileValidator {
	return &TableBucketFileValidator{
		layoutValidator: NewIcebergLayoutValidator(),
	}
}

// ValidateTableBucketUpload checks whether a file upload to a table bucket conforms to the Iceberg layout.
// fullPath is the complete filer path (e.g., /buckets/mybucket/mynamespace/mytable/data/file.parquet).
// It returns nil if the path is not a table bucket path or if validation passes,
// and an error if the file doesn't conform to the Iceberg layout.
func (v *TableBucketFileValidator) ValidateTableBucketUpload(fullPath string) error {
	// Check if this is a table bucket path
	if !strings.HasPrefix(fullPath, TablesPath+"/") {
		return nil // Not a table bucket, no validation needed
	}

	// Extract the path relative to the table bucket root
	// Format: /buckets/{bucket}/{namespace}/{table}/{relative-path}
	relativePath := strings.TrimPrefix(fullPath, TablesPath+"/")
	parts := strings.SplitN(relativePath, "/", 4)

	// Need at least bucket/namespace/table/file
	if len(parts) < 4 {
		// Creating bucket, namespace, or table directories: allow only if every segment present is non-empty
		for i := 0; i < len(parts); i++ {
			if parts[i] == "" {
				return &IcebergLayoutError{
					Code:    ErrCodeInvalidIcebergLayout,
					Message: "bucket, namespace, and table segments cannot be empty",
				}
			}
		}
		return nil
	}

	// For full paths, also verify that the bucket, namespace, and table segments are non-empty
	if parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "bucket, namespace, and table segments cannot be empty",
		}
	}

	// The last part is the path within the table (data/file.parquet or metadata/v1.metadata.json)
	tableRelativePath := parts[3]
	if tableRelativePath == "" {
		return nil
	}

	// Reject paths with empty segments (double slashes) within the table path
	if strings.HasPrefix(tableRelativePath, "/") || strings.Contains(tableRelativePath, "//") {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "table path cannot contain empty segments",
		}
	}

	return v.layoutValidator.ValidateFilePath(tableRelativePath)
}

// IsTableBucketPath checks if a path is under the table buckets directory
func IsTableBucketPath(fullPath string) bool {
	return strings.HasPrefix(fullPath, TablesPath+"/")
}

// GetTableInfoFromPath extracts bucket, namespace, and table names from a table bucket path.
// It returns empty strings for components the path doesn't contain.
func GetTableInfoFromPath(fullPath string) (bucket, namespace, table string) {
	if !strings.HasPrefix(fullPath, TablesPath+"/") {
		return "", "", ""
	}

	relativePath := strings.TrimPrefix(fullPath, TablesPath+"/")
	parts := strings.SplitN(relativePath, "/", 4)

	if len(parts) >= 1 {
		bucket = parts[0]
	}
	if len(parts) >= 2 {
		namespace = parts[1]
	}
	if len(parts) >= 3 {
		table = parts[2]
	}

	return
}

// ValidateTableBucketUploadWithClient validates an upload and checks that the table exists and is in ICEBERG format
func (v *TableBucketFileValidator) ValidateTableBucketUploadWithClient(
	ctx context.Context,
	client filer_pb.SeaweedFilerClient,
	fullPath string,
) error {
	// If not a table bucket path, nothing more to check
	if !IsTableBucketPath(fullPath) {
		return nil
	}

	// Get table info and verify it exists
	bucket, namespace, table := GetTableInfoFromPath(fullPath)
	if bucket == "" || namespace == "" || table == "" {
		return nil // Not deep enough to need validation
	}

	// Dot-prefixed bucket names are not validated as table buckets
	if strings.HasPrefix(bucket, ".") {
		return nil
	}

	resp, err := filer_pb.LookupEntry(ctx, client, &filer_pb.LookupDirectoryEntryRequest{
		Directory: TablesPath,
		Name:      bucket,
	})
	if err != nil {
		if errors.Is(err, filer_pb.ErrNotFound) {
			return nil
		}
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "failed to verify table bucket: " + err.Error(),
		}
	}
	if resp == nil || !IsTableBucketEntry(resp.Entry) {
		return nil
	}

	// Now check the basic layout, since we know this is a table bucket path
	if err := v.ValidateTableBucketUpload(fullPath); err != nil {
		return err
	}

	// Verify the table exists and has ICEBERG format by checking its metadata
	tablePath := GetTablePath(bucket, namespace, table)
	dir, name := splitPath(tablePath)

	resp, err = filer_pb.LookupEntry(ctx, client, &filer_pb.LookupDirectoryEntryRequest{
		Directory: dir,
		Name:      name,
	})
	if err != nil {
		// Distinguish between "not found" and other errors
		if errors.Is(err, filer_pb.ErrNotFound) {
			return &IcebergLayoutError{
				Code:    ErrCodeInvalidIcebergLayout,
				Message: "table does not exist",
			}
		}
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "failed to verify table existence: " + err.Error(),
		}
	}

	// Check if the table has metadata indicating ICEBERG format
	if resp == nil || resp.Entry == nil || resp.Entry.Extended == nil {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "table is not a valid ICEBERG table (missing metadata)",
		}
	}

	metadataBytes, ok := resp.Entry.Extended[ExtendedKeyMetadata]
	if !ok {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "table is not in ICEBERG format (missing format metadata)",
		}
	}

	var metadata tableMetadataInternal
	if err := json.Unmarshal(metadataBytes, &metadata); err != nil {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "failed to parse table metadata: " + err.Error(),
		}
	}
	const TableFormatIceberg = "ICEBERG"
	if metadata.Format != TableFormatIceberg {
		return &IcebergLayoutError{
			Code:    ErrCodeInvalidIcebergLayout,
			Message: "table is not in " + TableFormatIceberg + " format",
		}
	}

	return nil
}
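The data/ branch of validateDirectoryPath accepts a segment if it is either a key=value partition or a plain alphanumeric/underscore/hyphen name. The standalone sketch below copies the two regular expressions from the file above into a hypothetical dataDirOK helper, so the accepted and rejected shapes can be checked without the s3tables package.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Copies of the two patterns used by validateDirectoryPath above.
var (
	partitionPathPattern     = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*=[^/]+$`)
	validSubdirectoryPattern = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)
)

// dataDirOK is a hypothetical helper mirroring the data/ branch of
// validateDirectoryPath: every segment must be non-empty and be either a
// key=value partition or a plain subdirectory name.
func dataDirOK(dir string) bool {
	for _, seg := range strings.Split(dir, "/") {
		if seg == "" {
			return false
		}
		if !partitionPathPattern.MatchString(seg) && !validSubdirectoryPattern.MatchString(seg) {
			return false
		}
	}
	return true
}

func main() {
	for _, dir := range []string{"year=2024/month=01", "ts_day=2024-01-01", "bucket_7", "has space", "a//b", ".hidden"} {
		fmt.Printf("%-20s %v\n", dir, dataDirOK(dir))
	}
}
```

Because the partition value is matched by `[^/]+`, a segment such as `name=with space` is accepted, while the same text without `=` is rejected by the subdirectory pattern.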
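ValidateTableBucketUpload and GetTableInfoFromPath both split with strings.SplitN and a limit of 4, so everything after the table name stays together as one table-relative path. A hypothetical splitTablePath helper shows the effect; the tablesPath value "/buckets" is an assumption taken from the doc-comment example, not the package's actual TablesPath constant.

```go
package main

import (
	"fmt"
	"strings"
)

// tablesPath is an assumption based on the doc-comment example; the real
// TablesPath constant is defined elsewhere in the s3tables package.
const tablesPath = "/buckets"

// splitTablePath is a hypothetical helper mirroring the SplitN logic above:
// bucket, namespace, table, and the remainder as one relative path.
func splitTablePath(fullPath string) (bucket, namespace, table, rel string, ok bool) {
	if !strings.HasPrefix(fullPath, tablesPath+"/") {
		return "", "", "", "", false
	}
	parts := strings.SplitN(strings.TrimPrefix(fullPath, tablesPath+"/"), "/", 4)
	if len(parts) < 4 {
		return "", "", "", "", false
	}
	return parts[0], parts[1], parts[2], parts[3], true
}

func main() {
	b, ns, t, rel, ok := splitTablePath("/buckets/mybucket/mynamespace/mytable/data/year=2024/file.parquet")
	fmt.Println(b, ns, t, rel, ok)
	// mybucket mynamespace mytable data/year=2024/file.parquet true
}
```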