master
16 Commits
995dfc4d5d |
chore: remove ~50k lines of unreachable dead code (#8913)
* chore: remove unreachable dead code across the codebase
Remove ~50,000 lines of unreachable code identified by static analysis. Major removals:
- weed/filer/redis_lua: entire unused Redis Lua filer store implementation
- weed/wdclient/net2, resource_pool: unused connection/resource pool packages
- weed/plugin/worker/lifecycle: unused lifecycle plugin worker
- weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy, multipart IAM, key rotation, and various SSE helper functions
- weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions
- weed/mq/offset: unused SQL storage and migration code
- weed/worker: unused registry, task, and monitoring functions
- weed/query: unused SQL engine, parquet scanner, and type functions
- weed/shell: unused EC proportional rebalance functions
- weed/storage/erasure_coding/distribution: unused distribution analysis functions
- Individual unreachable functions removed from 150+ files across admin, credential, filer, iam, kms, mount, mq, operation, pb, s3api, server, shell, storage, topology, and util packages
* fix(s3): reset shared memory store in IAM test to prevent flaky failure
TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because the MemoryStore credential backend is a singleton registered via init(). Earlier tests that create anonymous identities pollute the shared store, causing LookupAnonymous() to unexpectedly return true. Fix by calling Reset() on the memory store before the test runs.
* style: run gofmt on changed files
* fix: restore KMS functions used by integration tests
* fix(plugin): prevent panic on send to closed worker session channel
The Plugin.sendToWorker method could panic with "send on closed channel" when a worker disconnected while a message was being sent. The race was between streamSession.close() closing the outgoing channel and sendToWorker writing to it concurrently.
Add a done channel to streamSession that is closed before the outgoing channel, and check it in sendToWorker's select to safely detect closed sessions without panicking.
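The done-channel guard described in that last fix can be sketched in Go. This is a hypothetical minimal version (streamSession, sendToWorker, and the message type are simplified stand-ins, not the actual SeaweedFS code); it takes the conservative route of never closing the outgoing channel from the closing side, since a select can still pick a send case on a closed channel:

```go
package main

import (
	"fmt"
	"sync"
)

// streamSession sketches the pattern: a done channel signals "closing" to
// senders, so they can bail out instead of sending into a dead session.
type streamSession struct {
	outgoing  chan string
	done      chan struct{}
	closeOnce sync.Once
}

func newStreamSession() *streamSession {
	return &streamSession{
		outgoing: make(chan string, 1),
		done:     make(chan struct{}),
	}
}

func (s *streamSession) close() {
	// Only close done; outgoing is drained by its receiver and reclaimed by
	// GC, so a concurrent sender can never hit a closed channel.
	s.closeOnce.Do(func() { close(s.done) })
}

// sendToWorker returns false instead of panicking when the session is closed.
func (s *streamSession) sendToWorker(msg string) bool {
	select {
	case <-s.done:
		return false
	case s.outgoing <- msg:
		return true
	}
}

func main() {
	s := newStreamSession()
	fmt.Println(s.sendToWorker("hello")) // true
	s.close()
	// buffer still holds "hello", so the send case blocks and done wins
	fmt.Println(s.sendToWorker("late")) // false
}
```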
d95df76bca |
feat: separate scheduler lanes for iceberg, lifecycle, and volume management (#8787)
* feat: introduce scheduler lanes for independent per-workload scheduling
Split the single plugin scheduler loop into independent per-lane
goroutines so that volume management, iceberg compaction, and lifecycle
operations never block each other.
Each lane has its own:
- Goroutine (laneSchedulerLoop)
- Wake channel for immediate scheduling
- Admin lock scope (e.g. "plugin scheduler:default")
- Configurable idle sleep duration
- Loop state tracking
Three lanes are defined:
- default: vacuum, volume_balance, ec_balance, erasure_coding, admin_script
- iceberg: iceberg_maintenance
- lifecycle: s3_lifecycle (new, handler coming in a later commit)
Job types are mapped to lanes via a hardcoded map with LaneDefault as
the fallback. The SchedulerJobTypeState and SchedulerStatus types now
include a Lane field for API consumers.
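The job-type-to-lane mapping with LaneDefault as fallback reduces to a small lookup; this sketch uses the lanes and job types listed above, with illustrative names rather than the actual SeaweedFS types:

```go
package main

import "fmt"

type Lane string

const (
	LaneDefault   Lane = "default"
	LaneIceberg   Lane = "iceberg"
	LaneLifecycle Lane = "lifecycle"
)

// jobTypeLaneMap mirrors the hardcoded mapping described above.
var jobTypeLaneMap = map[string]Lane{
	"vacuum":              LaneDefault,
	"volume_balance":      LaneDefault,
	"ec_balance":          LaneDefault,
	"erasure_coding":      LaneDefault,
	"admin_script":        LaneDefault,
	"iceberg_maintenance": LaneIceberg,
	"s3_lifecycle":        LaneLifecycle,
}

// laneForJobType falls back to LaneDefault for unknown job types.
func laneForJobType(jobType string) Lane {
	if lane, ok := jobTypeLaneMap[jobType]; ok {
		return lane
	}
	return LaneDefault
}

func main() {
	fmt.Println(laneForJobType("iceberg_maintenance")) // iceberg
	fmt.Println(laneForJobType("brand_new_job"))       // default
}
```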
* feat: per-lane execution reservation pools for resource isolation
Each scheduler lane now maintains its own execution reservation map
so that a busy volume lane cannot consume execution slots needed by
iceberg or lifecycle lanes. The per-lane pool is used by default when
dispatching jobs through the lane scheduler; the global pool remains
as a fallback for the public DispatchProposals API.
* feat: add per-lane scheduler status API and lane worker UI pages
- GET /api/plugin/lanes returns all lanes with status and job types
- GET /api/plugin/workers?lane=X filters workers by lane
- GET /api/plugin/scheduler-states?lane=X filters job types by lane
- GET /api/plugin/scheduler-status?lane=X returns lane-scoped status
- GET /plugin/lanes/{lane}/workers renders per-lane worker page
- SchedulerJobTypeState now includes a "lane" field
The lane worker pages show scheduler status, job type configuration,
and connected workers scoped to a single lane, with links back to
the main plugin overview.
* feat: add s3_lifecycle worker handler for object store lifecycle management
Implements a full plugin worker handler for S3 lifecycle management,
assigned to the new "lifecycle" scheduler lane.
Detection phase:
- Reads filer.conf to find buckets with TTL lifecycle rules
- Creates one job proposal per bucket with active lifecycle rules
- Supports bucket_filter wildcard pattern from admin config
Execution phase:
- Walks the bucket directory tree breadth-first
- Identifies expired objects by checking TtlSec + Crtime < now
- Deletes expired objects in configurable batches
- Reports progress with scanned/expired/error counts
- Supports dry_run mode for safe testing
Configurable via admin UI:
- batch_size: entries per filer listing page (default 1000)
- max_deletes_per_bucket: safety cap per run (default 10000)
- dry_run: detect without deleting
- delete_marker_cleanup: clean expired delete markers
- abort_mpu_days: abort stale multipart uploads
The handler integrates with the existing PutBucketLifecycle flow which
sets TtlSec on entries via filer.conf path rules.
* feat: add per-lane submenu items under Workers sidebar menu
Replace the single "Workers" sidebar link with a collapsible submenu
containing three lane entries:
- Default (volume management + admin scripts) -> /plugin
- Iceberg (table compaction) -> /plugin/lanes/iceberg/workers
- Lifecycle (S3 object expiration) -> /plugin/lanes/lifecycle/workers
The submenu auto-expands when on any /plugin page and highlights the
active lane. Icons match each lane's job type descriptor (server,
snowflake, hourglass).
* feat: scope plugin pages to their scheduler lane
The plugin overview, configuration, detection, queue, and execution
pages now filter workers, job types, scheduler states, and scheduler
status to only show data for their lane.
- Plugin() templ function accepts a lane parameter (default: "default")
- JavaScript appends ?lane= to /api/plugin/workers, /job-types,
/scheduler-states, and /scheduler-status API calls
- GET /api/plugin/job-types now supports ?lane= filtering
- When ?job= is provided (e.g. ?job=iceberg_maintenance), the lane is
auto-derived from the job type so the page scopes correctly
This ensures /plugin shows only default-lane workers and
/plugin/configuration?job=iceberg_maintenance scopes to the iceberg lane.
* fix: remove "Lane" from lane worker page titles and capitalize properly
"lifecycle Lane Workers" -> "Lifecycle Workers"
"iceberg Lane Workers" -> "Iceberg Workers"
* refactor: promote lane items to top-level sidebar menu entries
Move Default, Iceberg, and Lifecycle from a collapsible submenu to
direct top-level items under the WORKERS heading. Removes the
intermediate "Workers" parent link and collapse toggle.
* admin: unify plugin lane routes and handlers
* admin: filter plugin jobs and activities by lane
* admin: reuse plugin UI for worker lane pages
* fix: use ServerAddress.ToGrpcAddress() for filer connections in lifecycle handler
ClusterContext addresses use ServerAddress format (host:port.grpcPort).
Convert to the actual gRPC address via ToGrpcAddress() before dialing,
and add a Ping verification after connecting.
Fixes: "dial tcp: lookup tcp/8888.18888: unknown port"
* fix: resolve ServerAddress gRPC port in iceberg and lifecycle filer connections
ClusterContext addresses use ServerAddress format (host:httpPort.grpcPort).
Both the iceberg and lifecycle handlers now detect the compound format
and extract the gRPC port via ToGrpcAddress() before dialing. Plain
host:port addresses (e.g. from tests) are passed through unchanged.
Fixes: "dial tcp: lookup tcp/8888.18888: unknown port"
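The compound-address handling can be sketched as follows; toGrpcAddress here is a simplified illustration of the described behavior ("host:httpPort.grpcPort" becomes "host:grpcPort", plain addresses pass through), not SeaweedFS's actual ServerAddress.ToGrpcAddress implementation:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toGrpcAddress extracts the gRPC port from a compound ServerAddress-style
// string like "filer1:8888.18888". Plain "host:port" is returned unchanged.
func toGrpcAddress(addr string) string {
	colon := strings.LastIndex(addr, ":")
	if colon < 0 {
		return addr
	}
	host, ports := addr[:colon], addr[colon+1:]
	dot := strings.Index(ports, ".")
	if dot < 0 {
		return addr // plain host:port, pass through
	}
	grpcPort, err := strconv.Atoi(ports[dot+1:])
	if err != nil {
		return addr
	}
	return fmt.Sprintf("%s:%d", host, grpcPort)
}

func main() {
	fmt.Println(toGrpcAddress("filer1:8888.18888")) // filer1:18888
	fmt.Println(toGrpcAddress("filer1:18888"))      // filer1:18888
}
```

Dialing "filer1:18888" avoids the "lookup tcp/8888.18888: unknown port" failure quoted above, which is what happens when the compound string is handed to the resolver as-is.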
* align url
* Potential fix for code scanning alert no. 335: Incorrect conversion between integer types
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* fix: address PR review findings across scheduler lanes and lifecycle handler
- Fix variable shadowing: rename loop var `w` to `worker` in
GetPluginWorkersAPI to avoid shadowing the http.ResponseWriter param
- Fix stale GetSchedulerStatus: aggregate loop states across all lanes
instead of reading never-updated legacy schedulerLoopState
- Scope InProcessJobs to lane in GetLaneSchedulerStatus
- Fix AbortMPUDays=0 treated as unset: change <= 0 to < 0 so 0 disables
- Propagate listing errors in lifecycle bucket walk instead of swallowing
- Implement DeleteMarkerCleanup: scan for S3 delete marker entries and
remove them
- Implement AbortMPUDays: scan .uploads directory and remove stale
multipart uploads older than the configured threshold
- Fix success determination: mark job failed when result.errors > 0
even if no fatal error occurred
- Add regression test for jobTypeLaneMap to catch drift from handler
registrations
* fix: guard against nil result in lifecycle completion and trim filer addresses
- Guard result dereference in completion summary: use local vars
defaulting to 0 when result is nil to prevent panic
- Append trimmed filer addresses instead of originals so whitespace
is not passed to the gRPC dialer
* fix: propagate ctx cancellation from deleteExpiredObjects and add config logging
- deleteExpiredObjects now returns a third error value when the context
is canceled mid-batch; the caller stops processing further batches
and returns the cancellation error to the job completion handler
- readBoolConfig and readInt64Config now log unexpected ConfigValue
types at V(1) for debugging, consistent with readStringConfig
* fix: propagate errors in lifecycle cleanup helpers and use correct delete marker key
- cleanupDeleteMarkers: return error on ctx cancellation and SeaweedList
failures instead of silently continuing
- abortIncompleteMPUs: log SeaweedList errors instead of discarding
- isDeleteMarker: use ExtDeleteMarkerKey ("Seaweed-X-Amz-Delete-Marker")
instead of ExtLatestVersionIsDeleteMarker which is for the parent entry
- batchSize cap: use math.MaxInt instead of math.MaxInt32
* fix: propagate ctx cancellation from abortIncompleteMPUs and log unrecognized bool strings
- abortIncompleteMPUs now returns (aborted, errors, ctxErr) matching
cleanupDeleteMarkers; caller stops on cancellation or listing failure
- readBoolConfig logs unrecognized string values before falling back
* fix: shared per-bucket budget across lifecycle phases and allow cleanup without expired objects
- Thread a shared remaining counter through TTL deletion, delete marker
cleanup, and MPU abort so the total operations per bucket never exceed
MaxDeletesPerBucket
- Remove early return when no TTL-expired objects found so delete marker
cleanup and MPU abort still run
- Add NOTE on cleanupDeleteMarkers about version-safety limitation
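The shared budget described in that last bullet can be sketched as one counter threaded through all three phases; runPhase and the phase granularity are hypothetical simplifications:

```go
package main

import "fmt"

// runPhase consumes from a shared budget: it performs up to `pending`
// operations but stops once `remaining` hits zero, so the three lifecycle
// phases together never exceed MaxDeletesPerBucket.
func runPhase(pending int, remaining *int64) (done int) {
	for i := 0; i < pending && *remaining > 0; i++ {
		*remaining--
		done++
	}
	return done
}

func main() {
	var remaining int64 = 10 // MaxDeletesPerBucket
	ttl := runPhase(7, &remaining)     // TTL-expired object deletion
	markers := runPhase(5, &remaining) // delete marker cleanup
	mpus := runPhase(4, &remaining)    // stale multipart upload abort
	fmt.Println(ttl, markers, mpus, remaining) // 7 3 0 0
}
```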
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
55e988a7ee |
iceberg: add sort-aware compaction rewrite (#8666)
* iceberg: add sort-aware compaction rewrite
* iceberg: share filtered row iteration in compaction
* iceberg: rely on table sort order for sort rewrites
* iceberg: harden sort compaction planning
* iceberg: include rewrite strategy in planning config hash
compactionPlanningConfigHash now incorporates RewriteStrategy and SortMaxInputBytes so cached planning results are invalidated when sort strategy settings change. Also use the bytesPerMB constant in compactionNoEligibleMessage.
e5c0889473 |
iceberg: add delete file rewrite maintenance (#8664)
* iceberg: add delete file rewrite maintenance
* iceberg: preserve untouched delete files during rewrites
* iceberg: share detection threshold defaults
* iceberg: add partition-scoped maintenance filters (#8665)
* iceberg: add partition-scoped maintenance filters
* iceberg: tighten where-filter partition matching
6e45fc0055 |
iceberg: cache detection planning results (#8667)
* iceberg: cache detection planning results
* iceberg: tighten planning index cache handling
* iceberg: remove dead planning-index metadata fallback
* iceberg: preserve partial planning index caches
* iceberg: scope planning index caching per op
* iceberg: rename copy vars to avoid shadowing builtin
f71cef2dc8 |
iceberg: add resource-group proposal controls (#8668)
* iceberg: add resource-group proposal controls
* iceberg: tighten resource group config validation
acea36a181 |
filer: add conditional update preconditions (#8647)
* filer: add conditional update preconditions
* iceberg: tighten metadata CAS preconditions
6b2b442450 |
iceberg: detect maintenance work per operation (#8639)
* iceberg: detect maintenance work per operation
* iceberg: ignore delete manifests during detection
* iceberg: clean up detection maintenance planning
* iceberg: tighten detection manifest heuristics
* Potential fix for code scanning alert no. 330: Incorrect conversion between integer types
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* iceberg: tolerate per-operation detection errors
* iceberg: fix fake metadata location versioning
* iceberg: check snapshot expiry before manifest loads
* iceberg: make expire-snapshots switch case explicit
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
a00eddb525 |
Iceberg table maintenance Phase 3: multi-spec compaction, delete handling, and metrics (#8643)
* Add multi-partition-spec compaction and delete-aware compaction (Phase 3)
Multi-partition-spec compaction:
- Add SpecID to compactionBin struct and group by spec+partition key
- Remove the len(specIDs) > 1 skip that blocked spec-evolved tables
- Write per-spec manifests in compaction commit using specByID map
- Use per-bin PartitionSpec when calling NewDataFileBuilder
Delete-aware compaction:
- Add ApplyDeletes config (default: true) with readBoolConfig helper
- Implement position delete collection (file_path + pos Parquet columns)
- Implement equality delete collection (field ID to column mapping)
- Update mergeParquetFiles to filter rows via position deletes (binary search) and equality deletes (hash set lookup)
- Smart delete manifest carry-forward: drop when all data files compacted
- Fix EXISTING/DELETED entries to include sequence numbers
Tests for multi-spec bins, delete collection, merge filtering, and end-to-end compaction with position/equality/mixed deletes.
* Add structured metrics and per-bin progress to iceberg maintenance
- Change return type of all four operations from (string, error) to (string, map[string]int64, error) with structured metric counts (files_merged, snapshots_expired, orphans_removed, duration_ms, etc.)
- Add onProgress callback to compactDataFiles for per-bin progress
- In Execute, pass progress callback that sends JobProgressUpdate with per-bin stage messages
- Accumulate per-operation metrics with dot-prefixed keys (e.g. compact.files_merged) into OutputValues on completion
- Update testing_api.go wrappers and integration test call sites
- Add tests: TestCompactDataFilesMetrics, TestExpireSnapshotsMetrics, TestExecuteCompletionOutputValues
* Address review feedback: group equality deletes by field IDs, use metric constants
- Group equality deletes by distinct equality_ids sets so different delete files with different equality columns are handled correctly
- Use length-prefixed type-aware encoding in buildEqualityKey to avoid ambiguity between types and collisions from null bytes
- Extract metric key strings into package-level constants
* Fix buildEqualityKey to use length-prefixed type-aware encoding
The previous implementation used plain String() concatenation with null byte separators, which caused type ambiguity (int 123 vs string "123") and separator collisions when values contain null bytes. Now each value is serialized as "kind:length:value" for unambiguous composite keys. This fix was missed in the prior cherry-pick due to a merge conflict.
* Address nitpick review comments
- Document patchManifestContentToDeletes workaround: explain that iceberg-go WriteManifest cannot create delete manifests, and note the fail-fast validation on pattern match
- Document makeTestEntries: note that specID field is ignored and callers should use makeTestEntriesWithSpec for multi-spec testing
* fmt
* Fix path normalization, manifest threshold, and artifact filename collisions
- Normalize file paths in position delete collection and lookup so that absolute S3 URLs and relative paths match correctly
- Fix rewriteManifests threshold check to count only data manifests (was including delete manifests in the count and metric)
- Add random suffix to artifact filenames in compactDataFiles and rewriteManifests to prevent collisions between concurrent runs
- Sort compaction bins by SpecID then PartitionKey for deterministic ordering across specs
* Fix pos delete read, deduplicate column resolution, minor cleanups
- Remove broken Column() guard in position delete reading that silently defaulted pos to 0; unconditionally extract Int64() instead
- Deduplicate column resolution in readEqualityDeleteFile by calling resolveEqualityColIndices instead of inlining the same logic
- Add warning log in readBoolConfig for unrecognized string values
- Fix CompactDataFiles call site in integration test to capture 3 return values
* Advance progress on all bins, deterministic manifest order, assert metrics
- Call onProgress for every bin iteration including skipped/failed bins so progress reporting never appears stalled
- Sort spec IDs before iterating specEntriesMap to produce deterministic manifest list ordering across runs
- Assert expected metric keys in CompactDataFiles integration test
---------
Co-authored-by: Copilot <copilot@github.com>
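The "kind:length:value" encoding described for buildEqualityKey can be sketched in Go. This is a hypothetical simplified version (only int64 and string kinds, plain text rather than binary framing) that illustrates why type tagging plus length prefixes removes the int-vs-string ambiguity and null-byte collisions:

```go
package main

import (
	"fmt"
	"strings"
)

// buildEqualityKey serializes each value as "kind:length:value" so int 123
// and string "123" produce different keys, and embedded separators or null
// bytes cannot collide across values.
func buildEqualityKey(values ...interface{}) string {
	var b strings.Builder
	for _, v := range values {
		var kind, s string
		switch t := v.(type) {
		case int64:
			kind, s = "i", fmt.Sprintf("%d", t)
		case string:
			kind, s = "s", t
		default:
			kind, s = "?", fmt.Sprintf("%v", t)
		}
		fmt.Fprintf(&b, "%s:%d:%s", kind, len(s), s)
	}
	return b.String()
}

func main() {
	fmt.Println(buildEqualityKey(int64(123))) // i:3:123
	fmt.Println(buildEqualityKey("123"))      // s:3:123
	// different types, no collision:
	fmt.Println(buildEqualityKey(int64(123)) == buildEqualityKey("123")) // false
}
```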
e24630251c |
iceberg: handle filer-backed compaction inputs (#8638)
* iceberg: handle filer-backed compaction inputs
* iceberg: preserve upsert creation times
* iceberg: align compaction test schema
* iceberg: tighten compact output assertion
* iceberg: document compact output match
* iceberg: clear stale chunks in upsert helper
* iceberg: strengthen compaction integration coverage
0afc675a55 |
iceberg: validate filer failover targets (#8637)
* iceberg: validate filer failover targets
* iceberg: tighten filer liveness checks
* iceberg: relax filer test readiness deadline
5f19e3259f
iceberg: keep split bins within target size (#8640)
d9d6707401 |
Change iceberg compaction target file size config from bytes to MB (#8636)
Change iceberg target_file_size config from bytes to MB
Rename the config field from target_file_size_bytes to target_file_size_mb with a default of 256 (MB). The value is converted to bytes internally. This makes the config more user-friendly: entering 256 is clearer than 268435456.
Co-authored-by: Copilot <copilot@github.com>
8cde3d4486 |
Add data file compaction to iceberg maintenance (Phase 2) (#8503)
* Add iceberg_maintenance plugin worker handler (Phase 1)
Implement automated Iceberg table maintenance as a new plugin worker job type. The handler scans S3 table buckets for tables needing maintenance and executes operations in the correct Iceberg order: expire snapshots, remove orphan files, and rewrite manifests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add data file compaction to iceberg maintenance handler (Phase 2)
Implement bin-packing compaction for small Parquet data files:
- Enumerate data files from manifests, group by partition
- Merge small files using parquet-go (read rows, write merged output)
- Create new manifest with ADDED/DELETED/EXISTING entries
- Commit new snapshot with compaction metadata
Add 'compact' operation to maintenance order (runs before expire_snapshots), configurable via target_file_size_bytes and min_input_files thresholds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix memory exhaustion in mergeParquetFiles by processing files sequentially
Previously all source Parquet files were loaded into memory simultaneously, risking OOM when a compaction bin contained many small files. Now each file is loaded, its rows are streamed into the output writer, and its data is released before the next file is loaded, keeping peak memory proportional to one input file plus the output buffer.
* Validate bucket/namespace/table names against path traversal
Reject names containing '..', '/', or '\' in Execute to prevent directory traversal via crafted job parameters.
* Add filer address failover in iceberg maintenance handler
Try each filer address from cluster context in order instead of only using the first one. This improves resilience when the primary filer is temporarily unreachable.
* Add separate MinManifestsToRewrite config for manifest rewrite threshold
The rewrite_manifests operation was reusing MinInputFiles (meant for compaction bin file counts) as its manifest count threshold. Add a dedicated MinManifestsToRewrite field with its own config UI section and default value (5) so the two thresholds can be tuned independently.
* Fix risky mtime fallback in orphan removal that could delete new files
When entry.Attributes is nil, mtime defaulted to Unix epoch (1970), which would always be older than the safety threshold, causing the file to be treated as eligible for deletion. Skip entries with nil Attributes instead, matching the safer logic in operations.go.
* Fix undefined function references in iceberg_maintenance_handler.go
Use the exported function names (ShouldSkipDetectionByInterval, BuildDetectorActivity, BuildExecutorActivity) matching their definitions in vacuum_handler.go.
* Remove duplicated iceberg maintenance handler in favor of iceberg/ subpackage
The IcebergMaintenanceHandler and its compaction code in the parent pluginworker package duplicated the logic already present in the iceberg/ subpackage (which self-registers via init()). The old code lacked stale-plan guards, proper path normalization, CAS-based xattr updates, and error-returning parseOperations. Since the registry pattern (default "all") makes the old handler unreachable, remove it entirely. All functionality is provided by iceberg.Handler with the reviewed improvements.
* Fix MinManifestsToRewrite clamping to match UI minimum of 2
The clamp reset values below 2 to the default of 5, contradicting the UI's advertised MinValue of 2. Clamp to 2 instead.
* Sort entries by size descending in splitOversizedBin for better packing
Entries were processed in insertion order which is non-deterministic from map iteration. Sorting largest-first before the splitting loop improves bin packing efficiency by filling bins more evenly.
* Add context cancellation check to drainReader loop
The row-streaming loop in drainReader did not check ctx between iterations, making long compaction merges uncancellable. Check ctx.Done() at the top of each iteration.
* Fix splitOversizedBin to always respect targetSize limit
The minFiles check in the split condition allowed bins to grow past targetSize when they had fewer than minFiles entries, defeating the OOM protection. Now bins always split at targetSize, and a trailing runt with fewer than minFiles entries is merged into the previous bin.
* Add integration tests for iceberg table maintenance plugin worker
Tests start a real weed mini cluster, create S3 buckets and Iceberg table metadata via filer gRPC, then exercise the iceberg.Handler operations (ExpireSnapshots, RemoveOrphans, RewriteManifests) against the live filer. A full maintenance cycle test runs all operations in sequence and verifies metadata consistency. Also adds exported method wrappers (testing_api.go) so the integration test package can call the unexported handler methods.
* Fix splitOversizedBin dropping files and add source path to drainReader errors
The runt-merge step could leave leading bins with fewer than minFiles entries (e.g. [80,80,10,10] with targetSize=100, minFiles=2 would drop the first 80-byte file). Replace the filter-based approach with an iterative merge that folds any sub-minFiles bin into its smallest neighbor, preserving all eligible files. Also add the source file path to drainReader error messages so callers can identify which Parquet file caused a read/write failure.
* Harden integration test error handling
- s3put: fail immediately on HTTP 4xx/5xx instead of logging and continuing
- lookupEntry: distinguish NotFound (return nil) from unexpected RPC errors (fail the test)
- writeOrphan and orphan creation in FullMaintenanceCycle: check CreateEntryResponse.Error in addition to the RPC error
* go fmt
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
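The core of the bin splitting discussed in this commit chain can be sketched as follows. This simplified version implements the largest-first sort and the always-split-at-targetSize rule while preserving every file, but omits the sub-minFiles runt-merge step the later fix adds; it uses the [80,80,10,10] / targetSize=100 example quoted above:

```go
package main

import (
	"fmt"
	"sort"
)

// splitOversizedBin sorts entries largest-first, then greedily starts a new
// bin whenever adding an entry would push the current bin past targetSize.
// No entry is ever dropped; every input size lands in exactly one bin.
func splitOversizedBin(sizes []int64, targetSize int64) [][]int64 {
	sorted := append([]int64(nil), sizes...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })

	var bins [][]int64
	var cur []int64
	var curSize int64
	for _, s := range sorted {
		if len(cur) > 0 && curSize+s > targetSize {
			bins = append(bins, cur)
			cur, curSize = nil, 0
		}
		cur = append(cur, s)
		curSize += s
	}
	if len(cur) > 0 {
		bins = append(bins, cur)
	}
	return bins
}

func main() {
	bins := splitOversizedBin([]int64{80, 10, 80, 10}, 100)
	fmt.Println(bins) // [[80] [80 10 10]]
}
```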
587c24ec89 |
plugin worker: support job type categories (all, default, heavy) (#8547)
* plugin worker: add handler registry with job categories
Introduce a self-registration pattern for plugin worker job handlers.
Each handler can register itself via init() with a HandlerFactory that
declares its job type, category (default/heavy), CLI aliases, and a
builder function.
ResolveHandlerFactories accepts a mix of category names ("all",
"default", "heavy") and explicit job type names/aliases, returning the
matching factories. This enables workers to be configured by resource
profile rather than requiring explicit job type enumeration.
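The self-registration pattern described above might look roughly like this; HandlerFactory, Register, and ResolveHandlerFactories are illustrative sketches, not the actual pluginworker API:

```go
package main

import "fmt"

// HandlerFactory declares a handler's job type, category, and CLI aliases.
type HandlerFactory struct {
	JobType  string
	Category string // "default" or "heavy"
	Aliases  []string
}

var registry []HandlerFactory

// Register is called from each handler's init().
func Register(f HandlerFactory) { registry = append(registry, f) }

func init() {
	Register(HandlerFactory{JobType: "vacuum", Category: "default", Aliases: []string{"vol.vacuum"}})
	Register(HandlerFactory{JobType: "erasure_coding", Category: "heavy", Aliases: []string{"ec"}})
}

// ResolveHandlerFactories expands category names ("all", "default", "heavy")
// and explicit job type names or aliases into matching factories.
func ResolveHandlerFactories(tokens ...string) []HandlerFactory {
	var out []HandlerFactory
	for _, tok := range tokens {
		for _, f := range registry {
			switch {
			case tok == "all", tok == f.Category, tok == f.JobType:
				out = append(out, f)
			default:
				for _, a := range f.Aliases {
					if tok == a {
						out = append(out, f)
					}
				}
			}
		}
	}
	return out
}

func main() {
	for _, f := range ResolveHandlerFactories("heavy") {
		fmt.Println(f.JobType) // erasure_coding
	}
	fmt.Println(len(ResolveHandlerFactories("all"))) // 2
}
```

With this shape, a new handler only adds its own init() call; workers configured with "all" pick it up automatically, matching the default described below.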
* plugin worker: register all handlers via init()
Each job handler now self-registers into the global handler registry
with its canonical job type, category, CLI aliases, and build function:
- vacuum: category=default
- volume_balance: category=default
- admin_script: category=default
- erasure_coding: category=heavy
- iceberg_maintenance: category=heavy
Adding a new job type now only requires adding the init() call in the
handler file itself — no other files need to be touched.
* plugin worker: replace hardcoded job type switch with registry
Remove buildPluginWorkerHandler, parsePluginWorkerJobTypes, and
canonicalPluginWorkerJobType from worker_runtime.go. The simplified
buildPluginWorkerHandlers now delegates to
pluginworker.ResolveHandlerFactories, which resolves category names
("all", "default", "heavy") and explicit job type names/aliases.
The default job type is changed from an explicit list to "all", so new
handlers registered via init() are automatically picked up.
Update all tests to use the new API.
* plugin worker: update CLI help text for job categories
Update the -jobType flag description and command examples to document
category support (all, default, heavy) alongside explicit job type names.
* plugin worker: address review feedback
- Add CategoryAll constant; use typed constants in tokenAsCategory
- Pre-allocate result slice in ResolveHandlerFactories
- Add vacuum aliases (vol.vacuum, volume.vacuum)
- List alias examples (ec, balance, iceberg) in -jobType flag help
- Create handlers aggregator package for subpackage blank imports so
new handler subpackages only need to be added in one place
- Make category tests relationship-based (subset/union checks) instead
of asserting exact handler counts
- Add clarifying comments to worker_test.go and mini_plugin_test.go
listing expected handler names next to count assertions
---------
Co-authored-by: Copilot <copilot@github.com>
72c2c7ef8b |
Add iceberg_maintenance plugin worker handler (Phase 1) (#8501)
* Add iceberg_maintenance plugin worker handler (Phase 1) Implement automated Iceberg table maintenance as a new plugin worker job type. The handler scans S3 table buckets for tables needing maintenance and executes operations in the correct Iceberg order: expire snapshots, remove orphan files, and rewrite manifests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix unsafe int64→int narrowing for MaxSnapshotsToKeep Use int64(wouldKeep) instead of int(config.MaxSnapshotsToKeep) to avoid potential truncation on 32-bit platforms (CodeQL high severity). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix unsafe int64→int narrowing for MinInputFiles Use int64(len(manifests)) instead of int(config.MinInputFiles) to avoid potential truncation on 32-bit platforms (CodeQL high severity). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix unsafe int64→int narrowing for MaxCommitRetries Clamp MaxCommitRetries to [1,20] range and keep as int64 throughout the retry loop to avoid truncation on 32-bit platforms (CodeQL high severity). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Sort snapshots explicitly by timestamp in expireSnapshots The previous logic relied on implicit ordering of the snapshot list. Now explicitly sorts snapshots by timestamp descending (most recent first) and uses a simpler keep-count loop: keep the first MaxSnapshotsToKeep newest snapshots plus the current snapshot unconditionally, then expire the rest that exceed the retention window. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Handle errors properly in listFilerEntries Previously all errors from ListEntries and Recv were silently swallowed. Now: treat "not found" errors as empty directory, propagate other ListEntries errors, and check for io.EOF explicitly on Recv instead of breaking on any error. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix overly broad HasSuffix check in orphan detection The bare strings.HasSuffix(ref, entry.Name) could match files with similar suffixes (e.g. "123.avro" matching "snap-123.avro"). Replaced with exact relPath match and a "/"-prefixed suffix check to avoid false positives. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Replace fmt.Sscanf with strconv.Atoi in extractMetadataVersion strconv.Atoi is more explicit and less fragile than fmt.Sscanf for parsing a simple integer from a trimmed string. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Recursively traverse directories for orphan file detection The orphan cleanup only listed a single directory level under data/ and metadata/, skipping IsDirectory entries. Partitioned Iceberg tables store data files in nested partition directories (e.g. data/region=us-east/file.parquet) which were never evaluated. Add walkFilerEntries helper that recursively descends into subdirectories, and use it in removeOrphans so all nested files are considered for orphan checks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix manifest path drift from double time.Now() calls rewriteManifests called time.Now().UnixMilli() twice: once for the path embedded in WriteManifest and once for the filename passed to saveFilerFile. These timestamps would differ, causing the manifest's internal path reference to not match the actual saved filename. Compute the filename once and reuse it for both WriteManifest and saveFilerFile so they always reference the same path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add TestManifestRewritePathConsistency test Verifies that WriteManifest returns a ManifestFile whose FilePath() matches the path passed in, and that path.Base() of that path matches the filename used for saveFilerFile. This validates the single- timestamp pattern used in rewriteManifests produces consistent paths. 
* Make parseOperations return error on unknown operations
  Previously parseOperations silently dropped unknown operation names and could return an empty list. Now it validates inputs against the canonical set and returns a clear error if any unknown operation is specified. Execute was updated to surface the error instead of proceeding with an empty operation list.

* Use gRPC status codes instead of string matching in listFilerEntries
  Replace the brittle strings.Contains(err.Error(), "not found") check with status.Code(err) == codes.NotFound for proper gRPC error handling.

* Add stale-plan guard in commit closures for expireSnapshots and rewriteManifests
  Both operations plan outside the commit mutation using a snapshot ID captured from the initial metadata read. If the table head advances concurrently, the mutation would create a snapshot parented to the wrong head or remove snapshots based on a stale view. Add a guard inside each mutation closure that verifies currentMeta.CurrentSnapshot().SnapshotID still matches the planned snapshot ID. If it differs, return errStalePlan, which propagates immediately (not retried, since the plan itself is invalid). Also fix rewriteManifests to derive SequenceNumber from the fresh metadata (cs.SequenceNumber) instead of the captured currentSnap.

* Add compare-and-swap to updateTableMetadataXattr
  updateTableMetadataXattr previously re-read the entry but did not verify that the metadataVersion matched what commitWithRetry had loaded, so a concurrent update could be silently clobbered. It now accepts an expectedVersion parameter and compares it against the stored metadataVersion before writing. Returns errMetadataVersionConflict on mismatch, which commitWithRetry treats as retryable (deletes the staged metadata file and retries with fresh state).

* Export shared plugin worker helpers for use by sub-packages
  Export ShouldSkipDetectionByInterval, BuildExecutorActivity, and BuildDetectorActivity so the iceberg sub-package can reuse them without duplicating logic.

* Refactor iceberg maintenance handler into weed/plugin/worker/iceberg package
  Split the 1432-line iceberg_maintenance_handler.go into focused files in a new iceberg sub-package: handler.go, config.go, detection.go, operations.go, filer_io.go, and compact.go (Phase 2 data compaction). Key changes:
  - Rename types to drop stutter (IcebergMaintenanceHandler → Handler, etc.)
  - Fix loadFileByIcebergPath to preserve nested directory paths via normalizeIcebergPath instead of path.Base, which dropped subdirectories
  - Check SendProgress errors instead of discarding them
  - Add stale-plan guard to compactDataFiles commitWithRetry closure
  - Add "compact" operation to parseOperations canonical order
  - Duplicate readStringConfig/readInt64Config helpers (~20 lines)
  - Update worker_runtime.go to import the new iceberg sub-package

* Remove iceberg_maintenance from default plugin worker job types
  Iceberg maintenance is not yet ready to be enabled by default. Workers can still opt in by explicitly listing iceberg_maintenance in their job types configuration.

* Clamp config values to safe minimums in ParseConfig
  Prevents misconfiguration by enforcing minimum values, using the default constants, for all config fields.
* Harden filer I/O: path helpers, strict CAS guard, path traversal prevention
  - Use path.Dir/path.Base instead of strings.SplitN in loadCurrentMetadata
  - Make the CAS guard error on missing or unparseable metadataVersion
  - Add path.Clean and traversal validation in loadFileByIcebergPath

* Fix compact: single snapshot ID, oversized bin splitting, ensureFilerDir
  - Use a single newSnapID for all manifest entries in a compaction run
  - Add splitOversizedBin to break bins exceeding targetSize
  - Make ensureFilerDir only create on NotFound, propagating other errors

* Add wildcard filters, scan limit, and context cancellation to table scanning
  - Use wildcard matchers (*, ?) for bucket/namespace/table filters
  - Add a limit parameter to scanTablesForMaintenance for early termination
  - Add ctx.Done() checks in bucket and namespace scan loops
  - Update filter UI descriptions and placeholders for wildcard support

* Remove dead detection interval check and validate namespace parameter
  - Remove the ineffective ShouldSkipDetectionByInterval call with hardcoded 0
  - Add namespace to required parameter validation in Execute

* Improve operations: exponential backoff, orphan matching, full file cleanup
  - Use exponential backoff (50ms, 100ms, 200ms, ...) in commitWithRetry
  - Use normalizeIcebergPath for orphan matching instead of a fragile suffix check
  - Add collectSnapshotFiles to traverse manifest lists → manifests → data files
  - Delete all unreferenced files after expiring snapshots, not just manifest lists
  - Refactor removeOrphans to reuse collectSnapshotFiles

* iceberg: fix ensureFilerDir to handle filer_pb.ErrNotFound sentinel
  filer_pb.LookupEntry converts gRPC NotFound errors to filer_pb.ErrNotFound (a plain sentinel), so status.Code() never returns codes.NotFound for that error. This caused ensureFilerDir to return an error instead of creating the directory when it didn't exist.

* iceberg: clean up orphaned artifacts when compaction commit fails
  Track all files written during compaction (merged data files, manifest, manifest list) and delete them if the commit or any subsequent write step fails, preventing orphaned files from accumulating in the filer.

* iceberg: derive tablePath from namespace/tableName when empty
  An empty table_path parameter would be passed to maintenance operations unchecked. Default it to path.Join(namespace, tableName) when not provided.

* iceberg: make collectSnapshotFiles return error on read/parse failure
  Previously, errors reading manifests were logged and skipped, returning a partial reference set. This could cause incorrect delete decisions during snapshot expiration or orphan cleanup. Now the function returns an error, and all callers abort when reference data is incomplete.

* iceberg: include active metadata file in removeOrphans referenced set
  The metadataFileName returned by loadCurrentMetadata was discarded, so the active metadata file could be incorrectly treated as an orphan and deleted. Capture it and add it to the referencedFiles map.

* iceberg: only retry commitWithRetry on metadata version conflicts
  Previously all errors from updateTableMetadataXattr triggered retries. Now only errMetadataVersionConflict causes a retry; other errors (permissions, transport, malformed xattr) fail immediately.

* iceberg: respect req.Limit in fakeFilerServer.ListEntries mock
  The mock ListEntries ignored the Limit field, so tests couldn't exercise pagination. Now it stops streaming once Limit entries have been sent.

* iceberg: validate parquet schema compatibility before merging files
  mergeParquetFiles now compares each source file's schema against the first file's schema and aborts with a clear error if they differ, instead of blindly writing rows that could panic or produce corrupt output.

* iceberg: normalize empty JobType to canonical jobType in Execute events
  When request.Job.JobType is empty, status events and completion messages were emitted with a blank job type. Derive a canonical value early and use it consistently in all outbound events.

* iceberg: log warning on unexpected config value types in read helpers
  readStringConfig and readInt64Config now log a V(1) warning when they encounter an unhandled ConfigValue kind, aiding debugging of unexpected config types that silently fall back to defaults.

* worker: add iceberg_maintenance to default plugin worker job types
  Workers using the default job types list didn't advertise the iceberg_maintenance handler despite the handler and canonical name being registered. Add it so workers pick up the handler by default.

* iceberg: use defer and detached context for compaction artifact cleanup
  The cleanup closure used the job context, which could already be canceled, and was not called on ctx.Done() early exits. Switch to a deferred cleanup with a detached context (30s timeout) so artifact deletion completes on all exit paths, including context cancellation.

* iceberg: use proportional jitter in commitWithRetry backoff
  A fixed 25ms max jitter becomes insignificant at higher retry attempts. Use 0-20% of the current backoff value instead so jitter scales with the exponential delay.
* iceberg: add malformed filename cases to extractMetadataVersion test
  Cover edge cases like "invalid.metadata.json", "metadata.json", "", and "v.metadata.json" to ensure the function returns 0 for unparseable inputs.

* iceberg: fail compaction on manifest read errors and skip delete manifests
  Previously, unreadable manifests were silently skipped during compaction, which could drop live files from the entry set. Now manifest read/parse errors are returned as fatal errors. Also abort compaction when delete manifests exist, since the compactor does not apply deletes; carrying them through unchanged could produce incorrect results.

* iceberg: use table-relative path for active metadata file in orphan scan
  metadataFileName was stored as a basename (e.g. "v1.metadata.json"), but the orphan scanner matches against table-relative paths like "metadata/v1.metadata.json". Prefix with "metadata/" so the active metadata file is correctly recognized as referenced.

* iceberg: fix MetadataBuilderFromBase location to use metadata file path
  The second argument to MetadataBuilderFromBase records the previous metadata file in the metadata log. Using meta.Location() (the table root) was incorrect: it must be the actual metadata file path so old metadata files can be tracked and eventually cleaned up.

* iceberg: update metadataLocation and versionToken in xattr on commit
  updateTableMetadataXattr was only updating metadataVersion, modifiedAt, and fullMetadata, but not metadataLocation or versionToken. This left catalog state inconsistent after maintenance commits: the metadataLocation still pointed to the old metadata file, and the versionToken was stale. Add a newMetadataLocation parameter and regenerate the versionToken on every commit, matching the S3 Tables handler behavior.

* iceberg: group manifest entries by partition spec in rewriteManifests
  rewriteManifests was writing all entries into a single manifest using the table's current partition spec. For spec-evolved tables where manifests reference different partition specs, this produces an invalid manifest. Group entries by the source manifest's PartitionSpecID and write one merged manifest per spec, looking up each spec from the table's PartitionSpecs list.

* iceberg: remove dead code loop for non-data manifests in compaction
  The early abort guard at the top of compactDataFiles already ensures no delete manifests are present. The loop that copied non-data manifests into allManifests was unreachable dead code.

* iceberg: use JSON encoding in partitionKey for unambiguous grouping
  partitionKey used fmt.Sprintf("%d=%v") joined by commas, which produces ambiguous keys when partition values contain commas or '='. Use json.Marshal for values and a NUL byte as separator to eliminate collisions.

* iceberg: precompute normalized reference set in removeOrphans
  The orphan check was O(files × refs) because it normalized each reference path inside the per-file loop. Precompute the normalized set once for O(1) lookups per candidate file.

* iceberg: add artifact cleanup to rewriteManifests on commit failure
  rewriteManifests writes merged manifests and a manifest list to the filer before committing but did not clean them up on failure. Add the same deferred cleanup pattern used by compactDataFiles: track written artifacts and delete them if the commit does not succeed.

* iceberg: pass isDeleteData=true in deleteFilerFile
  deleteFilerFile called DoRemove with isDeleteData=false, which only removed filer metadata and left chunk data behind on volume servers. All other data-file deletion callers in the codebase pass true.

* iceberg: clean up test: remove unused snapID, simplify TestDetectWithFakeFiler
  Remove the unused snapID variable and eliminate the unnecessary second fake filer and entry copy in TestDetectWithFakeFiler by capturing the client from the first startFakeFiler call.
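The unambiguous partitionKey encoding can be demonstrated with a small sketch. This is an illustration of the commit's approach (JSON-encoded values joined by a NUL byte), with a simplified signature; the real function operates on Iceberg partition field values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// partitionKey builds a grouping key from partition values: each value
// is JSON-encoded and fields are joined with a NUL byte, so values
// containing ',' or '=' cannot produce colliding keys.
func partitionKey(values []any) string {
	parts := make([]string, len(values))
	for i, v := range values {
		b, _ := json.Marshal(v) // marshaling basic scalar types cannot fail
		parts[i] = string(b)
	}
	return strings.Join(parts, "\x00")
}

func main() {
	// With a naive comma join, both of these would flatten to the same
	// "us,east,1" key; the JSON+NUL encoding keeps them distinct.
	a := partitionKey([]any{"us,east", 1})
	b := partitionKey([]any{"us", "east,1"})
	fmt.Println(a == b) // false
}
```

JSON escaping guarantees the separator byte never appears inside an encoded value, which is what makes the key boundaries unambiguous.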
* fix: update TestWorkerDefaultJobTypes to expect 5 job types
  The test expected 4 default job types, but iceberg_maintenance was added as a 5th default in a previous commit.

* iceberg: document client-side CAS TOCTOU limitation in updateTableMetadataXattr
  Add a note explaining the race window where two workers can both pass the version check and race at UpdateEntry. The proper fix requires server-side precondition support on UpdateEntryRequest.

* iceberg: remove unused sender variable in TestFullExecuteFlow

* iceberg: abort compaction when multiple partition specs are present
  The compactor writes all entries into a single manifest using the current partition spec, which is invalid for spec-evolved tables. Detect multiple PartitionSpecIDs and skip compaction until per-spec compaction is implemented.

* iceberg: validate tablePath to prevent directory traversal
  Sanitize the table_path parameter with path.Clean and verify it matches the expected namespace/tableName prefix to prevent path traversal attacks via crafted job parameters.

* iceberg: cap retry backoff at 5s and make it context-aware
  The exponential backoff could grow unbounded and blocked on time.Sleep, ignoring context cancellation. Cap it at 5s and use a timer with a select on ctx.Done so retries respect cancellation.

* iceberg: write manifest list with new snapshot identity in rewriteManifests
  The manifest list was written with the old snapshot's ID and sequence number, but the new snapshot created afterwards used a different identity. Compute newSnapshotID and newSeqNum before writing manifests and the manifest list so all artifacts are consistent.

* ec: also remove .vif file in removeEcVolumeFiles
  removeEcVolumeFiles cleaned up .ecx, .ecj, and shard files but not the .vif volume info file, leaving it orphaned. The .vif file lives in the data directory alongside shard files. The directory handling for index vs data files was already correct: .ecx/.ecj are removed from IdxDirectory and shard files from Directory, matching how NewEcVolume loads them.

* Revert "ec: also remove .vif file in removeEcVolumeFiles"
  This reverts commit acc82449e12a00115268a5652aef0d6c46d9f2dd.

* iceberg: skip orphan entries with nil Attributes instead of defaulting to epoch
  When entry.Attributes is nil, mtime defaulted to the Unix epoch (1970), making unknown-age entries appear ancient and eligible for deletion. Skip these entries instead, to avoid deleting files whose age cannot be determined.

* iceberg: use unique metadata filenames to prevent concurrent write clobbering
  Add a timestamp nonce to metadata filenames (e.g. v3-1709766000.metadata.json) so concurrent writers stage to distinct files. Update extractMetadataVersion to strip the nonce suffix, and loadCurrentMetadata to read the actual filename from the metadataLocation xattr field.

* iceberg: defer artifact tracking until data file builder succeeds
  Move the writtenArtifacts append to after NewDataFileBuilder succeeds, so a failed builder doesn't leave a stale entry for an already-deleted file in the cleanup list.

* iceberg: use detached context for metadata file cleanup
  Use context.WithTimeout(context.Background(), 10s) when deleting staged metadata files after a CAS failure, so cleanup runs even if the original request context is canceled.

* test: update default job types count to include iceberg_maintenance

* iceberg: use parquet.EqualNodes for structural schema comparison
  Replace String()-based schema comparison with parquet.EqualNodes, which correctly compares types, repetition levels, and logical types.
* iceberg: add nonce-suffixed filename cases to TestExtractMetadataVersion

* test: assert iceberg_maintenance is present in default job types

* iceberg: validate operations config early in Detect
  Call parseOperations in Detect so typos in the operations config fail fast before emitting proposals, matching the validation already done in Execute.

* iceberg: detect chunked files in loadFileByIcebergPath
  Return an explicit error when a file has chunks but no inline content, rather than silently returning empty data. Data files uploaded via S3 are stored as chunks, so compaction would otherwise produce corrupt merged files.

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
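The chunked-file guard in loadFileByIcebergPath reduces to a simple shape, sketched below with a simplified entry struct (illustrative field names, not the filer_pb types): treat "has chunk references but no inline bytes" as an explicit error rather than an empty read.

```go
package main

import (
	"errors"
	"fmt"
)

// entry mimics the relevant fields of a filer entry: inline content
// versus chunk references (names are illustrative only).
type entry struct {
	Content []byte
	Chunks  int
}

// fileContent returns inline content, but errors explicitly when the
// file is stored as chunks with no inline bytes; previously this case
// silently yielded empty data and corrupted merged output.
func fileContent(e entry) ([]byte, error) {
	if len(e.Content) == 0 && e.Chunks > 0 {
		return nil, errors.New("file is chunked; inline read unsupported")
	}
	return e.Content, nil
}

func main() {
	if _, err := fileContent(entry{Chunks: 3}); err != nil {
		fmt.Println("chunked:", err)
	}
	b, _ := fileContent(entry{Content: []byte("avro")})
	fmt.Println(string(b)) // avro
}
```

Failing loudly here converts a silent data-corruption bug into an immediate, diagnosable compaction error.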