4 Commits

Author SHA1 Message Date
Chris Lu
995dfc4d5d chore: remove ~50k lines of unreachable dead code (#8913)
* chore: remove unreachable dead code across the codebase

Remove ~50,000 lines of unreachable code identified by static analysis.

Major removals:
- weed/filer/redis_lua: entire unused Redis Lua filer store implementation
- weed/wdclient/net2, resource_pool: unused connection/resource pool packages
- weed/plugin/worker/lifecycle: unused lifecycle plugin worker
- weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy,
  multipart IAM, key rotation, and various SSE helper functions
- weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions
- weed/mq/offset: unused SQL storage and migration code
- weed/worker: unused registry, task, and monitoring functions
- weed/query: unused SQL engine, parquet scanner, and type functions
- weed/shell: unused EC proportional rebalance functions
- weed/storage/erasure_coding/distribution: unused distribution analysis functions
- Individual unreachable functions removed from 150+ files across admin,
  credential, filer, iam, kms, mount, mq, operation, pb, s3api, server,
  shell, storage, topology, and util packages

* fix(s3): reset shared memory store in IAM test to prevent flaky failure

TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because
the MemoryStore credential backend is a singleton registered via init().
Earlier tests that create anonymous identities pollute the shared store,
causing LookupAnonymous() to unexpectedly return true.

Fix by calling Reset() on the memory store before the test runs.
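
A minimal sketch of why the reset matters, assuming a process-wide store shared by every test in the package. The names `memoryStore`, `sharedStore`, and `AddIdentity` are hypothetical stand-ins, not the actual credential package API:

```go
package main

import (
	"fmt"
	"sync"
)

// memoryStore is a hypothetical stand-in for the shared credential backend.
type memoryStore struct {
	mu         sync.Mutex
	identities map[string]bool
}

// sharedStore mimics a singleton that would be registered once via init().
var sharedStore = &memoryStore{identities: map[string]bool{}}

// AddIdentity records an identity name in the shared store.
func (s *memoryStore) AddIdentity(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.identities[name] = true
}

// LookupAnonymous reports whether an anonymous identity is present.
func (s *memoryStore) LookupAnonymous() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.identities["anonymous"]
}

// Reset drops all state so the next test starts from an empty store.
func (s *memoryStore) Reset() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.identities = map[string]bool{}
}

func main() {
	sharedStore.AddIdentity("anonymous") // state left behind by an earlier test
	fmt.Println(sharedStore.LookupAnonymous())
	sharedStore.Reset() // what the fixed test does before it runs
	fmt.Println(sharedStore.LookupAnonymous())
}
```

Because the store outlives any single test, the result of the lookup depends on test ordering unless each test resets it first.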

* style: run gofmt on changed files

* fix: restore KMS functions used by integration tests

* fix(plugin): prevent panic on send to closed worker session channel

The Plugin.sendToWorker method could panic with "send on closed channel"
when a worker disconnected while a message was being sent. The race was
between streamSession.close() closing the outgoing channel and sendToWorker
writing to it concurrently.

Add a done channel to streamSession that is closed before the outgoing
channel, and check it in sendToWorker's select to safely detect closed
sessions without panicking.
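
The idea can be sketched as follows. This is a simplified variant, not the code from this commit: it signals shutdown only through `done` and never closes the data channel, which avoids the panic without relying on close ordering. The `session` type and its fields are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSessionClosed is a hypothetical sentinel error for this sketch.
var errSessionClosed = errors.New("session closed")

// session is a stand-in for streamSession; field names are assumptions.
type session struct {
	outgoing  chan string
	done      chan struct{}
	closeOnce sync.Once
}

func newSession(buffer int) *session {
	return &session{
		outgoing: make(chan string, buffer),
		done:     make(chan struct{}),
	}
}

// close signals shutdown. The outgoing channel is deliberately never
// closed, so a concurrent send cannot hit a closed channel.
func (s *session) close() {
	s.closeOnce.Do(func() { close(s.done) })
}

// send delivers msg unless the session has been closed.
func (s *session) send(msg string) error {
	// Check done first so a closed session is rejected deterministically,
	// even when the outgoing buffer still has room.
	select {
	case <-s.done:
		return errSessionClosed
	default:
	}
	// Block until the message is queued or the session closes meanwhile.
	select {
	case <-s.done:
		return errSessionClosed
	case s.outgoing <- msg:
		return nil
	}
}

func main() {
	s := newSession(1)
	fmt.Println(s.send("job-1")) // queued while the session is open
	s.close()
	fmt.Println(s.send("job-2")) // rejected with an error, no panic
}
```

A receiver in this variant would also select on `done` to know when to stop reading.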
2026-04-03 16:04:27 -07:00
Chris Lu
e6ee293c17 Add table operations test (#8241)
* Add Trino blog operations test

* Update test/s3tables/catalog_trino/trino_blog_operations_test.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* feat: add table bucket path helpers and filer operations

- Add table object root and table location mapping directories
- Implement ensureDirectory, upsertFile, deleteEntryIfExists helpers
- Support table location bucket mapping for S3 access

* feat: manage table bucket object roots on creation/deletion

- Create .objects directory for table buckets on creation
- Clean up table object bucket paths on deletion
- Enable S3 operations on table bucket object roots

* feat: add table location mapping for Iceberg REST

- Track table location bucket mappings when tables are created/updated/deleted
- Enable location-based routing for S3 operations on table data

* feat: route S3 operations to table bucket object roots

- Route table-s3 bucket names to mapped table paths
- Route table buckets to object root directories
- Support table location bucket mapping lookup

* feat: emit table-s3 locations from Iceberg REST

- Generate unique table-s3 bucket names with UUID suffix
- Store table metadata under table bucket paths
- Return table-s3 locations for Trino compatibility

* fix: handle missing directories in S3 list operations

- Propagate ErrNotFound from ListEntries for non-existent directories
- Treat missing directories as empty results for list operations
- Fixes Trino non-empty location checks on table creation

* test: improve Trino CSV parsing for single-value results

- Sanitize Trino output to skip jline warnings
- Handle single-value CSV results without header rows
- Strip quotes from numeric values in tests

* refactor: use bucket path helpers throughout S3 API

- Replace direct bucket path operations with helper functions
- Leverage centralized table bucket routing logic
- Improve maintainability with consistent path resolution

* fix: add table bucket cache and improve filer error handling

- Cache table bucket lookups to reduce filer overhead on repeated checks
- Use filer_pb.CreateEntry and filer_pb.UpdateEntry helpers to check resp.Error
- Fix delete order in handler_bucket_get_list_delete: delete table object before directory
- Make location mapping errors best-effort: log and continue instead of failing the API call
- Update table location mappings to delete stale prior bucket mappings on update
- Add 1-second sleep before the timestamp time travel query to ensure timestamps are in the past
- Fix CSV parsing: examine all lines instead of skipping the first; handle single-value rows

* fix: properly handle stale metadata location mapping cleanup

- Capture oldMetadataLocation before mutation in handleUpdateTable
- Update updateTableLocationMapping to accept both old and new locations
- Use passed-in oldMetadataLocation to detect location changes
- Delete stale mapping only when location actually changes
- Pass empty string for oldLocation in handleCreateTable (new tables have no prior mapping)
- Improve logging to show old -> new location transitions

* refactor: cleanup imports and cache design

- Remove unused 'sync' import from bucket_paths.go
- Use filer_pb.UpdateEntry helper in setExtendedAttribute and deleteExtendedAttribute for consistent error handling
- Add dedicated tableBucketCache map[string]bool to BucketRegistry instead of mixing concerns with metadataCache
- Improve cache separation: table buckets cache is now separate from bucket metadata cache

* fix: improve cache invalidation and add transient error handling

Cache invalidation (critical fix):
- Add tableLocationCache to BucketRegistry for location mapping lookups
- Clear tableBucketCache and tableLocationCache in RemoveBucketMetadata
- Prevents stale cache entries when buckets are deleted/recreated

Transient error handling:
- Only cache table bucket lookups when conclusive (found or ErrNotFound)
- Skip caching on transient errors (network, permission, etc.)
- Prevents marking real table buckets as non-table due to transient failures

Performance optimization:
- Cache tableLocationDir results to avoid repeated filer RPCs on hot paths
- tableLocationDir now checks cache before making expensive filer lookups
- Cache stores empty string for 'not found' to avoid redundant lookups

Code clarity:
- Add comment to deleteDirectory explaining DeleteEntry response lacks Error field
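
The transient error rule above can be sketched roughly like this. It is an illustration under assumptions: `tableBucketCache`, `lookupFunc`, `errNotFound`, and `errTransient` are hypothetical names, and the sketch is not safe for concurrent use without a lock:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinels; errNotFound stands in for a filer "not found" error.
var (
	errNotFound  = errors.New("not found")
	errTransient = errors.New("transient failure")
)

// lookupFunc is a hypothetical remote check (for example, a filer RPC).
type lookupFunc func(bucket string) (bool, error)

type tableBucketCache struct {
	entries map[string]bool
	lookup  lookupFunc
}

func newTableBucketCache(lookup lookupFunc) *tableBucketCache {
	return &tableBucketCache{entries: map[string]bool{}, lookup: lookup}
}

// isTableBucket caches only conclusive answers: a successful lookup or a
// definite "not found". Any other error is returned without caching.
func (c *tableBucketCache) isTableBucket(bucket string) (bool, error) {
	if cached, ok := c.entries[bucket]; ok {
		return cached, nil
	}
	found, err := c.lookup(bucket)
	switch {
	case err == nil:
		c.entries[bucket] = found
		return found, nil
	case errors.Is(err, errNotFound):
		c.entries[bucket] = false
		return false, nil
	default:
		return false, err // transient: leave the cache untouched
	}
}

func main() {
	calls := 0
	c := newTableBucketCache(func(bucket string) (bool, error) {
		calls++
		if calls == 1 {
			return false, errTransient // first attempt fails
		}
		return true, nil
	})
	_, err := c.isTableBucket("warehouse")
	fmt.Println("first call error:", err)
	ok, err := c.isTableBucket("warehouse")
	fmt.Println("second call:", ok, err)
	ok, _ = c.isTableBucket("warehouse")
	fmt.Println("third call:", ok, "lookups:", calls)
}
```

The third call is served from the cache, while the failed first call leaves no entry behind, so a real table bucket is not misclassified after a temporary failure.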

* go fmt

* fix: mirror transient error handling in tableLocationDir and optimize bucketDir

Transient error handling:
- tableLocationDir now only caches definitive results
- Mirrors isTableBucket behavior to prevent treating transient errors as permanent misses
- Improves reliability on flaky systems or during recovery

Performance optimization:
- bucketDir avoids redundant isTableBucket call via bucketRoot
- Directly use s3a.option.BucketsPath for regular buckets
- Saves one cache lookup for every non-table bucket operation

* fix: revert bucketDir optimization to preserve bucketRoot logic

The optimization to directly use BucketsPath bypassed bucketRoot's logic
and caused issues with S3 list operations on delimiter+prefix cases.

Revert to using path.Join(s3a.bucketRoot(bucket), bucket) which properly
handles all bucket types and ensures consistent path resolution across
the codebase.

The slight performance cost of an extra cache lookup is worth the correctness
and consistency benefits.

* feat: move table buckets under /buckets

Add a table-bucket marker attribute, reuse bucket metadata cache for table bucket detection, and update list/validation/UI/test paths to treat table buckets as /buckets entries.

* Fix S3 Tables code review issues

- handler_bucket_create.go: Fix bucket existence check to properly validate
  entryResp.Entry before setting s3BucketExists flag (nil Entry should not
  indicate existing bucket)
- bucket_paths.go: Add clarifying comment to bucketRoot() explaining unified
  buckets root path for all bucket types
- file_browser_data.go: Optimize by extracting table bucket check early to
  avoid redundant WithFilerClient call

* Fix list prefix delimiter handling

* Handle list errors conservatively

* Fix Trino FOR TIMESTAMP query - use past timestamp

Iceberg requires the timestamp to be strictly in the past.
Use current_timestamp - interval '1' second instead of current_timestamp.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-07 13:27:47 -08:00
Chris Lu
2b529e310d s3: Add SOSAPI support for Veeam integration (#7899)
* s3api: Add SOSAPI core implementation and tests

Implement Smart Object Storage API (SOSAPI) support for Veeam integration.

- Add s3api_sosapi.go with XML structures and handlers for system.xml and capacity.xml
- Implement virtual object detection and dynamic XML generation
- Add capacity retrieval via gRPC (to be optimized in follow-up)
- Include comprehensive unit tests covering detection, XML generation, and edge cases

This enables Veeam Backup & Replication to discover SeaweedFS capabilities and capacity.

* s3api: Integrate SOSAPI handlers into GetObject and HeadObject

Add early interception for SOSAPI virtual objects in GetObjectHandler and HeadObjectHandler.

- Check for SOSAPI objects (.system-*/system.xml, .system-*/capacity.xml) before normal processing
- Delegate to handleSOSAPIGetObject and handleSOSAPIHeadObject when detected
- Ensures virtual objects are served without hitting storage layer

* s3api: Allow anonymous access to SOSAPI virtual objects

Enable discovery of SOSAPI capabilities without requiring credentials.

- Modify AuthWithPublicRead to bypass auth for SOSAPI objects if bucket exists
- Supports Veeam's initial discovery phase before full IAM setup
- Validates bucket existence to prevent information disclosure

* s3api: Fix SOSAPI capacity retrieval to use proper master connection

Fix gRPC error by connecting directly to master servers instead of through filer.

- Use pb.WithOneOfGrpcMasterClients with s3a.option.Masters
- Matches pattern used in bucket_size_metrics.go
- Resolves "unknown service master_pb.Seaweed" error
- Gracefully handles missing master configuration

* Merge origin/master and implement robust SOSAPI capacity logic

- Resolved merge conflict in s3api_sosapi.go
- Replaced global Statistics RPC with VolumeList (topology) for accurate bucket-specific 'Used' calculation
- Added bucket quota support (report quota as Capacity if set)
- Implemented cluster-wide capacity calculation from topology when no quota
- Added unit tests for topology capacity and usage calculations

* s3api: Remove anonymous access to SOSAPI virtual objects

Reverts the implicit public access for system.xml and capacity.xml.
Requests to these objects now require standard S3 authentication,
unless the bucket has a public-read policy.

* s3api: Refactor SOSAPI handlers to use http.ServeContent

- Consolidate handleSOSAPIGetObject and handleSOSAPIHeadObject into serveSOSAPI
- Use http.ServeContent for standard Range, HEAD, and ETag handling
- Remove manual range request handler and reduce code duplication

* s3api: Unify SOSAPI request handling

- Replaced handleSOSAPIGetObject and handleSOSAPIHeadObject with single HandleSOSAPI function
- Updated call sites in s3api_object_handlers.go
- Simplifies logic and ensures consistent handling for both GET and HEAD requests via http.ServeContent

* s3api: Restore distinct SOSAPI GET/HEAD handlers

- Reverted unified handler to enforce distinct behavior for GET and HEAD
- GET: Supports Range requests via http.ServeContent
- HEAD: Explicitly ignores Range requests (matches MinIO behavior) and writes headers only

* s3api: Refactor SOSAPI handlers to eliminate duplication

- Extracted shared content generation logic into generateSOSAPIContent helper
- handleSOSAPIGetObject: Uses http.ServeContent (supports Range requests)
- handleSOSAPIHeadObject: Manually sets headers (no Range, no body)
- Maintains distinct behavior while following DRY principle
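
A rough sketch of the GET/HEAD split described above, using only the standard library. The handler names mirror the commit message, but the bodies, the `generateContent` helper, the `probe` helper, and the request path are assumptions made for illustration:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"time"
)

// generateContent is a hypothetical stand-in for the shared XML generator.
func generateContent() []byte { return []byte("<SystemInfo></SystemInfo>") }

// handleGet lets http.ServeContent deal with Range and conditional requests.
func handleGet(w http.ResponseWriter, r *http.Request) {
	body := generateContent()
	w.Header().Set("Content-Type", "application/xml")
	http.ServeContent(w, r, "system.xml", time.Time{}, bytes.NewReader(body))
}

// handleHead writes headers only and ignores any Range header.
func handleHead(w http.ResponseWriter, r *http.Request) {
	body := generateContent()
	w.Header().Set("Content-Type", "application/xml")
	w.Header().Set("Content-Length", strconv.Itoa(len(body)))
	w.WriteHeader(http.StatusOK)
}

// probe runs a handler in-process and returns the status and body length.
func probe(h http.HandlerFunc, method, rangeHeader string) (int, int) {
	req := httptest.NewRequest(method, "/bucket/.system-example/system.xml", nil)
	if rangeHeader != "" {
		req.Header.Set("Range", rangeHeader)
	}
	rec := httptest.NewRecorder()
	h(rec, req)
	return rec.Code, rec.Body.Len()
}

func main() {
	status, n := probe(handleGet, http.MethodGet, "bytes=0-3")
	fmt.Println("GET with Range:", status, n)
	status, n = probe(handleHead, http.MethodHead, "bytes=0-3")
	fmt.Println("HEAD with Range:", status, n)
}
```

With a seekable reader, `http.ServeContent` answers a satisfiable Range request with a partial response, while the HEAD handler always reports the full length and sends no body.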

* s3api: Remove low-value SOSAPI tests

Removed tests that validate standard library behavior or trivial constant checks:
- TestSOSAPIConstants (string prefix/suffix checks)
- TestSystemInfoXMLRootElement (redundant with TestGenerateSystemXML)
- TestSOSAPIXMLContentType (tests httptest, not our code)
- TestHTTPTimeFormat (tests standard library)
- TestCapacityInfoXMLStruct (tests Go's XML marshaling)

Kept tests that validate actual business logic and edge cases.

* s3api: Use consistent S3-compliant error responses in SOSAPI

Replaced http.Error() with s3err.WriteErrorResponse() for internal errors
to ensure all SOSAPI errors return S3-compliant XML instead of plain text.

* s3api: Return error when no masters configured for SOSAPI capacity

Changed getCapacityInfo to return an error instead of silently returning
zero capacity when no master servers are configured. This helps surface
configuration issues rather than masking them.

* s3api: Use collection name with FilerGroup prefix for SOSAPI capacity

Fixed collectBucketUsageFromTopology to use s3a.getCollectionName(bucket)
instead of raw bucket name. This ensures collection comparisons match actual
volume collection names when FilerGroup prefix is configured.

* s3api: Apply PR review feedback for SOSAPI

- Renamed `bucket` parameter to `collectionName` in collectBucketUsageFromTopology for clarity
- Changed error checks from `==` to `errors.Is()` for better wrapped error handling
- Added `errors` import
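
To illustrate why the comparison was changed, here is a small sketch with a wrapped sentinel error. The names `errNotFound`, `lookup`, `matchesDirect`, and `matchesIs` are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel error for this sketch.
var errNotFound = errors.New("not found")

// lookup returns the sentinel wrapped with context via %w.
func lookup() error {
	return fmt.Errorf("lookup bucket %q: %w", "demo", errNotFound)
}

// matchesDirect compares with ==, which only matches the exact error value.
func matchesDirect(err error) bool { return err == errNotFound }

// matchesIs walks the wrap chain, so it also matches wrapped sentinels.
func matchesIs(err error) bool { return errors.Is(err, errNotFound) }

func main() {
	err := lookup()
	fmt.Println(matchesDirect(err)) // false: the wrapper is a different value
	fmt.Println(matchesIs(err))     // true: errors.Is unwraps to the sentinel
}
```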

* s3api: Avoid variable shadowing in SOSAPI capacity retrieval

Refactored `getCapacityInfo` to use distinct variable names for errors
to improve code clarity and avoid unintentional shadowing of the
return parameter.
2025-12-28 14:07:58 -08:00
Chris Lu
fba67ce0f0 s3api: Add SOSAPI core implementation and tests
Implement Smart Object Storage API (SOSAPI) support for Veeam integration.

- Add s3api_sosapi.go with XML structures and handlers for system.xml and capacity.xml
- Implement virtual object detection and dynamic XML generation
- Add capacity retrieval via gRPC (to be optimized in follow-up)
- Include comprehensive unit tests covering detection, XML generation, and edge cases

This enables Veeam Backup & Replication to discover SeaweedFS capabilities and capacity.
2025-12-28 12:56:41 -08:00