26 Commits

Author SHA1 Message Date
Chris Lu
995dfc4d5d chore: remove ~50k lines of unreachable dead code (#8913)
* chore: remove unreachable dead code across the codebase

Remove ~50,000 lines of unreachable code identified by static analysis.

Major removals:
- weed/filer/redis_lua: entire unused Redis Lua filer store implementation
- weed/wdclient/net2, resource_pool: unused connection/resource pool packages
- weed/plugin/worker/lifecycle: unused lifecycle plugin worker
- weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy,
  multipart IAM, key rotation, and various SSE helper functions
- weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions
- weed/mq/offset: unused SQL storage and migration code
- weed/worker: unused registry, task, and monitoring functions
- weed/query: unused SQL engine, parquet scanner, and type functions
- weed/shell: unused EC proportional rebalance functions
- weed/storage/erasure_coding/distribution: unused distribution analysis functions
- Individual unreachable functions removed from 150+ files across admin,
  credential, filer, iam, kms, mount, mq, operation, pb, s3api, server,
  shell, storage, topology, and util packages

* fix(s3): reset shared memory store in IAM test to prevent flaky failure

TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because
the MemoryStore credential backend is a singleton registered via init().
Earlier tests that create anonymous identities pollute the shared store,
causing LookupAnonymous() to unexpectedly return true.

Fix by calling Reset() on the memory store before the test runs.

* style: run gofmt on changed files

* fix: restore KMS functions used by integration tests

* fix(plugin): prevent panic on send to closed worker session channel

The Plugin.sendToWorker method could panic with "send on closed channel"
when a worker disconnected while a message was being sent. The race was
between streamSession.close() closing the outgoing channel and sendToWorker
writing to it concurrently.

Add a done channel to streamSession that is closed before the outgoing
channel, and check it in sendToWorker's select to safely detect closed
sessions without panicking.
2026-04-03 16:04:27 -07:00
Chris Lu
81369b8a83 improve: large file sync throughput for remote.cache and filer.sync (#8676)
* improve large file sync throughput for remote.cache and filer.sync

Three main throughput improvements:

1. Adaptive chunk sizing for remote.cache: targets ~32 chunks per file
   instead of always starting at 5MB. A 500MB file now uses ~16MB chunks
   (32 chunks) instead of 5MB chunks (100 chunks), reducing per-chunk
   overhead (volume assign, gRPC call, needle write) by 3x.

2. Configurable concurrency at every layer:
   - remote.cache chunk concurrency: -chunkConcurrency flag (default 8)
   - remote.cache S3 download concurrency: -downloadConcurrency flag
     (default raised from 1 to 5 per chunk)
   - filer.sync chunk concurrency: -chunkConcurrency flag (default 32)

3. S3 multipart download concurrency raised from 1 to 5: the S3 manager
   downloader was using Concurrency=1, serializing all part downloads
   within each chunk. This alone can 5x per-chunk download speed.

The concurrency values flow through the gRPC request chain:
  shell command → CacheRemoteObjectToLocalClusterRequest →
  FetchAndWriteNeedleRequest → S3 downloader

Zero values in the request mean "use server defaults", maintaining
full backward compatibility with existing callers.

Ref #8481

* fix: use full maxMB for chunk size cap and remove loop guard

Address review feedback:
- Use full maxMB instead of maxMB/2 for maxChunkSize to avoid
  unnecessarily limiting chunk size for very large files.
- Remove chunkSize < maxChunkSize guard from the safety loop so it
  can always grow past maxChunkSize when needed to stay under 1000
  chunks (e.g., extremely large files with small maxMB).

* address review feedback: help text, validation, naming, docs

- Fix help text for -chunkConcurrency and -downloadConcurrency flags
  to say "0 = server default" instead of advertising specific numeric
  defaults that could drift from the server implementation.
- Validate chunkConcurrency and downloadConcurrency are within int32
  range before narrowing, returning a user-facing error if out of range.
- Rename ReadRemoteErr to readRemoteErr to follow Go naming conventions.
- Add doc comment to SetChunkConcurrency noting it must be called
  during initialization before replication goroutines start.
- Replace doubling loop in chunk size safety check with direct
  ceil(remoteSize/1000) computation to guarantee the 1000-chunk cap.

* address Copilot review: clamp concurrency, fix chunk count, clarify proto docs

- Use ceiling division for chunk count check to avoid overcounting
  when file size is an exact multiple of chunk size.
- Clamp chunkConcurrency (max 1024) and downloadConcurrency (max 1024
  at filer, max 64 at volume server) to prevent excessive goroutines.
- Always use ReadFileWithConcurrency when the client supports it,
  falling back to the implementation's default when value is 0.
- Clarify proto comments that download_concurrency only applies when
  the remote storage client supports it (currently S3).
- Include specific server defaults in help text (e.g., "0 = server
  default 8") so users see the actual values in -h output.

* fix data race on executionErr and use %w for error wrapping

- Protect concurrent writes to executionErr in remote.cache worker
  goroutines with a sync.Mutex to eliminate the data race.
- Use %w instead of %v in volume_grpc_remote.go error formatting
  to preserve the error chain for errors.Is/errors.As callers.
2026-03-17 16:49:56 -07:00
Chris Lu
f3c5ba3cd6 feat(filer): add lazy directory listing for remote mounts (#8615)
* feat(filer): add lazy directory listing for remote mounts

Directory listings on remote mounts previously only queried the local
filer store. With lazy mounts the listing was empty; with eager mounts
it went stale over time.

Add on-demand directory listing that fetches from remote and caches
results with a 5-minute TTL:

- Add `ListDirectory` to `RemoteStorageClient` interface (delimiter-based,
  single-level listing, separate from recursive `Traverse`)
- Implement in S3, GCS, and Azure backends using each platform's
  hierarchical listing API
- Add `maybeLazyListFromRemote` to filer: before each directory listing,
  check if the directory is under a remote mount with an expired cache,
  fetch from remote, persist entries to the local store, then let existing
  listing logic run on the populated store
- Use singleflight to deduplicate concurrent requests for the same directory
- Skip local-only entries (no RemoteEntry) to avoid overwriting unsynced uploads
- Errors are logged and swallowed (availability over consistency)

* refactor: extract xattr key to constant xattrRemoteListingSyncedAt

* feat: make listing cache TTL configurable per mount via listing_cache_ttl_seconds

Add listing_cache_ttl_seconds field to RemoteStorageLocation protobuf.
When 0 (default), lazy directory listing is disabled for that mount.
When >0, enables on-demand directory listing with the specified TTL.

Expose as -listingCacheTTL flag on remote.mount command.

* refactor: address review feedback for lazy directory listing

- Add context.Context to ListDirectory interface and all implementations
- Capture startTime before remote call for accurate TTL tracking
- Simplify S3 ListDirectory using ListObjectsV2PagesWithContext
- Make maybeLazyListFromRemote return void (errors always swallowed)
- Remove redundant trailing-slash path manipulation in caller
- Update tests to match new signatures

* When an existing entry has Remote != nil, we should merge remote metadata   into it rather than replacing it.

* fix(gcs): wrap ListDirectory iterator error with context

The raw iterator error was returned without bucket/path context,
making it harder to debug. Wrap it consistently with the S3 pattern.

* fix(s3): guard against nil pointer dereference in Traverse and ListDirectory

Some S3-compatible backends may return nil for LastModified, Size, or
ETag fields. Check for nil before dereferencing to prevent panics.

* fix(filer): remove blanket 2-minute timeout from lazy listing context

Individual SDK operations (S3, GCS, Azure) already have per-request
timeouts and retry policies. The blanket timeout could cut off large
directory listings mid-operation even though individual pages were
succeeding.

* fix(filer): preserve trace context in lazy listing with WithoutCancel

Use context.WithoutCancel(ctx) instead of context.Background() so
trace/span values from the incoming request are retained for
distributed tracing, while still decoupling cancellation.

* fix(filer): use Store.FindEntry for internal lookups, add Uid/Gid to files, fix updateDirectoryListingSyncedAt

- Use f.Store.FindEntry instead of f.FindEntry for staleness check and
  child lookups to avoid unnecessary lazy-fetch overhead
- Set OS_UID/OS_GID on new file entries for consistency with directories
- In updateDirectoryListingSyncedAt, use Store.UpdateEntry for existing
  directories instead of CreateEntry to avoid deleteChunksIfNotNew and
  NotifyUpdateEvent side effects

* fix(filer): distinguish not-found from store errors in lazy listing

Previously, any error from Store.FindEntry was treated as "not found,"
which could cause entry recreation/overwrite on transient DB failures.
Now check for filer_pb.ErrNotFound explicitly and skip entries or
bail out on real store errors.

* refactor(filer): use errors.Is for ErrNotFound comparisons
2026-03-13 09:36:54 -07:00
Peter Dodd
0910252e31 feat: add statfile remote storage (#8443)
* feat: add statfile; add error for remote storage misses

* feat: statfile implementations for storage providers

* test: add unit tests for StatFile method across providers

Add comprehensive unit tests for the StatFile implementation covering:
- S3: interface compliance and error constant accessibility
- Azure: interface compliance, error constants, and field population
- GCS: interface compliance, error constants, error detection, and field population

Also fix variable shadowing issue in S3 and Azure StatFile implementations where
named return parameters were being shadowed by local variable declarations.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: address StatFile review feedback

- Use errors.New for ErrRemoteObjectNotFound sentinel
- Fix S3 HeadObject 404 detection to use awserr.Error code check
- Remove hollow field-population tests that tested nothing
- Remove redundant stdlib error detection tests
- Trim verbose doc comment on ErrRemoteObjectNotFound

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: address second round of StatFile review feedback

- Rename interface assertion tests to TestXxxRemoteStorageClientImplementsInterface
- Delegate readFileRemoteEntry to StatFile in all three providers
- Revert S3 404 detection to RequestFailure.StatusCode() check
- Fix double-slash in GCS error message format string
- Add storage type prefix to S3 error message for consistency

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: comments

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-25 10:24:06 -08:00
promalert
9012069bd7 chore: execute goimports to format the code (#7983)
* chore: execute goimports to format the code

Signed-off-by: promalert <promalert@outlook.com>

* goimports -w .

---------

Signed-off-by: promalert <promalert@outlook.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-01-07 13:06:08 -08:00
chrislu
eaeb141b09 move proto package 2022-08-17 12:05:07 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
justin
d02f13c2d1 remove Redundant type conversion and use strings.TrimSuffix to enhance readability 2022-04-06 14:58:09 +08:00
Chris Lu
da49d25950 auto list of storage types 2021-09-04 00:18:21 -07:00
Chris Lu
0652805236 cloud drive: add createBucket() deleteBucket() 2021-09-03 22:30:55 -07:00
Chris Lu
83cd0fc739 cloud drive: add list buckets 2021-09-03 20:42:02 -07:00
Chris Lu
fbfc90fd1e adjust formatting remote location 2021-09-03 18:52:37 -07:00
Chris Lu
3faaa6e360 ensure cached client with updated storage conf 2021-09-01 01:27:45 -07:00
Chris Lu
a31f2907f0 cloud drive: filer.remote.sync supports remove folder 2021-08-29 18:46:28 -07:00
Chris Lu
001a472057 cloud mount: remote storage support hdfs 2021-08-29 18:41:29 -07:00
Chris Lu
05a648bb96 refactor: separating out remote.proto 2021-08-26 15:18:34 -07:00
Chris Lu
5a7c40510f format output 2021-08-15 20:07:13 -07:00
Chris Lu
c45c12aedc metadata updates 2021-08-15 19:27:30 -07:00
Chris Lu
713c035a6e shell: remote.cache remote.uncache 2021-08-09 14:35:18 -07:00
Chris Lu
7412ccdf88 write back remote entry to local entry after uploading to remote 2021-08-08 17:55:03 -07:00
Chris Lu
13e45e1605 filer.remote.sync can work now 2021-08-08 01:21:42 -07:00
Chris Lu
9df7d16791 read <- remote_storage 2021-07-31 22:39:38 -07:00
Chris Lu
899963ac20 remote storage location changed to struct 2021-07-29 02:08:55 -07:00
Chris Lu
c090d6bb25 add ReadRemote(), add read remote setup when filer starts 2021-07-28 22:43:12 -07:00
Chris Lu
1752eeb538 remote.mount saves the mapping 2021-07-27 03:26:35 -07:00
Chris Lu
99b599aa8a remote.mount 2021-07-26 22:53:44 -07:00