seaweedFS/weed/pb/remote.proto
Chris Lu f3c5ba3cd6 feat(filer): add lazy directory listing for remote mounts (#8615)
* feat(filer): add lazy directory listing for remote mounts

Directory listings on remote mounts previously only queried the local
filer store. With lazy mounts the listing was empty; with eager mounts
it went stale over time.

Add on-demand directory listing that fetches from remote and caches
results with a 5-minute TTL:

- Add `ListDirectory` to `RemoteStorageClient` interface (delimiter-based,
  single-level listing, separate from recursive `Traverse`)
- Implement in S3, GCS, and Azure backends using each platform's
  hierarchical listing API
- Add `maybeLazyListFromRemote` to filer: before each directory listing,
  check if the directory is under a remote mount with an expired cache,
  fetch from remote, persist entries to the local store, then let existing
  listing logic run on the populated store
- Use singleflight to deduplicate concurrent requests for the same directory
- Skip local-only entries (no RemoteEntry) to avoid overwriting unsynced uploads
- Errors are logged and swallowed (availability over consistency)

* refactor: extract xattr key to constant xattrRemoteListingSyncedAt

* feat: make listing cache TTL configurable per mount via listing_cache_ttl_seconds

Add listing_cache_ttl_seconds field to RemoteStorageLocation protobuf.
When 0 (default), lazy directory listing is disabled for that mount.
When >0, enables on-demand directory listing with the specified TTL.

Expose as -listingCacheTTL flag on remote.mount command.
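
The TTL semantics described above can be sketched as a small predicate. This is an illustrative sketch only, not the filer's actual code: the function name `needsRemoteListing`, the use of Unix seconds, and the treatment of a zero timestamp as "never synced" are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// needsRemoteListing is a hypothetical helper (not part of SeaweedFS).
// ttlSeconds mirrors listing_cache_ttl_seconds: 0 disables lazy listing,
// a positive value is the cache lifetime in seconds.
// syncedAtUnix is when the directory was last listed from remote
// (assumed here to be 0 when it was never listed).
func needsRemoteListing(ttlSeconds int32, syncedAtUnix, nowUnix int64) bool {
	if ttlSeconds <= 0 {
		return false // lazy directory listing is disabled for this mount
	}
	if syncedAtUnix == 0 {
		return true // never listed from remote: fetch on first access
	}
	// Assumption: the cache counts as expired once a full TTL has elapsed.
	return nowUnix-syncedAtUnix >= int64(ttlSeconds)
}

func main() {
	now := time.Now().Unix()
	fmt.Println(needsRemoteListing(0, 0, now))        // disabled mount
	fmt.Println(needsRemoteListing(300, now-60, now)) // listed one minute ago
	fmt.Println(needsRemoteListing(300, now-600, now)) // listed ten minutes ago
}
```

Recording the start time before the remote call, as a later commit in this series does, makes a check like this err toward refetching slightly early rather than serving a listing older than the TTL.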

* refactor: address review feedback for lazy directory listing

- Add context.Context to ListDirectory interface and all implementations
- Capture startTime before remote call for accurate TTL tracking
- Simplify S3 ListDirectory using ListObjectsV2PagesWithContext
- Make maybeLazyListFromRemote return void (errors always swallowed)
- Remove redundant trailing-slash path manipulation in caller
- Update tests to match new signatures

* When an existing entry has Remote != nil, merge remote metadata into it rather than replacing it.

* fix(gcs): wrap ListDirectory iterator error with context

The raw iterator error was returned without bucket/path context,
making it harder to debug. Wrap it consistently with the S3 pattern.

* fix(s3): guard against nil pointer dereference in Traverse and ListDirectory

Some S3-compatible backends may return nil for LastModified, Size, or
ETag fields. Check for nil before dereferencing to prevent panics.
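
A minimal sketch of the nil-guard pattern, assuming a listing result whose fields are pointers. The `remoteObject` type and the `derefOr` helper are hypothetical stand-ins invented for this example; they are not the SDK's types or the code in this commit.

```go
package main

import (
	"fmt"
	"time"
)

// remoteObject is a hypothetical stand-in for an SDK listing result whose
// fields are pointers and may be nil on some S3-compatible backends.
type remoteObject struct {
	Key          *string
	LastModified *time.Time
	Size         *int64
	ETag         *string
}

// derefOr returns *p, or fallback when p is nil.
func derefOr[T any](p *T, fallback T) T {
	if p == nil {
		return fallback
	}
	return *p
}

// describe builds a summary without ever dereferencing a nil pointer.
func describe(o remoteObject) string {
	mtime := derefOr(o.LastModified, time.Time{})
	return fmt.Sprintf("key=%q size=%d etag=%q hasMtime=%t",
		derefOr(o.Key, ""), derefOr(o.Size, 0), derefOr(o.ETag, ""), !mtime.IsZero())
}

func main() {
	key := "photos/a.jpg"
	// Only Key is set; the other fields stay nil, as some backends return them.
	fmt.Println(describe(remoteObject{Key: &key}))
}
```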

* fix(filer): remove blanket 2-minute timeout from lazy listing context

Individual SDK operations (S3, GCS, Azure) already have per-request
timeouts and retry policies. The blanket timeout could cut off large
directory listings mid-operation even though individual pages were
succeeding.

* fix(filer): preserve trace context in lazy listing with WithoutCancel

Use context.WithoutCancel(ctx) instead of context.Background() so
trace/span values from the incoming request are retained for
distributed tracing, while still decoupling cancellation.
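
The behaviour of `context.WithoutCancel` (Go 1.21+) can be seen in isolation. The sketch below is not the filer code; `traceKey` and `detachedState` are hypothetical names, and the string value stands in for whatever trace or span data the incoming request carries.

```go
package main

import (
	"context"
	"fmt"
)

// traceKey is a hypothetical context key standing in for trace/span values.
type traceKey struct{}

// detachedState cancels a parent context and reports whether the parent and
// a WithoutCancel-derived child observe the cancellation, plus the trace
// value still visible through the child.
func detachedState(traceID string) (parentCanceled, detachedCanceled bool, seenTrace string) {
	base := context.WithValue(context.Background(), traceKey{}, traceID)
	parent, cancel := context.WithCancel(base)
	detached := context.WithoutCancel(parent)

	cancel() // simulates the client going away mid-listing

	seenTrace, _ = detached.Value(traceKey{}).(string)
	return parent.Err() != nil, detached.Err() != nil, seenTrace
}

func main() {
	p, d, trace := detachedState("trace-123")
	fmt.Println("parent canceled:", p)
	fmt.Println("detached canceled:", d)
	fmt.Println("trace seen by detached context:", trace)
}
```

With `context.Background()` the cancellation would be decoupled in the same way, but the value lookup would return nothing, which is the gap this commit closes.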

* fix(filer): use Store.FindEntry for internal lookups, add Uid/Gid to files, fix updateDirectoryListingSyncedAt

- Use f.Store.FindEntry instead of f.FindEntry for staleness check and
  child lookups to avoid unnecessary lazy-fetch overhead
- Set OS_UID/OS_GID on new file entries for consistency with directories
- In updateDirectoryListingSyncedAt, use Store.UpdateEntry for existing
  directories instead of CreateEntry to avoid deleteChunksIfNotNew and
  NotifyUpdateEvent side effects

* fix(filer): distinguish not-found from store errors in lazy listing

Previously, any error from Store.FindEntry was treated as "not found,"
which could cause entry recreation/overwrite on transient DB failures.
Now check for filer_pb.ErrNotFound explicitly and skip entries or
bail out on real store errors.
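
The distinction can be sketched as a three-way decision on the lookup error. This is an assumption-laden illustration: `errNotFound` stands in for `filer_pb.ErrNotFound`, and `decide`, `action`, and the other names are hypothetical. It uses `errors.Is` so that wrapped not-found errors are still recognized.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for illustration only; errNotFound plays the role of
// filer_pb.ErrNotFound, errStoreDown a transient database failure.
var (
	errNotFound     = errors.New("entry not found")
	errStoreDown    = errors.New("store unavailable")
	wrappedNotFound = fmt.Errorf("find /mnt/remote/a.txt: %w", errNotFound)
)

type action string

const (
	actionInspect action = "inspect existing entry"
	actionCreate  action = "create entry from remote"
	actionSkip    action = "skip: store error"
)

// decide maps the result of a local store lookup to the next step.
func decide(err error) action {
	switch {
	case err == nil:
		return actionInspect
	case errors.Is(err, errNotFound):
		return actionCreate // genuinely absent, safe to create
	default:
		return actionSkip // transient failure: do not recreate or overwrite
	}
}

func main() {
	for _, err := range []error{nil, wrappedNotFound, errStoreDown} {
		fmt.Println(decide(err))
	}
}
```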

* refactor(filer): use errors.Is for ErrNotFound comparisons
2026-03-13 09:36:54 -07:00

syntax = "proto3";

package remote_pb;

option go_package = "github.com/seaweedfs/seaweedfs/weed/pb/remote_pb";
option java_package = "seaweedfs.client";
option java_outer_classname = "FilerProto";

/////////////////////////
// Remote Storage related
/////////////////////////
message RemoteConf {
    string type = 1;
    string name = 2;
    string s3_access_key = 4;
    string s3_secret_key = 5;
    string s3_region = 6;
    string s3_endpoint = 7;
    string s3_storage_class = 8;
    bool s3_force_path_style = 9;
    bool s3_support_tagging = 13;
    bool s3_v4_signature = 11;

    string gcs_google_application_credentials = 10;
    string gcs_project_id = 12;

    string azure_account_name = 15;
    string azure_account_key = 16;

    string backblaze_key_id = 20;
    string backblaze_application_key = 21;
    string backblaze_endpoint = 22;
    string backblaze_region = 23;

    string aliyun_access_key = 25;
    string aliyun_secret_key = 26;
    string aliyun_endpoint = 27;
    string aliyun_region = 28;

    string tencent_secret_id = 30;
    string tencent_secret_key = 31;
    string tencent_endpoint = 32;

    string baidu_access_key = 35;
    string baidu_secret_key = 36;
    string baidu_endpoint = 37;
    string baidu_region = 38;

    string wasabi_access_key = 40;
    string wasabi_secret_key = 41;
    string wasabi_endpoint = 42;
    string wasabi_region = 43;

    string filebase_access_key = 60;
    string filebase_secret_key = 61;
    string filebase_endpoint = 62;

    string storj_access_key = 65;
    string storj_secret_key = 66;
    string storj_endpoint = 67;

    string contabo_access_key = 68;
    string contabo_secret_key = 69;
    string contabo_endpoint = 70;
    string contabo_region = 71;
}

message RemoteStorageMapping {
    map<string,RemoteStorageLocation> mappings = 1;
    string primary_bucket_storage_name = 2;
}

message RemoteStorageLocation {
    string name = 1;
    string bucket = 2;
    string path = 3;
    int32 listing_cache_ttl_seconds = 4; // 0 = disabled; >0 enables on-demand directory listing with this TTL in seconds
}