Commit Graph

64 Commits

Author SHA1 Message Date
Chris Lu
597d383ca4 filer.sync: fix data races in ChunkTransferStatus
Add sync.RWMutex to ChunkTransferStatus and lock around all field
mutations in fetchAndWrite. ActiveTransfers now returns value copies
under RLock so callers get immutable snapshots.
2026-04-02 13:04:21 -07:00
Chris Lu
b5cdd71600 filer.sync: include last error in stall diagnostics 2026-04-02 12:18:56 -07:00
Chris Lu
2d4ea8c665 filer.sync: show active chunk transfers when sync progress stalls
When the sync watermark is not advancing, print each in-progress chunk
transfer with its file path, bytes received so far, and current status
(downloading, uploading, or waiting with backoff duration). This helps
diagnose which files are blocking progress during replication.

Closes #8542
2026-04-02 12:14:25 -07:00
Chris Lu
81369b8a83 improve: large file sync throughput for remote.cache and filer.sync (#8676)
* improve large file sync throughput for remote.cache and filer.sync

Three main throughput improvements:

1. Adaptive chunk sizing for remote.cache: targets ~32 chunks per file
   instead of always starting at 5MB. A 500MB file now uses ~16MB chunks
   (32 chunks) instead of 5MB chunks (100 chunks), reducing per-chunk
   overhead (volume assign, gRPC call, needle write) by 3x.

2. Configurable concurrency at every layer:
   - remote.cache chunk concurrency: -chunkConcurrency flag (default 8)
   - remote.cache S3 download concurrency: -downloadConcurrency flag
     (default raised from 1 to 5 per chunk)
   - filer.sync chunk concurrency: -chunkConcurrency flag (default 32)

3. S3 multipart download concurrency raised from 1 to 5: the S3 manager
   downloader was using Concurrency=1, serializing all part downloads
   within each chunk. This alone can 5x per-chunk download speed.

The concurrency values flow through the gRPC request chain:
  shell command → CacheRemoteObjectToLocalClusterRequest →
  FetchAndWriteNeedleRequest → S3 downloader

Zero values in the request mean "use server defaults", maintaining
full backward compatibility with existing callers.

Ref #8481

* fix: use full maxMB for chunk size cap and remove loop guard

Address review feedback:
- Use full maxMB instead of maxMB/2 for maxChunkSize to avoid
  unnecessarily limiting chunk size for very large files.
- Remove chunkSize < maxChunkSize guard from the safety loop so it
  can always grow past maxChunkSize when needed to stay under 1000
  chunks (e.g., extremely large files with small maxMB).

* address review feedback: help text, validation, naming, docs

- Fix help text for -chunkConcurrency and -downloadConcurrency flags
  to say "0 = server default" instead of advertising specific numeric
  defaults that could drift from the server implementation.
- Validate chunkConcurrency and downloadConcurrency are within int32
  range before narrowing, returning a user-facing error if out of range.
- Rename ReadRemoteErr to readRemoteErr to follow Go naming conventions.
- Add doc comment to SetChunkConcurrency noting it must be called
  during initialization before replication goroutines start.
- Replace doubling loop in chunk size safety check with direct
  ceil(remoteSize/1000) computation to guarantee the 1000-chunk cap.

* address Copilot review: clamp concurrency, fix chunk count, clarify proto docs

- Use ceiling division for chunk count check to avoid overcounting
  when file size is an exact multiple of chunk size.
- Clamp chunkConcurrency (max 1024) and downloadConcurrency (max 1024
  at filer, max 64 at volume server) to prevent excessive goroutines.
- Always use ReadFileWithConcurrency when the client supports it,
  falling back to the implementation's default when value is 0.
- Clarify proto comments that download_concurrency only applies when
  the remote storage client supports it (currently S3).
- Include specific server defaults in help text (e.g., "0 = server
  default 8") so users see the actual values in -h output.

* fix data race on executionErr and use %w for error wrapping

- Protect concurrent writes to executionErr in remote.cache worker
  goroutines with a sync.Mutex to eliminate the data race.
- Use %w instead of %v in volume_grpc_remote.go error formatting
  to preserve the error chain for errors.Is/errors.As callers.
2026-03-17 16:49:56 -07:00
Chris Lu
7fcbffed7f filer.sync: support manifest chunks (#8299)
* filer.sync support manifest chunks

* filersink: address manifest sync review feedback
2026-02-10 20:18:35 -08:00
Chris Lu
be0379f6fd Fix filer.sync retry on stale chunk (#8298)
* Fix filer.sync stale chunk uploads

* Tweak filersink stale logging
2026-02-10 19:06:35 -08:00
promalert
9012069bd7 chore: execute goimports to format the code (#7983)
* chore: execute goimports to format the code

Signed-off-by: promalert <promalert@outlook.com>

* goimports -w .

---------

Signed-off-by: promalert <promalert@outlook.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-01-07 13:06:08 -08:00
Aleksey Kosov
283d9e0079 Add context with request (#6824) 2025-05-28 11:34:02 -07:00
Aleksey Kosov
165af32d6b added context to filer_client method calls (#6808)
Co-authored-by: akosov <a.kosov@kryptonite.ru>
2025-05-22 09:46:49 -07:00
chrislu
c6dec11ea5 [filer.sync] skip overwriting existing fresh entry 2024-07-16 09:38:10 -07:00
chrislu
81fdf3651b grpc connection to filer add sw-client-id header 2023-01-20 01:48:12 -08:00
chrislu
6c7fe40305 filer sink retries reading file chunks, skipping missing chunks
if the file chunk is not available during replication time, the file is skipped
2022-12-19 11:31:58 -08:00
chrislu
70a4c98b00 refactor filer_pb.Entry and filer.Entry to use GetChunks()
for later locking on reading chunks
2022-11-15 06:33:36 -08:00
chrislu
0d817bc347 fix invalid memory address or nil pointer dereference on filer.sync
fix https://github.com/seaweedfs/seaweedfs/issues/3826
2022-10-11 21:58:17 -07:00
chrislu
0452ae6a6c filer.sync: limit concurrency when fetching file chunks
fix https://github.com/seaweedfs/seaweedfs/issues/3787
2022-10-04 11:35:07 -07:00
chrislu
b463ca1a2f filer replication: compare content changes directly
Fix https://github.com/seaweedfs/seaweedfs/issues/3714

The destination chunks may be empty. For example, the file is updated and the volume is vacuumed. In this case, the sync would miss the old chunks. This is fine. However, the entry would have correct metadata but missing chunks.

For this case, the simple metadata comparison would be wrongly skipping data changes, and the file will stay empty unless file content md5 is changed.
2022-09-20 08:35:10 -07:00
Ryan Russell
d734fff322 docs: replicte -> replicate (#3664) 2022-09-14 10:01:18 -07:00
chrislu
cb6cf331ca filer.backup and filer.sync: include headers during backup and sync
fix https://github.com/seaweedfs/seaweedfs/issues/3532
2022-09-04 18:26:36 -07:00
Konstantin Lebedev
4d08393b7c filer prefer volume server in same data center (#3405)
* initial prefer same data center
https://github.com/seaweedfs/seaweedfs/issues/3404

* GetDataCenter

* prefer same data center for ReplicationSource

* GetDataCenterId

* remove glog
2022-08-04 17:35:00 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
chrislu
139e039c44 filer.sync: pass attributes for mount
fix https://github.com/chrislusf/seaweedfs/issues/3012
2022-05-06 03:54:12 -07:00
chrislu
9405eaefdb filer.sync: fix replicating partially updated file
Run two servers with volumes and fillers:
server -dir=Server1alpha -master.port=11000 -filer -filer.port=11001 -volume.port=11002
server -dir=Server1sigma -master.port=11006 -filer -filer.port=11007 -volume.port=11008

Run Active-Passive filler.sync:
filer.sync -a localhost:11007 -b localhost:11001 -isActivePassive

Upload file to 11007 port:
curl -F file=@/Desktop/9.xml "http://localhost:11007/testFacebook/"

If we request a file on two servers now, everything will be correct, even if we add data to the file and upload it again:
curl "http://localhost:11007/testFacebook/9.xml"
EQUALS
curl "http://localhost:11001/testFacebook/9.xml"

However, if we change the already existing data in the file (for example, we change the first line in the file, reducing its length), then this file on the second server will not be valid and will not be equivalent to the first file

Снимок экрана 2022-02-07 в 14 21 11

This problem occurs on line 202 in the filer_sink.go file. In particular, this is due to incorrect mapping of chunk names in the DoMinusChunks function. The names of deletedChunks do not match the chunks of existingEntry.Chunks, since the first chunks come from another server and have a different addressing (name) compared to the addressing on the server where the file is being overwritten.

Deleted chunks are not actually deleted on the server to which the file is replicated.
2022-02-07 03:46:28 -08:00
chrislu
9f9ef1340c use streaming mode for long poll grpc calls
streaming mode would create separate grpc connections for each call.
this is to ensure the long poll connections are properly closed.
2021-12-26 00:15:03 -08:00
Chris Lu
99b599aa8a remote.mount 2021-07-26 22:53:44 -07:00
Chris Lu
7359193e97 go fmt 2021-07-21 14:38:12 -07:00
Chris Lu
7ab389e7ec optimization: improve random range query for large files 2021-07-19 23:07:22 -07:00
Chris Lu
450222dd64 add remote to filer.Entry and filer_pb entry, add RemoteConf 2021-07-19 02:47:27 -07:00
Chris Lu
540441fd38 go fmt 2021-02-28 20:34:14 -08:00
Chris Lu
678c54d705 data sink: add incremental mode 2021-02-28 16:19:03 -08:00
Chris Lu
a0e84c4fbc go fmt 2021-02-10 23:41:05 -08:00
Chris Lu
821c46edf1 Merge branch 'master' into support_ssd_volume 2021-02-09 11:37:07 -08:00
Chris Lu
80b8692688 filer.sync: replicate outside of either cluster, only need to see filers 2021-01-24 00:01:44 -08:00
Chris Lu
2b76854641 add "weed filer.cat" to read files directly from volume servers 2021-01-06 04:22:00 -08:00
Chris Lu
1bf22c0b5b go fmt 2020-12-16 09:14:05 -08:00
Chris Lu
51eadaf2b6 rename parameter name to "disk" 2020-12-13 12:05:31 -08:00
Chris Lu
0d2ec832e2 rename from volumeType to diskType 2020-12-13 11:59:32 -08:00
Chris Lu
e9cd798bd3 adding volume type 2020-12-13 00:58:58 -08:00
Chris Lu
f4abd01adf filer: cache small file to filer store 2020-11-30 04:34:04 -08:00
Chris Lu
e219c57849 passing full path when assign volume locations 2020-10-25 15:46:29 -07:00
Chris Lu
387ab6796f filer: cross cluster synchronization 2020-09-09 11:21:23 -07:00
Chris Lu
eb7929a971 rename filer2 to filer 2020-09-01 00:21:19 -07:00
Chris Lu
ca658a97c5 add signatures to messages to avoid double processing 2020-08-28 23:48:48 -07:00
Chris Lu
97d97f3528 go code can read and write chunk manifest 2020-07-19 17:59:43 -07:00
Chris Lu
31e23e9783 filer: support active<=>active filer replication 2020-06-30 22:53:57 -07:00
Chris Lu
ec2eb8bc48 add If-None-Match and If-Modified-Since
fix https://github.com/chrislusf/seaweedfs/issues/1269
2020-04-08 08:12:00 -07:00
Chris Lu
b97768c51c refactoring 2020-03-23 01:30:22 -07:00
Chris Lu
c0f0fdb3ba refactoring 2020-03-23 00:01:34 -07:00
Chris Lu
8645283a7b fuse mount: avoid lookup nil entry
fix https://github.com/chrislusf/seaweedfs/issues/1221
2020-03-07 16:51:46 -08:00
Chris Lu
892e726eb9 avoid reusing context object
fix https://github.com/chrislusf/seaweedfs/issues/1182
2020-02-25 21:50:12 -08:00
Chris Lu
7d10fdf737 fix directory lookup nil 2020-02-25 11:13:06 -08:00