* filer.sync: show active chunk transfers when sync progress stalls
When the sync watermark is not advancing, print each in-progress chunk
transfer with its file path, bytes received so far, and current status
(downloading, uploading, or waiting with backoff duration). This helps
diagnose which files are blocking progress during replication.
Closes #8542
* filer.sync: include last error in stall diagnostics
* filer.sync: fix data races in ChunkTransferStatus
Add sync.RWMutex to ChunkTransferStatus and lock around all field
mutations in fetchAndWrite. ActiveTransfers now returns value copies
under RLock so callers get immutable snapshots.
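A minimal sketch of the locking pattern these commits describe (ChunkTransferStatus and ActiveTransfers are the names from the commits; the fields and helper methods below are illustrative):

package replication

import "sync"

// ChunkTransferStatus tracks one in-progress chunk transfer.
type ChunkTransferStatus struct {
    mu            sync.RWMutex
    Path          string // file path being replicated
    BytesReceived int64  // bytes received so far
    State         string // "downloading", "uploading", or "waiting (backoff 1m)"
}

// update mutates the fields under the write lock (called from fetchAndWrite).
func (s *ChunkTransferStatus) update(delta int64, state string) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.BytesReceived += delta
    s.State = state
}

// snapshot copies the fields out under the read lock, so callers never see a
// partially updated status and never hold a reference to mutable state.
func (s *ChunkTransferStatus) snapshot() (path string, bytes int64, state string) {
    s.mu.RLock()
    defer s.mu.RUnlock()
    return s.Path, s.BytesReceived, s.State
}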
* improve large file sync throughput for remote.cache and filer.sync
Three main throughput improvements:
1. Adaptive chunk sizing for remote.cache: targets ~32 chunks per file
instead of always starting at 5MB. A 500MB file now uses ~16MB chunks
(32 chunks) instead of 5MB chunks (100 chunks), reducing per-chunk
overhead (volume assign, gRPC call, needle write) by 3x.
2. Configurable concurrency at every layer:
- remote.cache chunk concurrency: -chunkConcurrency flag (default 8)
- remote.cache S3 download concurrency: -downloadConcurrency flag
(default raised from 1 to 5 per chunk)
- filer.sync chunk concurrency: -chunkConcurrency flag (default 32)
3. S3 multipart download concurrency raised from 1 to 5: the S3 manager
downloader was using Concurrency=1, serializing all part downloads
within each chunk. This alone can 5x per-chunk download speed.
The concurrency values flow through the gRPC request chain:
shell command → CacheRemoteObjectToLocalClusterRequest →
FetchAndWriteNeedleRequest → S3 downloader
Zero values in the request mean "use server defaults", maintaining
full backward compatibility with existing callers.
Ref #8481
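A minimal sketch of the adaptive sizing described above, with illustrative names (the real code also feeds the result through the maxMB cap discussed in the follow-up commits):

package remotecache

// adaptiveChunkSize picks a chunk size that yields roughly 32 chunks for the
// remote object instead of always starting at 5MB, but never goes below the
// old 5MB minimum. For a 500MB file this returns ~16MB, i.e. 32 chunks.
func adaptiveChunkSize(remoteSize int64) int64 {
    const (
        minChunkSize = 5 * 1024 * 1024
        targetChunks = 32
    )
    chunkSize := (remoteSize + targetChunks - 1) / targetChunks // ceil(remoteSize/32)
    if chunkSize < minChunkSize {
        chunkSize = minChunkSize
    }
    return chunkSize
}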
* fix: use full maxMB for chunk size cap and remove loop guard
Address review feedback:
- Use full maxMB instead of maxMB/2 for maxChunkSize to avoid
unnecessarily limiting chunk size for very large files.
- Remove chunkSize < maxChunkSize guard from the safety loop so it
can always grow past maxChunkSize when needed to stay under 1000
chunks (e.g., extremely large files with small maxMB).
* address review feedback: help text, validation, naming, docs
- Fix help text for -chunkConcurrency and -downloadConcurrency flags
to say "0 = server default" instead of advertising specific numeric
defaults that could drift from the server implementation.
- Validate chunkConcurrency and downloadConcurrency are within int32
range before narrowing, returning a user-facing error if out of range.
- Rename ReadRemoteErr to readRemoteErr to follow Go naming conventions.
- Add doc comment to SetChunkConcurrency noting it must be called
during initialization before replication goroutines start.
- Replace doubling loop in chunk size safety check with direct
ceil(remoteSize/1000) computation to guarantee the 1000-chunk cap.
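A sketch of the direct computation that replaces the doubling loop, with illustrative names:

package remotecache

// capChunkCount raises chunkSize so that remoteSize never needs more than
// maxChunks chunks (1000 in the change above), using a single ceiling
// division instead of a loop.
func capChunkCount(chunkSize, remoteSize, maxChunks int64) int64 {
    if minSize := (remoteSize + maxChunks - 1) / maxChunks; chunkSize < minSize {
        chunkSize = minSize
    }
    return chunkSize
}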
* address Copilot review: clamp concurrency, fix chunk count, clarify proto docs
- Use ceiling division for chunk count check to avoid overcounting
when file size is an exact multiple of chunk size.
- Clamp chunkConcurrency (max 1024) and downloadConcurrency (max 1024
at filer, max 64 at volume server) to prevent excessive goroutines.
- Always use ReadFileWithConcurrency when the client supports it,
falling back to the implementation's default when the value is 0.
- Clarify proto comments that download_concurrency only applies when
the remote storage client supports it (currently S3).
- Include specific server defaults in help text (e.g., "0 = server
default 8") so users see the actual values in -h output.
* fix data race on executionErr and use %w for error wrapping
- Protect concurrent writes to executionErr in remote.cache worker
goroutines with a sync.Mutex to eliminate the data race.
- Use %w instead of %v in volume_grpc_remote.go error formatting
to preserve the error chain for errors.Is/errors.As callers.
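A sketch of the race-free error capture and %w wrapping, with illustrative names (not the actual remote.cache worker code):

package remotecache

import (
    "fmt"
    "sync"
)

// cacheChunks runs one goroutine per chunk and records the first failure
// under a mutex, so concurrent workers never race on the shared error.
func cacheChunks(chunks []string, cacheOne func(string) error) error {
    var (
        wg               sync.WaitGroup
        executionErrLock sync.Mutex
        executionErr     error
    )
    for _, chunk := range chunks {
        wg.Add(1)
        go func(chunk string) {
            defer wg.Done()
            if err := cacheOne(chunk); err != nil {
                executionErrLock.Lock()
                if executionErr == nil {
                    // wrap with %w so errors.Is/errors.As callers still see the cause
                    executionErr = fmt.Errorf("cache chunk %s: %w", chunk, err)
                }
                executionErrLock.Unlock()
            }
        }(chunk)
    }
    wg.Wait()
    return executionErr
}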
* fix(replication): resume partial chunk reads on EOF instead of re-downloading
When replicating chunks and the source connection drops mid-transfer,
accumulate the bytes already received and retry with a Range header
to fetch only the remaining bytes. This avoids re-downloading
potentially large chunks from scratch on each retry, reducing load
on busy source servers and speeding up recovery.
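A minimal sketch of the resume logic, assuming illustrative names and plain net/http (the real code also applies retry backoff between attempts):

package replication

import (
    "errors"
    "fmt"
    "io"
    "net/http"
)

// downloadChunk fetches a chunk, and if the connection drops mid-transfer it
// retries with a Range header asking only for the bytes not yet received.
func downloadChunk(url string) ([]byte, error) {
    var received []byte
    for {
        req, err := http.NewRequest(http.MethodGet, url, nil)
        if err != nil {
            return nil, err
        }
        if len(received) > 0 {
            req.Header.Set("Range", fmt.Sprintf("bytes=%d-", len(received)))
        }
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return nil, err
        }
        data, readErr := io.ReadAll(resp.Body)
        resp.Body.Close()
        received = append(received, data...)
        if readErr == nil {
            return received, nil
        }
        if !errors.Is(readErr, io.ErrUnexpectedEOF) {
            return nil, readErr // only broken streams are resumed
        }
        // connection dropped mid-transfer: loop and resume from len(received)
    }
}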
* test(replication): add tests for downloadWithRange including gzip partial reads
Tests cover:
- No offset (no Range header sent)
- With offset (Range header verified)
- Content-Disposition filename extraction
- Partial read + resume: server drops connection mid-transfer, client
resumes with Range from the offset of received bytes
- Gzip partial read + resume: first response is gzip-encoded (Go auto-
decompresses), connection drops, resume request gets decompressed data
(Go doesn't add Accept-Encoding when Range is set, so the server
decompresses), combined bytes match original
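A sketch of the partial-read-and-resume case, exercising the HTTP mechanics directly with httptest rather than the package's DownloadFile helper:

package replication_test

import (
    "errors"
    "fmt"
    "io"
    "net/http"
    "net/http/httptest"
    "strconv"
    "sync/atomic"
    "testing"
)

func TestPartialReadThenResume(t *testing.T) {
    const full = "0123456789"
    var calls int32
    ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if atomic.AddInt32(&calls, 1) == 1 {
            // Declare 10 bytes but send only 5, then return: the client sees
            // the connection drop mid-transfer as an unexpected EOF.
            w.Header().Set("Content-Length", strconv.Itoa(len(full)))
            w.Write([]byte(full[:5]))
            w.(http.Flusher).Flush()
            return
        }
        if got := r.Header.Get("Range"); got != "bytes=5-" {
            t.Errorf("resume request Range = %q, want bytes=5-", got)
        }
        w.WriteHeader(http.StatusPartialContent)
        w.Write([]byte(full[5:]))
    }))
    defer ts.Close()

    resp, err := http.Get(ts.URL)
    if err != nil {
        t.Fatal(err)
    }
    received, err := io.ReadAll(resp.Body)
    resp.Body.Close()
    if !errors.Is(err, io.ErrUnexpectedEOF) {
        t.Fatalf("expected unexpected EOF, got %v", err)
    }

    req, _ := http.NewRequest(http.MethodGet, ts.URL, nil)
    req.Header.Set("Range", fmt.Sprintf("bytes=%d-", len(received)))
    resp, err = http.DefaultClient.Do(req)
    if err != nil {
        t.Fatal(err)
    }
    rest, _ := io.ReadAll(resp.Body)
    resp.Body.Close()

    if got := string(received) + string(rest); got != full {
        t.Fatalf("combined bytes = %q, want %q", got, full)
    }
}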
* fix(replication): address PR review comments
- Consolidate downloadWithRange into DownloadFile with optional offset
parameter (variadic), eliminating code duplication (DRY)
- Validate HTTP response status: require 206 + correct Content-Range
when offset > 0, reject when server ignores Range header
- Use if/else for fullData assignment for clarity
- Add test for rejected Range (server returns 200 instead of 206)
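A sketch of the status validation described above, with illustrative names:

package replication

import (
    "fmt"
    "net/http"
    "strings"
)

// checkRangeResponse rejects servers that ignore the Range header: when the
// download resumes from offset > 0, the response must be 206 with a
// Content-Range that starts at that offset.
func checkRangeResponse(resp *http.Response, offset int64) error {
    if offset == 0 {
        return nil
    }
    if resp.StatusCode != http.StatusPartialContent {
        return fmt.Errorf("server ignored Range header: got status %d", resp.StatusCode)
    }
    wantPrefix := fmt.Sprintf("bytes %d-", offset)
    if cr := resp.Header.Get("Content-Range"); !strings.HasPrefix(cr, wantPrefix) {
        return fmt.Errorf("unexpected Content-Range %q, want prefix %q", cr, wantPrefix)
    }
    return nil
}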
* refactor(replication): remove unused ReplicationSource interface
The interface was never referenced and its signature didn't match
the actual FilerSource.ReadPart method.
---------
Co-authored-by: Copilot <copilot@github.com>
* filer.sync: add exponential backoff on unexpected EOF during replication
When the source volume server drops connections under high traffic,
filer.sync retries aggressively (every 1-6s), hammering the already
overloaded source. This adds a longer exponential backoff (10s to 2min)
specifically for "unexpected EOF" errors, reducing pressure on the
source while still retrying indefinitely until success.
Also adds more logging throughout the replication path:
- Log source URL and error at V(0) when ReadPart or io.ReadAll fails
- Log content-length and byte counts at V(4) on success
- Log backoff duration in retry messages
Fixes #8542
* filer.sync: extract backoff helper and fix 2-minute cap
- Extract nextEofBackoff() and isEofError() helpers to deduplicate
the backoff logic between fetchAndWrite and uploadManifestChunk
- Fix the cap: previously 80s would double to 160s and pass the
< 2min check uncapped. Now doubles first, then clamps to 2min.
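A sketch of the extracted helper with the corrected cap (the actual constants and signature may differ):

package replication

import "time"

const (
    eofBackoffInitial = 10 * time.Second
    eofBackoffMax     = 2 * time.Minute
)

// nextEofBackoff doubles the previous wait and then clamps it, producing the
// sequence 10s, 20s, 40s, 80s, 2m, 2m, ... so 80s can no longer escape the
// cap by doubling to 160s before the check.
func nextEofBackoff(current time.Duration) time.Duration {
    if current == 0 {
        return eofBackoffInitial
    }
    next := current * 2 // double first ...
    if next > eofBackoffMax {
        next = eofBackoffMax // ... then clamp to 2 minutes
    }
    return next
}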
* filer.sync: log source URL instead of empty upload URL on read errors
UploadUrl is not populated until after the reader is consumed, so the
V(0) and V(4) logs were printing an empty string. Add SourceUrl field
to UploadOption and populate it from the HTTP response in fetchAndWrite.
* filer.sync: guard isEofError against nil error
* filer.sync: use errors.Is for EOF detection, fix log wording
- Replace broad substring matching ("read input", "unexpected EOF")
with errors.Is(err, io.ErrUnexpectedEOF) and errors.Is(err, io.EOF)
so only actual EOF errors trigger the longer backoff
- Fix awkward log phrasing: "interrupted replicate" → "interrupted
while replicating"
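A sketch of the resulting check (nil-safe and errors.Is based):

package replication

import (
    "errors"
    "io"
)

// isEofError reports whether err represents a broken source stream; only
// genuine EOF errors trigger the longer source-oriented backoff.
func isEofError(err error) bool {
    if err == nil {
        return false
    }
    return errors.Is(err, io.ErrUnexpectedEOF) || errors.Is(err, io.EOF)
}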
* filer.sync: remove EOF backoff from uploadManifestChunk
uploadManifestChunk reads from an in-memory bytes.Reader, so any EOF
errors there are from the destination side, not a broken source stream.
The long source-oriented backoff is inappropriate; let RetryUntil
handle destination retries at its normal cadence.
---------
Co-authored-by: Copilot <copilot@github.com>
Fixes #7230
When a cluster goes down during file replication, the chunk upload process
would fail after a limited number of retries. Once the remote cluster came
back online, those failed uploads were never retried, leaving the clusters
out-of-sync.
This change enables the RetryForever flag in the UploadOption when
replicating chunks between filers. This ensures that upload operations
will keep retrying indefinitely, and once the remote cluster comes back
online, the pending uploads will automatically succeed.
Users no longer need to manually run fs.meta.save and fs.meta.load as
a workaround for out-of-sync clusters.
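A generic sketch of retry-forever semantics, not the actual UploadOption plumbing (the backoff values here are illustrative):

package replication

import "time"

// uploadWithRetryForever keeps retrying the chunk upload until the
// destination cluster accepts it, instead of giving up after N attempts.
func uploadWithRetryForever(upload func() error) {
    wait := time.Second
    for {
        if err := upload(); err == nil {
            return
        }
        time.Sleep(wait)
        if wait < 30*time.Second {
            wait *= 2 // back off, but never stop retrying
        }
    }
}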
* Added global http client
* Added Do func for global http client
* Changed the code to use the global http client
* Fix http client in volume uploader
* Fixed pkg name
* Fixed http util funcs
* Fixed http client for bench_filer_upload
* Fixed http client for stress_filer_upload
* Fixed http client for filer_server_handlers_proxy
* Fixed http client for command_fs_merge_volumes
* Fixed http client for command_fs_merge_volumes and command_volume_fsck
* Fixed http client for s3api_server
* Added init global client for main funcs
* Rename global_client to client
* Changed:
- Fixed NewHttpClient
- Added CheckIsHttpsClientEnabled func
- Updated security.toml in scaffold
* Reduce the visibility of some functions in the util/http/client pkg
* Added the loadSecurityConfig function
* Use util.LoadSecurityConfiguration() in NewHttpClient func
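A generic sketch of the shared-client pattern these commits describe; the names are illustrative, and the real package also wires in the TLS settings loaded from security.toml via util.LoadSecurityConfiguration:

package client

import (
    "net/http"
    "sync"
)

var (
    initOnce     sync.Once
    globalClient *http.Client
)

// InitGlobalHttpClient is called once from each main func before any
// requests are issued; HTTPS/TLS configuration would be applied here.
func InitGlobalHttpClient() {
    initOnce.Do(func() {
        globalClient = &http.Client{}
    })
}

// Do sends a request through the shared client so every component uses the
// same transport and TLS settings.
func Do(req *http.Request) (*http.Response, error) {
    return globalClient.Do(req)
}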
Fix https://github.com/seaweedfs/seaweedfs/issues/3714
The destination chunks may be empty, for example when the file is updated and the volume is vacuumed before the sync catches up. In that case the sync misses the old chunks, which is fine, but the destination entry ends up with correct metadata and no chunks.
For such an entry, a simple metadata comparison would wrongly skip the data change, and the file would stay empty unless the file content md5 changes.
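A conceptual sketch of why the comparison must also look at chunks, with illustrative names (not the actual filer.sync comparison code):

package replication

// shouldSkipSync is only safe when both metadata and chunk presence match:
// an entry whose metadata is identical but whose chunks were lost (e.g. the
// volume was vacuumed before sync) must still be re-replicated.
func shouldSkipSync(metadataEqual bool, sourceChunkCount, destChunkCount int) bool {
    if !metadataEqual {
        return false
    }
    if sourceChunkCount > 0 && destChunkCount == 0 {
        return false // metadata matches but the destination file is empty
    }
    return true
}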
* remove old raft servers if they don't answer to pings for too long
add ping durations as options
rename ping fields
fix some todos
get masters through masterclient
raft remove server from leader
use raft servers to ping them
CheckMastersAlive for hashicorp raft only
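A sketch of the leader-side liveness check against the hashicorp/raft API, with illustrative names and with the ping transport and durations abstracted away (the real code pings peers through the master client and makes the durations configurable):

package master

import (
    "time"

    "github.com/hashicorp/raft"
)

// removeDeadRaftServers runs periodically on the leader: peers that have not
// answered pings for longer than maxPingFailureDuration are removed from the
// raft configuration.
func removeDeadRaftServers(r *raft.Raft, ping func(addr string) error,
    unreachableSince map[raft.ServerID]time.Time, maxPingFailureDuration time.Duration) {
    if r.State() != raft.Leader {
        return // only the leader may change membership
    }
    future := r.GetConfiguration()
    if err := future.Error(); err != nil {
        return
    }
    for _, server := range future.Configuration().Servers {
        if ping(string(server.Address)) == nil {
            delete(unreachableSince, server.ID) // reachable again
            continue
        }
        since, seen := unreachableSince[server.ID]
        if !seen {
            unreachableSince[server.ID] = time.Now()
            continue
        }
        if time.Since(since) > maxPingFailureDuration {
            r.RemoveServer(server.ID, 0, 0) // drop the unresponsive peer
        }
    }
}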
* prepare blocking ping
* pass waitForReady as param
* pass waitForReady through all functions
* waitForReady works
* refactor
* remove unneeded params
* rollback unneeded changes
* fix
Run two servers with volumes and filers:
server -dir=Server1alpha -master.port=11000 -filer -filer.port=11001 -volume.port=11002
server -dir=Server1sigma -master.port=11006 -filer -filer.port=11007 -volume.port=11008
Run Active-Passive filer.sync:
filer.sync -a localhost:11007 -b localhost:11001 -isActivePassive
Upload file to 11007 port:
curl -F file=@/Desktop/9.xml "http://localhost:11007/testFacebook/"
If we now request the file from both servers, everything is correct, even if we add data to the file and upload it again:
curl "http://localhost:11007/testFacebook/9.xml"
EQUALS
curl "http://localhost:11001/testFacebook/9.xml"
However, if we change already existing data in the file (for example, shortening its first line), then the copy on the second server becomes invalid and is no longer equivalent to the file on the first server.
The problem occurs at line 202 of filer_sink.go and is caused by incorrect matching of chunk names in the DoMinusChunks function. The names in deletedChunks do not match the chunks in existingEntry.Chunks, because the deleted chunks come from the other server and use different addressing (names) than the server where the file is being overwritten.
As a result, the deleted chunks are never actually deleted on the server to which the file is replicated.
Running mount outside of the cluster no longer requires exposing all the volume servers outside the cluster; chunk reads and writes go through the filer.