seaweedFS

Author	SHA1	Message	Date
Chris Lu	597d383ca4	filer.sync: fix data races in ChunkTransferStatus Add sync.RWMutex to ChunkTransferStatus and lock around all field mutations in fetchAndWrite. ActiveTransfers now returns value copies under RLock so callers get immutable snapshots.	2026-04-02 13:04:21 -07:00
Chris Lu	b5cdd71600	filer.sync: include last error in stall diagnostics	2026-04-02 12:18:56 -07:00
Chris Lu	2d4ea8c665	filer.sync: show active chunk transfers when sync progress stalls When the sync watermark is not advancing, print each in-progress chunk transfer with its file path, bytes received so far, and current status (downloading, uploading, or waiting with backoff duration). This helps diagnose which files are blocking progress during replication. Closes #8542	2026-04-02 12:14:25 -07:00
Chris Lu	b665c329bc	fix(replication): resume partial chunk reads on EOF instead of re-downloading (#8607) * fix(replication): resume partial chunk reads on EOF instead of re-downloading When replicating chunks and the source connection drops mid-transfer, accumulate the bytes already received and retry with a Range header to fetch only the remaining bytes. This avoids re-downloading potentially large chunks from scratch on each retry, reducing load on busy source servers and speeding up recovery. * test(replication): add tests for downloadWithRange including gzip partial reads Tests cover: - No offset (no Range header sent) - With offset (Range header verified) - Content-Disposition filename extraction - Partial read + resume: server drops connection mid-transfer, client resumes with Range from the offset of received bytes - Gzip partial read + resume: first response is gzip-encoded (Go auto-decompresses), connection drops, resume request gets decompressed data (Go doesn't add Accept-Encoding when Range is set, so the server decompresses), combined bytes match original * fix(replication): address PR review comments - Consolidate downloadWithRange into DownloadFile with optional offset parameter (variadic), eliminating code duplication (DRY) - Validate HTTP response status: require 206 + correct Content-Range when offset > 0, reject when server ignores Range header - Use if/else for fullData assignment for clarity - Add test for rejected Range (server returns 200 instead of 206) * refactor(replication): remove unused ReplicationSource interface The interface was never referenced and its signature didn't match the actual FilerSource.ReadPart method. --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-11 22:38:22 -07:00
Chris Lu	0647f66bb5	filer.sync: add exponential backoff on unexpected EOF during replication (#8557) * filer.sync: add exponential backoff on unexpected EOF during replication When the source volume server drops connections under high traffic, filer.sync retries aggressively (every 1-6s), hammering the already overloaded source. This adds a longer exponential backoff (10s to 2min) specifically for "unexpected EOF" errors, reducing pressure on the source while still retrying indefinitely until success. Also adds more logging throughout the replication path: - Log source URL and error at V(0) when ReadPart or io.ReadAll fails - Log content-length and byte counts at V(4) on success - Log backoff duration in retry messages Fixes #8542 * filer.sync: extract backoff helper and fix 2-minute cap - Extract nextEofBackoff() and isEofError() helpers to deduplicate the backoff logic between fetchAndWrite and uploadManifestChunk - Fix the cap: previously 80s would double to 160s and pass the < 2min check uncapped. Now doubles first, then clamps to 2min. * filer.sync: log source URL instead of empty upload URL on read errors UploadUrl is not populated until after the reader is consumed, so the V(0) and V(4) logs were printing an empty string. Add SourceUrl field to UploadOption and populate it from the HTTP response in fetchAndWrite. * filer.sync: guard isEofError against nil error * filer.sync: use errors.Is for EOF detection, fix log wording - Replace broad substring matching ("read input", "unexpected EOF") with errors.Is(err, io.ErrUnexpectedEOF) and errors.Is(err, io.EOF) so only actual EOF errors trigger the longer backoff - Fix awkward log phrasing: "interrupted replicate" → "interrupted while replicating" * filer.sync: remove EOF backoff from uploadManifestChunk uploadManifestChunk reads from an in-memory bytes.Reader, so any EOF errors there are from the destination side, not a broken source stream. The long source-oriented backoff is inappropriate; let RetryUntil handle destination retries at its normal cadence. --------- Co-authored-by: Copilot <copilot@github.com>	2026-03-08 14:33:37 -07:00
Chris Lu	7fcbffed7f	filer.sync: support manifest chunks (#8299) * filer.sync support manifest chunks * filersink: address manifest sync review feedback	2026-02-10 20:18:35 -08:00
Chris Lu	be0379f6fd	Fix filer.sync retry on stale chunk (#8298) * Fix filer.sync stale chunk uploads * Tweak filersink stale logging	2026-02-10 19:06:35 -08:00
Chris Lu	cc2edfaf68	fix: enable RetryForever for active-active cluster sync to prevent out-of-sync (#7840) Fixes #7230 When a cluster goes down during file replication, the chunk upload process would fail after a limited number of retries. Once the remote cluster came back online, those failed uploads were never retried, leaving the clusters out-of-sync. This change enables the RetryForever flag in the UploadOption when replicating chunks between filers. This ensures that upload operations will keep retrying indefinitely, and once the remote cluster comes back online, the pending uploads will automatically succeed. Users no longer need to manually run fs.meta.save and fs.meta.load as a workaround for out-of-sync clusters.	2025-12-22 00:58:23 -08:00
Chris Lu	69553e5ba6	convert error formatting to %w everywhere (#6995)	2025-07-16 23:39:27 -07:00
vadimartynov	86d92a42b4	Added tls for http clients (#5766) * Added global http client * Added Do func for global http client * Changed the code to use the global http client * Fix http client in volume uploader * Fixed pkg name * Fixed http util funcs * Fixed http client for bench_filer_upload * Fixed http client for stress_filer_upload * Fixed http client for filer_server_handlers_proxy * Fixed http client for command_fs_merge_volumes * Fixed http client for command_fs_merge_volumes and command_volume_fsck * Fixed http client for s3api_server * Added init global client for main funcs * Rename global_client to client * Changed: - fixed NewHttpClient; - added CheckIsHttpsClientEnabled func - updated security.toml in scaffold * Reduce the visibility of some functions in the util/http/client pkg * Added the loadSecurityConfig function * Use util.LoadSecurityConfiguration() in NewHttpClient func	2024-07-16 23:14:09 -07:00
chrislu	81fdf3651b	grpc connection to filer add sw-client-id header	2023-01-20 01:48:12 -08:00
chrislu	6ede19e825	add a simple file replication progress bar	2022-12-20 19:47:21 -08:00
chrislu	6c7fe40305	filer sink retries reading file chunks, skipping missing chunks if the file chunk is not available during replication time, the file is skipped	2022-12-19 11:31:58 -08:00
chrislu	ea2637734a	refactor filer proto chunk variable from mtime to modified_ts_ns	2022-10-28 12:53:19 -07:00
chrislu	ea271600ec	fix parameters	2022-10-04 12:36:05 -07:00
chrislu	0452ae6a6c	filer.sync: limit concurrency when fetching file chunks fix https://github.com/seaweedfs/seaweedfs/issues/3787	2022-10-04 11:35:07 -07:00
askeipx	2e78a522ab	remove old raft servers if they don't answer to pings for too long (#3398) * remove old raft servers if they don't answer to pings for too long add ping durations as options rename ping fields fix some todos get masters through masterclient raft remove server from leader use raft servers to ping them CheckMastersAlive for hashicorp raft only * prepare blocking ping * pass waitForReady as param * pass waitForReady through all functions * waitForReady works * refactor * remove unneeded params * rollback unneeded changes * fix	2022-08-23 23:18:21 -07:00
chrislu	4081d50607	filer sink: retryable data chunk uploading	2022-08-20 19:09:15 -07:00
Konstantin Lebedev	4d08393b7c	filer prefer volume server in same data center (#3405) * initial prefer same data center https://github.com/seaweedfs/seaweedfs/issues/3404 * GetDataCenter * prefer same data center for ReplicationSource * GetDataCenterId * remove glog	2022-08-04 17:35:00 -07:00
chrislu	26dbc6c905	move to https://github.com/seaweedfs/seaweedfs	2022-07-29 00:17:28 -07:00
chrislu	9f9ef1340c	use streaming mode for long poll grpc calls streaming mode would create separate grpc connections for each call. this is to ensure the long poll connections are properly closed.	2021-12-26 00:15:03 -08:00
Chris Lu	e5fc35ed0c	change server address from string to a type	2021-09-12 22:47:52 -07:00
Chris Lu	6923af7280	refactoring	2021-09-06 16:20:49 -07:00
Chris Lu	8f8738867f	add retry to assign volume fix https://github.com/chrislusf/seaweedfs/issues/2056	2021-05-07 07:29:26 -07:00
Chris Lu	821c46edf1	Merge branch 'master' into support_ssd_volume	2021-02-09 11:37:07 -08:00
Chris Lu	990fa69bfe	add back AdjustedUrl() related code	2021-01-28 14:36:29 -08:00
Chris Lu	00707ec00f	mount: outsideContainerClusterMode proxy through filer Running mount outside of the cluster would not need to expose all the volume servers to outside of the cluster. The chunk read and write will go through the filer.	2021-01-24 19:01:58 -08:00
Chris Lu	6ca10725b8	Revert "mount: when outside cluster network, use filer as proxy to access volume servers" This reverts commit `096e088d7b`.	2021-01-24 03:15:19 -08:00
Chris Lu	096e088d7b	mount: when outside cluster network, use filer as proxy to access volume servers	2021-01-24 01:41:38 -08:00
Chris Lu	80b8692688	filer.sync: replicate outside of either cluster, only need to see filers	2021-01-24 00:01:44 -08:00
Chris Lu	1bf22c0b5b	go fmt	2020-12-16 09:14:05 -08:00
Chris Lu	0d2ec832e2	rename from volumeType to diskType	2020-12-13 11:59:32 -08:00
Chris Lu	e9cd798bd3	adding volume type	2020-12-13 00:58:58 -08:00
Chris Lu	e219c57849	passing full path when assign volume locations	2020-10-25 15:46:29 -07:00
Chris Lu	f375b93aef	renaming	2020-10-25 15:32:43 -07:00
Chris Lu	723ae11db4	refactoring in order to adjust volume server url later	2020-10-11 20:15:10 -07:00
Chris Lu	4fc0bd1a81	return http response directly	2020-09-09 03:53:09 -07:00
Chris Lu	97239ce6f1	rename filechunk is_gzipped to is_compressed	2020-06-20 08:15:49 -07:00
Chris Lu	ed3cf811f5	refactoring	2020-04-29 13:26:02 -07:00
Chris Lu	eedd33dda3	refactoring	2020-03-28 13:41:58 -07:00
Chris Lu	b97768c51c	refactoring	2020-03-23 01:30:22 -07:00
Chris Lu	bec6ec7db6	go fmt	2020-03-17 10:01:55 -07:00
HongyanShen	81610ed006	fix: #1226	2020-03-11 14:37:14 +08:00
Chris Lu	2e3f6ad3a9	filer: remember content is gzipped or not	2020-03-08 21:39:33 -07:00
Chris Lu	13e215ee5c	filer: option to encrypt data on volume server	2020-03-06 00:49:47 -08:00
Chris Lu	f90c43635d	refactoring	2020-03-04 00:39:47 -08:00
Chris Lu	892e726eb9	avoid reusing context object fix https://github.com/chrislusf/seaweedfs/issues/1182	2020-02-25 21:50:12 -08:00
Chris Lu	0841bedb15	move filer assign volume grpc error to response	2020-02-25 17:15:09 -08:00
Chris Lu	6ab7368ef2	filer: dynamically create bucket under /buckets folder	2020-02-24 22:28:45 -08:00
Chris Lu	72a64a5cf8	use the same context object in order to retry	2020-01-26 14:42:11 -08:00


56 Commits
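Commits 2d4ea8c665 and 597d383ca4 describe per-transfer status (file path, bytes received, current status) guarded by a sync.RWMutex, with ActiveTransfers returning value copies taken under RLock. The sketch below is a minimal Go illustration under stated assumptions: the field names, the TransferSnapshot type, and the transferTracker registry are all hypothetical. It returns a separate mutex-free snapshot struct rather than copying the struct that embeds the lock, because copying a sync.RWMutex is flagged by go vet.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ChunkTransferStatus tracks one in-progress chunk transfer.
// Field names are assumptions for illustration.
type ChunkTransferStatus struct {
	mu            sync.RWMutex
	Path          string
	BytesReceived int64
	Status        string // e.g. "downloading", "uploading", "waiting"
}

// TransferSnapshot is a lock-free copy that callers may read freely.
type TransferSnapshot struct {
	Path          string
	BytesReceived int64
	Status        string
}

// addBytes records more received bytes under the write lock.
func (s *ChunkTransferStatus) addBytes(n int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.BytesReceived += n
}

// setStatus changes the status string under the write lock.
func (s *ChunkTransferStatus) setStatus(status string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.Status = status
}

// snapshot copies the fields under the read lock.
func (s *ChunkTransferStatus) snapshot() TransferSnapshot {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return TransferSnapshot{Path: s.Path, BytesReceived: s.BytesReceived, Status: s.Status}
}

// transferTracker is a hypothetical registry of active transfers.
type transferTracker struct {
	mu        sync.Mutex
	transfers map[string]*ChunkTransferStatus
}

// ActiveTransfers returns value copies sorted by path, so callers never
// touch the live, mutex-guarded structs.
func (t *transferTracker) ActiveTransfers() []TransferSnapshot {
	t.mu.Lock()
	defer t.mu.Unlock()
	out := make([]TransferSnapshot, 0, len(t.transfers))
	for _, st := range t.transfers {
		out = append(out, st.snapshot())
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

func main() {
	st := &ChunkTransferStatus{Path: "/data/a.bin", Status: "downloading"}
	tr := &transferTracker{transfers: map[string]*ChunkTransferStatus{st.Path: st}}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			st.addBytes(1024) // concurrent writers are serialized by the lock
		}()
	}
	wg.Wait()
	st.setStatus("uploading")

	for _, s := range tr.ActiveTransfers() {
		fmt.Printf("%s %d %s\n", s.Path, s.BytesReceived, s.Status)
	}
}
```

A snapshot taken before a later setStatus call keeps its old values, which is the "immutable snapshot" property the commit message describes.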
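Commit b665c329bc says a resumed download must receive 206 Partial Content with a correct Content-Range when the offset is non-zero, and must be rejected when the server ignores the Range header and answers 200. A minimal Go sketch of that check follows, assuming the caller already holds the status code and Content-Range header; rangeHeader and checkResumeResponse are hypothetical names, not the DownloadFile signature from the commit, and requiring 200 for a zero offset is an added assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// rangeHeader builds an open-ended Range value requesting bytes from offset onward.
func rangeHeader(offset int64) string {
	return fmt.Sprintf("bytes=%d-", offset)
}

// checkResumeResponse validates the response to a (possibly) ranged GET.
// With offset > 0 it requires 206 and a Content-Range starting at offset;
// a 200 means the server ignored Range and is resending the whole body.
// Assumption: with offset == 0 only 200 is accepted.
func checkResumeResponse(statusCode int, contentRange string, offset int64) error {
	if offset <= 0 {
		if statusCode != http.StatusOK {
			return fmt.Errorf("unexpected status %d for full download", statusCode)
		}
		return nil
	}
	if statusCode != http.StatusPartialContent {
		return fmt.Errorf("server ignored Range: want 206, got %d", statusCode)
	}
	// Expected form: "bytes <start>-<end>/<total>" (total may be "*").
	if !strings.HasPrefix(contentRange, "bytes ") {
		return fmt.Errorf("malformed Content-Range %q", contentRange)
	}
	spec := strings.TrimPrefix(contentRange, "bytes ")
	dash := strings.IndexByte(spec, '-')
	if dash <= 0 {
		return fmt.Errorf("malformed Content-Range %q", contentRange)
	}
	start, err := strconv.ParseInt(spec[:dash], 10, 64)
	if err != nil {
		return fmt.Errorf("malformed Content-Range %q: %w", contentRange, err)
	}
	if start != offset {
		return fmt.Errorf("content range starts at %d, want %d", start, offset)
	}
	return nil
}

func main() {
	fmt.Println(rangeHeader(4096))
	fmt.Println(checkResumeResponse(http.StatusPartialContent, "bytes 4096-8191/8192", 4096))
	fmt.Println(checkResumeResponse(http.StatusOK, "", 4096))
	fmt.Println(checkResumeResponse(http.StatusPartialContent, "bytes 0-8191/8192", 4096))
}
```

Checking the start of Content-Range, not just the 206 status, guards against a server that honors Range but returns a different window than requested, which would corrupt the concatenated chunk.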