38 Commits

Author SHA1 Message Date
Chris Lu
8572aae403 filer.sync: support per-cluster mTLS with -a.security and -b.security (#8872)
* filer.sync: support per-cluster mTLS with -a.security and -b.security flags

When syncing between two clusters that use different certificate authorities,
a single security.toml cannot authenticate to both. Add -a.security and
-b.security flags so each filer can use its own security.toml for TLS.

Closes #8481

* security: fatal on failure to read explicitly provided security config

When -a.security or -b.security is specified, falling back to insecure
credentials on read error would silently bypass mTLS. Fatal instead.

* fix(filer.sync): use source filer's fromTsMs flag in initOffsetFromTsMs

A→B was using bFromTsMs and B→A was using aFromTsMs — these were
swapped. Each path should seed the target's offset with the source
filer's starting timestamp.

* security: return error from LoadClientTLSFromFile, resolve relative PEM paths

Change LoadClientTLSFromFile to return (grpc.DialOption, error) so
callers can handle failures explicitly instead of a silent insecure
fallback. Resolve relative PEM paths (grpc.ca, grpc.client.cert,
grpc.client.key) against the config file's directory.
2026-04-01 11:05:43 -07:00
Chris Lu
b665c329bc fix(replication): resume partial chunk reads on EOF instead of re-downloading (#8607)
* fix(replication): resume partial chunk reads on EOF instead of re-downloading

When replicating chunks and the source connection drops mid-transfer,
accumulate the bytes already received and retry with a Range header
to fetch only the remaining bytes. This avoids re-downloading
potentially large chunks from scratch on each retry, reducing load
on busy source servers and speeding up recovery.

* test(replication): add tests for downloadWithRange including gzip partial reads

Tests cover:
- No offset (no Range header sent)
- With offset (Range header verified)
- Content-Disposition filename extraction
- Partial read + resume: server drops connection mid-transfer, client
  resumes with Range from the offset of received bytes
- Gzip partial read + resume: first response is gzip-encoded (Go auto-
  decompresses), connection drops, resume request gets decompressed data
  (Go doesn't add Accept-Encoding when Range is set, so the server
  decompresses), combined bytes match original

* fix(replication): address PR review comments

- Consolidate downloadWithRange into DownloadFile with optional offset
  parameter (variadic), eliminating code duplication (DRY)
- Validate HTTP response status: require 206 + correct Content-Range
  when offset > 0, reject when server ignores Range header
- Use if/else for fullData assignment for clarity
- Add test for rejected Range (server returns 200 instead of 206)

* refactor(replication): remove unused ReplicationSource interface

The interface was never referenced and its signature didn't match
the actual FilerSource.ReadPart method.

---------

Co-authored-by: Copilot <copilot@github.com>
2026-03-11 22:38:22 -07:00
Chris Lu
0647f66bb5 filer.sync: add exponential backoff on unexpected EOF during replication (#8557)
* filer.sync: add exponential backoff on unexpected EOF during replication

When the source volume server drops connections under high traffic,
filer.sync retries aggressively (every 1-6s), hammering the already
overloaded source. This adds a longer exponential backoff (10s to 2min)
specifically for "unexpected EOF" errors, reducing pressure on the
source while still retrying indefinitely until success.

Also adds more logging throughout the replication path:
- Log source URL and error at V(0) when ReadPart or io.ReadAll fails
- Log content-length and byte counts at V(4) on success
- Log backoff duration in retry messages

Fixes #8542

* filer.sync: extract backoff helper and fix 2-minute cap

- Extract nextEofBackoff() and isEofError() helpers to deduplicate
  the backoff logic between fetchAndWrite and uploadManifestChunk
- Fix the cap: previously 80s would double to 160s and pass the
  < 2min check uncapped. Now doubles first, then clamps to 2min.

* filer.sync: log source URL instead of empty upload URL on read errors

UploadUrl is not populated until after the reader is consumed, so the
V(0) and V(4) logs were printing an empty string. Add SourceUrl field
to UploadOption and populate it from the HTTP response in fetchAndWrite.

* filer.sync: guard isEofError against nil error

* filer.sync: use errors.Is for EOF detection, fix log wording

- Replace broad substring matching ("read input", "unexpected EOF")
  with errors.Is(err, io.ErrUnexpectedEOF) and errors.Is(err, io.EOF)
  so only actual EOF errors trigger the longer backoff
- Fix awkward log phrasing: "interrupted replicate" → "interrupted
  while replicating"

* filer.sync: remove EOF backoff from uploadManifestChunk

uploadManifestChunk reads from an in-memory bytes.Reader, so any EOF
errors there are from the destination side, not a broken source stream.
The long source-oriented backoff is inappropriate; let RetryUntil
handle destination retries at its normal cadence.

---------

Co-authored-by: Copilot <copilot@github.com>
2026-03-08 14:33:37 -07:00
Aleksey Kosov
4511c2cc1f Changes logging function (#6919)
* updated logging methods for stores

* updated logging methods for stores

* updated logging methods for filer

* updated logging methods for uploader and http_util

* updated logging methods for weed server

---------

Co-authored-by: akosov <a.kosov@kryptonite.ru>
2025-06-24 08:44:06 -07:00
Aleksey Kosov
283d9e0079 Add context with request (#6824) 2025-05-28 11:34:02 -07:00
vadimartynov
86d92a42b4 Added tls for http clients (#5766)
* Added global http client

* Added Do func for global http client

* Changed the code to use the global http client

* Fix http client in volume uploader

* Fixed pkg name

* Fixed http util funcs

* Fixed http client for bench_filer_upload

* Fixed http client for stress_filer_upload

* Fixed http client for filer_server_handlers_proxy

* Fixed http client for command_fs_merge_volumes

* Fixed http client for command_fs_merge_volumes and command_volume_fsck

* Fixed http client for s3api_server

* Added init global client for main funcs

* Rename global_client to client

* Changed:
- fixed NewHttpClient;
- added CheckIsHttpsClientEnabled func
- updated security.toml in scaffold

* Reduce the visibility of some functions in the util/http/client pkg

* Added the loadSecurityConfig function

* Use util.LoadSecurityConfiguration() in NewHttpClient func
2024-07-16 23:14:09 -07:00
chrislu
81fdf3651b grpc connection to filer add sw-client-id header 2023-01-20 01:48:12 -08:00
askeipx
2e78a522ab remove old raft servers if they don't answer to pings for too long (#3398)
* remove old raft servers if they don't answer to pings for too long

add ping durations as options

rename ping fields

fix some todos

get masters through masterclient

raft remove server from leader

use raft servers to ping them

CheckMastersAlive for hashicorp raft only

* prepare blocking ping

* pass waitForReady as param

* pass waitForReady through all functions

* waitForReady works

* refactor

* remove unneeded params

* rollback unneeded changes

* fix
2022-08-23 23:18:21 -07:00
Konstantin Lebedev
4d08393b7c filer prefer volume server in same data center (#3405)
* initial prefer same data center
https://github.com/seaweedfs/seaweedfs/issues/3404

* GetDataCenter

* prefer same data center for ReplicationSource

* GetDataCenterId

* remove glog
2022-08-04 17:35:00 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
chrislu
9f9ef1340c use streaming mode for long poll grpc calls
streaming mode would create separate grpc connections for each call.
this is to ensure the long poll connections are properly closed.
2021-12-26 00:15:03 -08:00
Chris Lu
0db2517994 go fmt 2021-08-14 02:55:44 -07:00
Chris Lu
5a0f92423e use grpc and jwt 2021-08-12 21:40:33 -07:00
Chris Lu
921e0d5008 remove verbose log 2021-05-26 14:43:34 -07:00
Chris Lu
9abb041763 filer source: support filerProxy mode 2021-02-28 16:19:47 -08:00
Chris Lu
990fa69bfe add back AdjustedUrl() related code 2021-01-28 14:36:29 -08:00
Chris Lu
00707ec00f mount: outsideContainerClusterMode proxy through filer
Running mount outside of the cluster would not need to expose all the volume servers to outside of the cluster. The chunk read and write will go through the filer.
2021-01-24 19:01:58 -08:00
Chris Lu
6ca10725b8 Revert "mount: when outside cluster network, use filer as proxy to access volume servers"
This reverts commit 096e088d7b.
2021-01-24 03:15:19 -08:00
Chris Lu
096e088d7b mount: when outside cluster network, use filer as proxy to access volume servers 2021-01-24 01:41:38 -08:00
Chris Lu
80b8692688 filer.sync: replicate outside of either cluster, only need to see filers 2021-01-24 00:01:44 -08:00
Chris Lu
723ae11db4 refactoring in order to adjust volume server url later 2020-10-11 20:15:10 -07:00
Chris Lu
a8624c2e4f read from alternative replica
related to https://github.com/chrislusf/seaweedfs/issues/1512
2020-10-07 22:49:04 -07:00
Chris Lu
387ab6796f filer: cross cluster synchronization 2020-09-09 11:21:23 -07:00
Chris Lu
4fc0bd1a81 return http response directly 2020-09-09 03:53:09 -07:00
Chris Lu
ed3cf811f5 refactoring 2020-04-29 13:26:02 -07:00
Chris Lu
f90c43635d refactoring 2020-03-04 00:39:47 -08:00
Chris Lu
892e726eb9 avoid reusing context object
fix https://github.com/chrislusf/seaweedfs/issues/1182
2020-02-25 21:50:12 -08:00
Chris Lu
d335f04de6 support env variables to overwrite toml file 2020-01-29 09:09:55 -08:00
Chris Lu
72a64a5cf8 use the same context object in order to retry 2020-01-26 14:42:11 -08:00
Chris Lu
c789b496d8 use cached grpc client 2019-04-05 20:31:58 -07:00
Chris Lu
55bab1b456 add context.Context 2019-03-15 17:20:24 -07:00
Chris Lu
77b9af531d adding grpc mutual tls 2019-02-18 12:11:52 -08:00
Chris Lu
e8ef501f02 add s3 replication sink 2018-10-03 23:36:52 -07:00
Chris Lu
db69ce89f0 go fmt 2018-09-21 01:56:43 -07:00
Chris Lu
a6cfaba018 able to sync the changes 2018-09-21 01:54:29 -07:00
Chris Lu
25fb6f9a46 fix compilation 2018-09-17 02:23:21 -07:00
Chris Lu
779641e9d4 adjust replicated entry name 2018-09-17 01:37:24 -07:00
Chris Lu
788acdf527 add WIP filer.replicate 2018-09-17 00:27:56 -07:00