Commit Graph

143 Commits

Author SHA1 Message Date
Chris Lu
57ab99d13e fix: generate topology uuid uniformly in single-master mode (#8405)
* fix: ensure topology uuid is generated in single master setups

* ensureTopologyId adds a Hashicorp-aware implementation

* simplify
2026-02-22 23:45:48 -08:00
Chris Lu
3300874cb5 filer: add default log purging to master maintenance scripts (#8359)
* filer: add default log purging to master maintenance scripts

* filer: fix default maintenance scripts to include full set of tasks

* filer: refactor maintenance scripts to avoid duplication
2026-02-16 16:58:15 -08:00
Chris Lu
cb9e21cdc5 Normalize hashicorp raft peer ids (#8253)
* Normalize raft voter ids

* 4.11

* Update raft_hashicorp.go
2026-02-09 07:46:34 -08:00
Chris Lu
753e1db096 Prevent split-brain: Persistent ClusterID and Join Validation (#8022)
* Prevent split-brain: Persistent ClusterID and Join Validation

- Persist ClusterId in Raft store to survive restarts.
- Validate ClusterId on Raft command application (piggybacked on MaxVolumeId).
- Prevent masters with conflicting ClusterIds from joining/operating together.
- Update Telemetry to report the persistent ClusterId.

* Refine ClusterID validation based on feedback

- Improved error message in cluster_commands.go.
- Added ClusterId mismatch check in RaftServer.Recovery.

* Handle Raft errors and support Hashicorp Raft for ClusterId

- Check for errors when persisting ClusterId in legacy Raft.
- Implement ClusterId generation and persistence for Hashicorp Raft leader changes.
- Ensure consistent error logging.

* Refactor ClusterId validation

- Centralize ClusterId mismatch check in Topology.SetClusterId.
- Simplify MaxVolumeIdCommand.Apply and RaftServer.Recovery to rely on SetClusterId.

* Fix goroutine leak and add timeout

- Handle channel closure in Hashicorp Raft leader listener.
- Add timeout to Raft Apply call to prevent blocking.

* Fix deadlock in legacy Raft listener

- Wrap ClusterId generation/persistence in a goroutine to avoid blocking the Raft event loop (deadlock).

* Rename ClusterId to SystemId

- Renamed ClusterId to SystemId across the codebase (protobuf, topology, server, telemetry).
- Regenerated telemetry.pb.go with new field.

* Rename SystemId to TopologyId

- Rename to SystemId was intermediate step.
- Final name is TopologyId for the persistent cluster identifier.
- Updated protobuf, topology, raft server, master server, and telemetry.

* Optimize Hashicorp Raft listener

- Integrated TopologyId generation into existing monitorLeaderLoop.
- Removed extra goroutine in master_server.go.

* Fix optimistic TopologyId update

- Removed premature local state update of TopologyId in master_server.go and raft_hashicorp.go.
- State is now solely updated via the Raft state machine Apply/Restore methods after consensus.

* Add explicit log for recovered TopologyId

- Added glog.V(0) info log in RaftServer.Recovery to print the recovered TopologyId on startup.

* Add Raft barrier to prevent TopologyId race condition

- Implement ensureTopologyId helper method
- Send no-op MaxVolumeIdCommand to sync Raft log before checking TopologyId
- Ensures persisted TopologyId is recovered before generating new one
- Prevents race where generation happens during log replay

* Serialize TopologyId generation with mutex

- Add topologyIdGenLock mutex to MasterServer struct
- Wrap ensureTopologyId method with lock to prevent concurrent generation
- Fixes race where event listener and manual leadership check both generate IDs
- Second caller waits for first to complete and sees the generated ID

* Add TopologyId recovery logging to Apply method

- Change log level from V(1) to V(0) for visibility
- Log 'Recovered TopologyId' when applying from Raft log
- Ensures recovery is visible whether from snapshot or log replay
- Matches Recovery() method logging for consistency

* Fix Raft barrier timing issue

- Add 100ms delay after barrier command to ensure log application completes
- Add debug logging to track barrier execution and TopologyId state
- Return early if barrier command fails
- Prevents TopologyId generation before old logs are fully applied

* ensure leader

* address comments

* address comments

* redundant

* clean up

* double check

* refactoring

* comment
2026-01-18 14:02:34 -08:00
Chris Lu
07dc552e1c master: Fix raft url (#7255)
* fix signature

* fix url scheme
2025-09-18 14:46:53 -07:00
Dmitriy Pavlov
cd78e653e1 add disable volume_growth flag (#7196) 2025-09-04 05:39:56 -07:00
Chris Lu
e446234e9c remove spoof-able request header (#7103)
* remove spoof-able request header

https://github.com/seaweedfs/seaweedfs/issues/7094#issuecomment-3158320497

* Update weed/security/guard.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-06 10:08:30 -07:00
Chris Lu
0703308270 remote address parsing should handle special cases (#7101)
* remote address parsing should handle special cases

* handling ipv6

* simplify

* Update weed/security/guard.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/security/guard.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* x-real-ip

* Update guard.go

* fixes

 Hostname Whitelisting: Fully restored - supports localhost, example.com, etc.
 IP Whitelisting: Still works - supports exact IPs and CIDR ranges
 Header Support: Consistent handling of X-Forwarded-For, X-Real-IP

* simplify

* Update weed/security/guard.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update weed/security/guard.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update guard.go

* adjust function signature

* Update weed/security/guard.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* indention

* skip empty host

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-06 01:03:00 -07:00
chrislu
798f797158 use float for sleep seconds
fix https://github.com/seaweedfs/seaweedfs/pull/6795
2025-07-06 14:16:41 -07:00
chrislu
1733d0ce68 remove features and deployments fields 2025-06-28 20:03:06 -07:00
Chris Lu
a1aab8a083 add telemetry (#6926)
* add telemetry

* fix go mod

* add default telemetry server url

* Update README.md

* replace with broker count instead of s3 count

* Update telemetry.pb.go

* github action to deploy
2025-06-28 14:11:55 -07:00
Aleksey Kosov
5182d46e22 Added middleware for processing request_id grpc and http requests (#6805) 2025-05-21 07:57:39 -07:00
Lisandro Pin
fc4df944a0 Remove rate limit semaphore on master's leader selection logic. (#6494)
This was introduced by 054374c7 (2024-03-12) and serves no practical purpose,
yet it caps the maximum QPS master servers can handle.
2025-01-30 13:08:36 -08:00
Konstantin Lebedev
b65eb2ec45 [security] reload whiteList on http seerver (#6302)
* reload whiteList

* white_list add to scaffold
2024-12-02 10:38:10 -08:00
Konstantin Lebedev
fec88e64eb [master] update LastLeaderChangeTime for hashicorp raft (#6292) 2024-11-26 08:02:45 -08:00
chrislu
ccf1795e6f wait a bit before getting the next volume id if the leader is recently elected 2024-11-23 19:58:45 -08:00
chrislu
6564ceda91 skip resource heavy commands from running on master nodes 2024-09-29 10:51:17 -07:00
chrislu
4463296811 add parallel vacuuming 2024-08-21 22:53:54 -07:00
Riccardo Bertossa
6fe8639504 add http endpoint to get the size of a collection (#5910) 2024-08-19 07:44:45 -07:00
wyang
4b1f539ab8 fix allocate reduplicated volumeId to different volume (#5811)
* fix allocate reduplicated volumeId to different volume

* only check barrier when read

---------

Co-authored-by: Yang Wang <yangwang@weride.ai>
2024-07-26 21:48:36 -07:00
vadimartynov
86d92a42b4 Added tls for http clients (#5766)
* Added global http client

* Added Do func for global http client

* Changed the code to use the global http client

* Fix http client in volume uploader

* Fixed pkg name

* Fixed http util funcs

* Fixed http client for bench_filer_upload

* Fixed http client for stress_filer_upload

* Fixed http client for filer_server_handlers_proxy

* Fixed http client for command_fs_merge_volumes

* Fixed http client for command_fs_merge_volumes and command_volume_fsck

* Fixed http client for s3api_server

* Added init global client for main funcs

* Rename global_client to client

* Changed:
- fixed NewHttpClient;
- added CheckIsHttpsClientEnabled func
- updated security.toml in scaffold

* Reduce the visibility of some functions in the util/http/client pkg

* Added the loadSecurityConfig function

* Use util.LoadSecurityConfiguration() in NewHttpClient func
2024-07-16 23:14:09 -07:00
Konstantin Lebedev
67edf1d014 [master] Do Automatic Volume Grow in background (#5781)
* Do Automatic Volume Grow in backgound

* pass lastGrowCount to master

* fix build

* fix count to uint64
2024-07-16 08:03:40 -07:00
vadimartynov
8aae82dd71 Added context for the MasterClient's methods to avoid endless loops (#5628)
* Added context for the MasterClient's methods to avoid endless loops

* Returned WithClient function. Added WithClientCustomGetMaster function

* Hid unused ctx arguments

* Using a common context for the KeepConnectedToMaster and WaitUntilConnected functions

* Changed the context termination check in the tryConnectToMaster function

* Added a child context to the tryConnectToMaster function

* Added a common context for KeepConnectedToMaster and WaitUntilConnected functions in benchmark
2024-06-14 11:40:34 -07:00
shenxingwuying
ee25ada732 reduce ambiguity about use memory_sequencer (#5555) 2024-04-29 21:51:00 -07:00
chrislu
55976ae04a avoid repeated calls to heavy-weighted viper 2024-04-18 09:09:45 -07:00
chrislu
d9490c5e1f rename 2024-04-18 08:47:45 -07:00
Nico D'Cotta
796b7508f3 Implement SRV lookups for filer (#4767) 2023-08-24 07:08:56 -07:00
chrislu
a315490f7d proxy to master uses http address
fix https://github.com/seaweedfs/seaweedfs/issues/4607
2023-07-04 11:45:21 -07:00
chrislu
adb90bd252 avoid lower casing the command
fix https://github.com/seaweedfs/seaweedfs/pull/4321
2023-03-19 21:20:46 -07:00
Konstantin Lebedev
b9933d5589 master server graceful stop (#3797) 2022-10-06 09:30:30 -07:00
Konstantin Lebedev
e90ab4ac60 avoid race conditions for OnPeerUpdate (#3525)
https://github.com/seaweedfs/seaweedfs/issues/3524
2022-08-26 10:18:49 -07:00
Patrick Schmidt
7b424a54dc Add raft server access mutex to avoid races (#3503) 2022-08-24 09:49:05 -07:00
chrislu
10414fd81c ping timeout at 15 seconds
this 72 minute timeout setting seems unreasonably long

15 seconds is around the time when a new raft leader should be elected.
2022-08-23 23:28:16 -07:00
askeipx
2e78a522ab remove old raft servers if they don't answer to pings for too long (#3398)
* remove old raft servers if they don't answer to pings for too long

add ping durations as options

rename ping fields

fix some todos

get masters through masterclient

raft remove server from leader

use raft servers to ping them

CheckMastersAlive for hashicorp raft only

* prepare blocking ping

* pass waitForReady as param

* pass waitForReady through all functions

* waitForReady works

* refactor

* remove unneeded params

* rollback unneeded changes

* fix
2022-08-23 23:18:21 -07:00
Konstantin Lebedev
4d4cd0948d avoid infinite loop WaitUntilConnected() (#3431)
https://github.com/seaweedfs/seaweedfs/issues/3421
2022-08-11 15:03:26 -07:00
Chris Lu
b59bc607bf Merge pull request #3338 from kmlebedev/issues/3083
rollback over onPeerUpdate implementation of automatic clean-up of failed servers in favor of synchronous ping
2022-08-01 08:23:10 -07:00
Konstantin Lebedev
a98f6d66a3 rollback over onPeerupdate implementation of automatic clean-up of failed servers in favor of synchronous ping 2022-08-01 12:51:41 +05:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
chrislu
bb01b68fa0 refactor 2022-07-28 23:24:38 -07:00
chrislu
68065128b8 add dc and rack 2022-07-28 23:22:51 -07:00
chrislu
3828b8ce87 "github.com/chrislusf/raft" => "github.com/seaweedfs/raft" 2022-07-27 12:12:40 -07:00
Konstantin Lebedev
c88ea31f62 fix RUnlock of unlocked RWMutex 2022-07-26 12:57:07 +05:00
Konstantin Lebedev
3c42814b58 avoid deadlock 2022-07-21 17:15:10 +05:00
Konstantin Lebedev
93ca87b7cb use safe onPeerUpdateDoneCns 2022-07-21 15:51:14 +05:00
Konstantin Lebedev
7875470e74 onPeerUpdateGoroutineCount use int32 2022-07-20 18:40:50 +05:00
Konstantin Lebedev
6c390851e7 fix design 2022-07-20 18:08:12 +05:00
Konstantin Lebedev
f6a966b4fc add waiting log message 2022-07-20 00:31:57 +05:00
Konstantin Lebedev
6cfbfb0849 check for ping before deleting raft server
https://github.com/chrislusf/seaweedfs/issues/3083
2022-07-20 00:04:12 +05:00
Konstantin Lebedev
f419d5643a fix typo
add remove logs
2022-07-19 11:50:52 +05:00
chrislu
6adc42147f fresh filer store bootstrap from the oldest peer 2022-05-30 21:27:48 -07:00