79 Commits

Author SHA1 Message Date
Chris Lu
753e1db096 Prevent split-brain: Persistent ClusterID and Join Validation (#8022)
* Prevent split-brain: Persistent ClusterID and Join Validation

- Persist ClusterId in Raft store to survive restarts.
- Validate ClusterId on Raft command application (piggybacked on MaxVolumeId).
- Prevent masters with conflicting ClusterIds from joining/operating together.
- Update Telemetry to report the persistent ClusterId.

* Refine ClusterID validation based on feedback

- Improved error message in cluster_commands.go.
- Added ClusterId mismatch check in RaftServer.Recovery.

* Handle Raft errors and support Hashicorp Raft for ClusterId

- Check for errors when persisting ClusterId in legacy Raft.
- Implement ClusterId generation and persistence for Hashicorp Raft leader changes.
- Ensure consistent error logging.

* Refactor ClusterId validation

- Centralize ClusterId mismatch check in Topology.SetClusterId.
- Simplify MaxVolumeIdCommand.Apply and RaftServer.Recovery to rely on SetClusterId.

* Fix goroutine leak and add timeout

- Handle channel closure in Hashicorp Raft leader listener.
- Add timeout to Raft Apply call to prevent blocking.

* Fix deadlock in legacy Raft listener

- Wrap ClusterId generation/persistence in a goroutine to avoid blocking the Raft event loop (deadlock).

* Rename ClusterId to SystemId

- Renamed ClusterId to SystemId across the codebase (protobuf, topology, server, telemetry).
- Regenerated telemetry.pb.go with new field.

* Rename SystemId to TopologyId

- The rename to SystemId was an intermediate step.
- Final name is TopologyId for the persistent cluster identifier.
- Updated protobuf, topology, raft server, master server, and telemetry.

* Optimize Hashicorp Raft listener

- Integrated TopologyId generation into existing monitorLeaderLoop.
- Removed extra goroutine in master_server.go.

* Fix optimistic TopologyId update

- Removed premature local state update of TopologyId in master_server.go and raft_hashicorp.go.
- State is now solely updated via the Raft state machine Apply/Restore methods after consensus.

* Add explicit log for recovered TopologyId

- Added glog.V(0) info log in RaftServer.Recovery to print the recovered TopologyId on startup.

* Add Raft barrier to prevent TopologyId race condition

- Implement ensureTopologyId helper method
- Send no-op MaxVolumeIdCommand to sync Raft log before checking TopologyId
- Ensures persisted TopologyId is recovered before generating new one
- Prevents race where generation happens during log replay

* Serialize TopologyId generation with mutex

- Add topologyIdGenLock mutex to MasterServer struct
- Wrap ensureTopologyId method with lock to prevent concurrent generation
- Fixes race where event listener and manual leadership check both generate IDs
- Second caller waits for first to complete and sees the generated ID

* Add TopologyId recovery logging to Apply method

- Change log level from V(1) to V(0) for visibility
- Log 'Recovered TopologyId' when applying from Raft log
- Ensures recovery is visible whether from snapshot or log replay
- Matches Recovery() method logging for consistency

* Fix Raft barrier timing issue

- Add 100ms delay after barrier command to ensure log application completes
- Add debug logging to track barrier execution and TopologyId state
- Return early if barrier command fails
- Prevents TopologyId generation before old logs are fully applied

* ensure leader

* address comments

* address comments

* redundant

* clean up

* double check

* refactoring

* comment
2026-01-18 14:02:34 -08:00
Chris Lu
7acebf11ea Master: volume assignment concurrency (#7159)
* volume assignment concurrency

* accurate tests

* ensure uniqueness

* reserve atomically

* address comments

* atomic

* ReserveOneVolumeForReservation

* duplicated

* Update weed/topology/node.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update weed/topology/node.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* atomic counter

* dedup

* select the appropriate functions based on the useReservations flag

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-23 21:02:30 -07:00
Lisandro Pin
cea34dc21a Fix implementation of master_pb.CollectionList RPC call (#6715) 2025-04-16 14:28:58 -07:00
Konstantin Lebedev
e2e97db917 [master] avoid timeout when assigning for main request with filter by DC or rack (#6291)
* avoid timeout when assigning for main request with filter by DC or rack

https://github.com/seaweedfs/seaweedfs/issues/6290

* use constant NoWritableVolumes
2024-11-26 08:33:31 -08:00
Konstantin Lebedev
8836fa19b6 use ShouldGrowVolumesByDcAndRack (#6280) 2024-11-25 09:30:37 -08:00
chrislu
ccf1795e6f wait a bit before getting the next volume id if the leader is recently elected 2024-11-23 19:58:45 -08:00
Konstantin Lebedev
67a252ee8a [master] refactor func ShouldGrowVolumes (#5884) 2024-09-04 08:16:44 -07:00
chrislu
a4b25a642d math/rand => math/rand/v2 2024-08-29 09:52:21 -07:00
Konstantin Lebedev
b2ffcdaab2 [master] do sync grow request only if absolutely necessary (#5821)
* do sync grow request only if absolutely necessary
https://github.com/seaweedfs/seaweedfs/pull/5819

* remove check VolumeGrowStrategy Threshold on PickForWrite

* fix fmt.Errorf
2024-07-30 13:21:35 -07:00
wyang
4b1f539ab8 fix allocating a duplicate volumeId to different volumes (#5811)
* fix allocating a duplicate volumeId to different volumes

* only check barrier when read

---------

Co-authored-by: Yang Wang <yangwang@weride.ai>
2024-07-26 21:48:36 -07:00
Konstantin Lebedev
04f4b10884 fix: avoid timeout if datacenter does not exist in topology (#5772)
* fix: avoid timeout if datacenter does not exist in topology

* fix: error msg

* fix: rm duplicate check

* fix: compare

* revert minor change
2024-07-12 11:19:08 -07:00
Konstantin Lebedev
0f8e76bbd6 fix: clean metric MasterReplicaPlacementMismatch for unregister volume (#5239) 2024-01-25 00:23:24 -08:00
chrislu
bebbc9fe44 create volume grow request if the selected volume is close to full 2023-12-27 11:45:44 -08:00
chrislu
c6b1dc7058 remove unused code 2023-12-24 11:11:41 -08:00
Konstantin Lebedev
5ee04d20fa Healthz check for deadlocks (#4558) 2023-06-09 09:42:48 -07:00
Stewart Miles
264be0d2d4 Retry until a leader is selected. (#4318)
Fixes regression introduced in
https://github.com/seaweedfs/seaweedfs/pull/4313

Related to #4307
2023-03-16 20:50:38 -07:00
Stewart Miles
57ab1f8516 Use exponential backoff to query leader. (#4313)
`topology.Leader()` was using a backoff that typically
resulted in at least a 5s delay when initially starting
a master and raft server. This changes the backoff
algorithm to use exponential backoff starting with 100ms
and waiting up to 20s for leader selection.

Related to #4307
2023-03-15 17:49:46 -07:00
zemul
0bf56298d5 fix chunk.ModifiedTsNs (#4264)
* fix

* fix mtime s > ns

---------

Co-authored-by: zemul <zhouzemiao@ihuman.com>
2023-03-02 08:24:36 -08:00
Guo Lei
d8cfa1552b support enable/disable vacuum (#4087)
* stop vacuum

* suspend/resume vacuum

* remove unused code

* rename

* rename param
2022-12-28 01:36:44 -08:00
chrislu
3cb914f7e1 avoid dead lock 2022-09-10 11:26:19 -07:00
chrislu
576c113c59 replace PR https://github.com/seaweedfs/seaweedfs/pull/3621
replace https://github.com/seaweedfs/seaweedfs/pull/3621
2022-09-10 11:22:16 -07:00
Patrick Schmidt
7b424a54dc Add raft server access mutex to avoid races (#3503) 2022-08-24 09:49:05 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
chrislu
3828b8ce87 "github.com/chrislusf/raft" => "github.com/seaweedfs/raft" 2022-07-27 12:12:40 -07:00
guol-fnst
b12944f9c6 fix naming convention
notify volume server of duplicate directories
improve searching efficiency
2022-05-17 15:41:49 +08:00
guol-fnst
de6aa9cce8 avoid duplicated volume directory 2022-05-16 19:33:51 +08:00
chrislu
00c1dfec4f go fmt 2022-05-01 23:16:29 -07:00
Chris Lu
a87f57e47c Merge pull request #2868 from kmlebedev/hashicorp_raft
hashicorp raft
2022-04-10 23:00:05 -07:00
shibinbin
c20e1edd99 fix: master lose some volumes 2022-04-07 15:18:28 +08:00
Konstantin Lebedev
14dd971890 hashicorp raft with state machine 2022-04-04 17:51:51 +05:00
chrislu
5eacff9d4f log message adds server name
address https://github.com/chrislusf/seaweedfs/issues/2514#issuecomment-995925733
2021-12-16 10:46:26 -08:00
Chris Lu
e5fc35ed0c change server address from string to a type 2021-09-12 22:47:52 -07:00
Chris Lu
7a13816e94 refactor 2021-09-05 23:17:15 -07:00
Chris Lu
d474ce6fe3 master: avoid repeated leader redirection
fix https://github.com/chrislusf/seaweedfs/issues/2146
2021-06-21 22:56:07 -07:00
Chris Lu
d2d36a3f9d master: avoid creating too many volumes
fix https://github.com/chrislusf/seaweedfs/issues/2062
2021-05-11 10:05:31 -07:00
qieqieplus
c4d32f6937 ahead of time volume assignment 2021-05-06 18:55:44 +08:00
Chris Lu
f8446b42ab this can compile now!!! 2021-02-16 02:47:02 -08:00
Chris Lu
4bd8a692d8 disk type can be generic tags 2021-02-13 13:50:14 -08:00
Chris Lu
0d2ec832e2 rename from volumeType to diskType 2020-12-13 11:59:32 -08:00
Chris Lu
715b199eeb fix tests 2020-12-13 04:14:50 -08:00
Chris Lu
d156c74ec0 volume server set volume type and heartbeat to the master 2020-12-13 03:11:24 -08:00
Chris Lu
e9cd798bd3 adding volume type 2020-12-13 00:58:58 -08:00
Chris Lu
c7ebadc25d avoid possible concurrent access inside ensureCorrectWritables() 2020-11-22 17:15:59 -08:00
Chris Lu
720b1d9b88 adding locking to avoid nil VolumeLocationList
fix panic: runtime error: invalid memory address or nil pointer dereference
Oct 22 00:53:44 bedb-master1 weed[8055]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x17658da]
Oct 22 00:53:44 bedb-master1 weed[8055]: goroutine 310 [running]:
Oct 22 00:53:44 bedb-master1 weed[8055]: github.com/chrislusf/seaweedfs/weed/topology.(*VolumeLocationList).Length(...)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/seaweedfs/weed/topology/volume_location_list.go:35
Oct 22 00:53:44 bedb-master1 weed[8055]: github.com/chrislusf/seaweedfs/weed/topology.(*VolumeLayout).enoughCopies(...)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/seaweedfs/weed/topology/volume_layout.go:376
Oct 22 00:53:44 bedb-master1 weed[8055]: github.com/chrislusf/seaweedfs/weed/topology.(*VolumeLayout).ensureCorrectWritables(0xc000111d50, 0xc000b55438)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/seaweedfs/weed/topology/volume_layout.go:202 +0x5a
Oct 22 00:53:44 bedb-master1 weed[8055]: github.com/chrislusf/seaweedfs/weed/topology.(*Topology).SyncDataNodeRegistration(0xc00042ac60, 0xc001454d30, 0x1, 0x1, 0xc0005fc000, 0xc00135de40, 0x4, 0xc00135de50, 0x10, 0x10d, ...)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/seaweedfs/weed/topology/topology.go:224 +0x616
Oct 22 00:53:44 bedb-master1 weed[8055]: github.com/chrislusf/seaweedfs/weed/server.(*MasterServer).SendHeartbeat(0xc000162700, 0x23b97c0, 0xc000ae2c90, 0x0, 0x0)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/seaweedfs/weed/server/master_grpc_server.go:106 +0x325
Oct 22 00:53:44 bedb-master1 weed[8055]: github.com/chrislusf/seaweedfs/weed/pb/master_pb._Seaweed_SendHeartbeat_Handler(0x1f8e7c0, 0xc000162700, 0x23b0a60, 0xc00024b440, 0x3172c38, 0xc000ab7100)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/seaweedfs/weed/pb/master_pb/master.pb.go:4250 +0xad
Oct 22 00:53:44 bedb-master1 weed[8055]: google.golang.org/grpc.(*Server).processStreamingRPC(0xc0001f31e0, 0x23bb800, 0xc000ac5500, 0xc000ab7100, 0xc0001fea80, 0x311fec0, 0x0, 0x0, 0x0)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1329 +0xcd8
Oct 22 00:53:44 bedb-master1 weed[8055]: google.golang.org/grpc.(*Server).handleStream(0xc0001f31e0, 0x23bb800, 0xc000ac5500, 0xc000ab7100, 0x0)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1409 +0xc5c
Oct 22 00:53:44 bedb-master1 weed[8055]: google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0001ce8b0, 0xc0001f31e0, 0x23bb800, 0xc000ac5500, 0xc000ab7100)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:746 +0xa5
Oct 22 00:53:44 bedb-master1 weed[8055]: created by google.golang.org/grpc.(*Server).serveStreams.func1
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:744 +0xa5
Oct 22 00:53:44 bedb-master1 systemd[1]: weedmaster.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Oct 22 00:53:44 bedb-master1 systemd[1]: weedmaster.service: Failed with result 'exit-code'.
2020-10-21 23:15:48 -07:00
Chris Lu
152a6cbc2b minor adjustments 2020-08-10 20:42:27 -07:00
cheng.li01
25fbff5d52 fix bug: two same volumeId in different collections
1. There can be two leaders for a few seconds when the master servers start up.
2. The raft server elects a leader even when there is only one master, so there is no need to hard-code the server as the leader.
2020-08-10 16:37:47 +08:00
Evgenii Kozlov
0e0db70f55 Set volumes ReadOnly if low free disk space 2020-06-05 18:18:15 +03:00
James Hartig
eae3f27c80 Added treat_replication_as_minimums master toml option 2020-04-01 19:08:48 -04:00
Chris Lu
d1ab16b6e3 treat it as a single node cluster if empty raft server name
possible fix for https://github.com/chrislusf/seaweedfs/issues/1118
2020-01-10 00:37:44 -08:00
Chris Lu
09ca936c78 shell: add ec.decode command 2019-12-23 12:48:20 -08:00