Commit Graph

80 Commits

Author SHA1 Message Date
Chris Lu
b08bb8237c Fix master leader election startup issue (#8340)
* Fix master leader election startup issue

Fixes #error-log-leader-not-selected-yet

* Fix master leader election startup issue

This change improves server address comparison using the 'Equals' method and handles recursion in topology leader lookup, resolving the 'leader not selected yet' error during master startup.

* Merge user improvements: use MaybeLeader for non-blocking checks

* not useful test

* Address code review: optimize Equals, fix deadlock in IsLeader, safe access in Leader
2026-02-13 15:39:39 -08:00
Chris Lu
753e1db096 Prevent split-brain: Persistent ClusterID and Join Validation (#8022)
* Prevent split-brain: Persistent ClusterID and Join Validation

- Persist ClusterId in Raft store to survive restarts.
- Validate ClusterId on Raft command application (piggybacked on MaxVolumeId).
- Prevent masters with conflicting ClusterIds from joining/operating together.
- Update Telemetry to report the persistent ClusterId.

* Refine ClusterID validation based on feedback

- Improved error message in cluster_commands.go.
- Added ClusterId mismatch check in RaftServer.Recovery.

* Handle Raft errors and support Hashicorp Raft for ClusterId

- Check for errors when persisting ClusterId in legacy Raft.
- Implement ClusterId generation and persistence for Hashicorp Raft leader changes.
- Ensure consistent error logging.

* Refactor ClusterId validation

- Centralize ClusterId mismatch check in Topology.SetClusterId.
- Simplify MaxVolumeIdCommand.Apply and RaftServer.Recovery to rely on SetClusterId.

* Fix goroutine leak and add timeout

- Handle channel closure in Hashicorp Raft leader listener.
- Add timeout to Raft Apply call to prevent blocking.

* Fix deadlock in legacy Raft listener

- Wrap ClusterId generation/persistence in a goroutine to avoid blocking the Raft event loop (deadlock).

* Rename ClusterId to SystemId

- Renamed ClusterId to SystemId across the codebase (protobuf, topology, server, telemetry).
- Regenerated telemetry.pb.go with new field.

* Rename SystemId to TopologyId

- Rename to SystemId was intermediate step.
- Final name is TopologyId for the persistent cluster identifier.
- Updated protobuf, topology, raft server, master server, and telemetry.

* Optimize Hashicorp Raft listener

- Integrated TopologyId generation into existing monitorLeaderLoop.
- Removed extra goroutine in master_server.go.

* Fix optimistic TopologyId update

- Removed premature local state update of TopologyId in master_server.go and raft_hashicorp.go.
- State is now solely updated via the Raft state machine Apply/Restore methods after consensus.

* Add explicit log for recovered TopologyId

- Added glog.V(0) info log in RaftServer.Recovery to print the recovered TopologyId on startup.

* Add Raft barrier to prevent TopologyId race condition

- Implement ensureTopologyId helper method
- Send no-op MaxVolumeIdCommand to sync Raft log before checking TopologyId
- Ensures persisted TopologyId is recovered before generating new one
- Prevents race where generation happens during log replay

* Serialize TopologyId generation with mutex

- Add topologyIdGenLock mutex to MasterServer struct
- Wrap ensureTopologyId method with lock to prevent concurrent generation
- Fixes race where event listener and manual leadership check both generate IDs
- Second caller waits for first to complete and sees the generated ID

* Add TopologyId recovery logging to Apply method

- Change log level from V(1) to V(0) for visibility
- Log 'Recovered TopologyId' when applying from Raft log
- Ensures recovery is visible whether from snapshot or log replay
- Matches Recovery() method logging for consistency

* Fix Raft barrier timing issue

- Add 100ms delay after barrier command to ensure log application completes
- Add debug logging to track barrier execution and TopologyId state
- Return early if barrier command fails
- Prevents TopologyId generation before old logs are fully applied

* ensure leader

* address comments

* address comments

* redundant

* clean up

* double check

* refactoring

* comment
2026-01-18 14:02:34 -08:00
Chris Lu
7acebf11ea Master: volume assignment concurrency (#7159)
* volume assginment concurrency

* accurate tests

* ensure uniqness

* reserve atomically

* address comments

* atomic

* ReserveOneVolumeForReservation

* duplicated

* Update weed/topology/node.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update weed/topology/node.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* atomic counter

* dedup

* select the appropriate functions based on the useReservations flag

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-23 21:02:30 -07:00
Lisandro Pin
cea34dc21a Fix implementation of master_pb.CollectionList RPC call (#6715) 2025-04-16 14:28:58 -07:00
Konstantin Lebedev
e2e97db917 [master] avoid timeout when assigning for main request with filter by DC or rack (#6291)
* avoid timeout when assigning for main request with filter by DC or rack

https://github.com/seaweedfs/seaweedfs/issues/6290

* use constant NoWritableVolumes
2024-11-26 08:33:31 -08:00
Konstantin Lebedev
8836fa19b6 use ShouldGrowVolumesByDcAndRack (#6280) 2024-11-25 09:30:37 -08:00
chrislu
ccf1795e6f wait a bit before getting the next volume id if the leader is recently elected 2024-11-23 19:58:45 -08:00
Konstantin Lebedev
67a252ee8a [master] refactor func ShouldGrowVolumes (#5884) 2024-09-04 08:16:44 -07:00
chrislu
a4b25a642d math/rand => math/rand/v2 2024-08-29 09:52:21 -07:00
Konstantin Lebedev
b2ffcdaab2 [master] do sync grow request only if absolutely necessary (#5821)
* do sync grow request only if absolutely necessary
https://github.com/seaweedfs/seaweedfs/pull/5819

* remove check VolumeGrowStrategy Threshold on PickForWrite

* fix fmt.Errorf
2024-07-30 13:21:35 -07:00
wyang
4b1f539ab8 fix allocate reduplicated volumeId to different volume (#5811)
* fix allocate reduplicated volumeId to different volume

* only check barrier when read

---------

Co-authored-by: Yang Wang <yangwang@weride.ai>
2024-07-26 21:48:36 -07:00
Konstantin Lebedev
04f4b10884 fix: avoid timeout if datacenter does not exist in topology (#5772)
* fix: avoid timeout if datacenter does not exist in topology

* fix: error msg

* fix: rm dublicate check

* fix: compare

* revert minor change
2024-07-12 11:19:08 -07:00
Konstantin Lebedev
0f8e76bbd6 fix: clean metric MasterReplicaPlacementMismatch for unregister volume (#5239) 2024-01-25 00:23:24 -08:00
chrislu
bebbc9fe44 create volume grow request if the selected volume is close to full 2023-12-27 11:45:44 -08:00
chrislu
c6b1dc7058 remove unused code 2023-12-24 11:11:41 -08:00
Konstantin Lebedev
5ee04d20fa Healthz check for deadlocks (#4558) 2023-06-09 09:42:48 -07:00
Stewart Miles
264be0d2d4 Retry until a leader is selected. (#4318)
Fixes regression introduced in
https://github.com/seaweedfs/seaweedfs/pull/4313

Related to #4307
2023-03-16 20:50:38 -07:00
Stewart Miles
57ab1f8516 Use exponential backoff to query leader. (#4313)
`topology.Leader()` was using a backoff that typically
resulted in at least a 5s delay when initially starting
a master and raft server. This changes the backoff
algorithm to use exponential backoff starting with 100ms
and waiting up to 20s for leader selection.

Related to #4307
2023-03-15 17:49:46 -07:00
zemul
0bf56298d5 fix chunk.ModifiedTsNs (#4264)
* fix

* fix mtime s > ns

---------

Co-authored-by: zemul <zhouzemiao@ihuman.com>
2023-03-02 08:24:36 -08:00
Guo Lei
d8cfa1552b support enable/disable vacuum (#4087)
* stop vacuum

* suspend/resume vacuum

* remove unused code

* rename

* rename param
2022-12-28 01:36:44 -08:00
chrislu
3cb914f7e1 avoid dead lock 2022-09-10 11:26:19 -07:00
chrislu
576c113c59 replace PR https://github.com/seaweedfs/seaweedfs/pull/3621
replace https://github.com/seaweedfs/seaweedfs/pull/3621
2022-09-10 11:22:16 -07:00
Patrick Schmidt
7b424a54dc Add raft server access mutex to avoid races (#3503) 2022-08-24 09:49:05 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
chrislu
3828b8ce87 "github.com/chrislusf/raft" => "github.com/seaweedfs/raft" 2022-07-27 12:12:40 -07:00
guol-fnst
b12944f9c6 fix naming convention
notify volume server of duplicate directoris
improve searching efficiency
2022-05-17 15:41:49 +08:00
guol-fnst
de6aa9cce8 avoid duplicated volume directory 2022-05-16 19:33:51 +08:00
chrislu
00c1dfec4f go fmt 2022-05-01 23:16:29 -07:00
Chris Lu
a87f57e47c Merge pull request #2868 from kmlebedev/hashicorp_raft
hashicorp raft
2022-04-10 23:00:05 -07:00
shibinbin
c20e1edd99 fix: master lose some volumes 2022-04-07 15:18:28 +08:00
Konstantin Lebedev
14dd971890 hashicorp raft with state machine 2022-04-04 17:51:51 +05:00
chrislu
5eacff9d4f log message adds server name
address https://github.com/chrislusf/seaweedfs/issues/2514#issuecomment-995925733
2021-12-16 10:46:26 -08:00
Chris Lu
e5fc35ed0c change server address from string to a type 2021-09-12 22:47:52 -07:00
Chris Lu
7a13816e94 refactor 2021-09-05 23:17:15 -07:00
Chris Lu
d474ce6fe3 master: avoid repeated leader redirection
fix https://github.com/chrislusf/seaweedfs/issues/2146
2021-06-21 22:56:07 -07:00
Chris Lu
d2d36a3f9d master: avoid creating too many volumes
fix https://github.com/chrislusf/seaweedfs/issues/2062
2021-05-11 10:05:31 -07:00
qieqieplus
c4d32f6937 ahead of time volume assignment 2021-05-06 18:55:44 +08:00
Chris Lu
f8446b42ab this can compile now!!! 2021-02-16 02:47:02 -08:00
Chris Lu
4bd8a692d8 disk type can be generic tags 2021-02-13 13:50:14 -08:00
Chris Lu
0d2ec832e2 rename from volumeType to diskType 2020-12-13 11:59:32 -08:00
Chris Lu
715b199eeb fix tests 2020-12-13 04:14:50 -08:00
Chris Lu
d156c74ec0 volume server set volume type and heartbeat to the master 2020-12-13 03:11:24 -08:00
Chris Lu
e9cd798bd3 adding volume type 2020-12-13 00:58:58 -08:00
Chris Lu
c7ebadc25d avoid possible concurrent access inside ensureCorrectWritables() 2020-11-22 17:15:59 -08:00
Chris Lu
720b1d9b88 adding locking to avoid nil VolumeLocationList
fix panic: runtime error: invalid memory address or nil pointer dereference
Oct 22 00:53:44 bedb-master1 weed[8055]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x17658da]
Oct 22 00:53:44 bedb-master1 weed[8055]: goroutine 310 [running]:
Oct 22 00:53:44 bedb-master1 weed[8055]: github.com/chrislusf/seaweedfs/weed/topology.(*VolumeLocationList).Length(...)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/seaweedfs/weed/topology/volume_location_list.go:35
Oct 22 00:53:44 bedb-master1 weed[8055]: github.com/chrislusf/seaweedfs/weed/topology.(*VolumeLayout).enoughCopies(...)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/seaweedfs/weed/topology/volume_layout.go:376
Oct 22 00:53:44 bedb-master1 weed[8055]: github.com/chrislusf/seaweedfs/weed/topology.(*VolumeLayout).ensureCorrectWritables(0xc000111d50, 0xc000b55438)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/seaweedfs/weed/topology/volume_layout.go:202 +0x5a
Oct 22 00:53:44 bedb-master1 weed[8055]: github.com/chrislusf/seaweedfs/weed/topology.(*Topology).SyncDataNodeRegistration(0xc00042ac60, 0xc001454d30, 0x1, 0x1, 0xc0005fc000, 0xc00135de40, 0x4, 0xc00135de50, 0x10, 0x10d, ...)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/seaweedfs/weed/topology/topology.go:224 +0x616
Oct 22 00:53:44 bedb-master1 weed[8055]: github.com/chrislusf/seaweedfs/weed/server.(*MasterServer).SendHeartbeat(0xc000162700, 0x23b97c0, 0xc000ae2c90, 0x0, 0x0)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/seaweedfs/weed/server/master_grpc_server.go:106 +0x325
Oct 22 00:53:44 bedb-master1 weed[8055]: github.com/chrislusf/seaweedfs/weed/pb/master_pb._Seaweed_SendHeartbeat_Handler(0x1f8e7c0, 0xc000162700, 0x23b0a60, 0xc00024b440, 0x3172c38, 0xc000ab7100)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/seaweedfs/weed/pb/master_pb/master.pb.go:4250 +0xad
Oct 22 00:53:44 bedb-master1 weed[8055]: google.golang.org/grpc.(*Server).processStreamingRPC(0xc0001f31e0, 0x23bb800, 0xc000ac5500, 0xc000ab7100, 0xc0001fea80, 0x311fec0, 0x0, 0x0, 0x0)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1329 +0xcd8
Oct 22 00:53:44 bedb-master1 weed[8055]: google.golang.org/grpc.(*Server).handleStream(0xc0001f31e0, 0x23bb800, 0xc000ac5500, 0xc000ab7100, 0x0)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1409 +0xc5c
Oct 22 00:53:44 bedb-master1 weed[8055]: google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0001ce8b0, 0xc0001f31e0, 0x23bb800, 0xc000ac5500, 0xc000ab7100)
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:746 +0xa5
Oct 22 00:53:44 bedb-master1 weed[8055]: created by google.golang.org/grpc.(*Server).serveStreams.func1
Oct 22 00:53:44 bedb-master1 weed[8055]: #011/root/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:744 +0xa5
Oct 22 00:53:44 bedb-master1 systemd[1]: weedmaster.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Oct 22 00:53:44 bedb-master1 systemd[1]: weedmaster.service: Failed with result 'exit-code'.
2020-10-21 23:15:48 -07:00
Chris Lu
152a6cbc2b minor adjustments 2020-08-10 20:42:27 -07:00
cheng.li01
25fbff5d52 fix bug: two same volumeId in different collections
1, there will be two leader when master server startup in a few seconds
2, raft server will get a leader even there is only one master, so there is no need to do hard code to set the server to be leader
2020-08-10 16:37:47 +08:00
Evgenii Kozlov
0e0db70f55 Set volumes ReadOnly if low free disk space 2020-06-05 18:18:15 +03:00
James Hartig
eae3f27c80 Added treat_replication_as_minimums master toml option 2020-04-01 19:08:48 -04:00
Chris Lu
d1ab16b6e3 treat it as a single node cluster if empty raft server name
possible fix for https://github.com/chrislusf/seaweedfs/issues/1118
2020-01-10 00:37:44 -08:00