seaweedFS

Author	SHA1	Message	Date
Chris Lu	4efe0acaf5	fix(master): fast resume state and default resumeState to true (#8925) * fix(master): fast resume state and default resumeState to true When resumeState is enabled in single-master mode, the raft server had existing log entries so the self-join path couldn't promote to leader. The server waited the full election timeout (10-20s) before self-electing. Fix by temporarily setting election timeout to 1ms before Start() when in single-master + resumeState mode with existing log, then restoring the original timeout after leader election. This makes resume near-instant. Also change the default for resumeState from false to true across all CLI commands (master, mini, server) so state is preserved by default. * fix(master): prevent fastResume goroutine from hanging forever Use defer to guarantee election timeout is always restored, and bound the polling loop with a timeout so it cannot spin indefinitely if leader election never succeeds. * fix(master): use ticker instead of time.After in fastResume polling loop	2026-04-04 14:15:56 -07:00
Chris Lu	15f4a97029	fix: improve raft leader election reliability and failover speed (#8692) * fix: clear raft vote state file on non-resume startup The seaweedfs/raft library v1.1.7 added a persistent `state` file for currentTerm and votedFor. When RaftResumeState=false (the default), the log, conf, and snapshot directories are cleared but this state file was not. On repeated restarts, different masters accumulate divergent terms, causing AppendEntries rejections and preventing leader election. Fixes #8690 * fix: recover TopologyId from snapshot before clearing raft state When RaftResumeState=false clears log/conf/snapshot, the TopologyId (used for license validation) was lost. Now extract it from the latest snapshot before cleanup and restore it on the topology. Both seaweedfs/raft and hashicorp/raft paths are handled, with a shared recoverTopologyIdFromState helper in raft_common.go. * fix: stagger multi-master bootstrap delay by peer index Previously all masters used a fixed 1500ms delay before the bootstrap check. Now the delay is proportional to the peer's sorted index with randomization (matching the hashicorp raft path), giving the designated bootstrap node (peer 0) a head start while later peers wait for gRPC servers to be ready. Also adds diagnostic logging showing why DoJoinCommand was or wasn't called, making leader election issues easier to diagnose from logs. * fix: skip unreachable masters during leader reconnection When a master leader goes down, non-leader masters still redirect clients to the stale leader address. The masterClient would follow these redirects, fail, and retry — wasting round-trips each cycle. Now tryAllMasters tracks which masters failed within a cycle and skips redirects pointing to them, reducing log spam and connection overhead during leader failover. * fix: take snapshot after TopologyId generation for recovery After generating a new TopologyId on the leader, immediately take a raft snapshot so the ID can be recovered from the snapshot on future restarts with RaftResumeState=false. Without this, short-lived clusters would lose the TopologyId on restart since no automatic snapshot had been taken yet. * test: add multi-master raft failover integration tests Integration test framework and 5 test scenarios for 3-node master clusters: - TestLeaderConsistencyAcrossNodes: all nodes agree on leader and TopologyId - TestLeaderDownAndRecoverQuickly: leader stops, new leader elected, old leader rejoins as follower - TestLeaderDownSlowRecover: leader gone for extended period, cluster continues with 2/3 quorum - TestTwoMastersDownAndRestart: quorum lost (2/3 down), recovered when both restart - TestAllMastersDownAndRestart: full cluster restart, leader elected, all nodes agree on TopologyId * fix: address PR review comments - peerIndex: return -1 (not 0) when self not found, add warning log - recoverTopologyIdFromSnapshot: defer dir.Close() - tests: check GetTopologyId errors instead of discarding them * fix: address review comments on failover tests - Assert no leader after quorum loss (was only logging) - Verify follower cs.Leader matches expected leader via ServerAddress.ToHttpAddress() comparison - Check GetTopologyId error in TestTwoMastersDownAndRestart	2026-03-18 23:28:07 -07:00
Chris Lu	7b8df39cf7	s3api: add AttachUserPolicy/DetachUserPolicy/ListAttachedUserPolicies (#8379) * iam: add XML responses for managed user policy APIs * s3api: implement attach/detach/list attached user policies * s3api: add embedded IAM tests for managed user policies * iam: update CredentialStore interface and Manager for managed policies Updated the `CredentialStore` interface to include `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies` methods. The `CredentialManager` was updated to delegate these calls to the store. Added common error variables for policy management. * iam: implement managed policy methods in MemoryStore Implemented `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies` in the MemoryStore. Also ensured deep copying of identities includes PolicyNames. * iam: implement managed policy methods in PostgresStore Modified Postgres schema to include `policy_names` JSONB column in `users`. Implemented `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies`. Updated user CRUD operations to handle policy names persistence. * iam: implement managed policy methods in remaining stores Implemented user policy management in: - `FilerEtcStore` (partial implementation) - `IamGrpcStore` (delegated via GetUser/UpdateUser) - `PropagatingCredentialStore` (to broadcast updates) Ensures cluster-wide consistency for policy attachments. * s3api: refactor EmbeddedIamApi to use managed policy APIs - Refactored `AttachUserPolicy`, `DetachUserPolicy`, and `ListAttachedUserPolicies` to use `e.credentialManager` directly. - Fixed a critical error suppression bug in `ExecuteAction` that always returned success even on failure. - Implemented robust error matching using string comparison fallbacks. - Improved consistency by reloading configuration after policy changes. * s3api: update and refine IAM integration tests - Updated tests to use a real `MemoryStore`-backed `CredentialManager`. - Refined test configuration synchronization using `sync.Once` and manual deep-copying to prevent state corruption. - Improved `extractEmbeddedIamErrorCodeAndMessage` to handle more XML formats robustly. - Adjusted test expectations to match current AWS IAM behavior. * fix compilation * visibility * ensure 10 policies * reload * add integration tests * Guard raft command registration * Allow IAM actions in policy tests * Validate gRPC policy attachments * Revert Validate gRPC policy attachments * Tighten gRPC policy attach/detach * Improve IAM managed policy handling * Improve managed policy filters	2026-02-19 12:26:27 -08:00
Chris Lu	753e1db096	Prevent split-brain: Persistent ClusterID and Join Validation (#8022) * Prevent split-brain: Persistent ClusterID and Join Validation - Persist ClusterId in Raft store to survive restarts. - Validate ClusterId on Raft command application (piggybacked on MaxVolumeId). - Prevent masters with conflicting ClusterIds from joining/operating together. - Update Telemetry to report the persistent ClusterId. * Refine ClusterID validation based on feedback - Improved error message in cluster_commands.go. - Added ClusterId mismatch check in RaftServer.Recovery. * Handle Raft errors and support Hashicorp Raft for ClusterId - Check for errors when persisting ClusterId in legacy Raft. - Implement ClusterId generation and persistence for Hashicorp Raft leader changes. - Ensure consistent error logging. * Refactor ClusterId validation - Centralize ClusterId mismatch check in Topology.SetClusterId. - Simplify MaxVolumeIdCommand.Apply and RaftServer.Recovery to rely on SetClusterId. * Fix goroutine leak and add timeout - Handle channel closure in Hashicorp Raft leader listener. - Add timeout to Raft Apply call to prevent blocking. * Fix deadlock in legacy Raft listener - Wrap ClusterId generation/persistence in a goroutine to avoid blocking the Raft event loop (deadlock). * Rename ClusterId to SystemId - Renamed ClusterId to SystemId across the codebase (protobuf, topology, server, telemetry). - Regenerated telemetry.pb.go with new field. * Rename SystemId to TopologyId - Rename to SystemId was intermediate step. - Final name is TopologyId for the persistent cluster identifier. - Updated protobuf, topology, raft server, master server, and telemetry. * Optimize Hashicorp Raft listener - Integrated TopologyId generation into existing monitorLeaderLoop. - Removed extra goroutine in master_server.go. * Fix optimistic TopologyId update - Removed premature local state update of TopologyId in master_server.go and raft_hashicorp.go. - State is now solely updated via the Raft state machine Apply/Restore methods after consensus. * Add explicit log for recovered TopologyId - Added glog.V(0) info log in RaftServer.Recovery to print the recovered TopologyId on startup. * Add Raft barrier to prevent TopologyId race condition - Implement ensureTopologyId helper method - Send no-op MaxVolumeIdCommand to sync Raft log before checking TopologyId - Ensures persisted TopologyId is recovered before generating new one - Prevents race where generation happens during log replay * Serialize TopologyId generation with mutex - Add topologyIdGenLock mutex to MasterServer struct - Wrap ensureTopologyId method with lock to prevent concurrent generation - Fixes race where event listener and manual leadership check both generate IDs - Second caller waits for first to complete and sees the generated ID * Add TopologyId recovery logging to Apply method - Change log level from V(1) to V(0) for visibility - Log 'Recovered TopologyId' when applying from Raft log - Ensures recovery is visible whether from snapshot or log replay - Matches Recovery() method logging for consistency * Fix Raft barrier timing issue - Add 100ms delay after barrier command to ensure log application completes - Add debug logging to track barrier execution and TopologyId state - Return early if barrier command fails - Prevents TopologyId generation before old logs are fully applied * ensure leader * address comments * address comments * redundant * clean up * double check * refactoring * comment	2026-01-18 14:02:34 -08:00
Neo	aa61824442	master:fix empty target in Build() (#6069)	2024-09-27 00:45:18 -07:00
chrislu	a4b25a642d	math/rand => math/rand/v2	2024-08-29 09:52:21 -07:00
wyang	4b1f539ab8	fix allocate reduplicated volumeId to different volume (#5811) * fix allocate reduplicated volumeId to different volume * only check barrier when read --------- Co-authored-by: Yang Wang <yangwang@weride.ai>	2024-07-26 21:48:36 -07:00
chrislu	d999f1f0e2	update raft version fix #4460	2023-05-09 22:54:23 -07:00
chrislu	4511edc871	update raft	2023-05-07 13:33:44 -07:00
Zachary Walters	ef2f741823	Updated the deprecated ioutil dependency (#4239)	2023-02-21 19:47:33 -08:00
chrislu	26dbc6c905	move to https://github.com/seaweedfs/seaweedfs	2022-07-29 00:17:28 -07:00
chrislu	3828b8ce87	"github.com/chrislusf/raft" => "github.com/seaweedfs/raft"	2022-07-27 12:12:40 -07:00
Konstantin Lebedev	c1c8dad677	avoid no such raft date directory https://github.com/chrislusf/seaweedfs/issues/3214	2022-06-21 13:47:51 +05:00
Konstantin Lebedev	1a1e5778c3	fix cluster status	2022-04-04 18:52:08 +05:00
Konstantin Lebedev	14dd971890	hashicorp raft with state machine	2022-04-04 17:51:51 +05:00
Konstantin Lebedev	c514710b7b	initial add hashicorp raft	2022-04-04 13:50:56 +05:00
Konstantin Lebedev	84b7b83517	fix permission mkdir snapshot avoid open file operation not permitted	2022-03-28 18:41:52 +05:00
chrislu	4ba7127ab1	refactor	2022-03-26 13:13:19 -07:00
chrislu	fba1cfc2d6	simplify a bit	2022-03-26 10:24:05 -07:00
chrislu	a3411dd9da	refactor	2022-03-26 10:21:26 -07:00
Konstantin Lebedev	ddd3945c26	fix remove deleted peers of raft server https://github.com/chrislusf/seaweedfs/issues/2804	2022-03-25 15:09:38 +05:00
Konstantin Lebedev	c1450bf9fe	always clear previous log to avoid server is promotable https://github.com/chrislusf/seaweedfs/issues/2804	2022-03-25 13:40:19 +05:00
garenchan	bd032eabe7	[UPDATE] Make heartbeat interval and election timeout of masters configurable.	2022-02-14 21:09:07 +08:00
Chris Lu	e5fc35ed0c	change server address from string to a type	2021-09-12 22:47:52 -07:00
Chris Lu	1b17f71939	adjust election timeout to 10 seconds	2020-10-23 23:06:44 -07:00
Chris Lu	da4edf3651	master: check peers for existing leader before starting a leader election fix https://github.com/chrislusf/seaweedfs/issues/1509	2020-10-07 01:25:39 -07:00
Устюжанин Антон Александрович	702b1cb876	fix: remove deleted peers if resumeState = true	2020-10-04 21:56:17 +05:00
Устюжанин Антон Александрович	dc31b19469	fix: restore raft state	2020-10-03 14:03:41 +05:00
Устюжанин Антон Александрович	8c82fb7e5f	fix: restore raft state	2020-10-02 23:01:20 +05:00
Chris Lu	044841c885	master: always clear previous master meta data directory	2020-06-19 20:42:16 -07:00
Chris Lu	f90c43635d	refactoring	2020-03-04 00:39:47 -08:00
Lei Liu	c2884cace2	misc updated Signed-off-by: Lei Liu <lei01.liu@horizon.ai>	2019-10-29 21:28:28 +08:00
Chris Lu	da871896c3	weed filer: set grpc port to port + 10000	2019-03-19 05:47:41 -07:00
Chris Lu	07af52cb6f	raft change from http to grpc master grpc port is fixed to http port + 10000	2019-02-18 22:38:14 -08:00
Chris Lu	7103c1ab7e	go fmt	2019-02-15 00:09:48 -08:00
Chris Lu	8afc632484	raft: use the first master to bootstrap the election	2019-02-09 12:52:09 -08:00
Chris Lu	1334507595	Revert "randomize based on self address" This reverts commit `6230eb28a6`.	2019-01-28 12:12:51 -08:00
Chris Lu	6230eb28a6	randomize based on self address fix #851	2019-01-28 11:55:33 -08:00
Chris Lu	221105eea3	Revert "use the first entry to bootstrap master cluster" This reverts commit `40c8725ffa`.	2019-01-28 11:46:46 -08:00
Chris Lu	40c8725ffa	use the first entry to bootstrap master cluster fix https://github.com/chrislusf/seaweedfs/issues/851	2019-01-28 10:35:28 -08:00
Chris Lu	834f414af9	add a timeout	2019-01-26 00:15:42 -06:00
Chris Lu	3f56b12ed4	raft: adding idle connection time out another attempt to fix https://github.com/chrislusf/seaweedfs/issues/825	2019-01-22 09:25:25 -08:00
Chris Lu	1d103e3ed5	timeout http connections possible fix for https://github.com/chrislusf/seaweedfs/issues/825	2019-01-17 23:38:33 -08:00
Chris Lu	75d63db60d	randomize raft server startup also some go fmt	2018-08-12 14:27:14 -07:00
Chris Lu	03f50180f3	simplifying the leader election by raft fixing https://github.com/chrislusf/seaweedfs/issues/629	2018-06-12 01:54:09 -07:00
Chris Lu	458ada173e	go fmt	2018-05-27 11:52:26 -07:00
Chris Lu	70f6740309	better fix for single master restart without peers changing	2017-12-06 00:14:14 -08:00
Chris Lu	9026b3e86e	always remember the max volume id	2017-11-28 17:08:59 -08:00
Chris Lu	72e89b615b	301 is reported as 404 for http post fix https://github.com/chrislusf/seaweedfs/issues/512	2017-06-15 21:21:32 -07:00
Chris Lu	c8f54aad8b	adjust timing of leader election	2017-01-18 09:54:43 -08:00

53 Commits