seaweedFS

Author	SHA1	Message	Date
Chris Lu	4efe0acaf5	fix(master): fast resume state and default resumeState to true (#8925 ) * fix(master): fast resume state and default resumeState to true When resumeState is enabled in single-master mode, the raft server had existing log entries so the self-join path couldn't promote to leader. The server waited the full election timeout (10-20s) before self-electing. Fix by temporarily setting election timeout to 1ms before Start() when in single-master + resumeState mode with existing log, then restoring the original timeout after leader election. This makes resume near-instant. Also change the default for resumeState from false to true across all CLI commands (master, mini, server) so state is preserved by default. * fix(master): prevent fastResume goroutine from hanging forever Use defer to guarantee election timeout is always restored, and bound the polling loop with a timeout so it cannot spin indefinitely if leader election never succeeds. * fix(master): use ticker instead of time.After in fastResume polling loop	2026-04-04 14:15:56 -07:00
Chris Lu	2eaf98a7a2	Use Unix sockets for gRPC in mini mode (#8856 ) * Use Unix sockets for gRPC between co-located services in mini mode In `weed mini`, all services run in one process. Previously, inter-service gRPC traffic (volume↔master, filer↔master, S3↔filer, worker↔admin, etc.) went through TCP loopback. This adds a gRPC Unix socket registry in the pb package: mini mode registers a socket path per gRPC port at startup, each gRPC server additionally listens on its socket, and GrpcDial transparently routes to the socket via WithContextDialer when a match is found. Standalone commands (weed master, weed filer, etc.) are unaffected since no sockets are registered. TCP listeners are kept for external clients. * Handle Serve error and clean up socket file in ServeGrpcOnLocalSocket Log non-expected errors from grpcServer.Serve (ignoring grpc.ErrServerStopped) and always remove the Unix socket file when Serve returns, ensuring cleanup on Stop/GracefulStop.	2026-03-30 18:18:52 -07:00
Chris Lu	15f4a97029	fix: improve raft leader election reliability and failover speed (#8692 ) * fix: clear raft vote state file on non-resume startup The seaweedfs/raft library v1.1.7 added a persistent `state` file for currentTerm and votedFor. When RaftResumeState=false (the default), the log, conf, and snapshot directories are cleared but this state file was not. On repeated restarts, different masters accumulate divergent terms, causing AppendEntries rejections and preventing leader election. Fixes #8690 * fix: recover TopologyId from snapshot before clearing raft state When RaftResumeState=false clears log/conf/snapshot, the TopologyId (used for license validation) was lost. Now extract it from the latest snapshot before cleanup and restore it on the topology. Both seaweedfs/raft and hashicorp/raft paths are handled, with a shared recoverTopologyIdFromState helper in raft_common.go. * fix: stagger multi-master bootstrap delay by peer index Previously all masters used a fixed 1500ms delay before the bootstrap check. Now the delay is proportional to the peer's sorted index with randomization (matching the hashicorp raft path), giving the designated bootstrap node (peer 0) a head start while later peers wait for gRPC servers to be ready. Also adds diagnostic logging showing why DoJoinCommand was or wasn't called, making leader election issues easier to diagnose from logs. * fix: skip unreachable masters during leader reconnection When a master leader goes down, non-leader masters still redirect clients to the stale leader address. The masterClient would follow these redirects, fail, and retry, wasting round-trips each cycle. Now tryAllMasters tracks which masters failed within a cycle and skips redirects pointing to them, reducing log spam and connection overhead during leader failover. * fix: take snapshot after TopologyId generation for recovery After generating a new TopologyId on the leader, immediately take a raft snapshot so the ID can be recovered from the snapshot on future restarts with RaftResumeState=false. Without this, short-lived clusters would lose the TopologyId on restart since no automatic snapshot had been taken yet. * test: add multi-master raft failover integration tests Integration test framework and 5 test scenarios for 3-node master clusters: - TestLeaderConsistencyAcrossNodes: all nodes agree on leader and TopologyId - TestLeaderDownAndRecoverQuickly: leader stops, new leader elected, old leader rejoins as follower - TestLeaderDownSlowRecover: leader gone for extended period, cluster continues with 2/3 quorum - TestTwoMastersDownAndRestart: quorum lost (2/3 down), recovered when both restart - TestAllMastersDownAndRestart: full cluster restart, leader elected, all nodes agree on TopologyId * fix: address PR review comments - peerIndex: return -1 (not 0) when self not found, add warning log - recoverTopologyIdFromSnapshot: defer dir.Close() - tests: check GetTopologyId errors instead of discarding them * fix: address review comments on failover tests - Assert no leader after quorum loss (was only logging) - Verify follower cs.Leader matches expected leader via ServerAddress.ToHttpAddress() comparison - Check GetTopologyId error in TestTwoMastersDownAndRestart	2026-03-18 23:28:07 -07:00
Chris Lu	ba8e2aaae9	Fix master leader election when grpc ports change (#8272 ) * Fix master leader detection when grpc ports change * Canonicalize self peer entry to avoid raft self-alias panic * Normalize and deduplicate master peer addresses	2026-02-09 18:13:02 -08:00
Chris Lu	c106532b79	fix: prevent MiniClusterCtx race conditions in command shutdown Capture global MiniClusterCtx into local variables before goroutine/select evaluation to prevent nil dereference/data race when context is reset to nil after nil check. Applied to filer, master, volume, and s3 commands.	2026-01-28 19:42:16 -08:00
Chris Lu	01c17478ae	command: implement graceful shutdown for mini cluster - Introduce MiniClusterCtx to coordinate shutdown across mini services - Update Master, Volume, Filer, S3, and WebDAV servers to respect context cancellation - Ensure all resources are cleaned up properly during test teardown - Integrate MiniClusterCtx in s3tables integration tests	2026-01-28 10:36:19 -08:00
Chris Lu	ed1da07665	Add consistent -debug and -debug.port flags to commands (#7816 ) * Add consistent -debug and -debug.port flags to commands Add -debug and -debug.port flags to weed master, weed volume, weed s3, weed mq.broker, and weed filer.sync commands for consistency with weed filer. When -debug is enabled, an HTTP server starts on the specified port (default 6060) serving runtime profiling data at /debug/pprof/. For mq.broker, replaced the older -port.pprof flag with the new -debug and -debug.port pattern for consistency. * Update weed/util/grace/pprof.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-18 17:44:36 -08:00
Chris Lu	f096b067fd	weed master add peers=none option for faster startup (#7419 ) * weed master -peers=none * single master mode only when peers is none * refactoring * revert duplicated code * revert * Update weed/command/master.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * preventing "none" passed to other components if master is not started --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-31 18:29:16 -07:00
Chris Lu	5ab49e2971	Adjust cli option (#7418 ) * adjust "weed benchmark" CLI to use readOnly/writeOnly * consistently use "-master" CLI option * If both -readOnly and -writeOnly are specified, the current logic silently allows it with -writeOnly taking precedence. This is confusing and could lead to unexpected behavior.	2025-10-31 17:08:00 -07:00
zuzuviewer	8fa1a69f8c	* Fix undefined http serve behavior (#6943 )	2025-07-07 22:48:12 -07:00
chrislu	166e36bcd3	use telemetry.seaweedfs.com	2025-06-28 19:48:03 -07:00
Chris Lu	a1aab8a083	add telemetry (#6926 ) * add telemetry * fix go mod * add default telemetry server url * Update README.md * replace with broker count instead of s3 count * Update telemetry.pb.go * github action to deploy	2025-06-28 14:11:55 -07:00
chrislu	ab49540d2b	use master.toml value if not empty fix https://github.com/seaweedfs/seaweedfs/issues/6922	2025-06-25 17:54:56 -07:00
chrislu	bd4891a117	change version directory	2025-06-03 22:46:10 -07:00
Ethan Mosbaugh	9ebc132ffc	fix: bucket-hook fails with gnu wget (#6521 )	2025-02-06 23:11:17 -08:00
chrislu	ec155022e7	"golang.org/x/exp/slices" => "slices" and go fmt	2024-12-19 19:25:06 -08:00
zouyixiong	881c9a009e	[master] master missing start LoopPushingMetric routine fixed. (#6018 )	2024-09-13 20:01:34 -07:00
chrislu	4463296811	add parallel vacuuming	2024-08-21 22:53:54 -07:00
vadimartynov	b796c21fa9	Added loadSecurityConfigOnce (#5792 )	2024-07-16 09:15:55 -07:00
vadimartynov	ec9e7493b3	-metricsIp cmd flag (#5773 ) * Added/Updated: - Added metrics ip options for all servers; - Fixed a bug with the selection of the binIp or ip parameter for the metrics handler; * Fixed cmd flags	2024-07-12 10:56:26 -07:00
Konstantin Lebedev	5ffacbb6ea	refactor all methods strings to const (#5726 )	2024-07-01 01:00:39 -07:00
vadimartynov	8aae82dd71	Added context for the MasterClient's methods to avoid endless loops (#5628 ) * Added context for the MasterClient's methods to avoid endless loops * Returned WithClient function. Added WithClientCustomGetMaster function * Hid unused ctx arguments * Using a common context for the KeepConnectedToMaster and WaitUntilConnected functions * Changed the context termination check in the tryConnectToMaster function * Added a child context to the tryConnectToMaster function * Added a common context for KeepConnectedToMaster and WaitUntilConnected functions in benchmark	2024-06-14 11:40:34 -07:00
chrislu	645ae8c57b	Revert "Revert "Merge branch 'master' of https://github.com/seaweedfs/seaweedfs "" This reverts commit `8cb42c39`	2023-09-25 09:35:16 -07:00
chrislu	8cb42c39ad	Revert "Merge branch 'master' of https://github.com/seaweedfs/seaweedfs " This reverts commit `2e5aa06026`, reversing changes made to `4d414f54a2`.	2023-09-18 16:12:50 -07:00
dependabot[bot]	a04bd4d26f	Bump github.com/rclone/rclone from 1.63.1 to 1.64.0 (#4850 ) * Bump github.com/rclone/rclone from 1.63.1 to 1.64.0 Bumps [github.com/rclone/rclone](https://github.com/rclone/rclone) from 1.63.1 to 1.64.0. - [Release notes](https://github.com/rclone/rclone/releases) - [Changelog](https://github.com/rclone/rclone/blob/master/RELEASE.md) - [Commits](https://github.com/rclone/rclone/compare/v1.63.1...v1.64.0) --- updated-dependencies: - dependency-name: github.com/rclone/rclone dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * API changes * go mod --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com> Co-authored-by: chrislu <chris.lu@gmail.com>	2023-09-18 14:43:05 -07:00
Stewart Miles	dd71f54c6b	Fix -raftHashicorp and -raftBootstrap flag propagation. (#4309 ) `weed server` was not correctly propagating `-master.raftHashicorp` and `-master.raftBootstrap` flags when starting the master server. Related to #4307	2023-03-15 13:03:20 -07:00
Jiffs Maverick	4b0430e71d	[metrics] Add the ability to control bind ip (#4012 )	2022-11-24 10:22:59 -08:00
Konstantin Lebedev	401315f337	master fix interruption through ctrl+c (#3834 )	2022-10-12 07:18:40 -07:00
Konstantin Lebedev	b9933d5589	master server graceful stop (#3797 )	2022-10-06 09:30:30 -07:00
Ryan Russell	8efe1db01a	refactor(various): `Listner` -> `Listener` readability improvements (#3672 ) * refactor(net_timeout): `listner` -> `listener` Signed-off-by: Ryan Russell <git@ryanrussell.org> * refactor(s3): `s3ApiLocalListner` -> `s3ApiLocalListener` Signed-off-by: Ryan Russell <git@ryanrussell.org> * refactor(filer): `localPublicListner` -> `localPublicListener` Signed-off-by: Ryan Russell <git@ryanrussell.org> * refactor(command): `masterLocalListner` -> `masterLocalListener` Signed-off-by: Ryan Russell <git@ryanrussell.org> * refactor(net_timeout): `ipListner` -> `ipListener` Signed-off-by: Ryan Russell <git@ryanrussell.org> Signed-off-by: Ryan Russell <git@ryanrussell.org>	2022-09-14 11:59:55 -07:00
Patrick Schmidt	7b424a54dc	Add raft server access mutex to avoid races (#3503 )	2022-08-24 09:49:05 -07:00
chrislu	67814a5c79	refactor and fix strings.Split	2022-08-07 01:34:32 -07:00
Patrick Schmidt	1a4a36d510	Add healthy indicator for raft status	2022-07-30 19:34:26 +02:00
chrislu	26dbc6c905	move to https://github.com/seaweedfs/seaweedfs	2022-07-29 00:17:28 -07:00
chrislu	3828b8ce87	"github.com/chrislusf/raft" => "github.com/seaweedfs/raft"	2022-07-27 12:12:40 -07:00
chrislu	492da3dbce	master: put metadata under instance specific folder	2022-06-20 19:04:49 -07:00
leyou240	89eb87c1d1	Merge branch 'master' into slices.SortFunc	2022-04-18 10:39:29 +08:00
justin	3551ca2fcf	enhancement: replace sort.Slice with slices.SortFunc to reduce reflection	2022-04-18 10:35:43 +08:00
Konstantin Lebedev	17c6e8e39f	Merge branch 'new_master' into hashicorp_raft # Conflicts: # go.mod # go.sum	2022-04-05 13:29:46 +05:00
Konstantin Lebedev	622297f1a7	add stats raft handler	2022-04-04 19:16:06 +05:00
Konstantin Lebedev	14dd971890	hashicorp raft with state machine	2022-04-04 17:51:51 +05:00
Konstantin Lebedev	c514710b7b	initial add hashicorp raft	2022-04-04 13:50:56 +05:00
chrislu	daca2d22a5	use original server address string as map key	2022-04-01 17:34:42 -07:00
chrislu	21e0898631	refactor: change masters from a slice to a map	2022-03-26 13:33:17 -07:00
chrislu	4ba7127ab1	refactor	2022-03-26 13:13:19 -07:00
Berck Nash	7ee38fa3a4	The fixes for https://github.com/chrislusf/seaweedfs/issues/1937 had a few problems: (1) The help file says that in the absence of a ipBind being specified, that it will bind to the "ip" specified. Instead, it bound to localhost which broke the default configuration. This change implements the documented behavior instead. (2) The new IAM filer ip address has no default. This instantiates it to the same as the filer IP. I'm not sure if there should be a corresponding iam.ip or iam.ipBind option added to the filer command?	2022-03-17 15:30:23 -06:00
Berck Nash	9b14f0c81a	Add mTLS support for both master and volume http server.	2022-03-16 09:52:17 -06:00
chrislu	3639cad69c	master, filer, s3: also listen to "localhost" in addition to specific ip address related to https://github.com/chrislusf/seaweedfs/issues/1937	2022-03-15 22:28:18 -07:00
chrislu	3a6eb8ca5f	default bind to one ip address fix https://github.com/chrislusf/seaweedfs/issues/1937	2022-03-11 14:02:39 -08:00
garenchan	bd032eabe7	[UPDATE] Make heartbeat interval and election timeout of masters configurable.	2022-02-14 21:09:07 +08:00

115 Commits