seaweedFS

Author	SHA1	Message	Date
Chris Lu	b3620c7e14	admin: auto migrating master maintenance scripts to admin_script plugin config (#8509) * admin: seed admin_script plugin config from master maintenance scripts When the admin server starts, fetch the maintenance scripts configuration from the master via GetMasterConfiguration. If the admin_script plugin worker does not already have a saved config, use the master's scripts as the default value. This enables seamless migration from master.toml [master.maintenance] to the admin script plugin worker. Changes: - Add maintenance_scripts and maintenance_sleep_minutes fields to GetMasterConfigurationResponse in master.proto - Populate the new fields from viper config in master_grpc_server.go - On admin server startup, fetch the master config and seed the admin_script plugin config if no config exists yet - Strip lock/unlock commands from the master scripts since the admin script worker handles locking automatically Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address review comments on admin_script seeding - Replace TOCTOU race (separate Load+Save) with atomic SaveJobTypeConfigIfNotExists on ConfigStore and Plugin - Replace ineffective polling loop with single GetMaster call using 30s context timeout, since GetMaster respects context cancellation - Add unit tests for SaveJobTypeConfigIfNotExists (in-memory + on-disk) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: apply maintenance script defaults in gRPC handler The gRPC handler for GetMasterConfiguration read maintenance scripts from viper without calling SetDefault, relying on startAdminScripts having run first. If the admin server calls GetMasterConfiguration before startAdminScripts sets the defaults, viper returns empty strings and the seeding is silently skipped. Apply SetDefault in the gRPC handler itself so it is self-contained. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Revert "fix: apply maintenance script defaults in gRPC handler" This reverts commit 068a5063303f6bc34825a07bb681adfa67e6f9de. * fix: use atomic save in ensureJobTypeConfigFromDescriptor ensureJobTypeConfigFromDescriptor used a separate Load + Save, racing with seedAdminScriptFromMaster. If the descriptor defaults (empty script) were saved first, SaveJobTypeConfigIfNotExists in the seeding goroutine would see an existing config and skip, losing the master's maintenance scripts. Switch to SaveJobTypeConfigIfNotExists so both paths are atomic. Whichever wins, the other is a safe no-op. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: fetch master scripts inline during config bootstrap, not in goroutine Replace the seedAdminScriptFromMaster goroutine with a ConfigDefaultsProvider callback. When the plugin bootstraps admin_script defaults from the worker descriptor, it calls the provider which fetches maintenance scripts from the master synchronously. This eliminates the race between the seeding goroutine and the descriptor-based config bootstrap. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * skip commented lock unlock Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com> * reduce grpc calls --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-04 22:11:07 -08:00
Chris Lu	45ce18266a	Disable master maintenance scripts when admin server runs (#8499) * Disable master maintenance scripts when admin server runs * Stop defaulting master maintenance scripts * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Apply suggestion from @gemini-code-assist[bot] Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Clarify master scripts are disabled by default * Skip master maintenance scripts when admin server is connected * Restore default master maintenance scripts * Document admin server skip for master maintenance scripts --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-04 00:40:40 -08:00
Chris Lu	57ab99d13e	fix: generate topology uuid uniformly in single-master mode (#8405) * fix: ensure topology uuid is generated in single master setups * ensureTopologyId adds a Hashicorp-aware implementation * simplify	2026-02-22 23:45:48 -08:00
Chris Lu	3300874cb5	filer: add default log purging to master maintenance scripts (#8359) * filer: add default log purging to master maintenance scripts * filer: fix default maintenance scripts to include full set of tasks * filer: refactor maintenance scripts to avoid duplication	2026-02-16 16:58:15 -08:00
Chris Lu	cb9e21cdc5	Normalize hashicorp raft peer ids (#8253) * Normalize raft voter ids * 4.11 * Update raft_hashicorp.go	2026-02-09 07:46:34 -08:00
Chris Lu	753e1db096	Prevent split-brain: Persistent ClusterID and Join Validation (#8022) * Prevent split-brain: Persistent ClusterID and Join Validation - Persist ClusterId in Raft store to survive restarts. - Validate ClusterId on Raft command application (piggybacked on MaxVolumeId). - Prevent masters with conflicting ClusterIds from joining/operating together. - Update Telemetry to report the persistent ClusterId. * Refine ClusterID validation based on feedback - Improved error message in cluster_commands.go. - Added ClusterId mismatch check in RaftServer.Recovery. * Handle Raft errors and support Hashicorp Raft for ClusterId - Check for errors when persisting ClusterId in legacy Raft. - Implement ClusterId generation and persistence for Hashicorp Raft leader changes. - Ensure consistent error logging. * Refactor ClusterId validation - Centralize ClusterId mismatch check in Topology.SetClusterId. - Simplify MaxVolumeIdCommand.Apply and RaftServer.Recovery to rely on SetClusterId. * Fix goroutine leak and add timeout - Handle channel closure in Hashicorp Raft leader listener. - Add timeout to Raft Apply call to prevent blocking. * Fix deadlock in legacy Raft listener - Wrap ClusterId generation/persistence in a goroutine to avoid blocking the Raft event loop (deadlock). * Rename ClusterId to SystemId - Renamed ClusterId to SystemId across the codebase (protobuf, topology, server, telemetry). - Regenerated telemetry.pb.go with new field. * Rename SystemId to TopologyId - Rename to SystemId was intermediate step. - Final name is TopologyId for the persistent cluster identifier. - Updated protobuf, topology, raft server, master server, and telemetry. * Optimize Hashicorp Raft listener - Integrated TopologyId generation into existing monitorLeaderLoop. - Removed extra goroutine in master_server.go. * Fix optimistic TopologyId update - Removed premature local state update of TopologyId in master_server.go and raft_hashicorp.go. - State is now solely updated via the Raft state machine Apply/Restore methods after consensus. * Add explicit log for recovered TopologyId - Added glog.V(0) info log in RaftServer.Recovery to print the recovered TopologyId on startup. * Add Raft barrier to prevent TopologyId race condition - Implement ensureTopologyId helper method - Send no-op MaxVolumeIdCommand to sync Raft log before checking TopologyId - Ensures persisted TopologyId is recovered before generating new one - Prevents race where generation happens during log replay * Serialize TopologyId generation with mutex - Add topologyIdGenLock mutex to MasterServer struct - Wrap ensureTopologyId method with lock to prevent concurrent generation - Fixes race where event listener and manual leadership check both generate IDs - Second caller waits for first to complete and sees the generated ID * Add TopologyId recovery logging to Apply method - Change log level from V(1) to V(0) for visibility - Log 'Recovered TopologyId' when applying from Raft log - Ensures recovery is visible whether from snapshot or log replay - Matches Recovery() method logging for consistency * Fix Raft barrier timing issue - Add 100ms delay after barrier command to ensure log application completes - Add debug logging to track barrier execution and TopologyId state - Return early if barrier command fails - Prevents TopologyId generation before old logs are fully applied * ensure leader * address comments * address comments * redundant * clean up * double check * refactoring * comment	2026-01-18 14:02:34 -08:00
Chris Lu	07dc552e1c	master: Fix raft url (#7255) * fix signature * fix url scheme	2025-09-18 14:46:53 -07:00
Dmitriy Pavlov	cd78e653e1	add disable volume_growth flag (#7196)	2025-09-04 05:39:56 -07:00
Chris Lu	e446234e9c	remove spoof-able request header (#7103) * remove spoof-able request header https://github.com/seaweedfs/seaweedfs/issues/7094#issuecomment-3158320497 * Update weed/security/guard.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-08-06 10:08:30 -07:00
Chris Lu	0703308270	remote address parsing should handle special cases (#7101) * remote address parsing should handle special cases * handling ipv6 * simplify * Update weed/security/guard.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/security/guard.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * x-real-ip * Update guard.go * fixes Hostname Whitelisting: Fully restored - supports localhost, example.com, etc. IP Whitelisting: Still works - supports exact IPs and CIDR ranges Header Support: Consistent handling of X-Forwarded-For, X-Real-IP * simplify * Update weed/security/guard.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update weed/security/guard.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update guard.go * adjust function signature * Update weed/security/guard.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * indention * skip empty host --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-06 01:03:00 -07:00
chrislu	798f797158	use float for sleep seconds fix https://github.com/seaweedfs/seaweedfs/pull/6795	2025-07-06 14:16:41 -07:00
chrislu	1733d0ce68	remove features and deployments fields	2025-06-28 20:03:06 -07:00
Chris Lu	a1aab8a083	add telemetry (#6926) * add telemetry * fix go mod * add default telemetry server url * Update README.md * replace with broker count instead of s3 count * Update telemetry.pb.go * github action to deploy	2025-06-28 14:11:55 -07:00
Aleksey Kosov	5182d46e22	Added middleware for processing request_id grpc and http requests (#6805)	2025-05-21 07:57:39 -07:00
Lisandro Pin	fc4df944a0	Remove rate limit semaphore on master's leader selection logic. (#6494) This was introduced by `054374c7` (2024-03-12) and serves no practical purpose, yet it caps the maximum QPS master servers can handle.	2025-01-30 13:08:36 -08:00
Konstantin Lebedev	b65eb2ec45	[security] reload whiteList on http seerver (#6302) * reload whiteList * white_list add to scaffold	2024-12-02 10:38:10 -08:00
Konstantin Lebedev	fec88e64eb	[master] update LastLeaderChangeTime for hashicorp raft (#6292)	2024-11-26 08:02:45 -08:00
chrislu	ccf1795e6f	wait a bit before getting the next volume id if the leader is recently elected	2024-11-23 19:58:45 -08:00
chrislu	6564ceda91	skip resource heavy commands from running on master nodes	2024-09-29 10:51:17 -07:00
chrislu	4463296811	add parallel vacuuming	2024-08-21 22:53:54 -07:00
Riccardo Bertossa	6fe8639504	add http endpoint to get the size of a collection (#5910)	2024-08-19 07:44:45 -07:00
wyang	4b1f539ab8	fix allocate reduplicated volumeId to different volume (#5811) * fix allocate reduplicated volumeId to different volume * only check barrier when read --------- Co-authored-by: Yang Wang <yangwang@weride.ai>	2024-07-26 21:48:36 -07:00
vadimartynov	86d92a42b4	Added tls for http clients (#5766) * Added global http client * Added Do func for global http client * Changed the code to use the global http client * Fix http client in volume uploader * Fixed pkg name * Fixed http util funcs * Fixed http client for bench_filer_upload * Fixed http client for stress_filer_upload * Fixed http client for filer_server_handlers_proxy * Fixed http client for command_fs_merge_volumes * Fixed http client for command_fs_merge_volumes and command_volume_fsck * Fixed http client for s3api_server * Added init global client for main funcs * Rename global_client to client * Changed: - fixed NewHttpClient; - added CheckIsHttpsClientEnabled func - updated security.toml in scaffold * Reduce the visibility of some functions in the util/http/client pkg * Added the loadSecurityConfig function * Use util.LoadSecurityConfiguration() in NewHttpClient func	2024-07-16 23:14:09 -07:00
Konstantin Lebedev	67edf1d014	[master] Do Automatic Volume Grow in background (#5781) * Do Automatic Volume Grow in backgound * pass lastGrowCount to master * fix build * fix count to uint64	2024-07-16 08:03:40 -07:00
vadimartynov	8aae82dd71	Added context for the MasterClient's methods to avoid endless loops (#5628) * Added context for the MasterClient's methods to avoid endless loops * Returned WithClient function. Added WithClientCustomGetMaster function * Hid unused ctx arguments * Using a common context for the KeepConnectedToMaster and WaitUntilConnected functions * Changed the context termination check in the tryConnectToMaster function * Added a child context to the tryConnectToMaster function * Added a common context for KeepConnectedToMaster and WaitUntilConnected functions in benchmark	2024-06-14 11:40:34 -07:00
shenxingwuying	ee25ada732	reduce ambiguity about use memory_sequencer (#5555)	2024-04-29 21:51:00 -07:00
chrislu	55976ae04a	avoid repeated calls to heavy-weighted viper	2024-04-18 09:09:45 -07:00
chrislu	d9490c5e1f	rename	2024-04-18 08:47:45 -07:00
Nico D'Cotta	796b7508f3	Implement SRV lookups for filer (#4767)	2023-08-24 07:08:56 -07:00
chrislu	a315490f7d	proxy to master uses http address fix https://github.com/seaweedfs/seaweedfs/issues/4607	2023-07-04 11:45:21 -07:00
chrislu	adb90bd252	avoid lower casing the command fix https://github.com/seaweedfs/seaweedfs/pull/4321	2023-03-19 21:20:46 -07:00
Konstantin Lebedev	b9933d5589	master server graceful stop (#3797)	2022-10-06 09:30:30 -07:00
Konstantin Lebedev	e90ab4ac60	avoid race conditions for OnPeerUpdate (#3525) https://github.com/seaweedfs/seaweedfs/issues/3524	2022-08-26 10:18:49 -07:00
Patrick Schmidt	7b424a54dc	Add raft server access mutex to avoid races (#3503)	2022-08-24 09:49:05 -07:00
chrislu	10414fd81c	ping timeout at 15 seconds: this 72 minute timeout setting seems unreasonably long; 15 seconds is around the time when a new raft leader should be elected.	2022-08-23 23:28:16 -07:00
askeipx	2e78a522ab	remove old raft servers if they don't answer to pings for too long (#3398) * remove old raft servers if they don't answer to pings for too long add ping durations as options rename ping fields fix some todos get masters through masterclient raft remove server from leader use raft servers to ping them CheckMastersAlive for hashicorp raft only * prepare blocking ping * pass waitForReady as param * pass waitForReady through all functions * waitForReady works * refactor * remove unneeded params * rollback unneeded changes * fix	2022-08-23 23:18:21 -07:00
Konstantin Lebedev	4d4cd0948d	avoid infinite loop WaitUntilConnected() (#3431) https://github.com/seaweedfs/seaweedfs/issues/3421	2022-08-11 15:03:26 -07:00
Chris Lu	b59bc607bf	Merge pull request #3338 from kmlebedev/issues/3083 rollback over onPeerUpdate implementation of automatic clean-up of failed servers in favor of synchronous ping	2022-08-01 08:23:10 -07:00
Konstantin Lebedev	a98f6d66a3	rollback over onPeerupdate implementation of automatic clean-up of failed servers in favor of synchronous ping	2022-08-01 12:51:41 +05:00
chrislu	26dbc6c905	move to https://github.com/seaweedfs/seaweedfs	2022-07-29 00:17:28 -07:00
chrislu	bb01b68fa0	refactor	2022-07-28 23:24:38 -07:00
chrislu	68065128b8	add dc and rack	2022-07-28 23:22:51 -07:00
chrislu	3828b8ce87	"github.com/chrislusf/raft" => "github.com/seaweedfs/raft"	2022-07-27 12:12:40 -07:00
Konstantin Lebedev	c88ea31f62	fix RUnlock of unlocked RWMutex	2022-07-26 12:57:07 +05:00
Konstantin Lebedev	3c42814b58	avoid deadlock	2022-07-21 17:15:10 +05:00
Konstantin Lebedev	93ca87b7cb	use safe onPeerUpdateDoneCns	2022-07-21 15:51:14 +05:00
Konstantin Lebedev	7875470e74	onPeerUpdateGoroutineCount use int32	2022-07-20 18:40:50 +05:00
Konstantin Lebedev	6c390851e7	fix design	2022-07-20 18:08:12 +05:00
Konstantin Lebedev	f6a966b4fc	add waiting log message	2022-07-20 00:31:57 +05:00
Konstantin Lebedev	6cfbfb0849	check for ping before deleting raft server https://github.com/chrislusf/seaweedfs/issues/3083	2022-07-20 00:04:12 +05:00


145 Commits