* Add iceberg_maintenance plugin worker handler (Phase 1)
Implement automated Iceberg table maintenance as a new plugin worker job
type. The handler scans S3 table buckets for tables needing maintenance
and executes operations in the correct Iceberg order: expire snapshots,
remove orphan files, and rewrite manifests.
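A minimal sketch of that ordering, with illustrative operation names and wiring (not the handler's exact identifiers):
```go
package iceberg

import (
	"context"
	"fmt"
)

// Illustrative order only; the real handler may use different operation names.
var maintenanceOrder = []string{
	"expire_snapshots",  // drop expired snapshots so their files become unreferenced
	"remove_orphans",    // then delete files no surviving snapshot references
	"rewrite_manifests", // finally consolidate manifests for the remaining snapshots
}

func runMaintenance(ctx context.Context, table string,
	run func(ctx context.Context, table, op string) error) error {
	for _, op := range maintenanceOrder {
		if err := run(ctx, table, op); err != nil {
			return fmt.Errorf("%s on %s: %w", op, table, err)
		}
	}
	return nil
}
```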
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add data file compaction to iceberg maintenance handler (Phase 2)
Implement bin-packing compaction for small Parquet data files:
- Enumerate data files from manifests, group by partition
- Merge small files using parquet-go (read rows, write merged output)
- Create new manifest with ADDED/DELETED/EXISTING entries
- Commit new snapshot with compaction metadata
Add 'compact' operation to maintenance order (runs before expire_snapshots),
configurable via target_file_size_bytes and min_input_files thresholds.
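A rough sketch of the bin-packing step, assuming a simplified data-file shape (type and field names are illustrative):
```go
package iceberg

// dataFile is a simplified stand-in for a manifest data-file entry.
type dataFile struct {
	Path      string
	Partition string
	SizeBytes int64
}

// planCompactionBins groups small files by partition and greedily packs each
// group into bins of roughly targetFileSizeBytes. Bins with fewer than
// minInputFiles entries are not worth rewriting and are skipped.
func planCompactionBins(files []dataFile, targetFileSizeBytes int64, minInputFiles int) [][]dataFile {
	byPartition := map[string][]dataFile{}
	for _, f := range files {
		if f.SizeBytes < targetFileSizeBytes { // only small files are candidates
			byPartition[f.Partition] = append(byPartition[f.Partition], f)
		}
	}
	var bins [][]dataFile
	for _, group := range byPartition {
		var bin []dataFile
		var binSize int64
		for _, f := range group {
			if len(bin) > 0 && binSize+f.SizeBytes > targetFileSizeBytes {
				if len(bin) >= minInputFiles {
					bins = append(bins, bin)
				}
				bin, binSize = nil, 0
			}
			bin = append(bin, f)
			binSize += f.SizeBytes
		}
		if len(bin) >= minInputFiles {
			bins = append(bins, bin)
		}
	}
	return bins
}
```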
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix memory exhaustion in mergeParquetFiles by processing files sequentially
Previously all source Parquet files were loaded into memory simultaneously,
risking OOM when a compaction bin contained many small files. Now each file
is loaded, its rows are streamed into the output writer, and its data is
released before the next file is loaded — keeping peak memory proportional
to one input file plus the output buffer.
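A sketch of the sequential shape, using hypothetical row reader/writer interfaces rather than the actual parquet-go types:
```go
import (
	"fmt"
	"io"
)

// rowSource and rowSink are hypothetical stand-ins for the parquet-go
// reader and writer used by the real handler.
type rowSource interface {
	io.Closer
	ReadRow() (any, error) // returns io.EOF when exhausted
}
type rowSink interface {
	WriteRow(row any) error
}

// mergeSequentially streams one source at a time into the sink, so peak
// memory stays near a single open input plus the output buffer.
func mergeSequentially(open func(path string) (rowSource, error), sources []string, out rowSink) error {
	for _, src := range sources {
		rows, err := open(src)
		if err != nil {
			return fmt.Errorf("open %s: %w", src, err)
		}
		err = copyRows(rows, out)
		rows.Close() // release this file's data before opening the next
		if err != nil {
			return fmt.Errorf("merge %s: %w", src, err)
		}
	}
	return nil
}

func copyRows(rows rowSource, out rowSink) error {
	for {
		row, err := rows.ReadRow()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		if err := out.WriteRow(row); err != nil {
			return err
		}
	}
}
```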
* Validate bucket/namespace/table names against path traversal
Reject names containing '..', '/', or '\' in Execute to prevent
directory traversal via crafted job parameters.
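The check is essentially this (function and parameter names are assumptions):
```go
import (
	"fmt"
	"strings"
)

// validateName rejects job parameters that could escape the intended
// directory when joined into a filer path.
func validateName(kind, name string) error {
	if name == "" || strings.Contains(name, "..") || strings.ContainsAny(name, `/\`) {
		return fmt.Errorf("invalid %s name %q", kind, name)
	}
	return nil
}
```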
* Add filer address failover in iceberg maintenance handler
Try each filer address from cluster context in order instead of only
using the first one. This improves resilience when the primary filer
is temporarily unreachable.
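Roughly this pattern (the helper name is hypothetical):
```go
import (
	"context"
	"fmt"
)

// tryEachFiler runs op against each filer address in order and returns the
// first success, or the last error if every address fails.
func tryEachFiler(ctx context.Context, filers []string, op func(ctx context.Context, filer string) error) error {
	var lastErr error
	for _, filer := range filers {
		if err := ctx.Err(); err != nil {
			return err
		}
		if lastErr = op(ctx, filer); lastErr == nil {
			return nil
		}
	}
	if lastErr == nil {
		lastErr = fmt.Errorf("no filer addresses available")
	}
	return lastErr
}
```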
* Add separate MinManifestsToRewrite config for manifest rewrite threshold
The rewrite_manifests operation was reusing MinInputFiles (meant for
compaction bin file counts) as its manifest count threshold. Add a
dedicated MinManifestsToRewrite field with its own config UI section
and default value (5) so the two thresholds can be tuned independently.
* Fix risky mtime fallback in orphan removal that could delete new files
When entry.Attributes is nil, mtime defaulted to Unix epoch (1970),
which would always be older than the safety threshold, causing the
file to be treated as eligible for deletion. Skip entries with nil
Attributes instead, matching the safer logic in operations.go.
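The safer check, as a sketch:
```go
import "time"

// eligibleForOrphanDeletion treats entries without attributes as never
// eligible, instead of defaulting their mtime to the Unix epoch.
func eligibleForOrphanDeletion(hasAttributes bool, mtimeUnix int64, safetyWindow time.Duration) bool {
	if !hasAttributes {
		return false // cannot establish mtime; never delete on a guess
	}
	return time.Since(time.Unix(mtimeUnix, 0)) >= safetyWindow
}
```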
* Fix undefined function references in iceberg_maintenance_handler.go
Use the exported function names (ShouldSkipDetectionByInterval,
BuildDetectorActivity, BuildExecutorActivity) matching their
definitions in vacuum_handler.go.
* Remove duplicated iceberg maintenance handler in favor of iceberg/ subpackage
The IcebergMaintenanceHandler and its compaction code in the parent
pluginworker package duplicated the logic already present in the
iceberg/ subpackage (which self-registers via init()). The old code
lacked stale-plan guards, proper path normalization, CAS-based xattr
updates, and error-returning parseOperations.
Since the registry pattern (default "all") makes the old handler
unreachable, remove it entirely. All functionality is provided by
iceberg.Handler with the reviewed improvements.
* Fix MinManifestsToRewrite clamping to match UI minimum of 2
The clamp reset values below 2 to the default of 5, contradicting the
UI's advertised MinValue of 2. Clamp to 2 instead.
* Sort entries by size descending in splitOversizedBin for better packing
Entries were processed in insertion order, which is non-deterministic
because the bins are built from map iteration. Sorting largest-first
before the splitting loop improves bin-packing efficiency by filling
bins more evenly.
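Roughly:
```go
// Deterministic, largest-first ordering before the greedy split
// (entries is the bin's slice of data-file entries).
sort.Slice(entries, func(i, j int) bool {
	return entries[i].SizeBytes > entries[j].SizeBytes
})
```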
* Add context cancellation check to drainReader loop
The row-streaming loop in drainReader did not check ctx between
iterations, making long compaction merges uncancellable. Check
ctx.Done() at the top of each iteration.
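The added check is the usual non-blocking select at the top of the loop:
```go
for {
	select {
	case <-ctx.Done():
		return ctx.Err() // abort a long-running merge promptly
	default:
	}
	// ... read the next row and write it to the output ...
}
```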
* Fix splitOversizedBin to always respect targetSize limit
The minFiles check in the split condition allowed bins to grow past
targetSize when they had fewer than minFiles entries, defeating the
OOM protection. Now bins always split at targetSize, and a trailing
runt with fewer than minFiles entries is merged into the previous bin.
* Add integration tests for iceberg table maintenance plugin worker
Tests start a real weed mini cluster, create S3 buckets and Iceberg
table metadata via filer gRPC, then exercise the iceberg.Handler
operations (ExpireSnapshots, RemoveOrphans, RewriteManifests) against
the live filer. A full maintenance cycle test runs all operations in
sequence and verifies metadata consistency.
Also adds exported method wrappers (testing_api.go) so the integration
test package can call the unexported handler methods.
* Fix splitOversizedBin dropping files and add source path to drainReader errors
The runt-merge step could leave leading bins with fewer than minFiles
entries (e.g. [80,80,10,10] with targetSize=100, minFiles=2 would drop
the first 80-byte file). Replace the filter-based approach with an
iterative merge that folds any sub-minFiles bin into its smallest
neighbor, preserving all eligible files.
Also add the source file path to drainReader error messages so callers
can identify which Parquet file caused a read/write failure.
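A sketch of the merge, reusing the simplified dataFile shape from the Phase 2 sketch above:
```go
// mergeRuntBins folds any bin with fewer than minFiles entries into its
// smaller adjacent neighbor until no runt bins remain (or only one bin is
// left), so no eligible file is dropped.
func mergeRuntBins(bins [][]dataFile, minFiles int) [][]dataFile {
	size := func(bin []dataFile) int64 {
		var s int64
		for _, f := range bin {
			s += f.SizeBytes
		}
		return s
	}
	for len(bins) > 1 {
		runt := -1
		for i, bin := range bins {
			if len(bin) < minFiles {
				runt = i
				break
			}
		}
		if runt < 0 {
			break // every bin satisfies minFiles
		}
		target := runt - 1 // default: fold into the left neighbor
		if runt == 0 || (runt+1 < len(bins) && size(bins[runt+1]) < size(bins[runt-1])) {
			target = runt + 1 // fold right when it is smaller, or there is no left
		}
		bins[target] = append(bins[target], bins[runt]...)
		bins = append(bins[:runt], bins[runt+1:]...)
	}
	return bins
}
```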
* Harden integration test error handling
- s3put: fail immediately on HTTP 4xx/5xx instead of logging and
continuing
- lookupEntry: distinguish NotFound (return nil) from unexpected RPC
errors (fail the test)
- writeOrphan and orphan creation in FullMaintenanceCycle: check
CreateEntryResponse.Error in addition to the RPC error
* go fmt
---------
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: auto-disable master vacuum when plugin vacuum worker is active
When a vacuum-capable plugin worker connects to the admin server, the
admin server calls DisableVacuum on the master to prevent the automatic
scheduled vacuum from conflicting with the plugin worker's vacuum. When
the worker disconnects, EnableVacuum is called to restore the default
behavior. A safety net in the topology refresh loop re-enables vacuum
if the admin server disconnects without cleanup.
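The monitor loop is roughly this shape (names and wiring are illustrative, not the admin server's exact API):
```go
import (
	"context"
	"time"
)

// monitorVacuumWorker keeps master vacuum disabled while a vacuum-capable
// plugin worker is connected, and re-enables it when the worker goes away.
func monitorVacuumWorker(ctx context.Context, interval time.Duration,
	workerConnected func() bool,
	disableVacuum, enableVacuum func(ctx context.Context) error) {

	disabled := false
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
		switch connected := workerConnected(); {
		case connected && !disabled:
			if disableVacuum(ctx) == nil {
				disabled = true
			}
		case !connected && disabled:
			if enableVacuum(ctx) == nil {
				disabled = false
			}
		}
	}
}
```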
* rename isAdminServerConnected to isAdminServerConnectedFunc
* add 5s timeout to DisableVacuum/EnableVacuum gRPC calls
Prevents the monitor goroutine from blocking indefinitely if the
master is unresponsive.
* track plugin ownership of vacuum disable to avoid overriding operator
- Add vacuumDisabledByPlugin flag to Topology, set when DisableVacuum
is called while admin server is connected (i.e., by plugin monitor)
- Safety net only re-enables vacuum when it was disabled by plugin,
not when an operator intentionally disabled it via shell command
- EnableVacuum clears the plugin flag
* extract syncVacuumState for testability, add fake toggler tests
Extract the single sync step into syncVacuumState() with a
vacuumToggler interface. Add TestSyncVacuumState with a fake
toggler that verifies disable/enable calls on state transitions.
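Sketch of the seam (the real signature differs, e.g. it later gains a retrying parameter):
```go
import "context"

// vacuumToggler abstracts the master RPCs so syncVacuumState can be unit
// tested against a fake.
type vacuumToggler interface {
	DisableVacuum(ctx context.Context) error
	EnableVacuum(ctx context.Context) error
}

// syncVacuumState performs one reconciliation step between the desired state
// (worker connected => vacuum disabled) and the last applied state.
func syncVacuumState(ctx context.Context, t vacuumToggler, workerConnected, disabled bool) (bool, error) {
	switch {
	case workerConnected && !disabled:
		if err := t.DisableVacuum(ctx); err != nil {
			return disabled, err
		}
		return true, nil
	case !workerConnected && disabled:
		if err := t.EnableVacuum(ctx); err != nil {
			return disabled, err
		}
		return false, nil
	default:
		return disabled, nil
	}
}
```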
* use atomic.Bool for isDisableVacuum and vacuumDisabledByPlugin
Both fields are written by gRPC handlers and read by the vacuum
goroutine, causing a data race. Use atomic.Bool with Store/Load
for thread-safe access.
* use explicit by_plugin field instead of connection heuristic
Add by_plugin bool to DisableVacuumRequest proto so the caller
declares intent explicitly. The admin server monitor sets it to
true; shell commands leave it false. This prevents an operator's
intentional disable from being auto-reversed by the safety net.
* use setter for admin server callback instead of function parameter
Move isAdminServerConnected from StartRefreshWritableVolumes
parameter to Topology.SetAdminServerConnectedFunc() setter.
Keeps the function signature stable and decouples the topology
layer from the admin server concept.
* suppress repeated log messages on persistent sync failures
Add retrying parameter to syncVacuumState so the initial
state transition is logged at V(0) but subsequent retries
of the same transition are silent until the call succeeds.
* clear plugin ownership flag on manual DisableVacuum
Prevents stale plugin flag from causing incorrect auto-enable
when an operator manually disables vacuum after a plugin had
previously disabled it.
* add by_plugin to EnableVacuumRequest for symmetric ownership tracking
Plugin-driven EnableVacuum now only re-enables if the plugin was
the one that disabled it. If an operator manually disabled vacuum
after the plugin, the plugin's EnableVacuum is a no-op. This
prevents the plugin monitor from overriding operator intent on
worker disconnect.
* use cancellable context for monitorVacuumWorker goroutine
Replace context.Background() with a cancellable context stored
as bgCancel on AdminServer. Shutdown() calls bgCancel() so
monitorVacuumWorker exits cleanly via ctx.Done().
* track operator and plugin vacuum disables independently
Replace single isDisableVacuum flag with two independent flags:
vacuumDisabledByOperator and vacuumDisabledByPlugin. Each caller
only flips its own flag. The effective disabled state is the OR
of both. This prevents a plugin connect/disconnect cycle from
overriding an operator's manual disable, and vice versa.
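Effectively:
```go
import "sync/atomic"

type vacuumState struct {
	disabledByOperator atomic.Bool // flipped only by shell commands
	disabledByPlugin   atomic.Bool // flipped only by the plugin worker monitor
}

// Disabled is the effective state: vacuum is off if either owner disabled it.
func (s *vacuumState) Disabled() bool {
	return s.disabledByOperator.Load() || s.disabledByPlugin.Load()
}
```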
* fix safety net to clear plugin flag, not operator flag
The safety net should call EnableVacuumByPlugin() to clear only
the plugin disable flag when the admin server disconnects. The
previous call to EnableVacuum() incorrectly cleared the operator
flag instead.
* master: return 503/Unavailable during topology warmup after leader change
After a master restart or leader change, the topology is empty until
volume servers reconnect and send heartbeats. During this warmup window
(3 heartbeat intervals = 15 seconds), volume lookups that fail now
return 503 Service Unavailable (HTTP) or gRPC Unavailable instead of
404 Not Found, signaling clients to retry with other masters.
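On the gRPC side the gate looks roughly like this (IsWarmingUp and the exact wiring are assumptions):
```go
// uses google.golang.org/grpc/codes and google.golang.org/grpc/status
if notFound && topo.IsWarmingUp() {
	return nil, status.Error(codes.Unavailable, "master topology is warming up, please retry")
}
```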
* master: skip warmup 503 on fresh start and single-master setups
- Check MaxVolumeId > 0 to distinguish restart from fresh start
(MaxVolumeId is Raft-persisted, so 0 means no prior data)
- Check peer count > 1 so single-master deployments aren't affected
(no point suggesting "retry with other masters" if there are none)
* master: address review feedback and block assigns during warmup
- Protect LastLeaderChangeTime with dedicated mutex (fix data race)
- Extract warmup multiplier as WarmupPulseMultiplier constant
- Derive Retry-After header from pulse config instead of hardcoding
- Only trigger warmup 503 for "not found" errors, not parse errors
- Return nil response (not partial) on gRPC Unavailable
- Add doc comments to IsWarmingUp, getter/setter, WarmupDuration
- Block volume assign requests (HTTP and gRPC) during warmup,
since the topology is incomplete and assignments would be unreliable
- Skip warmup behavior for single-master setups (no peers to retry)
* master: apply warmup to all setups, skip only on fresh start
Single-master restarts still have an empty topology until heartbeats
arrive, so warmup protection should apply there too. The only case
to skip is a fresh cluster start (MaxVolumeId == 0), which already
has no volumes to look up.
- Remove GetMasterCount() > 1 guard from all warmup checks
- Remove now-unused GetMasterCount helper
- Update error messages to "topology is still loading" (not
"retry with other masters" which doesn't apply to single-master)
* master: add client-side retry on Unavailable for lookup and assign
The server-side 503/Unavailable during warmup needs client cooperation.
Previously, LookupVolumeIds and Assign would immediately propagate the
error without retry.
Now both paths retry with exponential backoff (1s -> 1.5s -> ... up to
6s) when receiving Unavailable, respecting context cancellation. This
covers the warmup window where the master's topology is still loading
after a restart or leader change.
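The client-side retry shape, as a sketch (later commits in this series extract it into a shared helper):
```go
import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// retryOnUnavailable retries op with exponential backoff while the master
// reports codes.Unavailable, e.g. during its warmup window.
func retryOnUnavailable(ctx context.Context, op func() error) error {
	wait := time.Second
	const maxWait = 6 * time.Second
	for {
		err := op()
		if err == nil || status.Code(err) != codes.Unavailable {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(wait):
		}
		if wait = wait * 3 / 2; wait > maxWait {
			wait = maxWait
		}
	}
}
```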
* master: seed warmup timestamp in legacy raft path at setup
The legacy raft path only set lastLeaderChangeTime inside the event
listener callback, which could fire after IsLeader() was already
observed as true in SetRaftServer. Seed the timestamp at setup time
(matching the hashicorp path) so IsWarmingUp() is active immediately.
* master: fix assign retry loop to cover full warmup window
The retry loop used waitTime <= maxWaitTime as a stop condition,
causing it to give up after ~13s while warmup lasts 15s. Now cap
each individual sleep at maxWaitTime but keep retrying until the
context is cancelled.
* master: preserve gRPC status in lookup retry and fix retry window
Return the raw gRPC error instead of wrapping with fmt.Errorf so
status.FromError() can extract the status code. Use proper gRPC
status check (codes.Unavailable) instead of string matching. Also
cap individual sleep at maxWaitTime while retrying until ctx is done.
* master: use gRPC status code instead of string matching in assign retry
Use status.FromError/codes.Unavailable instead of brittle
strings.Contains for detecting retriable gRPC errors in the
assign retry loop.
* master: use remaining warmup duration for Retry-After header
Set Retry-After to the remaining warmup time instead of the full
warmup duration, so clients don't wait longer than necessary.
* master: reset ret.Replicas before populating from assign response
Clear Replicas slice before appending to prevent duplicate entries
when the assign response is retried or when alternative requests
are attempted.
* master: add unit tests for warmup retry behavior
Test that Assign() and LookupVolumeIds() retry on codes.Unavailable
and stop promptly when the context is cancelled.
* master: record leader change time before initialization work
Move SetLastLeaderChangeTime() to fire immediately when the leader
change event is received, before DoBarrier(), EnsureTopologyId(),
and updatePeers(), so the warmup clock starts at the true moment
of leadership transition.
* master: use topology warmup duration in volume growth wait loop
Replace hardcoded constants.VolumePulsePeriod * 2 with
topo.IsWarmingUp() and topo.WarmupDuration() so the growth wait
stays in sync with the configured warmup window. Remove unused
constants import.
* master: resolve master before creating RPC timeout context
Move GetMaster() call before context.WithTimeout() so master
resolution blocking doesn't consume the gRPC call timeout.
* master: use NotFound flag instead of string matching for volume lookup
Add a NotFound field to LookupResult and set it in findVolumeLocation
when a volume is genuinely missing. Update HTTP and gRPC warmup
checks to use this flag instead of strings.Contains on the error
message.
* master: bound assign retry loop to 30s for deadline-free contexts
Without a context deadline, the Unavailable retry loop could spin
forever. Add a maxRetryDuration of 30s so the loop gives up even
when no context deadline is set.
* master: strengthen assign retry cancellation test
Verify the retry loop actually retried (callCount > 1) and that
the returned error is context.DeadlineExceeded, not just any error.
* master: extract shared retry-with-backoff utility
Add util.RetryWithBackoff for context-aware, bounded retry with
exponential backoff. Refactor both Assign() and LookupVolumeIds()
to use it instead of duplicating the retry/sleep/backoff logic.
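A plausible shape for the helper after the follow-up fixes below; the actual signature in util is an assumption:
```go
package util

import (
	"context"
	"time"
)

// RetryWithBackoff retries operation with exponential backoff until it
// succeeds, reports a non-retryable error, the context ends, or maxDuration
// elapses.
func RetryWithBackoff(ctx context.Context, maxDuration, initialWait, maxWait time.Duration,
	operation func() (retryable bool, err error)) error {

	deadline := time.Now().Add(maxDuration)
	wait := initialWait
	var lastErr error
	for {
		if err := ctx.Err(); err != nil {
			return err // the cancellation reason always wins
		}
		if time.Now().After(deadline) {
			return lastErr
		}
		retryable, err := operation()
		if err == nil || !retryable {
			return err
		}
		lastErr = err
		sleep := wait
		if remaining := time.Until(deadline); sleep > remaining {
			sleep = remaining // never overshoot the retry budget
		}
		timer := time.NewTimer(sleep)
		select {
		case <-ctx.Done():
			timer.Stop()
			return ctx.Err()
		case <-timer.C:
		}
		if wait = wait * 3 / 2; wait > maxWait {
			wait = maxWait
		}
	}
}
```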
* master: cap waitTime in RetryWithBackoff to prevent unbounded growth
Cap the backoff waitTime at maxWaitTime so it doesn't grow
indefinitely in long-running retry scenarios.
* master: only return Unavailable during warmup when all lookups failed
For batched LookupVolume requests, return partial results when some
volumes are found. Only return codes.Unavailable when no volumes
were successfully resolved, so clients benefit from partial results
instead of retrying unnecessarily.
* master: set retriable error message in 503 response body
When returning 503 during warmup, replace the "not found" error
in the JSON body with "service warming up, please retry" so
clients don't treat it as a permanent error.
* master: guard empty master address in LookupVolumeIds
If GetMaster() returns empty (no master found or ctx cancelled),
return an appropriate error instead of dialing an empty address.
Returns ctx.Err() if context is done, otherwise codes.Unavailable
to trigger retry.
* master: add comprehensive tests for RetryWithBackoff
Test success after retries, non-retryable error handling, context
cancellation, and maxDuration cap with context.Background().
* master: enforce hard maxDuration bound in RetryWithBackoff
Use a deadline instead of elapsed-time check so the last sleep is
capped to remaining time. This prevents the total retry duration
from overshooting maxDuration by up to one full backoff interval.
* master: respect fresh-start bypass in RemainingWarmupDuration
Check IsWarmingUp() first (which returns false when MaxVolumeId==0)
so RemainingWarmupDuration returns 0 on fresh clusters.
* master: round up Retry-After seconds to avoid underestimating
Use math.Ceil so fractional remaining seconds (e.g. 1.9s) round
up to the next integer (2) instead of flooring down (1).
* master: tighten batch lookup warmup to all-NotFound only
Only return codes.Unavailable when every requested volume ID was
a transient not-found. Mixed cases with non-NotFound errors now
return the response with per-volume error details preserved.
* master: reduce retry log noise and fix timer leak
Lower per-attempt retry log from V(0) to V(1) to reduce noise
during warmup. Replace time.After with time.NewTimer to avoid
lingering timers when context is cancelled.
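The difference in isolation:
```go
// Before: time.After's timer keeps running until waitTime elapses even if
// we return on ctx.Done(), so cancelled retries leave timers behind.
// After: an explicit timer can be stopped on the cancellation path.
timer := time.NewTimer(waitTime)
select {
case <-ctx.Done():
	timer.Stop()
	return ctx.Err()
case <-timer.C:
}
```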
* master: add per-attempt timeout for assign RPC
Use a 10s per-attempt timeout so a single slow RPC can't consume
the entire 30s retry budget when ctx has no deadline.
* master: share single 30s retry deadline across assign request entries
The Assign() function iterates over primary and fallback requests,
previously giving each its own 30s RetryWithBackoff budget. With a
primary + fallback, the total could reach 60s. Compute one deadline
up front and pass the remaining budget to each RetryWithBackoff call
so the entire Assign() call stays within a single 30s cap.
* master: strengthen context-cancel test with DeadlineExceeded and retry assertions
Assert errors.Is(err, context.DeadlineExceeded) to verify the error
is specifically from the context deadline, and check callCount > 1
to prove retries actually occurred before cancellation. Mirrors the
pattern used in TestAssignStopsOnContextCancel.
* master: bound GetMaster with per-attempt timeout in LookupVolumeIds
GetMaster() calls WaitUntilConnected() which can block indefinitely
if no master is available. Previously it used the outer ctx, so a
slow master resolution could consume the entire RetryWithBackoff
budget in a single attempt. Move the per-attempt timeoutCtx creation
before the GetMaster call so both master resolution and the gRPC
LookupVolume RPC share one grpcTimeout-bounded attempt.
* master: use deadline-aware context for assign retry budget
The shared 30s deadline only limited RetryWithBackoff's internal
wall-clock tracking, but per-attempt contexts were still derived
from the original ctx and could run for up to 10s even when the
budget was nearly exhausted. Create a deadlineCtx from the computed
deadline and derive both RetryWithBackoff and per-attempt timeouts
from it so all operations honor the shared 30s cap.
* master: skip warmup gate for empty lookup requests
When VolumeOrFileIds is empty, notFoundCount == len(req.VolumeOrFileIds)
evaluates to 0 == 0, which is true, so empty lookup batches during warmup
returned codes.Unavailable and were retried endlessly. Add a
len(req.VolumeOrFileIds) > 0 guard so empty requests pass through.
* master: validate request fields before warmup gate in Assign
Move Replication and Ttl parsing before the IsWarmingUp() check so
invalid inputs get a proper validation error instead of being masked
by codes.Unavailable during warmup. Pure syntactic validation does
not depend on topology state and should run first.
* master: check deadline and context before starting retry attempt
RetryWithBackoff only checked the deadline and context after an
attempt completed or during the sleep select. If the deadline
expired or context was canceled during sleep, the next iteration
would still call operation() before detecting it. Add pre-operation
checks so no new attempt starts after the budget is exhausted.
* master: always return ctx.Err() on context cancellation in RetryWithBackoff
When ctx.Err() is non-nil, the pre-operation check was returning
lastErr instead of ctx.Err(). This broke callers checking
errors.Is(err, context.DeadlineExceeded) and contradicted the
documented contract. Always return ctx.Err() so the cancellation
reason is properly surfaced.
* master: handle warmup errors in StreamAssign without killing the stream
StreamAssign was returning codes.Unavailable errors from Assign
directly, which terminates the gRPC stream and breaks pooled
connections. Instead, return transient errors as in-band error
responses so the stream survives warmup periods.
Also reset assignClient in doAssign on Send/Recv failures so a
broken stream doesn't leave the proxy permanently dead.
* master: wait for warmup before slot search in findAndGrow
findEmptySlotsForOneVolume was called before the warmup wait loop,
selecting slots from an incomplete topology. Move the warmup wait
before slot search so volume placement uses the fully warmed-up
topology with all servers registered.
* master: add Retry-After header to /dir/assign warmup response
The /dir/lookup handler already sets Retry-After during warmup but
/dir/assign did not, leaving HTTP clients without guidance on when
to retry. Add the same header using RemainingWarmupDuration().
* master: only seed warmup timestamp on leader at startup
SetLastLeaderChangeTime was called unconditionally for both leader
and follower nodes. Followers don't need warmup state, and the
leader change event listener handles real elections. Move the seed
into the IsLeader() block so only the startup leader gets warmup
initialized.
* master: preserve codes.Unavailable for StreamAssign warmup errors in doAssign
StreamAssign returns transient warmup errors as in-band
AssignResponse.Error messages. doAssign was converting these to plain
fmt.Errorf, losing the codes.Unavailable classification needed for
the caller's retry logic. Detect warmup error messages and wrap them
as status.Error(codes.Unavailable) so RetryWithBackoff can retry.
* Fix master leader election startup issue
Fixes #error-log-leader-not-selected-yet
* Fix master leader election startup issue
This change improves server address comparison using the 'Equals' method and handles recursion in topology leader lookup, resolving the 'leader not selected yet' error during master startup.
* Merge user improvements: use MaybeLeader for non-blocking checks
* not useful test
* Address code review: optimize Equals, fix deadlock in IsLeader, safe access in Leader
* Prevent split-brain: Persistent ClusterID and Join Validation
- Persist ClusterId in Raft store to survive restarts.
- Validate ClusterId on Raft command application (piggybacked on MaxVolumeId).
- Prevent masters with conflicting ClusterIds from joining/operating together.
- Update Telemetry to report the persistent ClusterId.
* Refine ClusterID validation based on feedback
- Improved error message in cluster_commands.go.
- Added ClusterId mismatch check in RaftServer.Recovery.
* Handle Raft errors and support Hashicorp Raft for ClusterId
- Check for errors when persisting ClusterId in legacy Raft.
- Implement ClusterId generation and persistence for Hashicorp Raft leader changes.
- Ensure consistent error logging.
* Refactor ClusterId validation
- Centralize ClusterId mismatch check in Topology.SetClusterId.
- Simplify MaxVolumeIdCommand.Apply and RaftServer.Recovery to rely on SetClusterId.
* Fix goroutine leak and add timeout
- Handle channel closure in Hashicorp Raft leader listener.
- Add timeout to Raft Apply call to prevent blocking.
* Fix deadlock in legacy Raft listener
- Wrap ClusterId generation/persistence in a goroutine to avoid blocking the Raft event loop (deadlock).
* Rename ClusterId to SystemId
- Renamed ClusterId to SystemId across the codebase (protobuf, topology, server, telemetry).
- Regenerated telemetry.pb.go with new field.
* Rename SystemId to TopologyId
- The rename to SystemId was an intermediate step.
- Final name is TopologyId for the persistent cluster identifier.
- Updated protobuf, topology, raft server, master server, and telemetry.
* Optimize Hashicorp Raft listener
- Integrated TopologyId generation into existing monitorLeaderLoop.
- Removed extra goroutine in master_server.go.
* Fix optimistic TopologyId update
- Removed premature local state update of TopologyId in master_server.go and raft_hashicorp.go.
- State is now solely updated via the Raft state machine Apply/Restore methods after consensus.
* Add explicit log for recovered TopologyId
- Added glog.V(0) info log in RaftServer.Recovery to print the recovered TopologyId on startup.
* Add Raft barrier to prevent TopologyId race condition
- Implement ensureTopologyId helper method
- Send no-op MaxVolumeIdCommand to sync Raft log before checking TopologyId
- Ensures persisted TopologyId is recovered before generating new one
- Prevents race where generation happens during log replay
* Serialize TopologyId generation with mutex
- Add topologyIdGenLock mutex to MasterServer struct
- Wrap ensureTopologyId method with lock to prevent concurrent generation
- Fixes race where event listener and manual leadership check both generate IDs
- Second caller waits for first to complete and sees the generated ID
* Add TopologyId recovery logging to Apply method
- Change log level from V(1) to V(0) for visibility
- Log 'Recovered TopologyId' when applying from Raft log
- Ensures recovery is visible whether from snapshot or log replay
- Matches Recovery() method logging for consistency
* Fix Raft barrier timing issue
- Add 100ms delay after barrier command to ensure log application completes
- Add debug logging to track barrier execution and TopologyId state
- Return early if barrier command fails
- Prevents TopologyId generation before old logs are fully applied
* ensure leader
* address comments
* address comments
* redundant
* clean up
* double check
* refactoring
* comment
`topology.Leader()` was using a backoff that typically
resulted in at least a 5s delay when initially starting
a master and raft server. This changes the backoff
algorithm to use exponential backoff starting with 100ms
and waiting up to 20s for leader selection.
Related to #4307
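The new wait is roughly this (a sketch, not the exact topology code):
```go
import (
	"fmt"
	"time"
)

// waitForLeader polls for an elected leader with exponential backoff,
// starting at 100ms and giving up after about 20s.
func waitForLeader(currentLeader func() (string, bool)) (string, error) {
	wait := 100 * time.Millisecond
	deadline := time.Now().Add(20 * time.Second)
	for time.Now().Before(deadline) {
		if leader, ok := currentLeader(); ok {
			return leader, nil
		}
		time.Sleep(wait)
		if wait *= 2; wait > 2*time.Second {
			wait = 2 * time.Second
		}
	}
	return "", fmt.Errorf("leader not selected yet")
}
```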
1. For a few seconds during master server startup there will be two leaders.
2. The raft server will elect a leader even when there is only one master, so there is no need to hard-code the server as the leader.