seaweedFS

Author	SHA1	Message	Date
Chris Lu	2eaf98a7a2	Use Unix sockets for gRPC in mini mode (#8856 ) * Use Unix sockets for gRPC between co-located services in mini mode In `weed mini`, all services run in one process. Previously, inter-service gRPC traffic (volume↔master, filer↔master, S3↔filer, worker↔admin, etc.) went through TCP loopback. This adds a gRPC Unix socket registry in the pb package: mini mode registers a socket path per gRPC port at startup, each gRPC server additionally listens on its socket, and GrpcDial transparently routes to the socket via WithContextDialer when a match is found. Standalone commands (weed master, weed filer, etc.) are unaffected since no sockets are registered. TCP listeners are kept for external clients. * Handle Serve error and clean up socket file in ServeGrpcOnLocalSocket Log non-expected errors from grpcServer.Serve (ignoring grpc.ErrServerStopped) and always remove the Unix socket file when Serve returns, ensuring cleanup on Stop/GracefulStop.	2026-03-30 18:18:52 -07:00
Chris Lu	8cde3d4486	Add data file compaction to iceberg maintenance (Phase 2) (#8503 ) * Add iceberg_maintenance plugin worker handler (Phase 1) Implement automated Iceberg table maintenance as a new plugin worker job type. The handler scans S3 table buckets for tables needing maintenance and executes operations in the correct Iceberg order: expire snapshots, remove orphan files, and rewrite manifests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add data file compaction to iceberg maintenance handler (Phase 2) Implement bin-packing compaction for small Parquet data files: - Enumerate data files from manifests, group by partition - Merge small files using parquet-go (read rows, write merged output) - Create new manifest with ADDED/DELETED/EXISTING entries - Commit new snapshot with compaction metadata Add 'compact' operation to maintenance order (runs before expire_snapshots), configurable via target_file_size_bytes and min_input_files thresholds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix memory exhaustion in mergeParquetFiles by processing files sequentially Previously all source Parquet files were loaded into memory simultaneously, risking OOM when a compaction bin contained many small files. Now each file is loaded, its rows are streamed into the output writer, and its data is released before the next file is loaded — keeping peak memory proportional to one input file plus the output buffer. * Validate bucket/namespace/table names against path traversal Reject names containing '..', '/', or '\' in Execute to prevent directory traversal via crafted job parameters. * Add filer address failover in iceberg maintenance handler Try each filer address from cluster context in order instead of only using the first one. This improves resilience when the primary filer is temporarily unreachable. * Add separate MinManifestsToRewrite config for manifest rewrite threshold The rewrite_manifests operation was reusing MinInputFiles (meant for compaction bin file counts) as its manifest count threshold. Add a dedicated MinManifestsToRewrite field with its own config UI section and default value (5) so the two thresholds can be tuned independently. * Fix risky mtime fallback in orphan removal that could delete new files When entry.Attributes is nil, mtime defaulted to Unix epoch (1970), which would always be older than the safety threshold, causing the file to be treated as eligible for deletion. Skip entries with nil Attributes instead, matching the safer logic in operations.go. * Fix undefined function references in iceberg_maintenance_handler.go Use the exported function names (ShouldSkipDetectionByInterval, BuildDetectorActivity, BuildExecutorActivity) matching their definitions in vacuum_handler.go. * Remove duplicated iceberg maintenance handler in favor of iceberg/ subpackage The IcebergMaintenanceHandler and its compaction code in the parent pluginworker package duplicated the logic already present in the iceberg/ subpackage (which self-registers via init()). The old code lacked stale-plan guards, proper path normalization, CAS-based xattr updates, and error-returning parseOperations. Since the registry pattern (default "all") makes the old handler unreachable, remove it entirely. All functionality is provided by iceberg.Handler with the reviewed improvements. * Fix MinManifestsToRewrite clamping to match UI minimum of 2 The clamp reset values below 2 to the default of 5, contradicting the UI's advertised MinValue of 2. Clamp to 2 instead. * Sort entries by size descending in splitOversizedBin for better packing Entries were processed in insertion order which is non-deterministic from map iteration. Sorting largest-first before the splitting loop improves bin packing efficiency by filling bins more evenly. * Add context cancellation check to drainReader loop The row-streaming loop in drainReader did not check ctx between iterations, making long compaction merges uncancellable. Check ctx.Done() at the top of each iteration. * Fix splitOversizedBin to always respect targetSize limit The minFiles check in the split condition allowed bins to grow past targetSize when they had fewer than minFiles entries, defeating the OOM protection. Now bins always split at targetSize, and a trailing runt with fewer than minFiles entries is merged into the previous bin. * Add integration tests for iceberg table maintenance plugin worker Tests start a real weed mini cluster, create S3 buckets and Iceberg table metadata via filer gRPC, then exercise the iceberg.Handler operations (ExpireSnapshots, RemoveOrphans, RewriteManifests) against the live filer. A full maintenance cycle test runs all operations in sequence and verifies metadata consistency. Also adds exported method wrappers (testing_api.go) so the integration test package can call the unexported handler methods. * Fix splitOversizedBin dropping files and add source path to drainReader errors The runt-merge step could leave leading bins with fewer than minFiles entries (e.g. [80,80,10,10] with targetSize=100, minFiles=2 would drop the first 80-byte file). Replace the filter-based approach with an iterative merge that folds any sub-minFiles bin into its smallest neighbor, preserving all eligible files. Also add the source file path to drainReader error messages so callers can identify which Parquet file caused a read/write failure. * Harden integration test error handling - s3put: fail immediately on HTTP 4xx/5xx instead of logging and continuing - lookupEntry: distinguish NotFound (return nil) from unexpected RPC errors (fail the test) - writeOrphan and orphan creation in FullMaintenanceCycle: check CreateEntryResponse.Error in addition to the RPC error * go fmt --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 11:27:42 -07:00
Chris Lu	f5c35240be	Add volume dir tags and EC placement priority (#8472 ) * Add volume dir tags to topology Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add preferred tag config for EC Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Prioritize EC destinations by tags Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add EC placement planner tag tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Refactor EC placement tests to reuse buildActiveTopology Remove buildActiveTopologyWithDiskTags helper function and consolidate tag setup inline in test cases. Tests now use UpdateTopology to apply tags after topology creation, reusing the existing buildActiveTopology function rather than duplicating its logic. All tag scenario tests pass: - TestECPlacementPlannerPrefersTaggedDisks - TestECPlacementPlannerFallsBackWhenTagsInsufficient Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Consolidate normalizeTagList into shared util package Extract normalizeTagList from three locations (volume.go, detection.go, erasure_coding_handler.go) into new weed/util/tag.go as exported NormalizeTagList function. Replace all duplicate implementations with imports and calls to util.NormalizeTagList. This improves code reuse and maintainability by centralizing tag normalization logic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add PreferredTags to EC config persistence Add preferred_tags field to ErasureCodingTaskConfig protobuf with field number 5. Update GetConfigSpec to include preferred_tags field in the UI configuration schema. Add PreferredTags to ToTaskPolicy to serialize config to protobuf. Add PreferredTags to FromTaskPolicy to deserialize from protobuf with defensive copy to prevent external mutation. This allows EC preferred tags to be persisted and restored across worker restarts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add defensive copy for Tags slice in DiskLocation Copy the incoming tags slice in NewDiskLocation instead of storing by reference. This prevents external callers from mutating the DiskLocation.Tags slice after construction, improving encapsulation and preventing unexpected changes to disk metadata. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add doc comment to buildCandidateSets method Document the tiered candidate selection and fallback behavior. Explain that for a planner with preferredTags, it accumulates disks matching each tag in order into progressively larger tiers, emits a candidate set once a tier reaches shardsNeeded, and finally falls back to the full candidates set if preferred-tag tiers are insufficient. This clarifies the intended semantics for future maintainers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Apply final PR review fixes 1. Update parseVolumeTags to replicate single tag entry to all folders instead of leaving some folders with nil tags. This prevents nil pointer dereferences when processing folders without explicit tags. 2. Add defensive copy in ToTaskPolicy for PreferredTags slice to match the pattern used in FromTaskPolicy, preventing external mutation of the returned TaskPolicy. 3. Add clarifying comment in buildCandidateSets explaining that the shardsNeeded <= 0 branch is a defensive check for direct callers, since selectDestinations guarantees shardsNeeded > 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix nil pointer dereference in parseVolumeTags Ensure all folder tags are initialized to either normalized tags or empty slices, not nil. When multiple tag entries are provided and there are more folders than entries, remaining folders now get empty slices instead of nil, preventing nil pointer dereference in downstream code. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix NormalizeTagList to return empty slice instead of nil Change NormalizeTagList to always return a non-nil slice. When all tags are empty or whitespace after normalization, return an empty slice instead of nil. This prevents nil pointer dereferences in downstream code that expects a valid (possibly empty) slice. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add nil safety check for v.tags pointer Add a safety check to handle the case where v.tags might be nil, preventing a nil pointer dereference. If v.tags is nil, use an empty string instead. This is defensive programming to prevent panics in edge cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add volume.tags flag to weed server and weed mini commands Add the volume.tags CLI option to both the 'weed server' and 'weed mini' commands. This allows users to specify disk tags when running the combined server modes, just like they can with 'weed volume'. The flag uses the same format and description as the volume command: comma-separated tag groups per data dir with ':' separators (e.g. fast:ssd,archive). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-01 10:22:00 -08:00
Chris Lu	c106532b79	fix: prevent MiniClusterCtx race conditions in command shutdown Capture global MiniClusterCtx into local variables before goroutine/select evaluation to prevent nil dereference/data race when context is reset to nil after nil check. Applied to filer, master, volume, and s3 commands.	2026-01-28 19:42:16 -08:00
Chris Lu	01c17478ae	command: implement graceful shutdown for mini cluster - Introduce MiniClusterCtx to coordinate shutdown across mini services - Update Master, Volume, Filer, S3, and WebDAV servers to respect context cancellation - Ensure all resources are cleaned up properly during test teardown - Integrate MiniClusterCtx in s3tables integration tests	2026-01-28 10:36:19 -08:00
Chris Lu	ed1da07665	Add consistent -debug and -debug.port flags to commands (#7816 ) * Add consistent -debug and -debug.port flags to commands Add -debug and -debug.port flags to weed master, weed volume, weed s3, weed mq.broker, and weed filer.sync commands for consistency with weed filer. When -debug is enabled, an HTTP server starts on the specified port (default 6060) serving runtime profiling data at /debug/pprof/. For mq.broker, replaced the older -port.pprof flag with the new -debug and -debug.port pattern for consistency. * Update weed/util/grace/pprof.go Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-18 17:44:36 -08:00
Chris Lu	2d06ddab41	Remove default concurrent upload/download limits for best performance (#7712 ) Change all concurrentUploadLimitMB and concurrentDownloadLimitMB defaults from fixed values (64, 128, 256 MB) to 0 (unlimited). This removes artificial throttling that can limit throughput on high-performance systems, especially on all-flash setups with many cores. Files changed: - volume.go: concurrentUploadLimitMB 256->0, concurrentDownloadLimitMB 256->0 - server.go: filer/volume/s3 concurrent limits 64/128->0 - s3.go: concurrentUploadLimitMB 128->0 - filer.go: concurrentUploadLimitMB 128->0, s3.concurrentUploadLimitMB 128->0 Users can still set explicit limits if needed for resource management.	2025-12-10 14:12:51 -08:00
msementsov	c0dad091f1	Separate vacuum speed from replication speed (#7632 )	2025-12-05 12:24:38 -08:00
Chris Lu	5ed0b00fb9	Support separate volume server ID independent of RPC bind address (#7609 ) * pb: add id field to Heartbeat message for stable volume server identification This adds an 'id' field to the Heartbeat protobuf message that allows volume servers to identify themselves independently of their IP:port address. Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * storage: add Id field to Store struct Add Id field to Store struct and include it in CollectHeartbeat(). The Id field provides a stable volume server identity independent of IP:port. Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * topology: support id-based DataNode identification Update GetOrCreateDataNode to accept an id parameter for stable node identification. When id is provided, the DataNode can maintain its identity even when its IP address changes (e.g., in Kubernetes pod reschedules). For backward compatibility: - If id is provided, use it as the node ID - If id is empty, fall back to ip:port Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * volume: add -id flag for stable volume server identity Add -id command line flag to volume server that allows specifying a stable identifier independent of the IP address. This is useful for Kubernetes deployments with hostPath volumes where pods can be rescheduled to different nodes while the persisted data remains on the original node. Usage: weed volume -id=node-1 -ip=10.0.0.1 ... If -id is not specified, it defaults to ip:port for backward compatibility. Fixes https://github.com/seaweedfs/seaweedfs/issues/7487 * server: add -volume.id flag to weed server command Support the -volume.id flag in the all-in-one 'weed server' command, consistent with the standalone 'weed volume' command. Usage: weed server -volume.id=node-1 ... Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * topology: add test for id-based DataNode identification Test the key scenarios: 1. Create DataNode with explicit id 2. Same id with different IP returns same DataNode (K8s reschedule) 3. IP/PublicUrl are updated when node reconnects with new address 4. Different id creates new DataNode 5. Empty id falls back to ip:port (backward compatibility) Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * pb: add address field to DataNodeInfo for proper node addressing Previously, DataNodeInfo.Id was used as the node address, which worked when Id was always ip:port. Now that Id can be an explicit string, we need a separate Address field for connection purposes. Changes: - Add 'address' field to DataNodeInfo protobuf message - Update ToDataNodeInfo() to populate the address field - Update NewServerAddressFromDataNode() to use Address (with Id fallback) - Fix LookupEcVolume to use dn.Url() instead of dn.Id() Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * fix: trim whitespace from volume server id and fix test - Trim whitespace from -id flag to treat ' ' as empty - Fix store_load_balancing_test.go to include id parameter in NewStore call Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * refactor: extract GetVolumeServerId to util package Move the volume server ID determination logic to a shared utility function to avoid code duplication between volume.go and rack.go. Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * fix: improve transition logic for legacy nodes - Use exact ip:port match instead of net.SplitHostPort heuristic - Update GrpcPort and PublicUrl during transition for consistency - Remove unused net import Ref: https://github.com/seaweedfs/seaweedfs/issues/7487 * fix: add id normalization and address change logging - Normalize id parameter at function boundary (trim whitespace) - Log when DataNode IP:Port changes (helps debug K8s pod rescheduling) Ref: https://github.com/seaweedfs/seaweedfs/issues/7487	2025-12-02 22:08:11 -08:00
chrislu	6201cd099e	fix help messages	2025-11-10 17:31:11 -08:00
Lisandro Pin	f466ff1412	Nit: use `time.Duration`s instead of constants in seconds. (#7438 ) Nit: use `time.Durations` instead of constants in seconds. Makes for slightly more readable code.	2025-11-04 13:02:22 -08:00
Chris Lu	5ab49e2971	Adjust cli option (#7418 ) * adjust "weed benchmark" CLI to use readOnly/writeOnly * consistently use "-master" CLI option * If both -readOnly and -writeOnly are specified, the current logic silently allows it with -writeOnly taking precedence. This is confusing and could lead to unexpected behavior.	2025-10-31 17:08:00 -07:00
zuzuviewer	8fa1a69f8c	* Fix undefined http serve behaiver (#6943 )	2025-07-07 22:48:12 -07:00
Konstantin Lebedev	93007c1842	[volume] refactor and add metrics for flight upload and download data limit condition (#6920 ) * refactor concurrentDownloadLimit * fix loop * fix cmdServer * fix: resolve conversation pr 6920 * Changes logging function (#6919) * updated logging methods for stores * updated logging methods for stores * updated logging methods for filer * updated logging methods for uploader and http_util * updated logging methods for weed server --------- Co-authored-by: akosov <a.kosov@kryptonite.ru> * Improve lock ring (#6921) * fix flaky lock ring test * add more tests * fix: build * fix: rm import util/version * fix: serverOptions * refactoring --------- Co-authored-by: Aleksey Kosov <rusyak777@list.ru> Co-authored-by: akosov <a.kosov@kryptonite.ru> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com> Co-authored-by: chrislu <chris.lu@gmail.com>	2025-07-02 18:03:49 -07:00
chrislu	bd4891a117	change version directory	2025-06-03 22:46:10 -07:00
Konstantin Lebedev	b65eb2ec45	[security] reload whiteList on http seerver (#6302 ) * reload whiteList * white_list add to scaffold	2024-12-02 10:38:10 -08:00
vadimartynov	b796c21fa9	Added loadSecurityConfigOnce (#5792 )	2024-07-16 09:15:55 -07:00
vadimartynov	ec9e7493b3	-metricsIp cmd flag (#5773 ) * Added/Updated: - Added metrics ip options for all servers; - Fixed a bug with the selection of the binIp or ip parameter for the metrics handler; * Fixed cmd flags	2024-07-12 10:56:26 -07:00
Konstantin Lebedev	2b3e39397e	fix: skipping checking active volumes with the same number of files at the moment (#4893 ) * fix: skipping checking active volumes with the same number of files at the moment https://github.com/seaweedfs/seaweedfs/issues/4140 * refactor with comments https://github.com/seaweedfs/seaweedfs/issues/4140 * add TestShouldSkipVolume --------- Co-authored-by: Konstantin Lebedev <9497591+kmlebedev@users.noreply.github.co>	2023-10-09 09:57:26 -07:00
Jiffs Maverick	4b0430e71d	[metrics] Add the ability to control bind ip (#4012 )	2022-11-24 10:22:59 -08:00
Guo Lei	5b905fb2b7	Lazy loading (#3958 ) * types packages is imported more than onece * lazy-loading * fix bugs * fix bugs * fix unit tests * fix test error * rename function * unload ldb after initial startup * Don't load ldb when starting volume server if ldbtimeout is set. * remove uncessary unloadldb * Update weed/command/server.go Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com> * Update weed/command/volume.go Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com> Co-authored-by: guol-fnst <goul-fnst@fujitsu.com> Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>	2022-11-14 00:19:27 -08:00
famosss	cacc3e883b	volume server:set the default value of "hasSlowRead" to true (#3710 ) * simplify a bit * feat: volume: add "readBufSize" option to customize read optimization * refactor : redbufSIze -> readBufferSize * simplify a bit * simplify a bit * volume server:set the default value of "hasSlowRead" to true	2022-10-12 21:13:26 -07:00
Konstantin Lebedev	301b678147	[volume] Add new volumes to HUP(reload) signal (#3755 ) Add new volumes to HUP(reload) signal	2022-09-28 12:44:13 -07:00
chrislu	10d5b4b32b	volume server: rename readBufferSize to readBufferSizeMB	2022-09-17 10:56:28 -07:00
famosss	d949a238b8	volume: add "readBufSize" option to customize read optimization (#3702 ) * simplify a bit * feat: volume: add "readBufSize" option to customize read optimization * refactor : redbufSIze -> readBufferSize * simplify a bit * simplify a bit	2022-09-16 00:30:40 -07:00
chrislu	cf90f76a35	mark "hasSlowRead" as experimental	2022-09-15 23:33:46 -07:00
chrislu	896a85d6e4	volume: add "hasSlowRead" option to customize read optimization	2022-09-15 03:11:32 -07:00
chrislu	9b084d4c88	purge tcp implementation	2022-09-08 18:03:43 -07:00
chrislu	3f3a1341d8	make CodeQL happy	2022-08-26 17:09:11 -07:00
chrislu	67814a5c79	refactor and fix strings.Split	2022-08-07 01:34:32 -07:00
chrislu	26dbc6c905	move to https://github.com/seaweedfs/seaweedfs	2022-07-29 00:17:28 -07:00
liubaojiang	076e48a676	add inflight upload data wait timeout	2022-05-21 10:38:08 +08:00
guol-fnst	076595fbdd	just exit in case of duplicated volume directories were loaded	2022-05-17 15:41:49 +08:00
Berck Nash	9b14f0c81a	Add mTLS support for both master and volume http server.	2022-03-16 09:52:17 -06:00
chrislu	3a6eb8ca5f	default bind to one ip address fix https://github.com/chrislusf/seaweedfs/issues/1937	2022-03-11 14:02:39 -08:00
chrislu	18543c6e8b	minor	2022-02-27 23:11:09 -08:00
Konstantin Lebedev	526094d2da	StopTimeout 30 sec	2022-02-14 21:42:27 +05:00
Konstantin Lebedev	275e9a4e86	reduce to default http server KillTimeout and StopTimeout	2022-02-14 21:38:24 +05:00
chrislu	8907e6a40a	add more help messages	2022-01-13 13:03:04 -08:00
Chris Lu	52fe86df45	use default 10000 for grpc port	2021-09-20 14:05:59 -07:00
Chris Lu	e5fc35ed0c	change server address from string to a type	2021-09-12 22:47:52 -07:00
Chris Lu	e690a2be16	custom grpc port: volume server	2021-09-12 02:25:15 -07:00
Chris Lu	574485ec69	better IP v6 support	2021-09-07 19:29:42 -07:00
Chris Lu	734c980040	volume: support concurrent download data size limit	2021-08-08 23:25:16 -07:00
Chris Lu	49c66e88a0	volume: change all writes to fsync during graceful stopping fix https://github.com/chrislusf/seaweedfs/issues/2193	2021-07-13 01:29:57 -07:00
Chris Lu	5bcc77b46c	volume: default readMode to proxy	2021-07-03 15:55:56 -07:00
Chris Lu	b624090398	go fmt	2021-07-01 01:21:14 -07:00
zhangsong	20d33ae025	add proxy mode to read non-local volumes	2021-06-30 18:33:18 +08:00
zhangsong	7566782c2e	add proxy mode to read non-local volumes	2021-06-30 17:28:37 +08:00
bingoohuang	cf552417a7	minFreeSpace refactored	2021-04-27 10:37:24 +08:00

1 2 3

136 Commits