Commit Graph

58 Commits

Author SHA1 Message Date
Chris Lu
e10f11b480 opt: reduce ShardsInfo memory usage with bitmap and sorted slice (#7974)
* opt: reduce ShardsInfo memory usage with bitmap and sorted slice

- Replace map[ShardId]*ShardInfo with sorted []ShardInfo slice
- Add ShardBits (uint32) bitmap for O(1) existence checks
- Use binary search for O(log n) lookups by shard ID
- Maintain sorted order for efficient iteration
- Add comprehensive unit tests and benchmarks

Memory savings:
- Map overhead: ~48 bytes per entry eliminated
- Pointers: 8 bytes per entry eliminated
- Total: ~56 bytes per shard saved

Performance improvements:
- Has(): O(1) using bitmap
- Size(): O(log n) using binary search (was O(1), acceptable tradeoff)
- Count(): O(1) using popcount on bitmap
- Iteration: Faster due to cache locality
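A rough sketch of the layout this commit describes (type and field names here are assumptions; the real definitions live in weed/storage/erasure_coding), combining the sorted value slice with the uint32 bitmap:

```
package main

import (
	"fmt"
	"math/bits"
	"sort"
)

type ShardId uint8
type ShardSize int64

// ShardInfo is stored by value, so the slice is one contiguous allocation
// with no per-entry pointer or map-bucket overhead.
type ShardInfo struct {
	Id   ShardId
	Size ShardSize
}

// ShardBits is a uint32 bitmap; bit i set means shard i is present.
type ShardBits uint32

type ShardsInfo struct {
	bits   ShardBits   // O(1) existence checks and O(1) popcount
	shards []ShardInfo // kept sorted by Id for binary search and cache-friendly iteration
}

// Has is O(1): test one bit.
func (si *ShardsInfo) Has(id ShardId) bool { return si.bits&(1<<uint(id)) != 0 }

// Size is O(log n): binary search over the sorted slice.
func (si *ShardsInfo) Size(id ShardId) (ShardSize, bool) {
	i := sort.Search(len(si.shards), func(i int) bool { return si.shards[i].Id >= id })
	if i < len(si.shards) && si.shards[i].Id == id {
		return si.shards[i].Size, true
	}
	return 0, false
}

// Count is O(1): popcount on the bitmap.
func (si *ShardsInfo) Count() int { return bits.OnesCount32(uint32(si.bits)) }

func main() {
	si := &ShardsInfo{}
	for _, s := range []ShardInfo{{Id: 5, Size: 8 << 20}, {Id: 1, Size: 8 << 20}} {
		si.bits |= 1 << uint(s.Id)
		si.shards = append(si.shards, s)
	}
	sort.Slice(si.shards, func(i, j int) bool { return si.shards[i].Id < si.shards[j].Id })
	size, _ := si.Size(5)
	fmt.Println(si.Has(5), si.Count(), size) // true 2 8388608
}
```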

* refactor: add methods to ShardBits type

- Add Has(), Set(), Clear(), and Count() methods to ShardBits
- Simplify ShardsInfo methods by using ShardBits methods
- Improves code readability and encapsulation
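A minimal sketch of how such ShardBits methods could be implemented with plain bit operations (returning a new value from Set/Clear is an assumption; the real methods may mutate a pointer receiver):

```
package main

import (
	"fmt"
	"math/bits"
)

type ShardId uint8

// ShardBits is assumed to be a uint32 bitmap, one bit per shard id (0..31).
type ShardBits uint32

func (b ShardBits) Has(id ShardId) bool        { return b&(1<<uint(id)) != 0 }
func (b ShardBits) Set(id ShardId) ShardBits   { return b | 1<<uint(id) }
func (b ShardBits) Clear(id ShardId) ShardBits { return b &^ (1 << uint(id)) }
func (b ShardBits) Count() int                 { return bits.OnesCount32(uint32(b)) }

func main() {
	var b ShardBits
	b = b.Set(0).Set(4).Set(13)
	b = b.Clear(4)
	fmt.Println(b.Has(0), b.Has(4), b.Count()) // true false 2
}
```

With Count() on the bitmap type, counting shards in a heartbeat message reduces to casting the raw bits and calling Count(), as the next item notes.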

* opt: use ShardBits directly in ShardsCountFromVolumeEcShardInformationMessage

Avoid creating a full ShardsInfo object just to count shards.
Directly cast vi.EcIndexBits to ShardBits and use Count() method.

* opt: use strings.Builder in ShardsInfo.String() for efficiency

* refactor: change AsSlice to return []ShardInfo (values instead of pointers)

This completes the memory optimization by avoiding unnecessary pointer slices and potential allocations.

* refactor: rename ShardsCountFromVolumeEcShardInformationMessage to GetShardCount

* fix: prevent deadlock in Add and Subtract methods

Copy shards data from 'other' before releasing its lock to avoid
potential deadlock when a.Add(b) and b.Add(a) are called concurrently.

The previous implementation held other's lock while calling si.Set/Delete,
which acquires si's lock. This could deadlock if two goroutines tried to
add/subtract each other concurrently.
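A hedged sketch of the lock-ordering fix described here, assuming ShardsInfo guards its state with a sync.RWMutex (field names are illustrative):

```
package main

import "sync"

type ShardId uint8
type ShardSize int64

type ShardInfo struct {
	Id   ShardId
	Size ShardSize
}

type ShardsInfo struct {
	mu     sync.RWMutex
	shards []ShardInfo
}

// Set acquires si's own lock; the real method also keeps the slice sorted
// and updates the bitmap.
func (si *ShardsInfo) Set(s ShardInfo) {
	si.mu.Lock()
	defer si.mu.Unlock()
	si.shards = append(si.shards, s)
}

// Add snapshots other's shards while holding only other's lock, releases it,
// then mutates si under si's own lock. The two locks are never held together,
// so a.Add(b) and b.Add(a) can run concurrently without deadlocking.
func (si *ShardsInfo) Add(other *ShardsInfo) {
	other.mu.RLock()
	snapshot := make([]ShardInfo, len(other.shards))
	copy(snapshot, other.shards)
	other.mu.RUnlock()

	for _, s := range snapshot {
		si.Set(s)
	}
}

func main() {}
```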

* opt: avoid unnecessary locking in constructor functions

ShardsInfoFromVolume and ShardsInfoFromVolumeEcShardInformationMessage
now build shards slice and bitmap directly without calling Set(), which
acquires a lock on every call. Since the object is local and not yet
shared, locking is unnecessary and adds overhead.

This improves performance during object construction.

* fix: rename 'copy' variable to avoid shadowing built-in function

The variable name 'copy' in TestShardsInfo_Copy shadowed the built-in
copy() function, which is confusing and bad practice. Renamed to 'siCopy'.

* opt: use math/bits.OnesCount32 and reorganize types

1. Replace manual popcount loop with math/bits.OnesCount32 for better
   performance and idiomatic Go code
2. Move ShardSize type definition to ec_shards_info.go for better code
   organization since it's primarily used there

* refactor: Set() now accepts ShardInfo for future extensibility

Changed Set(id ShardId, size ShardSize) to Set(shard ShardInfo) to
support future additions to ShardInfo without changing the API.

This makes the code more extensible as new fields can be added to
ShardInfo (e.g., checksum, location, etc.) without breaking the Set API.

* refactor: move ShardInfo and ShardSize to separate file

Created ec_shard_info.go to hold the basic shard types (ShardInfo and
ShardSize) for better code organization and separation of concerns.

* refactor: add ShardInfo constructor and helper functions

Added NewShardInfo() constructor and IsValid() method to better
encapsulate ShardInfo creation and validation. Updated code to use
the constructor for cleaner, more maintainable code.
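Reusing the ShardId/ShardSize/ShardInfo types from the sketch above, the constructor-based API these items describe might look like the following (the IsValid rule is an assumption):

```
// NewShardInfo builds a ShardInfo value; new fields (checksum, location, ...)
// can be added later without changing callers of Set().
func NewShardInfo(id ShardId, size ShardSize) ShardInfo {
	return ShardInfo{Id: id, Size: size}
}

// IsValid is a minimal sanity check: the id must fit the 32-bit shard bitmap.
func (s ShardInfo) IsValid() bool { return s.Id < 32 }

// Call sites pass a whole ShardInfo instead of separate (id, size) arguments:
//   si.Set(NewShardInfo(3, 8<<20))
```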

* fix: update remaining Set() calls to use NewShardInfo constructor

Fixed compilation errors in storage and shell packages where Set() calls
were not updated to use the new NewShardInfo() constructor.

* fix: remove unreachable code in filer backup commands

Removed unreachable return statements after infinite loops in
filer_backup.go and filer_meta_backup.go to fix compilation errors.

* fix: rename 'new' variable to avoid shadowing built-in

Renamed 'new' to 'result' in MinusParityShards, Plus, and Minus methods
to avoid shadowing Go's built-in new() function.

* fix: update remaining test files to use NewShardInfo constructor

Fixed Set() calls in command_volume_list_test.go and
ec_rebalance_slots_test.go to use NewShardInfo() constructor.
2026-01-06 00:09:52 -08:00
Lisandro Pin
6b98b52acc Fix reporting of EC shard sizes from nodes to masters. (#7835)
SeaweedFS tracks EC shard sizes on topology data structures, but this information is never
relayed to master servers :( The end result is that commands reporting disk usage, such
as `volume.list` and `cluster.status`, yield incorrect figures when EC shards are present.

As an example for a simple 5-node test cluster, before...

```
> volume.list
Topology volumeSizeLimit:30000 MB hdd(volume:6/40 active:6 free:33 remote:0)
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9001 hdd(volume:1/8 active:1 free:7 remote:0)
        Disk hdd(volume:1/8 active:1 free:7 remote:0) id:0
          volume id:3  size:88967096  file_count:172  replica_placement:2  version:3  modified_at_second:1766349617
          ec volume id:1 collection: shards:[1 5]
        Disk hdd total size:88967096 file_count:172
      DataNode 192.168.10.111:9001 total size:88967096 file_count:172
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9002 hdd(volume:2/8 active:2 free:6 remote:0)
        Disk hdd(volume:2/8 active:2 free:6 remote:0) id:0
          volume id:2  size:77267536  file_count:166  replica_placement:2  version:3  modified_at_second:1766349617
          volume id:3  size:88967096  file_count:172  replica_placement:2  version:3  modified_at_second:1766349617
          ec volume id:1 collection: shards:[0 4]
        Disk hdd total size:166234632 file_count:338
      DataNode 192.168.10.111:9002 total size:166234632 file_count:338
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9003 hdd(volume:1/8 active:1 free:7 remote:0)
        Disk hdd(volume:1/8 active:1 free:7 remote:0) id:0
          volume id:2  size:77267536  file_count:166  replica_placement:2  version:3  modified_at_second:1766349617
          ec volume id:1 collection: shards:[2 6]
        Disk hdd total size:77267536 file_count:166
      DataNode 192.168.10.111:9003 total size:77267536 file_count:166
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9004 hdd(volume:2/8 active:2 free:6 remote:0)
        Disk hdd(volume:2/8 active:2 free:6 remote:0) id:0
          volume id:2  size:77267536  file_count:166  replica_placement:2  version:3  modified_at_second:1766349617
          volume id:3  size:88967096  file_count:172  replica_placement:2  version:3  modified_at_second:1766349617
          ec volume id:1 collection: shards:[3 7]
        Disk hdd total size:166234632 file_count:338
      DataNode 192.168.10.111:9004 total size:166234632 file_count:338
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9005 hdd(volume:0/8 active:0 free:8 remote:0)
        Disk hdd(volume:0/8 active:0 free:8 remote:0) id:0
          ec volume id:1 collection: shards:[8 9 10 11 12 13]
        Disk hdd total size:0 file_count:0
    Rack DefaultRack total size:498703896 file_count:1014
  DataCenter DefaultDataCenter total size:498703896 file_count:1014
total size:498703896 file_count:1014
```

...and after:

```
> volume.list
Topology volumeSizeLimit:30000 MB hdd(volume:6/40 active:6 free:33 remote:0)
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9001 hdd(volume:1/8 active:1 free:7 remote:0)
        Disk hdd(volume:1/8 active:1 free:7 remote:0) id:0
          volume id:2  size:81761800  file_count:161  replica_placement:2  version:3  modified_at_second:1766349495
          ec volume id:1 collection: shards:[1 5 9] sizes:[1:8.00 MiB 5:8.00 MiB 9:8.00 MiB] total:24.00 MiB
        Disk hdd total size:81761800 file_count:161
      DataNode 192.168.10.111:9001 total size:81761800 file_count:161
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9002 hdd(volume:1/8 active:1 free:7 remote:0)
        Disk hdd(volume:1/8 active:1 free:7 remote:0) id:0
          volume id:3  size:88678712  file_count:170  replica_placement:2  version:3  modified_at_second:1766349495
          ec volume id:1 collection: shards:[11 12 13] sizes:[11:8.00 MiB 12:8.00 MiB 13:8.00 MiB] total:24.00 MiB
        Disk hdd total size:88678712 file_count:170
      DataNode 192.168.10.111:9002 total size:88678712 file_count:170
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9003 hdd(volume:2/8 active:2 free:6 remote:0)
        Disk hdd(volume:2/8 active:2 free:6 remote:0) id:0
          volume id:2  size:81761800  file_count:161  replica_placement:2  version:3  modified_at_second:1766349495
          volume id:3  size:88678712  file_count:170  replica_placement:2  version:3  modified_at_second:1766349495
          ec volume id:1 collection: shards:[0 4 8] sizes:[0:8.00 MiB 4:8.00 MiB 8:8.00 MiB] total:24.00 MiB
        Disk hdd total size:170440512 file_count:331
      DataNode 192.168.10.111:9003 total size:170440512 file_count:331
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9004 hdd(volume:2/8 active:2 free:6 remote:0)
        Disk hdd(volume:2/8 active:2 free:6 remote:0) id:0
          volume id:2  size:81761800  file_count:161  replica_placement:2  version:3  modified_at_second:1766349495
          volume id:3  size:88678712  file_count:170  replica_placement:2  version:3  modified_at_second:1766349495
          ec volume id:1 collection: shards:[2 6 10] sizes:[2:8.00 MiB 6:8.00 MiB 10:8.00 MiB] total:24.00 MiB
        Disk hdd total size:170440512 file_count:331
      DataNode 192.168.10.111:9004 total size:170440512 file_count:331
  DataCenter DefaultDataCenter hdd(volume:6/40 active:6 free:33 remote:0)
    Rack DefaultRack hdd(volume:6/40 active:6 free:33 remote:0)
      DataNode 192.168.10.111:9005 hdd(volume:0/8 active:0 free:8 remote:0)
        Disk hdd(volume:0/8 active:0 free:8 remote:0) id:0
          ec volume id:1 collection: shards:[3 7] sizes:[3:8.00 MiB 7:8.00 MiB] total:16.00 MiB
        Disk hdd total size:0 file_count:0
    Rack DefaultRack total size:511321536 file_count:993
  DataCenter DefaultDataCenter total size:511321536 file_count:993
total size:511321536 file_count:993
```
2025-12-28 19:30:42 -08:00
Chris Lu
32a9a1f46f fix: sync EC volume files before copying to fix deleted needles not being marked when decoding (#7755)
* fix: sync EC volume files before copying to fix deleted needles not being marked when decoding (#7751)

When a file is deleted from an EC volume, the deletion is written to both
the .ecx and .ecj files. However, these writes were not synced to disk
before the files were copied during ec.decode. This caused the copied
files to miss the deletion markers, resulting in 'leaked' space where
deleted files were not properly tracked after decoding.

This fix:
1. Adds a Sync() method to EcVolume that flushes .ecx and .ecj files
   to disk without closing them
2. Calls Sync() in CopyFile before copying EC volume files to ensure
   all deletions are visible to the copy operation

Fixes #7751
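A hedged sketch of the Sync() approach this commit describes; the field names (ecxFile, ecjFile) are assumptions, and the real method lives on EcVolume in weed/storage/erasure_coding:

```
package main

import (
	"log"
	"os"
)

// ecVolume stands in for the real EcVolume; only the two files relevant here are shown.
type ecVolume struct {
	ecxFile *os.File // needle index carrying deletion markers
	ecjFile *os.File // deletion journal
}

// Sync flushes pending .ecx and .ecj writes to disk without closing the files,
// so a subsequent copy sees all deletion markers. Errors are logged rather than
// ignored, per the review feedback noted below.
func (v *ecVolume) Sync() {
	if v.ecxFile != nil {
		if err := v.ecxFile.Sync(); err != nil {
			log.Printf("sync ecx file %s: %v", v.ecxFile.Name(), err)
		}
	}
	if v.ecjFile != nil {
		if err := v.ecjFile.Sync(); err != nil {
			log.Printf("sync ecj file %s: %v", v.ecjFile.Name(), err)
		}
	}
}

func main() {}
```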

* test: add integration tests for EC volume deletion sync (issue #7751)

Add comprehensive tests to verify that deleted needles are properly
visible after EcVolume.Sync() is called. These tests cover:

1. TestWriteIdxFileFromEcIndex_PreservesDeletedNeedles
   - Verifies that WriteIdxFileFromEcIndex preserves deletion markers
     from .ecx files when generating .idx files

2. TestWriteIdxFileFromEcIndex_ProcessesEcjJournal
   - Verifies that deletions from .ecj journal file are correctly
     appended to the generated .idx file

3. TestEcxFileDeletionVisibleAfterSync
   - Verifies that MarkNeedleDeleted changes are visible after Sync()

4. TestEcxFileDeletionWithSeparateHandles
   - Tests that synced changes are visible across separate file handles

5. TestEcVolumeSyncEnsuresDeletionsVisible
   - Integration test for the full EcVolume.DeleteNeedleFromEcx +
     Sync() workflow that validates the fix for issue #7751

* refactor: log sync errors in EcVolume.Sync() instead of ignoring them

Per code review feedback: sync errors could reintroduce the bug this PR
fixes, so logging warnings helps with debugging.
2025-12-14 21:26:05 -08:00
chrislu
b7ba6785a2 go fmt 2025-10-27 23:04:55 -07:00
Chris Lu
208d7f24f4 Erasure Coding: Ec refactoring (#7396)
* refactor: add ECContext structure to encapsulate EC parameters

- Create ec_context.go with ECContext struct
- NewDefaultECContext() creates context with default 10+4 configuration
- Helper methods: CreateEncoder(), ToExt(), String()
- Foundation for cleaner function signatures
- No behavior change, still uses hardcoded 10+4
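A hedged sketch of what an ECContext along these lines could look like; the Reed-Solomon call mirrors github.com/klauspost/reedsolomon (which the later bullets reference), while exact field and method bodies are assumptions:

```
package erasure_coding

import (
	"fmt"

	"github.com/klauspost/reedsolomon"
)

const (
	DataShardsCount   = 10
	ParityShardsCount = 4
)

// ECContext carries the EC parameters so they are no longer hardcoded at call sites.
type ECContext struct {
	DataShards   int
	ParityShards int
}

// NewDefaultECContext returns the default 10+4 configuration.
func NewDefaultECContext() *ECContext {
	return &ECContext{DataShards: DataShardsCount, ParityShards: ParityShardsCount}
}

// Total is computed on demand (a later commit in this PR replaces a stored
// TotalShards field with this method to avoid field drift).
func (c *ECContext) Total() int { return c.DataShards + c.ParityShards }

// CreateEncoder builds a Reed-Solomon encoder for this configuration.
func (c *ECContext) CreateEncoder() (reedsolomon.Encoder, error) {
	return reedsolomon.New(c.DataShards, c.ParityShards)
}

// ToExt returns the shard file extension, e.g. ".ec03".
func (c *ECContext) ToExt(shardIndex int) string {
	return fmt.Sprintf(".ec%02d", shardIndex)
}

func (c *ECContext) String() string {
	return fmt.Sprintf("%d+%d", c.DataShards, c.ParityShards)
}
```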

* refactor: update ec_encoder.go to use ECContext

- Add WriteEcFilesWithContext() and RebuildEcFilesWithContext() functions
- Keep old functions for backward compatibility (call new versions)
- Update all internal functions to accept ECContext parameter
- Use ctx.DataShards, ctx.ParityShards, ctx.TotalShards consistently
- Use ctx.CreateEncoder() instead of hardcoded reedsolomon.New()
- Use ctx.ToExt() for shard file extensions
- No behavior change, still uses default 10+4 configuration

* refactor: update ec_volume.go to use ECContext

- Add ECContext field to EcVolume struct
- Initialize ECContext with default configuration in NewEcVolume()
- Update LocateEcShardNeedleInterval() to use ECContext.DataShards
- Phase 1: Always uses default 10+4 configuration
- No behavior change

* refactor: add EC shard count fields to VolumeInfo protobuf

- Add data_shards_count field (field 8) to VolumeInfo message
- Add parity_shards_count field (field 9) to VolumeInfo message
- Fields are optional, 0 means use default (10+4)
- Backward compatible: fields added at end
- Phase 1: Foundation for future customization

* refactor: regenerate protobuf Go files with EC shard count fields

- Regenerated volume_server_pb/*.go with new EC fields
- DataShardsCount and ParityShardsCount accessors added to VolumeInfo
- No behavior change, fields not yet used

* refactor: update VolumeEcShardsGenerate to use ECContext

- Create ECContext with default configuration in VolumeEcShardsGenerate
- Use ecCtx.TotalShards and ecCtx.ToExt() in cleanup
- Call WriteEcFilesWithContext() instead of WriteEcFiles()
- Save EC configuration (DataShardsCount, ParityShardsCount) to VolumeInfo
- Log EC context being used
- Phase 1: Always uses default 10+4 configuration
- No behavior change

* fmt

* refactor: update ec_test.go to use ECContext

- Update TestEncodingDecoding to create and use ECContext
- Update validateFiles() to accept ECContext parameter
- Update removeGeneratedFiles() to use ctx.TotalShards and ctx.ToExt()
- Test passes with default 10+4 configuration

* refactor: use EcShardConfig message instead of separate fields

* optimize: pre-calculate row sizes in EC encoding loop

* refactor: replace TotalShards field with Total() method

- Remove TotalShards field from ECContext to avoid field drift
- Add Total() method that computes DataShards + ParityShards
- Update all references to use ctx.Total() instead of ctx.TotalShards
- Read EC config from VolumeInfo when loading EC volumes
- Read data shard count from .vif in VolumeEcShardsToVolume
- Use >= instead of > for exact boundary handling in encoding loops

* optimize: simplify VolumeEcShardsToVolume to use existing EC context

- Remove redundant CollectEcShards call
- Remove redundant .vif file loading
- Use v.ECContext.DataShards directly (already loaded by NewEcVolume)
- Slice tempShards instead of collecting again

* refactor: rename MaxShardId to MaxShardCount for clarity

- Change from MaxShardId=31 to MaxShardCount=32
- Eliminates confusing +1 arithmetic (MaxShardId+1)
- More intuitive: MaxShardCount directly represents the limit

fix: support custom EC ratios beyond 14 shards in VolumeEcShardsToVolume

- Add MaxShardId constant (31, since ShardBits is uint32)
- Use MaxShardId+1 (32) instead of TotalShardsCount (14) for tempShards buffer
- Prevents panic when slicing for volumes with >14 total shards
- Critical fix for custom EC configurations like 20+10

* fix: add validation for EC shard counts from VolumeInfo

- Validate DataShards/ParityShards are positive and within MaxShardCount
- Prevent zero or invalid values that could cause divide-by-zero
- Fallback to defaults if validation fails, with warning log
- VolumeEcShardsGenerate now preserves existing EC config when regenerating
- Critical safety fix for corrupted or legacy .vif files

* fix: RebuildEcFiles now loads EC config from .vif file

- Critical: RebuildEcFiles was always using default 10+4 config
- Now loads actual EC config from .vif file when rebuilding shards
- Validates config before use (positive shards, within MaxShardCount)
- Falls back to default if .vif missing or invalid
- Prevents data corruption when rebuilding custom EC volumes

* add: defensive validation for dataShards in VolumeEcShardsToVolume

- Validate dataShards > 0 and <= MaxShardCount before use
- Prevents panic from corrupted or uninitialized ECContext
- Returns clear error message instead of panic
- Defense-in-depth: validates even though upstream should catch issues

* fix: replace TotalShardsCount with MaxShardCount for custom EC ratio support

Critical fixes to support custom EC ratios > 14 shards:

disk_location_ec.go:
- validateEcVolume: Check shards 0-31 instead of 0-13 during validation
- removeEcVolumeFiles: Remove shards 0-31 instead of 0-13 during cleanup

ec_volume_info.go ShardBits methods:
- ShardIds(): Iterate up to MaxShardCount (32) instead of TotalShardsCount (14)
- ToUint32Slice(): Iterate up to MaxShardCount (32)
- IndexToShardId(): Iterate up to MaxShardCount (32)
- MinusParityShards(): Remove shards 10-31 instead of 10-13 (added note about Phase 2)
- Minus() shard size copy: Iterate up to MaxShardCount (32)
- resizeShardSizes(): Iterate up to MaxShardCount (32)

Without these changes:
- Custom EC ratios > 14 total shards would fail validation on startup
- Shards 14-31 would never be discovered or cleaned up
- ShardBits operations would miss shards >= 14

These changes are backward compatible - MaxShardCount (32) includes
the default TotalShardsCount (14), so existing 10+4 volumes work as before.
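A sketch of the kind of loop-bound change these bullets describe, reusing the ShardBits sketch from earlier (MaxShardCount is 32 because the bitmap is a uint32):

```
const MaxShardCount = 32 // upper bound for shard ids, since ShardBits is a uint32

// ShardIds iterates the full 0..MaxShardCount-1 range so shards 14-31
// produced by custom EC ratios are not skipped.
func (b ShardBits) ShardIds() (ids []ShardId) {
	for id := ShardId(0); id < ShardId(MaxShardCount); id++ {
		if b.Has(id) {
			ids = append(ids, id)
		}
	}
	return
}
```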

* fix: replace TotalShardsCount with MaxShardCount in critical data structures

Critical fixes for buffer allocations and loops that must support
custom EC ratios up to 32 shards:

Data Structures:
- store_ec.go:354: Buffer allocation for shard recovery (bufs array)
- topology_ec.go:14: EcShardLocations.Locations fixed array size
- command_ec_rebuild.go:268: EC shard map allocation
- command_ec_common.go:626: Shard-to-locations map allocation

Shard Discovery Loops:
- ec_task.go:378: Loop to find generated shard files
- ec_shard_management.go: All 8 loops that check/count EC shards

These changes are critical because:
1. Buffer allocations sized to 14 would cause index-out-of-bounds panics
   when accessing shards 14-31
2. Fixed arrays sized to 14 would truncate shard location data
3. Loops limited to 0-13 would never discover/manage shards 14-31

Note: command_ec_encode.go:208 intentionally NOT changed - it creates
shard IDs to mount after encoding. In Phase 1 we always generate 14
shards, so this remains TotalShardsCount and will be made dynamic in
Phase 2 based on actual EC context.

Without these fixes, custom EC ratios > 14 total shards would cause:
- Runtime panics (array index out of bounds)
- Data loss (shards 14-31 never discovered/tracked)
- Incomplete shard management (missing shards not detected)

* refactor: move MaxShardCount constant to ec_encoder.go

Moved MaxShardCount from ec_volume_info.go to ec_encoder.go to group it
with other shard count constants (DataShardsCount, ParityShardsCount,
TotalShardsCount). This improves code organization and makes it easier
to understand the relationship between these constants.

Location: ec_encoder.go line 22, between TotalShardsCount and MinTotalDisks
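For reference, a sketch of how these constants relate, based on the figures quoted in this PR (the exact declarations in ec_encoder.go may differ):

```
const (
	DataShardsCount   = 10                                  // default data shards
	ParityShardsCount = 4                                   // default parity shards
	TotalShardsCount  = DataShardsCount + ParityShardsCount // 14, the default 10+4 layout
	MaxShardCount     = 32                                  // hard ceiling: ShardBits is a uint32 bitmap
)

// Buffers indexed by shard id are sized with MaxShardCount, not TotalShardsCount,
// so a custom ratio such as 20+10 cannot index out of range, e.g.:
//   tempShards := make([][]byte, MaxShardCount)
```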

* improve: add defensive programming and better error messages for EC

Code review improvements from CodeRabbit:

1. ShardBits Guardrails (ec_volume_info.go):
   - AddShardId, RemoveShardId: Reject shard IDs >= MaxShardCount
   - HasShardId: Return false for out-of-range shard IDs
   - Prevents silent no-ops from bit shifts with invalid IDs

2. Future-Proof Regex (disk_location_ec.go):
   - Updated regex from \.ec[0-9][0-9] to \.ec\d{2,3}
   - Now matches .ec00 through .ec999 (currently .ec00-.ec31 used)
   - Supports future increases to MaxShardCount beyond 99

3. Better Error Messages (volume_grpc_erasure_coding.go):
   - Include valid range (1..32) in dataShards validation error
   - Helps operators quickly identify the problem

4. Validation Before Save (volume_grpc_erasure_coding.go):
   - Validate ECContext (DataShards > 0, ParityShards > 0, Total <= MaxShardCount)
   - Log EC config being saved to .vif for debugging
   - Prevents writing invalid configs to disk

These changes improve robustness and debuggability without changing
core functionality.
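A hedged sketch of guardrail item 1, again reusing the ShardBits/ShardId/MaxShardCount sketches above (method names follow the bullet; internals are assumptions):

```
// AddShardId ignores out-of-range ids instead of silently shifting past bit 31.
func (b ShardBits) AddShardId(id ShardId) ShardBits {
	if int(id) >= MaxShardCount {
		return b
	}
	return b | 1<<uint(id)
}

// HasShardId returns false for ids outside the 0..31 bitmap range.
func (b ShardBits) HasShardId(id ShardId) bool {
	if int(id) >= MaxShardCount {
		return false
	}
	return b&(1<<uint(id)) != 0
}
```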

* fmt

* fix: critical bugs from code review + clean up comments

Critical bug fixes:
1. command_ec_rebuild.go: Fixed indentation causing compilation error
   - Properly nested if/for blocks in registerEcNode

2. ec_shard_management.go: Fixed isComplete logic incorrectly using MaxShardCount
   - Changed from MaxShardCount (32) back to TotalShardsCount (14)
   - Default 10+4 volumes were being incorrectly reported as incomplete
   - Missing shards 14-31 were being incorrectly reported as missing
   - Fixed in 4 locations: volume completeness checks and getMissingShards

3. ec_volume_info.go: Fixed MinusParityShards removing too many shards
   - Changed from MaxShardCount (32) back to TotalShardsCount (14)
   - Was incorrectly removing shard IDs 10-31 instead of just 10-13

Comment cleanup:
- Removed Phase 1/Phase 2 references (development plan context)
- Replaced with clear statements about default 10+4 configuration
- SeaweedFS repo uses fixed 10+4 EC ratio, no phases needed

Root cause: Over-aggressive replacement of TotalShardsCount with MaxShardCount.
MaxShardCount (32) is the limit for buffer allocations and shard ID loops,
but TotalShardsCount (14) must be used for default EC configuration logic.

* fix: add defensive bounds checks and compute actual shard counts

Critical fixes from code review:

1. topology_ec.go: Add defensive bounds checks to AddShard/DeleteShard
   - Prevent panic when shardId >= MaxShardCount (32)
   - Return false instead of crashing on out-of-range shard IDs

2. command_ec_common.go: Fix doBalanceEcShardsAcrossRacks
   - Was using hardcoded TotalShardsCount (14) for all volumes
   - Now computes actual totalShardsForVolume from rackToShardCount
   - Fixes incorrect rebalancing for volumes with custom EC ratios
   - Example: 5+2=7 shards would incorrectly use 14 as average

These fixes improve robustness and prepare for future custom EC ratios
without changing current behavior for default 10+4 volumes.

Note: MinusParityShards and ec_task.go intentionally NOT changed for
seaweedfs repo - these will be enhanced in seaweed-enterprise repo
where custom EC ratio configuration is added.

* fmt

* style: make MaxShardCount type casting explicit in loops

Improved code clarity by explicitly casting MaxShardCount to the
appropriate type when used in loop comparisons:

- ShardId comparisons: Cast to ShardId(MaxShardCount)
- uint32 comparisons: Cast to uint32(MaxShardCount)

Changed in 5 locations:
- Minus() loop (line 90)
- ShardIds() loop (line 143)
- ToUint32Slice() loop (line 152)
- IndexToShardId() loop (line 219)
- resizeShardSizes() loop (line 248)

This makes the intent explicit and improves type safety readability.
No functional changes - purely a style improvement.
2025-10-27 22:13:31 -07:00
Chris Lu
b4d9618efc volume server UI: fix ec volume ui (#7104)
* fix ec volume ui

* Update weed/storage/erasure_coding/ec_volume.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-07 00:07:03 -07:00
Chris Lu
9d013ea9b8 Admin UI: include ec shard sizes into volume server info (#7071)
* show ec shards on dashboard, show max in its own column

* master collect shard size info

* master send shard size via VolumeList

* change to more efficient shard sizes slice

* include ec shard sizes into volume server info

* Eliminated Redundant gRPC Calls

* much more efficient

* Efficient Counting: bits.OnesCount32() uses CPU-optimized instructions to count set bits in O(1)

* avoid extra volume list call

* simplify

* preserve existing shard sizes

* avoid hard coded value

* Update weed/storage/erasure_coding/ec_volume_info.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update weed/admin/dash/volume_management.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update ec_volume_info.go

* address comments

* avoid duplicated functions

* Update weed/admin/dash/volume_management.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* simplify

* refactoring

* fix compilation

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-02 02:16:49 -07:00
Chris Lu
891a2fb6eb Admin: misc improvements on admin server and workers. EC now works. (#7055)
* initial design

* added simulation as tests

* reorganized the codebase to move the simulation framework and tests into their own dedicated package

* integration test. ec worker task

* remove "enhanced" reference

* start master, volume servers, filer

Current Status
 Master: Healthy and running (port 9333)
 Filer: Healthy and running (port 8888)
 Volume Servers: All 6 servers running (ports 8080-8085)
🔄 Admin/Workers: Will start when dependencies are ready

* generate write load

* tasks are assigned

* admin starts with grpc port. worker has its own working directory

* Update .gitignore

* working worker and admin. Task detection is not working yet.

* compiles, detection uses volumeSizeLimitMB from master

* compiles

* worker retries connecting to admin

* build and restart

* rendering pending tasks

* skip task ID column

* sticky worker id

* test canScheduleTaskNow

* worker reconnect to admin

* clean up logs

* worker register itself first

* worker can run ec work and report status

but:
1. one volume should not be repeatedly worked on.
2. ec shards need to be distributed and source data should be deleted.

* move ec task logic

* listing ec shards

* local copy, ec. Need to distribute.

* ec is mostly working now

* distribution of ec shards needs improvement
* need configuration to enable ec

* show ec volumes

* interval field UI component

* rename

* integration test with vacuuming

* garbage percentage threshold

* fix warning

* display ec shard sizes

* fix ec volumes list

* Update ui.go

* show default values

* ensure correct default value

* MaintenanceConfig use ConfigField

* use schema defined defaults

* config

* reduce duplication

* refactor to use BaseUIProvider

* each task register its schema

* checkECEncodingCandidate use ecDetector

* use vacuumDetector

* use volumeSizeLimitMB

* remove

remove

* remove unused

* refactor

* use new framework

* remove v2 reference

* refactor

* left menu can scroll now

* The maintenance manager was not being initialized when no data directory was configured for persistent storage.

* saving config

* Update task_config_schema_templ.go

* enable/disable tasks

* protobuf encoded task configurations

* fix system settings

* use ui component

* remove logs

* interface{} Reduction

* reduce interface{}

* reduce interface{}

* avoid from/to map

* reduce interface{}

* refactor

* keep it DRY

* added logging

* debug messages

* debug level

* debug

* show the log caller line

* use configured task policy

* log level

* handle admin heartbeat response

* Update worker.go

* fix EC rack and dc count

* Report task status to admin server

* fix task logging, simplify interface checking, use erasure_coding constants

* factor in empty volume server during task planning

* volume.list adds disk id

* track disk id also

* fix locking scheduled and manual scanning

* add active topology

* simplify task detector

* ec task completed, but shards are not showing up

* implement ec in ec_typed.go

* adjust log level

* dedup

* implementing ec copying shards and only ecx files

* use disk id when distributing ec shards

🎯 Planning: ActiveTopology creates DestinationPlan with specific TargetDisk
📦 Task Creation: maintenance_integration.go creates ECDestination with DiskId
🚀 Task Execution: EC task passes DiskId in VolumeEcShardsCopyRequest
💾 Volume Server: Receives disk_id and stores shards on specific disk (vs.store.Locations[req.DiskId])
📂 File System: EC shards and metadata land in the exact disk directory planned

* Delete original volume from all locations

* clean up existing shard locations

* local encoding and distributing

* Update docker/admin_integration/EC-TESTING-README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* check volume id range

* simplify

* fix tests

* fix types

* clean up logs and tests

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-30 12:38:03 -07:00
Chris Lu
69553e5ba6 convert error formatting to %w everywhere (#6995) 2025-07-16 23:39:27 -07:00
Lisandro Pin
dddb0f0ae5 Fix update of SeaweedFS_volumeServer_volumes gauge metrics when EC shards are unmounted (#6776) 2025-05-09 10:15:34 -07:00
Quentin D.
2ae5b480a6 Use the correct constant when computing the offset in SearchNeedleFromSortedIndex (#6771)
NeedleHeaderSize happens to have the same size as NeedleMapEntrySize, except when running the 5-byte-offset variant of SeaweedFS, because it does not include OffsetSize. This causes ECX corruption on deletes, due to the drifting offset computation (the offset is always computed on the basis of 16 bytes per record instead of 17 bytes).
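A small worked example of the drift described (the 16- and 17-byte figures come from the commit message; lower-case constant names are used here only for illustration):

```
package main

import "fmt"

// In the 5-byte-offset build each sorted-index record is 17 bytes, but the
// search computed offsets with the 16-byte NeedleHeaderSize constant.
const (
	needleHeaderSize   = 16 // wrong constant for this computation
	needleMapEntrySize = 17 // actual record size in the 5-byte-offset variant
)

func main() {
	record := int64(100)
	fmt.Println(record*needleHeaderSize, record*needleMapEntrySize) // 1600 1700
	// The 100-byte drift means deletes mark the wrong entries, corrupting the .ecx index.
}
```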

Signed-off-by: Quentin Devos <4972091+Okhoshi@users.noreply.github.com>
2025-05-09 08:47:53 -07:00
chrislu
ec155022e7 "golang.org/x/exp/slices" => "slices" and go fmt 2024-12-19 19:25:06 -08:00
chrislu
c9f3448692 ReadAt may return io.EOF at end of file
related to https://github.com/seaweedfs/seaweedfs/issues/6219
2024-11-21 00:37:38 -08:00
chrislu
ae5bd0667a rename proto field from DestroyTime to expire_at_sec
For TTL volumes converted into EC volumes, this change may leave the volumes in place.
2024-10-24 21:35:11 -07:00
augustazz
0b00706454 EC volume supports expiration and displays expiration message when executing volume.list (#5895)
* ec volume expire

* volume.list show DestroyTime

* comments

* code optimization

---------

Co-authored-by: xuwenfeng <xuwenfeng1@zto.com>
2024-08-16 00:20:00 -07:00
chrislu
fdf7193ae7 rename 2024-08-13 13:59:24 -07:00
chrislu
07f4998188 add dat file size into vif for EC 2024-08-13 13:56:00 -07:00
chrislu
645ae8c57b Revert "Revert "Merge branch 'master' of https://github.com/seaweedfs/seaweedfs""
This reverts commit 8cb42c39
2023-09-25 09:35:16 -07:00
chrislu
8cb42c39ad Revert "Merge branch 'master' of https://github.com/seaweedfs/seaweedfs"
This reverts commit 2e5aa06026, reversing
changes made to 4d414f54a2.
2023-09-18 16:12:50 -07:00
dependabot[bot]
a04bd4d26f Bump github.com/rclone/rclone from 1.63.1 to 1.64.0 (#4850)
* Bump github.com/rclone/rclone from 1.63.1 to 1.64.0

Bumps [github.com/rclone/rclone](https://github.com/rclone/rclone) from 1.63.1 to 1.64.0.
- [Release notes](https://github.com/rclone/rclone/releases)
- [Changelog](https://github.com/rclone/rclone/blob/master/RELEASE.md)
- [Commits](https://github.com/rclone/rclone/compare/v1.63.1...v1.64.0)

---
updated-dependencies:
- dependency-name: github.com/rclone/rclone
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* API changes

* go mod

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
Co-authored-by: chrislu <chris.lu@gmail.com>
2023-09-18 14:43:05 -07:00
Nikita Mochalov
e6a49dc533 Fix resource leaks (#4737)
* Fix division by zero

* Fix file handle leak

* Fix file handle leak

* Fix file handle leak

* Fix goroutine leak
2023-08-09 15:30:36 -07:00
Konstantin Lebedev
1f7e52c63e vacuum metrics and force sync dst files (#3832) 2022-10-13 00:51:20 -07:00
Eric Yang
b324a6536c ADHOC: add read needle meta grpc (#3581)
* ADHOC: add read needle meta grpc

* add test

* nit

Co-authored-by: root <root@HQ-10MSTD3EY.roblox.local>
2022-09-06 23:51:27 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
justin
3551ca2fcf enhancement: replace sort.Slice with slices.SortFunc to reduce reflection 2022-04-18 10:35:43 +08:00
Chris Lu
e5fc35ed0c change server address from string to a type 2021-09-12 22:47:52 -07:00
Chris Lu
05a648bb96 refactor: separating out remote.proto 2021-08-26 15:18:34 -07:00
Chris Lu
828f6e9f4d volume: auto add missing vif files
fix https://github.com/chrislusf/seaweedfs/issues/1878
2021-03-09 12:09:32 -08:00
Chris Lu
f8446b42ab this can compile now!!! 2021-02-16 02:47:02 -08:00
bingoohuang
7256902fb0 fix typo offset.ToAcutalOffset to offset.ToActualOffset 2021-02-07 12:11:51 +08:00
Chris Lu
6d30b21b10 volume: add "-dir.idx" option for separate index storage
fix https://github.com/chrislusf/seaweedfs/issues/1265
2020-11-27 03:17:10 -08:00
Chris Lu
4ff2ceee33 UI fix on rendering EC volumes
addressing UI problem with https://github.com/chrislusf/seaweedfs/issues/1551
2020-10-21 22:05:58 -07:00
Chris Lu
6a92f0bc7a refactoring to typed Size
Go is amazing with refactoring!
2020-08-18 17:04:28 -07:00
Chris Lu
3137777d83 volume: automatically detect max volume count 2020-03-22 16:21:42 -07:00
Chris Lu
40ae533fa3 shell: add volume.configure.replication to change replication for a volume
fix https://github.com/chrislusf/seaweedfs/issues/1192
2020-02-02 15:37:23 -08:00
Chris Lu
672868b460 always create .vif file 2019-12-28 21:52:06 -08:00
Chris Lu
37b64a50b4 ec: generate and copy .vif file 2019-12-28 12:44:59 -08:00
Chris Lu
c06f7eb48a load volume info from .vif file, use superblock as a backup 2019-12-28 12:28:58 -08:00
Chris Lu
58f88e530c volume: use sorted index map for readonly volumes 2019-12-18 01:21:21 -08:00
Chris Lu
f88a8bda7b ec deletion works 2019-06-21 01:14:10 -07:00
Chris Lu
4cea8aefd0 add grpc VolumeEcBlobDelete 2019-06-20 00:17:11 -07:00
Chris Lu
856da7aae2 ec volume support deletes 2019-06-19 22:57:14 -07:00
Chris Lu
d344e0a035 fix ec related bugs 2019-06-05 23:20:26 -07:00
Chris Lu
2215e81be7 ui add ec shard statuses 2019-06-04 21:52:37 -07:00
Chris Lu
ba18314aab ec shard delete also check ec volumes, in addition to volumes 2019-06-01 01:41:22 -07:00
Chris Lu
47f1901843 ask for the ec volume version 2019-05-31 00:58:51 -07:00
Chris Lu
40ca2f2903 add collection.delete 2019-05-30 09:47:54 -07:00
Chris Lu
3f9ecee40f working with reading remote intervals 2019-05-28 21:29:07 -07:00
Chris Lu
4f76342cbc WIP
no errors, but not returning file content
* the interval needs to use the actual file size
* need to read the actual version instead of the current version
2019-05-28 00:51:01 -07:00
Chris Lu
713596e781 caching ec shard locations 2019-05-27 22:54:58 -07:00