seaweedFS

Author	SHA1	Message	Date
Lisandro Pin	e5cf2d2a19	Give the `ScrubVolume()` RPC an option to flag found broken volumes as read-only. (#8360 ) * Give the `ScrubVolume()` RPC an option to flag found broken volumes as read-only. Also exposes this option in the shell `volume.scrub` command. * Remove redundant test in `TestVolumeMarkReadonlyWritableErrorPaths`. 417051bb slightly rearranges the logic for `VolumeMarkReadonly()` and `VolumeMarkWritable()`, so calling them for invalid volume IDs will actually yield that error instead of failing on the maintenance mode check first.	2026-03-26 10:20:57 -07:00
Chris Lu	81369b8a83	improve: large file sync throughput for remote.cache and filer.sync (#8676 ) * improve large file sync throughput for remote.cache and filer.sync Three main throughput improvements: 1. Adaptive chunk sizing for remote.cache: targets ~32 chunks per file instead of always starting at 5MB. A 500MB file now uses ~16MB chunks (32 chunks) instead of 5MB chunks (100 chunks), reducing per-chunk overhead (volume assign, gRPC call, needle write) by 3x. 2. Configurable concurrency at every layer: - remote.cache chunk concurrency: -chunkConcurrency flag (default 8) - remote.cache S3 download concurrency: -downloadConcurrency flag (default raised from 1 to 5 per chunk) - filer.sync chunk concurrency: -chunkConcurrency flag (default 32) 3. S3 multipart download concurrency raised from 1 to 5: the S3 manager downloader was using Concurrency=1, serializing all part downloads within each chunk. This alone can speed up per-chunk downloads by up to 5x. The concurrency values flow through the gRPC request chain: shell command → CacheRemoteObjectToLocalClusterRequest → FetchAndWriteNeedleRequest → S3 downloader. Zero values in the request mean "use server defaults", maintaining full backward compatibility with existing callers. Ref #8481 * fix: use full maxMB for chunk size cap and remove loop guard Address review feedback: - Use full maxMB instead of maxMB/2 for maxChunkSize to avoid unnecessarily limiting chunk size for very large files. - Remove chunkSize < maxChunkSize guard from the safety loop so it can always grow past maxChunkSize when needed to stay under 1000 chunks (e.g., extremely large files with small maxMB). * address review feedback: help text, validation, naming, docs - Fix help text for -chunkConcurrency and -downloadConcurrency flags to say "0 = server default" instead of advertising specific numeric defaults that could drift from the server implementation.
- Validate chunkConcurrency and downloadConcurrency are within int32 range before narrowing, returning a user-facing error if out of range. - Rename ReadRemoteErr to readRemoteErr to follow Go naming conventions. - Add doc comment to SetChunkConcurrency noting it must be called during initialization before replication goroutines start. - Replace doubling loop in chunk size safety check with direct ceil(remoteSize/1000) computation to guarantee the 1000-chunk cap. * address Copilot review: clamp concurrency, fix chunk count, clarify proto docs - Use ceiling division for chunk count check to avoid overcounting when file size is an exact multiple of chunk size. - Clamp chunkConcurrency (max 1024) and downloadConcurrency (max 1024 at filer, max 64 at volume server) to prevent excessive goroutines. - Always use ReadFileWithConcurrency when the client supports it, falling back to the implementation's default when value is 0. - Clarify proto comments that download_concurrency only applies when the remote storage client supports it (currently S3). - Include specific server defaults in help text (e.g., "0 = server default 8") so users see the actual values in -h output. * fix data race on executionErr and use %w for error wrapping - Protect concurrent writes to executionErr in remote.cache worker goroutines with a sync.Mutex to eliminate the data race. - Use %w instead of %v in volume_grpc_remote.go error formatting to preserve the error chain for errors.Is/errors.As callers.	2026-03-17 16:49:56 -07:00
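Taken together, the chunk-size rules in this commit (target about 32 chunks, cap at `maxMB`, fall back to `ceil(remoteSize/1000)` so a file never needs more than 1000 chunks, and count chunks with ceiling division) can be sketched as below. The function and constant names are hypothetical, and the 5MB floor is an assumption based on the old fixed starting size; the actual remote.cache code may differ.

```go
package main

import "fmt"

const (
	mb           = int64(1) << 20
	targetChunks = 32     // per the commit: aim for ~32 chunks per file
	maxChunks    = 1000   // per the commit: never exceed 1000 chunks
	minChunkSize = 5 * mb // assumption: keep the old 5MB size as a floor
)

// ceilDiv returns ceil(a/b) for a >= 0 and b > 0.
func ceilDiv(a, b int64) int64 { return (a + b - 1) / b }

// chunkSizeFor picks a chunk size for an object of remoteSize bytes.
// It targets ~targetChunks chunks, clamps to [minChunkSize, maxMB], and
// then grows past maxMB only if needed to stay within maxChunks chunks.
func chunkSizeFor(remoteSize, maxMB int64) int64 {
	size := ceilDiv(remoteSize, targetChunks)
	if size < minChunkSize {
		size = minChunkSize
	}
	if capBytes := maxMB * mb; maxMB > 0 && size > capBytes {
		size = capBytes
	}
	// Safety check: ceil(remoteSize/1000) guarantees at most 1000 chunks.
	if ceilDiv(remoteSize, size) > maxChunks {
		size = ceilDiv(remoteSize, maxChunks)
	}
	return size
}

func main() {
	for _, sizeMB := range []int64{1, 500, 100 * 1024} {
		remote := sizeMB * mb
		size := chunkSizeFor(remote, 64)
		fmt.Printf("%d MB -> chunk %d bytes, %d chunks\n", sizeMB, size, ceilDiv(remote, size))
	}
}
```

Ceiling division matters at exact multiples: with plain `remoteSize/size + 1`, a 500MB file split into 16384000-byte chunks would be counted as 33 chunks instead of 32.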
Chris Lu	f44e25b422	fix(iam): ensure access key status is persisted and defaulted to Active (#8341 ) * Fix master leader election startup issue Fixes #error-log-leader-not-selected-yet * not useful test * fix(iam): ensure access key status is persisted and defaulted to Active * make pb * update tests * using constants	2026-02-13 20:28:41 -08:00
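The defaulting rule in this commit can be read as "a missing status means Active". A small sketch of that rule, with a hypothetical `AccessKey` type and constant names (the commit only says constants are used; the real protobuf fields may be named differently):

```go
package main

import "fmt"

// Hypothetical constants and type for illustration only.
const (
	AccessKeyStatusActive   = "Active"
	AccessKeyStatusInactive = "Inactive"
)

type AccessKey struct {
	AccessKeyId string
	Status      string
}

// effectiveStatus treats an empty status (for example, a key stored
// before the field was persisted) as Active; explicit values are kept.
func effectiveStatus(k AccessKey) string {
	if k.Status == "" {
		return AccessKeyStatusActive
	}
	return k.Status
}

func main() {
	legacy := AccessKey{AccessKeyId: "AKIDLEGACY"}
	disabled := AccessKey{AccessKeyId: "AKIDOFF", Status: AccessKeyStatusInactive}
	fmt.Println(effectiveStatus(legacy), effectiveStatus(disabled))
}
```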
Lisandro Pin	1a5679a5eb	Implement a `VolumeEcStatus()` RPC for volume servers. (#8006 ) Just like `VolumeStatus()`, this call allows inspecting details for a given EC volume - including number of files and their total size.	2026-02-09 11:52:08 -08:00
Lisandro Pin	2cda4289f4	Add a version token on RPCs to read/update volume server states. (#8191 ) * Add a version token on `GetState()`/`SetState()` RPCs for volume server states. * Make state version a property of `VolumeServerState` instead of an in-memory counter. Also extend state atomicity to reads, instead of just writes.	2026-02-06 10:58:43 -08:00
Lisandro Pin	9d751a7b61	Contrib/volume scrub local (#8226 )	2026-02-05 14:44:12 -08:00
Lisandro Pin	ff5a8f0579	Implement RPC skeleton for regular/EC volumes scrubbing. (#8187 ) * Implement RPC skeleton for regular/EC volumes scrubbing. See https://github.com/seaweedfs/seaweedfs/issues/8018 for details. * Minor proto improvements for `ScrubVolume()`, `ScrubEcVolume()`: - Add fields for scrubbing details in `ScrubVolumeResponse` and `ScrubEcVolumeResponse`, instead of reporting these through RPC errors. - Return a list of broken shards when scrubbing EC volumes, via `EcShardInfo`.	2026-02-02 17:55:04 -08:00
Lisandro Pin	345ac950b6	Add volume server RPCs to read and update state flags. (#8186 ) * Bootstrap persistent state for volume servers. This PR implements logic to load/save persistent state information for the storage associated with volume servers, and to report state changes back to masters via heartbeat messages. More work ensues! See https://github.com/seaweedfs/seaweedfs/issues/7977 for details. * Add volume server RPCs to read and update state flags.	2026-02-02 16:22:17 -08:00
Lisandro Pin	9638d37fe2	Block RPC write operations on volume servers when maintenance mode is enabled (#8115 ) * Bootstrap persistent state for volume servers. This PR implements logic to load/save persistent state information for the storage associated with volume servers, and to report state changes back to masters via heartbeat messages. More work ensues! See https://github.com/seaweedfs/seaweedfs/issues/7977 for details. * Block RPC operations writing to volume servers when maintenance mode is on.	2026-02-02 13:21:02 -08:00
Chris Lu	6bf088cec9	IAM Policy Management via gRPC (#8109 ) * Add IAM gRPC service definition - Add GetConfiguration/PutConfiguration for config management - Add CreateUser/GetUser/UpdateUser/DeleteUser/ListUsers for user management - Add CreateAccessKey/DeleteAccessKey/GetUserByAccessKey for access key management - Methods mirror existing IAM HTTP API functionality * Add IAM gRPC handlers on filer server - Implement IamGrpcServer with CredentialManager integration - Handle configuration get/put operations - Handle user CRUD operations - Handle access key create/delete operations - All methods delegate to CredentialManager for actual storage * Wire IAM gRPC service to filer server - Add CredentialManager field to FilerOption and FilerServer - Import credential store implementations in filer command - Initialize CredentialManager from credential.toml if available - Register IAM gRPC service on filer gRPC server - Enable credential management via gRPC alongside existing filer services * Regenerate IAM protobuf with gRPC service methods * iam_pb: add Policy Management to protobuf definitions * credential: implement PolicyManager in credential stores * filer: implement IAM Policy Management RPCs * shell: add s3.policy command * test: add integration test for s3.policy * test: fix compilation errors in policy_test * pb * fmt * test * weed shell: add -policies flag to s3.configure This allows linking/unlinking IAM policies to/from identities directly from the s3.configure command. * test: verify s3.configure policy linking and fix port allocation - Added test case for linking policies to users via s3.configure - Implemented findAvailablePortPair to ensure HTTP and gRPC ports are both available, avoiding conflicts with randomized port assignments. 
- Updated assertion to match jsonpb output (policyNames) * credential: add StoreTypeGrpc constant * credential: add IAM gRPC store boilerplate * credential: implement identity methods in gRPC store * credential: implement policy methods in gRPC store * admin: use gRPC credential store for AdminServer This ensures that all IAM and policy changes made through the Admin UI are persisted via the Filer's IAM gRPC service instead of direct file manipulation. * shell: s3.configure use granular IAM gRPC APIs instead of full config patching * shell: s3.configure use granular IAM gRPC APIs * shell: replace deprecated ioutil with os in s3.policy * filer: use gRPC FailedPrecondition for unconfigured credential manager * test: improve s3.policy integration tests and fix error checks * ci: add s3 policy shell integration tests to github workflow * filer: fix LoadCredentialConfiguration error handling * credential/grpc: propagate unmarshal errors in GetPolicies * filer/grpc: improve error handling and validation * shell: use gRPC status codes in s3.configure * credential: document PutPolicy as create-or-replace * credential/postgres: reuse CreatePolicy in PutPolicy to deduplicate logic * shell: add timeout context and strictly enforce flags in s3.policy * iam: standardize policy content field naming in gRPC and proto * shell: extract slice helper functions in s3.configure * filer: map credential store errors to gRPC status codes * filer: add input validation for UpdateUser and CreateAccessKey * iam: improve validation in policy and config handlers * filer: ensure IAM service registration by defaulting credential manager * credential: add GetStoreName method to manager * test: verify policy deletion in integration test	2026-01-25 13:39:30 -08:00
Lisandro Pin	59d40f7186	Return volume server state flags via `VolumeServerStatus()` RPCs. (#8016 )	2026-01-24 21:45:23 -08:00
Lisandro Pin	2af293ce60	Bootstrap persistent state for volume servers. (#7984 ) This PR implements logic to load/save persistent state information for the storage associated with volume servers, and to report state changes back to masters via heartbeat messages. More work ensues! See https://github.com/seaweedfs/seaweedfs/issues/7977 for details.	2026-01-12 10:49:59 -08:00
promalert	9012069bd7	chore: execute goimports to format the code (#7983 ) * chore: execute goimports to format the code Signed-off-by: promalert <promalert@outlook.com> * goimports -w . --------- Signed-off-by: promalert <promalert@outlook.com> Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-01-07 13:06:08 -08:00
Chris Lu	e67973dc53	Support Policy Attachment for Object Store Users (#7981 ) * Implement Policy Attachment support for Object Store Users - Added policy_names field to iam.proto and regenerated protos. - Updated S3 API and IAM integration to support direct policy evaluation for users. - Enhanced Admin UI to allow attaching policies to users via modals. - Renamed 'policies' to 'policy_names' to clarify that it stores identifiers. - Fixed syntax error in user_management.go. * Fix policy dropdown not populating The API returns {policies: [...]} but JavaScript was treating response as direct array. Updated loadPolicies() to correctly access data.policies property. * Add null safety checks for policy dropdowns Added checks to prevent "undefined" errors when: - Policy select elements don't exist - Policy dropdowns haven't loaded yet - User is being edited before policies are loaded * Fix policy dropdown by using correct JSON field name JSON response has lowercase 'name' field but JavaScript was accessing 'Name'. Changed policy.Name to policy.name to match the IAMPolicy JSON structure. * Fix policy names not being saved on user update Changed condition from len(req.PolicyNames) > 0 to req.PolicyNames != nil to ensure policy names are always updated when present in the request, even if it's an empty array (to allow clearing policies). * Add debug logging for policy names update flow Added console.log in frontend and glog in backend to trace policy_names data through the update process. * Temporarily disable auto-reload for debugging Commented out window.location.reload() so console logs are visible when updating a user. * Add detailed debug logging and alert for policy selection Added console.log for each step and an alert to show policy_names value to help diagnose why it's not being included in the request. * Regenerate templ files for object_store_users Ran templ generate to ensure _templ.go files are up to date with the latest .templ changes including debug logging. 
* Remove debug logging and restore normal functionality Cleaned up temporary debug code (console.log and alert statements) and re-enabled automatic page reload after user update. * Add step-by-step alert debugging for policy update Added 5 alert checkpoints to trace policy data through the update flow: 1. Check if policiesSelect element exists 2. Show selected policy values 3. Show userData.policy_names 4. Show full request body 5. Confirm server response Temporarily disabled auto-reload to see alerts. * Add version check alert on page load Added alert on DOMContentLoaded to verify new JavaScript is being executed and not cached by the browser. * Compile templates using make Ran make to compile all template files and install the weed binary. * Add button click detection and make handleUpdateUser global - Added inline alert on button click to verify click is detected - Made handleUpdateUser a window-level function to ensure it's accessible - Added alert at start of handleUpdateUser function * Fix handleUpdateUser scope issue - remove duplicate definition Removed duplicate function definition that was inside DOMContentLoaded. Now handleUpdateUser is defined only once in global scope (line 383) making it accessible when button onclick fires. * Remove all duplicate handleUpdateUser definitions Now handleUpdateUser is defined only once at the very top of the script block (line 352), before DOMContentLoaded, ensuring it's available when the button onclick fires. * Add function existence check and error catching Added alerts to check if handleUpdateUser is defined and wrapped the function call in try-catch to capture any JavaScript errors. Also added console.log statements to verify function definition. * Simplify handleUpdateUser to non-async for testing Removed async/await and added early return to test if function can be called at all. This will help identify if async is causing the issue. 
* Add cache-control headers to prevent browser caching Added no-cache headers to ShowObjectStoreUsers handler to prevent aggressive browser caching of inline JavaScript in the HTML page. * Fix syntax error - make handleUpdateUser async Changed function back to async to fix 'await is only valid in async functions' error. The cache-control headers are working - browser is now loading new code. * Update version check to v3 to verify cache busting Changed version alert to 'v3 - WITH EARLY RETURN' to confirm the new code with early return statement is being loaded. * Remove all debug code - clean implementation Removed all alerts, console.logs, and test code. Implemented clean policy update functionality with proper error handling. * Add ETag header for cache-busting and update walkthrough * Fix policy pre-selection in Edit User modal - Updated admin.js editUser function to pre-select policies - Root cause: duplicate editUser in admin.js overwrote inline version - Added policy pre-selection logic to match inline template - Verified working in browser: policies now pre-select correctly * Fix policy persistence in handleUpdateUser - Added policy_names field to userData payload in handleUpdateUser - Policies were being lost because handleUpdateUser only sent email and actions - Now collects selected policies from editPolicies dropdown - Verified working: policies persist correctly across updates * Fix XSS vulnerability in access keys display - Escape HTML in access key display using escapeHtml utility - Replace inline onclick handlers with data attributes - Add event delegation for delete access key buttons - Prevents script injection via malicious access key values * Fix additional XSS vulnerabilities in user details display - Escape HTML in actions badges (line 626) - Escape HTML in policy_names badges (line 636) - Prevents script injection via malicious action or policy names * Fix XSS vulnerability in loadPolicies function - Replace innerHTML string concatenation with DOM API - Use createElement and textContent for safe policy name insertion - Prevents script injection via malicious policy names - Apply same pattern to both create and edit select elements * Remove debug logging from UpdateObjectStoreUser - Removed glog.V(0) debug statements - Clean up temporary debugging code before production * Remove duplicate handleUpdateUser function - Removed inline handleUpdateUser that duplicated admin.js logic - Removed debug console.log statement - admin.js version is now the single source of truth - Eliminates maintenance burden of keeping two versions in sync * Refine user management and address code review feedback - Preserve PolicyNames in UpdateUserPolicies - Allow clearing actions in UpdateObjectStoreUser by checking for nil - Remove version comment from object_store_users.templ - Refactor loadPolicies for DRYness using cloneNode while keeping DOM API security * IAM Authorization for Static Access Keys * verified XSS Fixes in Templates * fix div	2026-01-06 21:53:28 -08:00
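The switch from `len(req.PolicyNames) > 0` to `req.PolicyNames != nil` works because Go's `encoding/json` leaves an omitted slice field `nil` but decodes an explicit `[]` into a non-nil empty slice. A minimal sketch of that distinction; the `updateUserRequest` type and helper names are hypothetical, and only the `policy_names` JSON field comes from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// updateUserRequest is a hypothetical request body for illustration.
type updateUserRequest struct {
	Email       string   `json:"email"`
	PolicyNames []string `json:"policy_names"`
}

func parseUpdate(body string) (updateUserRequest, error) {
	var req updateUserRequest
	err := json.Unmarshal([]byte(body), &req)
	return req, err
}

// applyPolicyNames replaces the stored names whenever the field was sent,
// including as an empty array, and keeps them when it was omitted.
// A len(req.PolicyNames) > 0 check could never clear the list.
func applyPolicyNames(current []string, req updateUserRequest) []string {
	if req.PolicyNames != nil {
		return req.PolicyNames
	}
	return current
}

func main() {
	current := []string{"readonly-policy"}
	for _, body := range []string{`{"email":"a@example.com"}`, `{"policy_names":[]}`} {
		req, err := parseUpdate(body)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %q\n", body, applyPolicyNames(current, req))
	}
}
```

Note that an explicit `"policy_names": null` also decodes to `nil`, so under this sketch it behaves like an omitted field.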
Chris Lu	208d7f24f4	Erasure Coding: Ec refactoring (#7396 ) * refactor: add ECContext structure to encapsulate EC parameters - Create ec_context.go with ECContext struct - NewDefaultECContext() creates context with default 10+4 configuration - Helper methods: CreateEncoder(), ToExt(), String() - Foundation for cleaner function signatures - No behavior change, still uses hardcoded 10+4 * refactor: update ec_encoder.go to use ECContext - Add WriteEcFilesWithContext() and RebuildEcFilesWithContext() functions - Keep old functions for backward compatibility (call new versions) - Update all internal functions to accept ECContext parameter - Use ctx.DataShards, ctx.ParityShards, ctx.TotalShards consistently - Use ctx.CreateEncoder() instead of hardcoded reedsolomon.New() - Use ctx.ToExt() for shard file extensions - No behavior change, still uses default 10+4 configuration * refactor: update ec_volume.go to use ECContext - Add ECContext field to EcVolume struct - Initialize ECContext with default configuration in NewEcVolume() - Update LocateEcShardNeedleInterval() to use ECContext.DataShards - Phase 1: Always uses default 10+4 configuration - No behavior change * refactor: add EC shard count fields to VolumeInfo protobuf - Add data_shards_count field (field 8) to VolumeInfo message - Add parity_shards_count field (field 9) to VolumeInfo message - Fields are optional, 0 means use default (10+4) - Backward compatible: fields added at end - Phase 1: Foundation for future customization * refactor: regenerate protobuf Go files with EC shard count fields - Regenerated volume_server_pb/*.go with new EC fields - DataShardsCount and ParityShardsCount accessors added to VolumeInfo - No behavior change, fields not yet used * refactor: update VolumeEcShardsGenerate to use ECContext - Create ECContext with default configuration in VolumeEcShardsGenerate - Use ecCtx.TotalShards and ecCtx.ToExt() in cleanup - Call WriteEcFilesWithContext() instead of WriteEcFiles() - Save EC configuration (DataShardsCount, ParityShardsCount) to VolumeInfo - Log EC context being used - Phase 1: Always uses default 10+4 configuration - No behavior change * fmt * refactor: update ec_test.go to use ECContext - Update TestEncodingDecoding to create and use ECContext - Update validateFiles() to accept ECContext parameter - Update removeGeneratedFiles() to use ctx.TotalShards and ctx.ToExt() - Test passes with default 10+4 configuration * refactor: use EcShardConfig message instead of separate fields * optimize: pre-calculate row sizes in EC encoding loop * refactor: replace TotalShards field with Total() method - Remove TotalShards field from ECContext to avoid field drift - Add Total() method that computes DataShards + ParityShards - Update all references to use ctx.Total() instead of ctx.TotalShards - Read EC config from VolumeInfo when loading EC volumes - Read data shard count from .vif in VolumeEcShardsToVolume - Use >= instead of > for exact boundary handling in encoding loops * optimize: simplify VolumeEcShardsToVolume to use existing EC context - Remove redundant CollectEcShards call - Remove redundant .vif file loading - Use v.ECContext.DataShards directly (already loaded by NewEcVolume) - Slice tempShards instead of collecting again * refactor: rename MaxShardId to MaxShardCount for clarity - Change from MaxShardId=31 to MaxShardCount=32 - Eliminates confusing +1 arithmetic (MaxShardId+1) - More intuitive: MaxShardCount directly represents the limit * fix: support custom EC ratios beyond 14 shards in VolumeEcShardsToVolume - Add MaxShardId constant (31, since ShardBits is uint32) - Use MaxShardId+1 (32) instead of TotalShardsCount (14) for tempShards buffer - Prevents panic when slicing for volumes with >14 total shards - Critical fix for custom EC configurations like 20+10 * fix: add validation for EC shard counts from VolumeInfo - Validate DataShards/ParityShards are positive and within MaxShardCount - Prevent zero or invalid values that could cause divide-by-zero - Fall back to defaults if validation fails, with warning log - VolumeEcShardsGenerate now preserves existing EC config when regenerating - Critical safety fix for corrupted or legacy .vif files * fix: RebuildEcFiles now loads EC config from .vif file - Critical: RebuildEcFiles was always using default 10+4 config - Now loads actual EC config from .vif file when rebuilding shards - Validates config before use (positive shards, within MaxShardCount) - Falls back to default if .vif missing or invalid - Prevents data corruption when rebuilding custom EC volumes * add: defensive validation for dataShards in VolumeEcShardsToVolume - Validate dataShards > 0 and <= MaxShardCount before use - Prevents panic from corrupted or uninitialized ECContext - Returns clear error message instead of panic - Defense-in-depth: validates even though upstream should catch issues * fix: replace TotalShardsCount with MaxShardCount for custom EC ratio support Critical fixes to support custom EC ratios > 14 shards: disk_location_ec.go: - validateEcVolume: Check shards 0-31 instead of 0-13 during validation - removeEcVolumeFiles: Remove shards 0-31 instead of 0-13 during cleanup ec_volume_info.go ShardBits methods: - ShardIds(): Iterate up to MaxShardCount (32) instead of TotalShardsCount (14) - ToUint32Slice(): Iterate up to MaxShardCount (32) - IndexToShardId(): Iterate up to MaxShardCount (32) - MinusParityShards(): Remove shards 10-31 instead of 10-13 (added note about Phase 2) - Minus() shard size copy: Iterate up to MaxShardCount (32) - resizeShardSizes(): Iterate up to MaxShardCount (32) Without these changes: - Custom EC ratios > 14 total shards would fail validation on startup - Shards 14-31 would never be discovered or cleaned up - ShardBits operations would miss shards >= 14 These changes are backward compatible - MaxShardCount (32) includes the default TotalShardsCount (14), so existing 10+4 volumes work as before.
* fix: replace TotalShardsCount with MaxShardCount in critical data structures Critical fixes for buffer allocations and loops that must support custom EC ratios up to 32 shards: Data Structures: - store_ec.go:354: Buffer allocation for shard recovery (bufs array) - topology_ec.go:14: EcShardLocations.Locations fixed array size - command_ec_rebuild.go:268: EC shard map allocation - command_ec_common.go:626: Shard-to-locations map allocation Shard Discovery Loops: - ec_task.go:378: Loop to find generated shard files - ec_shard_management.go: All 8 loops that check/count EC shards These changes are critical because: 1. Buffer allocations sized to 14 would cause index-out-of-bounds panics when accessing shards 14-31 2. Fixed arrays sized to 14 would truncate shard location data 3. Loops limited to 0-13 would never discover/manage shards 14-31 Note: command_ec_encode.go:208 intentionally NOT changed - it creates shard IDs to mount after encoding. In Phase 1 we always generate 14 shards, so this remains TotalShardsCount and will be made dynamic in Phase 2 based on actual EC context. Without these fixes, custom EC ratios > 14 total shards would cause: - Runtime panics (array index out of bounds) - Data loss (shards 14-31 never discovered/tracked) - Incomplete shard management (missing shards not detected) * refactor: move MaxShardCount constant to ec_encoder.go Moved MaxShardCount from ec_volume_info.go to ec_encoder.go to group it with other shard count constants (DataShardsCount, ParityShardsCount, TotalShardsCount). This improves code organization and makes it easier to understand the relationship between these constants. Location: ec_encoder.go line 22, between TotalShardsCount and MinTotalDisks * improve: add defensive programming and better error messages for EC Code review improvements from CodeRabbit: 1. ShardBits Guardrails (ec_volume_info.go): - AddShardId, RemoveShardId: Reject shard IDs >= MaxShardCount - HasShardId: Return false for out-of-range shard IDs - Prevents silent no-ops from bit shifts with invalid IDs 2. Future-Proof Regex (disk_location_ec.go): - Updated regex from \.ec[0-9][0-9] to \.ec\d{2,3} - Now matches .ec00 through .ec999 (currently .ec00-.ec31 used) - Supports future increases to MaxShardCount beyond 99 3. Better Error Messages (volume_grpc_erasure_coding.go): - Include valid range (1..32) in dataShards validation error - Helps operators quickly identify the problem 4. Validation Before Save (volume_grpc_erasure_coding.go): - Validate ECContext (DataShards > 0, ParityShards > 0, Total <= MaxShardCount) - Log EC config being saved to .vif for debugging - Prevents writing invalid configs to disk These changes improve robustness and debuggability without changing core functionality. * fmt * fix: critical bugs from code review + clean up comments Critical bug fixes: 1. command_ec_rebuild.go: Fixed indentation causing compilation error - Properly nested if/for blocks in registerEcNode 2. ec_shard_management.go: Fixed isComplete logic incorrectly using MaxShardCount - Changed from MaxShardCount (32) back to TotalShardsCount (14) - Default 10+4 volumes were being incorrectly reported as incomplete - Missing shards 14-31 were being incorrectly reported as missing - Fixed in 4 locations: volume completeness checks and getMissingShards 3. ec_volume_info.go: Fixed MinusParityShards removing too many shards - Changed from MaxShardCount (32) back to TotalShardsCount (14) - Was incorrectly removing shard IDs 10-31 instead of just 10-13 Comment cleanup: - Removed Phase 1/Phase 2 references (development plan context) - Replaced with clear statements about default 10+4 configuration - SeaweedFS repo uses fixed 10+4 EC ratio, no phases needed Root cause: Over-aggressive replacement of TotalShardsCount with MaxShardCount.
MaxShardCount (32) is the limit for buffer allocations and shard ID loops, but TotalShardsCount (14) must be used for default EC configuration logic. * fix: add defensive bounds checks and compute actual shard counts Critical fixes from code review: 1. topology_ec.go: Add defensive bounds checks to AddShard/DeleteShard - Prevent panic when shardId >= MaxShardCount (32) - Return false instead of crashing on out-of-range shard IDs 2. command_ec_common.go: Fix doBalanceEcShardsAcrossRacks - Was using hardcoded TotalShardsCount (14) for all volumes - Now computes actual totalShardsForVolume from rackToShardCount - Fixes incorrect rebalancing for volumes with custom EC ratios - Example: 5+2=7 shards would incorrectly use 14 as average These fixes improve robustness and prepare for future custom EC ratios without changing current behavior for default 10+4 volumes. Note: MinusParityShards and ec_task.go intentionally NOT changed for seaweedfs repo - these will be enhanced in seaweed-enterprise repo where custom EC ratio configuration is added. * fmt * style: make MaxShardCount type casting explicit in loops Improved code clarity by explicitly casting MaxShardCount to the appropriate type when used in loop comparisons: - ShardId comparisons: Cast to ShardId(MaxShardCount) - uint32 comparisons: Cast to uint32(MaxShardCount) Changed in 5 locations: - Minus() loop (line 90) - ShardIds() loop (line 143) - ToUint32Slice() loop (line 152) - IndexToShardId() loop (line 219) - resizeShardSizes() loop (line 248) This makes the intent explicit and improves type safety readability. No functional changes - purely a style improvement.	2025-10-27 22:13:31 -07:00
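Several points in this commit (computing `Total()` instead of storing it, bounds-checking `ShardBits` against `MaxShardCount`, and falling back to 10+4 when a stored config is invalid) fit together as in the simplified sketch below. Type and method names echo the commit message, but the bodies, including `contextOrDefault`, are illustrative assumptions rather than the code in the SeaweedFS repository.

```go
package main

import "fmt"

const (
	DataShardsCount   = 10
	ParityShardsCount = 4
	TotalShardsCount  = DataShardsCount + ParityShardsCount
	MaxShardCount     = 32 // ShardBits is a uint32 bitmask, so at most 32 shards
)

type ShardId uint8
type ShardBits uint32

// AddShardId ignores out-of-range IDs instead of shifting past bit 31.
func (b ShardBits) AddShardId(id ShardId) ShardBits {
	if id >= MaxShardCount {
		return b
	}
	return b | 1<<id
}

// HasShardId reports false for out-of-range IDs.
func (b ShardBits) HasShardId(id ShardId) bool {
	return id < MaxShardCount && b&(1<<id) != 0
}

// ShardIds scans all MaxShardCount positions, not just the default 14.
func (b ShardBits) ShardIds() []ShardId {
	var ids []ShardId
	for i := ShardId(0); i < ShardId(MaxShardCount); i++ {
		if b.HasShardId(i) {
			ids = append(ids, i)
		}
	}
	return ids
}

// ECContext carries the data/parity split; Total is computed, not stored,
// so it cannot drift out of sync with the two counts.
type ECContext struct {
	DataShards, ParityShards int
}

func (c ECContext) Total() int { return c.DataShards + c.ParityShards }

// ToExt returns the shard file extension, e.g. ".ec05".
func (c ECContext) ToExt(shard int) string { return fmt.Sprintf(".ec%02d", shard) }

// Validate rejects zero/negative counts and totals above MaxShardCount.
func (c ECContext) Validate() error {
	if c.DataShards <= 0 || c.ParityShards <= 0 || c.Total() > MaxShardCount {
		return fmt.Errorf("invalid EC config %d+%d: need data > 0, parity > 0, total <= %d",
			c.DataShards, c.ParityShards, MaxShardCount)
	}
	return nil
}

// contextOrDefault falls back to 10+4 when a stored config is missing or invalid.
func contextOrDefault(data, parity int) ECContext {
	c := ECContext{DataShards: data, ParityShards: parity}
	if c.Validate() != nil {
		return ECContext{DataShards: DataShardsCount, ParityShards: ParityShardsCount}
	}
	return c
}

func main() {
	var b ShardBits
	b = b.AddShardId(3).AddShardId(20).AddShardId(40) // 40 is out of range and ignored
	fmt.Println(b.ShardIds())

	custom := contextOrDefault(20, 10)
	fmt.Println(custom.Total(), custom.ToExt(25))
	fmt.Println(contextOrDefault(0, 4).Total()) // invalid config falls back to 10+4
}
```

Keeping `MaxShardCount` for buffer sizes and bit loops, while the default-configuration logic still uses `TotalShardsCount`, mirrors the "root cause" note above about over-aggressive replacement.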
Chris Lu	891a2fb6eb	Admin: misc improvements on admin server and workers. EC now works. (#7055 ) * initial design * added simulation as tests * reorganized the codebase to move the simulation framework and tests into their own dedicated package * integration test. ec worker task * remove "enhanced" reference * start master, volume servers, filer Current Status ✅ Master: Healthy and running (port 9333) ✅ Filer: Healthy and running (port 8888) ✅ Volume Servers: All 6 servers running (ports 8080-8085) 🔄 Admin/Workers: Will start when dependencies are ready * generate write load * tasks are assigned * admin start with grpc port. worker has its own working directory * Update .gitignore * working worker and admin. Task detection is not working yet. * compiles, detection uses volumeSizeLimitMB from master * compiles * worker retries connecting to admin * build and restart * rendering pending tasks * skip task ID column * sticky worker id * test canScheduleTaskNow * worker reconnect to admin * clean up logs * worker register itself first * worker can run ec work and report status but: 1. one volume should not be repeatedly worked on. 2. ec shards need to be distributed and source data should be deleted. * move ec task logic * listing ec shards * local copy, ec. Need to distribute.
* ec is mostly working now * distribution of ec shards needs improvement * need configuration to enable ec * show ec volumes * interval field UI component * rename * integration test with vacuuming * garbage percentage threshold * fix warning * display ec shard sizes * fix ec volumes list * Update ui.go * show default values * ensure correct default value * MaintenanceConfig use ConfigField * use schema defined defaults * config * reduce duplication * refactor to use BaseUIProvider * each task register its schema * checkECEncodingCandidate use ecDetector * use vacuumDetector * use volumeSizeLimitMB * remove remove * remove unused * refactor * use new framework * remove v2 reference * refactor * left menu can scroll now * The maintenance manager was not being initialized when no data directory was configured for persistent storage. * saving config * Update task_config_schema_templ.go * enable/disable tasks * protobuf encoded task configurations * fix system settings * use ui component * remove logs * interface{} Reduction * reduce interface{} * reduce interface{} * avoid from/to map * reduce interface{} * refactor * keep it DRY * added logging * debug messages * debug level * debug * show the log caller line * use configured task policy * log level * handle admin heartbeat response * Update worker.go * fix EC rack and dc count * Report task status to admin server * fix task logging, simplify interface checking, use erasure_coding constants * factor in empty volume server during task planning * volume.list adds disk id * track disk id also * fix locking scheduled and manual scanning * add active topology * simplify task detector * ec task completed, but shards are not showing up * implement ec in ec_typed.go * adjust log level * dedup * implementing ec copying shards and only ecx files * use disk id when distributing ec shards 🎯 Planning: ActiveTopology creates DestinationPlan with specific TargetDisk 📦 Task Creation: maintenance_integration.go creates ECDestination with DiskId 🚀 Task Execution: EC task passes DiskId in VolumeEcShardsCopyRequest 💾 Volume Server: Receives disk_id and stores shards on specific disk (vs.store.Locations[req.DiskId]) 📂 File System: EC shards and metadata land in the exact disk directory planned * Delete original volume from all locations * clean up existing shard locations * local encoding and distributing * Update docker/admin_integration/EC-TESTING-README.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * check volume id range * simplify * fix tests * fix types * clean up logs and tests --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-30 12:38:03 -07:00
chrislu	c602f53a6e	tail-volume-uses-the-source-volume-version	2025-06-16 22:46:13 -07:00
chrislu	96632a34b1	add version to volume proto	2025-06-16 22:05:06 -07:00
Chris Lu	cc05874d06	Add message queue agent (#6463 ) * scaffold message queue agent * adjust proto, add mq_agent * add agent client implementation * remove unused function * agent publish server implementation * adding agent	2025-01-20 22:19:27 -08:00
chrislu	9873b033d1	backward compatible vif loading	2024-10-28 19:44:30 -07:00
chrislu	2f3d820f52	rename proto field This should not have any impact.	2024-10-24 21:36:56 -07:00
chrislu	ae5bd0667a	rename proto field from DestroyTime to expire_at_sec For TTL volume converted into EC volume, this change may leave the volumes staying.	2024-10-24 21:35:11 -07:00
Max Denushev	d056c0ddf2	fix(volume): don't persist RO state in specific cases (#6058 ) * fix(volume): don't persist RO state in specific cases * fix(volume): writable always persist	2024-09-24 16:15:54 -07:00
Guang Jiong Lou	6c986e9d70	improve worm support (#5983 ) * improve worm support Signed-off-by: lou <alex1988@outlook.com> * worm mode in filer Signed-off-by: lou <alex1988@outlook.com> * update after review Signed-off-by: lou <alex1988@outlook.com> * update after review Signed-off-by: lou <alex1988@outlook.com> * move to fs configure Signed-off-by: lou <alex1988@outlook.com> * remove flag Signed-off-by: lou <alex1988@outlook.com> * update after review Signed-off-by: lou <alex1988@outlook.com> * support worm hardlink Signed-off-by: lou <alex1988@outlook.com> * update after review Signed-off-by: lou <alex1988@outlook.com> * typo Signed-off-by: lou <alex1988@outlook.com> * sync filer conf Signed-off-by: lou <alex1988@outlook.com> --------- Signed-off-by: lou <alex1988@outlook.com>	2024-09-16 21:02:21 -07:00
Bruce	f9e141a412	persist readonly state to volume info (#5977 )	2024-09-05 07:58:24 -07:00
augustazz	0b00706454	EC volume supports expiration and displays expiration message when executing volume.list (#5895 ) * ec volume expire * volume.list show DestroyTime * comments * code optimization --------- Co-authored-by: xuwenfeng <xuwenfeng1@zto.com>	2024-08-16 00:20:00 -07:00
chrislu	fdf7193ae7	rename	2024-08-13 13:59:24 -07:00
chrislu	07f4998188	add dat file size into vif for EC	2024-08-13 13:56:00 -07:00
chrislu	7e443ef0a1	latest protoc-gen-go	2024-02-29 10:06:23 -08:00
chrislu	1b4484bf0a	go fmt	2024-02-29 09:38:52 -08:00
Chris Lu	e27deed4bc	upgrade protoc	2024-02-05 18:39:08 -08:00
chrislu	deb86ff4a6	upgrading grpc	2023-10-12 21:38:34 -07:00
chrislu	de0b969b36	Revert "rename" This reverts commit `35b5264ab7`.	2023-10-12 20:28:11 -07:00
chrislu	35b5264ab7	rename	2023-10-11 21:44:56 -07:00
Konstantin Lebedev	2b3e39397e	fix: skipping checking active volumes with the same number of files at the moment (#4893 ) * fix: skipping checking active volumes with the same number of files at the moment https://github.com/seaweedfs/seaweedfs/issues/4140 * refactor with comments https://github.com/seaweedfs/seaweedfs/issues/4140 * add TestShouldSkipVolume --------- Co-authored-by: Konstantin Lebedev <9497591+kmlebedev@users.noreply.github.co>	2023-10-09 09:57:26 -07:00
chrislu	358cba43ef	update proto generated files	2023-09-30 13:19:25 -07:00
chrislu	9d589b48e6	rename function	2023-09-26 15:17:33 -07:00
chrislu	504ae8383a	protoc version	2023-08-28 09:01:25 -07:00
chrislu	dbcba75271	rename to lookup	2023-08-27 18:59:04 -07:00
chrislu	8ec1bc2c99	remove unused cluster node leader	2023-06-19 18:19:13 -07:00
Konstantin Lebedev	25535e9c36	Delete volume is empty (#4561 ) * use onlyEmpty for deleteVolume https://github.com/seaweedfs/seaweedfs/issues/4559 * fix IsEmpty * fix test --------- Co-authored-by: Konstantin Lebedev <9497591+kmlebedev@users.noreply.github.co>	2023-06-12 10:42:44 -07:00
wusong	26f15d0079	Fix no more writable volumes by delay judgment (#4548 ) * fix nomore writables volumes while disk free space is sufficient by time delay * reset --------- Co-authored-by: wang wusong <wangwusong@virtaitech.com>	2023-06-05 10:17:21 -07:00
Muhammad Hallaj bin Subery	9bd422d2c9	adding support for B2 region (#4177 ) Co-authored-by: Muhammad Hallaj bin Subery <hallaj@tuta.io>	2023-02-05 21:24:21 -08:00
Guo Lei	d8cfa1552b	support enable/disable vacuum (#4087 ) * stop vacuum * suspend/resume vacuum * remove unused code * rename * rename param	2022-12-28 01:36:44 -08:00
chrislu	e1ca6308cb	add chunk etag when downloading from remote storage fix https://github.com/seaweedfs/seaweedfs/issues/3987	2022-12-10 21:49:07 -08:00
James Hartig	81624de27b	Include name/mime in ReadAllNeedles (#4005 )	2022-11-23 15:59:38 -08:00
James Hartig	4c85da7844	Include meta in ReadAllNeedles (#3991 ) This is useful for doing backups on the data so we can accurately store the last modified time, the compression state, and verify the crc. Previously we were doing VolumeNeedleStatus and then an HTTP request which needlessly read from the dat file twice.	2022-11-20 20:19:41 -08:00
chrislu	ea2637734a	refactor filer proto chunk variable from mtime to modified_ts_ns	2022-10-28 12:53:19 -07:00
Eric Yang	51d462f204	ADHOC: volume fsck using append at ns (#3906 ) * ADHOC: volume fsck using append at ns * nit * nit Co-authored-by: root <root@HQ-10MSTD3EY.roblox.local>	2022-10-24 22:09:38 -07:00
chrislu	de286fe662	shell: volume.move handles volume moved to cloud tier fix https://github.com/seaweedfs/seaweedfs/issues/3803	2022-10-16 17:52:22 -07:00

166 Commits