* Fix issue #7880: Tasks use Volume IDs instead of ip:port
When volume servers are registered with custom IDs, tasks were attempting
to connect using the ID instead of the actual ip:port address, causing
connection failures.
Modified task detection logic in balance, erasure coding, and vacuum tasks
to resolve volume server IDs to their actual ip:port addresses using
ActiveTopology information.
* Use server addresses directly instead of translating from IDs
Modified VolumeHealthMetrics to include ServerAddress field populated
directly from topology DataNodeInfo.Address. Updated task detection
logic to use addresses directly without runtime lookups.
Changes:
- Added ServerAddress field to VolumeHealthMetrics
- Updated maintenance scanner to populate ServerAddress
- Modified task detection to use ServerAddress for Node fields
- Updated DestinationPlan to include TargetAddress
- Removed runtime address lookups in favor of direct address usage
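
A minimal sketch of the idea above, assuming simplified stand-in types: the structs below are trimmed-down and hypothetical, and only the ServerAddress field and the DataNodeInfo.Address source come from this change. The helper name collectMetrics is invented for illustration.

```go
package main

import "fmt"

// DataNodeInfo is a simplified stand-in for the topology node record.
// Only the fields needed for this sketch are modeled (assumption).
type DataNodeInfo struct {
	Id        string   // custom server ID, not necessarily dialable
	Address   string   // ip:port the volume server listens on
	VolumeIds []uint32 // volumes hosted on this node
}

// VolumeHealthMetrics is a trimmed-down, hypothetical version of the real struct.
type VolumeHealthMetrics struct {
	VolumeID      uint32
	Server        string // server ID, kept for display and lookups
	ServerAddress string // ip:port copied from the topology at scan time
}

// collectMetrics (hypothetical helper) copies the address straight from the
// topology node, so task detection never has to translate an ID later.
func collectMetrics(nodes []DataNodeInfo) []VolumeHealthMetrics {
	var out []VolumeHealthMetrics
	for _, n := range nodes {
		for _, vid := range n.VolumeIds {
			out = append(out, VolumeHealthMetrics{
				VolumeID:      vid,
				Server:        n.Id,
				ServerAddress: n.Address,
			})
		}
	}
	return out
}

func main() {
	nodes := []DataNodeInfo{{Id: "vs-a", Address: "10.0.0.5:8080", VolumeIds: []uint32{1, 2}}}
	for _, m := range collectMetrics(nodes) {
		fmt.Printf("volume %d on %s, dial %s\n", m.VolumeID, m.Server, m.ServerAddress)
	}
}
```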
* Address PR comments: add ServerAddress field, improve error handling
- Add missing ServerAddress field to VolumeHealthMetrics struct
- Add warning in vacuum detection when server not found in topology
- Improve error handling in erasure coding to abort the task if sources are missing
- Make vacuum task stricter by skipping if server not found in topology
* Refactor: Extract common address resolution logic into shared utility
- Created weed/worker/tasks/util/address.go with ResolveServerAddress function
- Updated balance, erasure_coding, and vacuum detection to use the shared utility
- Removed code duplication and improved maintainability
- Consistent error handling across all task types
* Fix critical issues in task address resolution
- Vacuum: Require topology availability and fail if server not found (no fallback to ID)
- Ensure all task types consistently fail early when topology is incomplete
- Prevent creation of tasks that would fail due to missing server addresses
* Address additional PR feedback
- Add validation for empty addresses in ResolveServerAddress
- Remove redundant serverAddress variable in vacuum detection
- Improve robustness of address resolution
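
The fail-early resolution described in the last few commits could look roughly like the sketch below. The signature and the map-based topology are assumptions made to keep the example self-contained; the real ResolveServerAddress in weed/worker/tasks/util/address.go works against ActiveTopology and may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeLookup is a hypothetical stand-in for the topology: server ID -> ip:port.
type nodeLookup map[string]string

// resolveServerAddress returns the ip:port for a server ID and fails early
// (no fallback to the ID) when the topology is missing, the server is
// unknown, or the recorded address is empty.
func resolveServerAddress(serverID string, topo nodeLookup) (string, error) {
	if topo == nil {
		return "", errors.New("topology not available")
	}
	addr, ok := topo[serverID]
	if !ok {
		return "", fmt.Errorf("server %q not found in topology", serverID)
	}
	if addr == "" {
		return "", fmt.Errorf("server %q has an empty address in topology", serverID)
	}
	return addr, nil
}

func main() {
	topo := nodeLookup{"vs-a": "10.0.0.5:8080", "vs-b": ""}
	for _, id := range []string{"vs-a", "vs-b", "vs-c"} {
		addr, err := resolveServerAddress(id, topo)
		if err != nil {
			// A detector would skip creating the task here instead of
			// emitting one that cannot connect.
			fmt.Println("skip task:", err)
			continue
		}
		fmt.Println("create task against", addr)
	}
}
```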
* Improve error logging in vacuum detection
- Include actual error details in log message for better diagnostics
- Make error messages consistent with other task types
* refactoring
* add ec shard size
* address comments
* passing task id
There seems to be a disconnect between the pending tasks created in ActiveTopology and the TaskDetectionResult returned by this function. A taskID is generated locally and used to create pending tasks via AddPendingECShardTask, but this taskID is not stored in the TaskDetectionResult or passed along in any way.
This makes it impossible for the worker that eventually executes the task to know which pending task in ActiveTopology it corresponds to. Without the correct taskID, the worker cannot call AssignTask or CompleteTask on the master, breaking the entire task lifecycle and capacity management feature.
A potential solution is to add a TaskID field to TaskDetectionResult and worker_pb.TaskParams, ensuring the ID is propagated from detection to execution.
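
A rough sketch of the proposed propagation, assuming simplified types: TaskDetectionResult and TaskParams below are hypothetical reductions of the real structs, and pendingRegistry stands in for the pending-task bookkeeping in ActiveTopology. The task ID format is invented for illustration.

```go
package main

import "fmt"

// TaskParams is a hypothetical reduction of worker_pb.TaskParams.
type TaskParams struct {
	TaskID   string
	VolumeID uint32
}

// TaskDetectionResult is a hypothetical reduction of the detection result.
type TaskDetectionResult struct {
	TaskID      string
	VolumeID    uint32
	TypedParams *TaskParams
}

// pendingRegistry stands in for the pending-task state kept by ActiveTopology.
type pendingRegistry struct {
	pending map[string]uint32 // task ID -> volume ID
}

func (r *pendingRegistry) addPending(taskID string, vid uint32) { r.pending[taskID] = vid }

// complete succeeds only if the caller presents an ID that was registered.
func (r *pendingRegistry) complete(taskID string) error {
	if _, ok := r.pending[taskID]; !ok {
		return fmt.Errorf("unknown task %q", taskID)
	}
	delete(r.pending, taskID)
	return nil
}

// detect registers the pending task and returns the same ID in the result,
// so the worker can later refer to the matching pending entry.
func detect(r *pendingRegistry, vid uint32, seq int) *TaskDetectionResult {
	taskID := fmt.Sprintf("ec_vol_%d_%d", vid, seq)
	r.addPending(taskID, vid)
	return &TaskDetectionResult{
		TaskID:      taskID,
		VolumeID:    vid,
		TypedParams: &TaskParams{TaskID: taskID, VolumeID: vid},
	}
}

func main() {
	reg := &pendingRegistry{pending: map[string]uint32{}}
	res := detect(reg, 42, 1)
	// The worker only sees the params, which now carry the ID.
	if err := reg.complete(res.TypedParams.TaskID); err != nil {
		fmt.Println("complete failed:", err)
		return
	}
	fmt.Println("completed", res.TaskID)
}
```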
* 1 source multiple destinations
* task supports multi source and destination
* ec needs to clean up previous shards
* use erasure coding constants
* getPlanningCapacityUnsafe and getEffectiveAvailableCapacityUnsafe should return StorageSlotChange for calculation
* use CanAccommodate to calculate
* remove dead code
* address comments
* fix mutex copying in protobuf structs
* use constants
* fix estimatedSize
The calculation for estimatedSize only considers source.EstimatedSize and dest.StorageChange, but omits dest.EstimatedSize. The TaskDestination struct has an EstimatedSize field, which seems to be ignored here. This could lead to an incorrect estimation of the total size of data involved in tasks on a disk. The loop should probably also include estimatedSize += dest.EstimatedSize.
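
The fix suggested in that comment can be sketched as follows. TaskSource and TaskDestination are hypothetical, reduced to the one field that matters here, and the int64 type and the function name estimatedTaskSize are assumptions.

```go
package main

import "fmt"

// TaskSource and TaskDestination are reduced, hypothetical versions of the
// real structs; only EstimatedSize is modeled.
type TaskSource struct {
	EstimatedSize int64
}

type TaskDestination struct {
	EstimatedSize int64
}

// estimatedTaskSize sums the estimated bytes on both sides of a task.
// Leaving out the destination loop reproduces the omission described above.
func estimatedTaskSize(sources []TaskSource, dests []TaskDestination) int64 {
	var estimatedSize int64
	for _, source := range sources {
		estimatedSize += source.EstimatedSize
	}
	for _, dest := range dests {
		estimatedSize += dest.EstimatedSize
	}
	return estimatedSize
}

func main() {
	sources := []TaskSource{{EstimatedSize: 100}}
	dests := []TaskDestination{{EstimatedSize: 30}, {EstimatedSize: 20}}
	fmt.Println("estimated size:", estimatedTaskSize(sources, dests))
}
```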
* at.assignTaskToDisk(task)
* refactoring
* Update weed/admin/topology/internal.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* fail fast
* fix compilation
* Update weed/worker/tasks/erasure_coding/detection.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* indexes for volume and shard locations
* dedup with ToVolumeSlots
* return an additional boolean to indicate success, or an error
* Update abstract_sql_store.go
* fix
* Update weed/worker/tasks/erasure_coding/detection.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update weed/admin/topology/task_management.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* faster findVolumeDisk
* Update weed/worker/tasks/erasure_coding/detection.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update weed/admin/topology/storage_slot_test.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* refactor
* simplify
* remove unused GetDiskStorageImpact function
* refactor
* add comments
* Update weed/admin/topology/storage_impact.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update weed/admin/topology/storage_slot_test.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update storage_impact.go
* AddPendingTask
The unified AddPendingTask function is now the single entry point for all task creation, replacing the previously separate per-task-type functions without changing behavior.
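
As an illustration only, a single entry point that accepts multiple sources and destinations might be shaped like this. The spec types, their fields, and the validation rules are assumptions for the sketch, not the actual definitions in weed/admin/topology.

```go
package main

import (
	"errors"
	"fmt"
)

// The spec types below are hypothetical; field names are illustrative.
type TaskSourceSpec struct {
	ServerID string
	DiskID   uint32
}

type TaskDestinationSpec struct {
	ServerID string
	DiskID   uint32
}

type TaskSpec struct {
	TaskID       string
	TaskType     string // e.g. "balance", "erasure_coding", "vacuum"
	VolumeID     uint32
	Sources      []TaskSourceSpec
	Destinations []TaskDestinationSpec // may be empty, e.g. for vacuum
}

// ActiveTopology is reduced to the pending-task map for this sketch.
type ActiveTopology struct {
	pendingTasks map[string]TaskSpec
}

func NewActiveTopology() *ActiveTopology {
	return &ActiveTopology{pendingTasks: map[string]TaskSpec{}}
}

// AddPendingTask validates the spec and records it; every task type goes
// through this one path instead of a dedicated function per type.
func (at *ActiveTopology) AddPendingTask(spec TaskSpec) error {
	if spec.TaskID == "" {
		return errors.New("task ID is required")
	}
	if len(spec.Sources) == 0 {
		return errors.New("at least one source is required")
	}
	if _, dup := at.pendingTasks[spec.TaskID]; dup {
		return fmt.Errorf("task %q is already pending", spec.TaskID)
	}
	at.pendingTasks[spec.TaskID] = spec
	return nil
}

func main() {
	at := NewActiveTopology()
	err := at.AddPendingTask(TaskSpec{
		TaskID:       "ec-1",
		TaskType:     "erasure_coding",
		VolumeID:     5,
		Sources:      []TaskSourceSpec{{ServerID: "vs-a", DiskID: 0}},
		Destinations: []TaskDestinationSpec{{ServerID: "vs-b"}, {ServerID: "vs-c"}},
	})
	fmt.Println("added:", err == nil)
}
```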
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>