Enhance EC balancing to separate parity and data shards (#8038)

* Enhance EC balancing to separate parity and data shards across racks

* Rename avoidRacks to antiAffinityRacks for clarity

* Implement server-level EC separation for parity/data shards

* Optimize EC balancing: consolidate helpers and extract two-pass selection logic

* Add comprehensive edge case tests for EC balancing logic

* Apply code review feedback: rename select_(), add divide-by-zero guard, fix comment

* Remove unused parameters from doBalanceEcShardsWithinOneRack and add explicit anti-affinity check

* Add disk-level anti-affinity for data/parity shard separation

- Modified pickBestDiskOnNode to accept shardId and dataShardCount
- Implemented explicit anti-affinity: 1000-point penalty for placing data shards on disks with parity (and vice versa)
- Updated all call sites including balancing and evacuation
- For evacuation, disabled anti-affinity by passing dataShardCount=0
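
The disk-level rule in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `diskState`, `scoreDisk`, `pickLowestScore`, and the base score are hypothetical names, and it assumes data shards occupy ids below `dataShardCount` with parity shards after them. Only the 1000-point penalty and the `dataShardCount=0` opt-out come from the commit message.

```go
package main

import "fmt"

// antiAffinityPenalty mirrors the 1000-point penalty named in the commit message.
const antiAffinityPenalty = 1000

// diskState is a hypothetical summary of one disk: a base load score
// (lower is better) plus the ids of this volume's EC shards already on it.
type diskState struct {
	id        uint32
	baseScore int
	shardIds  []int
}

// isDataShard assumes data shards occupy ids [0, dataShardCount).
func isDataShard(shardId, dataShardCount int) bool {
	return shardId < dataShardCount
}

// scoreDisk adds the penalty when the disk already holds a shard of the
// opposite kind (data vs. parity). dataShardCount <= 0 disables the check.
func scoreDisk(d diskState, shardId, dataShardCount int) int {
	score := d.baseScore
	if dataShardCount <= 0 {
		return score
	}
	incomingIsData := isDataShard(shardId, dataShardCount)
	for _, existing := range d.shardIds {
		if isDataShard(existing, dataShardCount) != incomingIsData {
			return score + antiAffinityPenalty
		}
	}
	return score
}

// pickLowestScore returns the id of the lowest-scoring disk, or 0 if
// there are no candidates (the diff treats destDiskId > 0 as valid).
func pickLowestScore(disks []diskState, shardId, dataShardCount int) uint32 {
	var bestId uint32
	bestScore := 0
	for i, d := range disks {
		s := scoreDisk(d, shardId, dataShardCount)
		if i == 0 || s < bestScore {
			bestId, bestScore = d.id, s
		}
	}
	return bestId
}

func main() {
	disks := []diskState{
		{id: 1, baseScore: 10, shardIds: []int{10, 11}}, // holds parity shards
		{id: 2, baseScore: 50, shardIds: []int{0, 1}},   // holds data shards
	}
	fmt.Println(pickLowestScore(disks, 3, 10)) // balancing: anti-affinity on
	fmt.Println(pickLowestScore(disks, 3, 0))  // evacuation: anti-affinity off
}
```

Under these assumptions the penalty is a soft preference: a disk holding the opposite shard kind is still chosen when nothing better exists, which is why passing `dataShardCount=0` is enough to turn the rule off for evacuation.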
Chris Lu
2026-01-15 12:43:44 -08:00
committed by GitHub
parent 905e7e72d9
commit 7eb90fdfd7
3 changed files with 749 additions and 66 deletions

@@ -227,7 +227,8 @@ func (c *commandVolumeServerEvacuate) moveAwayOneEcVolume(commandEnv *CommandEnv
}
vid := needle.VolumeId(ecShardInfo.Id)
// For evacuation, prefer same disk type but allow fallback to other types
-destDiskId := pickBestDiskOnNode(emptyNode, vid, diskType, false)
+// No anti-affinity needed for evacuation (dataShardCount=0)
+destDiskId := pickBestDiskOnNode(emptyNode, vid, diskType, false, shardId, 0)
if destDiskId > 0 {
fmt.Fprintf(writer, "moving ec volume %s%d.%d %s => %s (disk %d)\n", collectionPrefix, ecShardInfo.Id, shardId, thisNode.info.Id, emptyNode.info.Id, destDiskId)
} else {