feat(balance): replica placement validation for volume moves (#8622)

* feat(balance): add replica placement validation for volume moves

When the volume balance detection proposes moving a volume, validate
that the move does not violate the volume's replication policy (e.g.,
ReplicaPlacement=010 requires replicas on different racks). If the
preferred destination violates the policy, fall back to score-based
planning; if the fallback destination also violates it, skip the
volume entirely.
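
The flow described above can be sketched as follows. This is a simplified illustration, not the code from this commit: the `ReplicaLocation` and `Placement` fields, the `isGoodMove` and `chooseDestination` signatures, and the upper-bound check are all assumptions. The real logic is ported from command_volume_fix_replication.go and may differ.

```go
package main

import "fmt"

// ReplicaLocation and Placement are simplified stand-ins (assumed field
// names), not the actual SeaweedFS types.
type ReplicaLocation struct {
	DataCenter, Rack, NodeID string
}

// Placement mirrors the three digits of a code such as 010:
// other data centers, other racks in the same DC, other nodes in the same rack.
type Placement struct {
	DiffDataCenterCount, DiffRackCount, SameRackCount int
}

type rackKey struct{ dc, rack string }

// isGoodMove simulates moving the replica on sourceNodeID to target and
// reports whether the resulting layout stays within the policy's limits.
func isGoodMove(p Placement, replicas []ReplicaLocation, sourceNodeID string, target ReplicaLocation) bool {
	after := make([]ReplicaLocation, 0, len(replicas))
	found := false
	for _, r := range replicas {
		if !found && r.NodeID == sourceNodeID {
			found = true // drop the replica that is being moved
			continue
		}
		after = append(after, r)
	}
	if !found {
		return false // source not among known replicas: inconsistent state
	}
	after = append(after, target)

	nodes := map[string]bool{}
	racksPerDC := map[string]map[string]bool{}
	nodesPerRack := map[rackKey]int{}
	for _, r := range after {
		if nodes[r.NodeID] {
			return false // two replicas would share one node
		}
		nodes[r.NodeID] = true
		if racksPerDC[r.DataCenter] == nil {
			racksPerDC[r.DataCenter] = map[string]bool{}
		}
		racksPerDC[r.DataCenter][r.Rack] = true
		nodesPerRack[rackKey{r.DataCenter, r.Rack}]++
	}
	if len(racksPerDC) > p.DiffDataCenterCount+1 {
		return false
	}
	for _, racks := range racksPerDC {
		if len(racks) > p.DiffRackCount+1 {
			return false
		}
	}
	for _, n := range nodesPerRack {
		if n > p.SameRackCount+1 {
			return false
		}
	}
	return true
}

// chooseDestination tries the preferred target, then the fallback.
// ok is false when both violate the policy and the volume should be skipped.
func chooseDestination(p Placement, replicas []ReplicaLocation, source string, preferred, fallback ReplicaLocation) (dest ReplicaLocation, ok bool) {
	for _, c := range []ReplicaLocation{preferred, fallback} {
		if isGoodMove(p, replicas, source, c) {
			return c, true
		}
	}
	return ReplicaLocation{}, false
}

func main() {
	policy010 := Placement{DiffRackCount: 1}
	replicas := []ReplicaLocation{
		{DataCenter: "dc1", Rack: "rack1", NodeID: "n1"},
		{DataCenter: "dc1", Rack: "rack2", NodeID: "n2"},
	}
	preferred := ReplicaLocation{DataCenter: "dc1", Rack: "rack1", NodeID: "n3"} // same rack as n1
	fallback := ReplicaLocation{DataCenter: "dc1", Rack: "rack3", NodeID: "n4"}
	dest, ok := chooseDestination(policy010, replicas, "n2", preferred, fallback)
	fmt.Println(dest.NodeID, ok) // prints "n4 true"
}
```

In this sketch the preferred target is rejected because it would put both replicas on rack1, so the fallback is used instead.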

- Add ReplicaLocation type and VolumeReplicaMap to ClusterInfo
- Build replica map from all volumes before collection filtering
- Port placement validation logic from command_volume_fix_replication.go
- Thread replica map through collectVolumeMetrics call chain
- Add IsGoodMove check in createBalanceTask before destination use

* address PR review: extract validation closure, add defensive checks

- Extract validateMove closure to eliminate duplicated ReplicaLocation
  construction and IsGoodMove calls
- Add defensive check for empty replica map entries (len(replicas) == 0)
- Add bounds check for int-to-byte cast on ExpectedReplicas (0-255)
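
A minimal sketch of the two defensive checks listed above. The helper name `validationInputs` and its parameters are hypothetical; only the conditions (`len(replicas) == 0` and the 0-255 range) come from the commit message.

```go
package main

import (
	"fmt"
	"math"
)

// validationInputs is a hypothetical helper: it refuses to validate when the
// replica list is empty or when expectedReplicas does not fit into a byte.
func validationInputs(replicas []string, expectedReplicas int) (byte, bool) {
	if len(replicas) == 0 {
		return 0, false // no known replicas for this volume
	}
	if expectedReplicas <= 0 || expectedReplicas > math.MaxUint8 {
		return 0, false // the cast to byte would be meaningless or would truncate
	}
	return byte(expectedReplicas), true
}

func main() {
	for _, n := range []int{2, 255, 256, 0} {
		b, ok := validationInputs([]string{"n1"}, n)
		fmt.Println(n, b, ok)
	}
}
```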

* address nitpick: rp test helper accepts *testing.T and fails on error

Prevents silent failures from typos in replica placement codes.

* address review: add composite replica placement tests (011, 110)

Test multi-constraint placement policies where both rack and DC
rules must be satisfied simultaneously.

* address review: use struct keys instead of string concatenation

Replace string-concatenated map keys with typed rackKey/nodeKey
structs to eliminate allocations and avoid ambiguity if IDs
contain spaces.
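
To illustrate the ambiguity, here is a small sketch. The field names of `rackKey` and `nodeKey` and the space separator in `stringKey` are assumptions made for the example, not taken from the diff.

```go
package main

import "fmt"

// Assumed shapes for the typed keys; the real structs may differ.
type rackKey struct{ dc, rack string }
type nodeKey struct{ dc, rack, node string }

// stringKey imitates a concatenated key (assuming a space separator).
func stringKey(dc, rack string) string { return dc + " " + rack }

func main() {
	// Two different (dc, rack) pairs collapse into the same string key...
	fmt.Println(stringKey("dc 1", "a") == stringKey("dc", "1 a")) // prints "true"
	// ...but stay distinct as struct keys, which compare field by field.
	fmt.Println(rackKey{"dc 1", "a"} == rackKey{"dc", "1 a"}) // prints "false"

	perRack := map[rackKey]int{}
	perRack[rackKey{"dc 1", "a"}]++
	perRack[rackKey{"dc", "1 a"}]++
	fmt.Println(len(perRack)) // prints "2"

	perNode := map[nodeKey]bool{{"dc1", "rack1", "n1"}: true}
	fmt.Println(perNode[nodeKey{"dc1", "rack1", "n1"}]) // prints "true"
}
```

Struct keys made of comparable fields can be used as map keys directly, so no string needs to be built for each lookup.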

* address review: simplify bounds check, log fallback error, guard source

- Remove unreachable ExpectedReplicas < 0 branch (outer condition
  already guarantees > 0), fold bounds check into single condition
- Log error from planBalanceDestination in replica validation fallback
- Return false from IsGoodMove when sourceNodeID not found in
  existing replicas (inconsistent cluster state)

* address review: use slices.Contains instead of hand-rolled helpers

Replace isAmongDC and isAmongRack with slices.Contains from the
standard library, reducing boilerplate.
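
For illustration, a hand-rolled membership helper next to its `slices.Contains` equivalent (the `slices` package is in the standard library since Go 1.21). The helper signature below is a guess; the original `isAmongDC` and `isAmongRack` are not shown in this excerpt.

```go
package main

import (
	"fmt"
	"slices"
)

// isAmongHandRolled is a hypothetical reconstruction of the removed helpers.
func isAmongHandRolled(list []string, item string) bool {
	for _, v := range list {
		if v == item {
			return true
		}
	}
	return false
}

// isAmong does the same with the standard library.
func isAmong(list []string, item string) bool {
	return slices.Contains(list, item)
}

func main() {
	dcs := []string{"dc1", "dc2"}
	fmt.Println(isAmongHandRolled(dcs, "dc2"), isAmong(dcs, "dc2")) // prints "true true"
	fmt.Println(isAmongHandRolled(dcs, "dc3"), isAmong(dcs, "dc3")) // prints "false false"
}
```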

Author: Chris Lu
Date: 2026-03-13 17:39:25 -07:00
Committed by: GitHub
Parent: 47ddf05d95
Commit: 8056b702ba
9 changed files with 364 additions and 30 deletions


@@ -39,7 +39,7 @@ func TestBuildVolumeMetricsEmptyFilter(t *testing.T) {
&master_pb.VolumeInformationMessage{Id: 1, Collection: "photos", Size: 100},
&master_pb.VolumeInformationMessage{Id: 2, Collection: "videos", Size: 200},
)
-metrics, _, err := buildVolumeMetrics(resp, "")
+metrics, _, _, err := buildVolumeMetrics(resp, "")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
@@ -53,7 +53,7 @@ func TestBuildVolumeMetricsAllCollections(t *testing.T) {
&master_pb.VolumeInformationMessage{Id: 1, Collection: "photos", Size: 100},
&master_pb.VolumeInformationMessage{Id: 2, Collection: "videos", Size: 200},
)
-metrics, _, err := buildVolumeMetrics(resp, collectionFilterAll)
+metrics, _, _, err := buildVolumeMetrics(resp, collectionFilterAll)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
@@ -68,7 +68,7 @@ func TestBuildVolumeMetricsEachCollection(t *testing.T) {
&master_pb.VolumeInformationMessage{Id: 2, Collection: "videos", Size: 200},
)
// EACH_COLLECTION passes all volumes through; filtering happens in the handler
-metrics, _, err := buildVolumeMetrics(resp, collectionFilterEach)
+metrics, _, _, err := buildVolumeMetrics(resp, collectionFilterEach)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
@@ -83,7 +83,7 @@ func TestBuildVolumeMetricsRegexFilter(t *testing.T) {
&master_pb.VolumeInformationMessage{Id: 2, Collection: "videos", Size: 200},
&master_pb.VolumeInformationMessage{Id: 3, Collection: "photos-backup", Size: 300},
)
-metrics, _, err := buildVolumeMetrics(resp, "^photos$")
+metrics, _, _, err := buildVolumeMetrics(resp, "^photos$")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
@@ -99,7 +99,7 @@ func TestBuildVolumeMetricsInvalidRegex(t *testing.T) {
resp := makeTestVolumeListResponse(
&master_pb.VolumeInformationMessage{Id: 1, Collection: "photos", Size: 100},
)
-_, _, err := buildVolumeMetrics(resp, "[invalid")
+_, _, _, err := buildVolumeMetrics(resp, "[invalid")
if err == nil {
t.Fatal("expected error for invalid regex")
}