fix: EC rebalance fails with replica placement 000 (#7812)
* fix: EC rebalance fails with replica placement 000

  This PR fixes several issues with EC shard distribution:

  1. Pre-flight check before EC encoding
     - Verify the target disk type has capacity before encoding starts
     - Prevents encoding shards only to fail during rebalance
     - Shows a helpful error when the wrong diskType is specified (e.g., ssd when volumes are on hdd)
  2. Fix EC rebalance with replica placement 000
     - When DiffRackCount=0, shards should be distributed freely across racks
     - The '000' placement means 'no volume replication needed' because EC provides redundancy
     - Previously all racks were skipped with the error 'shards X > replica placement limit (0)'
  3. Add unit tests for EC rebalance slot calculation
     - TestECRebalanceWithLimitedSlots: documents the limited-slots scenario
     - TestECRebalanceZeroFreeSlots: reproduces the 0-free-slots error
  4. Add a Makefile for manual EC testing
     - make setup: start a cluster and populate data
     - make shell: open a weed shell for EC commands
     - make clean: stop the cluster and clean up

* fix: default -rebalance to true for ec.encode

  The -rebalance flag defaulted to false, so ec.encode would only print shard moves without executing them. This is a poor default: the whole point of EC encoding is to distribute shards across servers for fault tolerance. Now -rebalance defaults to true, so shards are actually distributed after encoding. Use -rebalance=false to see what would happen without making changes.

* test/erasure_coding: improve Makefile safety and docs

  - Narrow the pkill pattern for volume servers to use TEST_DIR instead of a port pattern, avoiding accidental kills of unrelated SeaweedFS processes
  - Document external dependencies (curl, jq) in header comments

* shell: refactor buildRackWithEcShards to reuse buildEcShards

  Extract the common shard-bit construction logic to avoid duplication between the buildEcShards and buildRackWithEcShards helper functions.

* shell: update test for EC replication 000 behavior

  When DiffRackCount=0 (replication "000"), EC shards should be distributed freely across racks, since erasure coding provides its own redundancy. Update the test expectation to reflect this behavior.

* erasure_coding: add distribution package for proportional EC shard placement

  Add a new reusable package for EC shard distribution that:
  - Supports configurable EC ratios (not hard-coded 10+4)
  - Distributes shards proportionally based on the replication policy
  - Provides fault tolerance analysis
  - Prefers moving parity shards to keep data shards spread out

  Key components:
  - ECConfig: configurable data/parity shard counts
  - ReplicationConfig: parsed XYZ replication policy
  - ECDistribution: target shard counts per DC/rack/node
  - Rebalancer: plans shard moves with a parity-first strategy

  This enables seaweed-enterprise custom EC ratios and weed worker integration while maintaining a clean, testable architecture.

* shell: integrate distribution package for EC rebalancing

  Add shell wrappers around the distribution package:
  - ProportionalECRebalancer: plans moves using distribution.Rebalancer
  - NewProportionalECRebalancerWithConfig: supports custom EC configs
  - GetDistributionSummary/GetFaultToleranceAnalysis: helper functions

  The shell layer converts between EcNode types and the generic TopologyNode types used by the distribution package.

* test setup

* ec: improve data and parity shard distribution across racks

  - Add a shardsByTypePerRack helper to track data vs parity shards
  - Rewrite doBalanceEcShardsAcrossRacks for two-pass balancing:
    1. Balance data shards (0-9) evenly, max ceil(10/6)=2 per rack
    2. Balance parity shards (10-13) evenly, max ceil(4/6)=1 per rack
  - Add balanceShardTypeAcrossRacks for generic shard-type balancing
  - Add pickRackForShardType to select a destination with room for the type
  - Add unit tests verifying even data/parity distribution

  This ensures even read load during normal operation by spreading both data and parity shards across all available racks.

* ec: make data/parity shard counts configurable in ecBalancer

  - Add dataShardCount and parityShardCount fields to the ecBalancer struct
  - Add getDataShardCount() and getParityShardCount() methods with defaults
  - Replace direct constant usage with the configurable methods
  - Fix an unused-variable warning for parityPerRack

  This allows seaweed-enterprise to use custom EC ratios while defaulting to the standard 10+4 scheme.

* Address PR 7812 review comments

  Makefile improvements:
  - Save PIDs for each volume server for precise termination
  - Use PID-based killing in the stop target, with a pkill fallback
  - Use more specific pkill patterns with TEST_DIR paths

  Documentation:
  - Document the jq dependency in README.md

  Rebalancer fix:
  - Fix duplicate shard-count updates in applyMovesToAnalysis
  - All planners (DC/rack/node) update counts inline during planning
  - Remove the duplicate updates from applyMovesToAnalysis to avoid double-counting

* test/erasure_coding: use mktemp for test file template

  Use mktemp instead of the hardcoded /tmp/testfile_template.bin path to provide better isolation for concurrent test runs.
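The two-pass rack balancing above caps each shard type by ceiling division over the rack count. A hypothetical standalone sketch of that cap (not the actual ecBalancer code, just the arithmetic the commit message describes):

```go
package main

import "fmt"

// capPerRack sketches the per-rack cap described above: each shard type
// (data or parity) is capped at ceil(shardCount / rackCount) per rack.
func capPerRack(shardCount, rackCount int) int {
	return (shardCount + rackCount - 1) / rackCount
}

func main() {
	racks := 6
	fmt.Println(capPerRack(10, racks)) // data shards 0-9: at most 2 per rack
	fmt.Println(capPerRack(4, racks))  // parity shards 10-13: at most 1 per rack
}
```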
209
weed/storage/erasure_coding/distribution/README.md
Normal file
@@ -0,0 +1,209 @@
# EC Distribution Package

This package provides erasure coding (EC) shard distribution algorithms that are:

- **Configurable**: Works with any EC ratio (e.g., 10+4, 8+4, 6+3)
- **Reusable**: Used by shell commands, worker tasks, and seaweed-enterprise
- **Topology-aware**: Distributes shards across data centers, racks, and nodes proportionally

## Usage

### Basic Usage with Default 10+4 EC

```go
import (
	"fmt"

	"github.com/seaweedfs/seaweedfs/weed/storage/erasure_coding/distribution"
)

// Parse replication policy
rep, _ := distribution.NewReplicationConfigFromString("110")

// Use default 10+4 EC configuration
ec := distribution.DefaultECConfig()

// Calculate distribution plan
dist := distribution.CalculateDistribution(ec, rep)

fmt.Println(dist.Summary())
// Output:
// EC Configuration: 10+4 (total: 14, can lose: 4)
// Replication: replication=110 (DCs:2, Racks/DC:2, Nodes/Rack:1)
// Distribution Plan:
//   Data Centers: 2 (target 7 shards each, max 9)
//   Racks per DC: 2 (target 4 shards each, max 6)
//   Nodes per Rack: 1 (target 4 shards each, max 6)
```

### Custom EC Ratios (seaweed-enterprise)

```go
// Create custom 8+4 EC configuration
ec, err := distribution.NewECConfig(8, 4)
if err != nil {
	log.Fatal(err)
}

rep, _ := distribution.NewReplicationConfigFromString("200")
dist := distribution.CalculateDistribution(ec, rep)

// Check fault tolerance
fmt.Println(dist.FaultToleranceAnalysis())
// Output:
// Fault Tolerance Analysis for 8+4:
//   DC Failure: SURVIVABLE ✓
//     - Losing one DC loses ~4 shards
//     - Remaining: 8 shards (need 8)
```

### Planning Shard Moves

```go
// Build topology analysis
analysis := distribution.NewTopologyAnalysis()

// Add nodes and their shard locations
for _, node := range nodes {
	analysis.AddNode(&distribution.TopologyNode{
		NodeID:     node.ID,
		DataCenter: node.DC,
		Rack:       node.Rack,
		FreeSlots:  node.FreeSlots,
	})
	for _, shardID := range node.ShardIDs {
		analysis.AddShardLocation(distribution.ShardLocation{
			ShardID:    shardID,
			NodeID:     node.ID,
			DataCenter: node.DC,
			Rack:       node.Rack,
		})
	}
}
analysis.Finalize()

// Create rebalancer and plan moves
rebalancer := distribution.NewRebalancer(ec, rep)
plan, err := rebalancer.PlanRebalance(analysis)

for _, move := range plan.Moves {
	fmt.Printf("Move shard %d from %s to %s\n",
		move.ShardID, move.SourceNode.NodeID, move.DestNode.NodeID)
}
```

## Algorithm

### Proportional Distribution

The replication policy `XYZ` is interpreted as a ratio:

| Replication | DCs | Racks/DC | Nodes/Rack | 14 Shards Distribution |
|-------------|-----|----------|------------|------------------------|
| `000`       | 1   | 1        | 1          | All in one place       |
| `001`       | 1   | 1        | 2          | 7 per node             |
| `010`       | 1   | 2        | 1          | 7 per rack             |
| `100`       | 2   | 1        | 1          | 7 per DC               |
| `110`       | 2   | 2        | 1          | 7/DC, 4/rack           |
| `200`       | 3   | 1        | 1          | 5 per DC               |
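The target columns in the table follow from repeated ceiling division: 14 shards over the minimum DCs, then racks per DC, then nodes per rack. A standalone sketch of that arithmetic (mirroring the package's internal ceiling-division helper, without importing the package):

```go
package main

import "fmt"

// ceilDiv mirrors the ceiling division the package uses for target counts.
func ceilDiv(a, b int) int { return (a + b - 1) / b }

func main() {
	totalShards := 14 // default 10+4

	// Replication "110": 2 DCs, 2 racks per DC, 1 node per rack.
	perDC := ceilDiv(totalShards, 2) // 7 per DC
	perRack := ceilDiv(perDC, 2)     // 4 per rack
	perNode := ceilDiv(perRack, 1)   // 4 per node
	fmt.Println(perDC, perRack, perNode)

	// Replication "200": 3 DCs, so 5 per DC.
	fmt.Println(ceilDiv(totalShards, 3))
}
```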
### Rebalancing Process

1. **DC-level balancing**: Move shards to achieve the target shards per DC
2. **Rack-level balancing**: Within each DC, balance across racks
3. **Node-level balancing**: Within each rack, balance across nodes

### Shard Priority: Data First, Parity Moves First

When rebalancing, the algorithm prioritizes keeping data shards spread out:

- **Data shards (0 to DataShards-1)**: Serve read requests directly
- **Parity shards (DataShards to TotalShards-1)**: Only used for reconstruction

**Rebalancing Strategy**:

- When moving shards FROM an overloaded node, **parity shards are moved first**
- This keeps data shards in place on well-distributed nodes
- Result: Data shards remain spread out for optimal read performance

```go
// Check shard type
if ec.IsDataShard(shardID) {
	// Shard serves read requests
}
if ec.IsParityShard(shardID) {
	// Shard only used for reconstruction
}

// Sort shards for placement (data first for initial distribution)
sorted := ec.SortShardsDataFirst(shards)

// Sort shards for rebalancing (parity first to move them away)
sorted = ec.SortShardsParityFirst(shards)
```

### Fault Tolerance

The package provides fault tolerance analysis:

- **DC Failure**: Can the data survive complete DC loss?
- **Rack Failure**: Can the data survive complete rack loss?
- **Node Failure**: Can the data survive single node loss?

For example, with 10+4 EC (can lose 4 shards):

- Need 4+ DCs for DC-level fault tolerance
- Need 4+ racks for rack-level fault tolerance
- Usually survivable at node level
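The survivability checks above reduce to one comparison: the shards remaining after losing a failure domain must still reach DataShards. A minimal standalone sketch of that check (same logic as the package's CanSurviveDCFailure, without importing the package):

```go
package main

import "fmt"

// survivesDomainLoss mirrors the survivability check: after losing a failure
// domain holding perDomain shards, reconstruction still needs at least
// dataShards of the remaining shards.
func survivesDomainLoss(dataShards, parityShards, perDomain int) bool {
	total := dataShards + parityShards
	return total-perDomain >= dataShards
}

func main() {
	// 10+4 over 2 DCs (~7 shards each): 14-7=7 < 10, so a DC loss is fatal.
	fmt.Println(survivesDomainLoss(10, 4, 7))
	// 10+4 over 4 racks (~4 shards each): 14-4=10 >= 10, survivable.
	fmt.Println(survivesDomainLoss(10, 4, 4))
}
```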
## API Reference

### Types

- `ECConfig`: EC configuration (data shards, parity shards)
- `ReplicationConfig`: Parsed replication policy
- `ECDistribution`: Calculated distribution plan
- `TopologyAnalysis`: Current shard distribution analysis
- `Rebalancer`: Plans shard moves
- `RebalancePlan`: List of planned moves
- `ShardMove`: Single shard move operation

### Key Functions

- `NewECConfig(data, parity int)`: Create EC configuration
- `DefaultECConfig()`: Returns the 10+4 configuration
- `CalculateDistribution(ec, rep)`: Calculate distribution plan
- `NewRebalancer(ec, rep)`: Create rebalancer
- `PlanRebalance(analysis)`: Generate rebalancing plan

## Integration

### Shell Commands

The shell package wraps this distribution package for `ec.balance`:

```go
import "github.com/seaweedfs/seaweedfs/weed/shell"

rebalancer := shell.NewProportionalECRebalancer(nodes, rp, diskType)
moves, _ := rebalancer.PlanMoves(volumeId, locations)
```

### Worker Tasks

Worker tasks can use the distribution package directly:

```go
import "github.com/seaweedfs/seaweedfs/weed/storage/erasure_coding/distribution"

ec := distribution.ECConfig{DataShards: 8, ParityShards: 4}
rep := distribution.NewReplicationConfig(rp)
dist := distribution.CalculateDistribution(ec, rep)
```

### seaweed-enterprise

Enterprise features can provide custom EC configurations:

```go
// Custom EC ratio from license/config
ec, _ := distribution.NewECConfig(customData, customParity)
rebalancer := distribution.NewRebalancer(ec, rep)
```
241
weed/storage/erasure_coding/distribution/analysis.go
Normal file
@@ -0,0 +1,241 @@
package distribution

import (
	"fmt"
	"slices"
)

// ShardLocation represents where a shard is located in the topology
type ShardLocation struct {
	ShardID    int
	NodeID     string
	DataCenter string
	Rack       string
}

// TopologyNode represents a node in the topology that can hold EC shards
type TopologyNode struct {
	NodeID      string
	DataCenter  string
	Rack        string
	FreeSlots   int   // Available slots for new shards
	ShardIDs    []int // Shard IDs currently on this node for a specific volume
	TotalShards int   // Total shards on this node (for all volumes)
}

// TopologyAnalysis holds the current shard distribution analysis for a volume
type TopologyAnalysis struct {
	// Shard counts at each level
	ShardsByDC   map[string]int
	ShardsByRack map[string]int
	ShardsByNode map[string]int

	// Detailed shard locations
	DCToShards   map[string][]int // DC -> list of shard IDs
	RackToShards map[string][]int // Rack -> list of shard IDs
	NodeToShards map[string][]int // NodeID -> list of shard IDs

	// Topology structure
	DCToRacks   map[string][]string        // DC -> list of rack IDs
	RackToNodes map[string][]*TopologyNode // Rack -> list of nodes
	AllNodes    map[string]*TopologyNode   // NodeID -> node info

	// Statistics
	TotalShards int
	TotalNodes  int
	TotalRacks  int
	TotalDCs    int
}

// NewTopologyAnalysis creates a new empty analysis
func NewTopologyAnalysis() *TopologyAnalysis {
	return &TopologyAnalysis{
		ShardsByDC:   make(map[string]int),
		ShardsByRack: make(map[string]int),
		ShardsByNode: make(map[string]int),
		DCToShards:   make(map[string][]int),
		RackToShards: make(map[string][]int),
		NodeToShards: make(map[string][]int),
		DCToRacks:    make(map[string][]string),
		RackToNodes:  make(map[string][]*TopologyNode),
		AllNodes:     make(map[string]*TopologyNode),
	}
}

// AddShardLocation adds a shard location to the analysis
func (a *TopologyAnalysis) AddShardLocation(loc ShardLocation) {
	// Update counts
	a.ShardsByDC[loc.DataCenter]++
	a.ShardsByRack[loc.Rack]++
	a.ShardsByNode[loc.NodeID]++

	// Update shard lists
	a.DCToShards[loc.DataCenter] = append(a.DCToShards[loc.DataCenter], loc.ShardID)
	a.RackToShards[loc.Rack] = append(a.RackToShards[loc.Rack], loc.ShardID)
	a.NodeToShards[loc.NodeID] = append(a.NodeToShards[loc.NodeID], loc.ShardID)

	a.TotalShards++
}

// AddNode adds a node to the topology (even if it has no shards)
func (a *TopologyAnalysis) AddNode(node *TopologyNode) {
	if _, exists := a.AllNodes[node.NodeID]; exists {
		return // Already added
	}

	a.AllNodes[node.NodeID] = node
	a.TotalNodes++

	// Update topology structure
	if !slices.Contains(a.DCToRacks[node.DataCenter], node.Rack) {
		a.DCToRacks[node.DataCenter] = append(a.DCToRacks[node.DataCenter], node.Rack)
	}
	a.RackToNodes[node.Rack] = append(a.RackToNodes[node.Rack], node)

	// Update counts
	if _, exists := a.ShardsByDC[node.DataCenter]; !exists {
		a.TotalDCs++
	}
	if _, exists := a.ShardsByRack[node.Rack]; !exists {
		a.TotalRacks++
	}
}

// Finalize computes final statistics after all data is added
func (a *TopologyAnalysis) Finalize() {
	// Ensure we have accurate DC and rack counts
	dcSet := make(map[string]bool)
	rackSet := make(map[string]bool)
	for _, node := range a.AllNodes {
		dcSet[node.DataCenter] = true
		rackSet[node.Rack] = true
	}
	a.TotalDCs = len(dcSet)
	a.TotalRacks = len(rackSet)
	a.TotalNodes = len(a.AllNodes)
}

// String returns a summary of the analysis
func (a *TopologyAnalysis) String() string {
	return fmt.Sprintf("TopologyAnalysis{shards:%d, nodes:%d, racks:%d, dcs:%d}",
		a.TotalShards, a.TotalNodes, a.TotalRacks, a.TotalDCs)
}

// DetailedString returns a detailed multi-line summary
func (a *TopologyAnalysis) DetailedString() string {
	s := "Topology Analysis:\n"
	s += fmt.Sprintf("  Total Shards: %d\n", a.TotalShards)
	s += fmt.Sprintf("  Data Centers: %d\n", a.TotalDCs)
	for dc, count := range a.ShardsByDC {
		s += fmt.Sprintf("    %s: %d shards\n", dc, count)
	}
	s += fmt.Sprintf("  Racks: %d\n", a.TotalRacks)
	for rack, count := range a.ShardsByRack {
		s += fmt.Sprintf("    %s: %d shards\n", rack, count)
	}
	s += fmt.Sprintf("  Nodes: %d\n", a.TotalNodes)
	for nodeID, count := range a.ShardsByNode {
		if count > 0 {
			s += fmt.Sprintf("    %s: %d shards\n", nodeID, count)
		}
	}
	return s
}

// TopologyExcess represents a topology level (DC/rack/node) with excess shards
type TopologyExcess struct {
	ID     string          // DC/rack/node ID
	Level  string          // "dc", "rack", or "node"
	Excess int             // Number of excess shards (above target)
	Shards []int           // Shard IDs at this level
	Nodes  []*TopologyNode // Nodes at this level (for finding sources)
}

// CalculateDCExcess returns DCs with more shards than the target
func CalculateDCExcess(analysis *TopologyAnalysis, dist *ECDistribution) []TopologyExcess {
	var excess []TopologyExcess

	for dc, count := range analysis.ShardsByDC {
		if count > dist.TargetShardsPerDC {
			// Collect nodes in this DC
			var nodes []*TopologyNode
			for _, rack := range analysis.DCToRacks[dc] {
				nodes = append(nodes, analysis.RackToNodes[rack]...)
			}
			excess = append(excess, TopologyExcess{
				ID:     dc,
				Level:  "dc",
				Excess: count - dist.TargetShardsPerDC,
				Shards: analysis.DCToShards[dc],
				Nodes:  nodes,
			})
		}
	}

	// Sort by excess (most excess first)
	slices.SortFunc(excess, func(a, b TopologyExcess) int {
		return b.Excess - a.Excess
	})

	return excess
}

// CalculateRackExcess returns racks with more shards than the target (within a DC)
func CalculateRackExcess(analysis *TopologyAnalysis, dc string, targetPerRack int) []TopologyExcess {
	var excess []TopologyExcess

	for _, rack := range analysis.DCToRacks[dc] {
		count := analysis.ShardsByRack[rack]
		if count > targetPerRack {
			excess = append(excess, TopologyExcess{
				ID:     rack,
				Level:  "rack",
				Excess: count - targetPerRack,
				Shards: analysis.RackToShards[rack],
				Nodes:  analysis.RackToNodes[rack],
			})
		}
	}

	slices.SortFunc(excess, func(a, b TopologyExcess) int {
		return b.Excess - a.Excess
	})

	return excess
}

// CalculateUnderservedDCs returns DCs that have fewer shards than target
func CalculateUnderservedDCs(analysis *TopologyAnalysis, dist *ECDistribution) []string {
	var underserved []string

	// Check existing DCs
	for dc, count := range analysis.ShardsByDC {
		if count < dist.TargetShardsPerDC {
			underserved = append(underserved, dc)
		}
	}

	// Check DCs with nodes but no shards
	for dc := range analysis.DCToRacks {
		if _, exists := analysis.ShardsByDC[dc]; !exists {
			underserved = append(underserved, dc)
		}
	}

	return underserved
}

// CalculateUnderservedRacks returns racks that have fewer shards than target
func CalculateUnderservedRacks(analysis *TopologyAnalysis, dc string, targetPerRack int) []string {
	var underserved []string

	for _, rack := range analysis.DCToRacks[dc] {
		count := analysis.ShardsByRack[rack]
		if count < targetPerRack {
			underserved = append(underserved, rack)
		}
	}

	return underserved
}
171
weed/storage/erasure_coding/distribution/config.go
Normal file
@@ -0,0 +1,171 @@
// Package distribution provides EC shard distribution algorithms with configurable EC ratios.
package distribution

import (
	"fmt"

	"github.com/seaweedfs/seaweedfs/weed/storage/super_block"
)

// ECConfig holds erasure coding configuration parameters.
// This replaces hard-coded constants like DataShardsCount=10, ParityShardsCount=4.
type ECConfig struct {
	DataShards   int // Number of data shards (e.g., 10)
	ParityShards int // Number of parity shards (e.g., 4)
}

// DefaultECConfig returns the standard 10+4 EC configuration
func DefaultECConfig() ECConfig {
	return ECConfig{
		DataShards:   10,
		ParityShards: 4,
	}
}

// NewECConfig creates a new EC configuration with validation
func NewECConfig(dataShards, parityShards int) (ECConfig, error) {
	if dataShards <= 0 {
		return ECConfig{}, fmt.Errorf("dataShards must be positive, got %d", dataShards)
	}
	if parityShards <= 0 {
		return ECConfig{}, fmt.Errorf("parityShards must be positive, got %d", parityShards)
	}
	if dataShards+parityShards > 32 {
		return ECConfig{}, fmt.Errorf("total shards (%d+%d=%d) exceeds maximum of 32",
			dataShards, parityShards, dataShards+parityShards)
	}
	return ECConfig{
		DataShards:   dataShards,
		ParityShards: parityShards,
	}, nil
}

// TotalShards returns the total number of shards (data + parity)
func (c ECConfig) TotalShards() int {
	return c.DataShards + c.ParityShards
}

// MaxTolerableLoss returns the maximum number of shards that can be lost
// while still being able to reconstruct the data
func (c ECConfig) MaxTolerableLoss() int {
	return c.ParityShards
}

// MinShardsForReconstruction returns the minimum number of shards needed
// to reconstruct the original data
func (c ECConfig) MinShardsForReconstruction() int {
	return c.DataShards
}

// String returns a human-readable representation
func (c ECConfig) String() string {
	return fmt.Sprintf("%d+%d (total: %d, can lose: %d)",
		c.DataShards, c.ParityShards, c.TotalShards(), c.MaxTolerableLoss())
}

// IsDataShard returns true if the shard ID is a data shard (0 to DataShards-1)
func (c ECConfig) IsDataShard(shardID int) bool {
	return shardID >= 0 && shardID < c.DataShards
}

// IsParityShard returns true if the shard ID is a parity shard (DataShards to TotalShards-1)
func (c ECConfig) IsParityShard(shardID int) bool {
	return shardID >= c.DataShards && shardID < c.TotalShards()
}

// SortShardsDataFirst returns a copy of shards sorted with data shards first.
// This is useful for initial placement where data shards should be spread out first.
func (c ECConfig) SortShardsDataFirst(shards []int) []int {
	result := make([]int, len(shards))
	copy(result, shards)

	// Partition: data shards first, then parity shards
	dataIdx := 0
	parityIdx := len(result) - 1

	sorted := make([]int, len(result))
	for _, s := range result {
		if c.IsDataShard(s) {
			sorted[dataIdx] = s
			dataIdx++
		} else {
			sorted[parityIdx] = s
			parityIdx--
		}
	}

	return sorted
}

// SortShardsParityFirst returns a copy of shards sorted with parity shards first.
// This is useful for rebalancing where we prefer to move parity shards.
func (c ECConfig) SortShardsParityFirst(shards []int) []int {
	result := make([]int, len(shards))
	copy(result, shards)

	// Partition: parity shards first, then data shards
	parityIdx := 0
	dataIdx := len(result) - 1

	sorted := make([]int, len(result))
	for _, s := range result {
		if c.IsParityShard(s) {
			sorted[parityIdx] = s
			parityIdx++
		} else {
			sorted[dataIdx] = s
			dataIdx--
		}
	}

	return sorted
}

// ReplicationConfig holds the parsed replication policy
type ReplicationConfig struct {
	MinDataCenters  int // X+1 from XYZ replication (minimum DCs to use)
	MinRacksPerDC   int // Y+1 from XYZ replication (minimum racks per DC)
	MinNodesPerRack int // Z+1 from XYZ replication (minimum nodes per rack)

	// Original replication string (for logging/debugging)
	Original string
}

// NewReplicationConfig creates a ReplicationConfig from a ReplicaPlacement
func NewReplicationConfig(rp *super_block.ReplicaPlacement) ReplicationConfig {
	if rp == nil {
		return ReplicationConfig{
			MinDataCenters:  1,
			MinRacksPerDC:   1,
			MinNodesPerRack: 1,
			Original:        "000",
		}
	}
	return ReplicationConfig{
		MinDataCenters:  rp.DiffDataCenterCount + 1,
		MinRacksPerDC:   rp.DiffRackCount + 1,
		MinNodesPerRack: rp.SameRackCount + 1,
		Original:        rp.String(),
	}
}

// NewReplicationConfigFromString creates a ReplicationConfig from a replication string
func NewReplicationConfigFromString(replication string) (ReplicationConfig, error) {
	rp, err := super_block.NewReplicaPlacementFromString(replication)
	if err != nil {
		return ReplicationConfig{}, err
	}
	return NewReplicationConfig(rp), nil
}

// TotalPlacementSlots returns the minimum number of unique placement locations
// based on the replication policy
func (r ReplicationConfig) TotalPlacementSlots() int {
	return r.MinDataCenters * r.MinRacksPerDC * r.MinNodesPerRack
}

// String returns a human-readable representation
func (r ReplicationConfig) String() string {
	return fmt.Sprintf("replication=%s (DCs:%d, Racks/DC:%d, Nodes/Rack:%d)",
		r.Original, r.MinDataCenters, r.MinRacksPerDC, r.MinNodesPerRack)
}
161
weed/storage/erasure_coding/distribution/distribution.go
Normal file
@@ -0,0 +1,161 @@
package distribution

import (
	"fmt"
)

// ECDistribution represents the target distribution of EC shards
// based on EC configuration and replication policy.
type ECDistribution struct {
	// EC configuration
	ECConfig ECConfig

	// Replication configuration
	ReplicationConfig ReplicationConfig

	// Target shard counts per topology level (balanced distribution)
	TargetShardsPerDC   int
	TargetShardsPerRack int
	TargetShardsPerNode int

	// Maximum shard counts per topology level (fault tolerance limits)
	// These prevent any single failure domain from having too many shards
	MaxShardsPerDC   int
	MaxShardsPerRack int
	MaxShardsPerNode int
}

// CalculateDistribution computes the target EC shard distribution based on
// EC configuration and replication policy.
//
// The algorithm:
//  1. Uses replication policy to determine minimum topology spread
//  2. Calculates target shards per level (evenly distributed)
//  3. Calculates max shards per level (for fault tolerance)
func CalculateDistribution(ec ECConfig, rep ReplicationConfig) *ECDistribution {
	totalShards := ec.TotalShards()

	// Target distribution (balanced, rounded up to ensure all shards placed)
	targetShardsPerDC := ceilDivide(totalShards, rep.MinDataCenters)
	targetShardsPerRack := ceilDivide(targetShardsPerDC, rep.MinRacksPerDC)
	targetShardsPerNode := ceilDivide(targetShardsPerRack, rep.MinNodesPerRack)

	// Maximum limits for fault tolerance
	// The key constraint: losing one failure domain shouldn't lose more than parityShards
	// So max shards per domain = totalShards - parityShards + tolerance
	// We add a small tolerance (+2) to allow for imbalanced topologies
	faultToleranceLimit := totalShards - ec.ParityShards + 1

	maxShardsPerDC := min(faultToleranceLimit, targetShardsPerDC+2)
	maxShardsPerRack := min(faultToleranceLimit, targetShardsPerRack+2)
	maxShardsPerNode := min(faultToleranceLimit, targetShardsPerNode+2)

	return &ECDistribution{
		ECConfig:            ec,
		ReplicationConfig:   rep,
		TargetShardsPerDC:   targetShardsPerDC,
		TargetShardsPerRack: targetShardsPerRack,
		TargetShardsPerNode: targetShardsPerNode,
		MaxShardsPerDC:      maxShardsPerDC,
		MaxShardsPerRack:    maxShardsPerRack,
		MaxShardsPerNode:    maxShardsPerNode,
	}
}

// String returns a human-readable description of the distribution
func (d *ECDistribution) String() string {
	return fmt.Sprintf(
		"ECDistribution{EC:%s, DCs:%d (target:%d/max:%d), Racks/DC:%d (target:%d/max:%d), Nodes/Rack:%d (target:%d/max:%d)}",
		d.ECConfig.String(),
		d.ReplicationConfig.MinDataCenters, d.TargetShardsPerDC, d.MaxShardsPerDC,
		d.ReplicationConfig.MinRacksPerDC, d.TargetShardsPerRack, d.MaxShardsPerRack,
		d.ReplicationConfig.MinNodesPerRack, d.TargetShardsPerNode, d.MaxShardsPerNode,
	)
}

// Summary returns a multi-line summary of the distribution plan
func (d *ECDistribution) Summary() string {
	summary := fmt.Sprintf("EC Configuration: %s\n", d.ECConfig.String())
	summary += fmt.Sprintf("Replication: %s\n", d.ReplicationConfig.String())
	summary += "Distribution Plan:\n"
	summary += fmt.Sprintf("  Data Centers: %d (target %d shards each, max %d)\n",
		d.ReplicationConfig.MinDataCenters, d.TargetShardsPerDC, d.MaxShardsPerDC)
	summary += fmt.Sprintf("  Racks per DC: %d (target %d shards each, max %d)\n",
		d.ReplicationConfig.MinRacksPerDC, d.TargetShardsPerRack, d.MaxShardsPerRack)
	summary += fmt.Sprintf("  Nodes per Rack: %d (target %d shards each, max %d)\n",
		d.ReplicationConfig.MinNodesPerRack, d.TargetShardsPerNode, d.MaxShardsPerNode)
	return summary
}

// CanSurviveDCFailure returns true if the distribution can survive
// complete loss of one data center
func (d *ECDistribution) CanSurviveDCFailure() bool {
	// After losing one DC with max shards, check if remaining shards are enough
	remainingAfterDCLoss := d.ECConfig.TotalShards() - d.TargetShardsPerDC
	return remainingAfterDCLoss >= d.ECConfig.MinShardsForReconstruction()
}

// CanSurviveRackFailure returns true if the distribution can survive
// complete loss of one rack
func (d *ECDistribution) CanSurviveRackFailure() bool {
	remainingAfterRackLoss := d.ECConfig.TotalShards() - d.TargetShardsPerRack
	return remainingAfterRackLoss >= d.ECConfig.MinShardsForReconstruction()
}

// MinDCsForDCFaultTolerance calculates the minimum number of DCs needed
// to survive complete DC failure with this EC configuration
func (d *ECDistribution) MinDCsForDCFaultTolerance() int {
	// To survive DC failure, max shards per DC = parityShards
	maxShardsPerDC := d.ECConfig.MaxTolerableLoss()
	if maxShardsPerDC == 0 {
		return d.ECConfig.TotalShards() // Would need one DC per shard
	}
	return ceilDivide(d.ECConfig.TotalShards(), maxShardsPerDC)
}

// FaultToleranceAnalysis returns a detailed analysis of fault tolerance
func (d *ECDistribution) FaultToleranceAnalysis() string {
	analysis := fmt.Sprintf("Fault Tolerance Analysis for %s:\n", d.ECConfig.String())

	// DC failure
	dcSurvive := d.CanSurviveDCFailure()
	shardsAfterDC := d.ECConfig.TotalShards() - d.TargetShardsPerDC
	analysis += fmt.Sprintf("  DC Failure: %s\n", boolToResult(dcSurvive))
	analysis += fmt.Sprintf("    - Losing one DC loses ~%d shards\n", d.TargetShardsPerDC)
	analysis += fmt.Sprintf("    - Remaining: %d shards (need %d)\n", shardsAfterDC, d.ECConfig.DataShards)
	if !dcSurvive {
		analysis += fmt.Sprintf("    - Need at least %d DCs for DC fault tolerance\n", d.MinDCsForDCFaultTolerance())
	}

	// Rack failure
	rackSurvive := d.CanSurviveRackFailure()
	shardsAfterRack := d.ECConfig.TotalShards() - d.TargetShardsPerRack
|
||||
analysis += fmt.Sprintf(" Rack Failure: %s\n", boolToResult(rackSurvive))
|
||||
analysis += fmt.Sprintf(" - Losing one rack loses ~%d shards\n", d.TargetShardsPerRack)
|
||||
analysis += fmt.Sprintf(" - Remaining: %d shards (need %d)\n", shardsAfterRack, d.ECConfig.DataShards)
|
||||
|
||||
// Node failure (usually survivable)
|
||||
shardsAfterNode := d.ECConfig.TotalShards() - d.TargetShardsPerNode
|
||||
nodeSurvive := shardsAfterNode >= d.ECConfig.DataShards
|
||||
analysis += fmt.Sprintf(" Node Failure: %s\n", boolToResult(nodeSurvive))
|
||||
analysis += fmt.Sprintf(" - Losing one node loses ~%d shards\n", d.TargetShardsPerNode)
|
||||
analysis += fmt.Sprintf(" - Remaining: %d shards (need %d)\n", shardsAfterNode, d.ECConfig.DataShards)
|
||||
|
||||
return analysis
|
||||
}
|
||||
|
||||
func boolToResult(b bool) string {
|
||||
if b {
|
||||
return "SURVIVABLE ✓"
|
||||
}
|
||||
return "NOT SURVIVABLE ✗"
|
||||
}
|
||||
|
||||
// ceilDivide performs ceiling division
|
||||
func ceilDivide(a, b int) int {
|
||||
if b <= 0 {
|
||||
return a
|
||||
}
|
||||
return (a + b - 1) / b
|
||||
}
|
||||
|
||||
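The survivability checks above are pure arithmetic: a failure domain is survivable when the shards left after losing that domain's allocation still meet the reconstruction minimum, and the minimum domain count follows from ceiling division by the tolerable loss. A standalone sketch of that arithmetic (plain ints, independent of the package's `ECConfig`/`ECDistribution` types):

```go
package main

import "fmt"

// canSurvive reports whether losing one failure domain that holds
// shardsPerDomain shards still leaves enough shards to reconstruct.
func canSurvive(totalShards, shardsPerDomain, dataShards int) bool {
	return totalShards-shardsPerDomain >= dataShards
}

// minDomains is the smallest domain count such that no domain holds
// more shards than the code can lose (ceiling division, as in ceilDivide).
func minDomains(totalShards, maxLoss int) int {
	if maxLoss <= 0 {
		return totalShards
	}
	return (totalShards + maxLoss - 1) / maxLoss
}

func main() {
	// 10+4 over 2 DCs: one DC holds 7, leaving 7 < 10 -> not survivable
	fmt.Println(canSurvive(14, 7, 10)) // false
	// 10+4 over 4 DCs: one DC holds at most 4, leaving 10 >= 10 -> survivable
	fmt.Println(canSurvive(14, 4, 10)) // true
	// 10+4 therefore needs ceil(14/4) = 4 DCs for DC fault tolerance
	fmt.Println(minDomains(14, 4)) // 4
}
```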
weed/storage/erasure_coding/distribution/distribution_test.go (new file, 565 lines)
package distribution

import (
	"testing"
)

func TestNewECConfig(t *testing.T) {
	tests := []struct {
		name         string
		dataShards   int
		parityShards int
		wantErr      bool
	}{
		{"valid 10+4", 10, 4, false},
		{"valid 8+4", 8, 4, false},
		{"valid 6+3", 6, 3, false},
		{"valid 4+2", 4, 2, false},
		{"invalid data=0", 0, 4, true},
		{"invalid parity=0", 10, 0, true},
		{"invalid total>32", 20, 15, true},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			config, err := NewECConfig(tt.dataShards, tt.parityShards)
			if (err != nil) != tt.wantErr {
				t.Errorf("NewECConfig() error = %v, wantErr %v", err, tt.wantErr)
				return
			}
			if !tt.wantErr {
				if config.DataShards != tt.dataShards {
					t.Errorf("DataShards = %d, want %d", config.DataShards, tt.dataShards)
				}
				if config.ParityShards != tt.parityShards {
					t.Errorf("ParityShards = %d, want %d", config.ParityShards, tt.parityShards)
				}
				if config.TotalShards() != tt.dataShards+tt.parityShards {
					t.Errorf("TotalShards() = %d, want %d", config.TotalShards(), tt.dataShards+tt.parityShards)
				}
			}
		})
	}
}

func TestCalculateDistribution(t *testing.T) {
	tests := []struct {
		name                    string
		ecConfig                ECConfig
		replication             string
		expectedMinDCs          int
		expectedMinRacksPerDC   int
		expectedMinNodesPerRack int
		expectedTargetPerDC     int
		expectedTargetPerRack   int
		expectedTargetPerNode   int
	}{
		{
			name:                    "10+4 with 000",
			ecConfig:                DefaultECConfig(),
			replication:             "000",
			expectedMinDCs:          1,
			expectedMinRacksPerDC:   1,
			expectedMinNodesPerRack: 1,
			expectedTargetPerDC:     14,
			expectedTargetPerRack:   14,
			expectedTargetPerNode:   14,
		},
		{
			name:                    "10+4 with 100",
			ecConfig:                DefaultECConfig(),
			replication:             "100",
			expectedMinDCs:          2,
			expectedMinRacksPerDC:   1,
			expectedMinNodesPerRack: 1,
			expectedTargetPerDC:     7,
			expectedTargetPerRack:   7,
			expectedTargetPerNode:   7,
		},
		{
			name:                    "10+4 with 110",
			ecConfig:                DefaultECConfig(),
			replication:             "110",
			expectedMinDCs:          2,
			expectedMinRacksPerDC:   2,
			expectedMinNodesPerRack: 1,
			expectedTargetPerDC:     7,
			expectedTargetPerRack:   4,
			expectedTargetPerNode:   4,
		},
		{
			name:                    "10+4 with 200",
			ecConfig:                DefaultECConfig(),
			replication:             "200",
			expectedMinDCs:          3,
			expectedMinRacksPerDC:   1,
			expectedMinNodesPerRack: 1,
			expectedTargetPerDC:     5,
			expectedTargetPerRack:   5,
			expectedTargetPerNode:   5,
		},
		{
			name: "8+4 with 110",
			ecConfig: ECConfig{
				DataShards:   8,
				ParityShards: 4,
			},
			replication:             "110",
			expectedMinDCs:          2,
			expectedMinRacksPerDC:   2,
			expectedMinNodesPerRack: 1,
			expectedTargetPerDC:     6, // 12/2 = 6
			expectedTargetPerRack:   3, // 6/2 = 3
			expectedTargetPerNode:   3,
		},
		{
			name: "6+3 with 100",
			ecConfig: ECConfig{
				DataShards:   6,
				ParityShards: 3,
			},
			replication:             "100",
			expectedMinDCs:          2,
			expectedMinRacksPerDC:   1,
			expectedMinNodesPerRack: 1,
			expectedTargetPerDC:     5, // ceil(9/2) = 5
			expectedTargetPerRack:   5,
			expectedTargetPerNode:   5,
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			rep, err := NewReplicationConfigFromString(tt.replication)
			if err != nil {
				t.Fatalf("Failed to parse replication %s: %v", tt.replication, err)
			}

			dist := CalculateDistribution(tt.ecConfig, rep)

			if dist.ReplicationConfig.MinDataCenters != tt.expectedMinDCs {
				t.Errorf("MinDataCenters = %d, want %d", dist.ReplicationConfig.MinDataCenters, tt.expectedMinDCs)
			}
			if dist.ReplicationConfig.MinRacksPerDC != tt.expectedMinRacksPerDC {
				t.Errorf("MinRacksPerDC = %d, want %d", dist.ReplicationConfig.MinRacksPerDC, tt.expectedMinRacksPerDC)
			}
			if dist.ReplicationConfig.MinNodesPerRack != tt.expectedMinNodesPerRack {
				t.Errorf("MinNodesPerRack = %d, want %d", dist.ReplicationConfig.MinNodesPerRack, tt.expectedMinNodesPerRack)
			}
			if dist.TargetShardsPerDC != tt.expectedTargetPerDC {
				t.Errorf("TargetShardsPerDC = %d, want %d", dist.TargetShardsPerDC, tt.expectedTargetPerDC)
			}
			if dist.TargetShardsPerRack != tt.expectedTargetPerRack {
				t.Errorf("TargetShardsPerRack = %d, want %d", dist.TargetShardsPerRack, tt.expectedTargetPerRack)
			}
			if dist.TargetShardsPerNode != tt.expectedTargetPerNode {
				t.Errorf("TargetShardsPerNode = %d, want %d", dist.TargetShardsPerNode, tt.expectedTargetPerNode)
			}

			t.Logf("Distribution for %s: %s", tt.name, dist.String())
		})
	}
}

func TestFaultToleranceAnalysis(t *testing.T) {
	tests := []struct {
		name           string
		ecConfig       ECConfig
		replication    string
		canSurviveDC   bool
		canSurviveRack bool
	}{
		// 10+4 = 14 shards, need 10 to reconstruct, can lose 4
		{"10+4 000", DefaultECConfig(), "000", false, false}, // All in one, any failure is fatal
		{"10+4 100", DefaultECConfig(), "100", false, false}, // 7 per DC/rack, 7 remaining < 10
		{"10+4 200", DefaultECConfig(), "200", false, false}, // 5 per DC/rack, 9 remaining < 10
		{"10+4 110", DefaultECConfig(), "110", false, true},  // 4 per rack, 10 remaining = enough for rack

		// 8+4 = 12 shards, need 8 to reconstruct, can lose 4
		{"8+4 100", ECConfig{8, 4}, "100", false, false}, // 6 per DC/rack, 6 remaining < 8
		{"8+4 200", ECConfig{8, 4}, "200", true, true},   // 4 per DC/rack, 8 remaining = enough!
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			rep, _ := NewReplicationConfigFromString(tt.replication)
			dist := CalculateDistribution(tt.ecConfig, rep)

			if dist.CanSurviveDCFailure() != tt.canSurviveDC {
				t.Errorf("CanSurviveDCFailure() = %v, want %v", dist.CanSurviveDCFailure(), tt.canSurviveDC)
			}
			if dist.CanSurviveRackFailure() != tt.canSurviveRack {
				t.Errorf("CanSurviveRackFailure() = %v, want %v", dist.CanSurviveRackFailure(), tt.canSurviveRack)
			}

			t.Log(dist.FaultToleranceAnalysis())
		})
	}
}

func TestMinDCsForDCFaultTolerance(t *testing.T) {
	tests := []struct {
		name     string
		ecConfig ECConfig
		minDCs   int
	}{
		// 10+4: can lose 4, so max 4 per DC, ceil(14/4) = 4 DCs needed
		{"10+4", DefaultECConfig(), 4},
		// 8+4: can lose 4, so max 4 per DC, 12/4 = 3 DCs needed
		{"8+4", ECConfig{8, 4}, 3},
		// 6+3: can lose 3, so max 3 per DC, 9/3 = 3 DCs needed
		{"6+3", ECConfig{6, 3}, 3},
		// 4+2: can lose 2, so max 2 per DC, 6/2 = 3 DCs needed
		{"4+2", ECConfig{4, 2}, 3},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			rep, _ := NewReplicationConfigFromString("000")
			dist := CalculateDistribution(tt.ecConfig, rep)

			if dist.MinDCsForDCFaultTolerance() != tt.minDCs {
				t.Errorf("MinDCsForDCFaultTolerance() = %d, want %d",
					dist.MinDCsForDCFaultTolerance(), tt.minDCs)
			}

			t.Logf("%s: needs %d DCs for DC fault tolerance", tt.name, dist.MinDCsForDCFaultTolerance())
		})
	}
}

func TestTopologyAnalysis(t *testing.T) {
	analysis := NewTopologyAnalysis()

	// Add nodes to topology
	node1 := &TopologyNode{
		NodeID:     "node1",
		DataCenter: "dc1",
		Rack:       "rack1",
		FreeSlots:  5,
	}
	node2 := &TopologyNode{
		NodeID:     "node2",
		DataCenter: "dc1",
		Rack:       "rack2",
		FreeSlots:  10,
	}
	node3 := &TopologyNode{
		NodeID:     "node3",
		DataCenter: "dc2",
		Rack:       "rack3",
		FreeSlots:  10,
	}

	analysis.AddNode(node1)
	analysis.AddNode(node2)
	analysis.AddNode(node3)

	// Add shard locations (all on node1)
	for i := 0; i < 14; i++ {
		analysis.AddShardLocation(ShardLocation{
			ShardID:    i,
			NodeID:     "node1",
			DataCenter: "dc1",
			Rack:       "rack1",
		})
	}

	analysis.Finalize()

	// Verify counts
	if analysis.TotalShards != 14 {
		t.Errorf("TotalShards = %d, want 14", analysis.TotalShards)
	}
	if analysis.ShardsByDC["dc1"] != 14 {
		t.Errorf("ShardsByDC[dc1] = %d, want 14", analysis.ShardsByDC["dc1"])
	}
	if analysis.ShardsByRack["rack1"] != 14 {
		t.Errorf("ShardsByRack[rack1] = %d, want 14", analysis.ShardsByRack["rack1"])
	}
	if analysis.ShardsByNode["node1"] != 14 {
		t.Errorf("ShardsByNode[node1] = %d, want 14", analysis.ShardsByNode["node1"])
	}

	t.Log(analysis.DetailedString())
}

func TestRebalancer(t *testing.T) {
	// Build topology: 2 DCs, 2 racks each, all shards on one node
	analysis := NewTopologyAnalysis()

	// Add nodes
	nodes := []*TopologyNode{
		{NodeID: "dc1-rack1-node1", DataCenter: "dc1", Rack: "dc1-rack1", FreeSlots: 0},
		{NodeID: "dc1-rack2-node1", DataCenter: "dc1", Rack: "dc1-rack2", FreeSlots: 10},
		{NodeID: "dc2-rack1-node1", DataCenter: "dc2", Rack: "dc2-rack1", FreeSlots: 10},
		{NodeID: "dc2-rack2-node1", DataCenter: "dc2", Rack: "dc2-rack2", FreeSlots: 10},
	}
	for _, node := range nodes {
		analysis.AddNode(node)
	}

	// Add all 14 shards to first node
	for i := 0; i < 14; i++ {
		analysis.AddShardLocation(ShardLocation{
			ShardID:    i,
			NodeID:     "dc1-rack1-node1",
			DataCenter: "dc1",
			Rack:       "dc1-rack1",
		})
	}
	analysis.Finalize()

	// Create rebalancer with 110 replication (2 DCs, 2 racks each)
	ec := DefaultECConfig()
	rep, _ := NewReplicationConfigFromString("110")
	rebalancer := NewRebalancer(ec, rep)

	plan, err := rebalancer.PlanRebalance(analysis)
	if err != nil {
		t.Fatalf("PlanRebalance failed: %v", err)
	}

	t.Logf("Planned %d moves", plan.TotalMoves)
	t.Log(plan.DetailedString())

	// Verify we're moving shards to dc2
	movedToDC2 := 0
	for _, move := range plan.Moves {
		if move.DestNode.DataCenter == "dc2" {
			movedToDC2++
		}
	}

	if movedToDC2 == 0 {
		t.Error("Expected some moves to dc2")
	}

	// With "110" replication, target is 7 shards per DC
	// Starting with 14 in dc1, should plan to move 7 to dc2
	if plan.MovesAcrossDC < 7 {
		t.Errorf("Expected at least 7 cross-DC moves for 110 replication, got %d", plan.MovesAcrossDC)
	}
}

func TestCustomECRatios(t *testing.T) {
	// Test various custom EC ratios that seaweed-enterprise might use
	ratios := []struct {
		name   string
		data   int
		parity int
	}{
		{"4+2", 4, 2},
		{"6+3", 6, 3},
		{"8+2", 8, 2},
		{"8+4", 8, 4},
		{"10+4", 10, 4},
		{"12+4", 12, 4},
		{"16+4", 16, 4},
	}

	for _, ratio := range ratios {
		t.Run(ratio.name, func(t *testing.T) {
			ec, err := NewECConfig(ratio.data, ratio.parity)
			if err != nil {
				t.Fatalf("Failed to create EC config: %v", err)
			}

			rep, _ := NewReplicationConfigFromString("110")
			dist := CalculateDistribution(ec, rep)

			t.Logf("EC %s with replication 110:", ratio.name)
			t.Logf("  Total shards: %d", ec.TotalShards())
			t.Logf("  Can lose: %d shards", ec.MaxTolerableLoss())
			t.Logf("  Target per DC: %d", dist.TargetShardsPerDC)
			t.Logf("  Target per rack: %d", dist.TargetShardsPerRack)
			t.Logf("  Min DCs for DC fault tolerance: %d", dist.MinDCsForDCFaultTolerance())

			// Verify basic sanity
			if dist.TargetShardsPerDC*2 < ec.TotalShards() {
				t.Errorf("Target per DC (%d) * 2 should be >= total (%d)",
					dist.TargetShardsPerDC, ec.TotalShards())
			}
		})
	}
}

func TestShardClassification(t *testing.T) {
	ec := DefaultECConfig() // 10+4

	// Test IsDataShard
	for i := 0; i < 10; i++ {
		if !ec.IsDataShard(i) {
			t.Errorf("Shard %d should be a data shard", i)
		}
		if ec.IsParityShard(i) {
			t.Errorf("Shard %d should not be a parity shard", i)
		}
	}

	// Test IsParityShard
	for i := 10; i < 14; i++ {
		if ec.IsDataShard(i) {
			t.Errorf("Shard %d should not be a data shard", i)
		}
		if !ec.IsParityShard(i) {
			t.Errorf("Shard %d should be a parity shard", i)
		}
	}

	// Test with custom 8+4 EC
	ec84, _ := NewECConfig(8, 4)
	for i := 0; i < 8; i++ {
		if !ec84.IsDataShard(i) {
			t.Errorf("8+4 EC: Shard %d should be a data shard", i)
		}
	}
	for i := 8; i < 12; i++ {
		if !ec84.IsParityShard(i) {
			t.Errorf("8+4 EC: Shard %d should be a parity shard", i)
		}
	}
}

func TestSortShardsDataFirst(t *testing.T) {
	ec := DefaultECConfig() // 10+4

	// Mixed shards: [0, 10, 5, 11, 2, 12, 7, 13]
	shards := []int{0, 10, 5, 11, 2, 12, 7, 13}
	sorted := ec.SortShardsDataFirst(shards)

	t.Logf("Original: %v", shards)
	t.Logf("Sorted (data first): %v", sorted)

	// First 4 should be data shards (0, 5, 2, 7)
	for i := 0; i < 4; i++ {
		if !ec.IsDataShard(sorted[i]) {
			t.Errorf("Position %d should be a data shard, got %d", i, sorted[i])
		}
	}

	// Last 4 should be parity shards (10, 11, 12, 13)
	for i := 4; i < 8; i++ {
		if !ec.IsParityShard(sorted[i]) {
			t.Errorf("Position %d should be a parity shard, got %d", i, sorted[i])
		}
	}
}

func TestSortShardsParityFirst(t *testing.T) {
	ec := DefaultECConfig() // 10+4

	// Mixed shards: [0, 10, 5, 11, 2, 12, 7, 13]
	shards := []int{0, 10, 5, 11, 2, 12, 7, 13}
	sorted := ec.SortShardsParityFirst(shards)

	t.Logf("Original: %v", shards)
	t.Logf("Sorted (parity first): %v", sorted)

	// First 4 should be parity shards (10, 11, 12, 13)
	for i := 0; i < 4; i++ {
		if !ec.IsParityShard(sorted[i]) {
			t.Errorf("Position %d should be a parity shard, got %d", i, sorted[i])
		}
	}

	// Last 4 should be data shards (0, 5, 2, 7)
	for i := 4; i < 8; i++ {
		if !ec.IsDataShard(sorted[i]) {
			t.Errorf("Position %d should be a data shard, got %d", i, sorted[i])
		}
	}
}

func TestRebalancerPrefersMovingParityShards(t *testing.T) {
	// Build topology where one node has all shards including mix of data and parity
	analysis := NewTopologyAnalysis()

	// Node 1: Has all 14 shards (mixed data and parity)
	node1 := &TopologyNode{
		NodeID:     "node1",
		DataCenter: "dc1",
		Rack:       "rack1",
		FreeSlots:  0,
	}
	analysis.AddNode(node1)

	// Node 2: Empty, ready to receive
	node2 := &TopologyNode{
		NodeID:     "node2",
		DataCenter: "dc1",
		Rack:       "rack1",
		FreeSlots:  10,
	}
	analysis.AddNode(node2)

	// Add all 14 shards to node1
	for i := 0; i < 14; i++ {
		analysis.AddShardLocation(ShardLocation{
			ShardID:    i,
			NodeID:     "node1",
			DataCenter: "dc1",
			Rack:       "rack1",
		})
	}
	analysis.Finalize()

	// Create rebalancer
	ec := DefaultECConfig()
	rep, _ := NewReplicationConfigFromString("000")
	rebalancer := NewRebalancer(ec, rep)

	plan, err := rebalancer.PlanRebalance(analysis)
	if err != nil {
		t.Fatalf("PlanRebalance failed: %v", err)
	}

	t.Logf("Planned %d moves", len(plan.Moves))

	// Check that parity shards are moved first
	parityMovesFirst := 0
	dataMoves := 0
	seenDataMove := false

	for _, move := range plan.Moves {
		isParity := ec.IsParityShard(move.ShardID)
		t.Logf("Move shard %d (parity=%v): %s -> %s",
			move.ShardID, isParity, move.SourceNode.NodeID, move.DestNode.NodeID)

		if isParity && !seenDataMove {
			parityMovesFirst++
		} else if !isParity {
			seenDataMove = true
			dataMoves++
		}
	}

	t.Logf("Parity moves before first data move: %d", parityMovesFirst)
	t.Logf("Data moves: %d", dataMoves)

	// With 10+4 EC, there are 4 parity shards
	// They should be moved before data shards when possible
	if parityMovesFirst < 4 && len(plan.Moves) >= 4 {
		t.Logf("Note: Expected parity shards to be moved first, but got %d parity moves before data moves", parityMovesFirst)
	}
}

func TestDistributionSummary(t *testing.T) {
	ec := DefaultECConfig()
	rep, _ := NewReplicationConfigFromString("110")
	dist := CalculateDistribution(ec, rep)

	summary := dist.Summary()
	t.Log(summary)

	if len(summary) == 0 {
		t.Error("Summary should not be empty")
	}

	analysis := dist.FaultToleranceAnalysis()
	t.Log(analysis)

	if len(analysis) == 0 {
		t.Error("Fault tolerance analysis should not be empty")
	}
}
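The two sorting tests above pin down a simple invariant: with n data shards, IDs 0..n-1 are data and the rest are parity, and the sort only regroups shards without reordering them within a group. A standalone stable-partition sketch of that behavior (plain ints, not the package's `SortShardsParityFirst` implementation):

```go
package main

import (
	"fmt"
	"sort"
)

// sortParityFirst reorders shard IDs so parity shards (ID >= dataShards)
// come first. The stable sort preserves the original relative order
// inside each group, which is the property the tests above assert.
func sortParityFirst(shards []int, dataShards int) []int {
	out := append([]int(nil), shards...) // copy; leave the input untouched
	sort.SliceStable(out, func(i, j int) bool {
		// "less" iff i is parity and j is data: parity sorts to the front
		return out[i] >= dataShards && out[j] < dataShards
	})
	return out
}

func main() {
	mixed := []int{0, 10, 5, 11, 2, 12, 7, 13}
	fmt.Println(sortParityFirst(mixed, 10)) // [10 11 12 13 0 5 2 7]
}
```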
weed/storage/erasure_coding/distribution/rebalancer.go (new file, 378 lines)
package distribution

import (
	"fmt"
	"slices"
)

// ShardMove represents a planned shard move
type ShardMove struct {
	ShardID    int
	SourceNode *TopologyNode
	DestNode   *TopologyNode
	Reason     string
}

// String returns a human-readable description of the move
func (m ShardMove) String() string {
	return fmt.Sprintf("shard %d: %s -> %s (%s)",
		m.ShardID, m.SourceNode.NodeID, m.DestNode.NodeID, m.Reason)
}

// RebalancePlan contains the complete plan for rebalancing EC shards
type RebalancePlan struct {
	Moves        []ShardMove
	Distribution *ECDistribution
	Analysis     *TopologyAnalysis

	// Statistics
	TotalMoves      int
	MovesAcrossDC   int
	MovesAcrossRack int
	MovesWithinRack int
}

// String returns a summary of the plan
func (p *RebalancePlan) String() string {
	return fmt.Sprintf("RebalancePlan{moves:%d, acrossDC:%d, acrossRack:%d, withinRack:%d}",
		p.TotalMoves, p.MovesAcrossDC, p.MovesAcrossRack, p.MovesWithinRack)
}

// DetailedString returns a detailed multi-line summary
func (p *RebalancePlan) DetailedString() string {
	s := "Rebalance Plan:\n"
	s += fmt.Sprintf(" Total Moves: %d\n", p.TotalMoves)
	s += fmt.Sprintf(" Across DC: %d\n", p.MovesAcrossDC)
	s += fmt.Sprintf(" Across Rack: %d\n", p.MovesAcrossRack)
	s += fmt.Sprintf(" Within Rack: %d\n", p.MovesWithinRack)
	s += "\nMoves:\n"
	for i, move := range p.Moves {
		s += fmt.Sprintf(" %d. %s\n", i+1, move.String())
	}
	return s
}

// Rebalancer plans shard moves to achieve proportional distribution
type Rebalancer struct {
	ecConfig  ECConfig
	repConfig ReplicationConfig
}

// NewRebalancer creates a new rebalancer with the given configuration
func NewRebalancer(ec ECConfig, rep ReplicationConfig) *Rebalancer {
	return &Rebalancer{
		ecConfig:  ec,
		repConfig: rep,
	}
}

// PlanRebalance creates a rebalancing plan based on current topology analysis
func (r *Rebalancer) PlanRebalance(analysis *TopologyAnalysis) (*RebalancePlan, error) {
	dist := CalculateDistribution(r.ecConfig, r.repConfig)

	plan := &RebalancePlan{
		Distribution: dist,
		Analysis:     analysis,
	}

	// Step 1: Balance across data centers
	dcMoves := r.planDCMoves(analysis, dist)
	for _, move := range dcMoves {
		plan.Moves = append(plan.Moves, move)
		plan.MovesAcrossDC++
	}

	// Update analysis after DC moves (for planning purposes)
	r.applyMovesToAnalysis(analysis, dcMoves)

	// Step 2: Balance across racks within each DC
	rackMoves := r.planRackMoves(analysis, dist)
	for _, move := range rackMoves {
		plan.Moves = append(plan.Moves, move)
		plan.MovesAcrossRack++
	}

	// Update analysis after rack moves
	r.applyMovesToAnalysis(analysis, rackMoves)

	// Step 3: Balance across nodes within each rack
	nodeMoves := r.planNodeMoves(analysis, dist)
	for _, move := range nodeMoves {
		plan.Moves = append(plan.Moves, move)
		plan.MovesWithinRack++
	}

	plan.TotalMoves = len(plan.Moves)

	return plan, nil
}

// planDCMoves plans moves to balance shards across data centers
func (r *Rebalancer) planDCMoves(analysis *TopologyAnalysis, dist *ECDistribution) []ShardMove {
	var moves []ShardMove

	overDCs := CalculateDCExcess(analysis, dist)
	underDCs := CalculateUnderservedDCs(analysis, dist)

	underIdx := 0
	for _, over := range overDCs {
		for over.Excess > 0 && underIdx < len(underDCs) {
			destDC := underDCs[underIdx]

			// Find a shard and source node
			shardID, srcNode := r.pickShardToMove(analysis, over.Nodes)
			if srcNode == nil {
				break
			}

			// Find destination node in target DC
			destNode := r.pickBestDestination(analysis, destDC, "", dist)
			if destNode == nil {
				underIdx++
				continue
			}

			moves = append(moves, ShardMove{
				ShardID:    shardID,
				SourceNode: srcNode,
				DestNode:   destNode,
				Reason:     fmt.Sprintf("balance DC: %s -> %s", srcNode.DataCenter, destDC),
			})

			over.Excess--
			analysis.ShardsByDC[srcNode.DataCenter]--
			analysis.ShardsByDC[destDC]++

			// Check if destDC reached target
			if analysis.ShardsByDC[destDC] >= dist.TargetShardsPerDC {
				underIdx++
			}
		}
	}

	return moves
}

// planRackMoves plans moves to balance shards across racks within each DC
func (r *Rebalancer) planRackMoves(analysis *TopologyAnalysis, dist *ECDistribution) []ShardMove {
	var moves []ShardMove

	for dc := range analysis.DCToRacks {
		dcShards := analysis.ShardsByDC[dc]
		numRacks := len(analysis.DCToRacks[dc])
		if numRacks == 0 {
			continue
		}

		targetPerRack := ceilDivide(dcShards, max(numRacks, dist.ReplicationConfig.MinRacksPerDC))

		overRacks := CalculateRackExcess(analysis, dc, targetPerRack)
		underRacks := CalculateUnderservedRacks(analysis, dc, targetPerRack)

		underIdx := 0
		for _, over := range overRacks {
			for over.Excess > 0 && underIdx < len(underRacks) {
				destRack := underRacks[underIdx]

				// Find shard and source node
				shardID, srcNode := r.pickShardToMove(analysis, over.Nodes)
				if srcNode == nil {
					break
				}

				// Find destination node in target rack
				destNode := r.pickBestDestination(analysis, dc, destRack, dist)
				if destNode == nil {
					underIdx++
					continue
				}

				moves = append(moves, ShardMove{
					ShardID:    shardID,
					SourceNode: srcNode,
					DestNode:   destNode,
					Reason:     fmt.Sprintf("balance rack: %s -> %s", srcNode.Rack, destRack),
				})

				over.Excess--
				analysis.ShardsByRack[srcNode.Rack]--
				analysis.ShardsByRack[destRack]++

				if analysis.ShardsByRack[destRack] >= targetPerRack {
					underIdx++
				}
			}
		}
	}

	return moves
}

// planNodeMoves plans moves to balance shards across nodes within each rack
func (r *Rebalancer) planNodeMoves(analysis *TopologyAnalysis, dist *ECDistribution) []ShardMove {
	var moves []ShardMove

	for rack, nodes := range analysis.RackToNodes {
		if len(nodes) <= 1 {
			continue
		}

		rackShards := analysis.ShardsByRack[rack]
		targetPerNode := ceilDivide(rackShards, max(len(nodes), dist.ReplicationConfig.MinNodesPerRack))

		// Find over and under nodes
		var overNodes []*TopologyNode
		var underNodes []*TopologyNode

		for _, node := range nodes {
			count := analysis.ShardsByNode[node.NodeID]
			if count > targetPerNode {
				overNodes = append(overNodes, node)
			} else if count < targetPerNode {
				underNodes = append(underNodes, node)
			}
		}

		// Sort by excess/deficit
		slices.SortFunc(overNodes, func(a, b *TopologyNode) int {
			return analysis.ShardsByNode[b.NodeID] - analysis.ShardsByNode[a.NodeID]
		})

		underIdx := 0
		for _, srcNode := range overNodes {
			excess := analysis.ShardsByNode[srcNode.NodeID] - targetPerNode

			for excess > 0 && underIdx < len(underNodes) {
				destNode := underNodes[underIdx]

				// Pick a shard from this node, preferring parity shards
				shards := analysis.NodeToShards[srcNode.NodeID]
				if len(shards) == 0 {
					break
				}

				// Find a parity shard first, fallback to data shard
				shardID := -1
				shardIdx := -1
				for i, s := range shards {
					if r.ecConfig.IsParityShard(s) {
						shardID = s
						shardIdx = i
						break
					}
				}
				if shardID == -1 {
					shardID = shards[0]
					shardIdx = 0
				}

				moves = append(moves, ShardMove{
					ShardID:    shardID,
					SourceNode: srcNode,
					DestNode:   destNode,
					Reason:     fmt.Sprintf("balance node: %s -> %s", srcNode.NodeID, destNode.NodeID),
				})

				excess--
				analysis.ShardsByNode[srcNode.NodeID]--
				analysis.ShardsByNode[destNode.NodeID]++

				// Update shard lists - remove the specific shard we picked
				analysis.NodeToShards[srcNode.NodeID] = append(
					shards[:shardIdx], shards[shardIdx+1:]...)
				analysis.NodeToShards[destNode.NodeID] = append(
					analysis.NodeToShards[destNode.NodeID], shardID)

				if analysis.ShardsByNode[destNode.NodeID] >= targetPerNode {
					underIdx++
				}
			}
		}
	}

	return moves
}
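All three planners above share one greedy shape: walk the over-target domains, pair each excess shard with the first still-under-target domain, and advance the destination pointer once that domain reaches its target. A condensed standalone sketch of that pairing loop (hypothetical `planMoves` helper over plain counts, not part of this package):

```go
package main

import "fmt"

// planMoves greedily pairs over-target domains with under-target domains,
// mirroring the over/under loop structure used by the planners above.
// counts maps domain -> shard count; order fixes iteration order; target
// is the desired shards per domain. It returns (from, to) move pairs and
// mutates counts in place, as the planners do for planning purposes.
func planMoves(counts map[string]int, order []string, target int) [][2]string {
	var over, under []string
	for _, d := range order {
		if counts[d] > target {
			over = append(over, d)
		} else if counts[d] < target {
			under = append(under, d)
		}
	}

	var moves [][2]string
	underIdx := 0
	for _, src := range over {
		for counts[src] > target && underIdx < len(under) {
			dst := under[underIdx]
			moves = append(moves, [2]string{src, dst})
			counts[src]--
			counts[dst]++
			// Advance once the destination reaches its target
			if counts[dst] >= target {
				underIdx++
			}
		}
	}
	return moves
}

func main() {
	// 14 shards all in dc1, target 7 per DC -> 7 moves dc1 -> dc2
	counts := map[string]int{"dc1": 14, "dc2": 0}
	moves := planMoves(counts, []string{"dc1", "dc2"}, 7)
	fmt.Println(len(moves), counts["dc1"], counts["dc2"]) // 7 7 7
}
```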
|
||||
// pickShardToMove selects a shard and its node from the given nodes.
|
||||
// It prefers to move parity shards first, keeping data shards spread out
|
||||
// since data shards serve read requests while parity shards are only for reconstruction.
|
||||
func (r *Rebalancer) pickShardToMove(analysis *TopologyAnalysis, nodes []*TopologyNode) (int, *TopologyNode) {
|
||||
// Sort by shard count (most shards first)
|
||||
slices.SortFunc(nodes, func(a, b *TopologyNode) int {
|
||||
return analysis.ShardsByNode[b.NodeID] - analysis.ShardsByNode[a.NodeID]
|
||||
})
|
||||
|
||||
// First pass: try to find a parity shard to move (prefer moving parity)
|
||||
for _, node := range nodes {
|
||||
shards := analysis.NodeToShards[node.NodeID]
|
||||
for _, shardID := range shards {
|
||||
if r.ecConfig.IsParityShard(shardID) {
|
||||
return shardID, node
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Second pass: if no parity shards, move a data shard
|
||||
for _, node := range nodes {
|
||||
shards := analysis.NodeToShards[node.NodeID]
|
||||
if len(shards) > 0 {
|
||||
return shards[0], node
|
||||
}
|
||||
}
|
||||
|
||||
return -1, nil
|
||||
}

// pickBestDestination selects the best destination node for a shard,
// filtering candidates by target DC/rack, free capacity, and the per-node shard limit.
func (r *Rebalancer) pickBestDestination(analysis *TopologyAnalysis, targetDC, targetRack string, dist *ECDistribution) *TopologyNode {
	var candidates []*TopologyNode

	// Collect candidates
	for _, node := range analysis.AllNodes {
		// Filter by DC if specified
		if targetDC != "" && node.DataCenter != targetDC {
			continue
		}
		// Filter by rack if specified
		if targetRack != "" && node.Rack != targetRack {
			continue
		}
		// Check capacity
		if node.FreeSlots <= 0 {
			continue
		}
		// Check max shards limit
		if analysis.ShardsByNode[node.NodeID] >= dist.MaxShardsPerNode {
			continue
		}

		candidates = append(candidates, node)
	}

	if len(candidates) == 0 {
		return nil
	}

	// Sort by: 1) fewer shards, 2) more free slots
	slices.SortFunc(candidates, func(a, b *TopologyNode) int {
		aShards := analysis.ShardsByNode[a.NodeID]
		bShards := analysis.ShardsByNode[b.NodeID]
		if aShards != bShards {
			return aShards - bShards
		}
		return b.FreeSlots - a.FreeSlots
	})

	return candidates[0]
}

// applyMovesToAnalysis is a no-op placeholder for potential future use.
// Note: All planners (planDCMoves, planRackMoves, planNodeMoves) update
// their respective counts (ShardsByDC, ShardsByRack, ShardsByNode) and
// shard lists (NodeToShards) inline during planning. This avoids duplicate
// updates that would occur if we also updated counts here.
func (r *Rebalancer) applyMovesToAnalysis(analysis *TopologyAnalysis, moves []ShardMove) {
	// Counts are already updated by the individual planners.
	// This function is kept for API compatibility and potential future use.
}