seaweedFS/weed/shell/ec_proportional_rebalance.go
Chris Lu 4aa50bfa6a fix: EC rebalance fails with replica placement 000 (#7812)
* fix: EC rebalance fails with replica placement 000

This PR fixes several issues with EC shard distribution:

1. Pre-flight check before EC encoding
   - Verify target disk type has capacity before encoding starts
   - Prevents encoding shards only to fail during rebalance
   - Shows helpful error when wrong diskType is specified (e.g., ssd when volumes are on hdd)

2. Fix EC rebalance with replica placement 000
   - When DiffRackCount=0, shards should be distributed freely across racks
   - The '000' placement means 'no volume replication needed' because EC provides redundancy
   - Previously all racks were skipped with error 'shards X > replica placement limit (0)'

3. Add unit tests for EC rebalance slot calculation
   - TestECRebalanceWithLimitedSlots: documents the limited slots scenario
   - TestECRebalanceZeroFreeSlots: reproduces the 0 free slots error

4. Add Makefile for manual EC testing
   - make setup: start cluster and populate data
   - make shell: open weed shell for EC commands
   - make clean: stop cluster and cleanup

* fix: default -rebalance to true for ec.encode

The -rebalance flag was defaulting to false, which meant ec.encode would
only print shard moves but not actually execute them. This is a poor
default since the whole point of EC encoding is to distribute shards
across servers for fault tolerance.

Now -rebalance defaults to true, so shards are actually distributed
after encoding. Users can use -rebalance=false if they only want to
see what would happen without making changes.
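
A minimal sketch of the new default, using Go's standard flag package. The helper name parseRebalance, the flag set name, and the usage text are assumptions made for illustration; this is not the actual ec.encode command code.

```go
package main

import (
	"flag"
	"fmt"
)

// parseRebalance is a hypothetical helper: it parses the given arguments
// and reports whether shard moves should actually be applied.
func parseRebalance(args []string) (bool, error) {
	fs := flag.NewFlagSet("ec.encode", flag.ContinueOnError)
	// Default is true, so shards are moved unless the user opts out.
	rebalance := fs.Bool("rebalance", true, "apply shard moves after encoding; false only prints the plan")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *rebalance, nil
}

func main() {
	apply, _ := parseRebalance(nil)
	fmt.Println("no flag given:", apply)

	apply, _ = parseRebalance([]string{"-rebalance=false"})
	fmt.Println("-rebalance=false:", apply)
}
```

With no flag the sketch applies the moves; passing -rebalance=false restores the plan-only behavior.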

* test/erasure_coding: improve Makefile safety and docs

- Narrow pkill pattern for volume servers to use TEST_DIR instead of
  port pattern, avoiding accidental kills of unrelated SeaweedFS processes
- Document external dependencies (curl, jq) in header comments

* shell: refactor buildRackWithEcShards to reuse buildEcShards

Extract common shard bit construction logic to avoid duplication
between buildEcShards and buildRackWithEcShards helper functions.

* shell: update test for EC replication 000 behavior

When DiffRackCount=0 (replication "000"), EC shards should be
distributed freely across racks since erasure coding provides its
own redundancy. Update test expectation to reflect this behavior.
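
The rule above can be sketched as a small predicate. The function name rackAllowed and the exact comparison are assumptions for illustration; the real check in the shell package may be shaped differently.

```go
package main

import "fmt"

// rackAllowed is a hypothetical helper. It reports whether a rack may
// receive another shard under a per-rack limit derived from the replica
// placement. A limit of 0 (replication "000") is treated as "no limit"
// rather than "no shards allowed".
func rackAllowed(shardsInRack, limit int) bool {
	if limit == 0 {
		return true // EC provides its own redundancy; distribute freely
	}
	return shardsInRack < limit
}

func main() {
	fmt.Println(rackAllowed(3, 0)) // limit 0: always allowed
	fmt.Println(rackAllowed(1, 2)) // below the limit
	fmt.Println(rackAllowed(2, 2)) // at the limit
}
```

Without the zero check, every rack would fail the comparison against a limit of 0, which matches the 'shards X > replica placement limit (0)' error described earlier.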

* erasure_coding: add distribution package for proportional EC shard placement

Add a new reusable package for EC shard distribution that:
- Supports configurable EC ratios (not hard-coded 10+4)
- Distributes shards proportionally based on replication policy
- Provides fault tolerance analysis
- Prefers moving parity shards to keep data shards spread out

Key components:
- ECConfig: Configurable data/parity shard counts
- ReplicationConfig: Parsed XYZ replication policy
- ECDistribution: Target shard counts per DC/rack/node
- Rebalancer: Plans shard moves with parity-first strategy

This enables seaweed-enterprise custom EC ratios and weed worker
integration while maintaining a clean, testable architecture.
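
A minimal sketch of the parity-first idea, assuming shard IDs below the data shard count are data shards and the rest are parity shards. The function pickShardToMove is hypothetical and is not the Rebalancer's actual API.

```go
package main

import "fmt"

// pickShardToMove is a hypothetical helper. Given the shard IDs held by an
// overloaded node, it prefers a parity shard (ID >= dataShards) so that data
// shards stay where they are. It falls back to the first shard when the node
// holds no parity shard. The second result is false for an empty input.
func pickShardToMove(shardIDs []int, dataShards int) (int, bool) {
	if len(shardIDs) == 0 {
		return 0, false
	}
	for _, id := range shardIDs {
		if id >= dataShards {
			return id, true
		}
	}
	return shardIDs[0], true
}

func main() {
	id, ok := pickShardToMove([]int{2, 11, 5}, 10)
	fmt.Println(id, ok)

	id, ok = pickShardToMove([]int{2, 5}, 10)
	fmt.Println(id, ok)
}
```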

* shell: integrate distribution package for EC rebalancing

Add shell wrappers around the distribution package:
- ProportionalECRebalancer: Plans moves using distribution.Rebalancer
- NewProportionalECRebalancerWithConfig: Supports custom EC configs
- GetDistributionSummary/GetFaultToleranceAnalysis: Helper functions

The shell layer converts between EcNode types and the generic
TopologyNode types used by the distribution package.

* test setup

* ec: improve data and parity shard distribution across racks

- Add shardsByTypePerRack helper to track data vs parity shards
- Rewrite doBalanceEcShardsAcrossRacks for two-pass balancing:
  1. Balance data shards (0-9) evenly, max ceil(10/6)=2 per rack
  2. Balance parity shards (10-13) evenly, max ceil(4/6)=1 per rack
- Add balanceShardTypeAcrossRacks for generic shard type balancing
- Add pickRackForShardType to select destination with room for type
- Add unit tests for even data/parity distribution verification

This ensures even read load during normal operation by spreading
both data and parity shards across all available racks.
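
The per-rack caps quoted above come from a ceiling division. A minimal sketch, with maxPerRack as a hypothetical helper name:

```go
package main

import "fmt"

// maxPerRack is a hypothetical helper that returns ceil(shards / racks),
// the most shards of one type that a single rack should hold when the
// shards are spread as evenly as possible. It returns 0 when racks <= 0.
func maxPerRack(shards, racks int) int {
	if racks <= 0 {
		return 0
	}
	return (shards + racks - 1) / racks
}

func main() {
	fmt.Println("data shards per rack:", maxPerRack(10, 6))  // ceil(10/6) = 2
	fmt.Println("parity shards per rack:", maxPerRack(4, 6)) // ceil(4/6) = 1
}
```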

* ec: make data/parity shard counts configurable in ecBalancer

- Add dataShardCount and parityShardCount fields to ecBalancer struct
- Add getDataShardCount() and getParityShardCount() methods with defaults
- Replace direct constant usage with configurable methods
- Fix unused variable warning for parityPerRack

This allows seaweed-enterprise to use custom EC ratios while
defaulting to standard 10+4 scheme.

* Address PR 7812 review comments

Makefile improvements:
- Save PIDs for each volume server for precise termination
- Use PID-based killing in stop target with pkill fallback
- Use more specific pkill patterns with TEST_DIR paths

Documentation:
- Document jq dependency in README.md

Rebalancer fix:
- Fix duplicate shard count updates in applyMovesToAnalysis
- All planners (DC/rack/node) update counts inline during planning
- Remove duplicate updates from applyMovesToAnalysis to avoid double-counting

* test/erasure_coding: use mktemp for test file template

Use mktemp instead of hardcoded /tmp/testfile_template.bin path
to provide better isolation for concurrent test runs.
2025-12-19 13:29:12 -08:00


package shell

import (
	"fmt"

	"github.com/seaweedfs/seaweedfs/weed/storage/erasure_coding"
	"github.com/seaweedfs/seaweedfs/weed/storage/erasure_coding/distribution"
	"github.com/seaweedfs/seaweedfs/weed/storage/needle"
	"github.com/seaweedfs/seaweedfs/weed/storage/super_block"
	"github.com/seaweedfs/seaweedfs/weed/storage/types"
)

// ECDistribution is an alias to the distribution package type for backward compatibility.
type ECDistribution = distribution.ECDistribution

// CalculateECDistribution computes the target EC shard distribution based on replication policy.
// This is a convenience wrapper that builds the EC configuration from the given counts:
// data shards = totalShards - parityShards (for example, 14 and 4 give the standard 10+4 scheme).
// For more control over EC ratios, use the distribution package directly.
func CalculateECDistribution(totalShards, parityShards int, rp *super_block.ReplicaPlacement) *ECDistribution {
	ec := distribution.ECConfig{
		DataShards:   totalShards - parityShards,
		ParityShards: parityShards,
	}
	rep := distribution.NewReplicationConfig(rp)
	return distribution.CalculateDistribution(ec, rep)
}

// TopologyDistributionAnalysis holds the current shard distribution analysis.
// This wraps the distribution package's TopologyAnalysis with shell-specific EcNode handling.
type TopologyDistributionAnalysis struct {
	inner *distribution.TopologyAnalysis

	// Shell-specific mappings
	nodeMap map[string]*EcNode // nodeID -> EcNode
}

// NewTopologyDistributionAnalysis creates a new analysis structure
func NewTopologyDistributionAnalysis() *TopologyDistributionAnalysis {
	return &TopologyDistributionAnalysis{
		inner:   distribution.NewTopologyAnalysis(),
		nodeMap: make(map[string]*EcNode),
	}
}

// AddNode adds a node and its shards to the analysis
func (a *TopologyDistributionAnalysis) AddNode(node *EcNode, shardBits erasure_coding.ShardBits) {
	nodeId := node.info.Id

	// Create distribution.TopologyNode from EcNode
	topoNode := &distribution.TopologyNode{
		NodeID:      nodeId,
		DataCenter:  string(node.dc),
		Rack:        string(node.rack),
		FreeSlots:   node.freeEcSlot,
		TotalShards: shardBits.ShardIdCount(),
	}
	for _, shardId := range shardBits.ShardIds() {
		topoNode.ShardIDs = append(topoNode.ShardIDs, int(shardId))
	}

	a.inner.AddNode(topoNode)
	a.nodeMap[nodeId] = node

	// Add shard locations
	for _, shardId := range shardBits.ShardIds() {
		a.inner.AddShardLocation(distribution.ShardLocation{
			ShardID:    int(shardId),
			NodeID:     nodeId,
			DataCenter: string(node.dc),
			Rack:       string(node.rack),
		})
	}
}

// Finalize completes the analysis
func (a *TopologyDistributionAnalysis) Finalize() {
	a.inner.Finalize()
}

// String returns a summary
func (a *TopologyDistributionAnalysis) String() string {
	return a.inner.String()
}

// DetailedString returns detailed analysis
func (a *TopologyDistributionAnalysis) DetailedString() string {
	return a.inner.DetailedString()
}

// GetShardsByDC returns shard counts by DC
func (a *TopologyDistributionAnalysis) GetShardsByDC() map[DataCenterId]int {
	result := make(map[DataCenterId]int)
	for dc, count := range a.inner.ShardsByDC {
		result[DataCenterId(dc)] = count
	}
	return result
}

// GetShardsByRack returns shard counts by rack
func (a *TopologyDistributionAnalysis) GetShardsByRack() map[RackId]int {
	result := make(map[RackId]int)
	for rack, count := range a.inner.ShardsByRack {
		result[RackId(rack)] = count
	}
	return result
}

// GetShardsByNode returns shard counts by node
func (a *TopologyDistributionAnalysis) GetShardsByNode() map[EcNodeId]int {
	result := make(map[EcNodeId]int)
	for nodeId, count := range a.inner.ShardsByNode {
		result[EcNodeId(nodeId)] = count
	}
	return result
}

// AnalyzeVolumeDistribution creates an analysis of current shard distribution for a volume
func AnalyzeVolumeDistribution(volumeId needle.VolumeId, locations []*EcNode, diskType types.DiskType) *TopologyDistributionAnalysis {
	analysis := NewTopologyDistributionAnalysis()

	for _, node := range locations {
		shardBits := findEcVolumeShards(node, volumeId, diskType)
		if shardBits.ShardIdCount() > 0 {
			analysis.AddNode(node, shardBits)
		}
	}

	analysis.Finalize()
	return analysis
}

// ECShardMove represents a planned shard move (shell-specific with EcNode references)
type ECShardMove struct {
	VolumeId   needle.VolumeId
	ShardId    erasure_coding.ShardId
	SourceNode *EcNode
	DestNode   *EcNode
	Reason     string
}

// String returns a human-readable description
func (m ECShardMove) String() string {
	return fmt.Sprintf("volume %d shard %d: %s -> %s (%s)",
		m.VolumeId, m.ShardId, m.SourceNode.info.Id, m.DestNode.info.Id, m.Reason)
}

// ProportionalECRebalancer implements proportional shard distribution for shell commands
type ProportionalECRebalancer struct {
	ecNodes          []*EcNode
	replicaPlacement *super_block.ReplicaPlacement
	diskType         types.DiskType
	ecConfig         distribution.ECConfig
}

// NewProportionalECRebalancer creates a new proportional rebalancer with default EC config
func NewProportionalECRebalancer(
	ecNodes []*EcNode,
	rp *super_block.ReplicaPlacement,
	diskType types.DiskType,
) *ProportionalECRebalancer {
	return NewProportionalECRebalancerWithConfig(
		ecNodes,
		rp,
		diskType,
		distribution.DefaultECConfig(),
	)
}

// NewProportionalECRebalancerWithConfig creates a rebalancer with custom EC configuration
func NewProportionalECRebalancerWithConfig(
	ecNodes []*EcNode,
	rp *super_block.ReplicaPlacement,
	diskType types.DiskType,
	ecConfig distribution.ECConfig,
) *ProportionalECRebalancer {
	return &ProportionalECRebalancer{
		ecNodes:          ecNodes,
		replicaPlacement: rp,
		diskType:         diskType,
		ecConfig:         ecConfig,
	}
}

// PlanMoves generates a plan for moving shards to achieve proportional distribution
func (r *ProportionalECRebalancer) PlanMoves(
	volumeId needle.VolumeId,
	locations []*EcNode,
) ([]ECShardMove, error) {
	// Build topology analysis
	analysis := distribution.NewTopologyAnalysis()
	nodeMap := make(map[string]*EcNode)

	// Add all EC nodes to the analysis (even those without shards)
	for _, node := range r.ecNodes {
		nodeId := node.info.Id
		topoNode := &distribution.TopologyNode{
			NodeID:     nodeId,
			DataCenter: string(node.dc),
			Rack:       string(node.rack),
			FreeSlots:  node.freeEcSlot,
		}
		analysis.AddNode(topoNode)
		nodeMap[nodeId] = node
	}

	// Add shard locations from nodes that have shards
	for _, node := range locations {
		nodeId := node.info.Id
		shardBits := findEcVolumeShards(node, volumeId, r.diskType)
		for _, shardId := range shardBits.ShardIds() {
			analysis.AddShardLocation(distribution.ShardLocation{
				ShardID:    int(shardId),
				NodeID:     nodeId,
				DataCenter: string(node.dc),
				Rack:       string(node.rack),
			})
		}
		if _, exists := nodeMap[nodeId]; !exists {
			nodeMap[nodeId] = node
		}
	}

	analysis.Finalize()

	// Create rebalancer and plan moves
	rep := distribution.NewReplicationConfig(r.replicaPlacement)
	rebalancer := distribution.NewRebalancer(r.ecConfig, rep)

	plan, err := rebalancer.PlanRebalance(analysis)
	if err != nil {
		return nil, err
	}

	// Convert distribution moves to shell moves
	var moves []ECShardMove
	for _, move := range plan.Moves {
		srcNode := nodeMap[move.SourceNode.NodeID]
		destNode := nodeMap[move.DestNode.NodeID]
		if srcNode == nil || destNode == nil {
			continue
		}

		moves = append(moves, ECShardMove{
			VolumeId:   volumeId,
			ShardId:    erasure_coding.ShardId(move.ShardID),
			SourceNode: srcNode,
			DestNode:   destNode,
			Reason:     move.Reason,
		})
	}

	return moves, nil
}

// GetDistributionSummary returns a summary of the planned distribution
func GetDistributionSummary(rp *super_block.ReplicaPlacement) string {
	ec := distribution.DefaultECConfig()
	rep := distribution.NewReplicationConfig(rp)
	dist := distribution.CalculateDistribution(ec, rep)
	return dist.Summary()
}

// GetDistributionSummaryWithConfig returns a summary with custom EC configuration
func GetDistributionSummaryWithConfig(rp *super_block.ReplicaPlacement, ecConfig distribution.ECConfig) string {
	rep := distribution.NewReplicationConfig(rp)
	dist := distribution.CalculateDistribution(ecConfig, rep)
	return dist.Summary()
}

// GetFaultToleranceAnalysis returns fault tolerance analysis for the given configuration
func GetFaultToleranceAnalysis(rp *super_block.ReplicaPlacement) string {
	ec := distribution.DefaultECConfig()
	rep := distribution.NewReplicationConfig(rp)
	dist := distribution.CalculateDistribution(ec, rep)
	return dist.FaultToleranceAnalysis()
}

// GetFaultToleranceAnalysisWithConfig returns fault tolerance analysis with custom EC configuration
func GetFaultToleranceAnalysisWithConfig(rp *super_block.ReplicaPlacement, ecConfig distribution.ECConfig) string {
	rep := distribution.NewReplicationConfig(rp)
	dist := distribution.CalculateDistribution(ecConfig, rep)
	return dist.FaultToleranceAnalysis()
}