fix: EC rebalance fails with replica placement 000 (#7812)
* fix: EC rebalance fails with replica placement 000

  This PR fixes several issues with EC shard distribution:

  1. Pre-flight check before EC encoding
     - Verify target disk type has capacity before encoding starts
     - Prevents encoding shards only to fail during rebalance
     - Shows helpful error when wrong diskType is specified (e.g., ssd when volumes are on hdd)

  2. Fix EC rebalance with replica placement 000
     - When DiffRackCount=0, shards should be distributed freely across racks
     - The '000' placement means 'no volume replication needed' because EC provides redundancy
     - Previously all racks were skipped with error 'shards X > replica placement limit (0)'

  3. Add unit tests for EC rebalance slot calculation
     - TestECRebalanceWithLimitedSlots: documents the limited slots scenario
     - TestECRebalanceZeroFreeSlots: reproduces the 0 free slots error

  4. Add Makefile for manual EC testing
     - make setup: start cluster and populate data
     - make shell: open weed shell for EC commands
     - make clean: stop cluster and cleanup

* fix: default -rebalance to true for ec.encode

  The -rebalance flag was defaulting to false, which meant ec.encode would only print shard moves but not actually execute them. This is a poor default since the whole point of EC encoding is to distribute shards across servers for fault tolerance.

  Now -rebalance defaults to true, so shards are actually distributed after encoding. Users can use -rebalance=false if they only want to see what would happen without making changes.

* test/erasure_coding: improve Makefile safety and docs

  - Narrow pkill pattern for volume servers to use TEST_DIR instead of port pattern, avoiding accidental kills of unrelated SeaweedFS processes
  - Document external dependencies (curl, jq) in header comments

* shell: refactor buildRackWithEcShards to reuse buildEcShards

  Extract common shard bit construction logic to avoid duplication between buildEcShards and buildRackWithEcShards helper functions.

* shell: update test for EC replication 000 behavior

  When DiffRackCount=0 (replication "000"), EC shards should be distributed freely across racks since erasure coding provides its own redundancy. Update test expectation to reflect this behavior.

* erasure_coding: add distribution package for proportional EC shard placement

  Add a new reusable package for EC shard distribution that:
  - Supports configurable EC ratios (not hard-coded 10+4)
  - Distributes shards proportionally based on replication policy
  - Provides fault tolerance analysis
  - Prefers moving parity shards to keep data shards spread out

  Key components:
  - ECConfig: configurable data/parity shard counts
  - ReplicationConfig: parsed XYZ replication policy
  - ECDistribution: target shard counts per DC/rack/node
  - Rebalancer: plans shard moves with parity-first strategy

  This enables seaweed-enterprise custom EC ratios and weed worker integration while maintaining a clean, testable architecture.

* shell: integrate distribution package for EC rebalancing

  Add shell wrappers around the distribution package:
  - ProportionalECRebalancer: plans moves using distribution.Rebalancer
  - NewProportionalECRebalancerWithConfig: supports custom EC configs
  - GetDistributionSummary/GetFaultToleranceAnalysis: helper functions

  The shell layer converts between EcNode types and the generic TopologyNode types used by the distribution package.

* test setup

* ec: improve data and parity shard distribution across racks

  - Add shardsByTypePerRack helper to track data vs parity shards
  - Rewrite doBalanceEcShardsAcrossRacks for two-pass balancing:
    1. Balance data shards (0-9) evenly, max ceil(10/6)=2 per rack
    2. Balance parity shards (10-13) evenly, max ceil(4/6)=1 per rack
  - Add balanceShardTypeAcrossRacks for generic shard type balancing
  - Add pickRackForShardType to select destination with room for type
  - Add unit tests for even data/parity distribution verification

  This ensures even read load during normal operation by spreading both data and parity shards across all available racks.

* ec: make data/parity shard counts configurable in ecBalancer

  - Add dataShardCount and parityShardCount fields to ecBalancer struct
  - Add getDataShardCount() and getParityShardCount() methods with defaults
  - Replace direct constant usage with configurable methods
  - Fix unused variable warning for parityPerRack

  This allows seaweed-enterprise to use custom EC ratios while defaulting to standard 10+4 scheme.

* Address PR 7812 review comments

  Makefile improvements:
  - Save PIDs for each volume server for precise termination
  - Use PID-based killing in stop target with pkill fallback
  - Use more specific pkill patterns with TEST_DIR paths

  Documentation:
  - Document jq dependency in README.md

  Rebalancer fix:
  - Fix duplicate shard count updates in applyMovesToAnalysis
  - All planners (DC/rack/node) update counts inline during planning
  - Remove duplicate updates from applyMovesToAnalysis to avoid double-counting

* test/erasure_coding: use mktemp for test file template

  Use mktemp instead of hardcoded /tmp/testfile_template.bin path to provide better isolation for concurrent test runs.
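The two-pass rack balancing and the replication "000" special case described above reduce to a small amount of arithmetic. The following is a minimal, self-contained Go sketch of that logic; ceilDiv and the literal shard/rack counts are illustrative only, not the shell implementation:

package main

import "fmt"

// ceilDiv illustrates the "max ceil(shards/racks) per rack" caps mentioned above.
func ceilDiv(a, b int) int {
	return (a + b - 1) / b
}

func main() {
	dataShards, parityShards, racks := 10, 4, 6

	// Pass 1: data shards 0-9, at most ceil(10/6) = 2 per rack.
	// Pass 2: parity shards 10-13, at most ceil(4/6) = 1 per rack.
	fmt.Println("data shards per rack:", ceilDiv(dataShards, racks))     // 2
	fmt.Println("parity shards per rack:", ceilDiv(parityShards, racks)) // 1

	// Replication "000" parses to DiffRackCount == 0. Since EC already provides
	// redundancy, a zero rack count means "no per-rack replica limit", not
	// "at most 0 shards per rack" (the old, buggy reading).
	diffRackCount := 0
	if diffRackCount == 0 {
		fmt.Println("no rack limit: distribute shards freely across racks")
	}
}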
weed/shell/ec_proportional_rebalance.go (new file, 284 lines)
@@ -0,0 +1,284 @@
package shell

import (
	"fmt"

	"github.com/seaweedfs/seaweedfs/weed/storage/erasure_coding"
	"github.com/seaweedfs/seaweedfs/weed/storage/erasure_coding/distribution"
	"github.com/seaweedfs/seaweedfs/weed/storage/needle"
	"github.com/seaweedfs/seaweedfs/weed/storage/super_block"
	"github.com/seaweedfs/seaweedfs/weed/storage/types"
)

// ECDistribution is an alias to the distribution package type for backward compatibility
type ECDistribution = distribution.ECDistribution

// CalculateECDistribution computes the target EC shard distribution based on replication policy.
// This is a convenience wrapper that uses the default 10+4 EC configuration.
// For custom EC ratios, use the distribution package directly.
func CalculateECDistribution(totalShards, parityShards int, rp *super_block.ReplicaPlacement) *ECDistribution {
	ec := distribution.ECConfig{
		DataShards:   totalShards - parityShards,
		ParityShards: parityShards,
	}
	rep := distribution.NewReplicationConfig(rp)
	return distribution.CalculateDistribution(ec, rep)
}
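
// Illustrative usage (not part of this commit's file): for the default 10+4
// scheme and replication "000", the wrapper above could be invoked roughly as
//
//	rp, _ := super_block.NewReplicaPlacementFromString("000")
//	dist := CalculateECDistribution(14, 4, rp) // 14 total = 10 data + 4 parity
//	fmt.Println(dist.Summary())
//
// NewReplicaPlacementFromString is assumed to be the usual super_block parser;
// Summary() is the distribution package method used later in this file.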

// TopologyDistributionAnalysis holds the current shard distribution analysis
// This wraps the distribution package's TopologyAnalysis with shell-specific EcNode handling
type TopologyDistributionAnalysis struct {
	inner *distribution.TopologyAnalysis

	// Shell-specific mappings
	nodeMap map[string]*EcNode // nodeID -> EcNode
}

// NewTopologyDistributionAnalysis creates a new analysis structure
func NewTopologyDistributionAnalysis() *TopologyDistributionAnalysis {
	return &TopologyDistributionAnalysis{
		inner:   distribution.NewTopologyAnalysis(),
		nodeMap: make(map[string]*EcNode),
	}
}

// AddNode adds a node and its shards to the analysis
func (a *TopologyDistributionAnalysis) AddNode(node *EcNode, shardBits erasure_coding.ShardBits) {
	nodeId := node.info.Id

	// Create distribution.TopologyNode from EcNode
	topoNode := &distribution.TopologyNode{
		NodeID:      nodeId,
		DataCenter:  string(node.dc),
		Rack:        string(node.rack),
		FreeSlots:   node.freeEcSlot,
		TotalShards: shardBits.ShardIdCount(),
	}

	for _, shardId := range shardBits.ShardIds() {
		topoNode.ShardIDs = append(topoNode.ShardIDs, int(shardId))
	}

	a.inner.AddNode(topoNode)
	a.nodeMap[nodeId] = node

	// Add shard locations
	for _, shardId := range shardBits.ShardIds() {
		a.inner.AddShardLocation(distribution.ShardLocation{
			ShardID:    int(shardId),
			NodeID:     nodeId,
			DataCenter: string(node.dc),
			Rack:       string(node.rack),
		})
	}
}

// Finalize completes the analysis
func (a *TopologyDistributionAnalysis) Finalize() {
	a.inner.Finalize()
}

// String returns a summary
func (a *TopologyDistributionAnalysis) String() string {
	return a.inner.String()
}

// DetailedString returns detailed analysis
func (a *TopologyDistributionAnalysis) DetailedString() string {
	return a.inner.DetailedString()
}

// GetShardsByDC returns shard counts by DC
func (a *TopologyDistributionAnalysis) GetShardsByDC() map[DataCenterId]int {
	result := make(map[DataCenterId]int)
	for dc, count := range a.inner.ShardsByDC {
		result[DataCenterId(dc)] = count
	}
	return result
}

// GetShardsByRack returns shard counts by rack
func (a *TopologyDistributionAnalysis) GetShardsByRack() map[RackId]int {
	result := make(map[RackId]int)
	for rack, count := range a.inner.ShardsByRack {
		result[RackId(rack)] = count
	}
	return result
}

// GetShardsByNode returns shard counts by node
func (a *TopologyDistributionAnalysis) GetShardsByNode() map[EcNodeId]int {
	result := make(map[EcNodeId]int)
	for nodeId, count := range a.inner.ShardsByNode {
		result[EcNodeId(nodeId)] = count
	}
	return result
}

// AnalyzeVolumeDistribution creates an analysis of current shard distribution for a volume
func AnalyzeVolumeDistribution(volumeId needle.VolumeId, locations []*EcNode, diskType types.DiskType) *TopologyDistributionAnalysis {
	analysis := NewTopologyDistributionAnalysis()

	for _, node := range locations {
		shardBits := findEcVolumeShards(node, volumeId, diskType)
		if shardBits.ShardIdCount() > 0 {
			analysis.AddNode(node, shardBits)
		}
	}

	analysis.Finalize()
	return analysis
}

// ECShardMove represents a planned shard move (shell-specific with EcNode references)
type ECShardMove struct {
	VolumeId   needle.VolumeId
	ShardId    erasure_coding.ShardId
	SourceNode *EcNode
	DestNode   *EcNode
	Reason     string
}

// String returns a human-readable description
func (m ECShardMove) String() string {
	return fmt.Sprintf("volume %d shard %d: %s -> %s (%s)",
		m.VolumeId, m.ShardId, m.SourceNode.info.Id, m.DestNode.info.Id, m.Reason)
}

// ProportionalECRebalancer implements proportional shard distribution for shell commands
type ProportionalECRebalancer struct {
	ecNodes          []*EcNode
	replicaPlacement *super_block.ReplicaPlacement
	diskType         types.DiskType
	ecConfig         distribution.ECConfig
}

// NewProportionalECRebalancer creates a new proportional rebalancer with default EC config
func NewProportionalECRebalancer(
	ecNodes []*EcNode,
	rp *super_block.ReplicaPlacement,
	diskType types.DiskType,
) *ProportionalECRebalancer {
	return NewProportionalECRebalancerWithConfig(
		ecNodes,
		rp,
		diskType,
		distribution.DefaultECConfig(),
	)
}

// NewProportionalECRebalancerWithConfig creates a rebalancer with custom EC configuration
func NewProportionalECRebalancerWithConfig(
	ecNodes []*EcNode,
	rp *super_block.ReplicaPlacement,
	diskType types.DiskType,
	ecConfig distribution.ECConfig,
) *ProportionalECRebalancer {
	return &ProportionalECRebalancer{
		ecNodes:          ecNodes,
		replicaPlacement: rp,
		diskType:         diskType,
		ecConfig:         ecConfig,
	}
}

// PlanMoves generates a plan for moving shards to achieve proportional distribution
func (r *ProportionalECRebalancer) PlanMoves(
	volumeId needle.VolumeId,
	locations []*EcNode,
) ([]ECShardMove, error) {
	// Build topology analysis
	analysis := distribution.NewTopologyAnalysis()
	nodeMap := make(map[string]*EcNode)

	// Add all EC nodes to the analysis (even those without shards)
	for _, node := range r.ecNodes {
		nodeId := node.info.Id
		topoNode := &distribution.TopologyNode{
			NodeID:     nodeId,
			DataCenter: string(node.dc),
			Rack:       string(node.rack),
			FreeSlots:  node.freeEcSlot,
		}
		analysis.AddNode(topoNode)
		nodeMap[nodeId] = node
	}

	// Add shard locations from nodes that have shards
	for _, node := range locations {
		nodeId := node.info.Id
		shardBits := findEcVolumeShards(node, volumeId, r.diskType)
		for _, shardId := range shardBits.ShardIds() {
			analysis.AddShardLocation(distribution.ShardLocation{
				ShardID:    int(shardId),
				NodeID:     nodeId,
				DataCenter: string(node.dc),
				Rack:       string(node.rack),
			})
		}
		if _, exists := nodeMap[nodeId]; !exists {
			nodeMap[nodeId] = node
		}
	}

	analysis.Finalize()

	// Create rebalancer and plan moves
	rep := distribution.NewReplicationConfig(r.replicaPlacement)
	rebalancer := distribution.NewRebalancer(r.ecConfig, rep)

	plan, err := rebalancer.PlanRebalance(analysis)
	if err != nil {
		return nil, err
	}

	// Convert distribution moves to shell moves
	var moves []ECShardMove
	for _, move := range plan.Moves {
		srcNode := nodeMap[move.SourceNode.NodeID]
		destNode := nodeMap[move.DestNode.NodeID]
		if srcNode == nil || destNode == nil {
			continue
		}

		moves = append(moves, ECShardMove{
			VolumeId:   volumeId,
			ShardId:    erasure_coding.ShardId(move.ShardID),
			SourceNode: srcNode,
			DestNode:   destNode,
			Reason:     move.Reason,
		})
	}

	return moves, nil
}

// GetDistributionSummary returns a summary of the planned distribution
func GetDistributionSummary(rp *super_block.ReplicaPlacement) string {
	ec := distribution.DefaultECConfig()
	rep := distribution.NewReplicationConfig(rp)
	dist := distribution.CalculateDistribution(ec, rep)
	return dist.Summary()
}

// GetDistributionSummaryWithConfig returns a summary with custom EC configuration
func GetDistributionSummaryWithConfig(rp *super_block.ReplicaPlacement, ecConfig distribution.ECConfig) string {
	rep := distribution.NewReplicationConfig(rp)
	dist := distribution.CalculateDistribution(ecConfig, rep)
	return dist.Summary()
}

// GetFaultToleranceAnalysis returns fault tolerance analysis for the given configuration
func GetFaultToleranceAnalysis(rp *super_block.ReplicaPlacement) string {
	ec := distribution.DefaultECConfig()
	rep := distribution.NewReplicationConfig(rp)
	dist := distribution.CalculateDistribution(ec, rep)
	return dist.FaultToleranceAnalysis()
}

// GetFaultToleranceAnalysisWithConfig returns fault tolerance analysis with custom EC configuration
func GetFaultToleranceAnalysisWithConfig(rp *super_block.ReplicaPlacement, ecConfig distribution.ECConfig) string {
	rep := distribution.NewReplicationConfig(rp)
	dist := distribution.CalculateDistribution(ecConfig, rep)
	return dist.FaultToleranceAnalysis()
}
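For context, here is a minimal sketch of how a shell command might drive the new rebalancer end to end. The helper planProportionalMoves is hypothetical and not part of this commit; it only chains the constructor and PlanMoves shown above, assumes the caller has already collected the EcNode list and replica placement, and uses types.HardDriveType as the disk type:

package shell

import (
	"fmt"

	"github.com/seaweedfs/seaweedfs/weed/storage/needle"
	"github.com/seaweedfs/seaweedfs/weed/storage/super_block"
	"github.com/seaweedfs/seaweedfs/weed/storage/types"
)

// planProportionalMoves is a hypothetical example, not part of this commit.
// It plans (but does not execute) proportional shard moves for one EC volume.
func planProportionalMoves(ecNodes []*EcNode, rp *super_block.ReplicaPlacement, vid needle.VolumeId) error {
	rebalancer := NewProportionalECRebalancer(ecNodes, rp, types.HardDriveType)
	moves, err := rebalancer.PlanMoves(vid, ecNodes)
	if err != nil {
		return err
	}
	for _, m := range moves {
		fmt.Println(m.String()) // e.g. "volume 7 shard 12: nodeA -> nodeB (reason)"
	}
	return nil
}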