Files
seaweedFS/test/erasure_coding/README.md
Chris Lu 4aa50bfa6a fix: EC rebalance fails with replica placement 000 (#7812)
* fix: EC rebalance fails with replica placement 000

This PR fixes several issues with EC shard distribution:

1. Pre-flight check before EC encoding
   - Verify target disk type has capacity before encoding starts
   - Prevents encoding shards only to fail during rebalance
   - Shows helpful error when wrong diskType is specified (e.g., ssd when volumes are on hdd)

2. Fix EC rebalance with replica placement 000
   - When DiffRackCount=0, shards should be distributed freely across racks
   - The '000' placement means 'no volume replication needed' because EC provides redundancy
   - Previously all racks were skipped with error 'shards X > replica placement limit (0)'

3. Add unit tests for EC rebalance slot calculation
   - TestECRebalanceWithLimitedSlots: documents the limited slots scenario
   - TestECRebalanceZeroFreeSlots: reproduces the 0 free slots error

4. Add Makefile for manual EC testing
   - make setup: start cluster and populate data
   - make shell: open weed shell for EC commands
   - make clean: stop cluster and cleanup

* fix: default -rebalance to true for ec.encode

The -rebalance flag was defaulting to false, which meant ec.encode would
only print shard moves but not actually execute them. This is a poor
default since the whole point of EC encoding is to distribute shards
across servers for fault tolerance.

Now -rebalance defaults to true, so shards are actually distributed
after encoding. Users can use -rebalance=false if they only want to
see what would happen without making changes.

* test/erasure_coding: improve Makefile safety and docs

- Narrow pkill pattern for volume servers to use TEST_DIR instead of
  port pattern, avoiding accidental kills of unrelated SeaweedFS processes
- Document external dependencies (curl, jq) in header comments

* shell: refactor buildRackWithEcShards to reuse buildEcShards

Extract common shard bit construction logic to avoid duplication
between buildEcShards and buildRackWithEcShards helper functions.

* shell: update test for EC replication 000 behavior

When DiffRackCount=0 (replication "000"), EC shards should be
distributed freely across racks since erasure coding provides its
own redundancy. Update test expectation to reflect this behavior.

* erasure_coding: add distribution package for proportional EC shard placement

Add a new reusable package for EC shard distribution that:
- Supports configurable EC ratios (not hard-coded 10+4)
- Distributes shards proportionally based on replication policy
- Provides fault tolerance analysis
- Prefers moving parity shards to keep data shards spread out

Key components:
- ECConfig: Configurable data/parity shard counts
- ReplicationConfig: Parsed XYZ replication policy
- ECDistribution: Target shard counts per DC/rack/node
- Rebalancer: Plans shard moves with parity-first strategy

This enables seaweed-enterprise custom EC ratios and weed worker
integration while maintaining a clean, testable architecture.

* shell: integrate distribution package for EC rebalancing

Add shell wrappers around the distribution package:
- ProportionalECRebalancer: Plans moves using distribution.Rebalancer
- NewProportionalECRebalancerWithConfig: Supports custom EC configs
- GetDistributionSummary/GetFaultToleranceAnalysis: Helper functions

The shell layer converts between EcNode types and the generic
TopologyNode types used by the distribution package.

* test setup

* ec: improve data and parity shard distribution across racks

- Add shardsByTypePerRack helper to track data vs parity shards
- Rewrite doBalanceEcShardsAcrossRacks for two-pass balancing:
  1. Balance data shards (0-9) evenly, max ceil(10/6)=2 per rack
  2. Balance parity shards (10-13) evenly, max ceil(4/6)=1 per rack
- Add balanceShardTypeAcrossRacks for generic shard type balancing
- Add pickRackForShardType to select destination with room for type
- Add unit tests for even data/parity distribution verification

This ensures even read load during normal operation by spreading
both data and parity shards across all available racks.

* ec: make data/parity shard counts configurable in ecBalancer

- Add dataShardCount and parityShardCount fields to ecBalancer struct
- Add getDataShardCount() and getParityShardCount() methods with defaults
- Replace direct constant usage with configurable methods
- Fix unused variable warning for parityPerRack

This allows seaweed-enterprise to use custom EC ratios while
defaulting to standard 10+4 scheme.

* Address PR 7812 review comments

Makefile improvements:
- Save PIDs for each volume server for precise termination
- Use PID-based killing in stop target with pkill fallback
- Use more specific pkill patterns with TEST_DIR paths

Documentation:
- Document jq dependency in README.md

Rebalancer fix:
- Fix duplicate shard count updates in applyMovesToAnalysis
- All planners (DC/rack/node) update counts inline during planning
- Remove duplicate updates from applyMovesToAnalysis to avoid double-counting

* test/erasure_coding: use mktemp for test file template

Use mktemp instead of hardcoded /tmp/testfile_template.bin path
to provide better isolation for concurrent test runs.
2025-12-19 13:29:12 -08:00

123 lines
5.2 KiB
Markdown

# Erasure Coding Integration Tests
This directory contains integration tests for the EC (Erasure Coding) encoding volume location timing bug fix.
## The Bug
The bug caused **double storage usage** during EC encoding because:
1. **Silent failure**: Functions returned `nil` instead of proper error messages
2. **Timing race condition**: Volume locations were collected **AFTER** EC encoding when master metadata was already updated
3. **Missing cleanup**: Original volumes weren't being deleted after EC encoding
This resulted in both original `.dat` files AND EC `.ec00-.ec13` files coexisting, effectively **doubling storage usage**.
## The Fix
The fix addresses all three issues:
1. **Fixed silent failures**: Updated `doDeleteVolumes()` and `doEcEncode()` to return proper errors
2. **Fixed timing race condition**: Created `doDeleteVolumesWithLocations()` that uses pre-collected volume locations
3. **Enhanced cleanup**: Volume locations are now collected **BEFORE** EC encoding, preventing the race condition
## Integration Tests
### TestECEncodingVolumeLocationTimingBug
The main integration test that:
- **Simulates master timing race condition**: Tests what happens when volume locations are read from master AFTER EC encoding has updated the metadata
- **Verifies fix effectiveness**: Checks for the "Collecting volume locations...before EC encoding" message that proves the fix is working
- **Tests multi-server distribution**: Runs EC encoding with 6 volume servers to test shard distribution
- **Validates cleanup**: Ensures original volumes are properly cleaned up after EC encoding
### TestECEncodingMasterTimingRaceCondition
A focused test that specifically targets the **master metadata timing race condition**:
- **Simulates the exact race condition**: Tests volume location collection timing relative to master metadata updates
- **Detects timing fix**: Verifies that volume locations are collected BEFORE EC encoding starts
- **Demonstrates bug impact**: Shows what happens when volume locations are unavailable after master metadata update
### TestECEncodingRegressionPrevention
Regression tests that ensure:
- **Function signatures**: Fixed functions still exist and return proper errors
- **Timing patterns**: Volume location collection happens in the correct order
## Test Architecture
The tests use:
- **Real SeaweedFS cluster**: 1 master server + 6 volume servers
- **Multi-server setup**: Tests realistic EC shard distribution across multiple servers
- **Timing simulation**: Goroutines and delays to simulate race conditions
- **Output validation**: Checks for specific log messages that prove the fix is working
## Why Integration Tests Were Necessary
Unit tests could not catch this bug because:
1. **Race condition**: The bug only occurred in real-world timing scenarios
2. **Master-volume server interaction**: Required actual master metadata updates
3. **File system operations**: Needed real volume creation and EC shard generation
4. **Cleanup timing**: Required testing the sequence of operations in correct order
The integration tests successfully catch the timing bug by:
- **Testing real command execution**: Uses actual `ec.encode` shell command
- **Simulating race conditions**: Creates timing scenarios that expose the bug
- **Validating output messages**: Checks for the key "Collecting volume locations...before EC encoding" message
- **Monitoring cleanup behavior**: Ensures original volumes are properly deleted
## Running the Tests
```bash
# Run all integration tests
go test -v
# Run only the main timing test
go test -v -run TestECEncodingVolumeLocationTimingBug
# Run only the race condition test
go test -v -run TestECEncodingMasterTimingRaceCondition
# Skip integration tests (short mode)
go test -v -short
```
## Manual Testing with Makefile
A Makefile is provided for manual EC testing.
**Requirements:** `curl`, `jq` (command-line JSON processor)
```bash
# Quick start: start cluster and populate data
make setup
# Open weed shell to run EC commands
make shell
# Individual targets
make start # Start test cluster (master + 6 volume servers + filer)
make stop # Stop test cluster
make populate # Populate ~300MB of test data
make status # Show cluster and EC shard status
make clean # Stop cluster and remove all test data
make help # Show all targets
```
### EC Rebalance Limited Slots (Unit Test)
The "no free ec shard slots" issue is tested with a **unit test** that works directly on
topology data structures without requiring a running cluster.
**Location**: `weed/shell/ec_rebalance_slots_test.go`
Tests included:
- `TestECRebalanceWithLimitedSlots`: Tests a topology with 6 servers, 7 EC volumes (98 shards)
- `TestECRebalanceZeroFreeSlots`: Reproduces the exact 0 free slots scenario
**Known Issue**: When volume servers are at capacity (`volumeCount == maxVolumeCount`),
the rebalance step fails with "no free ec shard slots" instead of recognizing that
moving shards frees slots on source servers.
## Test Results
**With the fix**: Shows "Collecting volume locations for N volumes before EC encoding..." message
**Without the fix**: No collection message, potential timing race condition
The tests demonstrate that the fix prevents the volume location timing bug that caused double storage usage in EC encoding operations.