* fix: EC rebalance fails with replica placement 000

  This PR fixes several issues with EC shard distribution:

  1. Pre-flight check before EC encoding
     - Verify target disk type has capacity before encoding starts
     - Prevents encoding shards only to fail during rebalance
     - Shows helpful error when wrong diskType is specified (e.g., ssd when volumes are on hdd)

  2. Fix EC rebalance with replica placement 000
     - When DiffRackCount=0, shards should be distributed freely across racks
     - The '000' placement means 'no volume replication needed' because EC provides redundancy
     - Previously all racks were skipped with error 'shards X > replica placement limit (0)'

  3. Add unit tests for EC rebalance slot calculation
     - TestECRebalanceWithLimitedSlots: documents the limited slots scenario
     - TestECRebalanceZeroFreeSlots: reproduces the 0 free slots error

  4. Add Makefile for manual EC testing
     - make setup: start cluster and populate data
     - make shell: open weed shell for EC commands
     - make clean: stop cluster and cleanup

* fix: default -rebalance to true for ec.encode

  The -rebalance flag was defaulting to false, which meant ec.encode would only print shard moves but not actually execute them. This is a poor default since the whole point of EC encoding is to distribute shards across servers for fault tolerance.

  Now -rebalance defaults to true, so shards are actually distributed after encoding. Users can pass -rebalance=false if they only want to see what would happen without making changes.

* test/erasure_coding: improve Makefile safety and docs

  - Narrow the pkill pattern for volume servers to use TEST_DIR instead of a port pattern, avoiding accidental kills of unrelated SeaweedFS processes
  - Document external dependencies (curl, jq) in header comments

* shell: refactor buildRackWithEcShards to reuse buildEcShards

  Extract the common shard bit construction logic to avoid duplication between the buildEcShards and buildRackWithEcShards helper functions.

* shell: update test for EC replication 000 behavior

  When DiffRackCount=0 (replication "000"), EC shards should be distributed freely across racks since erasure coding provides its own redundancy. Update the test expectation to reflect this behavior.

* erasure_coding: add distribution package for proportional EC shard placement

  Add a new reusable package for EC shard distribution that:
  - Supports configurable EC ratios (not hard-coded 10+4)
  - Distributes shards proportionally based on replication policy
  - Provides fault tolerance analysis
  - Prefers moving parity shards to keep data shards spread out

  Key components:
  - ECConfig: Configurable data/parity shard counts
  - ReplicationConfig: Parsed XYZ replication policy
  - ECDistribution: Target shard counts per DC/rack/node
  - Rebalancer: Plans shard moves with a parity-first strategy

  This enables seaweed-enterprise custom EC ratios and weed worker integration while maintaining a clean, testable architecture.

* shell: integrate distribution package for EC rebalancing

  Add shell wrappers around the distribution package:
  - ProportionalECRebalancer: Plans moves using distribution.Rebalancer
  - NewProportionalECRebalancerWithConfig: Supports custom EC configs
  - GetDistributionSummary/GetFaultToleranceAnalysis: Helper functions

  The shell layer converts between EcNode types and the generic TopologyNode types used by the distribution package.

* test setup

* ec: improve data and parity shard distribution across racks

  - Add a shardsByTypePerRack helper to track data vs parity shards
  - Rewrite doBalanceEcShardsAcrossRacks for two-pass balancing:
    1. Balance data shards (0-9) evenly, max ceil(10/6)=2 per rack
    2. Balance parity shards (10-13) evenly, max ceil(4/6)=1 per rack
  - Add balanceShardTypeAcrossRacks for generic shard type balancing
  - Add pickRackForShardType to select a destination with room for the type
  - Add unit tests for even data/parity distribution verification

  This ensures even read load during normal operation by spreading both data and parity shards across all available racks.

* ec: make data/parity shard counts configurable in ecBalancer

  - Add dataShardCount and parityShardCount fields to the ecBalancer struct
  - Add getDataShardCount() and getParityShardCount() methods with defaults
  - Replace direct constant usage with the configurable methods
  - Fix an unused variable warning for parityPerRack

  This allows seaweed-enterprise to use custom EC ratios while defaulting to the standard 10+4 scheme.

* Address PR 7812 review comments

  Makefile improvements:
  - Save PIDs for each volume server for precise termination
  - Use PID-based killing in the stop target with a pkill fallback
  - Use more specific pkill patterns with TEST_DIR paths

  Documentation:
  - Document the jq dependency in README.md

  Rebalancer fix:
  - Fix duplicate shard count updates in applyMovesToAnalysis
  - All planners (DC/rack/node) update counts inline during planning
  - Remove the duplicate updates from applyMovesToAnalysis to avoid double-counting

* test/erasure_coding: use mktemp for test file template

  Use mktemp instead of the hardcoded /tmp/testfile_template.bin path to provide better isolation for concurrent test runs.
# Erasure Coding Integration Tests

This directory contains integration tests for the EC (Erasure Coding) encoding volume location timing bug fix.

## The Bug

The bug caused **double storage usage** during EC encoding because:

1. **Silent failure**: Functions returned `nil` instead of proper error messages
2. **Timing race condition**: Volume locations were collected **AFTER** EC encoding, when master metadata was already updated
3. **Missing cleanup**: Original volumes weren't being deleted after EC encoding

This resulted in both the original `.dat` files AND the EC `.ec00`-`.ec13` files coexisting, effectively **doubling storage usage**.

## The Fix

The fix addresses all three issues:

1. **Fixed silent failures**: Updated `doDeleteVolumes()` and `doEcEncode()` to return proper errors
2. **Fixed timing race condition**: Created `doDeleteVolumesWithLocations()` that uses pre-collected volume locations
3. **Enhanced cleanup**: Volume locations are now collected **BEFORE** EC encoding, preventing the race condition

## Integration Tests

### TestECEncodingVolumeLocationTimingBug

The main integration test that:

- **Simulates master timing race condition**: Tests what happens when volume locations are read from the master AFTER EC encoding has updated the metadata
- **Verifies fix effectiveness**: Checks for the "Collecting volume locations...before EC encoding" message that proves the fix is working
- **Tests multi-server distribution**: Runs EC encoding with 6 volume servers to test shard distribution
- **Validates cleanup**: Ensures original volumes are properly cleaned up after EC encoding

### TestECEncodingMasterTimingRaceCondition

A focused test that specifically targets the **master metadata timing race condition**:

- **Simulates the exact race condition**: Tests volume location collection timing relative to master metadata updates
- **Detects timing fix**: Verifies that volume locations are collected BEFORE EC encoding starts
- **Demonstrates bug impact**: Shows what happens when volume locations are unavailable after the master metadata update

### TestECEncodingRegressionPrevention

Regression tests that ensure:

- **Function signatures**: Fixed functions still exist and return proper errors
- **Timing patterns**: Volume location collection happens in the correct order

## Test Architecture

The tests use:

- **Real SeaweedFS cluster**: 1 master server + 6 volume servers
- **Multi-server setup**: Tests realistic EC shard distribution across multiple servers
- **Timing simulation**: Goroutines and delays to simulate race conditions
- **Output validation**: Checks for specific log messages that prove the fix is working

## Why Integration Tests Were Necessary

Unit tests could not catch this bug because:

1. **Race condition**: The bug only occurred in real-world timing scenarios
2. **Master-volume server interaction**: Required actual master metadata updates
3. **File system operations**: Needed real volume creation and EC shard generation
4. **Cleanup timing**: Required testing the sequence of operations in the correct order

The integration tests successfully catch the timing bug by:

- **Testing real command execution**: Uses the actual `ec.encode` shell command
- **Simulating race conditions**: Creates timing scenarios that expose the bug
- **Validating output messages**: Checks for the key "Collecting volume locations...before EC encoding" message
- **Monitoring cleanup behavior**: Ensures original volumes are properly deleted

## Running the Tests

```bash
# Run all integration tests
go test -v

# Run only the main timing test
go test -v -run TestECEncodingVolumeLocationTimingBug

# Run only the race condition test
go test -v -run TestECEncodingMasterTimingRaceCondition

# Skip integration tests (short mode)
go test -v -short
```

## Manual Testing with Makefile

A Makefile is provided for manual EC testing.

**Requirements:** `curl`, `jq` (command-line JSON processor)

```bash
# Quick start: start cluster and populate data
make setup

# Open weed shell to run EC commands
make shell

# Individual targets
make start     # Start test cluster (master + 6 volume servers + filer)
make stop      # Stop test cluster
make populate  # Populate ~300MB of test data
make status    # Show cluster and EC shard status
make clean     # Stop cluster and remove all test data
make help      # Show all targets
```

### EC Rebalance Limited Slots (Unit Test)

The "no free ec shard slots" issue is tested with a **unit test** that works directly on topology data structures without requiring a running cluster.

**Location**: `weed/shell/ec_rebalance_slots_test.go`

Tests included:

- `TestECRebalanceWithLimitedSlots`: Tests a topology with 6 servers and 7 EC volumes (98 shards)
- `TestECRebalanceZeroFreeSlots`: Reproduces the exact 0 free slots scenario

**Known Issue**: When volume servers are at capacity (`volumeCount == maxVolumeCount`), the rebalance step fails with "no free ec shard slots" instead of recognizing that moving shards frees slots on source servers.

## Test Results

**With the fix**: Shows the "Collecting volume locations for N volumes before EC encoding..." message

**Without the fix**: No collection message, potential timing race condition

The tests demonstrate that the fix prevents the volume location timing bug that caused double storage usage in EC encoding operations.