Fix flaky EC integration tests by collecting server logs on failure (#7969)
* Fix flaky EC integration tests by collecting server logs on failure The EC Integration Tests were experiencing flaky timeouts with errors like "error reading from server: EOF" and master client reconnection attempts. When tests failed, server logs were not collected, making debugging difficult. Changes: - Updated all test functions to use t.TempDir() instead of os.MkdirTemp() and manual cleanup. t.TempDir() automatically preserves directories when tests fail, ensuring logs are available for debugging. - Modified GitHub Actions workflow to collect server logs from temp directories when tests fail, including master.log and volume*.log files. - Added explicit log collection step that searches for test temp directories and copies them to artifacts for upload. This will make debugging flaky test failures much easier by providing access to the actual server logs showing what went wrong. * Fix find command precedence in log collection The -type d flag only applied to the first -name predicate because -o has lower precedence than the implicit AND. Grouped the -name predicates with escaped parentheses so -type d applies to all directory name patterns.
This commit is contained in:
22
.github/workflows/ec-integration-tests.yml
vendored
22
.github/workflows/ec-integration-tests.yml
vendored
@@ -33,9 +33,29 @@ jobs:
|
||||
run: |
|
||||
go test -v
|
||||
|
||||
- name: Collect server logs on failure
|
||||
if: failure()
|
||||
run: |
|
||||
echo "Collecting server logs from temp directories..."
|
||||
mkdir -p /tmp/ec-test-logs
|
||||
# Find all temp directories created by the tests (they persist on failure with t.TempDir())
|
||||
find /tmp -maxdepth 1 -type d \( -name "TestEC*" -o -name "TestDisk*" -o -name "TestCross*" -o -name "TestEvacuation*" \) 2>/dev/null | while read dir; do
|
||||
if [ -d "$dir" ]; then
|
||||
echo "Found test directory: $dir"
|
||||
# Copy the entire directory structure to preserve organization
|
||||
cp -r "$dir" /tmp/ec-test-logs/ 2>/dev/null || true
|
||||
fi
|
||||
done
|
||||
# List what we collected
|
||||
echo "Collected logs:"
|
||||
find /tmp/ec-test-logs -type f -name "*.log" 2>/dev/null || echo "No logs found"
|
||||
|
||||
- name: Archive logs
|
||||
if: failure()
|
||||
uses: actions/upload-artifact@v6
|
||||
with:
|
||||
name: ec-integration-test-logs
|
||||
path: test/erasure_coding
|
||||
path: |
|
||||
/tmp/ec-test-logs/
|
||||
test/erasure_coding/
|
||||
if-no-files-found: warn
|
||||
Reference in New Issue
Block a user