fix: prevent filer.backup stall in single-filer setups (#7695)

* fix: prevent filer.backup stall in single-filer setups (#4977)

When MetaAggregator.MetaLogBuffer is empty (which happens in single-filer
setups with no peers), ReadFromBuffer returned a nil error, causing
LoopProcessLogData to wait indefinitely on ListenersCond.

This fix returns ResumeFromDiskError instead, allowing SubscribeMetadata
to loop back and read from persisted logs on disk. This ensures filer.backup
continues processing events even when the in-memory aggregator buffer is empty.

Fixes #4977
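
The control flow described above can be sketched in a few lines of Go. This is a toy model, not the SeaweedFS implementation: memBuffer, readFromBuffer, subscribe, and errResumeFromDisk are hypothetical stand-ins for MetaLogBuffer, ReadFromBuffer, SubscribeMetadata, and ResumeFromDiskError, and the "disk" is just a slice.

```go
package main

import (
	"errors"
	"fmt"
)

// errResumeFromDisk is a hypothetical stand-in for the ResumeFromDiskError
// sentinel named in the commit message.
var errResumeFromDisk = errors.New("resume from disk")

// memBuffer is a toy in-memory log buffer; it is not the real MetaLogBuffer.
type memBuffer struct {
	events []string
}

// readFromBuffer returns the buffered events at or after offset.
// An empty buffer yields errResumeFromDisk rather than (nil, nil), so the
// caller falls back to disk instead of waiting for data that never arrives.
func (b *memBuffer) readFromBuffer(offset int) ([]string, error) {
	if len(b.events) == 0 {
		return nil, errResumeFromDisk
	}
	if offset >= len(b.events) {
		return nil, nil
	}
	return b.events[offset:], nil
}

// subscribe reads from memory first and falls back to the persisted log
// (modelled here as a slice) when the buffer asks for a disk resume.
func subscribe(buf *memBuffer, disk []string, offset int) []string {
	events, err := buf.readFromBuffer(offset)
	if errors.Is(err, errResumeFromDisk) {
		if offset >= len(disk) {
			return nil
		}
		return disk[offset:]
	}
	return events
}

func main() {
	disk := []string{"create /a", "update /a", "delete /a"}
	// The buffer is empty, as in a single-filer setup with no peers.
	fmt.Println(subscribe(&memBuffer{}, disk, 1))
}
```

With a nil error in the empty case, the subscriber in this model would have nothing to return and no reason to consult the disk, which is the stall the fix removes.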

* test: add integration tests for metadata subscription

Add integration tests for metadata subscription functionality:

- TestMetadataSubscribeBasic: Tests basic subscription and event receiving
- TestMetadataSubscribeSingleFilerNoStall: Regression test for #4977,
  verifies subscription doesn't stall under high load in single-filer setups
- TestMetadataSubscribeResumeFromDisk: Tests resuming subscription from disk

Related to #4977

* ci: add GitHub Actions workflow for metadata subscribe tests

Add a CI workflow that:
- Triggers on push/PR to master affecting filer, log_buffer, or metadata subscribe code
- Runs the integration tests for metadata subscription
- Uploads logs on failure for debugging

Related to #4977

* fix: use multipart form-data for file uploads in integration tests

The filer expects multipart/form-data for file uploads, not raw POST body.
This fixes the 'Content-Type isn't multipart/form-data' error.
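
For reference, a multipart/form-data body can be built with the standard library as in the sketch below. The helper name buildMultipartUpload and the form field name "file" are assumptions made for illustration; the actual test helper may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
)

// buildMultipartUpload wraps content in a multipart/form-data body and
// returns the body together with the Content-Type header value to send.
// The form field name "file" is an assumption made for this sketch.
func buildMultipartUpload(filename string, content []byte) (*bytes.Buffer, string, error) {
	body := &bytes.Buffer{}
	w := multipart.NewWriter(body)
	part, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, "", err
	}
	if _, err := part.Write(content); err != nil {
		return nil, "", err
	}
	// Close writes the trailing boundary; without it the body is incomplete.
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return body, w.FormDataContentType(), nil
}

func main() {
	body, contentType, err := buildMultipartUpload("hello.txt", []byte("hello"))
	if err != nil {
		panic(err)
	}
	fmt.Println(contentType, body.Len())
}
```

The returned body would be passed to http.NewRequest, with the returned string set as the request's Content-Type header.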

* test: use -peers=none for faster master startup

* test: add -peers=none to remaining master startup in ec tests

* fix: use filer HTTP port 8888, WithFilerClient adds 10000 for gRPC

WithFilerClient calls ToGrpcAddress() which adds 10000 to the port.
Passing 18888 resulted in connecting to 28888. Use 8888 instead.
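
The arithmetic behind this bug is easy to reproduce. The sketch below uses a hypothetical toGrpcAddress function that only mirrors the +10000 convention described above; it is not the SeaweedFS ToGrpcAddress implementation.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// grpcPortOffset mirrors the +10000 convention described in the commit message.
const grpcPortOffset = 10000

// toGrpcAddress is a hypothetical re-implementation for illustration only:
// it derives a gRPC address by adding grpcPortOffset to the HTTP port.
func toGrpcAddress(httpAddr string) (string, error) {
	host, portStr, err := net.SplitHostPort(httpAddr)
	if err != nil {
		return "", err
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return "", err
	}
	return net.JoinHostPort(host, strconv.Itoa(port+grpcPortOffset)), nil
}

func main() {
	// Passing the HTTP port gives the intended gRPC port; passing the gRPC
	// port applies the offset a second time.
	for _, addr := range []string{"localhost:8888", "localhost:18888"} {
		grpcAddr, err := toGrpcAddress(addr)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", addr, grpcAddr)
	}
}
```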

* test: add concurrent writes and million updates tests

- TestMetadataSubscribeConcurrentWrites: 50 goroutines writing 20 files each
- TestMetadataSubscribeMillionUpdates: 1 million metadata entries via gRPC
  (metadata only, no actual file content for speed)
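
The fan-out shape of the concurrent-writes test can be sketched as follows. This is only an illustration of the 50 x 20 pattern: writeConcurrently and the path layout are hypothetical, and the sketch records paths in memory instead of writing to a filer.

```go
package main

import (
	"fmt"
	"sync"
)

// writeConcurrently starts `writers` goroutines that each record
// `filesPerWriter` paths, and returns all paths once every writer is done.
// The path layout is made up for this sketch.
func writeConcurrently(writers, filesPerWriter int) []string {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		paths []string
	)
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for f := 0; f < filesPerWriter; f++ {
				p := fmt.Sprintf("/test/writer%d/file%d.txt", w, f)
				mu.Lock() // paths is shared across goroutines
				paths = append(paths, p)
				mu.Unlock()
			}
		}(w)
	}
	wg.Wait()
	return paths
}

func main() {
	paths := writeConcurrently(50, 20)
	fmt.Println(len(paths))
}
```

A subscriber-side assertion would then compare the number of received events against the number of paths written.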

* fix: address PR review comments

- Handle os.MkdirAll errors explicitly instead of ignoring
- Handle log file creation errors with proper error messages
- Replace silent event dropping with 100ms timeout and warning log

* Update metadata_subscribe_integration_test.go
Chris Lu
2025-12-09 20:15:35 -08:00
committed by GitHub
parent 1b13324fb7
commit d970c15d71
6 changed files with 1090 additions and 1 deletions


@@ -0,0 +1,92 @@
name: "Metadata Subscribe Integration Tests"
on:
  push:
    branches: [ master ]
    paths:
      - 'weed/filer/**'
      - 'weed/pb/filer_pb/**'
      - 'weed/util/log_buffer/**'
      - 'weed/server/filer_grpc_server_sub_meta.go'
      - 'weed/command/filer_backup.go'
      - 'test/metadata_subscribe/**'
      - '.github/workflows/metadata-subscribe-tests.yml'
  pull_request:
    branches: [ master ]
    paths:
      - 'weed/filer/**'
      - 'weed/pb/filer_pb/**'
      - 'weed/util/log_buffer/**'
      - 'weed/server/filer_grpc_server_sub_meta.go'
      - 'weed/command/filer_backup.go'
      - 'test/metadata_subscribe/**'
      - '.github/workflows/metadata-subscribe-tests.yml'
concurrency:
  group: ${{ github.head_ref }}/metadata-subscribe-tests
  cancel-in-progress: true
permissions:
  contents: read
env:
  GO_VERSION: '1.24'
  TEST_TIMEOUT: '10m'
jobs:
  metadata-subscribe-integration:
    name: Metadata Subscribe Integration Tests
    runs-on: ubuntu-22.04
    timeout-minutes: 20
    steps:
      - name: Checkout code
        uses: actions/checkout@v6
      - name: Set up Go ${{ env.GO_VERSION }}
        uses: actions/setup-go@v6
        with:
          go-version: ${{ env.GO_VERSION }}
      - name: Build SeaweedFS
        run: |
          cd weed
          go build -o weed .
          chmod +x weed
          ./weed version
      - name: Run Metadata Subscribe Integration Tests
        run: |
          cd test/metadata_subscribe
          echo "Running Metadata Subscribe integration tests..."
          echo "============================================"
          # Run tests with verbose output
          go test -v -timeout=${{ env.TEST_TIMEOUT }} ./...
          echo "============================================"
          echo "Metadata Subscribe integration tests completed"
      - name: Archive logs on failure
        if: failure()
        uses: actions/upload-artifact@v5
        with:
          name: metadata-subscribe-test-logs
          path: |
            /tmp/seaweedfs_*
          retention-days: 7
      - name: Test Summary
        if: always()
        run: |
          echo "## Metadata Subscribe Integration Test Summary" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY
          echo "### Test Coverage" >> $GITHUB_STEP_SUMMARY
          echo "- **Basic Subscription**: Subscribe to metadata changes and receive events" >> $GITHUB_STEP_SUMMARY
          echo "- **Single-Filer No Stall**: Regression test for issue #4977" >> $GITHUB_STEP_SUMMARY
          echo "- **Resume from Disk**: Verify subscription can resume from persisted logs" >> $GITHUB_STEP_SUMMARY
          echo "" >> $GITHUB_STEP_SUMMARY
          echo "### Related Issues" >> $GITHUB_STEP_SUMMARY
          echo "- [#4977](https://github.com/seaweedfs/seaweedfs/issues/4977): filer.backup synchronisation stall" >> $GITHUB_STEP_SUMMARY