s3: fix remote object not caching (#7790)

* s3: fix remote object not caching

* s3: address review comments for remote object caching

- Fix leading slash in object name by using strings.TrimPrefix (see the sketch below)
- Return the cached entry from CacheRemoteObjectToLocalCluster so callers get updated local chunk locations
- Reuse the existing helper function instead of an inline gRPC call
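
A minimal sketch of the fix, with a made-up mount directory and object key; only strings.TrimPrefix is taken from the change itself:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	mountDir := "/buckets/b1"
	object := "/hello.txt" // object names can arrive with a leading slash

	// Naive concatenation produces a double slash: /buckets/b1//hello.txt
	fmt.Println(mountDir + "/" + object)

	// Trimming the prefix first yields the intended path.
	normalized := strings.TrimPrefix(object, "/")
	fmt.Println(mountDir + "/" + normalized) // /buckets/b1/hello.txt
}
```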

* s3/filer: add singleflight deduplication for remote object caching

- Add singleflight.Group to FilerServer to deduplicate concurrent cache operations
- Wrap CacheRemoteObjectToLocalCluster with singleflight to ensure only one
  caching operation runs per object when multiple clients request the same file
  (see the first sketch below)
- Add an early-return check for already-cached objects
- S3 API calls the filer gRPC with a timeout and falls back gracefully on error
  (see the second sketch below)
- Clear negative bucket cache when bucket is created via weed shell
- Add integration tests for remote cache with singleflight deduplication
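
A minimal sketch of the deduplication pattern, assuming illustrative Entry, cacheRemoteObject, and cacheOnce stand-ins rather than the real SeaweedFS types; only golang.org/x/sync/singleflight is the actual dependency:

```go
package main

import (
	"fmt"
	"sync"
	"time"

	"golang.org/x/sync/singleflight"
)

// Entry is a hypothetical stand-in for a filer entry.
type Entry struct {
	Path   string
	Cached bool
}

var (
	group singleflight.Group
	mu    sync.Mutex
	calls int // counts how many real cache operations actually ran
)

// cacheRemoteObject simulates the expensive fetch-from-remote work.
func cacheRemoteObject(objectPath string) (*Entry, error) {
	mu.Lock()
	calls++
	mu.Unlock()
	time.Sleep(20 * time.Millisecond) // keep concurrent callers overlapping
	return &Entry{Path: objectPath, Cached: true}, nil
}

// cacheOnce deduplicates concurrent cache requests for the same object.
func cacheOnce(entry *Entry) (*Entry, error) {
	if entry.Cached { // early return for already-cached objects
		return entry, nil
	}
	v, err, _ := group.Do(entry.Path, func() (interface{}, error) {
		return cacheRemoteObject(entry.Path)
	})
	if err != nil {
		return nil, err
	}
	cached, ok := v.(*Entry) // defensive nil and type-assertion checks
	if !ok || cached == nil {
		return nil, fmt.Errorf("unexpected singleflight result for %s", entry.Path)
	}
	return cached, nil
}

func main() {
	entry := &Entry{Path: "/buckets/b1/hello.txt"}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ { // eight concurrent readers of the same object
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := cacheOnce(entry); err != nil {
				fmt.Println("fallback:", err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("cache operations:", calls) // typically 1, not 8
}
```

Note that singleflight only collapses callers that arrive while a flight is in progress; once the entry is cached, the early-return check keeps later requests from starting a new flight at all.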

This benefits all clients (S3, HTTP, Hadoop) accessing remote-mounted objects
by preventing redundant cache operations and improving concurrent access performance.
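
And a hedged sketch of the S3-side call pattern described above: bound the filer call with a deadline and fall through to serving the object uncached on any error. cacheViaFiler and readThroughRemote are hypothetical names, not SeaweedFS functions:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// cacheViaFiler is a hypothetical stand-in for the filer gRPC call that
// triggers remote-object caching; it honors the caller's deadline.
func cacheViaFiler(ctx context.Context, object string) error {
	select {
	case <-time.After(50 * time.Millisecond): // pretend caching work
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// readThroughRemote is a hypothetical fallback that streams the object
// directly from remote storage without caching it locally.
func readThroughRemote(object string) string {
	return "object bytes streamed from remote storage"
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()

	object := "/buckets/b1/hello.txt"
	if err := cacheViaFiler(ctx, object); err != nil {
		// Graceful fallback: a cache failure must never fail the GET itself.
		if errors.Is(err, context.DeadlineExceeded) {
			fmt.Println("cache timed out, serving without caching")
		}
		fmt.Println(readThroughRemote(object))
		return
	}
	fmt.Println("served from freshly cached local chunks")
}
```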

Fixes: https://github.com/seaweedfs/seaweedfs/discussions/7599

* fix: data race in concurrent remote object caching

- Add a mutex to protect the chunks slice from concurrent appends (sketched below)
- Add a mutex to protect fetchAndWriteErr from concurrent reads and writes
- Fix an incorrect error check (it tested assignResult.Error instead of parseErr)
- Rename the inner variable to avoid shadowing fetchAndWriteErr
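
A minimal sketch of the guarded fan-out, assuming one goroutine per remote range; Chunk and fetchAndWrite are illustrative, not the actual SeaweedFS code:

```go
package main

import (
	"fmt"
	"sync"
)

// Chunk is an illustrative stand-in for filer_pb.FileChunk.
type Chunk struct {
	Offset int64
	Size   int64
}

func fetchAndWrite(offset int64) (*Chunk, error) {
	return &Chunk{Offset: offset, Size: 1024}, nil // pretend remote fetch
}

func main() {
	offsets := []int64{0, 1024, 2048, 3072}

	var (
		mu               sync.Mutex
		chunks           []*Chunk
		fetchAndWriteErr error
		wg               sync.WaitGroup
	)

	for _, off := range offsets {
		wg.Add(1)
		go func(offset int64) {
			defer wg.Done()
			// A locally scoped err avoids shadowing the shared fetchAndWriteErr.
			chunk, err := fetchAndWrite(offset)
			if err != nil {
				mu.Lock()
				fetchAndWriteErr = err // guarded write to the shared error
				mu.Unlock()
				return
			}
			mu.Lock()
			chunks = append(chunks, chunk) // append is unsafe without the lock
			mu.Unlock()
		}(off)
	}
	wg.Wait()

	if fetchAndWriteErr != nil { // safe: all goroutines have finished
		fmt.Println("caching failed:", fetchAndWriteErr)
		return
	}
	fmt.Println("fetched", len(chunks), "chunks")
}
```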

* fix: address code review comments

- Remove duplicate remote caching block in GetObjectHandler, keep only singleflight version
- Add mutex protection for concurrent chunk slice and error access (data race fix)
- Use lazy initialization for S3 client in tests to avoid panic during package load
- Fix markdown linting: add language specifier to code fence, blank lines around tables
- Add 'all' target to Makefile as alias for test-with-server
- Remove unused 'util' import

* style: remove emojis from test files

* fix: add defensive checks and sort chunks by offset

- Add nil and type-assertion checks on the singleflight result
- Sort chunks by offset after concurrent fetching to preserve file order (see the sketch below)
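
A short sketch of both fixes; Chunk again stands in for the real chunk type, and the interface{} value mimics what singleflight.Group.Do returns:

```go
package main

import (
	"fmt"
	"sort"
)

type Chunk struct{ Offset, Size int64 }

func main() {
	// Concurrent fetching appends chunks in completion order, not file order.
	var result interface{} = []*Chunk{{Offset: 2048}, {Offset: 0}, {Offset: 1024}}

	chunks, ok := result.([]*Chunk) // defensive type assertion before use
	if !ok || chunks == nil {
		fmt.Println("unexpected result type, falling back")
		return
	}
	sort.Slice(chunks, func(i, j int) bool { // restore file order by offset
		return chunks[i].Offset < chunks[j].Offset
	})
	for _, c := range chunks {
		fmt.Println(c.Offset) // 0, 1024, 2048
	}
}
```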

* fix: improve test diagnostics and path normalization

- runWeedShell now returns an error for better test diagnostics
- Add all targets (logs-primary, logs-remote, health) to .PHONY in the Makefile
- Strip the leading slash from normalizedObject to avoid double slashes in the path

---------

Co-authored-by: chrislu <chris.lu@gmail.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
Author: G-OD
Date: 2025-12-16 20:41:04 +00:00
Committed by: GitHub
Parent: 697b56003d
Commit: 504b258258
13 changed files with 992 additions and 36 deletions

@@ -482,4 +482,59 @@ jobs:
          path: test/s3/tagging/weed-test*.log
          retention-days: 3

  s3-remote-cache-tests:
    name: S3 Remote Cache Tests
    runs-on: ubuntu-22.04
    timeout-minutes: 20

    steps:
      - name: Check out code
        uses: actions/checkout@v6

      - name: Set up Go
        uses: actions/setup-go@v6
        with:
          go-version-file: 'go.mod'
        id: go

      - name: Install SeaweedFS
        run: |
          go install -buildvcs=false

      - name: Run S3 Remote Cache Tests
        timeout-minutes: 15
        working-directory: test/s3/remote_cache
        run: |
          set -x
          echo "=== System Information ==="
          uname -a
          free -h

          # Run the remote cache integration tests
          # Tests singleflight deduplication for caching remote objects
          make test-with-server || {
            echo "❌ Test failed, checking logs..."
            if [ -f primary-weed.log ]; then
              echo "=== Primary server logs ==="
              tail -100 primary-weed.log
            fi
            if [ -f remote-weed.log ]; then
              echo "=== Remote server logs ==="
              tail -100 remote-weed.log
            fi
            echo "=== Process information ==="
            ps aux | grep -E "(weed|test)" || true
            exit 1
          }

      - name: Upload test logs on failure
        if: failure()
        uses: actions/upload-artifact@v6
        with:
          name: s3-remote-cache-test-logs
          path: |
            test/s3/remote_cache/primary-weed.log
            test/s3/remote_cache/remote-weed.log
          retention-days: 3

  # Removed SSE-C integration tests and compatibility job