s3: fix remote object not caching (#7790)

* s3: fix remote object not caching
* s3: address review comments for remote object caching
  - Fix leading slash in object name by using strings.TrimPrefix
  - Return cached entry from CacheRemoteObjectToLocalCluster to get updated local chunk locations
  - Reuse existing helper function instead of inline gRPC call
* s3/filer: add singleflight deduplication for remote object caching
  - Add singleflight.Group to FilerServer to deduplicate concurrent cache operations
  - Wrap CacheRemoteObjectToLocalCluster with singleflight to ensure only one caching operation runs per object when multiple clients request the same file
  - Add early-return check for already-cached objects
  - S3 API calls filer gRPC with timeout and graceful fallback on error
  - Clear negative bucket cache when bucket is created via weed shell
  - Add integration tests for remote cache with singleflight deduplication

  This benefits all clients (S3, HTTP, Hadoop) accessing remote-mounted objects by preventing redundant cache operations and improving concurrent access performance.

  Fixes: https://github.com/seaweedfs/seaweedfs/discussions/7599
* fix: data race in concurrent remote object caching
  - Add mutex to protect chunks slice from concurrent append
  - Add mutex to protect fetchAndWriteErr from concurrent read/write
  - Fix incorrect error check (was checking assignResult.Error instead of parseErr)
  - Rename inner variable to avoid shadowing fetchAndWriteErr
* fix: address code review comments
  - Remove duplicate remote caching block in GetObjectHandler, keep only the singleflight version
  - Add mutex protection for concurrent chunk slice and error access (data race fix)
  - Use lazy initialization for the S3 client in tests to avoid a panic during package load
  - Fix markdown linting: add language specifier to code fence, blank lines around tables
  - Add 'all' target to Makefile as alias for test-with-server
  - Remove unused 'util' import
* style: remove emojis from test files
* fix: add defensive checks and sort chunks by offset
  - Add nil check and type assertion check for singleflight result
  - Sort chunks by offset after concurrent fetching to maintain file order
* fix: improve test diagnostics and path normalization
  - runWeedShell now returns error for better test diagnostics
  - Add all targets to .PHONY in Makefile (logs-primary, logs-remote, health)
  - Strip leading slash from normalizedObject to avoid double slashes in path

Co-authored-by: chrislu <chris.lu@gmail.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
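The two path fixes noted above (strings.TrimPrefix on the object name, and stripping the leading slash from normalizedObject) come down to one pattern; a minimal sketch, where the `normalizeObject` function name and the example paths are invented for illustration and are not the actual SeaweedFS code:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeObject strips any leading slash from the object name before
// joining it to the directory, so "/buckets/b" + "/obj" does not become
// "/buckets/b//obj". Illustrative only; not the SeaweedFS implementation.
func normalizeObject(dir, object string) string {
	object = strings.TrimPrefix(object, "/")
	return dir + "/" + object
}

func main() {
	fmt.Println(normalizeObject("/buckets/test", "/hello.txt"))
	fmt.Println(normalizeObject("/buckets/test", "hello.txt"))
}
```

Both calls produce the same clean path, with or without the leading slash on the object name.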
This commit adds a new file, `test/s3/remote_cache/README.md` (157 lines):

# Remote Object Cache Integration Tests

This directory contains integration tests for the remote object caching feature with singleflight deduplication.

## Test Flow

Each test follows this pattern:
1. **Write to local** - Upload data to primary SeaweedFS (local storage)
2. **Uncache** - Push data to remote storage and remove local chunks
3. **Read** - Read data (triggers caching from remote back to local)

This tests the full remote caching workflow, including singleflight deduplication.

## Architecture

```text
┌─────────────────────────────────────────────────────────────────┐
│                           Test Client                           │
│                                                                 │
│  1. PUT data to primary SeaweedFS                               │
│  2. remote.cache.uncache (push to remote, purge local)          │
│  3. GET data (triggers caching from remote)                     │
│  4. Verify singleflight deduplication                           │
└──────────────────────────────────┬──────────────────────────────┘
                                   │
                 ┌─────────────────┴─────────────────┐
                 ▼                                   ▼
┌────────────────────────────────────┐   ┌────────────────────────────────┐
│         Primary SeaweedFS          │   │        Remote SeaweedFS        │
│         (port 8333)                │   │        (port 8334)             │
│                                    │   │                                │
│  - Being tested                    │   │  - Acts as "remote" S3         │
│  - Has remote storage mounted      │──▶│  - Receives uncached data      │
│  - Caches remote objects           │   │  - Serves data for caching     │
│  - Singleflight deduplication      │   │                                │
└────────────────────────────────────┘   └────────────────────────────────┘
```

## What's Being Tested

1. **Basic Remote Caching**: Write → Uncache → Read workflow
2. **Singleflight Deduplication**: Concurrent reads trigger only ONE caching operation
3. **Large Object Caching**: 5MB files cache correctly
4. **Range Requests**: Partial reads work with cached objects
5. **Not Found Handling**: Proper error for non-existent objects

## Quick Start

### Run Full Test Suite (Recommended)

```bash
# Build SeaweedFS, start both servers, run tests, stop servers
make test-with-server
```

### Manual Steps

```bash
# 1. Build SeaweedFS binary
make build-weed

# 2. Start remote SeaweedFS (acts as "remote" storage)
make start-remote

# 3. Start primary SeaweedFS (the one being tested)
make start-primary

# 4. Configure remote storage mount
make setup-remote

# 5. Run tests
make test

# 6. Clean up
make clean
```

## Configuration

### Primary SeaweedFS (Being Tested)

| Service | Port |
|---------|------|
| S3 API  | 8333 |
| Filer   | 8888 |
| Master  | 9333 |
| Volume  | 8080 |

### Remote SeaweedFS (Remote Storage)

| Service | Port |
|---------|------|
| S3 API  | 8334 |
| Filer   | 8889 |
| Master  | 9334 |
| Volume  | 8081 |

## Makefile Targets

```bash
make help             # Show all available targets
make build-weed       # Build SeaweedFS binary
make start-remote     # Start remote SeaweedFS
make start-primary    # Start primary SeaweedFS
make setup-remote     # Configure remote storage mount
make test             # Run tests
make test-with-server # Full automated test workflow
make logs             # Show server logs
make health           # Check server status
make clean            # Stop servers and clean up
```

## Test Details

### TestRemoteCacheBasic

Basic workflow test:
1. Write object to primary (local)
2. Uncache (push to remote, remove local chunks)
3. Read (triggers caching from remote)
4. Read again (from local cache - should be faster)

### TestRemoteCacheConcurrent

Singleflight deduplication test:
1. Write 1MB object
2. Uncache to remote
3. Launch 10 concurrent reads
4. All should succeed with correct data
5. Only ONE caching operation should run (singleflight)

### TestRemoteCacheLargeObject

Large file test (5MB) to verify chunked transfer works correctly.

### TestRemoteCacheRangeRequest

Tests that HTTP range requests work correctly after caching.

### TestRemoteCacheNotFound

Tests proper error handling for non-existent objects.

## Troubleshooting

### View logs

```bash
make logs         # Show recent logs from both servers
make logs-primary # Follow primary logs in real-time
make logs-remote  # Follow remote logs in real-time
```

### Check server health

```bash
make health
```

### Clean up and retry

```bash
make clean
make test-with-server
```