seaweedFS/test/s3/remote_cache/README.md
G-OD 504b258258 s3: fix remote object not caching (#7790)
* s3: fix remote object not caching

* s3: address review comments for remote object caching

- Fix leading slash in object name by using strings.TrimPrefix
- Return cached entry from CacheRemoteObjectToLocalCluster to get updated local chunk locations
- Reuse existing helper function instead of inline gRPC call

* s3/filer: add singleflight deduplication for remote object caching

- Add singleflight.Group to FilerServer to deduplicate concurrent cache operations
- Wrap CacheRemoteObjectToLocalCluster with singleflight to ensure only one
  caching operation runs per object when multiple clients request the same file
- Add early-return check for already-cached objects
- S3 API calls filer gRPC with timeout and graceful fallback on error
- Clear negative bucket cache when bucket is created via weed shell
- Add integration tests for remote cache with singleflight deduplication

This benefits all clients (S3, HTTP, Hadoop) accessing remote-mounted objects
by preventing redundant cache operations and improving concurrent access performance.

Fixes: https://github.com/seaweedfs/seaweedfs/discussions/7599

* fix: data race in concurrent remote object caching

- Add mutex to protect chunks slice from concurrent append
- Add mutex to protect fetchAndWriteErr from concurrent read/write
- Fix incorrect error check (was checking assignResult.Error instead of parseErr)
- Rename inner variable to avoid shadowing fetchAndWriteErr

* fix: address code review comments

- Remove duplicate remote caching block in GetObjectHandler, keep only singleflight version
- Add mutex protection for concurrent chunk slice and error access (data race fix)
- Use lazy initialization for S3 client in tests to avoid panic during package load
- Fix markdown linting: add language specifier to code fence, blank lines around tables
- Add 'all' target to Makefile as alias for test-with-server
- Remove unused 'util' import

* style: remove emojis from test files

* fix: add defensive checks and sort chunks by offset

- Add nil check and type assertion check for singleflight result
- Sort chunks by offset after concurrent fetching to maintain file order

* fix: improve test diagnostics and path normalization

- runWeedShell now returns error for better test diagnostics
- Add all targets to .PHONY in Makefile (logs-primary, logs-remote, health)
- Strip leading slash from normalizedObject to avoid double slashes in path

---------

Co-authored-by: chrislu <chris.lu@gmail.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
2025-12-16 12:41:04 -08:00


# Remote Object Cache Integration Tests

This directory contains integration tests for the remote object caching feature with singleflight deduplication.

## Test Flow

Each test follows this pattern:

1. **Write to local** - Upload data to primary SeaweedFS (local storage)
2. **Uncache** - Push data to remote storage and remove local chunks
3. **Read** - Read data (triggers caching from remote back to local)

Together, these steps exercise the full remote caching workflow, including singleflight deduplication.
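
The three steps can be pictured with a small in-memory model. This is only a sketch: the `model` type and its `write`, `uncache`, and `read` methods are hypothetical names invented for illustration and are not SeaweedFS code. It assumes, as described above, that uncaching leaves the object on remote storage only and that the next read copies it back.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is returned when an object exists neither locally nor remotely.
var ErrNotFound = errors.New("object not found")

// model is a hypothetical in-memory stand-in for the primary cluster and
// its mounted remote storage; it is not SeaweedFS code.
type model struct {
	local    map[string][]byte // chunks held by the primary cluster
	remote   map[string][]byte // objects held by remote storage
	cacheOps int               // number of remote-to-local caching operations
}

func newModel() *model {
	return &model{local: map[string][]byte{}, remote: map[string][]byte{}}
}

// write stores the object on the primary cluster (step 1).
func (m *model) write(key string, data []byte) { m.local[key] = data }

// uncache pushes the object to remote storage and drops the local copy (step 2).
func (m *model) uncache(key string) {
	if data, ok := m.local[key]; ok {
		m.remote[key] = data
		delete(m.local, key)
	}
}

// read serves from the local copy when present; otherwise it caches the
// object from remote storage first (step 3).
func (m *model) read(key string) ([]byte, error) {
	if data, ok := m.local[key]; ok {
		return data, nil
	}
	data, ok := m.remote[key]
	if !ok {
		return nil, ErrNotFound
	}
	m.local[key] = data
	m.cacheOps++
	return data, nil
}

func main() {
	m := newModel()
	m.write("bucket/hello.txt", []byte("hello"))
	m.uncache("bucket/hello.txt")
	first, _ := m.read("bucket/hello.txt")  // triggers caching
	second, _ := m.read("bucket/hello.txt") // served from the local copy
	_, err := m.read("bucket/missing.txt")
	fmt.Println(string(first), string(second), m.cacheOps, err)
}
```

In this model the second read does not increment `cacheOps`, which mirrors the expectation that a repeated read is served from the local cache.
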
## Architecture
```text
┌─────────────────────────────────────────────────────────────────┐
│                           Test Client                           │
│                                                                 │
│  1. PUT data to primary SeaweedFS                               │
│  2. remote.cache.uncache (push to remote, purge local)          │
│  3. GET data (triggers caching from remote)                     │
│  4. Verify singleflight deduplication                           │
└──────────────────────────────────┬──────────────────────────────┘
                 ┌─────────────────┴─────────────────┐
                 ▼                                   ▼
┌────────────────────────────────────┐   ┌────────────────────────────────┐
│         Primary SeaweedFS          │   │        Remote SeaweedFS        │
│            (port 8333)             │   │          (port 8334)           │
│                                    │   │                                │
│  - Being tested                    │   │  - Acts as "remote" S3         │
│  - Has remote storage mounted      │──▶│  - Receives uncached data      │
│  - Caches remote objects           │   │  - Serves data for caching     │
│  - Singleflight deduplication      │   │                                │
└────────────────────────────────────┘   └────────────────────────────────┘
```

## What's Being Tested

1. **Basic Remote Caching**: Write → Uncache → Read workflow
2. **Singleflight Deduplication**: Concurrent reads of the same object trigger only ONE caching operation
3. **Large Object Caching**: 5 MB files cache correctly
4. **Range Requests**: Partial reads work with cached objects
5. **Not Found Handling**: Reads of non-existent objects return a proper error

## Quick Start
### Run Full Test Suite (Recommended)
```bash
# Build SeaweedFS, start both servers, run tests, stop servers
make test-with-server
```
### Manual Steps
```bash
# 1. Build SeaweedFS binary
make build-weed

# 2. Start remote SeaweedFS (acts as "remote" storage)
make start-remote

# 3. Start primary SeaweedFS (the one being tested)
make start-primary

# 4. Configure remote storage mount
make setup-remote

# 5. Run tests
make test

# 6. Clean up
make clean
```
## Configuration
### Primary SeaweedFS (Being Tested)
| Service | Port |
|---------|------|
| S3 API | 8333 |
| Filer | 8888 |
| Master | 9333 |
| Volume | 8080 |
### Remote SeaweedFS (Remote Storage)
| Service | Port |
|---------|------|
| S3 API | 8334 |
| Filer | 8889 |
| Master | 9334 |
| Volume | 8081 |
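
As a rough aid for checking that both instances are listening, the port tables can be turned into a small probe. This is a sketch only: the `cluster` type and the `addrs` and `reachable` helpers are hypothetical names, and the host `127.0.0.1` is an assumption that both instances run on the local machine.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// cluster lists the ports from the tables above for one SeaweedFS instance.
type cluster struct {
	Name                      string
	S3, Filer, Master, Volume int
}

var (
	primary = cluster{Name: "primary", S3: 8333, Filer: 8888, Master: 9333, Volume: 8080}
	remote  = cluster{Name: "remote", S3: 8334, Filer: 8889, Master: 9334, Volume: 8081}
)

// addrs returns host:port strings keyed by service name.
func (c cluster) addrs(host string) map[string]string {
	join := func(port int) string { return net.JoinHostPort(host, fmt.Sprint(port)) }
	return map[string]string{
		"s3":     join(c.S3),
		"filer":  join(c.Filer),
		"master": join(c.Master),
		"volume": join(c.Volume),
	}
}

// reachable reports whether a TCP connection to addr succeeds within timeout.
func reachable(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	// Assumption: both instances run on the local machine.
	for _, c := range []cluster{primary, remote} {
		for svc, addr := range c.addrs("127.0.0.1") {
			up := reachable(addr, 500*time.Millisecond)
			fmt.Printf("%-8s %-7s %-16s up=%v\n", c.Name, svc, addr, up)
		}
	}
}
```

A successful TCP connection only shows that something is listening on the port; `make health` remains the supported way to check server status.
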
## Makefile Targets
```bash
make help              # Show all available targets
make build-weed        # Build SeaweedFS binary
make start-remote      # Start remote SeaweedFS
make start-primary     # Start primary SeaweedFS
make setup-remote      # Configure remote storage mount
make test              # Run tests
make test-with-server  # Full automated test workflow
make logs              # Show server logs
make health            # Check server status
make clean             # Stop servers and clean up
```
## Test Details
### TestRemoteCacheBasic
Basic workflow test:
1. Write object to primary (local)
2. Uncache (push to remote, remove local chunks)
3. Read (triggers caching from remote)
4. Read again (from local cache - should be faster)
### TestRemoteCacheConcurrent
Singleflight deduplication test:
1. Write a 1 MB object
2. Uncache to remote
3. Launch 10 concurrent reads
4. All should succeed with correct data
5. Only ONE caching operation should run (singleflight)
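
The deduplication idea behind this test can be sketched with the standard library alone. The filer itself uses `singleflight.Group`; the `objectCache` type below is a hypothetical stand-in, not the filer's code, that combines an early return for already-cached objects with a wait on any caching operation already in flight. The key name, payload, and 50 ms delay are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight caching operation.
type call struct {
	wg   sync.WaitGroup
	data []byte
	err  error
}

// objectCache is a hypothetical stand-in for the filer's caching path.
type objectCache struct {
	mu       sync.Mutex
	cached   map[string][]byte
	inflight map[string]*call
}

func newObjectCache() *objectCache {
	return &objectCache{cached: map[string][]byte{}, inflight: map[string]*call{}}
}

// get returns the object for key. While a fetch for key is running,
// other callers wait for its result instead of starting their own.
func (c *objectCache) get(key string, fetch func() ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	if data, ok := c.cached[key]; ok { // early return: already cached
		c.mu.Unlock()
		return data, nil
	}
	if cl, ok := c.inflight[key]; ok { // join the operation already running
		c.mu.Unlock()
		cl.wg.Wait()
		return cl.data, cl.err
	}
	cl := &call{}
	cl.wg.Add(1)
	c.inflight[key] = cl
	c.mu.Unlock()

	cl.data, cl.err = fetch() // runs outside the lock

	c.mu.Lock()
	if cl.err == nil {
		c.cached[key] = cl.data
	}
	delete(c.inflight, key)
	c.mu.Unlock()
	cl.wg.Done()
	return cl.data, cl.err
}

// concurrentReads starts n readers for one key and reports how often
// the fetch function actually ran, along with each reader's result.
func concurrentReads(n int) (int64, [][]byte) {
	cache := newObjectCache()
	var fetches int64
	results := make([][]byte, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i], _ = cache.get("bucket/object", func() ([]byte, error) {
				atomic.AddInt64(&fetches, 1)
				time.Sleep(50 * time.Millisecond) // simulate a slow remote read
				return []byte("payload"), nil
			})
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt64(&fetches), results
}

func main() {
	fetches, results := concurrentReads(10)
	fmt.Printf("readers=%d fetches=%d\n", len(results), fetches)
}
```

Because a successful result is stored before the in-flight entry is removed, a reader that arrives late finds the cached copy rather than starting a second fetch, so the sketch reports a single fetch for ten readers.
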
### TestRemoteCacheLargeObject
Large file test (5 MB) that verifies chunked transfer works correctly.
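
The commit history for this feature mentions protecting the chunk slice with a mutex and sorting chunks by offset after concurrent fetching. A minimal sketch of that pattern follows; the `chunk` type, the helper names, and the 1 MiB chunk size are assumptions made for illustration and are not taken from the SeaweedFS code.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
	"sync"
)

// chunk is a hypothetical stand-in for one stored piece of a file.
type chunk struct {
	Offset int64
	Data   []byte
}

// fetchChunks copies src in chunkSize pieces, one goroutine per piece,
// and returns the pieces sorted by offset.
func fetchChunks(src []byte, chunkSize int) []chunk {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		chunks []chunk
	)
	for off := 0; off < len(src); off += chunkSize {
		end := off + chunkSize
		if end > len(src) {
			end = len(src)
		}
		wg.Add(1)
		go func(off, end int) {
			defer wg.Done()
			data := append([]byte(nil), src[off:end]...) // simulate a remote read
			mu.Lock() // append is not safe for concurrent use
			chunks = append(chunks, chunk{Offset: int64(off), Data: data})
			mu.Unlock()
		}(off, end)
	}
	wg.Wait()
	// Goroutines finish in arbitrary order, so restore file order.
	sort.Slice(chunks, func(i, j int) bool { return chunks[i].Offset < chunks[j].Offset })
	return chunks
}

// assemble concatenates chunks in slice order.
func assemble(chunks []chunk) []byte {
	var buf bytes.Buffer
	for _, c := range chunks {
		buf.Write(c.Data)
	}
	return buf.Bytes()
}

// roundTrip builds size bytes of test data, fetches it in chunks, and
// reports the chunk count and whether the reassembled bytes match.
func roundTrip(size, chunkSize int) (int, bool) {
	src := make([]byte, size)
	for i := range src {
		src[i] = byte(i % 251)
	}
	chunks := fetchChunks(src, chunkSize)
	return len(chunks), bytes.Equal(assemble(chunks), src)
}

func main() {
	n, ok := roundTrip(5*1024*1024, 1024*1024) // 5 MiB in 1 MiB chunks
	fmt.Println(n, ok)
}
```

Without the sort, the reassembled bytes would depend on goroutine scheduling, which is the kind of ordering problem a large-object test is suited to catch.
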
### TestRemoteCacheRangeRequest
Verifies that HTTP range requests work correctly after caching.
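
For reference, a range request carries a `Range: bytes=first-last` header and a successful partial response has status 206. The sketch below shows this against an in-process handler built on Go's `http.ServeContent`; it does not contact SeaweedFS, and the `objectHandler` and `rangeRead` helpers and the object path are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// objectHandler serves content with Range support via http.ServeContent.
// It stands in for an S3 GET endpoint in this sketch.
func objectHandler(content string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "object.bin", time.Time{}, strings.NewReader(content))
	})
}

// rangeRead asks h for bytes first..last (inclusive) and returns the
// response status code and body.
func rangeRead(h http.Handler, first, last int) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/bucket/object.bin", nil)
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", first, last))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := rangeRead(objectHandler("0123456789abcdef"), 4, 7)
	fmt.Println(status, body) // prints "206 4567"
}
```

Both ends of the range are inclusive, so `bytes=4-7` returns four bytes.
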
### TestRemoteCacheNotFound
Verifies that reading a non-existent object returns a proper error.
## Troubleshooting
### View logs
```bash
make logs          # Show recent logs from both servers
make logs-primary  # Follow primary logs in real time
make logs-remote   # Follow remote logs in real time
```
### Check server health
```bash
make health
```
### Clean up and retry
```bash
make clean
make test-with-server
```