s3: fix PutObject ETag format for multi-chunk uploads (#7771)
* s3: fix PutObject ETag format for multi-chunk uploads

Fix issue #7768: AWS S3 SDK for Java fails with 'Invalid base 16 character: -'
when performing PutObject on files that are internally auto-chunked.

The issue was that SeaweedFS returned a composite ETag format (<md5hash>-<count>)
for regular PutObject when the file was split into multiple chunks due to
auto-chunking. However, per the AWS S3 spec, the composite ETag format should
only be used for multipart uploads (CreateMultipartUpload/UploadPart/
CompleteMultipartUpload API). Regular PutObject should always return a pure MD5
hash as the ETag, regardless of how the file is stored internally.

The fix ensures the MD5 hash is always stored in entry.Attributes.Md5 for
regular PutObject operations, so filer.ETag() returns the pure MD5 hash instead
of falling back to the ETagChunks() composite format.

* test: add comprehensive ETag format tests for issue #7768

Add integration tests to ensure PutObject ETag format compatibility:

Go tests (test/s3/etag/):
- TestPutObjectETagFormat_SmallFile: 1KB single chunk
- TestPutObjectETagFormat_LargeFile: 10MB auto-chunked (critical for #7768)
- TestPutObjectETagFormat_ExtraLargeFile: 25MB multi-chunk
- TestMultipartUploadETagFormat: verify composite ETag for multipart
- TestPutObjectETagConsistency: ETag consistency across PUT/HEAD/GET
- TestETagHexValidation: simulate AWS SDK v2 hex decoding
- TestMultipleLargeFileUploads: stress test multiple large uploads

Java tests (other/java/s3copier/):
- Update pom.xml to include AWS SDK v2 (2.20.127)
- Add ETagValidationTest.java with comprehensive SDK v2 tests
- Add README.md documenting SDK versions and test coverage

Documentation:
- Add test/s3/SDK_COMPATIBILITY.md documenting validated SDK versions
- Add test/s3/etag/README.md explaining test coverage

These tests ensure large file PutObject (>8MB) returns pure MD5 ETags (not
composite format), which is required for AWS SDK v2 compatibility.

* fix: lower Java version requirement to 11 for CI compatibility

* address CodeRabbit review comments

- s3_etag_test.go: Handle rand.Read error, fix multipart part-count logging
- Makefile: Add 'all' target, pass S3_ENDPOINT to test commands
- SDK_COMPATIBILITY.md: Add language tag to fenced code block
- ETagValidationTest.java: Add pagination to cleanup logic
- README.md: Clarify Go SDK tests are in separate location

* ci: add s3copier ETag validation tests to Java integration tests

- Enable S3 API (-s3 -s3.port=8333) in SeaweedFS test server
- Add S3 API readiness check to wait loop
- Add step to run ETagValidationTest from s3copier

This ensures the fix for issue #7768 is continuously tested against AWS SDK v2
for Java in CI.

* ci: add S3 config with credentials for s3copier tests

- Add -s3.config pointing to docker/compose/s3.json
- Add -s3.allowDeleteBucketNotEmpty for test cleanup
- Set S3_ACCESS_KEY and S3_SECRET_KEY env vars for tests

* ci: pass S3 config as Maven system properties

Pass S3_ENDPOINT, S3_ACCESS_KEY, S3_SECRET_KEY via -D flags so they're
available via System.getProperty() in Java tests
126
test/s3/SDK_COMPATIBILITY.md
Normal file
@@ -0,0 +1,126 @@
# S3 SDK Compatibility Testing

This document describes the SDK versions tested against the SeaweedFS S3 API and known compatibility considerations.

## Validated SDK Versions

### Go SDKs

| SDK | Version | Test Location | Status |
|-----|---------|---------------|--------|
| AWS SDK v2 for Go | 1.20+ | `test/s3/etag/`, `test/s3/copying/` | ✅ Tested |
| AWS SDK v1 for Go | 1.x | `test/s3/basic/` | ✅ Tested |

### Java SDKs

| SDK | Version | Test Location | Status |
|-----|---------|---------------|--------|
| AWS SDK v2 for Java | 2.20.127+ | `other/java/s3copier/` | ✅ Tested |
| AWS SDK v1 for Java | 1.12.600+ | `other/java/s3copier/` | ✅ Tested |

### Python SDKs

| SDK | Version | Test Location | Status |
|-----|---------|---------------|--------|
| boto3 | 1.x | `test/s3/parquet/` | ✅ Tested |
| PyArrow S3 | 14+ | `test/s3/parquet/` | ✅ Tested |

## SDK-Specific Considerations

### AWS SDK v2 for Java - ETag Validation

**Issue**: [GitHub #7768](https://github.com/seaweedfs/seaweedfs/issues/7768)

AWS SDK v2 for Java includes strict ETag validation in `ChecksumsEnabledValidator.validatePutObjectChecksum`. It decodes the ETag as a hexadecimal MD5 hash using `Base16Codec.decode()`.

**Impact**: If the ETag contains non-hexadecimal characters (like `-` in the composite format), the SDK fails with:

```text
java.lang.IllegalArgumentException: Invalid base 16 character: '-'
```

**Resolution**: SeaweedFS now correctly returns:

- **PutObject**: Pure MD5 hex ETag (32 characters) regardless of internal chunking
- **CompleteMultipartUpload**: Composite ETag (`<md5>-<partcount>`)

**Test Coverage**: `test/s3/etag/` and `other/java/s3copier/ETagValidationTest.java`
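The SDK-side check can be mimicked client-side: strip the quotes, then require the remainder to decode as base-16. A minimal Go sketch (the function name is ours, not the SDK's):

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// validatePutObjectETag mimics the AWS SDK v2 check: strip the surrounding
// quotes, then require the remainder to decode as hexadecimal.
func validatePutObjectETag(etag string) error {
	trimmed := strings.Trim(etag, `"`)
	if _, err := hex.DecodeString(trimmed); err != nil {
		return fmt.Errorf("invalid base 16 ETag %q: %w", etag, err)
	}
	return nil
}

func main() {
	// Pure MD5 ETag from PutObject: passes.
	fmt.Println(validatePutObjectETag(`"d41d8cd98f00b204e9800998ecf8427e"`))
	// Composite ETag: '-' is not a hex digit, so validation fails.
	fmt.Println(validatePutObjectETag(`"d41d8cd98f00b204e9800998ecf8427e-3"`))
}
```

This is why a composite ETag returned from a plain `PutObject` breaks the Java SDK, while the same format is expected (and not hex-validated) for `CompleteMultipartUpload` responses.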
### AWS SDK v1 vs v2 Differences

| Feature | SDK v1 | SDK v2 |
|---------|--------|--------|
| ETag hex validation | No | Yes (strict) |
| Checksum validation | Basic | Enhanced |
| Async support | Limited | Full |
| Default retry behavior | Lenient | Stricter |

### Large File Handling

SeaweedFS auto-chunks files larger than **8MB** for efficient storage. This is transparent to clients, but previously affected the ETag format. The current implementation ensures:

1. Regular `PutObject` (any size): Returns a pure MD5 ETag
2. Multipart upload: Returns a composite ETag per the AWS S3 specification

## Test Categories by File Size

| Category | Size | Chunks | ETag Format |
|----------|------|--------|-------------|
| Small | < 256KB | 1 (inline) | Pure MD5 |
| Medium | 256KB - 8MB | 1 | Pure MD5 |
| Large | 8MB - 24MB | 2-3 | Pure MD5 |
| Extra Large | > 24MB | 4+ | Pure MD5 |
| Multipart | N/A | Per part | Composite |
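The chunk counts in the table follow from ceiling division by the 8MB threshold. A quick sketch (the helper name is illustrative, not part of the test suite):

```go
package main

import "fmt"

// expectedChunks estimates how many internal chunks a regular PutObject
// of the given size produces, assuming the 8MB auto-chunk threshold.
func expectedChunks(size int64) int64 {
	const autoChunkSize = 8 * 1024 * 1024
	if size <= 0 {
		return 1
	}
	return (size + autoChunkSize - 1) / autoChunkSize // ceiling division
}

func main() {
	fmt.Println(expectedChunks(1 * 1024))         // 1KB small file: 1 chunk
	fmt.Println(expectedChunks(10 * 1024 * 1024)) // 10MB: 2 chunks
	fmt.Println(expectedChunks(25 * 1024 * 1024)) // 25MB: 4 chunks
}
```

Regardless of the chunk count, the ETag of a regular `PutObject` stays a pure MD5 of the whole object.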
## Running SDK Compatibility Tests

### Go Tests

```bash
# Run all ETag tests
cd test/s3/etag && make test

# Run large file tests only
cd test/s3/etag && make test-large
```

### Java Tests

```bash
# Run all Java SDK tests
cd other/java/s3copier && mvn test

# Run only ETag validation tests
cd other/java/s3copier && mvn test -Dtest=ETagValidationTest
```

### Python Tests

```bash
# Run PyArrow S3 tests
cd test/s3/parquet && make test
```

## Adding New SDK Tests

When adding tests for new SDKs, ensure:

1. **Large file tests (>8MB)**: Critical for verifying ETag format with auto-chunking
2. **Multipart upload tests**: Verify composite ETag format
3. **Checksum validation**: Test SDK-specific checksum validation if applicable
4. **Document SDK version**: Add to this compatibility matrix

## Known Issues and Workarounds

### Issue: Older SDK Versions

Some very old SDK versions (e.g., AWS SDK v1 for Java < 1.11.x) may behave differently. Testing with the versions listed above is recommended.

### Issue: Custom Checksum Algorithms

AWS SDK v2 supports SHA-256 and CRC32 checksums in addition to MD5. SeaweedFS currently returns MD5-based ETags. For checksums other than MD5, use the `x-amz-checksum-*` headers.

## References

- [AWS S3 ETag Documentation](https://docs.aws.amazon.com/AmazonS3/latest/API/API_Object.html)
- [AWS SDK v2 Migration Guide](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/migration.html)
- [GitHub Issue #7768](https://github.com/seaweedfs/seaweedfs/issues/7768)
50
test/s3/etag/Makefile
Normal file
@@ -0,0 +1,50 @@
# ETag Format Integration Tests
#
# These tests verify S3 ETag format compatibility, particularly for large files
# that trigger SeaweedFS auto-chunking. This addresses GitHub Issue #7768.
#
# Prerequisites:
#   - SeaweedFS running with S3 API enabled on port 8333
#   - Go 1.21+
#
# Usage:
#   make test         - Run all tests
#   make test-large   - Run only large file tests
#   make test-verbose - Run with verbose output
#   make clean        - Clean test artifacts

.PHONY: all test test-large test-verbose test-quick clean help

# Default S3 endpoint
S3_ENDPOINT ?= http://127.0.0.1:8333

all: test

test:
	@echo "Running ETag format tests against $(S3_ENDPOINT)..."
	S3_ENDPOINT=$(S3_ENDPOINT) go test -v -timeout 5m ./...

test-large:
	@echo "Running large file ETag tests..."
	S3_ENDPOINT=$(S3_ENDPOINT) go test -v -timeout 5m -run "LargeFile|ExtraLarge" ./...

test-verbose:
	S3_ENDPOINT=$(S3_ENDPOINT) go test -v -timeout 5m -count=1 ./...

test-quick:
	@echo "Running quick ETag tests (small files only)..."
	S3_ENDPOINT=$(S3_ENDPOINT) go test -v -timeout 1m -run "SmallFile|Consistency" ./...

clean:
	go clean -testcache

help:
	@echo "ETag Format Integration Tests"
	@echo "Targets:"
	@echo "  test          Run all ETag format tests"
	@echo "  test-large    Run only large file tests (>8MB)"
	@echo "  test-quick    Run quick tests (small files only)"
	@echo "  test-verbose  Run with verbose output"
	@echo "  clean         Clean test cache"
	@echo "Environment Variables:"
	@echo "  S3_ENDPOINT   S3 endpoint URL (default: http://127.0.0.1:8333)"
92
test/s3/etag/README.md
Normal file
@@ -0,0 +1,92 @@
# S3 ETag Format Integration Tests

This test suite verifies that SeaweedFS returns correct ETag formats for S3 operations, ensuring compatibility with AWS S3 SDKs.

## Background

**GitHub Issue #7768**: AWS S3 SDK for Java v2 was failing with `Invalid base 16 character: '-'` when performing `PutObject` on large files.

### Root Cause

SeaweedFS internally auto-chunks large files (>8MB) for efficient storage. Previously, when a regular `PutObject` request resulted in multiple internal chunks, SeaweedFS returned a composite ETag format (`<md5>-<count>`) instead of a pure MD5 hash.

### AWS S3 Specification

| Operation | ETag Format | Example |
|-----------|-------------|---------|
| PutObject (any size) | Pure MD5 hex (32 chars) | `d41d8cd98f00b204e9800998ecf8427e` |
| CompleteMultipartUpload | Composite (`<md5>-<partcount>`) | `d41d8cd98f00b204e9800998ecf8427e-3` |

AWS S3 SDK v2 for Java validates `PutObject` ETags as hexadecimal, which fails when the ETag contains a hyphen.
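For reference, the composite multipart ETag is conventionally computed as the MD5 of the concatenated binary MD5 digests of the parts, suffixed with the part count. A minimal Go sketch (the helper name is ours, not SeaweedFS code):

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// compositeETag computes the AWS-style multipart ETag: MD5 over the
// concatenated binary MD5 digests of each part, plus "-<part count>".
func compositeETag(parts [][]byte) string {
	var digests []byte
	for _, part := range parts {
		sum := md5.Sum(part)
		digests = append(digests, sum[:]...)
	}
	final := md5.Sum(digests)
	return fmt.Sprintf("%x-%d", final, len(parts))
}

func main() {
	parts := [][]byte{[]byte("part one"), []byte("part two"), []byte("part three")}
	// Prints 32 hex characters, a hyphen, then the part count "3".
	fmt.Println(compositeETag(parts))
}
```

Note this is not the MD5 of the whole object, which is why a composite ETag cannot be used for whole-object integrity checks the way a `PutObject` ETag can.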
## Test Coverage

| Test | File Size | Purpose |
|------|-----------|---------|
| `TestPutObjectETagFormat_SmallFile` | 1KB | Verify single-chunk uploads return pure MD5 |
| `TestPutObjectETagFormat_LargeFile` | 10MB | **Critical**: Verify auto-chunked uploads return pure MD5 |
| `TestPutObjectETagFormat_ExtraLargeFile` | 25MB | Verify multi-chunk auto-chunked uploads return pure MD5 |
| `TestMultipartUploadETagFormat` | 15MB | Verify multipart uploads correctly return composite ETag |
| `TestPutObjectETagConsistency` | Various | Verify ETag consistency across PUT/HEAD/GET |
| `TestETagHexValidation` | 10MB | Simulate AWS SDK v2 hex validation |
| `TestMultipleLargeFileUploads` | 10MB x5 | Stress test multiple large uploads |

## Prerequisites

1. SeaweedFS running with S3 API enabled:
   ```bash
   weed server -s3
   ```
2. Go 1.21 or later
3. AWS SDK v2 for Go (installed via Go modules)

## Running Tests

```bash
# Run all tests
make test

# Run only large file tests (the critical ones for issue #7768)
make test-large

# Run quick tests (small files only)
make test-quick

# Run with verbose output
make test-verbose
```

## Configuration

By default, tests connect to `http://127.0.0.1:8333`. To use a different endpoint:

```bash
S3_ENDPOINT=http://localhost:8333 make test
```

Or modify `defaultConfig` in `s3_etag_test.go`.

## SDK Compatibility

These tests use **AWS SDK v2 for Go**, which has the same ETag validation behavior as AWS SDK v2 for Java. The tests include:

- ETag format validation (pure MD5 vs composite)
- Hex decoding validation (simulates `Base16Codec.decode`)
- Content integrity verification

## Validated SDK Versions

| SDK | Version | Status |
|-----|---------|--------|
| AWS SDK v2 for Go | 1.20+ | ✅ Tested |
| AWS SDK v2 for Java | 2.20+ | ✅ Compatible (issue #7768 fixed) |
| AWS SDK v1 for Go | 1.x | ✅ Compatible (less strict validation) |
| AWS SDK v1 for Java | 1.x | ✅ Compatible (less strict validation) |

## Related

- [GitHub Issue #7768](https://github.com/seaweedfs/seaweedfs/issues/7768)
- [AWS S3 ETag Documentation](https://docs.aws.amazon.com/AmazonS3/latest/API/API_Object.html)
543
test/s3/etag/s3_etag_test.go
Normal file
@@ -0,0 +1,543 @@
// Package etag_test provides integration tests for S3 ETag format validation.
//
// These tests verify that SeaweedFS returns correct ETag formats for different
// upload scenarios, ensuring compatibility with AWS S3 SDKs that validate ETags.
//
// Background (GitHub Issue #7768):
// AWS S3 SDK for Java v2 validates ETags as hexadecimal MD5 hashes for PutObject
// responses. SeaweedFS was incorrectly returning composite ETags ("<md5>-<count>")
// for regular PutObject when files were internally auto-chunked (>8MB), causing
// the SDK to fail with "Invalid base 16 character: '-'".
//
// Per AWS S3 specification:
//   - Regular PutObject: ETag is always a pure MD5 hex string (32 chars)
//   - Multipart Upload (CompleteMultipartUpload): ETag is "<md5>-<partcount>"
//
// These tests ensure this behavior is maintained.
package etag_test

import (
	"bytes"
	"context"
	"crypto/md5"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"io"
	mathrand "math/rand"
	"regexp"
	"strings"
	"testing"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// S3TestConfig holds configuration for S3 tests
type S3TestConfig struct {
	Endpoint     string
	AccessKey    string
	SecretKey    string
	Region       string
	BucketPrefix string
}

// Default test configuration
var defaultConfig = &S3TestConfig{
	Endpoint:     "http://127.0.0.1:8333",
	AccessKey:    "some_access_key1",
	SecretKey:    "some_secret_key1",
	Region:       "us-east-1",
	BucketPrefix: "test-etag-",
}

// Constants for auto-chunking thresholds (must match s3api_object_handlers_put.go)
const (
	// SeaweedFS auto-chunks files larger than 8MB
	autoChunkSize = 8 * 1024 * 1024

	// Test sizes
	smallFileSize  = 1 * 1024         // 1KB - single chunk
	mediumFileSize = 256 * 1024       // 256KB - single chunk (at threshold)
	largeFileSize  = 10 * 1024 * 1024 // 10MB - triggers auto-chunking (2 chunks)
	xlFileSize     = 25 * 1024 * 1024 // 25MB - triggers auto-chunking (4 chunks)
	multipartSize  = 5 * 1024 * 1024  // 5MB per part for multipart uploads
)

// ETag format patterns
var (
	// Pure MD5 ETag: 32 hex characters (with or without quotes)
	pureMD5Pattern = regexp.MustCompile(`^"?[a-f0-9]{32}"?$`)

	// Composite ETag for multipart: 32 hex chars, hyphen, part count (with or without quotes)
	compositePattern = regexp.MustCompile(`^"?[a-f0-9]{32}-\d+"?$`)
)

func init() {
	mathrand.Seed(time.Now().UnixNano())
}
// getS3Client creates an AWS S3 v2 client for testing
func getS3Client(t *testing.T) *s3.Client {
	cfg, err := config.LoadDefaultConfig(context.TODO(),
		config.WithRegion(defaultConfig.Region),
		config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(
			defaultConfig.AccessKey,
			defaultConfig.SecretKey,
			"",
		)),
		config.WithEndpointResolverWithOptions(aws.EndpointResolverWithOptionsFunc(
			func(service, region string, options ...interface{}) (aws.Endpoint, error) {
				return aws.Endpoint{
					URL:               defaultConfig.Endpoint,
					SigningRegion:     defaultConfig.Region,
					HostnameImmutable: true,
				}, nil
			})),
	)
	require.NoError(t, err)

	return s3.NewFromConfig(cfg, func(o *s3.Options) {
		o.UsePathStyle = true
	})
}

// getNewBucketName generates a unique bucket name
func getNewBucketName() string {
	timestamp := time.Now().UnixNano()
	randomSuffix := mathrand.Intn(100000)
	return fmt.Sprintf("%s%d-%d", defaultConfig.BucketPrefix, timestamp, randomSuffix)
}

// generateRandomData generates random test data of the specified size
func generateRandomData(size int) []byte {
	data := make([]byte, size)
	if _, err := rand.Read(data); err != nil {
		panic(fmt.Sprintf("failed to generate random test data: %v", err))
	}
	return data
}

// calculateMD5 calculates the MD5 hash of data and returns it as a hex string
func calculateMD5(data []byte) string {
	hash := md5.Sum(data)
	return hex.EncodeToString(hash[:])
}

// cleanETag removes surrounding quotes from an ETag
func cleanETag(etag string) string {
	return strings.Trim(etag, `"`)
}

// isPureMD5ETag checks if the ETag is a pure MD5 hex string (no composite format)
func isPureMD5ETag(etag string) bool {
	return pureMD5Pattern.MatchString(etag)
}

// isCompositeETag checks if the ETag is in composite format (md5-count)
func isCompositeETag(etag string) bool {
	return compositePattern.MatchString(etag)
}

// createTestBucket creates a new bucket for testing
func createTestBucket(ctx context.Context, client *s3.Client, bucketName string) error {
	_, err := client.CreateBucket(ctx, &s3.CreateBucketInput{
		Bucket: aws.String(bucketName),
	})
	return err
}

// cleanupTestBucket deletes all objects, aborts in-progress multipart uploads,
// and removes the bucket
func cleanupTestBucket(ctx context.Context, client *s3.Client, bucketName string) {
	// Delete all objects
	paginator := s3.NewListObjectsV2Paginator(client, &s3.ListObjectsV2Input{
		Bucket: aws.String(bucketName),
	})
	for paginator.HasMorePages() {
		page, err := paginator.NextPage(ctx)
		if err != nil {
			break
		}
		for _, obj := range page.Contents {
			client.DeleteObject(ctx, &s3.DeleteObjectInput{
				Bucket: aws.String(bucketName),
				Key:    obj.Key,
			})
		}
	}

	// Abort any in-progress multipart uploads
	mpPaginator := s3.NewListMultipartUploadsPaginator(client, &s3.ListMultipartUploadsInput{
		Bucket: aws.String(bucketName),
	})
	for mpPaginator.HasMorePages() {
		page, err := mpPaginator.NextPage(ctx)
		if err != nil {
			break
		}
		for _, upload := range page.Uploads {
			client.AbortMultipartUpload(ctx, &s3.AbortMultipartUploadInput{
				Bucket:   aws.String(bucketName),
				Key:      upload.Key,
				UploadId: upload.UploadId,
			})
		}
	}

	// Delete the bucket
	client.DeleteBucket(ctx, &s3.DeleteBucketInput{
		Bucket: aws.String(bucketName),
	})
}
// TestPutObjectETagFormat_SmallFile verifies ETag format for small files (single chunk)
func TestPutObjectETagFormat_SmallFile(t *testing.T) {
	ctx := context.Background()
	client := getS3Client(t)

	bucketName := getNewBucketName()
	err := createTestBucket(ctx, client, bucketName)
	require.NoError(t, err, "Failed to create test bucket")
	defer cleanupTestBucket(ctx, client, bucketName)

	testData := generateRandomData(smallFileSize)
	expectedMD5 := calculateMD5(testData)
	objectKey := "small-file.bin"

	// Upload small file
	putResp, err := client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucketName),
		Key:    aws.String(objectKey),
		Body:   bytes.NewReader(testData),
	})
	require.NoError(t, err, "Failed to upload small file")

	// Verify ETag format
	etag := aws.ToString(putResp.ETag)
	t.Logf("Small file (%d bytes) ETag: %s", smallFileSize, etag)

	assert.True(t, isPureMD5ETag(etag),
		"Small file ETag should be pure MD5, got: %s", etag)
	assert.False(t, isCompositeETag(etag),
		"Small file ETag should NOT be composite format, got: %s", etag)
	assert.Equal(t, expectedMD5, cleanETag(etag),
		"ETag should match calculated MD5")
}

// TestPutObjectETagFormat_LargeFile verifies ETag format for large files that trigger
// auto-chunking. This is the critical test for GitHub Issue #7768.
func TestPutObjectETagFormat_LargeFile(t *testing.T) {
	ctx := context.Background()
	client := getS3Client(t)

	bucketName := getNewBucketName()
	err := createTestBucket(ctx, client, bucketName)
	require.NoError(t, err, "Failed to create test bucket")
	defer cleanupTestBucket(ctx, client, bucketName)

	testData := generateRandomData(largeFileSize)
	expectedMD5 := calculateMD5(testData)
	objectKey := "large-file.bin"

	t.Logf("Uploading large file (%d bytes, > %d byte auto-chunk threshold)...",
		largeFileSize, autoChunkSize)

	// Upload large file (triggers auto-chunking internally)
	putResp, err := client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucketName),
		Key:    aws.String(objectKey),
		Body:   bytes.NewReader(testData),
	})
	require.NoError(t, err, "Failed to upload large file")

	// Verify ETag format - MUST be pure MD5, NOT composite
	etag := aws.ToString(putResp.ETag)
	t.Logf("Large file (%d bytes, ~%d internal chunks) ETag: %s",
		largeFileSize, (largeFileSize/autoChunkSize)+1, etag)

	assert.True(t, isPureMD5ETag(etag),
		"Large file PutObject ETag MUST be pure MD5 (not composite), got: %s", etag)
	assert.False(t, isCompositeETag(etag),
		"Large file PutObject ETag should NOT contain '-' (composite format), got: %s", etag)
	assert.False(t, strings.Contains(cleanETag(etag), "-"),
		"ETag should not contain hyphen for regular PutObject, got: %s", etag)
	assert.Equal(t, expectedMD5, cleanETag(etag),
		"ETag should match calculated MD5 of entire content")

	// Verify we can read back the object correctly
	getResp, err := client.GetObject(ctx, &s3.GetObjectInput{
		Bucket: aws.String(bucketName),
		Key:    aws.String(objectKey),
	})
	require.NoError(t, err, "Failed to get large file")
	defer getResp.Body.Close()

	downloadedData, err := io.ReadAll(getResp.Body)
	require.NoError(t, err, "Failed to read large file content")
	assert.Equal(t, testData, downloadedData, "Downloaded content should match uploaded content")
}

// TestPutObjectETagFormat_ExtraLargeFile tests even larger files with multiple internal chunks
func TestPutObjectETagFormat_ExtraLargeFile(t *testing.T) {
	ctx := context.Background()
	client := getS3Client(t)

	bucketName := getNewBucketName()
	err := createTestBucket(ctx, client, bucketName)
	require.NoError(t, err, "Failed to create test bucket")
	defer cleanupTestBucket(ctx, client, bucketName)

	testData := generateRandomData(xlFileSize)
	expectedMD5 := calculateMD5(testData)
	objectKey := "xl-file.bin"

	expectedChunks := (xlFileSize / autoChunkSize) + 1
	t.Logf("Uploading XL file (%d bytes, expected ~%d internal chunks)...",
		xlFileSize, expectedChunks)

	// Upload extra large file
	putResp, err := client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucketName),
		Key:    aws.String(objectKey),
		Body:   bytes.NewReader(testData),
	})
	require.NoError(t, err, "Failed to upload XL file")

	// Verify ETag format
	etag := aws.ToString(putResp.ETag)
	t.Logf("XL file (%d bytes) ETag: %s", xlFileSize, etag)

	assert.True(t, isPureMD5ETag(etag),
		"XL file PutObject ETag MUST be pure MD5, got: %s", etag)
	assert.False(t, isCompositeETag(etag),
		"XL file PutObject ETag should NOT be composite, got: %s", etag)
	assert.Equal(t, expectedMD5, cleanETag(etag),
		"ETag should match calculated MD5")
}

// TestMultipartUploadETagFormat verifies that ONLY multipart uploads get composite ETags
func TestMultipartUploadETagFormat(t *testing.T) {
	ctx := context.Background()
	client := getS3Client(t)

	bucketName := getNewBucketName()
	err := createTestBucket(ctx, client, bucketName)
	require.NoError(t, err, "Failed to create test bucket")
	defer cleanupTestBucket(ctx, client, bucketName)

	// Create test data for multipart upload (15MB = 3 parts of 5MB each)
	totalSize := 15 * 1024 * 1024
	testData := generateRandomData(totalSize)
	objectKey := "multipart-file.bin"

	expectedPartCount := (totalSize + multipartSize - 1) / multipartSize // ceiling division
	t.Logf("Performing multipart upload (%d bytes, %d parts)...",
		totalSize, expectedPartCount)

	// Initiate multipart upload
	createResp, err := client.CreateMultipartUpload(ctx, &s3.CreateMultipartUploadInput{
		Bucket: aws.String(bucketName),
		Key:    aws.String(objectKey),
	})
	require.NoError(t, err, "Failed to create multipart upload")

	uploadId := createResp.UploadId
	var completedParts []types.CompletedPart
	partNumber := int32(1)

	// Upload parts
	for offset := 0; offset < totalSize; offset += multipartSize {
		end := offset + multipartSize
		if end > totalSize {
			end = totalSize
		}
		partData := testData[offset:end]

		uploadResp, err := client.UploadPart(ctx, &s3.UploadPartInput{
			Bucket:     aws.String(bucketName),
			Key:        aws.String(objectKey),
			UploadId:   uploadId,
			PartNumber: aws.Int32(partNumber),
			Body:       bytes.NewReader(partData),
		})
		require.NoError(t, err, "Failed to upload part %d", partNumber)

		completedParts = append(completedParts, types.CompletedPart{
			ETag:       uploadResp.ETag,
			PartNumber: aws.Int32(partNumber),
		})
		partNumber++
	}

	// Complete multipart upload
	completeResp, err := client.CompleteMultipartUpload(ctx, &s3.CompleteMultipartUploadInput{
		Bucket:   aws.String(bucketName),
		Key:      aws.String(objectKey),
		UploadId: uploadId,
		MultipartUpload: &types.CompletedMultipartUpload{
			Parts: completedParts,
		},
	})
	require.NoError(t, err, "Failed to complete multipart upload")

	// Verify ETag format - SHOULD be composite for multipart
	etag := aws.ToString(completeResp.ETag)
	t.Logf("Multipart upload ETag: %s", etag)

	assert.True(t, isCompositeETag(etag),
		"Multipart upload ETag SHOULD be composite format (md5-count), got: %s", etag)
	assert.True(t, strings.Contains(cleanETag(etag), "-"),
		"Multipart ETag should contain hyphen, got: %s", etag)

	// Verify the part count in the ETag matches
	parts := strings.Split(cleanETag(etag), "-")
	require.Len(t, parts, 2, "Composite ETag should have format 'hash-count'")
	assert.Equal(t, fmt.Sprintf("%d", len(completedParts)), parts[1],
		"Part count in ETag should match number of parts uploaded")
}
// TestPutObjectETagConsistency verifies ETag consistency between PUT, HEAD, and GET
func TestPutObjectETagConsistency(t *testing.T) {
	ctx := context.Background()
	client := getS3Client(t)

	bucketName := getNewBucketName()
	err := createTestBucket(ctx, client, bucketName)
	require.NoError(t, err, "Failed to create test bucket")
	defer cleanupTestBucket(ctx, client, bucketName)

	testCases := []struct {
		name string
		size int
	}{
		{"tiny", 100},
		{"small", smallFileSize},
		{"medium", mediumFileSize},
		{"large", largeFileSize},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			testData := generateRandomData(tc.size)
			objectKey := fmt.Sprintf("consistency-test-%s.bin", tc.name)

			// PUT object
			putResp, err := client.PutObject(ctx, &s3.PutObjectInput{
				Bucket: aws.String(bucketName),
				Key:    aws.String(objectKey),
				Body:   bytes.NewReader(testData),
			})
			require.NoError(t, err)
			putETag := aws.ToString(putResp.ETag)

			// HEAD object
			headResp, err := client.HeadObject(ctx, &s3.HeadObjectInput{
				Bucket: aws.String(bucketName),
				Key:    aws.String(objectKey),
			})
			require.NoError(t, err)
			headETag := aws.ToString(headResp.ETag)

			// GET object
			getResp, err := client.GetObject(ctx, &s3.GetObjectInput{
				Bucket: aws.String(bucketName),
				Key:    aws.String(objectKey),
			})
			require.NoError(t, err)
			getETag := aws.ToString(getResp.ETag)
			getResp.Body.Close()

			// All ETags should match
			t.Logf("%s (%d bytes): PUT=%s, HEAD=%s, GET=%s",
				tc.name, tc.size, putETag, headETag, getETag)

			assert.Equal(t, putETag, headETag,
				"PUT and HEAD ETags should match")
			assert.Equal(t, putETag, getETag,
				"PUT and GET ETags should match")

			// All should be pure MD5 (not composite) for regular PutObject
			assert.True(t, isPureMD5ETag(putETag),
				"PutObject ETag should be pure MD5, got: %s", putETag)
		})
	}
}

// TestETagHexValidation simulates the AWS SDK v2 validation that caused issue #7768
func TestETagHexValidation(t *testing.T) {
	ctx := context.Background()
	client := getS3Client(t)

	bucketName := getNewBucketName()
	err := createTestBucket(ctx, client, bucketName)
	require.NoError(t, err, "Failed to create test bucket")
	defer cleanupTestBucket(ctx, client, bucketName)

	// Test with a file large enough to trigger auto-chunking
	testData := generateRandomData(largeFileSize)
	objectKey := "hex-validation-test.bin"

	putResp, err := client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucketName),
		Key:    aws.String(objectKey),
		Body:   bytes.NewReader(testData),
	})
	require.NoError(t, err)

	etag := cleanETag(aws.ToString(putResp.ETag))

	// Simulate AWS SDK v2's hex validation (Base16Codec.decode).
	// This is what fails in issue #7768 when the ETag contains '-'.
	t.Logf("Validating ETag as hex: %s", etag)

	_, err = hex.DecodeString(etag)
	assert.NoError(t, err,
		"ETag should be valid hexadecimal (AWS SDK v2 validation). "+
			"Got ETag: %s. If this fails with 'invalid byte', the ETag contains non-hex chars like '-'",
		etag)
}

// TestMultipleLargeFileUploads verifies ETag format across multiple large uploads
func TestMultipleLargeFileUploads(t *testing.T) {
	ctx := context.Background()
	client := getS3Client(t)

	bucketName := getNewBucketName()
	err := createTestBucket(ctx, client, bucketName)
	require.NoError(t, err, "Failed to create test bucket")
	defer cleanupTestBucket(ctx, client, bucketName)

	numFiles := 5
	for i := 0; i < numFiles; i++ {
		testData := generateRandomData(largeFileSize)
		expectedMD5 := calculateMD5(testData)
		objectKey := fmt.Sprintf("large-file-%d.bin", i)

		putResp, err := client.PutObject(ctx, &s3.PutObjectInput{
			Bucket: aws.String(bucketName),
			Key:    aws.String(objectKey),
			Body:   bytes.NewReader(testData),
		})
		require.NoError(t, err, "Failed to upload file %d", i)

		etag := aws.ToString(putResp.ETag)
		t.Logf("File %d ETag: %s (expected MD5: %s)", i, etag, expectedMD5)

		assert.True(t, isPureMD5ETag(etag),
			"File %d ETag should be pure MD5, got: %s", i, etag)
		assert.Equal(t, expectedMD5, cleanETag(etag),
			"File %d ETag should match MD5", i)

		// Validate as hex (AWS SDK v2 check)
		_, err = hex.DecodeString(cleanETag(etag))
		assert.NoError(t, err, "File %d ETag should be valid hex", i)
	}
}
19
test/s3/etag/test_config.json
Normal file
@@ -0,0 +1,19 @@
{
  "endpoint": "http://127.0.0.1:8333",
  "access_key": "some_access_key1",
  "secret_key": "some_secret_key1",
  "region": "us-east-1",
  "bucket_prefix": "test-etag-",
  "notes": {
    "description": "S3 ETag format integration tests",
    "issue": "https://github.com/seaweedfs/seaweedfs/issues/7768",
    "auto_chunk_size": "8MB - files larger than this trigger auto-chunking",
    "test_sizes": {
      "small": "1KB - single chunk",
      "medium": "256KB - at inline threshold",
      "large": "10MB - 2 internal chunks",
      "xl": "25MB - 4 internal chunks"
    }
  }
}