s3: fix PutObject ETag format for multi-chunk uploads (#7771)
* s3: fix PutObject ETag format for multi-chunk uploads

Fix issue #7768: AWS S3 SDK for Java fails with 'Invalid base 16 character: -' when performing PutObject on files that are internally auto-chunked.

The issue was that SeaweedFS returned a composite ETag format (<md5hash>-<count>) for regular PutObject when the file was split into multiple chunks due to auto-chunking. However, per AWS S3 spec, the composite ETag format should only be used for multipart uploads (CreateMultipartUpload/UploadPart/CompleteMultipartUpload API).

Regular PutObject should always return a pure MD5 hash as the ETag, regardless of how the file is stored internally.

The fix ensures the MD5 hash is always stored in entry.Attributes.Md5 for regular PutObject operations, so filer.ETag() returns the pure MD5 hash instead of falling back to ETagChunks() composite format.

* test: add comprehensive ETag format tests for issue #7768

Add integration tests to ensure PutObject ETag format compatibility:

Go tests (test/s3/etag/):
- TestPutObjectETagFormat_SmallFile: 1KB single chunk
- TestPutObjectETagFormat_LargeFile: 10MB auto-chunked (critical for #7768)
- TestPutObjectETagFormat_ExtraLargeFile: 25MB multi-chunk
- TestMultipartUploadETagFormat: verify composite ETag for multipart
- TestPutObjectETagConsistency: ETag consistency across PUT/HEAD/GET
- TestETagHexValidation: simulate AWS SDK v2 hex decoding
- TestMultipleLargeFileUploads: stress test multiple large uploads

Java tests (other/java/s3copier/):
- Update pom.xml to include AWS SDK v2 (2.20.127)
- Add ETagValidationTest.java with comprehensive SDK v2 tests
- Add README.md documenting SDK versions and test coverage

Documentation:
- Add test/s3/SDK_COMPATIBILITY.md documenting validated SDK versions
- Add test/s3/etag/README.md explaining test coverage

These tests ensure large file PutObject (>8MB) returns pure MD5 ETags (not composite format), which is required for AWS SDK v2 compatibility.

* fix: lower Java version requirement to 11 for CI compatibility

* address CodeRabbit review comments

- s3_etag_test.go: Handle rand.Read error, fix multipart part-count logging
- Makefile: Add 'all' target, pass S3_ENDPOINT to test commands
- SDK_COMPATIBILITY.md: Add language tag to fenced code block
- ETagValidationTest.java: Add pagination to cleanup logic
- README.md: Clarify Go SDK tests are in separate location

* ci: add s3copier ETag validation tests to Java integration tests

- Enable S3 API (-s3 -s3.port=8333) in SeaweedFS test server
- Add S3 API readiness check to wait loop
- Add step to run ETagValidationTest from s3copier

This ensures the fix for issue #7768 is continuously tested against AWS SDK v2 for Java in CI.

* ci: add S3 config with credentials for s3copier tests

- Add -s3.config pointing to docker/compose/s3.json
- Add -s3.allowDeleteBucketNotEmpty for test cleanup
- Set S3_ACCESS_KEY and S3_SECRET_KEY env vars for tests

* ci: pass S3 config as Maven system properties

Pass S3_ENDPOINT, S3_ACCESS_KEY, S3_SECRET_KEY via -D flags so they're available via System.getProperty() in Java tests
126
test/s3/SDK_COMPATIBILITY.md
Normal file
@@ -0,0 +1,126 @@
# S3 SDK Compatibility Testing

This document describes the SDK versions tested against SeaweedFS S3 API and known compatibility considerations.

## Validated SDK Versions

### Go SDKs

| SDK | Version | Test Location | Status |
|-----|---------|---------------|--------|
| AWS SDK v2 for Go | 1.20+ | `test/s3/etag/`, `test/s3/copying/` | ✅ Tested |
| AWS SDK v1 for Go | 1.x | `test/s3/basic/` | ✅ Tested |

### Java SDKs

| SDK | Version | Test Location | Status |
|-----|---------|---------------|--------|
| AWS SDK v2 for Java | 2.20.127+ | `other/java/s3copier/` | ✅ Tested |
| AWS SDK v1 for Java | 1.12.600+ | `other/java/s3copier/` | ✅ Tested |

### Python SDKs

| SDK | Version | Test Location | Status |
|-----|---------|---------------|--------|
| boto3 | 1.x | `test/s3/parquet/` | ✅ Tested |
| PyArrow S3 | 14+ | `test/s3/parquet/` | ✅ Tested |

## SDK-Specific Considerations

### AWS SDK v2 for Java - ETag Validation

**Issue**: [GitHub #7768](https://github.com/seaweedfs/seaweedfs/issues/7768)

AWS SDK v2 for Java includes strict ETag validation in `ChecksumsEnabledValidator.validatePutObjectChecksum`. It decodes the ETag as a hexadecimal MD5 hash using `Base16Codec.decode()`.

**Impact**: If the ETag contains non-hexadecimal characters (like `-` in composite format), the SDK fails with:
```text
java.lang.IllegalArgumentException: Invalid base 16 character: '-'
```

**Resolution**: SeaweedFS now correctly returns:
- **PutObject**: Pure MD5 hex ETag (32 characters) regardless of internal chunking
- **CompleteMultipartUpload**: Composite ETag (`<md5>-<partcount>`)

**Test Coverage**: `test/s3/etag/` and `other/java/s3copier/ETagValidationTest.java`
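
To illustrate the failure mode described in this section, here is a minimal Python sketch that distinguishes the two ETag shapes and mimics a strict hex decode. It is not the SDK's code: `classify_etag` and `decodes_as_hex` are hypothetical helper names, and `bytes.fromhex` only stands in for the behavior attributed to `Base16Codec.decode()`.

```python
import re

# Pure MD5 ETag: exactly 32 hex characters.
PURE_MD5 = re.compile(r"[0-9a-fA-F]{32}")
# Composite ETag: 32 hex characters, a dash, then a positive part count.
COMPOSITE = re.compile(r"[0-9a-fA-F]{32}-[1-9][0-9]*")


def classify_etag(etag: str) -> str:
    """Return 'pure-md5', 'composite', or 'unknown' for an ETag value."""
    value = etag.strip('"')  # ETag headers are usually wrapped in double quotes
    if PURE_MD5.fullmatch(value):
        return "pure-md5"
    if COMPOSITE.fullmatch(value):
        return "composite"
    return "unknown"


def decodes_as_hex(etag: str) -> bool:
    """Mimic a strict hex decode of the ETag (assumption: stands in for the SDK check)."""
    try:
        bytes.fromhex(etag.strip('"'))
        return True
    except ValueError:
        # A '-' (or any other non-hex character) lands here.
        return False
```

A composite value such as `d41d8cd98f00b204e9800998ecf8427e-3` classifies as `composite` and fails `decodes_as_hex`, which is the situation that triggered the exception above.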

### AWS SDK v1 vs v2 Differences

| Feature | SDK v1 | SDK v2 |
|---------|--------|--------|
| ETag hex validation | No | Yes (strict) |
| Checksum validation | Basic | Enhanced |
| Async support | Limited | Full |
| Default retry behavior | Lenient | Stricter |

### Large File Handling

SeaweedFS auto-chunks files larger than **8MB** for efficient storage. This is transparent to clients, but previously affected ETag format. The current implementation ensures:

1. Regular `PutObject` (any size): Returns pure MD5 ETag
2. Multipart upload: Returns composite ETag per AWS S3 specification
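
The two formats in the list above can be sketched in Python as follows. This is an illustration under assumptions, not SeaweedFS code: the composite rule shown (MD5 over the concatenated binary MD5 digests of the parts, followed by `-<partcount>`) is the commonly observed AWS S3 behavior, and the function names are hypothetical.

```python
import hashlib
from typing import Sequence


def put_object_etag(body: bytes) -> str:
    """Pure MD5 ETag: hex MD5 of the whole object, however it is stored internally."""
    return hashlib.md5(body).hexdigest()


def multipart_etag(parts: Sequence[bytes]) -> str:
    """Composite ETag (assumed AWS-style rule): MD5 of the parts' binary MD5 digests, then '-<count>'."""
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"
```

Note that `put_object_etag` depends only on the object's bytes, so the result is the same whether the server stores the object as one chunk or several.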

## Test Categories by File Size

| Category | Size | Chunks | ETag Format |
|----------|------|--------|-------------|
| Small | < 256KB | 1 (inline) | Pure MD5 |
| Medium | 256KB - 8MB | 1 | Pure MD5 |
| Large | 8MB - 24MB | 2-3 | Pure MD5 |
| Extra Large | > 24MB | 4+ | Pure MD5 |
| Multipart | N/A | Per part | Composite |
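
The chunk counts in the table follow from simple arithmetic. The sketch below assumes fixed-size splitting at 8 MiB, taken from the 8MB figure in this document; the actual limit may be configurable, and `expected_chunks` is a hypothetical helper rather than anything in the test suite.

```python
# Assumption: auto-chunking splits at a fixed 8 MiB boundary.
CHUNK_SIZE = 8 * 1024 * 1024


def expected_chunks(size_bytes: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunks for an object of the given size (at least one)."""
    if size_bytes <= 0:
        return 1
    # Ceiling division without floating point.
    return -(-size_bytes // chunk_size)
```

With these assumptions, the 1KB, 10MB, and 25MB sizes used by the Go tests map to 1, 2, and 4 chunks respectively, matching the Small, Large, and Extra Large rows.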

## Running SDK Compatibility Tests

### Go Tests

```bash
# Run all ETag tests
cd test/s3/etag && make test

# Run large file tests only
cd test/s3/etag && make test-large
```

### Java Tests

```bash
# Run all Java SDK tests
cd other/java/s3copier && mvn test

# Run only ETag validation tests
cd other/java/s3copier && mvn test -Dtest=ETagValidationTest
```

### Python Tests

```bash
# Run PyArrow S3 tests
cd test/s3/parquet && make test
```

## Adding New SDK Tests

When adding tests for new SDKs, ensure:

1. **Large file tests (>8MB)**: Critical for verifying ETag format with auto-chunking
2. **Multipart upload tests**: Verify composite ETag format
3. **Checksum validation**: Test SDK-specific checksum validation if applicable
4. **Document SDK version**: Add to this compatibility matrix

## Known Issues and Workarounds

### Issue: Older SDK Versions

Some very old SDK versions (e.g., AWS SDK v1 for Java < 1.11.x) may have different behavior. Testing with the versions listed above is recommended.

### Issue: Custom Checksum Algorithms

AWS SDK v2 supports SHA-256 and CRC32 checksums in addition to MD5. SeaweedFS currently returns MD5-based ETags. For checksums other than MD5, use the `x-amz-checksum-*` headers.
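
For reference, the values carried in `x-amz-checksum-*` headers are base64-encoded digests rather than hex strings. The sketch below shows only that encoding, as defined for AWS S3. How SeaweedFS handles these headers is not covered here, and `checksum_headers` is a hypothetical helper name.

```python
import base64
import hashlib
import zlib


def checksum_headers(body: bytes) -> dict:
    """Build base64-encoded SHA-256 and CRC32 checksum header values for a payload."""
    sha256_b64 = base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
    # CRC32 is sent as the base64 of its 4-byte big-endian representation.
    crc32_b64 = base64.b64encode(zlib.crc32(body).to_bytes(4, "big")).decode("ascii")
    return {
        "x-amz-checksum-sha256": sha256_b64,
        "x-amz-checksum-crc32": crc32_b64,
    }
```

Unlike the ETag, these values contain characters such as `+`, `/`, and `=`, so they should never be passed through a hex decoder.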

## References

- [AWS S3 ETag Documentation](https://docs.aws.amazon.com/AmazonS3/latest/API/API_Object.html)
- [AWS SDK v2 Migration Guide](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/migration.html)
- [GitHub Issue #7768](https://github.com/seaweedfs/seaweedfs/issues/7768)