seaweedFS/other/java/s3copier
Chris Lu 44beb42eb9 s3: fix PutObject ETag format for multi-chunk uploads (#7771)
* s3: fix PutObject ETag format for multi-chunk uploads

Fix issue #7768: AWS S3 SDK for Java fails with 'Invalid base 16
character: -' when performing PutObject on files that are internally
auto-chunked.

The issue was that SeaweedFS returned a composite ETag format
(<md5hash>-<count>) for regular PutObject when the file was split
into multiple chunks due to auto-chunking. However, per AWS S3 spec,
the composite ETag format should only be used for multipart uploads
(CreateMultipartUpload/UploadPart/CompleteMultipartUpload API).

Regular PutObject should always return a pure MD5 hash as the ETag,
regardless of how the file is stored internally.

The fix ensures the MD5 hash is always stored in entry.Attributes.Md5
for regular PutObject operations, so filer.ETag() returns the pure
MD5 hash instead of falling back to ETagChunks() composite format.

* test: add comprehensive ETag format tests for issue #7768

Add integration tests to ensure PutObject ETag format compatibility:

Go tests (test/s3/etag/):
- TestPutObjectETagFormat_SmallFile: 1KB single chunk
- TestPutObjectETagFormat_LargeFile: 10MB auto-chunked (critical for #7768)
- TestPutObjectETagFormat_ExtraLargeFile: 25MB multi-chunk
- TestMultipartUploadETagFormat: verify composite ETag for multipart
- TestPutObjectETagConsistency: ETag consistency across PUT/HEAD/GET
- TestETagHexValidation: simulate AWS SDK v2 hex decoding
- TestMultipleLargeFileUploads: stress test multiple large uploads

Java tests (other/java/s3copier/):
- Update pom.xml to include AWS SDK v2 (2.20.127)
- Add ETagValidationTest.java with comprehensive SDK v2 tests
- Add README.md documenting SDK versions and test coverage

Documentation:
- Add test/s3/SDK_COMPATIBILITY.md documenting validated SDK versions
- Add test/s3/etag/README.md explaining test coverage

These tests ensure large file PutObject (>8MB) returns pure MD5 ETags
(not composite format), which is required for AWS SDK v2 compatibility.

* fix: lower Java version requirement to 11 for CI compatibility

* address CodeRabbit review comments

- s3_etag_test.go: Handle rand.Read error, fix multipart part-count logging
- Makefile: Add 'all' target, pass S3_ENDPOINT to test commands
- SDK_COMPATIBILITY.md: Add language tag to fenced code block
- ETagValidationTest.java: Add pagination to cleanup logic
- README.md: Clarify Go SDK tests are in separate location

* ci: add s3copier ETag validation tests to Java integration tests

- Enable S3 API (-s3 -s3.port=8333) in SeaweedFS test server
- Add S3 API readiness check to wait loop
- Add step to run ETagValidationTest from s3copier

This ensures the fix for issue #7768 is continuously tested
against AWS SDK v2 for Java in CI.

* ci: add S3 config with credentials for s3copier tests

- Add -s3.config pointing to docker/compose/s3.json
- Add -s3.allowDeleteBucketNotEmpty for test cleanup
- Set S3_ACCESS_KEY and S3_SECRET_KEY env vars for tests

* ci: pass S3 config as Maven system properties

Pass S3_ENDPOINT, S3_ACCESS_KEY, S3_SECRET_KEY via -D flags
so they're available via System.getProperty() in Java tests
2025-12-15 12:43:33 -08:00

SeaweedFS S3 Java SDK Compatibility Tests

This project contains Java-based integration tests for SeaweedFS S3 API compatibility.

Overview

Tests are provided for both AWS SDK v1 and v2 to ensure compatibility with the various SDK versions commonly used in production.

SDK Versions

| SDK | Version | Notes |
|-----|---------|-------|
| AWS SDK v1 for Java | 1.12.600 | Legacy SDK, less strict ETag validation |
| AWS SDK v2 for Java | 2.20.127 | Modern SDK with strict checksum validation |

Running Tests

Prerequisites

  1. SeaweedFS running with S3 API enabled:

    weed server -s3
    
  2. Java 11+ and Maven

Run All Tests

mvn test

Run Specific Tests

# Run only ETag validation tests (AWS SDK v2)
mvn test -Dtest=ETagValidationTest

# Run with custom endpoint
mvn test -Dtest=ETagValidationTest -DS3_ENDPOINT=http://localhost:8333

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| S3_ENDPOINT | http://127.0.0.1:8333 | S3 API endpoint URL |
| S3_ACCESS_KEY | some_access_key1 | Access key ID |
| S3_SECRET_KEY | some_secret_key1 | Secret access key |
| S3_REGION | us-east-1 | AWS region |
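
The settings above can be supplied either as environment variables or as Maven `-D` system properties. The following is a minimal sketch of how a test might resolve them, not the code used by `ETagValidationTest`. The class name `S3TestConfig`, the `resolve` method, and the lookup order (system property first, then environment variable, then default) are assumptions made for illustration.

```java
import java.util.Map;

/**
 * Hypothetical helper (not part of the project): resolves a setting from
 * a -D system property, then an environment variable, then a default.
 * The precedence order is an assumption.
 */
final class S3TestConfig {
    private S3TestConfig() {}

    /** Resolves {@code name} against system properties and the given environment map. */
    static String resolve(String name, String defaultValue, Map<String, String> env) {
        String fromProperty = System.getProperty(name);
        if (fromProperty != null && !fromProperty.isEmpty()) {
            return fromProperty;
        }
        String fromEnv = env.get(name);
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return fromEnv;
        }
        return defaultValue;
    }

    /** Convenience overload that reads the real process environment. */
    static String resolve(String name, String defaultValue) {
        return resolve(name, defaultValue, System.getenv());
    }

    public static void main(String[] args) {
        // Defaults are the ones listed in the table above.
        System.out.println("S3_ENDPOINT   = " + resolve("S3_ENDPOINT", "http://127.0.0.1:8333"));
        System.out.println("S3_ACCESS_KEY = " + resolve("S3_ACCESS_KEY", "some_access_key1"));
        System.out.println("S3_REGION     = " + resolve("S3_REGION", "us-east-1"));
    }
}
```

Passing the environment as a `Map` keeps the lookup easy to exercise in a unit test without changing the real process environment.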

Test Coverage

ETagValidationTest (AWS SDK v2)

Tests for GitHub Issue #7768 - ETag format validation.

| Test | Description |
|------|-------------|
| testSmallFilePutObject | Verify small files return pure MD5 ETag |
| testLargeFilePutObject_Issue7768 | Critical: Verify large files (>8MB) return pure MD5 ETag |
| testExtraLargeFilePutObject | Verify very large files (>24MB) return pure MD5 ETag |
| testMultipartUploadETag | Verify multipart uploads return composite ETag |
| testETagConsistency | Verify ETag consistency across PUT/HEAD/GET |
| testMultipleLargeFileUploads | Stress test multiple large uploads |
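
The "pure MD5 ETag" checks listed above come down to comparing the returned ETag with the MD5 digest of the uploaded bytes. The sketch below shows that comparison using only the JDK. It is an illustration, not the project's test code, and the names `ETagMd5Check`, `md5Hex`, `stripQuotes`, and `matchesContent` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: checks that an ETag equals the MD5 of the payload. */
final class ETagMd5Check {
    private ETagMd5Check() {}

    /** Returns the lowercase hex MD5 digest of {@code data}. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** S3 responses usually wrap the ETag in double quotes; remove them. */
    static String stripQuotes(String etag) {
        return etag.trim().replace("\"", "");
    }

    /** True when the ETag is exactly the MD5 hex digest of the payload. */
    static boolean matchesContent(String etag, byte[] data) {
        return stripQuotes(etag).equalsIgnoreCase(md5Hex(data));
    }

    public static void main(String[] args) {
        byte[] payload = "hello world".getBytes(StandardCharsets.UTF_8);
        String etag = "\"" + md5Hex(payload) + "\""; // stand-in for a server response
        System.out.println("ETag matches content: " + matchesContent(etag, payload));
    }
}
```

A composite ETag such as `<md5>-<count>` never passes `matchesContent`, since the suffix makes the string differ from any 32-character digest.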

Background: Issue #7768

AWS SDK v2 for Java includes checksum validation that decodes the ETag as hexadecimal. When SeaweedFS returned composite ETags (<md5>-<count>) for regular PutObject with internally auto-chunked files, the SDK failed with:

java.lang.IllegalArgumentException: Invalid base 16 character: '-'

Per AWS S3 specification:

  • PutObject: ETag is always a pure MD5 hex string (32 chars)
  • CompleteMultipartUpload: ETag is composite format (<md5>-<partcount>)

The fix ensures SeaweedFS follows this specification.
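
To make the two formats concrete, the sketch below classifies an ETag as pure MD5 or composite and shows why a strict hex decoder rejects the composite form. It is a simplified stand-in written for this README, not the AWS SDK's decoder. The names `ETagFormat`, `classify`, and `decodeHex` are hypothetical, and the exception message only approximates the one quoted above.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: distinguishes pure MD5 ETags from composite ones. */
final class ETagFormat {
    enum Kind { PURE_MD5, COMPOSITE, UNKNOWN }

    // 32 hex characters, optionally followed by "-<partcount>".
    private static final Pattern PURE = Pattern.compile("[0-9a-fA-F]{32}");
    private static final Pattern COMPOSITE = Pattern.compile("[0-9a-fA-F]{32}-[0-9]+");

    private ETagFormat() {}

    /** Classifies an ETag after removing any surrounding double quotes. */
    static Kind classify(String rawEtag) {
        String etag = rawEtag.trim().replace("\"", "");
        if (PURE.matcher(etag).matches()) {
            return Kind.PURE_MD5;
        }
        if (COMPOSITE.matcher(etag).matches()) {
            return Kind.COMPOSITE;
        }
        return Kind.UNKNOWN;
    }

    /** Strict hex decoder: throws on any character that is not a hex digit. */
    static byte[] decodeHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd-length hex string: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < hex.length(); i += 2) {
            int hi = Character.digit(hex.charAt(i), 16);
            int lo = Character.digit(hex.charAt(i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid base 16 character in: " + hex);
            }
            out[i / 2] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        String pure = "d41d8cd98f00b204e9800998ecf8427e";
        String composite = pure + "-3";
        System.out.println(classify(pure));
        System.out.println(classify(composite));
        try {
            decodeHex(composite);
        } catch (IllegalArgumentException e) {
            System.out.println("Decoding failed: " + e.getMessage());
        }
    }
}
```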

Project Structure

src/
├── main/java/com/seaweedfs/s3/
│   ├── PutObject.java           # Example PutObject with SDK v1
│   └── HighLevelMultipartUpload.java
└── test/java/com/seaweedfs/s3/
    ├── PutObjectTest.java       # Basic SDK v1 test
    └── ETagValidationTest.java  # Comprehensive SDK v2 ETag tests

Validated SDK Versions

This Java test project validates:

  • AWS SDK v2 for Java 2.20.127+
  • AWS SDK v1 for Java 1.12.600+

Go SDK validation is performed by separate test suites: