Chris Lu 44beb42eb9 s3: fix PutObject ETag format for multi-chunk uploads (#7771)
* s3: fix PutObject ETag format for multi-chunk uploads

Fix issue #7768: AWS S3 SDK for Java fails with 'Invalid base 16
character: -' when performing PutObject on files that are internally
auto-chunked.

The issue was that SeaweedFS returned a composite ETag format
(<md5hash>-<count>) for regular PutObject when the file was split
into multiple chunks due to auto-chunking. However, per AWS S3 spec,
the composite ETag format should only be used for multipart uploads
(CreateMultipartUpload/UploadPart/CompleteMultipartUpload API).

Regular PutObject should always return a pure MD5 hash as the ETag,
regardless of how the file is stored internally.

The fix ensures the MD5 hash is always stored in entry.Attributes.Md5
for regular PutObject operations, so filer.ETag() returns the pure
MD5 hash instead of falling back to ETagChunks() composite format.

* test: add comprehensive ETag format tests for issue #7768

Add integration tests to ensure PutObject ETag format compatibility:

Go tests (test/s3/etag/):
- TestPutObjectETagFormat_SmallFile: 1KB single chunk
- TestPutObjectETagFormat_LargeFile: 10MB auto-chunked (critical for #7768)
- TestPutObjectETagFormat_ExtraLargeFile: 25MB multi-chunk
- TestMultipartUploadETagFormat: verify composite ETag for multipart
- TestPutObjectETagConsistency: ETag consistency across PUT/HEAD/GET
- TestETagHexValidation: simulate AWS SDK v2 hex decoding
- TestMultipleLargeFileUploads: stress test multiple large uploads

Java tests (other/java/s3copier/):
- Update pom.xml to include AWS SDK v2 (2.20.127)
- Add ETagValidationTest.java with comprehensive SDK v2 tests
- Add README.md documenting SDK versions and test coverage

Documentation:
- Add test/s3/SDK_COMPATIBILITY.md documenting validated SDK versions
- Add test/s3/etag/README.md explaining test coverage

These tests ensure large file PutObject (>8MB) returns pure MD5 ETags
(not composite format), which is required for AWS SDK v2 compatibility.

* fix: lower Java version requirement to 11 for CI compatibility

* address CodeRabbit review comments

- s3_etag_test.go: Handle rand.Read error, fix multipart part-count logging
- Makefile: Add 'all' target, pass S3_ENDPOINT to test commands
- SDK_COMPATIBILITY.md: Add language tag to fenced code block
- ETagValidationTest.java: Add pagination to cleanup logic
- README.md: Clarify Go SDK tests are in separate location

* ci: add s3copier ETag validation tests to Java integration tests

- Enable S3 API (-s3 -s3.port=8333) in SeaweedFS test server
- Add S3 API readiness check to wait loop
- Add step to run ETagValidationTest from s3copier

This ensures the fix for issue #7768 is continuously tested
against AWS SDK v2 for Java in CI.

* ci: add S3 config with credentials for s3copier tests

- Add -s3.config pointing to docker/compose/s3.json
- Add -s3.allowDeleteBucketNotEmpty for test cleanup
- Set S3_ACCESS_KEY and S3_SECRET_KEY env vars for tests

* ci: pass S3 config as Maven system properties

Pass S3_ENDPOINT, S3_ACCESS_KEY, S3_SECRET_KEY via -D flags
so they're available via System.getProperty() in Java tests
2025-12-15 12:43:33 -08:00

S3 ETag Format Integration Tests

This test suite verifies that SeaweedFS returns correct ETag formats for S3 operations, ensuring compatibility with AWS S3 SDKs.

Background

GitHub issue #7768: AWS SDK v2 for Java failed with "Invalid base 16 character: '-'" when performing PutObject on large files that SeaweedFS had auto-chunked internally.

Root Cause

SeaweedFS internally auto-chunks large files (>8MB) for efficient storage. Previously, when a regular PutObject request resulted in multiple internal chunks, SeaweedFS returned a composite ETag format (<md5>-<count>) instead of a pure MD5 hash.

AWS S3 Specification

| Operation | ETag format | Example |
|---|---|---|
| PutObject (any size) | Pure MD5 hex (32 chars) | d41d8cd98f00b204e9800998ecf8427e |
| CompleteMultipartUpload | Composite (<md5>-<partcount>) | d41d8cd98f00b204e9800998ecf8427e-3 |
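
A minimal sketch of how the two formats differ. It assumes the widely observed S3 convention for multipart ETags (MD5 over the concatenated binary MD5s of the parts, followed by the part count); SeaweedFS's own computation is not shown, and putObjectETag and multipartETag are illustrative names, not functions from this test suite:

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// putObjectETag returns the ETag expected for a single PutObject:
// the hex-encoded MD5 of the whole object body (32 hex characters).
func putObjectETag(body []byte) string {
	sum := md5.Sum(body)
	return hex.EncodeToString(sum[:])
}

// multipartETag follows the commonly observed S3 convention for
// CompleteMultipartUpload (an assumption here): MD5 over the concatenated
// binary MD5s of each part, then "-<partcount>". The hash is NOT the MD5
// of the assembled object.
func multipartETag(parts [][]byte) string {
	h := md5.New()
	for _, p := range parts {
		sum := md5.Sum(p)
		h.Write(sum[:]) // hash.Hash writes never return an error
	}
	return fmt.Sprintf("%s-%d", hex.EncodeToString(h.Sum(nil)), len(parts))
}

func main() {
	fmt.Println(putObjectETag(nil)) // d41d8cd98f00b204e9800998ecf8427e (MD5 of empty input)

	parts := [][]byte{[]byte("part-1"), []byte("part-2"), []byte("part-3")}
	fmt.Println(multipartETag(parts)) // 32 hex characters followed by "-3"
}
```

Because the composite value is not the MD5 of the object's content, a client cannot use it as a content checksum, which is why PutObject must never return it.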

AWS SDK v2 for Java decodes the ETag returned by PutObject as a hexadecimal MD5 digest, so decoding fails as soon as it reaches the hyphen in a composite ETag.
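
The failure can be reproduced with Go's encoding/hex. The sketch below approximates the strict check described above rather than copying the Java SDK's code; decodeETagAsHex is a hypothetical helper that strips the HTTP quotes and requires a 16-byte MD5 digest:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// decodeETagAsHex simulates a client that treats a PutObject ETag as a
// hex-encoded MD5: strip the surrounding quotes, hex-decode, and require
// exactly 16 bytes. Hypothetical helper, not the SDK's implementation.
func decodeETagAsHex(etag string) ([]byte, error) {
	raw := strings.Trim(etag, `"`)
	b, err := hex.DecodeString(raw)
	if err != nil {
		return nil, fmt.Errorf("invalid base16 ETag %q: %w", raw, err)
	}
	if len(b) != 16 {
		return nil, fmt.Errorf("ETag %q decodes to %d bytes, want 16 (MD5)", raw, len(b))
	}
	return b, nil
}

func main() {
	for _, etag := range []string{
		`"d41d8cd98f00b204e9800998ecf8427e"`,   // pure MD5: decodes cleanly
		`"d41d8cd98f00b204e9800998ecf8427e-3"`, // composite: '-' is not a hex digit
	} {
		if _, err := decodeETagAsHex(etag); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("accepted:", etag)
		}
	}
}
```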

Test Coverage

| Test | File size | Purpose |
|---|---|---|
| TestPutObjectETagFormat_SmallFile | 1KB | Verify single-chunk uploads return a pure MD5 |
| TestPutObjectETagFormat_LargeFile | 10MB | Critical: verify auto-chunked uploads return a pure MD5 |
| TestPutObjectETagFormat_ExtraLargeFile | 25MB | Verify multi-chunk auto-chunked uploads return a pure MD5 |
| TestMultipartUploadETagFormat | 15MB | Verify multipart uploads correctly return a composite ETag |
| TestPutObjectETagConsistency | Various | Verify the ETag is consistent across PUT/HEAD/GET |
| TestETagHexValidation | 10MB | Simulate AWS SDK v2 hex validation |
| TestMultipleLargeFileUploads | 5 x 10MB | Stress-test multiple large uploads |
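
Several of these tests need to tell a pure MD5 ETag from a composite one. A sketch of such a check, assuming ETags may arrive quoted (HTTP entity tags are quoted strings); classifyETag is a hypothetical helper, not necessarily what s3_etag_test.go uses:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

var (
	pureMD5Re   = regexp.MustCompile(`^[0-9a-fA-F]{32}$`)
	compositeRe = regexp.MustCompile(`^([0-9a-fA-F]{32})-([0-9]+)$`)
)

// classifyETag reports whether an ETag (quoted or unquoted) is a pure MD5
// ("pure"), a composite "<md5>-<count>" value ("composite", with its part
// count), or neither ("invalid").
func classifyETag(etag string) (kind string, parts int) {
	raw := strings.Trim(etag, `"`)
	if pureMD5Re.MatchString(raw) {
		return "pure", 0
	}
	if m := compositeRe.FindStringSubmatch(raw); m != nil {
		if n, err := strconv.Atoi(m[2]); err == nil && n > 0 {
			return "composite", n
		}
	}
	return "invalid", 0
}

func main() {
	for _, e := range []string{
		`"d41d8cd98f00b204e9800998ecf8427e"`,
		`"d41d8cd98f00b204e9800998ecf8427e-3"`,
		`"not-an-etag"`,
	} {
		kind, n := classifyETag(e)
		fmt.Println(e, kind, n)
	}
}
```

Under the expected behavior, every PutObject test above should classify as "pure" regardless of file size, while TestMultipartUploadETagFormat should classify as "composite" with a count equal to the number of parts uploaded.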

Prerequisites

  1. SeaweedFS running with the S3 API enabled:

    weed server -s3

  2. Go 1.21 or later

  3. AWS SDK v2 for Go (installed via Go modules)

Running Tests

# Run all tests
make test

# Run only large file tests (the critical ones for issue #7768)
make test-large

# Run quick tests (small files only)
make test-quick

# Run with verbose output
make test-verbose

Configuration

By default, the tests connect to http://127.0.0.1:8333. To use a different endpoint, set S3_ENDPOINT:

S3_ENDPOINT=http://localhost:8333 make test

Alternatively, edit defaultConfig in s3_etag_test.go.

SDK Compatibility

These tests use AWS SDK v2 for Go and check the same ETag properties that AWS SDK v2 for Java relies on. The tests include:

  • ETag format validation (pure MD5 vs composite)
  • Hex decoding validation (simulates Base16Codec.decode)
  • Content integrity verification

Validated SDK Versions

| SDK | Version | Status |
|---|---|---|
| AWS SDK v2 for Go | 1.20+ | Tested |
| AWS SDK v2 for Java | 2.20+ | Compatible (issue #7768 fixed) |
| AWS SDK v1 for Go | 1.x | Compatible (less strict validation) |
| AWS SDK v1 for Java | 1.x | Compatible (less strict validation) |