seaweedFS/weed/s3api/s3_constants/header.go
Chris Lu 2f6aa98221 Refactor: Replace removeDuplicateSlashes with NormalizeObjectKey (#7873)
* Replace removeDuplicateSlashes with NormalizeObjectKey

Use s3_constants.NormalizeObjectKey instead of removeDuplicateSlashes in most places
for consistency. NormalizeObjectKey handles both duplicate slash removal and ensures
the path starts with '/', providing more complete normalization.

* Fix double slash issues after NormalizeObjectKey

After using NormalizeObjectKey, object keys have a leading '/'. This commit ensures:
- getVersionedObjectDir strips leading slash before concatenation
- getEntry calls receive names without leading slash
- String concatenation with '/' doesn't create '//' paths

This prevents path construction errors like:
  /buckets/bucket//object  (wrong)
  /buckets/bucket/object   (correct)

* ensure object key leading "/"

* fix compilation

* fix: Strip leading slash from object keys in S3 API responses

After introducing NormalizeObjectKey, all internal object keys have a
leading slash. However, S3 API responses must return keys without
leading slashes to match AWS S3 behavior.

Fixed in three functions:
- addVersion: Strip slash for version list entries
- processRegularFile: Strip slash for regular file entries
- processExplicitDirectory: Strip slash for directory entries

This ensures ListObjectVersions and similar APIs return keys like
'bar' instead of '/bar', matching S3 API specifications.

* fix: Normalize keyMarker for consistent pagination comparison

The S3 API provides keyMarker without a leading slash (e.g., 'object-001'),
but after introducing NormalizeObjectKey, all internal object keys have
leading slashes (e.g., '/object-001').

When shouldSkipObjectForMarker compares the keyMarker with a normalized
object key, the leading '/' (ASCII 47) sorts before 'o' (ASCII 111), so the
comparison is decided by the slash alone and all objects are incorrectly
skipped during pagination. As a result, page 2 and beyond returned 0 results.

Fix: Normalize the keyMarker when creating versionCollector so comparisons
work correctly with normalized object keys.

Fixes pagination tests:
- TestVersioningPaginationOver1000Versions
- TestVersioningPaginationMultipleObjectsManyVersions
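
The ordering problem described above can be reproduced with plain string
comparison. This is only a sketch: skipBeforeMarker is a hypothetical stand-in
for shouldSkipObjectForMarker (whose body is not shown here), and it assumes
that keys sorting before the marker are the ones skipped.

```go
package main

import "fmt"

// skipBeforeMarker is a hypothetical helper, not the SeaweedFS function.
// It assumes a key is skipped when it sorts strictly before the marker.
func skipBeforeMarker(key, marker string) bool {
	return marker != "" && key < marker
}

func main() {
	marker := "object-001" // as sent by the S3 client, no leading slash

	// Mixed forms: '/' (47) sorts before 'o' (111), so a key that belongs
	// on the next page is skipped.
	fmt.Println(skipBeforeMarker("/object-002", marker)) // true

	// Same form on both sides: the key after the marker is kept.
	fmt.Println(skipBeforeMarker("/object-002", "/"+marker)) // false

	// Same form on both sides: a key before the marker is still skipped.
	fmt.Println(skipBeforeMarker("/object-000", "/"+marker)) // true
}
```

Whichever form is chosen, both sides of the comparison have to use it.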

* refactor: Change NormalizeObjectKey to return keys without leading slash

BREAKING STRATEGY CHANGE:
Previously, NormalizeObjectKey added a leading slash to all object keys,
which required stripping it when returning keys to S3 API clients and
caused complexity in marker normalization for pagination.

NEW STRATEGY:
- NormalizeObjectKey now returns keys WITHOUT leading slash (e.g., 'foo/bar' not '/foo/bar')
- This matches the S3 API format directly
- All path concatenations now explicitly add '/' between bucket and object
- No need to strip slashes in responses or normalize markers

Changes:
1. Modified NormalizeObjectKey to strip leading slash instead of adding it
2. Fixed all path concatenations to use:
   - BucketsPath + '/' + bucket + '/' + object
   instead of:
   - BucketsPath + '/' + bucket + object
3. Reverted response key stripping in:
   - addVersion()
   - processRegularFile()
   - processExplicitDirectory()
4. Reverted keyMarker normalization in findVersionsRecursively()
5. Updated matchesPrefixFilter() to work with keys without leading slash
6. Fixed paths in handlers:
   - s3api_object_handlers.go (GetObject, HeadObject, cacheRemoteObjectForStreaming)
   - s3api_object_handlers_postpolicy.go
   - s3api_object_handlers_tagging.go
   - s3api_object_handlers_acl.go
   - s3api_version_id.go (getVersionedObjectDir, getVersionIdFormat)
   - s3api_object_versioning.go (getObjectVersionList, updateLatestVersionAfterDeletion)

All versioning tests pass including pagination stress tests.
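
A minimal sketch of the concatenation pattern from change 2. The names
bucketsPath and objectPath are illustrative only, and the "/buckets" value is
an assumption taken from the example paths earlier in this message.

```go
package main

import "fmt"

// bucketsPath is an assumed value for illustration.
const bucketsPath = "/buckets"

// objectPath joins bucket and key with explicit separators. It assumes the
// key has already been normalized and carries no leading slash.
func objectPath(bucket, key string) string {
	return bucketsPath + "/" + bucket + "/" + key
}

func main() {
	// New pattern: the separator is always added explicitly.
	fmt.Println(objectPath("bucket", "foo/bar")) // /buckets/bucket/foo/bar

	// Old pattern with a key that has no leading slash: bucket and key
	// run together.
	fmt.Println(bucketsPath + "/" + "bucket" + "foo/bar") // /buckets/bucketfoo/bar
}
```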

* adjust format

* Update post policy tests to match new NormalizeObjectKey behavior

- Update TestPostPolicyKeyNormalization to expect keys without leading slashes
- Update TestNormalizeObjectKey to expect keys without leading slashes
- Update TestPostPolicyFilenameSubstitution to expect keys without leading slashes
- Update path construction in tests to use new pattern: BucketsPath + '/' + bucket + '/' + object

* Fix ListObjectVersions prefix filtering

Remove leading slash addition to prefix parameter to allow correct filtering
of .versions directories when listing object versions with a specific prefix.

The prefix parameter should match entry paths relative to bucket root.
Adding a leading slash was breaking the prefix filter for paginated requests.

Fixes pagination issue where second page returned 0 versions instead of
continuing with remaining versions.

* no leading slash

* Fix urlEscapeObject to add leading slash for filer paths

NormalizeObjectKey now returns keys without leading slashes to match S3 API format.
However, urlEscapeObject is used for filer paths which require leading slashes.
Add leading slash back after normalization to ensure filer paths are correct.

Fixes TestS3ApiServer_toFilerPath test failures.

* adjust tests

* normalize

* Fix: Normalize prefixes and markers in LIST operations using NormalizeObjectKey

Ensure consistent key normalization across all S3 operations (GET, PUT, LIST).
Previously, LIST operations were not applying the same normalization rules
(handling backslashes, duplicate slashes, leading slashes) as GET/PUT operations.

Changes:
- Updated normalizePrefixMarker() to call NormalizeObjectKey for both prefix and marker
- This ensures prefixes with leading slashes, backslashes, or duplicate slashes are
  handled consistently with how object keys are normalized
- Fixes Parquet test failures where pads.write_dataset creates implicit directory
  structures that couldn't be discovered by subsequent LIST operations
- Added TestPrefixNormalizationInList and TestListPrefixConsistency tests

All existing LIST tests continue to pass with the normalization improvements.

* Add debugging logging to LIST operations to track prefix normalization

* Fix: Remove leading slash addition from GetPrefix to work with NormalizeObjectKey

The NormalizeObjectKey function removes leading slashes to match S3 API format
(e.g., 'foo/bar' not '/foo/bar'). However, GetPrefix was adding a leading slash
back, which caused LIST operations to fail with incorrect path handling.

Now GetPrefix only normalizes duplicate slashes without adding a leading slash,
which allows NormalizeObjectKey changes to work correctly for S3 LIST operations.

All Parquet integration tests now pass (20/20).

* Fix: Handle object paths without leading slash in checkDirectoryObject

NormalizeObjectKey() removes the leading slash to match S3 API format.
However, checkDirectoryObject() was assuming the object path has a leading
slash when processing directory markers (paths ending with '/').

Now we ensure the object has a leading slash before processing it for
filer operations.

Fixes implicit directory marker test (explicit_dir/) while keeping
Parquet integration tests passing (20/20).

All tests pass:
- Implicit directory tests: 6/6
- Parquet integration tests: 20/20

* Fix: Handle explicit directory markers with trailing slashes

Explicit directory markers created with put_object(Key='dir/', ...) are stored
in the filer with the trailing slash as part of the name. The checkDirectoryObject()
function now checks for both:
1. Explicit directories: lookup with trailing slash preserved (e.g., 'explicit_dir/')
2. Implicit directories: lookup without trailing slash (e.g., 'implicit_dir')

This ensures both types of directory markers are properly recognized.

All tests pass:
- Implicit directory tests: 6/6 (including explicit directory marker test)
- Parquet integration tests: 20/20

* Fix: Preserve trailing slash in NormalizeObjectKey

NormalizeObjectKey now preserves trailing slashes when normalizing object keys.
This is important for explicit directory markers like 'explicit_dir/' which rely
on the trailing slash to be recognized as directory objects.

The normalization process:
1. Notes if trailing slash was present
2. Removes duplicate slashes and converts backslashes
3. Removes leading slash for S3 API format
4. Restores trailing slash if it was in the original

This ensures explicit directory markers created with put_object(Key='dir/', ...)
are properly normalized and can be looked up by their exact name.

All tests pass:
- Implicit directory tests: 6/6
- Parquet integration tests: 20/20

* clean object

* Fix: Don't restore trailing slash if result is empty

When normalizing paths that are only slashes (e.g., '///', '/'), the function
should return an empty string, not a single slash. The fix ensures we only
restore the trailing slash if the result is non-empty.

This fixes the 'just_slashes' test case:
- Input: '///'
- Expected: ''
- Previous: '/'
- Fixed: ''

All tests now pass:
- Unit tests: TestNormalizeObjectKey (13/13)
- Implicit directory tests: 6/6
- Parquet integration tests: 20/20
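
The normalization steps and the empty-result rule described above can be
sketched as follows. normalizeKey is a simplified, standalone restatement for
illustration, not the function in this file; it collapses slashes with a
ReplaceAll loop rather than a rune scan.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeKey is an illustrative restatement of the steps listed above.
func normalizeKey(key string) string {
	// 1. Note whether a trailing slash was present.
	hadTrailingSlash := strings.HasSuffix(key, "/")

	// 2. Convert backslashes and collapse duplicate slashes.
	key = strings.ReplaceAll(key, "\\", "/")
	for strings.Contains(key, "//") {
		key = strings.ReplaceAll(key, "//", "/")
	}

	// 3. Remove the leading slash to match the S3 API format.
	key = strings.TrimPrefix(key, "/")

	// 4. Restore the trailing slash only if something is left.
	if hadTrailingSlash && key != "" && !strings.HasSuffix(key, "/") {
		key += "/"
	}
	return key
}

func main() {
	fmt.Printf("%q\n", normalizeKey("/foo//bar"))     // "foo/bar"
	fmt.Printf("%q\n", normalizeKey(`foo\bar`))       // "foo/bar"
	fmt.Printf("%q\n", normalizeKey("explicit_dir/")) // "explicit_dir/"
	fmt.Printf("%q\n", normalizeKey("///"))           // ""
}
```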

* prefixEndsOnDelimiter

* Update s3api_object_handlers_list.go

* Update s3api_object_handlers_list.go

* handle create directory
2025-12-24 19:07:08 -08:00


/*
* MinIO Cloud Storage, (C) 2019 MinIO, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package s3_constants

import (
	"context"
	"net/http"
	"strings"

	"github.com/gorilla/mux"
)

// S3 XML namespace
const (
	S3Namespace = "http://s3.amazonaws.com/doc/2006-03-01/"
)

// Standard S3 HTTP request constants
const (
	// S3 storage class
	AmzStorageClass = "x-amz-storage-class"

	// S3 user-defined metadata
	AmzUserMetaPrefix = "X-Amz-Meta-"
	AmzUserMetaDirective = "X-Amz-Metadata-Directive"
	AmzUserMetaMtime = "X-Amz-Meta-Mtime"

	// S3 object tagging
	AmzObjectTagging = "X-Amz-Tagging"
	AmzObjectTaggingPrefix = "X-Amz-Tagging-"
	AmzObjectTaggingDirective = "X-Amz-Tagging-Directive"
	AmzTagCount = "x-amz-tagging-count"

	SeaweedFSUploadId = "X-Seaweedfs-Upload-Id"
	SeaweedFSMultipartPartsCount = "X-Seaweedfs-Multipart-Parts-Count"
	SeaweedFSMultipartPartBoundaries = "X-Seaweedfs-Multipart-Part-Boundaries" // JSON: [{part:1,start:0,end:2,etag:"abc"},{part:2,start:2,end:3,etag:"def"}]
	SeaweedFSExpiresS3 = "X-Seaweedfs-Expires-S3"
	AmzMpPartsCount = "x-amz-mp-parts-count"

	// S3 ACL headers
	AmzCannedAcl = "X-Amz-Acl"
	AmzAclFullControl = "X-Amz-Grant-Full-Control"
	AmzAclRead = "X-Amz-Grant-Read"
	AmzAclWrite = "X-Amz-Grant-Write"
	AmzAclReadAcp = "X-Amz-Grant-Read-Acp"
	AmzAclWriteAcp = "X-Amz-Grant-Write-Acp"

	// S3 Object Lock headers
	AmzBucketObjectLockEnabled = "X-Amz-Bucket-Object-Lock-Enabled"
	AmzObjectLockMode = "X-Amz-Object-Lock-Mode"
	AmzObjectLockRetainUntilDate = "X-Amz-Object-Lock-Retain-Until-Date"
	AmzObjectLockLegalHold = "X-Amz-Object-Lock-Legal-Hold"

	// S3 conditional headers
	IfMatch = "If-Match"
	IfNoneMatch = "If-None-Match"
	IfModifiedSince = "If-Modified-Since"
	IfUnmodifiedSince = "If-Unmodified-Since"

	// S3 conditional copy headers
	AmzCopySourceIfMatch = "X-Amz-Copy-Source-If-Match"
	AmzCopySourceIfNoneMatch = "X-Amz-Copy-Source-If-None-Match"
	AmzCopySourceIfModifiedSince = "X-Amz-Copy-Source-If-Modified-Since"
	AmzCopySourceIfUnmodifiedSince = "X-Amz-Copy-Source-If-Unmodified-Since"

	// S3 Server-Side Encryption with Customer-provided Keys (SSE-C)
	AmzServerSideEncryptionCustomerAlgorithm = "X-Amz-Server-Side-Encryption-Customer-Algorithm"
	AmzServerSideEncryptionCustomerKey = "X-Amz-Server-Side-Encryption-Customer-Key"
	AmzServerSideEncryptionCustomerKeyMD5 = "X-Amz-Server-Side-Encryption-Customer-Key-MD5"
	AmzServerSideEncryptionContext = "X-Amz-Server-Side-Encryption-Context"

	// S3 Server-Side Encryption with KMS (SSE-KMS)
	AmzServerSideEncryption = "X-Amz-Server-Side-Encryption"
	AmzServerSideEncryptionAwsKmsKeyId = "X-Amz-Server-Side-Encryption-Aws-Kms-Key-Id"
	AmzServerSideEncryptionBucketKeyEnabled = "X-Amz-Server-Side-Encryption-Bucket-Key-Enabled"

	// S3 SSE-C copy source headers
	AmzCopySourceServerSideEncryptionCustomerAlgorithm = "X-Amz-Copy-Source-Server-Side-Encryption-Customer-Algorithm"
	AmzCopySourceServerSideEncryptionCustomerKey = "X-Amz-Copy-Source-Server-Side-Encryption-Customer-Key"
	AmzCopySourceServerSideEncryptionCustomerKeyMD5 = "X-Amz-Copy-Source-Server-Side-Encryption-Customer-Key-MD5"
)

// Metadata keys for internal storage
const (
	// SSE-KMS metadata keys
	AmzEncryptedDataKey = "x-amz-encrypted-data-key"
	AmzEncryptionContextMeta = "x-amz-encryption-context"

	// SeaweedFS internal metadata prefix (used to filter internal headers from client responses)
	SeaweedFSInternalPrefix = "x-seaweedfs-"

	// SeaweedFS internal metadata keys for encryption (prefixed to avoid automatic HTTP header conversion)
	SeaweedFSSSEKMSKey = "x-seaweedfs-sse-kms-key" // Key for storing serialized SSE-KMS metadata
	SeaweedFSSSES3Key = "x-seaweedfs-sse-s3-key" // Key for storing serialized SSE-S3 metadata
	SeaweedFSSSEIV = "x-seaweedfs-sse-c-iv" // Key for storing SSE-C IV

	// Multipart upload metadata keys for SSE-KMS (consistent with internal metadata key pattern)
	SeaweedFSSSEKMSKeyID = "x-seaweedfs-sse-kms-key-id" // Key ID for multipart upload SSE-KMS inheritance
	SeaweedFSSSEKMSEncryption = "x-seaweedfs-sse-kms-encryption" // Encryption type for multipart upload SSE-KMS inheritance
	SeaweedFSSSEKMSBucketKeyEnabled = "x-seaweedfs-sse-kms-bucket-key-enabled" // Bucket key setting for multipart upload SSE-KMS inheritance
	SeaweedFSSSEKMSEncryptionContext = "x-seaweedfs-sse-kms-encryption-context" // Encryption context for multipart upload SSE-KMS inheritance
	SeaweedFSSSEKMSBaseIV = "x-seaweedfs-sse-kms-base-iv" // Base IV for multipart upload SSE-KMS (for IV offset calculation)

	// Multipart upload metadata keys for SSE-S3
	SeaweedFSSSES3Encryption = "x-seaweedfs-sse-s3-encryption" // Encryption type for multipart upload SSE-S3 inheritance
	SeaweedFSSSES3BaseIV = "x-seaweedfs-sse-s3-base-iv" // Base IV for multipart upload SSE-S3 (for IV offset calculation)
	SeaweedFSSSES3KeyData = "x-seaweedfs-sse-s3-key-data" // Encrypted key data for multipart upload SSE-S3 inheritance
)

// SeaweedFS internal headers for filer communication
const (
	SeaweedFSSSEKMSKeyHeader = "X-SeaweedFS-SSE-KMS-Key" // Header for passing SSE-KMS metadata to filer
	SeaweedFSSSEIVHeader = "X-SeaweedFS-SSE-IV" // Header for passing SSE-C IV to filer (SSE-C only)
	SeaweedFSSSEKMSBaseIVHeader = "X-SeaweedFS-SSE-KMS-Base-IV" // Header for passing base IV for multipart SSE-KMS
	SeaweedFSSSES3BaseIVHeader = "X-SeaweedFS-SSE-S3-Base-IV" // Header for passing base IV for multipart SSE-S3
	SeaweedFSSSES3KeyDataHeader = "X-SeaweedFS-SSE-S3-Key-Data" // Header for passing key data for multipart SSE-S3
)

// Non-Standard S3 HTTP request constants
const (
	AmzIdentityId = "s3-identity-id"
	AmzAccountId = "s3-account-id"
	AmzAuthType = "s3-auth-type"
)

// GetBucketAndObject extracts the bucket name and the normalized object key
// from the request's mux route variables.
func GetBucketAndObject(r *http.Request) (bucket, object string) {
	vars := mux.Vars(r)
	bucket = vars["bucket"]
	object = NormalizeObjectKey(vars["object"])
	return
}

// NormalizeObjectKey normalizes object keys by removing duplicate slashes and converting backslashes.
// This normalizes keys from various sources (URL path, form values, etc.) to a consistent format.
// It also converts Windows-style backslashes to forward slashes for cross-platform compatibility.
// Returns keys WITHOUT leading slash to match S3 API format (e.g., "foo/bar" not "/foo/bar").
// Preserves trailing slash if present (e.g., "foo/" stays "foo/").
func NormalizeObjectKey(object string) string {
	// Preserve trailing slash if present
	hasTrailingSlash := strings.HasSuffix(object, "/")

	// Convert Windows-style backslashes to forward slashes
	object = strings.ReplaceAll(object, "\\", "/")
	object = removeDuplicateSlashes(object)

	// Remove leading slash to match S3 API format
	object = strings.TrimPrefix(object, "/")

	// Restore trailing slash if it was present and result is not empty
	if hasTrailingSlash && object != "" && !strings.HasSuffix(object, "/") {
		object = object + "/"
	}
	return object
}

// removeDuplicateSlashes collapses each run of consecutive slashes in a path into a single slash
func removeDuplicateSlashes(s string) string {
	var result strings.Builder
	result.Grow(len(s))
	lastWasSlash := false
	for _, r := range s {
		if r == '/' {
			if !lastWasSlash {
				result.WriteRune(r)
			}
			lastWasSlash = true
		} else {
			result.WriteRune(r)
			lastWasSlash = false
		}
	}
	return result.String()
}

// GetPrefix returns the "prefix" query parameter with duplicate slashes collapsed.
// It does not add a leading slash.
func GetPrefix(r *http.Request) string {
	query := r.URL.Query()
	prefix := query.Get("prefix")
	prefix = removeDuplicateSlashes(prefix)
	return prefix
}

var PassThroughHeaders = map[string]string{
	"response-cache-control": "Cache-Control",
	"response-content-disposition": "Content-Disposition",
	"response-content-encoding": "Content-Encoding",
	"response-content-language": "Content-Language",
	"response-content-type": "Content-Type",
	"response-expires": "Expires",
}

// IsSeaweedFSInternalHeader checks if a header key is a SeaweedFS internal header
// that should be filtered from client responses.
// Header names are case-insensitive in HTTP, so this function normalizes to lowercase.
func IsSeaweedFSInternalHeader(headerKey string) bool {
	return strings.HasPrefix(strings.ToLower(headerKey), SeaweedFSInternalPrefix)
}

// Context keys for storing authenticated identity information
type contextKey string

const (
	contextKeyIdentityName contextKey = "s3-identity-name"
	contextKeyIdentityObject contextKey = "s3-identity-object"
)

// SetIdentityNameInContext stores the authenticated identity name in the request context
// This is the secure way to propagate identity - headers can be spoofed, context cannot
func SetIdentityNameInContext(ctx context.Context, identityName string) context.Context {
	if identityName != "" {
		return context.WithValue(ctx, contextKeyIdentityName, identityName)
	}
	return ctx
}

// GetIdentityNameFromContext retrieves the authenticated identity name from the request context
// Returns empty string if no identity is set (unauthenticated request)
// This is the secure way to retrieve identity - never read from headers directly
func GetIdentityNameFromContext(r *http.Request) string {
	if name, ok := r.Context().Value(contextKeyIdentityName).(string); ok {
		return name
	}
	return ""
}

// SetIdentityInContext stores the full authenticated identity object in the request context
// This is used to pass the full identity (including for JWT users) to handlers
func SetIdentityInContext(ctx context.Context, identity interface{}) context.Context {
	if identity != nil {
		return context.WithValue(ctx, contextKeyIdentityObject, identity)
	}
	return ctx
}

// GetIdentityFromContext retrieves the full identity object from the request context
// Returns nil if no identity is set (unauthenticated request)
func GetIdentityFromContext(r *http.Request) interface{} {
	return r.Context().Value(contextKeyIdentityObject)
}