lifecycle worker: scan-time rule evaluation for object expiration (#8809)
* s3api: extend lifecycle XML types with NoncurrentVersionExpiration, AbortIncompleteMultipartUpload

  Add missing S3 lifecycle rule types to the XML data model:
  - NoncurrentVersionExpiration with NoncurrentDays and NewerNoncurrentVersions
  - NoncurrentVersionTransition with NoncurrentDays and StorageClass
  - AbortIncompleteMultipartUpload with DaysAfterInitiation
  - Filter.ObjectSizeGreaterThan and ObjectSizeLessThan
  - And.ObjectSizeGreaterThan and ObjectSizeLessThan
  - Filter.UnmarshalXML to properly parse Tag, And, and size filter elements

  Each new type follows the existing set-field pattern for conditional XML marshaling. No behavior changes: these types are not yet wired into handlers or the lifecycle worker.

* s3lifecycle: add lifecycle rule evaluator package

  New package weed/s3api/s3lifecycle/ provides a pure-function lifecycle rule evaluation engine. The evaluator accepts flattened Rule structs and ObjectInfo metadata, and returns the appropriate Action.

  Components:
  - evaluator.go: Evaluate() for per-object actions with S3 priority ordering (delete marker > noncurrent version > current expiration), ShouldExpireNoncurrentVersion() with NewerNoncurrentVersions support, EvaluateMPUAbort() for multipart upload rules
  - filter.go: prefix, tag, and size-based filter matching
  - tags.go: ExtractTags() extracts S3 tags from filer Extended metadata, HasTagRules() for scan-time optimization
  - version_time.go: GetVersionTimestamp() extracts timestamps from SeaweedFS version IDs (both old and new format)

  Comprehensive test coverage: 54 tests covering all action types, filter combinations, edge cases, and version ID formats.

* s3api: add UnmarshalXML for Expiration, Transition, ExpireDeleteMarker

  Add UnmarshalXML methods that set the internal 'set' flag during XML parsing. Previously these flags were only set programmatically, causing XML round-trip to drop elements. This ensures lifecycle configurations stored as XML survive unmarshal/marshal cycles correctly.

  Add comprehensive XML round-trip tests for all lifecycle rule types including NoncurrentVersionExpiration, AbortIncompleteMultipartUpload, Filter with Tag/And/size constraints, and a complete Terraform-style lifecycle configuration.

* s3lifecycle: address review feedback

  - Fix version_time.go overflow: guard timestampPart > MaxInt64 before the inversion subtraction to prevent uint64 wrap
  - Make all expiry checks inclusive (!now.Before instead of now.After) so actions trigger at the exact scheduled instant
  - Add NoncurrentIndex to ObjectInfo so Evaluate() can properly handle NewerNoncurrentVersions via ShouldExpireNoncurrentVersion()
  - Add test for high-bit overflow version ID

* s3lifecycle: guard ShouldExpireNoncurrentVersion against zero SuccessorModTime

  Add early return when obj.IsLatest or obj.SuccessorModTime.IsZero() to prevent premature expiration of versions with uninitialized successor timestamps (zero value would compute to epoch, always expired).

* lifecycle worker: detect buckets with lifecycle XML, not just filer.conf TTLs

  Update the detection phase to check for stored lifecycle XML in bucket metadata (key: s3-bucket-lifecycle-configuration-xml) in addition to filer.conf TTL entries. A bucket is proposed for lifecycle processing if it has lifecycle XML OR filer.conf TTLs (backward compatible).

  New proposal parameters:
  - has_lifecycle_xml: whether the bucket has stored lifecycle XML
  - versioning_status: the bucket's versioning state (Enabled/Suspended/"")

  These parameters will be used by the execution phase (subsequent PR) to determine which evaluation path to use.

* lifecycle worker: update detection function comment to reflect XML support

* lifecycle worker: add lifecycle XML parsing and rule conversion

  Add rules.go with:
  - parseLifecycleXML() converts stored lifecycle XML to evaluator-friendly s3lifecycle.Rule structs, handling Filter.Prefix, Filter.Tag, Filter.And, size constraints, NoncurrentVersionExpiration, AbortIncompleteMultipartUpload, Expiration.Date, and ExpiredObjectDeleteMarker
  - loadLifecycleRulesFromBucket() reads lifecycle XML from bucket metadata
  - parseExpirationDate() supports RFC3339 and ISO 8601 date-only formats

  Comprehensive tests for all XML variants, filter types, and date formats.

* lifecycle worker: add scan-time rule evaluation for object expiration

  Update executeLifecycleForBucket to try lifecycle XML evaluation first, falling back to TTL-only evaluation when no lifecycle XML exists.

  New listExpiredObjectsByRules() function:
  - Walks the bucket directory tree
  - Builds s3lifecycle.ObjectInfo from each filer entry
  - Calls s3lifecycle.Evaluate() to check lifecycle rules
  - Skips objects already handled by TTL fast path (TtlSec set)
  - Extracts tags only when rules use tag-based filters (optimization)
  - Skips .uploads and .versions directories (handled by other phases)

  Supports Expiration.Days, Expiration.Date, Filter.Prefix, Filter.Tag, Filter.And, and Filter.ObjectSize* in the scan-time evaluation path. Existing TTL-based path remains for backward compatibility.

* lifecycle worker: address review feedback

  - Use sentinel error (errLimitReached) instead of string matching for scan limit detection
  - Fix loadLifecycleRulesFromBucket path: use bucketsPath directly as directory for LookupEntry instead of path.Dir which produced the wrong parent

* lifecycle worker: fix And filter detection for size-only constraints

  The And branch condition only triggered when Prefix or Tags were present, missing the case where And contains only ObjectSizeGreaterThan or ObjectSizeLessThan without a prefix or tags.

* lifecycle worker: address review feedback round 3

  - rules.go: pass through Filter-level size constraints when Tag is present without And (Tag+size combination was dropping sizes)
  - execution.go: add doc comment to listExpiredObjectsByRules noting that it handles non-versioned objects only; versioned objects are handled by processVersionsDirectory
  - rules_test.go: add bounds checks before indexing rules[0]

---------

Co-authored-by: Copilot <copilot@github.com>
177
weed/plugin/worker/lifecycle/rules.go
Normal file
@@ -0,0 +1,177 @@
package lifecycle

import (
	"bytes"
	"context"
	"encoding/xml"
	"fmt"
	"time"

	"github.com/seaweedfs/seaweedfs/weed/glog"
	"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
	"github.com/seaweedfs/seaweedfs/weed/s3api/s3lifecycle"
)

// lifecycleConfig mirrors the XML structure just enough to parse rules.
// We define a minimal local struct to avoid importing the s3api package
// (which would create a circular dependency if s3api ever imports the worker).
type lifecycleConfig struct {
	XMLName xml.Name              `xml:"LifecycleConfiguration"`
	Rules   []lifecycleConfigRule `xml:"Rule"`
}

type lifecycleConfigRule struct {
	ID                             string                      `xml:"ID"`
	Status                         string                      `xml:"Status"`
	Filter                         lifecycleFilter             `xml:"Filter"`
	Prefix                         string                      `xml:"Prefix"`
	Expiration                     lifecycleExpiration         `xml:"Expiration"`
	NoncurrentVersionExpiration    noncurrentVersionExpiration `xml:"NoncurrentVersionExpiration"`
	AbortIncompleteMultipartUpload abortMPU                    `xml:"AbortIncompleteMultipartUpload"`
}

type lifecycleFilter struct {
	Prefix                string       `xml:"Prefix"`
	Tag                   lifecycleTag `xml:"Tag"`
	And                   lifecycleAnd `xml:"And"`
	ObjectSizeGreaterThan int64        `xml:"ObjectSizeGreaterThan"`
	ObjectSizeLessThan    int64        `xml:"ObjectSizeLessThan"`
}

type lifecycleAnd struct {
	Prefix                string         `xml:"Prefix"`
	Tags                  []lifecycleTag `xml:"Tag"`
	ObjectSizeGreaterThan int64          `xml:"ObjectSizeGreaterThan"`
	ObjectSizeLessThan    int64          `xml:"ObjectSizeLessThan"`
}

type lifecycleTag struct {
	Key   string `xml:"Key"`
	Value string `xml:"Value"`
}

type lifecycleExpiration struct {
	Days                      int    `xml:"Days"`
	Date                      string `xml:"Date"`
	ExpiredObjectDeleteMarker bool   `xml:"ExpiredObjectDeleteMarker"`
}

type noncurrentVersionExpiration struct {
	NoncurrentDays          int `xml:"NoncurrentDays"`
	NewerNoncurrentVersions int `xml:"NewerNoncurrentVersions"`
}

type abortMPU struct {
	DaysAfterInitiation int `xml:"DaysAfterInitiation"`
}

// loadLifecycleRulesFromBucket reads the lifecycle XML from a bucket's
// metadata and converts it to evaluator-friendly rules.
func loadLifecycleRulesFromBucket(
	ctx context.Context,
	client filer_pb.SeaweedFilerClient,
	bucketsPath, bucket string,
) ([]s3lifecycle.Rule, error) {
	bucketDir := bucketsPath
	resp, err := filer_pb.LookupEntry(ctx, client, &filer_pb.LookupDirectoryEntryRequest{
		Directory: bucketDir,
		Name:      bucket,
	})
	if err != nil {
		return nil, fmt.Errorf("lookup bucket %s: %w", bucket, err)
	}
	if resp.Entry == nil || resp.Entry.Extended == nil {
		return nil, nil
	}
	xmlData := resp.Entry.Extended[lifecycleXMLKey]
	if len(xmlData) == 0 {
		return nil, nil
	}
	return parseLifecycleXML(xmlData)
}

// parseLifecycleXML parses lifecycle configuration XML and converts it
// to evaluator-friendly rules.
func parseLifecycleXML(data []byte) ([]s3lifecycle.Rule, error) {
	var config lifecycleConfig
	if err := xml.NewDecoder(bytes.NewReader(data)).Decode(&config); err != nil {
		return nil, fmt.Errorf("decode lifecycle XML: %w", err)
	}

	var rules []s3lifecycle.Rule
	for _, r := range config.Rules {
		rule := s3lifecycle.Rule{
			ID:     r.ID,
			Status: r.Status,
		}

		// Resolve prefix: Filter.And.Prefix > Filter.Prefix > Rule.Prefix
		switch {
		case r.Filter.And.Prefix != "" || len(r.Filter.And.Tags) > 0 ||
			r.Filter.And.ObjectSizeGreaterThan > 0 || r.Filter.And.ObjectSizeLessThan > 0:
			rule.Prefix = r.Filter.And.Prefix
			rule.FilterTags = tagsToMap(r.Filter.And.Tags)
			rule.FilterSizeGreaterThan = r.Filter.And.ObjectSizeGreaterThan
			rule.FilterSizeLessThan = r.Filter.And.ObjectSizeLessThan
		case r.Filter.Tag.Key != "":
			rule.Prefix = r.Filter.Prefix
			rule.FilterTags = map[string]string{r.Filter.Tag.Key: r.Filter.Tag.Value}
			rule.FilterSizeGreaterThan = r.Filter.ObjectSizeGreaterThan
			rule.FilterSizeLessThan = r.Filter.ObjectSizeLessThan
		default:
			if r.Filter.Prefix != "" {
				rule.Prefix = r.Filter.Prefix
			} else {
				rule.Prefix = r.Prefix
			}
			rule.FilterSizeGreaterThan = r.Filter.ObjectSizeGreaterThan
			rule.FilterSizeLessThan = r.Filter.ObjectSizeLessThan
		}

		rule.ExpirationDays = r.Expiration.Days
		rule.ExpiredObjectDeleteMarker = r.Expiration.ExpiredObjectDeleteMarker
		rule.NoncurrentVersionExpirationDays = r.NoncurrentVersionExpiration.NoncurrentDays
		rule.NewerNoncurrentVersions = r.NoncurrentVersionExpiration.NewerNoncurrentVersions
		rule.AbortMPUDaysAfterInitiation = r.AbortIncompleteMultipartUpload.DaysAfterInitiation

		// Parse Date if present.
		if r.Expiration.Date != "" {
			// Date may be RFC3339 or ISO 8601 date-only.
			parsed, parseErr := parseExpirationDate(r.Expiration.Date)
			if parseErr != nil {
				glog.V(1).Infof("s3_lifecycle: skipping rule %s: invalid expiration date %q: %v", r.ID, r.Expiration.Date, parseErr)
				continue
			}
			rule.ExpirationDate = parsed
		}

		rules = append(rules, rule)
	}
	return rules, nil
}

// tagsToMap converts a list of XML tags into a key/value map.
// It returns nil for an empty list so callers can treat "no tags" as unset.
func tagsToMap(tags []lifecycleTag) map[string]string {
	if len(tags) == 0 {
		return nil
	}
	m := make(map[string]string, len(tags))
	for _, t := range tags {
		m[t.Key] = t.Value
	}
	return m
}

// parseExpirationDate parses an Expiration.Date value, accepting either a
// full RFC3339 timestamp or an ISO 8601 date-only string (YYYY-MM-DD).
func parseExpirationDate(s string) (time.Time, error) {
	// Try RFC3339 first, then ISO 8601 date-only.
	formats := []string{
		time.RFC3339, // "2006-01-02T15:04:05Z07:00"
		"2006-01-02",
	}
	for _, f := range formats {
		t, err := time.Parse(f, s)
		if err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("unrecognized date format: %s", s)
}