Files
seaweedFS/weed/filer/filer_notify.go
Chris Lu 39ba19eea6 filer: async empty folder cleanup via metadata events (#7614)
* filer: async empty folder cleanup via metadata events

Implements asynchronous empty folder cleanup when files are deleted in S3.

Key changes:

1. EmptyFolderCleaner - New component that handles folder cleanup:
   - Uses consistent hashing (LockRing) to determine folder ownership
   - Each filer owns specific folders, avoiding duplicate cleanup work
   - Debounces delete events (10s delay) to batch multiple deletes
   - Caches rough folder counts to skip unnecessary checks
   - Cancels pending cleanup when new files are created
   - Handles both file and subdirectory deletions

2. Integration with metadata events:
   - Listens to both local and remote filer metadata events
   - Processes create/delete/rename events to track folder state
   - Only processes folders under /buckets/<bucket>/...

3. Removed synchronous empty folder cleanup from S3 handlers:
   - DeleteObjectHandler no longer calls DoDeleteEmptyParentDirectories
   - DeleteMultipleObjectsHandler no longer tracks/cleans directories
   - Cleanup now happens asynchronously via metadata events

Benefits:
- Non-blocking: S3 delete requests return immediately
- Coordinated: Only one filer (the owner) cleans each folder
- Efficient: Batching and caching reduce unnecessary checks
- Event-driven: Folder deletion triggers parent folder check automatically

* filer: add CleanupQueue data structure for deduplicated folder cleanup

CleanupQueue uses a linked list for FIFO ordering and a hashmap for O(1)
deduplication. Processing is triggered when:
- Queue size reaches maxSize (default 1000), OR
- Oldest item exceeds maxAge (default 10 minutes)

Key features:
- O(1) Add, Remove, Pop, Contains operations
- Duplicate folders are ignored (keeps original position/time)
- Testable with injectable time function
- Thread-safe with mutex protection

* filer: use CleanupQueue for empty folder cleanup

Replace timer-per-folder approach with queue-based processing:
- Use CleanupQueue for deduplication and ordered processing
- Process queue when full (1000 items) or oldest item exceeds 10 minutes
- Background processor checks queue every 10 seconds
- Remove from queue on create events to cancel pending cleanup

Benefits:
- Bounded memory: queue has max size, not unlimited timers
- Efficient: O(1) add/remove/contains operations
- Batch processing: handle many folders efficiently
- Better for high-volume delete scenarios

* filer: CleanupQueue.Add moves duplicate to back with updated time

When adding a folder that already exists in the queue:
- Remove it from its current position
- Add it to the back of the queue
- Update the queue time to current time

This ensures that folders with recent delete activity are processed
later, giving more time for additional deletes to occur.

* filer: CleanupQueue uses event time and inserts in sorted order

Changes:
- Add() now takes eventTime parameter instead of using current time
- Insert items in time-sorted order (oldest at front) to handle out-of-order events
- When updating duplicate with newer time, reposition to maintain sort order
- Ignore updates with older time (keep existing later time)

This ensures proper ordering when processing events from distributed filers
where event arrival order may not match event occurrence order.

* filer: remove unused CleanupQueue functions (SetNowFunc, GetAll)

Removed test-only functions:
- SetNowFunc: tests now use real time with past event times
- GetAll: tests now use Pop() to verify order

Kept functions used in production:
- Peek: used in filer_notify_read.go
- OldestAge: used in empty_folder_cleaner.go logging

* filer: initialize cache entry on first delete/create event

Previously, roughCount was only updated if the cache entry already
existed, but entries were only created during executeCleanup. This
meant delete/create events before the first cleanup didn't track
the count.

Now create the cache entry on first event, so roughCount properly
tracks all changes from the start.

* filer: skip adding to cleanup queue if roughCount > 0

If the cached roughCount indicates there are still items in the
folder, don't bother adding it to the cleanup queue. This avoids
unnecessary queue entries and reduces wasted cleanup checks.

* filer: don't create cache entry on create event

Only update roughCount if the folder is already being tracked.
New folders don't need tracking until we see a delete event.

* filer: move empty folder cleanup to its own package

- Created weed/filer/empty_folder_cleanup package
- Defined FilerOperations interface to break circular dependency
- Added CountDirectoryEntries method to Filer
- Exported IsUnderPath and IsUnderBucketPath helper functions

* filer: make isUnderPath and isUnderBucketPath private

These helpers are only used within the empty_folder_cleanup package.
2025-12-03 21:12:19 -08:00

191 lines
5.2 KiB
Go

package filer
import (
"context"
"fmt"
"io"
"regexp"
"strings"
"time"
"github.com/seaweedfs/seaweedfs/weed/util/log_buffer"
"google.golang.org/protobuf/proto"
"github.com/seaweedfs/seaweedfs/weed/glog"
"github.com/seaweedfs/seaweedfs/weed/notification"
"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
"github.com/seaweedfs/seaweedfs/weed/util"
)
func (f *Filer) NotifyUpdateEvent(ctx context.Context, oldEntry, newEntry *Entry, deleteChunks, isFromOtherCluster bool, signatures []int32) {
var fullpath string
if oldEntry != nil {
fullpath = string(oldEntry.FullPath)
} else if newEntry != nil {
fullpath = string(newEntry.FullPath)
} else {
return
}
// println("fullpath:", fullpath)
if strings.HasPrefix(fullpath, SystemLogDir) {
return
}
foundSelf := false
for _, sig := range signatures {
if sig == f.Signature {
foundSelf = true
}
}
if !foundSelf {
signatures = append(signatures, f.Signature)
}
newParentPath := ""
if newEntry != nil {
newParentPath, _ = newEntry.FullPath.DirAndName()
}
eventNotification := &filer_pb.EventNotification{
OldEntry: oldEntry.ToProtoEntry(),
NewEntry: newEntry.ToProtoEntry(),
DeleteChunks: deleteChunks,
NewParentPath: newParentPath,
IsFromOtherCluster: isFromOtherCluster,
Signatures: signatures,
}
if notification.Queue != nil {
glog.V(3).Infof("notifying entry update %v", fullpath)
if err := notification.Queue.SendMessage(fullpath, eventNotification); err != nil {
// throw message
glog.Error(err)
}
}
f.logMetaEvent(ctx, fullpath, eventNotification)
// Trigger empty folder cleanup for local events
// Remote events are handled via MetaAggregator.onMetadataChangeEvent
f.triggerLocalEmptyFolderCleanup(oldEntry, newEntry)
}
func (f *Filer) logMetaEvent(ctx context.Context, fullpath string, eventNotification *filer_pb.EventNotification) {
dir, _ := util.FullPath(fullpath).DirAndName()
event := &filer_pb.SubscribeMetadataResponse{
Directory: dir,
EventNotification: eventNotification,
TsNs: time.Now().UnixNano(),
}
data, err := proto.Marshal(event)
if err != nil {
glog.Errorf("failed to marshal filer_pb.SubscribeMetadataResponse %+v: %v", event, err)
return
}
if err := f.LocalMetaLogBuffer.AddDataToBuffer([]byte(dir), data, event.TsNs); err != nil {
glog.Errorf("failed to add data to log buffer for %s: %v", dir, err)
}
}
// triggerLocalEmptyFolderCleanup triggers empty folder cleanup for local events
// This is needed because onMetadataChangeEvent is only called for remote peer events
func (f *Filer) triggerLocalEmptyFolderCleanup(oldEntry, newEntry *Entry) {
if f.EmptyFolderCleaner == nil || !f.EmptyFolderCleaner.IsEnabled() {
return
}
eventTime := time.Now()
// Handle delete events (oldEntry exists, newEntry is nil)
if oldEntry != nil && newEntry == nil {
dir, name := oldEntry.FullPath.DirAndName()
f.EmptyFolderCleaner.OnDeleteEvent(dir, name, oldEntry.IsDirectory(), eventTime)
}
// Handle create events (oldEntry is nil, newEntry exists)
if oldEntry == nil && newEntry != nil {
dir, name := newEntry.FullPath.DirAndName()
f.EmptyFolderCleaner.OnCreateEvent(dir, name, newEntry.IsDirectory())
}
// Handle rename/move events (both exist but paths differ)
if oldEntry != nil && newEntry != nil {
oldDir, oldName := oldEntry.FullPath.DirAndName()
newDir, newName := newEntry.FullPath.DirAndName()
if oldDir != newDir || oldName != newName {
// Treat old location as delete
f.EmptyFolderCleaner.OnDeleteEvent(oldDir, oldName, oldEntry.IsDirectory(), eventTime)
// Treat new location as create
f.EmptyFolderCleaner.OnCreateEvent(newDir, newName, newEntry.IsDirectory())
}
}
}
func (f *Filer) logFlushFunc(logBuffer *log_buffer.LogBuffer, startTime, stopTime time.Time, buf []byte, minOffset, maxOffset int64) {
if len(buf) == 0 {
return
}
startTime, stopTime = startTime.UTC(), stopTime.UTC()
targetFile := fmt.Sprintf("%s/%04d-%02d-%02d/%02d-%02d.%08x", SystemLogDir,
startTime.Year(), startTime.Month(), startTime.Day(), startTime.Hour(), startTime.Minute(), f.UniqueFilerId,
// startTime.Second(), startTime.Nanosecond(),
)
for {
if err := f.appendToFile(targetFile, buf); err != nil {
glog.V(0).Infof("metadata log write failed %s: %v", targetFile, err)
time.Sleep(737 * time.Millisecond)
} else {
break
}
}
}
var (
VolumeNotFoundPattern = regexp.MustCompile(`volume \d+? not found`)
)
func (f *Filer) ReadPersistedLogBuffer(startPosition log_buffer.MessagePosition, stopTsNs int64, eachLogEntryFn log_buffer.EachLogEntryFuncType) (lastTsNs int64, isDone bool, err error) {
visitor, visitErr := f.collectPersistedLogBuffer(startPosition, stopTsNs)
if visitErr != nil {
if visitErr == io.EOF {
return
}
err = fmt.Errorf("reading from persisted logs: %w", visitErr)
return
}
var logEntry *filer_pb.LogEntry
for {
logEntry, visitErr = visitor.GetNext()
if visitErr != nil {
if visitErr == io.EOF {
break
}
err = fmt.Errorf("read next from persisted logs: %w", visitErr)
return
}
isDone, visitErr = eachLogEntryFn(logEntry)
if visitErr != nil {
err = fmt.Errorf("process persisted log entry: %w", visitErr)
return
}
lastTsNs = logEntry.TsNs
if isDone {
return
}
}
return
}