seaweedFS/weed/filer/empty_folder_cleanup/cleanup_queue_test.go
Chris Lu 39ba19eea6 filer: async empty folder cleanup via metadata events (#7614)
* filer: async empty folder cleanup via metadata events

Implements asynchronous empty folder cleanup when files are deleted in S3.

Key changes:

1. EmptyFolderCleaner - New component that handles folder cleanup:
   - Uses consistent hashing (LockRing) to determine folder ownership
   - Each filer owns specific folders, avoiding duplicate cleanup work
   - Debounces delete events (10s delay) to batch multiple deletes
   - Caches rough folder counts to skip unnecessary checks
   - Cancels pending cleanup when new files are created
   - Handles both file and subdirectory deletions
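The ownership idea above can be illustrated with a minimal sketch. This is a hypothetical hash-modulo scheme for picking one owner per folder; the actual implementation uses the filer LockRing, and `ownerOf` is an invented name:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ownerOf picks which filer is responsible for cleaning a folder by
// hashing the folder path onto a sorted list of filer addresses.
// Every filer computes the same answer, so only one does the work.
func ownerOf(folder string, filers []string) string {
	if len(filers) == 0 {
		return ""
	}
	sorted := append([]string(nil), filers...)
	sort.Strings(sorted) // all filers must agree on the ordering
	h := fnv.New32a()
	h.Write([]byte(folder))
	return sorted[int(h.Sum32())%len(sorted)]
}

func main() {
	filers := []string{"filer-b:8888", "filer-a:8888", "filer-c:8888"}
	fmt.Println("owner:", ownerOf("/buckets/b1/folder1", filers))
}
```

A real consistent-hash ring additionally minimizes ownership churn when filers join or leave, which is why LockRing is used rather than plain modulo.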

2. Integration with metadata events:
   - Listens to both local and remote filer metadata events
   - Processes create/delete/rename events to track folder state
   - Only processes folders under /buckets/<bucket>/...

3. Removed synchronous empty folder cleanup from S3 handlers:
   - DeleteObjectHandler no longer calls DoDeleteEmptyParentDirectories
   - DeleteMultipleObjectsHandler no longer tracks/cleans directories
   - Cleanup now happens asynchronously via metadata events

Benefits:
- Non-blocking: S3 delete requests return immediately
- Coordinated: Only one filer (the owner) cleans each folder
- Efficient: Batching and caching reduce unnecessary checks
- Event-driven: Folder deletion triggers parent folder check automatically

* filer: add CleanupQueue data structure for deduplicated folder cleanup

CleanupQueue uses a linked list for FIFO ordering and a hashmap for O(1)
deduplication. Processing is triggered when:
- Queue size reaches maxSize (default 1000), OR
- Oldest item exceeds maxAge (default 10 minutes)

Key features:
- O(1) Add, Remove, Pop, Contains operations
- Duplicate folders are ignored (keeps original position/time)
- Testable with injectable time function
- Thread-safe with mutex protection

* filer: use CleanupQueue for empty folder cleanup

Replace timer-per-folder approach with queue-based processing:
- Use CleanupQueue for deduplication and ordered processing
- Process queue when full (1000 items) or oldest item exceeds 10 minutes
- Background processor checks queue every 10 seconds
- Remove from queue on create events to cancel pending cleanup

Benefits:
- Bounded memory: queue has max size, not unlimited timers
- Efficient: O(1) add/remove/contains operations
- Batch processing: handle many folders efficiently
- Better for high-volume delete scenarios

* filer: CleanupQueue.Add moves duplicate to back with updated time

When adding a folder that already exists in the queue:
- Remove it from its current position
- Add it to the back of the queue
- Update the queue time to current time

This ensures that folders with recent delete activity are processed
later, giving more time for additional deletes to occur.

* filer: CleanupQueue uses event time and inserts in sorted order

Changes:
- Add() now takes eventTime parameter instead of using current time
- Insert items in time-sorted order (oldest at front) to handle out-of-order events
- When updating duplicate with newer time, reposition to maintain sort order
- Ignore updates with older time (keep existing later time)

This ensures proper ordering when processing events from distributed filers
where event arrival order may not match event occurrence order.

* filer: remove unused CleanupQueue functions (SetNowFunc, GetAll)

Removed test-only functions:
- SetNowFunc: tests now use real time with past event times
- GetAll: tests now use Pop() to verify order

Kept functions used in production:
- Peek: used in filer_notify_read.go
- OldestAge: used in empty_folder_cleaner.go logging

* filer: initialize cache entry on first delete/create event

Previously, roughCount was only updated if the cache entry already
existed, but entries were only created during executeCleanup. This
meant delete/create events before the first cleanup didn't track
the count.

Now create the cache entry on first event, so roughCount properly
tracks all changes from the start.

* filer: skip adding to cleanup queue if roughCount > 0

If the cached roughCount indicates there are still items in the
folder, don't bother adding it to the cleanup queue. This avoids
unnecessary queue entries and reduces wasted cleanup checks.

* filer: don't create cache entry on create event

Only update roughCount if the folder is already being tracked.
New folders don't need tracking until we see a delete event.

* filer: move empty folder cleanup to its own package

- Created weed/filer/empty_folder_cleanup package
- Defined FilerOperations interface to break circular dependency
- Added CountDirectoryEntries method to Filer
- Exported IsUnderPath and IsUnderBucketPath helper functions

* filer: make isUnderPath and isUnderBucketPath private

These helpers are only used within the empty_folder_cleanup package.
2025-12-03 21:12:19 -08:00


package empty_folder_cleanup

import (
	"testing"
	"time"
)

func TestCleanupQueue_Add(t *testing.T) {
	q := NewCleanupQueue(100, 10*time.Minute)
	now := time.Now()

	// Add first item
	if !q.Add("/buckets/b1/folder1", now) {
		t.Error("expected Add to return true for new item")
	}
	if q.Len() != 1 {
		t.Errorf("expected len 1, got %d", q.Len())
	}

	// Add second item with later time
	if !q.Add("/buckets/b1/folder2", now.Add(1*time.Second)) {
		t.Error("expected Add to return true for new item")
	}
	if q.Len() != 2 {
		t.Errorf("expected len 2, got %d", q.Len())
	}

	// Add duplicate with newer time - should update and reposition
	if q.Add("/buckets/b1/folder1", now.Add(2*time.Second)) {
		t.Error("expected Add to return false for existing item")
	}
	if q.Len() != 2 {
		t.Errorf("expected len 2 after duplicate, got %d", q.Len())
	}

	// folder1 should now be at the back (newer time) - verify by popping
	folder1, _ := q.Pop()
	folder2, _ := q.Pop()
	if folder1 != "/buckets/b1/folder2" || folder2 != "/buckets/b1/folder1" {
		t.Errorf("expected folder1 to be moved to back, got %s, %s", folder1, folder2)
	}
}

func TestCleanupQueue_Add_OutOfOrder(t *testing.T) {
	q := NewCleanupQueue(100, 10*time.Minute)
	baseTime := time.Now()

	// Add items out of order
	q.Add("/buckets/b1/folder3", baseTime.Add(3*time.Second))
	q.Add("/buckets/b1/folder1", baseTime.Add(1*time.Second))
	q.Add("/buckets/b1/folder2", baseTime.Add(2*time.Second))

	// Items should be in time order (oldest first) - verify by popping
	expected := []string{"/buckets/b1/folder1", "/buckets/b1/folder2", "/buckets/b1/folder3"}
	for i, exp := range expected {
		folder, ok := q.Pop()
		if !ok || folder != exp {
			t.Errorf("at index %d: expected %s, got %s", i, exp, folder)
		}
	}
}

func TestCleanupQueue_Add_DuplicateWithOlderTime(t *testing.T) {
	q := NewCleanupQueue(100, 10*time.Minute)
	baseTime := time.Now()

	// Add folder at t=5
	q.Add("/buckets/b1/folder1", baseTime.Add(5*time.Second))

	// Try to add same folder with older time - should NOT update
	q.Add("/buckets/b1/folder1", baseTime.Add(2*time.Second))

	// Time should remain at t=5
	_, queueTime, _ := q.Peek()
	if queueTime != baseTime.Add(5*time.Second) {
		t.Errorf("expected time to remain unchanged, got %v", queueTime)
	}
}

func TestCleanupQueue_Remove(t *testing.T) {
	q := NewCleanupQueue(100, 10*time.Minute)
	now := time.Now()
	q.Add("/buckets/b1/folder1", now)
	q.Add("/buckets/b1/folder2", now.Add(1*time.Second))
	q.Add("/buckets/b1/folder3", now.Add(2*time.Second))

	// Remove middle item
	if !q.Remove("/buckets/b1/folder2") {
		t.Error("expected Remove to return true for existing item")
	}
	if q.Len() != 2 {
		t.Errorf("expected len 2, got %d", q.Len())
	}
	if q.Contains("/buckets/b1/folder2") {
		t.Error("removed item should not be in queue")
	}

	// Remove non-existent item
	if q.Remove("/buckets/b1/nonexistent") {
		t.Error("expected Remove to return false for non-existent item")
	}

	// Verify order is preserved by popping
	folder1, _ := q.Pop()
	folder3, _ := q.Pop()
	if folder1 != "/buckets/b1/folder1" || folder3 != "/buckets/b1/folder3" {
		t.Errorf("unexpected order: %s, %s", folder1, folder3)
	}
}

func TestCleanupQueue_Pop(t *testing.T) {
	q := NewCleanupQueue(100, 10*time.Minute)
	now := time.Now()

	// Pop from empty queue
	folder, ok := q.Pop()
	if ok {
		t.Error("expected Pop to return false for empty queue")
	}
	if folder != "" {
		t.Errorf("expected empty folder, got %s", folder)
	}

	// Add items and pop in order
	q.Add("/buckets/b1/folder1", now)
	q.Add("/buckets/b1/folder2", now.Add(1*time.Second))
	q.Add("/buckets/b1/folder3", now.Add(2*time.Second))
	folder, ok = q.Pop()
	if !ok || folder != "/buckets/b1/folder1" {
		t.Errorf("expected folder1, got %s (ok=%v)", folder, ok)
	}
	folder, ok = q.Pop()
	if !ok || folder != "/buckets/b1/folder2" {
		t.Errorf("expected folder2, got %s (ok=%v)", folder, ok)
	}
	folder, ok = q.Pop()
	if !ok || folder != "/buckets/b1/folder3" {
		t.Errorf("expected folder3, got %s (ok=%v)", folder, ok)
	}

	// Queue should be empty now
	if q.Len() != 0 {
		t.Errorf("expected empty queue, got len %d", q.Len())
	}
}

func TestCleanupQueue_Peek(t *testing.T) {
	q := NewCleanupQueue(100, 10*time.Minute)
	now := time.Now()

	// Peek empty queue
	folder, _, ok := q.Peek()
	if ok {
		t.Error("expected Peek to return false for empty queue")
	}

	// Add item and peek
	q.Add("/buckets/b1/folder1", now)
	folder, queueTime, ok := q.Peek()
	if !ok || folder != "/buckets/b1/folder1" {
		t.Errorf("expected folder1, got %s (ok=%v)", folder, ok)
	}
	if queueTime != now {
		t.Errorf("expected queue time %v, got %v", now, queueTime)
	}

	// Peek should not remove item
	if q.Len() != 1 {
		t.Errorf("Peek should not remove item, len=%d", q.Len())
	}
}

func TestCleanupQueue_Contains(t *testing.T) {
	q := NewCleanupQueue(100, 10*time.Minute)
	now := time.Now()
	q.Add("/buckets/b1/folder1", now)
	if !q.Contains("/buckets/b1/folder1") {
		t.Error("expected Contains to return true")
	}
	if q.Contains("/buckets/b1/folder2") {
		t.Error("expected Contains to return false for non-existent")
	}
}

func TestCleanupQueue_ShouldProcess_MaxSize(t *testing.T) {
	q := NewCleanupQueue(3, 10*time.Minute)
	now := time.Now()

	// Empty queue
	if q.ShouldProcess() {
		t.Error("empty queue should not need processing")
	}

	// Add items below max
	q.Add("/buckets/b1/folder1", now)
	q.Add("/buckets/b1/folder2", now.Add(1*time.Second))
	if q.ShouldProcess() {
		t.Error("queue below max should not need processing")
	}

	// Add item to reach max
	q.Add("/buckets/b1/folder3", now.Add(2*time.Second))
	if !q.ShouldProcess() {
		t.Error("queue at max should need processing")
	}
}

func TestCleanupQueue_ShouldProcess_MaxAge(t *testing.T) {
	q := NewCleanupQueue(100, 100*time.Millisecond) // Short max age for testing

	// Add item with old event time
	oldTime := time.Now().Add(-1 * time.Second) // 1 second ago
	q.Add("/buckets/b1/folder1", oldTime)

	// Item is older than maxAge, should need processing
	if !q.ShouldProcess() {
		t.Error("old item should trigger processing")
	}

	// Clear and add fresh item
	q.Clear()
	q.Add("/buckets/b1/folder2", time.Now())

	// Fresh item should not trigger processing
	if q.ShouldProcess() {
		t.Error("fresh item should not trigger processing")
	}
}

func TestCleanupQueue_Clear(t *testing.T) {
	q := NewCleanupQueue(100, 10*time.Minute)
	now := time.Now()
	q.Add("/buckets/b1/folder1", now)
	q.Add("/buckets/b1/folder2", now.Add(1*time.Second))
	q.Add("/buckets/b1/folder3", now.Add(2*time.Second))
	q.Clear()
	if q.Len() != 0 {
		t.Errorf("expected empty queue after Clear, got len %d", q.Len())
	}
	if q.Contains("/buckets/b1/folder1") {
		t.Error("queue should not contain items after Clear")
	}
}

func TestCleanupQueue_OldestAge(t *testing.T) {
	q := NewCleanupQueue(100, 10*time.Minute)

	// Empty queue
	if q.OldestAge() != 0 {
		t.Error("empty queue should have zero oldest age")
	}

	// Add item with time in the past
	oldTime := time.Now().Add(-5 * time.Minute)
	q.Add("/buckets/b1/folder1", oldTime)

	// Age should be approximately 5 minutes
	age := q.OldestAge()
	if age < 4*time.Minute || age > 6*time.Minute {
		t.Errorf("expected ~5m age, got %v", age)
	}
}

func TestCleanupQueue_TimeOrder(t *testing.T) {
	q := NewCleanupQueue(100, 10*time.Minute)
	baseTime := time.Now()

	// Add items in order
	items := []string{
		"/buckets/b1/a",
		"/buckets/b1/b",
		"/buckets/b1/c",
		"/buckets/b1/d",
		"/buckets/b1/e",
	}
	for i, item := range items {
		q.Add(item, baseTime.Add(time.Duration(i)*time.Second))
	}

	// Pop should return in time order
	for i, expected := range items {
		got, ok := q.Pop()
		if !ok {
			t.Errorf("Pop %d: expected item, got empty", i)
		}
		if got != expected {
			t.Errorf("Pop %d: expected %s, got %s", i, expected, got)
		}
	}
}

func TestCleanupQueue_DuplicateWithNewerTime(t *testing.T) {
	q := NewCleanupQueue(100, 10*time.Minute)
	baseTime := time.Now()

	// Add items
	q.Add("/buckets/b1/folder1", baseTime)
	q.Add("/buckets/b1/folder2", baseTime.Add(1*time.Second))
	q.Add("/buckets/b1/folder3", baseTime.Add(2*time.Second))

	// Add duplicate with newer time - should update and reposition
	q.Add("/buckets/b1/folder1", baseTime.Add(3*time.Second))

	// folder1 should now be at the back (newest time) - verify by popping
	expected := []string{"/buckets/b1/folder2", "/buckets/b1/folder3", "/buckets/b1/folder1"}
	for i, exp := range expected {
		folder, ok := q.Pop()
		if !ok || folder != exp {
			t.Errorf("at index %d: expected %s, got %s", i, exp, folder)
		}
	}
}

func TestCleanupQueue_Concurrent(t *testing.T) {
	q := NewCleanupQueue(1000, 10*time.Minute)
	done := make(chan bool)
	now := time.Now()

	// Concurrent adds
	go func() {
		for i := 0; i < 100; i++ {
			q.Add("/buckets/b1/folder"+string(rune('A'+i%26)), now.Add(time.Duration(i)*time.Millisecond))
		}
		done <- true
	}()

	// Concurrent removes
	go func() {
		for i := 0; i < 50; i++ {
			q.Remove("/buckets/b1/folder" + string(rune('A'+i%26)))
		}
		done <- true
	}()

	// Concurrent pops
	go func() {
		for i := 0; i < 30; i++ {
			q.Pop()
		}
		done <- true
	}()

	// Concurrent reads
	go func() {
		for i := 0; i < 100; i++ {
			q.Len()
			q.Contains("/buckets/b1/folderA")
			q.ShouldProcess()
		}
		done <- true
	}()

	// Wait for all goroutines
	for i := 0; i < 4; i++ {
		<-done
	}

	// Just verify no panic occurred and queue is in consistent state
	_ = q.Len()
}