Commit Graph

15 Commits

Author SHA1 Message Date
Chris Lu
39ba19eea6 filer: async empty folder cleanup via metadata events (#7614)
* filer: async empty folder cleanup via metadata events

Implements asynchronous empty folder cleanup when files are deleted in S3.

Key changes:

1. EmptyFolderCleaner - New component that handles folder cleanup:
   - Uses consistent hashing (LockRing) to determine folder ownership
   - Each filer owns specific folders, avoiding duplicate cleanup work
   - Debounces delete events (10s delay) to batch multiple deletes
   - Caches rough folder counts to skip unnecessary checks
   - Cancels pending cleanup when new files are created
   - Handles both file and subdirectory deletions

2. Integration with metadata events:
   - Listens to both local and remote filer metadata events
   - Processes create/delete/rename events to track folder state
   - Only processes folders under /buckets/<bucket>/...

3. Removed synchronous empty folder cleanup from S3 handlers:
   - DeleteObjectHandler no longer calls DoDeleteEmptyParentDirectories
   - DeleteMultipleObjectsHandler no longer tracks/cleans directories
   - Cleanup now happens asynchronously via metadata events

Benefits:
- Non-blocking: S3 delete requests return immediately
- Coordinated: Only one filer (the owner) cleans each folder
- Efficient: Batching and caching reduce unnecessary checks
- Event-driven: Folder deletion triggers parent folder check automatically

* filer: add CleanupQueue data structure for deduplicated folder cleanup

CleanupQueue uses a linked list for FIFO ordering and a hashmap for O(1)
deduplication. Processing is triggered when:
- Queue size reaches maxSize (default 1000), OR
- Oldest item exceeds maxAge (default 10 minutes)

Key features:
- O(1) Add, Remove, Pop, Contains operations
- Duplicate folders are ignored (keeps original position/time)
- Testable with injectable time function
- Thread-safe with mutex protection

* filer: use CleanupQueue for empty folder cleanup

Replace timer-per-folder approach with queue-based processing:
- Use CleanupQueue for deduplication and ordered processing
- Process queue when full (1000 items) or oldest item exceeds 10 minutes
- Background processor checks queue every 10 seconds
- Remove from queue on create events to cancel pending cleanup

Benefits:
- Bounded memory: queue has max size, not unlimited timers
- Efficient: O(1) add/remove/contains operations
- Batch processing: handle many folders efficiently
- Better for high-volume delete scenarios

* filer: CleanupQueue.Add moves duplicate to back with updated time

When adding a folder that already exists in the queue:
- Remove it from its current position
- Add it to the back of the queue
- Update the queue time to current time

This ensures that folders with recent delete activity are processed
later, giving more time for additional deletes to occur.

* filer: CleanupQueue uses event time and inserts in sorted order

Changes:
- Add() now takes eventTime parameter instead of using current time
- Insert items in time-sorted order (oldest at front) to handle out-of-order events
- When updating duplicate with newer time, reposition to maintain sort order
- Ignore updates with older time (keep existing later time)

This ensures proper ordering when processing events from distributed filers
where event arrival order may not match event occurrence order.

* filer: remove unused CleanupQueue functions (SetNowFunc, GetAll)

Removed test-only functions:
- SetNowFunc: tests now use real time with past event times
- GetAll: tests now use Pop() to verify order

Kept functions used in production:
- Peek: used in filer_notify_read.go
- OldestAge: used in empty_folder_cleaner.go logging

* filer: initialize cache entry on first delete/create event

Previously, roughCount was only updated if the cache entry already
existed, but entries were only created during executeCleanup. This
meant delete/create events before the first cleanup didn't track
the count.

Now create the cache entry on first event, so roughCount properly
tracks all changes from the start.

* filer: skip adding to cleanup queue if roughCount > 0

If the cached roughCount indicates there are still items in the
folder, don't bother adding it to the cleanup queue. This avoids
unnecessary queue entries and reduces wasted cleanup checks.

* filer: don't create cache entry on create event

Only update roughCount if the folder is already being tracked.
New folders don't need tracking until we see a delete event.

* filer: move empty folder cleanup to its own package

- Created weed/filer/empty_folder_cleanup package
- Defined FilerOperations interface to break circular dependency
- Added CountDirectoryEntries method to Filer
- Exported IsUnderPath and IsUnderBucketPath helper functions

* filer: make isUnderPath and isUnderBucketPath private

These helpers are only used within the empty_folder_cleanup package.
2025-12-03 21:12:19 -08:00
tam-i13
b669607fcd Add error list each entry func (#7485)
* added error return in type ListEachEntryFunc

* return error if errClose

* fix fmt.Errorf

* fix return errClose

* use %w fmt.Errorf

* added entry in messege error

* add callbackErr in ListDirectoryEntries

* fix error

* add log

* clear err when the scanner stops on io.EOF, so returning err doesn’t surface EOF as a failure.

* more info in error

* add ctx to logs, error handling

* fix return eachEntryFunc

* fix

* fix log

* fix return

* fix foundationdb test s

* fix eachEntryFunc

* fix return resEachEntryFuncErr

* Update weed/filer/filer.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update weed/filer/elastic/v7/elastic_store.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update weed/filer/hbase/hbase_store.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update weed/filer/foundationdb/foundationdb_store.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update weed/filer/ydb/ydb_store.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix

* add scanErr

---------

Co-authored-by: Roman Tamarov <r.tamarov@kryptonite.ru>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
Co-authored-by: chrislu <chris.lu@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-25 19:35:19 -08:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
Chris Lu
0a856241fe avoid int bigger than math.MaxInt32
fix https://github.com/chrislusf/seaweedfs/issues/2363
2021-10-07 21:12:57 -07:00
Chris Lu
899963ac20 remote storage location changed to struct 2021-07-29 02:08:55 -07:00
Chris Lu
55a8f57381 go fmt 2021-05-06 03:37:51 -07:00
Chris Lu
79f2e780c1 ensure name pattern checking is case sensitive 2021-04-24 11:51:23 -07:00
Chris Lu
ddc8643ee0 filer: directory listing adds namePatternExclude
fix https://github.com/chrislusf/seaweedfs/issues/2023
2021-04-24 11:49:03 -07:00
Chris Lu
09f49d1c04 refactoring 2021-01-16 19:52:15 -08:00
Chris Lu
a4063a5437 add stream list directory entries 2021-01-15 23:56:24 -08:00
Chris Lu
16ad74f477 go fmt 2021-01-14 23:11:27 -08:00
Chris Lu
f002e668de change limit to int64 in case of overflow 2021-01-14 23:10:37 -08:00
Chris Lu
2c3c2c27d7 separate prefix from namePattern
fix https://github.com/chrislusf/seaweedfs/issues/1722
2021-01-01 20:23:23 -08:00
Chris Lu
da87f6b265 remove unused code 2020-12-26 15:21:12 -08:00
Chris Lu
0a7c5f85a9 filer: add namePattern to search in current folder 2020-12-26 15:05:31 -08:00