Commit Graph

82 Commits

Author SHA1 Message Date
Konstantin Lebedev
084b377f87 do delete expired entries on s3 list request (#7426)
* do delete expired entries on s3 list request
https://github.com/seaweedfs/seaweedfs/issues/6837

* disable delete expires s3 entry in filer

* pass opt allowDeleteObjectsByTTL to all servers

* delete on get and head

* add lifecycle expiration s3 tests

* fix opt allowDeleteObjectsByTTL for server

* fix test lifecycle expiration

* fix IsExpired

* fix locationPrefix for updateEntriesTTL

* fix s3tests

* resolv  coderabbitai

* GetS3ExpireTime on filer

* go mod

* clear TtlSeconds for volume

* move s3 delete expired entry to filer

* filer delete meta and data

* del unusing func removeExpiredObject

* test s3 put

* test s3 put multipart

* allowDeleteObjectsByTTL by default

* fix pipline tests

* rm dublicate SeaweedFSExpiresS3

* revert expiration tests

* fix updateTTL

* rm log

* resolv comment

* fix delete version object

* fix S3Versioning

* fix delete on FindEntry

* fix delete chunks

* fix sqlite not support concurrent writes/reads

* move deletion out of listing transaction; delete entries and empty folders

* Revert "fix sqlite not support concurrent writes/reads"

This reverts commit 5d5da14e0ed91c613fe5c0ed058f58bb04fba6f0.

* clearer handling on recursive empty directory deletion

* handle listing errors

* strut copying

* reuse code to delete empty folders

* use iterative approach with a queue to avoid recursive WithFilerClient calls

* stop a gRPC stream from the client-side callback is to return a specific error, e.g., io.EOF

* still issue UpdateEntry when the flag must be added

* errors join

* join path

* cleaner

* add context, sort directories by depth (deepest first) to avoid redundant checks

* batched operation, refactoring

* prevent deleting bucket

* constant

* reuse code

* more logging

* refactoring

* s3 TTL time

* Safety check

---------

Co-authored-by: chrislu <chris.lu@gmail.com>
2025-11-05 22:05:54 -08:00
Dmitriy Pavlov
9b6b564235 Filer: Add retry mechanism for failed file deletions (#7402)
* Filer: Add retry mechanism for failed file deletions

Implement a retry queue with exponential backoff for handling transient
deletion failures, particularly when volumes are temporarily read-only.

Key features:
- Automatic retry for retryable errors (read-only volumes, network issues)
- Exponential backoff: 5min → 10min → 20min → ... (max 6 hours)
- Maximum 10 retry attempts per file before giving up
- Separate goroutine processing retry queue every minute
- Enhanced logging with retry/permanent error classification

This addresses the issue where file deletions fail when volumes are
temporarily read-only (tiered volumes, maintenance, etc.) and these
deletions were previously lost.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Update weed/filer/filer_deletion.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Filer: Add retry mechanism for failed file deletions

Implement a retry queue with exponential backoff for handling transient
deletion failures, particularly when volumes are temporarily read-only.

Key features:
- Automatic retry for retryable errors (read-only volumes, network issues)
- Exponential backoff: 5min → 10min → 20min → ... (max 6 hours)
- Maximum 10 retry attempts per file before giving up
- Separate goroutine processing retry queue every minute
- Map-based retry queue for O(1) lookups and deletions
- Enhanced logging with retry/permanent error classification
- Consistent error detail limiting (max 10 total errors logged)
- Graceful shutdown support with quit channel for both processors

This addresses the issue where file deletions fail when volumes are
temporarily read-only (tiered volumes, maintenance, etc.) and these
deletions were previously lost.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Filer: Replace magic numbers with named constants in retry processor

Replace hardcoded values with package-level constants for better
maintainability:
- DeletionRetryPollInterval (1 minute): interval for checking retry queue
- DeletionRetryBatchSize (1000): max items to process per iteration

This improves code readability and makes configuration changes easier.

* Filer: Optimize retry queue with min-heap data structure

Replace map-based retry queue with a min-heap for better scalability
and deterministic ordering.

Performance improvements:
- GetReadyItems: O(N) → O(K log N) where K is items retrieved
- AddOrUpdate: O(1) → O(log N) (acceptable trade-off)
- Early exit when checking ready items (heap top is earliest)
- No full iteration over all items while holding lock

Benefits:
- Deterministic processing order (earliest NextRetryAt first)
- Better scalability for large retry queues (thousands of items)
- Reduced lock contention duration
- Memory efficient (no separate slice reconstruction)

Implementation:
- Min-heap ordered by NextRetryAt using container/heap
- Dual index: heap for ordering + map for O(1) FileId lookups
- heap.Fix() used when updating existing items
- Comprehensive complexity documentation in comments

This addresses the performance bottleneck identified in GetReadyItems
where iterating over the entire map with a write lock could block
other goroutines in high-failure scenarios.

* Filer: Modernize heap interface and improve error handling docs

1. Replace interface{} with any in heap methods
   - Addresses modern Go style (Go 1.18+)
   - Improves code readability

2. Enhance isRetryableError documentation
   - Acknowledge string matching brittleness
   - Add comprehensive TODO for future improvements:
     * Use HTTP status codes (503, 429, etc.)
     * Implement structured error types with errors.Is/As
     * Extract gRPC status codes
     * Add error wrapping for better context
   - Document each error pattern with context
   - Add defensive check for empty error strings

Current implementation remains pragmatic for initial release while
documenting a clear path for future robustness improvements. String
matching is acceptable for now but should be replaced with structured
error checking when refactoring the deletion pipeline.

* Filer: Refactor deletion processors for better readability

Extract large callback functions into dedicated private methods to
improve code organization and maintainability.

Changes:
1. Extract processDeletionBatch method
   - Handles deletion of a batch of file IDs
   - Classifies errors (success, not found, retryable, permanent)
   - Manages retry queue additions
   - Consolidates logging logic

2. Extract processRetryBatch method
   - Handles retry attempts for previously failed deletions
   - Processes retry results and updates queue
   - Symmetric to processDeletionBatch for consistency

Benefits:
- Main loop functions (loopProcessingDeletion, loopProcessingDeletionRetry)
  are now concise and focused on orchestration
- Business logic is separated into testable methods
- Reduced nesting depth improves readability
- Easier to understand control flow at a glance
- Better separation of concerns

The refactored methods follow the single responsibility principle,
making the codebase more maintainable and easier to extend.

* Update weed/filer/filer_deletion.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Filer: Fix critical retry count bug and add comprehensive error patterns

Critical bug fixes from PR review:

1. Fix RetryCount reset bug (CRITICAL)
   - Problem: When items are re-queued via AddOrUpdate, RetryCount
     resets to 1, breaking exponential backoff
   - Solution: Add RequeueForRetry() method that preserves retry state
   - Impact: Ensures proper exponential backoff progression

2. Add overflow protection in backoff calculation
   - Check shift amount > 63 to prevent bit-shift overflow
   - Additional safety: check if delay <= 0 or > MaxRetryDelay
   - Protects against arithmetic overflow in extreme cases

3. Expand retryable error patterns
   - Added: timeout, deadline exceeded, context canceled
   - Added: lookup error/failed (volume discovery issues)
   - Added: connection refused, broken pipe (network errors)
   - Added: too many requests, service unavailable (backpressure)
   - Added: temporarily unavailable, try again (transient errors)
   - Added: i/o timeout (network timeouts)

Benefits:
- Retry mechanism now works correctly across restarts
- More robust against edge cases and overflow
- Better coverage of transient failure scenarios
- Improved resilience in high-failure environments

Addresses feedback from CodeRabbit and Gemini Code Assist in PR #7402.

* Filer: Add persistence docs and comprehensive unit tests

Documentation improvements:

1. Document in-memory queue limitation
   - Acknowledge that retry queue is volatile (lost on restart)
   - Document trade-offs and future persistence options
   - Provide clear path for production hardening
   - Note eventual consistency through main deletion queue

Unit test coverage:

1. TestDeletionRetryQueue_AddAndRetrieve
   - Basic add/retrieve operations
   - Verify items not ready before delay elapsed

2. TestDeletionRetryQueue_ExponentialBackoff
   - Verify exponential backoff progression (5m→10m→20m→40m→80m)
   - Validate delay calculations with timing tolerance

3. TestDeletionRetryQueue_OverflowProtection
   - Test high retry counts (60+) that could cause overflow
   - Verify capping at MaxRetryDelay

4. TestDeletionRetryQueue_MaxAttemptsReached
   - Verify items discarded after MaxRetryAttempts
   - Confirm proper queue cleanup

5. TestIsRetryableError
   - Comprehensive error pattern coverage
   - Test all retryable error types (timeout, connection, lookup, etc.)
   - Verify non-retryable errors correctly identified

6. TestDeletionRetryQueue_HeapOrdering
   - Verify min-heap property maintained
   - Test items processed in NextRetryAt order
   - Validate heap.Init() integration

All tests passing. Addresses PR feedback on testing requirements.

* Filer: Add code quality improvements for deletion retry

Address PR feedback with minor optimizations:
- Add MaxLoggedErrorDetails constant (replaces magic number 10)
- Pre-allocate slices and maps in processRetryBatch for efficiency
- Improve log message formatting to use constant

These changes improve code maintainability and runtime performance
without altering functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* refactoring retrying

* use constant

* assert

* address comment

* refactor

* address comments

* dedup

* process retried deletions

* address comment

* check in-flight items also; dedup code

* refactoring

* refactoring

* simplify

* reset heap

* more efficient

* add DeletionBatchSize as a constant;Permanent > Retryable > Success > Not Found

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: chrislu <chris.lu@gmail.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
2025-10-29 18:31:23 -07:00
Aleksey Kosov
4511c2cc1f Changes logging function (#6919)
* updated logging methods for stores

* updated logging methods for stores

* updated logging methods for filer

* updated logging methods for uploader and http_util

* updated logging methods for weed server

---------

Co-authored-by: akosov <a.kosov@kryptonite.ru>
2025-06-24 08:44:06 -07:00
FQHSLycopene
ee0c14673d Fix TTL Behavior for Directories in Path-Specific Configuration (#6827) 2025-05-29 02:38:12 -07:00
Aleksey Kosov
283d9e0079 Add context with request (#6824) 2025-05-28 11:34:02 -07:00
Aleksey Kosov
43c3e80970 added a check for the nil value when requesting FindEntry. (#6651)
Co-authored-by: akosov <a.kosov@kryptonite.ru>
2025-03-21 00:40:28 -07:00
chrislu
42efade0dc adjust fix
fix https://github.com/seaweedfs/seaweedfs/issues/6497
2025-02-01 14:11:57 -08:00
chrislu
a75271dd43 ensure correct auto bucket creation
fix https://github.com/seaweedfs/seaweedfs/issues/6497
2025-02-01 13:26:12 -08:00
chrislu
c030cb3ce9 bootstrap filer from one peer 2024-06-28 14:57:20 -07:00
vadimartynov
8aae82dd71 Added context for the MasterClient's methods to avoid endless loops (#5628)
* Added context for the MasterClient's methods to avoid endless loops

* Returned WithClient function. Added WithClientCustomGetMaster function

* Hid unused ctx arguments

* Using a common context for the KeepConnectedToMaster and WaitUntilConnected functions

* Changed the context termination check in the tryConnectToMaster function

* Added a child context to the tryConnectToMaster function

* Added a common context for KeepConnectedToMaster and WaitUntilConnected functions in benchmark
2024-06-14 11:40:34 -07:00
chrislu
1877ce5126 rename 2024-01-15 21:31:21 -08:00
chrislu
fa59a5d67e read from disk if not in memory 2024-01-15 00:20:12 -08:00
jerebear12
06343f8976 Set allowed origins in config (#5109)
* Add a way to use a JWT in an HTTP only cookie

If a JWT is not included in the Authorization header or a query string, attempt to get a JWT from an HTTP only cookie.

* Added a way to specify allowed origins header from config

* Removed unecessary log

* Check list of domains from config or command flag

* Handle default wildcard and change name of config value to cors
2023-12-20 16:21:11 -08:00
Konstantin Lebedev
1cac5d983d fix: disallow file name too long when writing a file (#4881)
* fix: disallow file name too long when writing a file

* bool LongerName to MaxFilenameLength

---------

Co-authored-by: Konstantin Lebedev <9497591+kmlebedev@users.noreply.github.co>
2023-10-12 14:29:55 -07:00
Nico D'Cotta
796b7508f3 Implement SRV lookups for filer (#4767) 2023-08-24 07:08:56 -07:00
chrislu
6f588b5b18 fix refactoring mistake
fix https://github.com/seaweedfs/seaweedfs/issues/4639
2023-07-16 11:22:48 -07:00
chrislu
61c42f9991 adjust lock APIs 2023-06-25 20:30:20 -07:00
chrislu
06471dac9d init lock ring 2023-06-25 15:28:16 -07:00
chrislu
868f7875d7 refactor 2023-06-25 14:30:58 -07:00
chrislu
3fd659df2a add distributed lock manager 2023-06-25 00:58:21 -07:00
chrislu
5db9fcccd4 refactoring 2023-03-21 23:01:49 -07:00
Neo
b9b613a78e filter system log dir does not make subscribe event (#4172) 2023-01-31 18:56:11 -08:00
Ryan Russell
d54eb9966f refactor: Directory readability (#3665) 2022-09-14 10:11:31 -07:00
chrislu
10efdc7aab align memory 2022-09-01 09:02:44 -07:00
Konstantin Lebedev
e90ab4ac60 avoid race conditions for OnPeerUpdate (#3525)
https://github.com/seaweedfs/seaweedfs/issues/3524
2022-08-26 10:18:49 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
chrislu
74f60f246f dynamically connect to a filer 2022-07-28 23:24:38 -07:00
chrislu
68065128b8 add dc and rack 2022-07-28 23:22:51 -07:00
chrislu
64f3d6fb6e metadata subscription uses client epoch 2022-07-23 10:50:28 -07:00
chrislu
fa61074513 add clientId logging 2022-07-14 12:27:34 -07:00
chrislu
4fd5f96598 filer: remove replication, collection, disk_type info from entry metadata
these metadata can change and are not used
2022-06-06 00:39:35 -07:00
creeew
02ae102731 fix filer.sync missing source srv uploaded files to target when target down 2022-06-02 01:28:47 +08:00
chrislu
6adc42147f fresh filer store bootstrap from the oldest peer 2022-05-30 21:27:48 -07:00
chrislu
c59068d0f3 refactor 2022-05-30 16:28:36 -07:00
chrislu
682382648e collect cluster node start time 2022-05-30 16:23:52 -07:00
chrislu
94635e9b5c filer: add filer group 2022-05-01 21:59:16 -07:00
chrislu
21e0898631 refactor: change masters from a slice to a map 2022-03-26 13:33:17 -07:00
chrislu
4042fdf3bb rename to skipCheckParentDir
related to https://github.com/chrislusf/seaweedfs/pull/2761

It's better to default to false.
2022-03-16 23:55:31 -07:00
zzq09494
40b0033fa7 go fmt 2022-03-17 14:19:48 +08:00
zzq09494
81cce4b4c3 filer: support uploading file without needEnsureParentDir 2022-03-17 10:53:47 +08:00
zzq09494
a6a8892255 Revert "filer: support uploading file without needEnsureParentDir"
This reverts commit a93c4947ba.
2022-03-17 10:27:17 +08:00
zzq09494
a93c4947ba filer: support uploading file without needEnsureParentDir 2022-03-17 10:18:23 +08:00
chenkai
47c30e3add filer list entries use context to break job 2021-12-28 15:03:41 +08:00
chrislu
9f9ef1340c use streaming mode for long poll grpc calls
streaming mode would create separate grpc connections for each call.
this is to ensure the long poll connections are properly closed.
2021-12-26 00:15:03 -08:00
Chris Lu
4729a57cc0 use constants 2021-11-08 17:47:56 -08:00
Chris Lu
e0fc2898e9 auto updated filer peer list 2021-11-06 14:23:35 -07:00
Chris Lu
5ea86ef1da Revert "master: rename grpc function KeepConnected() to SubscribeVolumeLocationUpdates()"
This reverts commit af71ae11aa.
2021-11-05 17:52:15 -07:00
Chris Lu
af71ae11aa master: rename grpc function KeepConnected() to SubscribeVolumeLocationUpdates() 2021-11-03 01:09:48 -07:00
Chris Lu
2baed2e1e9 avoid possible metadata subscription data loss
Previous implementation append filer logs into one file. So one file is not always sorted, which can lead to miss reading some entries, especially when different filers have different write throughput.
2021-09-25 01:18:44 -07:00
Chris Lu
e5fc35ed0c change server address from string to a type 2021-09-12 22:47:52 -07:00