95 Commits

Author SHA1 Message Date
Chris Lu
cd6832249b Fix volume.fsck crashing on EC volumes and add multi-volume vacuum support (#8406)
* helm: refine openshift-values.yaml to remove hardcoded UIDs

Remove hardcoded runAsUser, runAsGroup, and fsGroup from the
openshift-values.yaml example. This allows OpenShift's admission
controller to automatically assign a valid UID from the namespace's
allocated range, avoiding "forbidden" errors when UID 1000 is
outside the permissible range.

Updates #8381, #8390.

* helm: fix volume.logs and add consistent security context comments

* Update README.md

* fix volume.fsck crashing on EC volumes and add multi-volume vacuum support

* address comments
2026-02-22 22:07:15 -08:00
Chris Lu
a3136c523f Fix volume.fsck 401 Unauthorized by adding JWT to HTTP delete requests (#8306)
* Fix volume.fsck 401 Unauthorized by adding JWT to HTTP delete requests

* Additionally, for performance, consider fetching the jwt.filer_signing.key once before any loops that call httpDelete, rather than inside httpDelete itself, to avoid repeated configuration lookups.
2026-02-11 13:32:56 -08:00
Chris Lu
6a61037333 fix issue #8230: volume.fsck deletion logic to respect purgeAbsent flag (#8266)
* fix issue #8230: volume.fsck deletion logic to respect purgeAbsent flag

This commit fixes two issues in volume.fsck:
1. Missing chunks in existing volumes are now deleted if -reallyDeleteFilerEntries is set.
2. Missing volumes are now properly handled when a -volumeId filter is specified, allowing deletion of filer entries for those volumes.

* address PR feedback for issue #8230

- Ensure volume filter is applied before reporting missing volumes
- Fix potential nil-pointer dereferences in httpDelete method
- Use proper error checking throughout httpDelete

* address second round PR feedback for issue #8230

- Use fmt.Fprintf(c.writer, ...) instead of fmt.Printf
- Add missing newline in "deleting path" log message
2026-02-09 13:23:17 -08:00
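Note on the `fmt.Fprintf(c.writer, ...)` change in the entry above: writing to an injected `io.Writer` rather than to stdout lets the caller decide where command output goes, and makes it easy to capture in tests. The following is a minimal sketch of that idea only. The `reporter` type, its `deleting` method, and the `capture` helper are hypothetical names for illustration and are not taken from the SeaweedFS source.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// reporter is a hypothetical stand-in for a shell command that holds
// the writer its output should go to.
type reporter struct {
	writer io.Writer
}

// deleting logs one path to the injected writer, with a trailing newline.
func (r *reporter) deleting(path string) {
	fmt.Fprintf(r.writer, "deleting path %s\n", path)
}

// capture runs the reporter against an in-memory buffer and returns
// whatever was written, which is not possible with fmt.Printf.
func capture(path string) string {
	var sb strings.Builder
	r := &reporter{writer: &sb}
	r.deleting(path)
	return sb.String()
}

func main() {
	// Same code path, two destinations: the terminal and a buffer.
	(&reporter{writer: os.Stdout}).deleting("/buckets/a/file.txt")
	fmt.Printf("%q\n", capture("/buckets/a/file.txt"))
}
```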
Chris Lu
94e0b902f9 shell: update fs.verify and volume.fsck for new BFS signature
Updated dependent commands to match the refactored
doTraverseBfsAndSaving signature and use context for channel sends.
2026-01-29 14:42:10 -08:00
Jaehoon Kim
f2e7af257d Fix volume.fsck -forcePurging -reallyDeleteFromVolume to fail fast on filer traversal errors (#8015)
* Add TraverseBfsWithContext and fix race conditions in error handling

- Add TraverseBfsWithContext function to support context cancellation
- Fix race condition in doTraverseBfsAndSaving using atomic.Bool and sync.Once
- Improve error handling with fail-fast behavior and proper error propagation
- Update command_volume_fsck to use error-returning saveFn callback
- Enhance error messages in readFilerFileIdFile with detailed context

* refactoring

* fix error format

* atomic

* filer_pb: make enqueue return void

* shell: simplify fs.meta.save error handling

* filer_pb: handle enqueue return value

* Revert "atomic"

This reverts commit 712648bc354b186d6654fdb8a46fd4848fdc4e00.

* shell: refine fs.meta.save logic

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-01-14 21:37:50 -08:00
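Note on the fail-fast pattern named in the entry above (`atomic.Bool` plus `sync.Once` with context cancellation): the sketch below shows one common way to combine these primitives so that the first error is recorded exactly once and the remaining work is skipped. It is an illustration under assumptions, not the code from `doTraverseBfsAndSaving`. `processAll`, `demo`, and `errBoom` are hypothetical names, and the worker-pool shape is assumed.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

var errBoom = errors.New("traversal failed")

// processAll runs fn over items with a fixed number of workers and
// stops handing out work after the first error.
func processAll(ctx context.Context, items []int, workers int, fn func(int) error) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	var (
		failed   atomic.Bool // cheap check on the hot path
		once     sync.Once   // guards the single write to firstErr
		firstErr error
		wg       sync.WaitGroup
	)
	jobs := make(chan int)

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for item := range jobs {
				if failed.Load() {
					continue // drain the channel without doing work
				}
				if err := fn(item); err != nil {
					failed.Store(true)
					once.Do(func() {
						firstErr = err
						cancel() // tell the producer to stop
					})
				}
			}
		}()
	}

	// Producer: a context-aware send, so it never blocks after a failure.
	func() {
		defer close(jobs)
		for _, item := range items {
			select {
			case jobs <- item:
			case <-ctx.Done():
				return
			}
		}
	}()

	wg.Wait() // all writes to firstErr happen before this returns
	if firstErr != nil {
		return firstErr
	}
	return ctx.Err()
}

// demo feeds 100 items to 4 workers; item 10 fails.
func demo() (int64, error) {
	items := make([]int, 100)
	for i := range items {
		items[i] = i
	}
	var processed atomic.Int64
	err := processAll(context.Background(), items, 4, func(i int) error {
		if i == 10 {
			return errBoom
		}
		processed.Add(1)
		return nil
	})
	return processed.Load(), err
}

func main() {
	n, err := demo()
	fmt.Println("processed:", n, "err:", err)
}
```

The number of items processed before the stop varies between runs, since workers that already picked up an item finish it.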
Chris Lu
93cca3a96b volume.fsck: increase default cutoffTimeAgo from 5 minutes to 5 hours (#7730)
* volume.fsck: increase default cutoffTimeAgo from 5 minutes to 5 hours

This change makes the fsck check more conservative by only considering
chunks older than 5 hours as potential orphans. A 5 minute window was
too aggressive and could incorrectly flag recently written chunks,
especially in busy systems or during backup operations.

Addresses #7649

* Update command_volume_fsck.go

* volume.fsck: add help text explaining cutoffTimeAgo parameter

* Update command_volume_fsck.go
2025-12-12 23:42:27 -08:00
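Note on the cutoff described in the entry above: only chunks older than the cutoff window are considered as potential orphans, so a larger window excludes recently written data. The sketch below illustrates that comparison on nanosecond timestamps. `olderThanCutoff` and its parameters are hypothetical and the real check in `command_volume_fsck.go` may differ.

```go
package main

import (
	"fmt"
	"time"
)

// olderThanCutoff reports whether a chunk appended at appendAtNs is older
// than cutoffTimeAgo relative to nowNs. Newer chunks are skipped because
// the filer may not have recorded them yet.
func olderThanCutoff(appendAtNs, nowNs int64, cutoffTimeAgo time.Duration) bool {
	return appendAtNs < nowNs-cutoffTimeAgo.Nanoseconds()
}

func main() {
	now := time.Date(2025, 12, 12, 12, 0, 0, 0, time.UTC)
	recent := now.Add(-10 * time.Minute).UnixNano()
	old := now.Add(-6 * time.Hour).UnixNano()

	// With a 5 minute cutoff, a 10-minute-old chunk is already a candidate.
	fmt.Println(olderThanCutoff(recent, now.UnixNano(), 5*time.Minute)) // true
	// With a 5 hour cutoff, the same chunk is left alone.
	fmt.Println(olderThanCutoff(recent, now.UnixNano(), 5*time.Hour)) // false
	// A 6-hour-old chunk is a candidate under either setting.
	fmt.Println(olderThanCutoff(old, now.UnixNano(), 5*time.Hour)) // true
}
```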
Chris Lu
6a8c53bc44 Filer: batch deletion operations to return individual error results (#7382)
* batch deletion operations to return individual error results

Modify batch deletion operations to return individual error results instead of one aggregated error, enabling better tracking of which specific files failed to delete (helping reduce orphan file issues).

* Simplified logging logic

* Optimized nested loop

* handles the edge case where the RPC succeeds but connection cleanup fails

* simplify

* simplify

* ignore 'not found' errors here
2025-10-25 00:09:18 -07:00
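Note on the per-item results described in the entry above: returning one result per file id, instead of one aggregated error, lets the caller see exactly which deletions failed. The sketch below shows the general shape under assumptions. `DeleteResult`, `deleteBatch`, `fakeDelete`, and `errNotFound` are hypothetical names, not the filer's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// DeleteResult pairs a file id with the outcome of deleting it.
type DeleteResult struct {
	FileId string
	Err    error
}

var errNotFound = errors.New("not found")

// deleteBatch calls deleteOne for every id and returns one result per id.
// A "not found" error is treated as success, since the data is already gone.
func deleteBatch(fileIds []string, deleteOne func(string) error) []DeleteResult {
	results := make([]DeleteResult, 0, len(fileIds))
	for _, id := range fileIds {
		err := deleteOne(id)
		if errors.Is(err, errNotFound) {
			err = nil
		}
		results = append(results, DeleteResult{FileId: id, Err: err})
	}
	return results
}

// fakeDelete simulates a backend: one success, one missing file,
// and a failure for everything else.
func fakeDelete(id string) error {
	switch id {
	case "3,01":
		return nil
	case "3,02":
		return errNotFound
	default:
		return errors.New("volume server unreachable")
	}
}

func main() {
	for _, r := range deleteBatch([]string{"3,01", "3,02", "4,07"}, fakeDelete) {
		if r.Err != nil {
			fmt.Printf("%s: FAILED: %v\n", r.FileId, r.Err)
			continue
		}
		fmt.Printf("%s: ok\n", r.FileId)
	}
}
```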
Yavor Konstantinov
832df5265f Fix 'NaN%' issue when running volume.fsck (#7368)
* Fix 'NaN%' issue when running volume.fsck

- Running `volume.fsck` on an empty cluster will display 'NaN%'.

* Refactor

- Extract count of orphan chunks in summary to new var.
- Restore handling for 'NaN' for individual volumes. It's not necessary
  because the check is already done.

* Make code more idiomatic
2025-10-23 21:44:19 -07:00
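Note on the 'NaN%' symptom in the entry above: in Go, dividing a floating-point zero by zero yields NaN, which is what an empty cluster (zero orphans out of zero chunks) produces when a percentage is computed without a guard. The sketch below shows such a guard. `orphanPercent` is a hypothetical helper, not the function used in the actual fix.

```go
package main

import "fmt"

// orphanPercent returns the share of orphan chunks as a percentage.
// It returns 0 when total is 0, so an empty cluster does not print "NaN%".
func orphanPercent(orphans, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return float64(orphans) * 100 / float64(total)
}

func main() {
	fmt.Printf("%.2f%%\n", orphanPercent(0, 0))   // 0.00%
	fmt.Printf("%.2f%%\n", orphanPercent(5, 200)) // 2.50%
}
```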
Chris Lu
97f3028782 Clean up logs and deprecated functions (#7339)
* less logs

* fix deprecated grpc.Dial
2025-10-17 22:11:50 -07:00
Chris Lu
69553e5ba6 convert error formatting to %w everywhere (#6995) 2025-07-16 23:39:27 -07:00
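Note on `%w` in the entry above: with `fmt.Errorf`, the `%w` verb wraps the original error so that `errors.Is` and `errors.As` can still find it, while `%v` only copies its text. The sketch below contrasts the two. The function names and the sentinel error are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

var errVolumeNotFound = errors.New("volume not found")

// wrapWithW keeps the original error in the chain.
func wrapWithW(volumeId uint32) error {
	return fmt.Errorf("lookup volume %d: %w", volumeId, errVolumeNotFound)
}

// wrapWithV produces the same message but discards the chain.
func wrapWithV(volumeId uint32) error {
	return fmt.Errorf("lookup volume %d: %v", volumeId, errVolumeNotFound)
}

// isNotFound checks the error chain for the sentinel.
func isNotFound(err error) bool {
	return errors.Is(err, errVolumeNotFound)
}

func main() {
	fmt.Println(wrapWithW(7))             // lookup volume 7: volume not found
	fmt.Println(isNotFound(wrapWithW(7))) // true
	fmt.Println(isNotFound(wrapWithV(7))) // false
}
```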
Aleksey Kosov
283d9e0079 Add context with request (#6824) 2025-05-28 11:34:02 -07:00
Lisandro Pin
0d5393641e Unify usage of shell.EcNode.dc as DataCenterId. (#6258) 2024-11-19 06:33:18 -08:00
chrislu
20929f2a57 adjust resource heavy for volume.fix.replication 2024-09-29 11:32:18 -07:00
chrislu
6564ceda91 skip resource heavy commands from running on master nodes 2024-09-29 10:51:17 -07:00
chrislu
ec30a504ba refactor 2024-09-29 10:38:22 -07:00
chrislu
701abbb9df add IsResourceHeavy() to command interface 2024-09-28 20:23:01 -07:00
Max Denushev
d056c0ddf2 fix(volume): don't persist RO state in specific cases (#6058)
* fix(volume): don't persist RO state in specific cases

* fix(volume): writable always persist
2024-09-24 16:15:54 -07:00
chrislu
8378a5b70b rename 2024-08-01 23:54:42 -07:00
wyang
31b89c1062 fsck: only check the appendNs of deleted needle (#5841)
increase fsck speed

Co-authored-by: Yang Wang <yangwang@weride.ai>
2024-07-31 01:12:57 -07:00
vadimartynov
86d92a42b4 Added tls for http clients (#5766)
* Added global http client

* Added Do func for global http client

* Changed the code to use the global http client

* Fix http client in volume uploader

* Fixed pkg name

* Fixed http util funcs

* Fixed http client for bench_filer_upload

* Fixed http client for stress_filer_upload

* Fixed http client for filer_server_handlers_proxy

* Fixed http client for command_fs_merge_volumes

* Fixed http client for command_fs_merge_volumes and command_volume_fsck

* Fixed http client for s3api_server

* Added init global client for main funcs

* Rename global_client to client

* Changed:
- fixed NewHttpClient;
- added CheckIsHttpsClientEnabled func
- updated security.toml in scaffold

* Reduce the visibility of some functions in the util/http/client pkg

* Added the loadSecurityConfig function

* Use util.LoadSecurityConfiguration() in NewHttpClient func
2024-07-16 23:14:09 -07:00
Taehyung Lim
4744889973 fix issue: sometimes volume.fsck report 'volume not found' (#5537)
* fix issue: sometimes volume.fsck report 'volume not found' when a volume server has multiple disk types

* rename variable

* adjust counters

---------

Co-authored-by: chrislu <chris.lu@gmail.com>
2024-06-11 22:22:57 -07:00
NyaMisty
579ebbdf60 Support concurrent volume.fsck & support disabling -cutoffTimeAgo to improve speed (#5636) 2024-06-02 14:25:42 -07:00
Seyed Mahdi Sadegh Shobeiri
97236389e8 Add modifyTimeAgo to volume.fsck (#5133)
* Add modifyTimeAgo to volume.fsck

* Fix AppendAtNs
2023-12-23 12:17:30 -08:00
Seyed Mahdi Sadegh Shobeiri
54ba2c8868 Fix cutoffTimeAgo in findMissingChunksInFiler (#5132) 2023-12-23 09:18:16 -08:00
zemul
0bf56298d5 fix chunk.ModifiedTsNs (#4264)
* fix

* fix mtime s > ns

---------

Co-authored-by: zemul <zhouzemiao@ihuman.com>
2023-03-02 08:24:36 -08:00
Zachary Walters
ef2f741823 Updated the deprecated ioutil dependency (#4239) 2023-02-21 19:47:33 -08:00
chrislu
e037c71ec3 adjust text 2023-02-10 13:04:29 -08:00
chrislu
67b8c2853a add line return 2023-02-10 12:53:43 -08:00
Chris Lu
d4566d4aaa more solid weed mount (#4089)
* compare chunks by timestamp

* fix slab clearing error

* fix test compilation

* move oldest chunk to sealed, instead of by fullness

* lock on fh.entryViewCache

* remove verbose logs

* revert slab clearing

* less logs

* less logs

* track write and read by timestamp

* remove useless logic

* add entry lock on file handle release

* use mem chunk only, swap file chunk has problems

* comment out code that maybe used later

* add debug mode to compare data read and write

* more efficient readResolvedChunks with linked list

* small optimization

* fix test compilation

* minor fix on writer

* add SeparateGarbageChunks

* group chunks into sections

* turn off debug mode

* fix tests

* fix tests

* tmp enable swap file chunk

* Revert "tmp enable swap file chunk"

This reverts commit 985137ec472924e4815f258189f6ca9f2168a0a7.

* simple refactoring

* simple refactoring

* do not re-use swap file chunk. Sealed chunks should not be re-used.

* comment out debugging facilities

* either mem chunk or swap file chunk is fine now

* remove orderedMutex  as *semaphore.Weighted

not found impactful

* optimize size calculation for changing large files

* optimize performance to avoid going through the long list of chunks

* still problems with swap file chunk

* rename

* tiny optimization

* swap file chunk save only successfully read data

* fix

* enable both mem and swap file chunk

* resolve chunks with range

* rename

* fix chunk interval list

* also change file handle chunk group when adding chunks

* pick in-active chunk with time-decayed counter

* fix compilation

* avoid nil with empty fh.entry

* refactoring

* rename

* rename

* refactor visible intervals to *list.List

* refactor chunkViews to *list.List

* add IntervalList for generic interval list

* change visible interval to use IntervalList in generics

* change chunkViews to *IntervalList[*ChunkView]

* use NewFileChunkSection to create

* rename variables

* refactor

* fix renaming leftover

* renaming

* renaming

* add insert interval

* interval list adds lock

* incrementally add chunks to readers

Fixes:
1. set start and stop offset for the value object
2. clone the value object
3. use pointer instead of copy-by-value when passing to interval.Value
4. use insert interval since adding chunk could be out of order

* fix tests compilation

* fix tests compilation
2023-01-02 23:20:45 -08:00
chrislu
70a4c98b00 refactor filer_pb.Entry and filer.Entry to use GetChunks()
for later locking on reading chunks
2022-11-15 06:33:36 -08:00
Konstantin Lebedev
0999f9b7ff [volume.fsck] collect ids without cut off time for finding missing data from volumes (#3934)
collect all file ids from the file without cut off time for finding missing data from volumes
2022-10-31 11:38:12 -07:00
Konstantin Lebedev
a322ba042e [volume.fsck] param volumeId is a comma-separated list of volume ids (#3933)
volume.fsck param volumeId is a comma-separated list of volume ids

Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
2022-10-31 11:36:26 -07:00
Konstantin Lebedev
c0deaa4948 [volume.fsck] check needles status from volume server (#3926)
check needles status from volume server
2022-10-31 11:33:04 -07:00
Konstantin Lebedev
bf8a9d2db1 [volume.check.disk] sync of deletions the fix (#3923)
* sync of deletions the fix

* avoid return if only partiallyDeletedNeedles

* refactor sync deletions
2022-10-30 20:32:46 -07:00
chrislu
ea2637734a refactor filer proto chunk variable from mtime to modified_ts_ns 2022-10-28 12:53:19 -07:00
Eric Yang
51d462f204 ADHOC: volume fsck using append at ns (#3906)
* ADHOC: volume fsck using append at ns

* nit

* nit

Co-authored-by: root <root@HQ-10MSTD3EY.roblox.local>
2022-10-24 22:09:38 -07:00
chrislu
377870f4a9 keep system log data 2022-10-24 16:50:39 -07:00
Konstantin Lebedev
7836f7574e [volume.fsck] hotfix apply purging and add option verifyNeedle #3860 (#3861)
* fix apply purging and add verifyNeedle

* common readSourceNeedleBlob

* use consts
2022-10-15 20:38:46 -07:00
Konstantin Lebedev
f19c9e3d9d Volume fsck by volume (#3851)
* refactor

* refactor args verbose and writer

* refactor readFilerFileIdFile

* fix filter by collectMtime

* skip system log collection
2022-10-13 23:30:30 -07:00
Eric Yang
56c94cc08e ADHOC: filter deleted files from idx file binary search (#3763)
* ADHOC: filter deleted files from idx file binary search

* remove unwanted check

Co-authored-by: root <root@HQ-10MSTD3EY.roblox.local>
2022-09-29 12:48:36 -07:00
chrislu
b6d7556dda skip truncation on error
fix https://github.com/seaweedfs/seaweedfs/issues/3746
2022-09-27 09:48:23 -07:00
Eric Yang
ddd6bee970 ADHOC: Volume fsck use a time cutoff param (#3626)
* ADHOC: cut off volume fsck

* more

* fix typo

* add test

* modify name

* fix comment

* fix comments

* nit

* fix typo

* Update weed/shell/command_volume_fsck.go

Co-authored-by: root <root@HQ-10MSTD3EY.roblox.local>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
2022-09-10 15:29:17 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
chrislu
271b5aed96 shell: volume.fsck add a note for -reallyDeleteFromVolume option 2022-05-15 11:07:04 -07:00
Konstantin Lebedev
d4343ab7da forcePurging desc 2022-04-25 23:11:56 +05:00
Konstantin Lebedev
ae56b2c00f change forcePurging to a pointer 2022-04-25 23:10:01 +05:00
Konstantin Lebedev
6d2fda27d2 delete missing data from volumes in one replica 2022-04-25 22:59:46 +05:00
Konstantin Lebedev
7f1383a41e findExtraChunksInVolumeServers in consideration of replication 2022-04-01 14:45:41 +05:00
Konstantin Lebedev
3817e05dd0 fix collect filer files 2022-04-01 10:17:09 +05:00
Konstantin Lebedev
3cedb21bb7 skip new entities 2022-03-31 21:36:10 +05:00