Commit Graph

126 Commits

Author SHA1 Message Date
Chris Lu
4c88fbfd5e Fix nil pointer crash during concurrent vacuum compaction (#8592)
* check for nil needle map before compaction sync

When CommitCompact runs concurrently, it sets v.nm = nil under
dataFileAccessLock. CompactByIndex does not hold that lock, so
v.nm.Sync() can hit a nil pointer. Add an early nil check to
return an error instead of crashing.

Fixes #8591

* guard copyDataBasedOnIndexFile size check against nil needle map

The post-compaction size validation at line 538 accesses
v.nm.ContentSize() and v.nm.DeletedSize(). If CommitCompact has
concurrently set v.nm to nil, this causes a SIGSEGV. Skip the
validation when v.nm is nil since the actual data copy uses local
needle maps (oldNm/newNm) and is unaffected.

Fixes #8591

* use atomic.Bool for compaction flags to prevent concurrent vacuum races

The isCompacting and isCommitCompacting flags were plain bools
read and written from multiple goroutines without synchronization.
This allowed concurrent vacuums on the same volume to pass the
guard checks and run simultaneously, leading to the nil pointer
crash. Using atomic.Bool with CompareAndSwap ensures only one
compaction or commit can run per volume at a time.

Fixes #8591

* use go-version-file in CI workflows instead of hardcoded versions

Use go-version-file: 'go.mod' so CI automatically picks up the Go
version from go.mod, avoiding future version drift. Reordered
checkout before setup-go in go.yml and e2e.yml so go.mod is
available. Removed the now-unused GO_VERSION env vars.

* capture v.nm locally in CompactByIndex to close TOCTOU race

A bare nil check on v.nm followed by v.nm.Sync() has a race window
where CommitCompact can set v.nm = nil between the two. Snapshot
the pointer into a local variable so the nil check and Sync operate
on the same reference.

* add dynamic timeouts to plugin worker vacuum gRPC calls

All vacuum gRPC calls used context.Background() with no deadline,
so the plugin scheduler's execution timeout could kill a job while
a large volume compact was still in progress. Use volume-size-scaled
timeouts matching the topology vacuum approach: 3 min/GB for compact,
1 min/GB for check, commit, and cleanup.

Fixes #8591

* Revert "add dynamic timeouts to plugin worker vacuum gRPC calls"

This reverts commit 80951934c37416bc4f6c1472a5d3f8d204a637d9.

* unify compaction lifecycle into single atomic flag

Replace separate isCompacting and isCommitCompacting flags with a
single isCompactionInProgress atomic.Bool. This ensures CompactBy*,
CommitCompact, Close, and Destroy are mutually exclusive — only one
can run at a time per volume.

Key changes:
- All entry points use CompareAndSwap(false, true) to claim exclusive
  access. CompactByVolumeData and CompactByIndex now also guard v.nm
  and v.DataBackend with local captures.
- Close() waits for the flag outside dataFileAccessLock to avoid
  deadlocking with CommitCompact (which holds the flag while waiting
  for the lock). It claims the flag before acquiring the lock so no
  new compaction can start.
- Destroy() uses CAS instead of a racy Load check, preventing
  concurrent compaction from racing with volume teardown.
- unmountVolumeByCollection no longer deletes from the map;
  DeleteCollectionFromDiskLocation removes entries only after
  successful Destroy, preventing orphaned volumes on failure.

Fixes #8591
2026-03-10 13:31:45 -07:00
Lisandro Pin
11fdb68281 Fix superblock write error checks on volume compaction. (#8352) 2026-02-16 14:44:37 -08:00
Lisandro Pin
0721e3c1e9 Rework volume compaction (a.k.a vacuuming) logic to cleanly support new parameters. (#8337)
We'll leverage on this to support a "ignore broken needles" option, necessary
to properly recover damaged volumes, as described in
https://github.com/seaweedfs/seaweedfs/issues/7442#issuecomment-3897784283 .
2026-02-16 02:15:14 -08:00
Chris Lu
330ba7d9dc Fix disk errors handling in vacuum compaction (#8244)
When a disk reports IO errors during vacuum compaction (e.g., 'read /mnt/d1/weed/oc_xyz.dat: input/output error'), the vacuum task should signal the error to the master so it can:
1. Drop the faulty volume replica
2. Rebuild the replica from healthy copies

Changes:
- Add checkReadWriteError() calls in vacuum read paths (ReadNeedleBlob, ReadData, ScanVolumeFile) to flag EIO errors in volume.lastIoError
- Preserve error wrapping using %w format instead of %v so EIO propagates correctly
- The existing heartbeat logic will detect lastIoError and remove the bad volume

Fixes issue #8237
2026-02-07 21:33:02 -08:00
Konstantin Lebedev
d19eca71eb [master] vaccum fix warn (#7312) 2025-10-28 17:57:51 -07:00
chrislu
2f1b3d68d7 pass volume version when creating a volume 2025-06-19 01:15:25 -07:00
chrislu
e8462ba3ad prevent compacting on the same volume 2024-08-18 12:08:43 -07:00
Konstantin Lebedev
d389c5b27e fix: recreate index include deleted files (#5579)
* fix: recreate index include deleted files
https://github.com/seaweedfs/seaweedfs/issues/5508

* fix: counting the number of files

* fix: log
2024-05-12 11:31:34 -07:00
steve.wei
0bdf121e51 rename VolumeServerVolumeGauge (#5504) 2024-04-17 04:49:50 -07:00
Konstantin Lebedev
7187346cc1 avoid unexpected compact size (#5272)
https://github.com/seaweedfs/seaweedfs/issues/5215
2024-02-24 05:27:35 -08:00
chrislu
6ebe26a765 Revert "Revert "Revert "Add disk type to prometheus metrics" (#4777)""
This reverts commit 567d788928.
2023-10-03 08:28:52 -07:00
chrislu
567d788928 Revert "Revert "Add disk type to prometheus metrics" (#4777)"
This reverts commit 9215ba24be.
2023-10-02 11:49:54 -07:00
Konstantin Lebedev
1f7e52c63e vacuum metrics and force sync dst files (#3832) 2022-10-13 00:51:20 -07:00
Guo Lei
f95c25e113 types packages is imported more than onece (#3838) 2022-10-12 22:59:07 -07:00
Guo Lei
84c401e693 Optimiz leveldb metric (#3830)
* optimiz updating mapmetric for leveldb

* import loading leveldb

* add comments
2022-10-11 21:13:25 -07:00
Ryan Russell
277976bd76 refactor(storage): readability improvements (#3703)
Signed-off-by: Ryan Russell <git@ryanrussell.org>

Signed-off-by: Ryan Russell <git@ryanrussell.org>
2022-09-16 02:43:17 -07:00
Guo Lei
c57c79a0ab optimiz commitig compact (#3388)
* optimiz vacuuming volume

* fix bugx

* rename parameters

* fix conflict

* change copyDataBasedOnIndexFile to an instance method

* close needlemap

* optimiz commiting Vacuum volume for  leveldb index

* fix bugs

* fix leveldb loading bugs

* refactor

* fix leveldb loading bug

* add leveldb recovery

* add test case for levelDB

* modify test case to cover all the new branches

* use one tmpNm instead of two instances

* refactor

* refactor

* move setWatermark to the end

* add test for watermark and updating leveldb

* fix error logic

* refactor, add test

* check nil before close needlemapeer
add test case
fix metric bug

* add tests, fix bugs

* adjust log level
remove wrong test case
refactor

* avoid duplicate  updating metric for leveldb index
2022-08-23 23:53:35 -07:00
Konstantin Lebedev
3c75479e2b Merge branch 'master' into gentle_vacuum
# Conflicts:
#	weed/pb/messaging_pb/messaging.pb.go
#	weed/pb/messaging_pb/messaging_grpc.pb.go
#	weed/pb/s3_pb/s3.pb.go
#	weed/pb/volume_server_pb/volume_server.pb.go
#	weed/server/volume_grpc_vacuum.go
2022-08-01 14:45:22 +05:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
Konstantin Lebedev
2f0dda384d vacuum show LA 2022-07-29 11:59:33 +05:00
guosj
cc7a4b0a6e correct comment 2022-07-25 11:46:41 +08:00
chrislu
06a8b174b5 also remove Sync() for idx file 2022-06-30 13:50:53 -07:00
chrislu
81d6159290 volume: report error if a volume has nil data backend
fix https://github.com/chrislusf/seaweedfs/issues/3105
2022-05-29 16:59:30 -07:00
chrislu
37ab8909b0 use two flags: v.isCompacting and v.isCommitCompacting 2022-04-26 23:28:34 -07:00
Konstantin Lebedev
1e35b4929f shell vacuum volume by collection and volume id 2022-04-18 18:40:58 +05:00
Chris Lu
97a4b66df7 Merge pull request #2704 from guo-sj/fix_bugs_in_return_value
fix return value in storage/volume_vacuum.go:444
2022-02-24 00:49:29 -08:00
guosj
3e7aa1caf5 fix return value in storage/volume_vacuum.go:444 2022-02-24 15:54:36 +08:00
chrislu
c29bc9a367 fix error handling 2022-02-23 15:34:25 -08:00
guosj
d68c27f82d fix another return value bug 2022-02-23 16:21:25 +08:00
guosj
8f9aa0cddd fix bugs in return value 2022-02-23 16:17:48 +08:00
Konstantin Lebedev
ef541972f8 updated needle with fsync 2022-02-08 00:10:53 +05:00
Chris Lu
3be3c17f59 volume vacuum: avoid timeout with streaming progress report
fix https://github.com/chrislusf/seaweedfs/issues/2396
2021-10-24 01:55:34 -07:00
Chris Lu
1aa7e99a89 skip file not found error when deleting 2021-05-15 09:37:39 -07:00
Chris Lu
400de380f4 volume server: support tcp direct put/get/delete 2021-03-05 02:29:38 -08:00
Chris Lu
7635f6b9fa disk file avoid file.Stat() 2021-02-20 20:06:06 -08:00
bingoohuang
7256902fb0 fix typo offset.ToAcutalOffset to offset.ToActualOffset 2021-02-07 12:11:51 +08:00
Chris Lu
f3bb645018 file open error 2020-12-01 23:37:49 -08:00
Chris Lu
dc0bc48257 return file open error 2020-12-01 23:36:49 -08:00
Chris Lu
6d30b21b10 volume: add "-dir.idx" option for separate index storage
fix https://github.com/chrislusf/seaweedfs/issues/1265
2020-11-27 03:17:10 -08:00
Chris Lu
f2723c1bc8 do not idx file format
revert c9ab8d05fa
2020-09-12 12:42:36 -07:00
Chris Lu
c9ab8d05fa fixes for reading deleted fid 2020-09-10 14:42:52 -07:00
Chris Lu
6ccd7f0a4d refactoring 2020-08-18 18:01:37 -07:00
Chris Lu
ee11d98650 refactoring 2020-08-18 17:35:19 -07:00
Chris Lu
6a92f0bc7a refactoring to typed Size
Go is amazing with refactoring!
2020-08-18 17:04:28 -07:00
Chris Lu
6190fd665d printout error 2020-04-27 12:41:31 -07:00
Chris Lu
c8ca234773 refactoring 2020-04-11 14:27:25 -07:00
Chris Lu
df9d538044 rename function 2020-04-11 14:19:44 -07:00
Chris Lu
cbfe31a9a8 idx file sync before compaction 2020-03-20 23:38:46 -07:00
Chris Lu
81797a059a volume: sync volume file right before compaction
fix https://github.com/chrislusf/seaweedfs/issues/1237
2020-03-19 23:54:52 -07:00
Chris Lu
d439d83772 volume: follow compactionBytePerSecond
related to https://github.com/chrislusf/seaweedfs/issues/1108
2020-03-11 10:32:17 -07:00