68 Commits

Author SHA1 Message Date
Chris Lu
995dfc4d5d chore: remove ~50k lines of unreachable dead code (#8913)
* chore: remove unreachable dead code across the codebase

Remove ~50,000 lines of unreachable code identified by static analysis.

Major removals:
- weed/filer/redis_lua: entire unused Redis Lua filer store implementation
- weed/wdclient/net2, resource_pool: unused connection/resource pool packages
- weed/plugin/worker/lifecycle: unused lifecycle plugin worker
- weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy,
  multipart IAM, key rotation, and various SSE helper functions
- weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions
- weed/mq/offset: unused SQL storage and migration code
- weed/worker: unused registry, task, and monitoring functions
- weed/query: unused SQL engine, parquet scanner, and type functions
- weed/shell: unused EC proportional rebalance functions
- weed/storage/erasure_coding/distribution: unused distribution analysis functions
- Individual unreachable functions removed from 150+ files across admin,
  credential, filer, iam, kms, mount, mq, operation, pb, s3api, server,
  shell, storage, topology, and util packages

* fix(s3): reset shared memory store in IAM test to prevent flaky failure

TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because
the MemoryStore credential backend is a singleton registered via init().
Earlier tests that create anonymous identities pollute the shared store,
causing LookupAnonymous() to unexpectedly return true.

Fix by calling Reset() on the memory store before the test runs.

* style: run gofmt on changed files

* fix: restore KMS functions used by integration tests

* fix(plugin): prevent panic on send to closed worker session channel

The Plugin.sendToWorker method could panic with "send on closed channel"
when a worker disconnected while a message was being sent. The race was
between streamSession.close() closing the outgoing channel and sendToWorker
writing to it concurrently.

Add a done channel to streamSession that is closed before the outgoing
channel, and check it in sendToWorker's select to safely detect closed
sessions without panicking.
2026-04-03 16:04:27 -07:00
Chris Lu
e1e5b4a8a6 add admin script worker (#8491)
* admin: add plugin lock coordination

* shell: allow bypassing lock checks

* plugin worker: add admin script handler

* mini: include admin_script in plugin defaults

* admin script UI: drop name and enlarge text

* admin script: add default script

* admin_script: make run interval configurable

* plugin: gate other jobs during admin_script runs

* plugin: use last completed admin_script run

* admin: backfill plugin config defaults

* templ

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* comparable to default version

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* default to run

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* format

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* shell: respect pre-set noLock for fix.replication

* shell: add force no-lock mode for admin scripts

* volume balance worker already exists

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* admin: expose scheduler status JSON

* shell: add sleep command

* shell: restrict sleep syntax

* Revert "shell: respect pre-set noLock for fix.replication"

This reverts commit 2b14e8b82602a740d3a473c085e3b3a14f1ddbb3.

* templ

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* fix import

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* less logs

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* Reduce master client logs on canceled contexts

* Update mini default job type count

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-03 15:10:40 -08:00
Konstantin Lebedev
01b3125815 [shell]: volume balance capacity by min volume density (#8026)
volume balance by min volume density and active volumes
2026-02-19 13:30:59 -08:00
Chris Lu
d745e6e41d Fix masterclient vidmap race condition (#7412)
* fallback to check master

* clean up

* parsing

* refactor

* handle parse error

* return error

* avoid dup lookup

* use batch key

* dedup lookup logic

* address comments

* errors.Join(lookupErrors...)

* add a comment

* Fix: Critical data race in MasterClient vidMap

Fixes a critical data race where resetVidMap() was writing to the vidMap
pointer while other methods were reading it concurrently without synchronization.

Changes:
- Removed embedded *vidMap from MasterClient
- Added vidMapLock (sync.RWMutex) to protect vidMap pointer access
- Created safe accessor methods (GetLocations, GetDataCenter, etc.)
- Updated all direct vidMap accesses to use thread-safe methods
- Updated resetVidMap() to acquire write lock during pointer swap

The vidMap already has internal locking for its operations, but this fix
protects the vidMap pointer itself from concurrent read/write races.

Verified with: go test -race ./weed/wdclient/...

Impact:
- Prevents potential panics from concurrent pointer access
- No performance impact - uses RWMutex for read-heavy workloads
- Maintains backward compatibility through wrapper methods

* fmt

* Fix: Critical data race in MasterClient vidMap

Fixes a critical data race where resetVidMap() was writing to the vidMap
pointer while other methods were reading it concurrently without synchronization.

Changes:
- Removed embedded *vidMap from MasterClient struct
- Added vidMapLock (sync.RWMutex) to protect vidMap pointer access
- Created minimal public accessor methods for external packages:
  * GetLocations, GetLocationsClone, GetVidLocations
  * LookupFileId, LookupVolumeServerUrl
  * GetDataCenter
- Internal code directly locks and accesses vidMap (no extra indirection)
- Updated resetVidMap() to acquire write lock during pointer swap
- Updated shell/commands.go to use GetDataCenter() method

Design Philosophy:
- vidMap already has internal locking for its map operations
- This fix specifically protects the vidMap *pointer* from concurrent access
- Public methods for external callers, direct locking for internal use
- Minimizes wrapper overhead while maintaining thread safety

Verified with: go test -race ./weed/wdclient/... (passes)

Impact:
- Prevents potential panics/crashes from data races
- Minimal performance impact (RWMutex for read-heavy workload)
- Maintains full backward compatibility

* fix more concurrent access

* reduce lock scope

* Optimize vidMap locking for better concurrency

Improved locking strategy based on the understanding that:
- vidMapLock protects the vidMap pointer from concurrent swaps
- vidMap has internal locks that protect its data structures

Changes:
1. Read operations: Grab pointer with RLock, release immediately, then operate
   - Reduces lock hold time
   - Allows resetVidMap to proceed sooner
   - Methods: GetLocations, GetLocationsClone, GetVidLocations,
     LookupVolumeServerUrl, GetDataCenter

2. Write operations: Changed from Lock() to RLock()
   - RLock prevents pointer swap during operation
   - Allows concurrent readers and other writers (serialized by vidMap's lock)
   - Methods: addLocation, deleteLocation, addEcLocation, deleteEcLocation

Benefits:
- Significantly reduced lock contention
- Better concurrent performance under load
- Still prevents all race conditions

Verified with: go test -race ./weed/wdclient/... (passes)

* Further reduce lock contention in LookupVolumeIdsWithFallback

Optimized two loops that were holding RLock for extended periods:

Before:
- Held RLock during entire loop iteration
- Included string parsing and cache lookups
- Could block resetVidMap for significant time with large batches

After:
- Grab vidMap pointer with brief RLock
- Release lock immediately
- Perform all loop operations on local pointer

Impact:
- First loop: Cache check on initial volumeIds
- Second loop: Double-check after singleflight wait

Benefits:
- Minimal lock hold time (just pointer copy)
- resetVidMap no longer blocked by long loops
- Better concurrent performance with large volume ID lists
- Still thread-safe (vidMap methods have internal locks)

Verified with: go test -race ./weed/wdclient/... (passes)

* Add clarifying comments to vidMap helper functions

Added inline documentation to each helper function (addLocation, deleteLocation,
addEcLocation, deleteEcLocation) explaining the two-level locking strategy:

- RLock on vidMapLock prevents resetVidMap from swapping the pointer
- vidMap has internal locks that protect the actual map mutations
- This design provides optimal concurrency

The comments make it clear why RLock (not Lock) is correct and intentional,
preventing future confusion about the locking strategy.

* Improve encapsulation: Add shallowClone() method to vidMap

Added a shallowClone() method to vidMap to improve encapsulation and prevent
MasterClient from directly accessing vidMap's internal fields.

Changes:
1. Added vidMap.shallowClone() in vid_map.go
   - Encapsulates the shallow copy logic within vidMap
   - Makes vidMap responsible for its own state representation
   - Documented that caller is responsible for thread safety

2. Simplified resetVidMap() in masterclient.go
   - Uses tail := mc.vidMap.shallowClone() instead of manual field access
   - Cleaner, more maintainable code
   - Better adherence to encapsulation principles

Benefits:
- Improved code organization and maintainability
- vidMap internals are now properly encapsulated
- Easier to modify vidMap structure in the future
- More self-documenting code

Verified with: go test -race ./weed/wdclient/... (passes)

* Optimize locking: Reduce lock acquisitions and use helper methods

Two optimizations to further reduce lock contention and improve code consistency:

1. LookupFileIdWithFallback: Eliminated redundant lock acquisition
   - Before: Two separate locks to get vidMap and dataCenter
   - After: Single lock gets both values together
   - Benefit: 50% reduction in lock/unlock overhead for this hot path

2. KeepConnected: Use GetDataCenter() helper for consistency
   - Before: Manual lock/unlock to access DataCenter field
   - After: Use existing GetDataCenter() helper method
   - Benefit: Better encapsulation and code consistency

Impact:
- Reduced lock contention in high-traffic lookup path
- More consistent use of accessor methods throughout codebase
- Cleaner, more maintainable code

Verified with: go test -race ./weed/wdclient/... (passes)

* Refactor: Extract common locking patterns into helper methods

Eliminated code duplication by introducing two helper methods that encapsulate
the common locking patterns used throughout MasterClient:

1. getStableVidMap() - For read operations
   - Acquires lock, gets pointer, releases immediately
   - Returns stable snapshot for thread-safe reads
   - Used by: GetLocations, GetLocationsClone, GetVidLocations,
     LookupFileId, LookupVolumeServerUrl, GetDataCenter

2. withCurrentVidMap(f func(vm *vidMap)) - For write operations
   - Holds RLock during callback execution
   - Prevents pointer swap while allowing concurrent operations
   - Used by: addLocation, deleteLocation, addEcLocation, deleteEcLocation

Benefits:
- Reduced code duplication (eliminated 48 lines of repetitive locking code)
- Centralized locking logic makes it easier to understand and maintain
- Self-documenting pattern through named helper methods
- Easier to modify locking strategy in the future (single point of change)
- Improved readability - accessor methods are now one-liners

Code size reduction: ~40% fewer lines for accessor/helper methods

Verified with: go test -race ./weed/wdclient/... (passes)

* consistent

* Fix cache pointer race condition with atomic.Pointer

Use atomic.Pointer for vidMap cache field to prevent data races
during cache trimming in resetVidMap. This addresses the race condition
where concurrent GetLocations calls could read the cache pointer while
resetVidMap is modifying it during cache chain trimming.

Changes:
- Changed cache field from *vidMap to atomic.Pointer[vidMap]
- Updated all cache access to use Load() and Store() atomic operations
- Updated shallowClone, GetLocations, deleteLocation, deleteEcLocation
- Updated resetVidMap to use atomic operations for cache trimming

* Merge: Resolve conflict in deleteEcLocation - keep atomic.Pointer and fix bug

Resolved merge conflict by combining:
1. Atomic pointer access pattern (from HEAD): cache.Load()
2. Correct method call (from fix): deleteEcLocation (not deleteLocation)

Resolution:
- Before (HEAD): cachedMap.deleteLocation() - WRONG, reintroduced bug
- Before (fix): vc.cache.deleteEcLocation() - RIGHT method, old pattern
- After (merged): cachedMap.deleteEcLocation() - RIGHT method, new pattern

This preserves both improvements:
✓ Thread-safe atomic.Pointer access pattern
✓ Correct recursive call to deleteEcLocation

Verified with: go test -race ./weed/wdclient/... (passes)

* Update vid_map.go

* remove shallow clone

* simplify
2025-10-30 23:36:06 -07:00
Chris Lu
37af41fbfe Shell: Added a helper function isHelpRequest() (#7380)
* Added a helper function `isHelpRequest()`

* also handles combined short flags like -lh or -hl

* Created handleHelpRequest() helper function

encapsulates both:
Checking for help flags
Printing the help message

* Limit to reasonable length (2-4 chars total) to avoid matching long options like -verbose
2025-10-24 20:21:35 -07:00
Aleksey Kosov
165af32d6b added context to filer_client method calls (#6808)
Co-authored-by: akosov <a.kosov@kryptonite.ru>
2025-05-22 09:46:49 -07:00
Konstantin Lebedev
254ed8897e [shell] add noLock param for volume.move (#6261) 2024-11-20 08:35:24 -08:00
Konstantin Lebedev
9a5d3e7b31 [shell] add admin noLock for balance (#6209)
add admin noLock for balance
2024-11-05 19:09:22 -08:00
chrislu
9cd263b2ce refactor 2024-09-29 10:35:53 -07:00
chrislu
701abbb9df add IsResourceHeavy() to command interface 2024-09-28 20:23:01 -07:00
Nico D'Cotta
796b7508f3 Implement SRV lookups for filer (#4767) 2023-08-24 07:08:56 -07:00
SmsS4
17e91d2917 Use filerGroup for s3 buckets collection prefix (#4465)
* Use filerGroup for s3 buckets collection prefix

* Fix templates

* Remove flags

* Remove s3CollectionPrefix
2023-05-16 09:39:43 -07:00
chrislu
f1f14e28bc adjust name 2023-03-26 12:17:23 -07:00
chrislu
81fdf3651b grpc connection to filer add sw-client-id header 2023-01-20 01:48:12 -08:00
Konstantin Lebedev
c0deaa4948 [volume.fsck] check needles status from volume server (#3926)
check needles status from volume server
2022-10-31 11:33:04 -07:00
Konstantin Lebedev
764d9cb105 [voluche.chek.disk] needles older than the cutoff time are not missing yet (#3922)
needles older than the cutoff time are not missing yet

https://github.com/seaweedfs/seaweedfs/issues/3919
2022-10-28 12:12:20 -07:00
chrislu
f0b4a7659a fix test 2022-08-23 01:52:29 -07:00
famosss
911475526c fix: TestCommandEcBalanceSmall Unit test fails when CommandEnv is nil (#3497) 2022-08-22 23:54:51 -07:00
chrislu
57e7582c36 refactoring 2022-08-22 14:11:13 -07:00
Konstantin Lebedev
4d08393b7c filer prefer volume server in same data center (#3405)
* initial prefer same data center
https://github.com/seaweedfs/seaweedfs/issues/3404

* GetDataCenter

* prefer same data center for ReplicationSource

* GetDataCenterId

* remove glog
2022-08-04 17:35:00 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
chrislu
68065128b8 add dc and rack 2022-07-28 23:22:51 -07:00
chrislu
94635e9b5c filer: add filer group 2022-05-01 21:59:16 -07:00
chrislu
21e0898631 refactor: change masters from a slice to a map 2022-03-26 13:33:17 -07:00
chrislu
826a7b307e master: remove hard coded filer settings in master.toml
fix https://github.com/chrislusf/seaweedfs/issues/2529
2022-01-12 01:11:25 -08:00
chrislu
9f9ef1340c use streaming mode for long poll grpc calls
streaming mode would create separate grpc connections for each call.
this is to ensure the long poll connections are properly closed.
2021-12-26 00:15:03 -08:00
chrislu
a2d3f89c7b add lock messages 2021-12-10 13:24:38 -08:00
Chris Lu
ad5099e570 refactor 2021-09-19 12:02:23 -07:00
Chris Lu
e5fc35ed0c change server address from string to a type 2021-09-12 22:47:52 -07:00
Chris Lu
0128239c0f handle ipv6 addresses 2021-09-07 16:43:54 -07:00
Chris Lu
286e5dd375 Merge branch 'master' into add_remote_storage 2021-08-05 21:07:04 -07:00
Chris Lu
1e22166939 adjust error message 2021-08-05 21:06:55 -07:00
Chris Lu
d84c311699 refactoring 2021-08-04 12:30:18 -07:00
Chris Lu
990fa69bfe add back AdjustedUrl() related code 2021-01-28 14:36:29 -08:00
Chris Lu
00707ec00f mount: outsideContainerClusterMode proxy through filer
Running mount outside of the cluster would not need to expose all the volume servers to outside of the cluster. The chunk read and write will go through the filer.
2021-01-24 19:01:58 -08:00
Chris Lu
6ca10725b8 Revert "mount: when outside cluster network, use filer as proxy to access volume servers"
This reverts commit 096e088d7b.
2021-01-24 03:15:19 -08:00
Chris Lu
096e088d7b mount: when outside cluster network, use filer as proxy to access volume servers 2021-01-24 01:41:38 -08:00
Konstantin Lebedev
fc7baef5bb fiil serverUrls sorted by data center 2020-11-12 02:13:33 +05:00
Chris Lu
723ae11db4 refactoring in order to adjust volume server url later 2020-10-11 20:15:10 -07:00
Chris Lu
5d3ec22975 refactoring 2020-05-26 00:03:31 -07:00
Chris Lu
ed3cf811f5 refactoring 2020-04-29 13:26:02 -07:00
Chris Lu
73564e6a01 master: add cluster wide lock/unlock operation in weed shell
fix https://github.com/chrislusf/seaweedfs/issues/1286
2020-04-23 13:37:31 -07:00
Chris Lu
30ee4f3291 add exclusive lock library on shell 2020-04-23 02:31:04 -07:00
Chris Lu
076c8bd3bc filer master start up with default ip address instead of just localhost 2020-04-18 15:17:27 -07:00
Chris Lu
8e23dc078b refactoring 2020-04-12 20:48:21 -07:00
Chris Lu
91da7057b1 refactoring 2020-04-05 13:11:43 -07:00
Chris Lu
dd5b582d05 go fmt 2020-03-26 00:09:01 -07:00
Chris Lu
782d776d2a refactoring 2020-03-23 22:54:02 -07:00
Chris Lu
b51fa81f0e fix directory checking 2020-03-23 21:36:39 -07:00
Chris Lu
e666aeece2 simplify parsing filer host and port 2020-03-23 21:26:15 -07:00