Commit Graph

36 Commits

Author SHA1 Message Date
Lars Lehtonen
387b146edd Prune wdclient Functions (#8855)
* chore(weed/wdclient): prune unused functions

* chore(weed/wdclient): prune test-only functions and associated tests

* chore(weed/wdclient): remove dead cursor field

The cursor field and its initialization are no longer used after
the removal of getLocationIndex.

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-03-30 18:53:10 -07:00
Chris Lu
066410dbd0 Fix S3 Gateway Read Failover #8076 (#8087)
* fix s3 read failover #8076

- Implement cache invalidation in vidMapClient
- Add retry logic in shared PrepareStreamContentWithThrottler
- Update S3 Gateway to use FilerClient directly for invalidation support
- Remove obsolete simpleMasterClient struct

* improve observability for chunk re-lookup failures

Added a warning log when volume location re-lookup fails after cache invalidation in PrepareStreamContentWithThrottler.

* address code review feedback

- Prevent infinite retry loops by comparing old/new URLs before retry
- Update fileId2Url map after successful re-lookup for subsequent references
- Add comprehensive test coverage for failover logic
- Add tests for InvalidateCache method

* Fix: prevent data duplication in stream retry and improve VidMap robustness

* Cleanup: remove redundant check in InvalidateCache
2026-01-22 14:07:24 -08:00
Chris Lu
d745e6e41d Fix masterclient vidmap race condition (#7412)
* fallback to check master

* clean up

* parsing

* refactor

* handle parse error

* return error

* avoid dup lookup

* use batch key

* dedup lookup logic

* address comments

* errors.Join(lookupErrors...)

* add a comment

* Fix: Critical data race in MasterClient vidMap

Fixes a critical data race where resetVidMap() was writing to the vidMap
pointer while other methods were reading it concurrently without synchronization.

Changes:
- Removed embedded *vidMap from MasterClient
- Added vidMapLock (sync.RWMutex) to protect vidMap pointer access
- Created safe accessor methods (GetLocations, GetDataCenter, etc.)
- Updated all direct vidMap accesses to use thread-safe methods
- Updated resetVidMap() to acquire write lock during pointer swap

The vidMap already has internal locking for its operations, but this fix
protects the vidMap pointer itself from concurrent read/write races.

Verified with: go test -race ./weed/wdclient/...

Impact:
- Prevents potential panics from concurrent pointer access
- No performance impact - uses RWMutex for read-heavy workloads
- Maintains backward compatibility through wrapper methods

* fmt

* Fix: Critical data race in MasterClient vidMap

Fixes a critical data race where resetVidMap() was writing to the vidMap
pointer while other methods were reading it concurrently without synchronization.

Changes:
- Removed embedded *vidMap from MasterClient struct
- Added vidMapLock (sync.RWMutex) to protect vidMap pointer access
- Created minimal public accessor methods for external packages:
  * GetLocations, GetLocationsClone, GetVidLocations
  * LookupFileId, LookupVolumeServerUrl
  * GetDataCenter
- Internal code directly locks and accesses vidMap (no extra indirection)
- Updated resetVidMap() to acquire write lock during pointer swap
- Updated shell/commands.go to use GetDataCenter() method

Design Philosophy:
- vidMap already has internal locking for its map operations
- This fix specifically protects the vidMap *pointer* from concurrent access
- Public methods for external callers, direct locking for internal use
- Minimizes wrapper overhead while maintaining thread safety

Verified with: go test -race ./weed/wdclient/... (passes)

Impact:
- Prevents potential panics/crashes from data races
- Minimal performance impact (RWMutex for read-heavy workload)
- Maintains full backward compatibility

* fix more concurrent access

* reduce lock scope

* Optimize vidMap locking for better concurrency

Improved locking strategy based on the understanding that:
- vidMapLock protects the vidMap pointer from concurrent swaps
- vidMap has internal locks that protect its data structures

Changes:
1. Read operations: Grab pointer with RLock, release immediately, then operate
   - Reduces lock hold time
   - Allows resetVidMap to proceed sooner
   - Methods: GetLocations, GetLocationsClone, GetVidLocations,
     LookupVolumeServerUrl, GetDataCenter

2. Write operations: Changed from Lock() to RLock()
   - RLock prevents pointer swap during operation
   - Allows concurrent readers and other writers (serialized by vidMap's lock)
   - Methods: addLocation, deleteLocation, addEcLocation, deleteEcLocation

Benefits:
- Significantly reduced lock contention
- Better concurrent performance under load
- Still prevents all race conditions

Verified with: go test -race ./weed/wdclient/... (passes)

* Further reduce lock contention in LookupVolumeIdsWithFallback

Optimized two loops that were holding RLock for extended periods:

Before:
- Held RLock during entire loop iteration
- Included string parsing and cache lookups
- Could block resetVidMap for significant time with large batches

After:
- Grab vidMap pointer with brief RLock
- Release lock immediately
- Perform all loop operations on local pointer

Impact:
- First loop: Cache check on initial volumeIds
- Second loop: Double-check after singleflight wait

Benefits:
- Minimal lock hold time (just pointer copy)
- resetVidMap no longer blocked by long loops
- Better concurrent performance with large volume ID lists
- Still thread-safe (vidMap methods have internal locks)

Verified with: go test -race ./weed/wdclient/... (passes)

* Add clarifying comments to vidMap helper functions

Added inline documentation to each helper function (addLocation, deleteLocation,
addEcLocation, deleteEcLocation) explaining the two-level locking strategy:

- RLock on vidMapLock prevents resetVidMap from swapping the pointer
- vidMap has internal locks that protect the actual map mutations
- This design provides optimal concurrency

The comments make it clear why RLock (not Lock) is correct and intentional,
preventing future confusion about the locking strategy.

* Improve encapsulation: Add shallowClone() method to vidMap

Added a shallowClone() method to vidMap to improve encapsulation and prevent
MasterClient from directly accessing vidMap's internal fields.

Changes:
1. Added vidMap.shallowClone() in vid_map.go
   - Encapsulates the shallow copy logic within vidMap
   - Makes vidMap responsible for its own state representation
   - Documented that caller is responsible for thread safety

2. Simplified resetVidMap() in masterclient.go
   - Uses tail := mc.vidMap.shallowClone() instead of manual field access
   - Cleaner, more maintainable code
   - Better adherence to encapsulation principles

Benefits:
- Improved code organization and maintainability
- vidMap internals are now properly encapsulated
- Easier to modify vidMap structure in the future
- More self-documenting code

Verified with: go test -race ./weed/wdclient/... (passes)

* Optimize locking: Reduce lock acquisitions and use helper methods

Two optimizations to further reduce lock contention and improve code consistency:

1. LookupFileIdWithFallback: Eliminated redundant lock acquisition
   - Before: Two separate locks to get vidMap and dataCenter
   - After: Single lock gets both values together
   - Benefit: 50% reduction in lock/unlock overhead for this hot path

2. KeepConnected: Use GetDataCenter() helper for consistency
   - Before: Manual lock/unlock to access DataCenter field
   - After: Use existing GetDataCenter() helper method
   - Benefit: Better encapsulation and code consistency

Impact:
- Reduced lock contention in high-traffic lookup path
- More consistent use of accessor methods throughout codebase
- Cleaner, more maintainable code

Verified with: go test -race ./weed/wdclient/... (passes)

* Refactor: Extract common locking patterns into helper methods

Eliminated code duplication by introducing two helper methods that encapsulate
the common locking patterns used throughout MasterClient:

1. getStableVidMap() - For read operations
   - Acquires lock, gets pointer, releases immediately
   - Returns stable snapshot for thread-safe reads
   - Used by: GetLocations, GetLocationsClone, GetVidLocations,
     LookupFileId, LookupVolumeServerUrl, GetDataCenter

2. withCurrentVidMap(f func(vm *vidMap)) - For write operations
   - Holds RLock during callback execution
   - Prevents pointer swap while allowing concurrent operations
   - Used by: addLocation, deleteLocation, addEcLocation, deleteEcLocation

Benefits:
- Reduced code duplication (eliminated 48 lines of repetitive locking code)
- Centralized locking logic makes it easier to understand and maintain
- Self-documenting pattern through named helper methods
- Easier to modify locking strategy in the future (single point of change)
- Improved readability - accessor methods are now one-liners

Code size reduction: ~40% fewer lines for accessor/helper methods

Verified with: go test -race ./weed/wdclient/... (passes)

* consistent

* Fix cache pointer race condition with atomic.Pointer

Use atomic.Pointer for vidMap cache field to prevent data races
during cache trimming in resetVidMap. This addresses the race condition
where concurrent GetLocations calls could read the cache pointer while
resetVidMap is modifying it during cache chain trimming.

Changes:
- Changed cache field from *vidMap to atomic.Pointer[vidMap]
- Updated all cache access to use Load() and Store() atomic operations
- Updated shallowClone, GetLocations, deleteLocation, deleteEcLocation
- Updated resetVidMap to use atomic operations for cache trimming

* Merge: Resolve conflict in deleteEcLocation - keep atomic.Pointer and fix bug

Resolved merge conflict by combining:
1. Atomic pointer access pattern (from HEAD): cache.Load()
2. Correct method call (from fix): deleteEcLocation (not deleteLocation)

Resolution:
- Before (HEAD): cachedMap.deleteLocation() - WRONG, reintroduced bug
- Before (fix): vc.cache.deleteEcLocation() - RIGHT method, old pattern
- After (merged): cachedMap.deleteEcLocation() - RIGHT method, new pattern

This preserves both improvements:
✓ Thread-safe atomic.Pointer access pattern
✓ Correct recursive call to deleteEcLocation

Verified with: go test -race ./weed/wdclient/... (passes)

* Update vid_map.go

* remove shallow clone

* simplify
2025-10-30 23:36:06 -07:00
Aleksey Kosov
283d9e0079 Add context with request (#6824) 2025-05-28 11:34:02 -07:00
chrislu
31b2751aff clone volume locations in case they are changed
fix https://github.com/seaweedfs/seaweedfs/issues/4642
2023-07-06 00:32:58 -07:00
LHHDZ
bc629665de fix bug due to data racing on VidMap (#3606) 2022-09-05 20:00:16 -07:00
chrislu
cb476a53ff remove logs 2022-08-15 01:05:35 -07:00
Konstantin Lebedev
4d08393b7c filer prefer volume server in same data center (#3405)
* initial prefer same data center
https://github.com/seaweedfs/seaweedfs/issues/3404

* GetDataCenter

* prefer same data center for ReplicationSource

* GetDataCenterId

* remove glog
2022-08-04 17:35:00 -07:00
chrislu
26dbc6c905 move to https://github.com/seaweedfs/seaweedfs 2022-07-29 00:17:28 -07:00
LHHDZ
58c02d6429 Solve the problem that LookupFileId lookup urls is empty due to leader switching
The vidMap structure is modified to a linked list structure (the length is limited to 5). When the vidMap is reset, the current vidMap is added to the new vidMap as a cache node. When the query locations is empty, the cache node is searched to avoid problems when the master switches leaders.
2022-07-22 17:22:38 +08:00
chrislu
9c517d2b35 masterclient: fallback to directly querying master in case of missing volume id location 2022-06-24 02:08:57 -07:00
ningfd
338705f375 fix(wdclient): GetLocations return 2022-06-15 19:20:13 +08:00
chrislu
bc888226fc erasure coding: tracking encoded/decoded volumes
If an EC shard is created but not spread to other servers, the masterclient would think this shard is not located here.
2022-04-05 19:03:02 -07:00
Chris Lu
3d87aa767d fix same dc and other dc 2021-11-16 09:14:01 -08:00
Chris Lu
7bf891c00a randomize same-dc servers and other-dc servers 2021-11-12 11:30:11 -08:00
Chris Lu
e5fc35ed0c change server address from string to a type 2021-09-12 22:47:52 -07:00
Chris Lu
2b76854641 add "weed filer.cat" to read files directly from volume servers 2021-01-06 04:22:00 -08:00
Konstantin Lebedev
1eec5c8d5d gen pb 2020-11-12 04:10:06 +05:00
Konstantin Lebedev
fc7baef5bb fiil serverUrls sorted by data center 2020-11-12 02:13:33 +05:00
Konstantin Lebedev
dc26012a3b initial 2020-11-11 15:03:47 +05:00
Chris Lu
a8624c2e4f read from alternative replica
related to https://github.com/chrislusf/seaweedfs/issues/1512
2020-10-07 22:49:04 -07:00
Chris Lu
ad3efbb197 tweaking data types 2019-09-14 01:21:51 -07:00
divinerapier
bb31462b52 fix: thread unsafe
Signed-off-by: divinerapier <poriter.coco@gmail.com>
2019-09-13 20:06:02 +08:00
Chris Lu
8afd8d35b3 master: followers can also lookup and redirect
improve scalability
2019-07-28 03:58:13 -07:00
Chris Lu
92c7f7e069 fix compilation error 2019-06-28 11:19:08 -07:00
Chris Lu
9f3f2f7c79 protect locations slice
fix https://github.com/chrislusf/seaweedfs/issues/995
2019-06-28 09:47:29 -07:00
Chris Lu
5c411f3e5f minor 2019-04-21 13:33:32 -07:00
Chris Lu
5ae4b963a4 avoid using global rand 2019-03-19 22:20:14 -07:00
Chris Lu
b282e34dc2 async file chunk deletion 2018-11-20 20:56:28 -08:00
Chris Lu
91ac2e0dd9 go fmt 2018-10-14 00:30:20 -07:00
Chris Lu
ff66269b62 use grpc to replace http APIs for batch volume id lookup and batch delete
1. remove batch volume id lookup http API /vol/lookup
2. remove batch delete http API /delete
2018-10-14 00:12:28 -07:00
Chris Lu
d3205a0070 go fmt 2018-07-28 21:02:56 -07:00
Chris Lu
7214a8e265 fix init error 2018-07-28 18:40:31 -07:00
Chris Lu
cfbfc7cb67 fix compilation error 2018-07-28 18:34:15 -07:00
Chris Lu
888eb2abb5 filer read write all via locations from MasterClient 2018-07-28 14:51:36 -07:00
Chris Lu
1d779389cb MasterClient replicates all vid locations 2018-07-28 14:22:46 -07:00