mount: improve small file write performance (#8769)
* mount: defer file creation gRPC to flush time for faster small file writes

  When creating a file via FUSE Create(), skip the synchronous gRPC CreateEntry call to the filer. Instead, allocate the inode and build the entry locally, deferring the filer create to the Flush/Release path where flushMetadataToFiler already sends a CreateEntry with chunk data.

  This eliminates one synchronous gRPC round-trip per file during creation. For workloads with many small files (e.g. 30K files), this reduces the per-file overhead from ~2 gRPC calls to ~1.

  Mknod retains synchronous filer creation since it has no file handle and thus no flush path.

* mount: use bounded worker pool for async flush operations

  Replace unbounded goroutine spawning in writebackCache async flush with a fixed-size worker pool backed by a channel. When many files are closed rapidly (e.g., cp -r of 30K files), the previous approach spawned one goroutine per file, leading to resource contention on gRPC/HTTP connections and high goroutine overhead.

  The worker pool size matches ConcurrentWriters (default 128), which provides good parallelism while bounding resource usage. Work items are queued into a buffered channel and processed by persistent worker goroutines.

* mount: fix deferred create cache visibility and async flush race

  Three fixes for the deferred create and async flush changes:

  1. Insert a local placeholder entry into the metadata cache during deferred file creation so that maybeLoadEntry() can find the file for duplicate-create checks, stat, and readdir. Uses InsertEntry directly (not applyLocalMetadataEvent) to avoid triggering the directory hot-threshold eviction that would wipe the entry.

  2. Fix race in ReleaseHandle where asyncFlushWg.Add(1) and the channel send happened after pendingAsyncFlushMu was unlocked. A concurrent WaitForAsyncFlush could observe a zero counter, close the channel, and cause a send-on-closed panic. Move Add(1) before the unlock; keep the send after unlock to avoid deadlock with workers that acquire the same mutex during cleanup.

  3. Update TestCreateCreatesAndOpensFile to flush the file handle before verifying the CreateEntry gRPC call, since file creation is now deferred to flush time.
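The ordering in fix 2 can be illustrated with a minimal, self-contained sketch. It is not the SeaweedFS code: `flushQueue`, `enqueue`, `waitAndClose`, and the `closed` flag are hypothetical names, and the flag is an assumption added so the sketch has a well-defined shutdown point (the commit message does not say how WaitForAsyncFlush coordinates with ReleaseHandle). What it shows is the shape of the fix: the WaitGroup counter is raised while the mutex is still held, and the channel send happens only after the unlock so that workers can take the same mutex during cleanup.

```go
package main

import "sync"

// flushQueue is a hypothetical stand-in for the async flush state on WFS.
type flushQueue struct {
	mu      sync.Mutex      // plays the role of pendingAsyncFlushMu
	closed  bool            // assumption: set once shutdown has begun
	pending map[uint64]bool // inodes with a flush in flight
	wg      sync.WaitGroup  // plays the role of asyncFlushWg
	ch      chan uint64     // plays the role of asyncFlushCh
	flushed []uint64        // record of completed work, guarded by mu
}

func newFlushQueue(workers, backlog int) *flushQueue {
	q := &flushQueue{
		pending: make(map[uint64]bool),
		ch:      make(chan uint64, backlog),
	}
	for i := 0; i < workers; i++ {
		go q.worker()
	}
	return q
}

// worker drains the channel; cleanup takes the same mutex as enqueue.
func (q *flushQueue) worker() {
	for inode := range q.ch {
		q.mu.Lock()
		delete(q.pending, inode)
		q.flushed = append(q.flushed, inode)
		q.mu.Unlock()
		q.wg.Done()
	}
}

// enqueue registers the work item under the lock, then sends after unlocking.
func (q *flushQueue) enqueue(inode uint64) bool {
	q.mu.Lock()
	if q.closed {
		q.mu.Unlock()
		return false
	}
	q.pending[inode] = true
	q.wg.Add(1) // counted before unlock, so a waiter cannot observe zero
	q.mu.Unlock()
	q.ch <- inode // may block on a full queue; the mutex is not held here
	return true
}

// waitAndClose stops new work, waits for counted items, then closes the channel.
// It must be called at most once.
func (q *flushQueue) waitAndClose() {
	q.mu.Lock()
	q.closed = true
	q.mu.Unlock()
	q.wg.Wait()
	close(q.ch)
}
```

If `wg.Add(1)` were moved below `q.mu.Unlock()`, `waitAndClose` could run in the gap, see a zero counter, and close the channel before the send. If the send were moved above the unlock, a full queue could block the sender while every worker waits on the mutex for cleanup.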
@@ -119,10 +119,15 @@ type WFS struct {
 	dirHotThreshold int
 	dirIdleEvict time.Duration

-	// asyncFlushWg tracks pending background flush goroutines for writebackCache mode.
+	// asyncFlushWg tracks pending background flush work items for writebackCache mode.
 	// Must be waited on before unmount cleanup to prevent data loss.
 	asyncFlushWg sync.WaitGroup

+	// asyncFlushCh is a bounded work queue for background flush operations.
+	// A fixed pool of worker goroutines processes items from this channel,
+	// preventing resource exhaustion from unbounded goroutine creation.
+	asyncFlushCh chan *asyncFlushItem
+
 	// pendingAsyncFlush tracks in-flight async flush goroutines by inode.
 	// AcquireHandle checks this to wait for a pending flush before reopening
 	// the same inode, preventing stale metadata from overwriting the async flush.
@@ -277,6 +282,13 @@ func NewSeaweedFileSystem(option *Option) *WFS {
 		wfs.concurrentWriters = util.NewLimitedConcurrentExecutor(wfs.option.ConcurrentWriters)
 		wfs.concurrentCopiersSem = make(chan struct{}, wfs.option.ConcurrentWriters)
 	}
+	if wfs.option.WritebackCache {
+		numWorkers := wfs.option.ConcurrentWriters
+		if numWorkers <= 0 {
+			numWorkers = 128
+		}
+		wfs.startAsyncFlushWorkers(numWorkers)
+	}
 	wfs.copyBufferPool.New = func() any {
 		return make([]byte, option.ChunkSizeLimit)
 	}
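The body of startAsyncFlushWorkers is not part of the hunks shown here. As a rough sketch of what a fixed-size pool of this kind can look like, the following uses hypothetical names throughout (`pool`, `flushItem`, `startPool`, `submit`, `drain`, `effectiveWorkers`); only the fallback to 128 workers when the configured value is not positive comes from the diff. Sizing the channel buffer to the worker count is an assumption, since the commit only says the channel is buffered.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// flushItem is a hypothetical unit of work; the real asyncFlushItem is not shown.
type flushItem struct {
	run func()
}

type pool struct {
	items chan *flushItem
	wg    sync.WaitGroup
}

// effectiveWorkers mirrors the fallback in the diff: non-positive means 128.
func effectiveWorkers(configured int) int {
	if configured <= 0 {
		return 128
	}
	return configured
}

// startPool launches a fixed number of persistent workers reading one channel.
func startPool(configured int) *pool {
	n := effectiveWorkers(configured)
	p := &pool{items: make(chan *flushItem, n)} // buffer size is an assumption
	for i := 0; i < n; i++ {
		go func() {
			for it := range p.items {
				it.run()
				p.wg.Done()
			}
		}()
	}
	return p
}

// submit queues one item; it blocks when the buffer is full and all workers are busy.
func (p *pool) submit(run func()) {
	p.wg.Add(1)
	p.items <- &flushItem{run: run}
}

// drain waits for all submitted items, then stops the workers.
// It must not run concurrently with submit.
func (p *pool) drain() {
	p.wg.Wait()
	close(p.items)
}

// peakConcurrency runs `jobs` trivial tasks and reports the highest number
// observed running at the same time.
func peakConcurrency(workers, jobs int) int64 {
	p := startPool(workers)
	var running, peak int64
	for i := 0; i < jobs; i++ {
		p.submit(func() {
			cur := atomic.AddInt64(&running, 1)
			for {
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			atomic.AddInt64(&running, -1)
		})
	}
	p.drain()
	return atomic.LoadInt64(&peak)
}
```

With this shape, closing 30K files queues 30K items but never runs more than the configured number of flushes at once; the producer simply blocks in `submit` when the queue is full.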