seaweedFS/weed/util/http/http_global_client_util.go
Chris Lu 5c1de633cb mount: improve read throughput with parallel chunk fetching (#7627)
* filer: remove lock contention during chunk download

This addresses issue #7504 where a single weed mount FUSE instance
does not fully utilize node network bandwidth when reading large files.

The SingleChunkCacher was holding a mutex during the entire HTTP download,
causing readers to block until the download completed. This serialized
chunk reads even when multiple goroutines were downloading in parallel.

Changes:
- Add sync.Cond to SingleChunkCacher for efficient waiting
- Move HTTP download outside the critical section in startCaching()
- Use condition variable in readChunkAt() to wait for download completion
- Add isComplete flag to track download state

Now multiple chunk downloads can proceed truly in parallel, and readers
wait efficiently using the condition variable instead of blocking on
a mutex held during I/O operations.

Ref: #7504
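A minimal sketch of the `sync.Cond` pattern described above. It assumes a simplified cacher (the `chunkCacher` type and its methods are hypothetical, not the SeaweedFS code): the download runs with no lock held, and readers sleep on the condition variable until `isComplete` is set. A later review round in this commit replaces the condition variable with a `done` channel.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// chunkCacher is a hypothetical, simplified stand-in for SingleChunkCacher.
// mu guards the fields below; it is never held while downloading.
type chunkCacher struct {
	mu         sync.Mutex
	cond       *sync.Cond
	data       []byte
	err        error
	isComplete bool
}

func newChunkCacher() *chunkCacher {
	c := &chunkCacher{}
	c.cond = sync.NewCond(&c.mu)
	return c
}

// startCaching performs the (simulated) download outside the critical
// section, then publishes the result and wakes every waiting reader.
func (c *chunkCacher) startCaching(download func() ([]byte, error)) {
	data, err := download() // no lock held during I/O

	c.mu.Lock()
	c.data, c.err, c.isComplete = data, err, true
	c.mu.Unlock()
	c.cond.Broadcast()
}

// readChunkAt blocks on the condition variable until the download completes.
func (c *chunkCacher) readChunkAt(p []byte, offset int) (int, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for !c.isComplete {
		c.cond.Wait() // atomically releases mu while sleeping
	}
	if c.err != nil {
		return 0, c.err
	}
	if offset >= len(c.data) {
		return 0, nil
	}
	return copy(p, c.data[offset:]), nil
}

func main() {
	c := newChunkCacher()
	go c.startCaching(func() ([]byte, error) {
		time.Sleep(10 * time.Millisecond) // simulate network I/O
		return []byte("hello, chunk"), nil
	})
	buf := make([]byte, 5)
	n, err := c.readChunkAt(buf, 7)
	fmt.Println(n, err, string(buf[:n])) // 5 <nil> chunk
}
```

The `for !c.isComplete` loop, rather than a single `if`, is the standard guard against spurious or early wake-ups with `sync.Cond`.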

* filer: parallel chunk fetching within doReadAt

This addresses issue #7504 by enabling parallel chunk downloads within
a single read operation.

Previously, doReadAt() processed chunks sequentially in a loop, meaning
each chunk had to be fully downloaded before the next one started.
This left significant network bandwidth unused when chunks resided on
different volume servers.

Changes:
- Collect all chunk read tasks upfront
- Use errgroup to fetch multiple chunks in parallel
- Each chunk reads directly into its correct buffer position
- Limit concurrency to prefetchCount (min 4) to avoid overwhelming the system
- Handle gaps and zero-filling before parallel fetch
- Trigger prefetch after parallel reads complete

For a read spanning N chunks on different volume servers, up to
min(N, concurrency limit) chunks are now fetched at once, so the read can
use up to that many times the bandwidth of a single connection.

Ref: #7504

* http: direct buffer read to reduce memory copies

This addresses issue #7504 by reducing memory copy overhead during
chunk downloads.

Previously, RetriedFetchChunkData used ReadUrlAsStream which:
1. Allocated a 64KB intermediate buffer
2. Read data in 64KB chunks
3. Called a callback to copy each chunk to the destination

For a 16MB chunk, this meant 256 copy operations plus the callback
overhead. Profiling showed significant time spent in memmove.

Changes:
- Add readUrlDirectToBuffer() that reads directly into the destination
- Add retriedFetchChunkDataDirect() for unencrypted, non-gzipped chunks
- Automatically use direct read path when possible (cipher=nil, gzip=false)
- Use http.NewRequestWithContext for proper cancellation

For unencrypted chunks (the common case), this eliminates the
intermediate buffer entirely, reading HTTP response bytes directly
into the final destination buffer.

Ref: #7504

* address review comments

- Use channel (done) instead of sync.Cond for download completion signaling
  This integrates better with context cancellation patterns
- Remove redundant groupErr check in reader_at.go (errors are already captured in task.err)
- Remove buggy URL encoding logic from retriedFetchChunkDataDirect
  (the url.PathEscape call on the full URL is a pre-existing bug that should be fixed separately)

* address review comments (round 2)

- Return io.ErrUnexpectedEOF when HTTP response is truncated
  This prevents silent data corruption from incomplete reads
- Simplify errgroup error handling by using g.Wait() error directly
  Remove redundant task.err field and manual error aggregation loop
- Define minReadConcurrency constant instead of magic number 4
  Improves code readability and maintainability

Note: Context propagation to startCaching() is intentionally NOT changed.
The downloaded chunk is a shared resource that may be used by multiple
readers. Using context.Background() ensures the download completes even
if one reader cancels, preventing data loss for other waiting readers.

* http: inject request ID for observability in direct read path

Add request_id.InjectToRequest() call to readUrlDirectToBuffer() for
consistency with ReadUrlAsStream path. This ensures full-chunk reads
carry the same tracing/correlation headers for server logs and metrics.

* filer: consistent timestamp handling in sequential read path

Use max(ts, task.chunk.ModifiedTsNs) in sequential path to match
parallel path behavior. Also update ts before error check so that
on failure, the returned timestamp reflects the max of all chunks
processed so far.
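A small sketch of that ordering (the type and function names are hypothetical): the timestamp is folded in before the error check, so a failing chunk still counts toward the returned maximum. An explicit comparison stands in for the built-in `max`, which needs Go 1.21.

```go
package main

import (
	"errors"
	"fmt"
)

// chunkTask is hypothetical and keeps only the fields this example needs.
type chunkTask struct {
	modifiedTsNs int64
	fail         bool
}

// readSequential updates ts before checking the error, so a failed read
// still reports the newest modification time among the chunks processed.
func readSequential(tasks []chunkTask) (ts int64, err error) {
	for _, t := range tasks {
		if t.modifiedTsNs > ts { // ts = max(ts, t.modifiedTsNs)
			ts = t.modifiedTsNs
		}
		if t.fail {
			return ts, errors.New("chunk read failed")
		}
	}
	return ts, nil
}

func main() {
	ts, err := readSequential([]chunkTask{{100, false}, {300, false}, {200, true}, {900, false}})
	fmt.Println(ts, err) // 300 chunk read failed
}
```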

* filer: document why context.Background() is used in startCaching

Add comment explaining the intentional design decision: the downloaded
chunk is a shared resource that may be used by multiple concurrent
readers. Using context.Background() ensures the download completes
even if one reader cancels, preventing errors for other waiting readers.

* filer: propagate context for reader cancellation

Address review comment: pass context through ReadChunkAt call chain so
that a reader can cancel its wait for a download. The key distinction is:

- Download uses context.Background() - shared resource, always completes
- Reader wait uses request context - can be cancelled individually

If a reader cancels, it stops waiting and returns ctx.Err(), but the
download continues to completion for other readers waiting on the same
chunk. This properly handles the shared resource semantics while still
allowing individual reader cancellation.

* filer: use defer for close(done) to guarantee signal on panic

Move close(s.done) to a defer statement at the start of startCaching()
to ensure the completion signal is always sent, even if an unexpected
panic occurs. This prevents readers from blocking indefinitely.

* filer: remove unnecessary code

- Remove close(s.cacheStartedCh) in destroy(): the channel is only used
  for one-time synchronization, so closing it provides no benefit
- Remove the task := task loop variable capture: since Go 1.22 each loop
  iteration has its own variable, so the capture is no longer necessary (go.mod specifies Go 1.24.0)

* filer: restore fallback to chunkCache when cacher returns no data

Fix critical issue where ReadChunkAt would return 0, nil immediately
if SingleChunkCacher couldn't provide data for the requested offset,
without trying the chunkCache fallback. Now if cacher.readChunkAt
returns n=0 and err=nil, we fall through to try chunkCache.
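A sketch of the restored fall-through, with both sources modelled as hypothetical function values. Returning the downloader's error directly (rather than also falling back) is an assumption; the commit message only specifies the `n == 0, err == nil` case.

```go
package main

import "fmt"

// chunkSource is a hypothetical stand-in for both the per-chunk downloader
// and the shared chunk cache.
type chunkSource func(p []byte, offset int64) (int, error)

// readChunkAt tries the downloader first. n == 0 with a nil error means
// "no data here", so it falls through to the cache instead of returning.
func readChunkAt(p []byte, offset int64, cacher, cache chunkSource) (int, error) {
	if cacher != nil {
		n, err := cacher(p, offset)
		if err != nil {
			return n, err
		}
		if n > 0 {
			return n, nil
		}
		// n == 0 && err == nil: fall through to the cache.
	}
	return cache(p, offset)
}

func main() {
	empty := func(p []byte, offset int64) (int, error) { return 0, nil }
	cached := func(p []byte, offset int64) (int, error) { return copy(p, "from-cache"), nil }
	buf := make([]byte, 10)
	n, err := readChunkAt(buf, 0, empty, cached)
	fmt.Println(n, err, string(buf[:n])) // 10 <nil> from-cache
}
```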

* filer: add comprehensive tests for ReaderCache

Tests cover:
- Context cancellation while waiting for download
- Fallback to chunkCache when cacher returns n=0, err=nil
- Multiple concurrent readers waiting for same chunk
- Partial reads at different offsets
- Downloader cleanup when exceeding cache limit
- Done channel signaling (no hangs on completion)

* filer: prioritize done channel over context cancellation

If data is already available (done channel closed), return it even if
the reader's context is also cancelled. This avoids unnecessary errors
when the download has already completed.

* filer: add lookup error test and document test limitations

Add TestSingleChunkCacherLookupError to test error handling when lookup
fails. Document that full HTTP integration tests for SingleChunkCacher
require global HTTP client initialization which is complex in unit tests.
The download path is tested via FUSE integration tests.

* filer: add tests that exercise SingleChunkCacher concurrency logic

Add tests that use blocking lookupFileIdFn to exercise the actual
SingleChunkCacher wait/cancellation logic:

- TestSingleChunkCacherContextCancellationDuringLookup: tests reader
  cancellation while lookup is blocked
- TestSingleChunkCacherMultipleReadersWaitForDownload: tests multiple
  readers waiting on the same download
- TestSingleChunkCacherOneReaderCancelsOthersContinue: tests that when
  one reader cancels, other readers continue waiting

These tests properly exercise the done channel wait/cancel logic without
requiring HTTP calls - the blocking lookup simulates a slow download.
2025-12-04 23:40:56 -08:00

package http

import (
	"compress/gzip"
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
	"sync"
	"time"

	"github.com/seaweedfs/seaweedfs/weed/glog"
	"github.com/seaweedfs/seaweedfs/weed/security"
	"github.com/seaweedfs/seaweedfs/weed/util"
	"github.com/seaweedfs/seaweedfs/weed/util/mem"
	"github.com/seaweedfs/seaweedfs/weed/util/request_id"
)

var ErrNotFound = fmt.Errorf("not found")
var ErrTooManyRequests = fmt.Errorf("too many requests")

var (
	jwtSigningReadKey        security.SigningKey
	jwtSigningReadKeyExpires int
	loadJwtConfigOnce        sync.Once
)

func loadJwtConfig() {
	v := util.GetViper()
	jwtSigningReadKey = security.SigningKey(v.GetString("jwt.signing.read.key"))
	jwtSigningReadKeyExpires = v.GetInt("jwt.signing.read.expires_after_seconds")
}

func Post(url string, values url.Values) ([]byte, error) {
	r, err := GetGlobalHttpClient().PostForm(url, values)
	if err != nil {
		return nil, err
	}
	defer r.Body.Close()
	b, err := io.ReadAll(r.Body)
	if r.StatusCode >= 400 {
		// Include the response body in the error only when it was read successfully.
		if err == nil {
			return nil, fmt.Errorf("%s: %d - %s", url, r.StatusCode, string(b))
		}
		return nil, fmt.Errorf("%s: %s", url, r.Status)
	}
	if err != nil {
		return nil, err
	}
	return b, nil
}

// github.com/seaweedfs/seaweedfs/unmaintained/repeated_vacuum/repeated_vacuum.go
// may need increasing http.Client.Timeout
func Get(url string) ([]byte, bool, error) {
	return GetAuthenticated(url, "")
}

func GetAuthenticated(url, jwt string) ([]byte, bool, error) {
	request, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, true, err
	}
	maybeAddAuth(request, jwt)
	request.Header.Add("Accept-Encoding", "gzip")

	response, err := GetGlobalHttpClient().Do(request)
	if err != nil {
		return nil, true, err
	}
	defer CloseResponse(response)

	var reader io.ReadCloser
	switch response.Header.Get("Content-Encoding") {
	case "gzip":
		reader, err = gzip.NewReader(response.Body)
		if err != nil {
			return nil, true, err
		}
		defer reader.Close()
	default:
		reader = response.Body
	}

	b, err := io.ReadAll(reader)
	if response.StatusCode >= 400 {
		retryable := response.StatusCode >= 500
		return nil, retryable, fmt.Errorf("%s: %s", url, response.Status)
	}
	if err != nil {
		return nil, false, err
	}
	return b, false, nil
}

func Head(url string) (http.Header, error) {
	r, err := GetGlobalHttpClient().Head(url)
	if err != nil {
		return nil, err
	}
	defer CloseResponse(r)
	if r.StatusCode >= 400 {
		return nil, fmt.Errorf("%s: %s", url, r.Status)
	}
	return r.Header, nil
}

func maybeAddAuth(req *http.Request, jwt string) {
	if jwt != "" {
		req.Header.Set("Authorization", "BEARER "+jwt)
	}
}

func Delete(url string, jwt string) error {
	req, err := http.NewRequest(http.MethodDelete, url, nil)
	if err != nil {
		return err
	}
	// Add auth only after the error check: req is nil when NewRequest fails.
	maybeAddAuth(req, jwt)
	resp, e := GetGlobalHttpClient().Do(req)
	if e != nil {
		return e
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	switch resp.StatusCode {
	case http.StatusNotFound, http.StatusAccepted, http.StatusOK:
		return nil
	}
	m := make(map[string]interface{})
	if e := json.Unmarshal(body, &m); e == nil {
		if s, ok := m["error"].(string); ok {
			return errors.New(s)
		}
	}
	return errors.New(string(body))
}

func DeleteProxied(url string, jwt string) (body []byte, httpStatus int, err error) {
	req, err := http.NewRequest(http.MethodDelete, url, nil)
	if err != nil {
		return
	}
	// Add auth only after the error check: req is nil when NewRequest fails.
	maybeAddAuth(req, jwt)
	resp, err := GetGlobalHttpClient().Do(req)
	if err != nil {
		return
	}
	defer resp.Body.Close()
	body, err = io.ReadAll(resp.Body)
	if err != nil {
		return
	}
	httpStatus = resp.StatusCode
	return
}

func GetBufferStream(url string, values url.Values, allocatedBytes []byte, eachBuffer func([]byte)) error {
	r, err := GetGlobalHttpClient().PostForm(url, values)
	if err != nil {
		return err
	}
	defer CloseResponse(r)
	if r.StatusCode != 200 {
		return fmt.Errorf("%s: %s", url, r.Status)
	}
	for {
		n, err := r.Body.Read(allocatedBytes)
		if n > 0 {
			eachBuffer(allocatedBytes[:n])
		}
		if err != nil {
			if err == io.EOF {
				return nil
			}
			return err
		}
	}
}

func GetUrlStream(url string, values url.Values, readFn func(io.Reader) error) error {
	r, err := GetGlobalHttpClient().PostForm(url, values)
	if err != nil {
		return err
	}
	defer CloseResponse(r)
	if r.StatusCode != 200 {
		return fmt.Errorf("%s: %s", url, r.Status)
	}
	return readFn(r.Body)
}

// DownloadFile issues a GET for fileUrl. On success the caller owns resp and
// must close resp.Body.
func DownloadFile(fileUrl string, jwt string) (filename string, header http.Header, resp *http.Response, e error) {
	req, err := http.NewRequest(http.MethodGet, fileUrl, nil)
	if err != nil {
		return "", nil, nil, err
	}
	maybeAddAuth(req, jwt)
	response, err := GetGlobalHttpClient().Do(req)
	if err != nil {
		return "", nil, nil, err
	}
	header = response.Header
	contentDisposition := response.Header["Content-Disposition"]
	if len(contentDisposition) > 0 {
		idx := strings.Index(contentDisposition[0], "filename=")
		if idx != -1 {
			filename = contentDisposition[0][idx+len("filename="):]
			filename = strings.Trim(filename, "\"")
		}
	}
	resp = response
	return
}

func Do(req *http.Request) (resp *http.Response, err error) {
	return GetGlobalHttpClient().Do(req)
}

func NormalizeUrl(url string) (string, error) {
	return GetGlobalHttpClient().NormalizeHttpScheme(url)
}

func ReadUrl(ctx context.Context, fileUrl string, cipherKey []byte, isContentCompressed bool, isFullChunk bool, offset int64, size int, buf []byte) (int64, error) {
	if cipherKey != nil {
		var n int
		_, err := readEncryptedUrl(ctx, fileUrl, "", cipherKey, isContentCompressed, isFullChunk, offset, size, func(data []byte) {
			n = copy(buf, data)
		})
		return int64(n), err
	}

	req, err := http.NewRequest(http.MethodGet, fileUrl, nil)
	if err != nil {
		return 0, err
	}
	if !isFullChunk {
		req.Header.Add("Range", fmt.Sprintf("bytes=%d-%d", offset, offset+int64(size)-1))
	} else {
		req.Header.Set("Accept-Encoding", "gzip")
	}

	r, err := GetGlobalHttpClient().Do(req)
	if err != nil {
		return 0, err
	}
	defer CloseResponse(r)

	if r.StatusCode >= 400 {
		return 0, fmt.Errorf("%s: %s", fileUrl, r.Status)
	}

	var reader io.ReadCloser
	contentEncoding := r.Header.Get("Content-Encoding")
	switch contentEncoding {
	case "gzip":
		reader, err = gzip.NewReader(r.Body)
		if err != nil {
			return 0, err
		}
		defer reader.Close()
	default:
		reader = r.Body
	}

	var (
		i, m int
		n    int64
	)

	// refers to https://github.com/golang/go/blob/master/src/bytes/buffer.go#L199
	// commit id c170b14c2c1cfb2fd853a37add92a82fd6eb4318
	for {
		m, err = reader.Read(buf[i:])
		i += m
		n += int64(m)
		if err == io.EOF {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		if n == int64(len(buf)) {
			break
		}
	}
	// drains the response body to avoid memory leak
	data, _ := io.ReadAll(reader)
	if len(data) != 0 {
		glog.V(1).InfofCtx(ctx, "%s reader has remaining %d bytes", contentEncoding, len(data))
	}
	return n, err
}

func ReadUrlAsStream(ctx context.Context, fileUrl, jwt string, cipherKey []byte, isContentGzipped bool, isFullChunk bool, offset int64, size int, fn func(data []byte)) (retryable bool, err error) {
	if cipherKey != nil {
		return readEncryptedUrl(ctx, fileUrl, jwt, cipherKey, isContentGzipped, isFullChunk, offset, size, fn)
	}

	req, err := http.NewRequest(http.MethodGet, fileUrl, nil)
	if err != nil {
		return false, err
	}
	// Add auth only after the error check: req is nil when NewRequest fails.
	maybeAddAuth(req, jwt)
	if isFullChunk {
		req.Header.Add("Accept-Encoding", "gzip")
	} else {
		req.Header.Add("Range", fmt.Sprintf("bytes=%d-%d", offset, offset+int64(size)-1))
	}
	request_id.InjectToRequest(ctx, req)

	r, err := GetGlobalHttpClient().Do(req)
	if err != nil {
		return true, err
	}
	defer CloseResponse(r)
	if r.StatusCode >= 400 {
		if r.StatusCode == http.StatusNotFound {
			return true, fmt.Errorf("%s: %s: %w", fileUrl, r.Status, ErrNotFound)
		}
		if r.StatusCode == http.StatusTooManyRequests {
			return false, fmt.Errorf("%s: %s: %w", fileUrl, r.Status, ErrTooManyRequests)
		}
		retryable = r.StatusCode >= 499
		return retryable, fmt.Errorf("%s: %s", fileUrl, r.Status)
	}

	var reader io.ReadCloser
	contentEncoding := r.Header.Get("Content-Encoding")
	switch contentEncoding {
	case "gzip":
		reader, err = gzip.NewReader(r.Body)
		if err != nil {
			// A failed gzip.NewReader returns a nil *gzip.Reader; reading from
			// or closing it would panic, so bail out here.
			return true, err
		}
		defer reader.Close()
	default:
		reader = r.Body
	}

	var m int
	buf := mem.Allocate(64 * 1024)
	defer mem.Free(buf)

	for {
		// Check for context cancellation before each read
		select {
		case <-ctx.Done():
			return false, ctx.Err()
		default:
		}
		m, err = reader.Read(buf)
		if m > 0 {
			fn(buf[:m])
		}
		if err == io.EOF {
			return false, nil
		}
		if err != nil {
			return true, err
		}
	}
}

func readEncryptedUrl(ctx context.Context, fileUrl, jwt string, cipherKey []byte, isContentCompressed bool, isFullChunk bool, offset int64, size int, fn func(data []byte)) (bool, error) {
	encryptedData, retryable, err := GetAuthenticated(fileUrl, jwt)
	if err != nil {
		return retryable, fmt.Errorf("fetch %s: %v", fileUrl, err)
	}
	decryptedData, err := util.Decrypt(encryptedData, util.CipherKey(cipherKey))
	if err != nil {
		return false, fmt.Errorf("decrypt %s: %v", fileUrl, err)
	}
	if isContentCompressed {
		decryptedData, err = util.DecompressData(decryptedData)
		if err != nil {
			glog.V(0).InfofCtx(ctx, "unzip decrypt %s: %v", fileUrl, err)
		}
	}
	if len(decryptedData) < int(offset)+size {
		return false, fmt.Errorf("read decrypted %s size %d [%d, %d)", fileUrl, len(decryptedData), offset, int(offset)+size)
	}
	if isFullChunk {
		fn(decryptedData)
	} else {
		sliceEnd := int(offset) + size
		fn(decryptedData[int(offset):sliceEnd])
	}
	return false, nil
}

func ReadUrlAsReaderCloser(fileUrl string, jwt string, rangeHeader string) (*http.Response, io.ReadCloser, error) {
	req, err := http.NewRequest(http.MethodGet, fileUrl, nil)
	if err != nil {
		return nil, nil, err
	}
	if rangeHeader != "" {
		req.Header.Add("Range", rangeHeader)
	} else {
		req.Header.Add("Accept-Encoding", "gzip")
	}
	maybeAddAuth(req, jwt)

	r, err := GetGlobalHttpClient().Do(req)
	if err != nil {
		return nil, nil, err
	}
	if r.StatusCode >= 400 {
		CloseResponse(r)
		return nil, nil, fmt.Errorf("%s: %s", fileUrl, r.Status)
	}

	var reader io.ReadCloser
	contentEncoding := r.Header.Get("Content-Encoding")
	switch contentEncoding {
	case "gzip":
		reader, err = gzip.NewReader(r.Body)
		if err != nil {
			// The caller never receives r on this path, so release it here.
			CloseResponse(r)
			return nil, nil, err
		}
	default:
		reader = r.Body
	}
	return r, reader, nil
}

func CloseResponse(resp *http.Response) {
	if resp == nil || resp.Body == nil {
		return
	}
	reader := &CountingReader{reader: resp.Body}
	io.Copy(io.Discard, reader)
	resp.Body.Close()
	if reader.BytesRead > 0 {
		glog.V(1).Infof("response leftover %d bytes", reader.BytesRead)
	}
}

func CloseRequest(req *http.Request) {
	// Client requests may carry a nil Body; guard it as CloseResponse does.
	if req == nil || req.Body == nil {
		return
	}
	reader := &CountingReader{reader: req.Body}
	io.Copy(io.Discard, reader)
	req.Body.Close()
	if reader.BytesRead > 0 {
		glog.V(1).Infof("request leftover %d bytes", reader.BytesRead)
	}
}

// CountingReader wraps an io.Reader and counts the bytes read through it.
type CountingReader struct {
	reader    io.Reader
	BytesRead int
}

func (r *CountingReader) Read(p []byte) (n int, err error) {
	n, err = r.reader.Read(p)
	r.BytesRead += n
	return n, err
}

func RetriedFetchChunkData(ctx context.Context, buffer []byte, urlStrings []string, cipherKey []byte, isGzipped bool, isFullChunk bool, offset int64, fileId string) (n int, err error) {
	loadJwtConfigOnce.Do(loadJwtConfig)
	var jwt security.EncodedJwt
	if len(jwtSigningReadKey) > 0 {
		jwt = security.GenJwtForVolumeServer(
			jwtSigningReadKey,
			jwtSigningReadKeyExpires,
			fileId,
		)
	}

	// For unencrypted, non-gzipped full chunks, use direct buffer read.
	// This avoids the 64KB intermediate buffer and callback overhead.
	if cipherKey == nil && !isGzipped && isFullChunk {
		return retriedFetchChunkDataDirect(ctx, buffer, urlStrings, string(jwt))
	}

	var shouldRetry bool
	for waitTime := time.Second; waitTime < util.RetryWaitTime; waitTime += waitTime / 2 {
		// Check for context cancellation before each retry round
		select {
		case <-ctx.Done():
			return n, ctx.Err()
		default:
		}
		for _, urlString := range urlStrings {
			// Check for context cancellation before each volume server request
			select {
			case <-ctx.Done():
				return n, ctx.Err()
			default:
			}
			n = 0
			// NOTE: escaping the full URL here is a known pre-existing bug
			// (see the commit message); it is intentionally left unchanged.
			if strings.Contains(urlString, "%") {
				urlString = url.PathEscape(urlString)
			}
			shouldRetry, err = ReadUrlAsStream(ctx, urlString+"?readDeleted=true", string(jwt), cipherKey, isGzipped, isFullChunk, offset, len(buffer), func(data []byte) {
				// Check for context cancellation during data processing
				select {
				case <-ctx.Done():
					// Stop processing data when context is cancelled
					return
				default:
				}
				if n < len(buffer) {
					x := copy(buffer[n:], data)
					n += x
				}
			})
			if !shouldRetry {
				break
			}
			if err != nil {
				glog.V(0).InfofCtx(ctx, "read %s failed, err: %v", urlString, err)
			} else {
				break
			}
		}
		if err != nil && shouldRetry {
			glog.V(0).InfofCtx(ctx, "retry reading in %v", waitTime)
			// Sleep with proper context cancellation and timer cleanup
			timer := time.NewTimer(waitTime)
			select {
			case <-ctx.Done():
				timer.Stop()
				return n, ctx.Err()
			case <-timer.C:
				// Continue with retry
			}
		} else {
			break
		}
	}
	return n, err
}

// retriedFetchChunkDataDirect reads chunk data directly into the buffer without
// intermediate buffering. This reduces memory copies and improves throughput
// for large chunk reads.
func retriedFetchChunkDataDirect(ctx context.Context, buffer []byte, urlStrings []string, jwt string) (n int, err error) {
	var shouldRetry bool
	for waitTime := time.Second; waitTime < util.RetryWaitTime; waitTime += waitTime / 2 {
		select {
		case <-ctx.Done():
			return 0, ctx.Err()
		default:
		}
		for _, urlString := range urlStrings {
			select {
			case <-ctx.Done():
				return 0, ctx.Err()
			default:
			}
			n, shouldRetry, err = readUrlDirectToBuffer(ctx, urlString+"?readDeleted=true", jwt, buffer)
			if err == nil {
				return n, nil
			}
			if !shouldRetry {
				break
			}
			glog.V(0).InfofCtx(ctx, "read %s failed, err: %v", urlString, err)
		}
		if err != nil && shouldRetry {
			glog.V(0).InfofCtx(ctx, "retry reading in %v", waitTime)
			timer := time.NewTimer(waitTime)
			select {
			case <-ctx.Done():
				timer.Stop()
				return 0, ctx.Err()
			case <-timer.C:
			}
		} else {
			break
		}
	}
	return n, err
}

// readUrlDirectToBuffer reads HTTP response directly into the provided buffer,
// avoiding intermediate buffer allocations and copies.
func readUrlDirectToBuffer(ctx context.Context, fileUrl, jwt string, buffer []byte) (n int, retryable bool, err error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, fileUrl, nil)
	if err != nil {
		return 0, false, err
	}
	maybeAddAuth(req, jwt)
	request_id.InjectToRequest(ctx, req)

	r, err := GetGlobalHttpClient().Do(req)
	if err != nil {
		return 0, true, err
	}
	defer CloseResponse(r)

	if r.StatusCode >= 400 {
		if r.StatusCode == http.StatusNotFound {
			return 0, true, fmt.Errorf("%s: %s: %w", fileUrl, r.Status, ErrNotFound)
		}
		if r.StatusCode == http.StatusTooManyRequests {
			return 0, false, fmt.Errorf("%s: %s: %w", fileUrl, r.Status, ErrTooManyRequests)
		}
		retryable = r.StatusCode >= 499
		return 0, retryable, fmt.Errorf("%s: %s", fileUrl, r.Status)
	}

	// Read directly into the buffer without intermediate copying
	// This is significantly faster for large chunks (16MB+)
	var totalRead int
	for totalRead < len(buffer) {
		select {
		case <-ctx.Done():
			return totalRead, false, ctx.Err()
		default:
		}

		m, readErr := r.Body.Read(buffer[totalRead:])
		totalRead += m
		if readErr != nil {
			if readErr == io.EOF {
				// Return io.ErrUnexpectedEOF if we haven't filled the buffer
				// This prevents silent data corruption from truncated responses
				if totalRead < len(buffer) {
					return totalRead, true, io.ErrUnexpectedEOF
				}
				return totalRead, false, nil
			}
			return totalRead, true, readErr
		}
	}
	return totalRead, false, nil
}