fix: clean up orphaned needles on remote.cache partial download failure (#8675)

When remote.cache downloads a file in parallel chunks and a gRPC
connection drops mid-transfer, chunks already written to volume servers
were not cleaned up. Because the filer metadata was never updated,
these needles became orphaned: no filer entry referenced them and
volume.vacuum could not reclaim them. Each subsequent cache cycle still
treated the file as uncached, creating more orphans on every attempt.

Call DeleteUncommittedChunks on the download-error path, matching the
cleanup already present for the metadata-update-failure path.

Fixes #8481
Chris Lu
2026-03-17 13:47:54 -07:00
committed by GitHub
parent 558a83661e
commit f4073107cb

@@ -241,6 +241,12 @@ func (fs *FilerServer) doCacheRemoteObjectToLocalCluster(ctx context.Context, re
 })
 chunksMu.Unlock()
 if err != nil {
+	// Clean up any chunks that were successfully written before the error.
+	// Without this, partial downloads leave orphaned needles in volume servers
+	// that accumulate across retry cycles and cannot be reclaimed by vacuum.
+	if len(chunks) > 0 {
+		fs.filer.DeleteUncommittedChunks(ctx, chunks)
+	}
 	return nil, err
 }
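
The cleanup-on-error pattern in this change can be sketched in isolation. The example below is a minimal, self-contained illustration and not SeaweedFS code: `chunkStore`, `downloadAll`, and `failAt` are hypothetical names, the store is an in-memory map standing in for volume servers, and the `delete` call plays the role that `DeleteUncommittedChunks` plays in the diff.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// chunkStore is a hypothetical stand-in for the volume servers.
// It only tracks which chunk IDs currently hold data.
type chunkStore struct {
	mu      sync.Mutex
	needles map[string]bool
}

func newChunkStore() *chunkStore {
	return &chunkStore{needles: make(map[string]bool)}
}

func (s *chunkStore) write(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.needles[id] = true
}

func (s *chunkStore) delete(ids []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, id := range ids {
		delete(s.needles, id)
	}
}

func (s *chunkStore) count() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.needles)
}

// downloadAll writes n chunks in parallel. If failAt is in [0, n), that
// chunk's download fails (simulating a dropped connection). On failure,
// every chunk that was already written is deleted before returning, so
// nothing is left behind without metadata pointing at it.
func downloadAll(s *chunkStore, n, failAt int) ([]string, error) {
	var (
		wg       sync.WaitGroup
		chunksMu sync.Mutex
		chunks   []string
		errOnce  sync.Once
		firstErr error
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if i == failAt {
				errOnce.Do(func() { firstErr = errors.New("connection dropped") })
				return
			}
			id := fmt.Sprintf("chunk-%d", i)
			s.write(id)
			chunksMu.Lock()
			chunks = append(chunks, id)
			chunksMu.Unlock()
		}(i)
	}
	wg.Wait()
	if firstErr != nil {
		// Mirror of the fix: remove whatever was written before the error.
		if len(chunks) > 0 {
			s.delete(chunks)
		}
		return nil, firstErr
	}
	return chunks, nil
}

func main() {
	s := newChunkStore()
	_, err := downloadAll(s, 4, 2)
	fmt.Println("error:", err, "remaining chunks:", s.count())
}
```

Without the `s.delete(chunks)` call, the three chunks that succeeded would remain in the store after the failed attempt, and each retry would add more, which is the accumulation the commit message describes.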