Process .ecj deletions during EC decode and vacuum decoded volume (#8863)
* Process .ecj deletions during EC decode and vacuum decoded volume (#8798)

  When decoding EC volumes back to normal volumes, deletions recorded in the .ecj journal were not being applied before computing the dat file size or checking for live needles. This caused the decoded volume to include data for deleted files and could produce false positives in the all-deleted check.

  - Call RebuildEcxFile before HasLiveNeedles/FindDatFileSize in VolumeEcShardsToVolume so .ecj deletions are merged into .ecx first
  - Vacuum the decoded volume after mounting in ec.decode to compact out deleted needle data from the .dat file
  - Add integration tests for decoding with non-empty .ecj files

* storage: add offline volume compaction helper

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* ec: compact decoded volumes before deleting shards

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* ec: address PR review comments

  - Fall back to data directory for .ecx when idx directory lacks it
  - Make compaction failure non-fatal during EC decode
  - Remove misleading "buffer: 10%" from space check error message

* ec: collect .ecj from all shard locations during decode

  Each server's .ecj only contains deletions for needles whose data resides in shards held by that server. Previously, sources with no new data shards to contribute were skipped entirely, losing their .ecj deletion entries. Now .ecj is always appended from every shard location so RebuildEcxFile sees the full set of deletions.

* ec: add integration tests for .ecj collection during decode

  TestEcDecodePreservesDeletedNeedles: verifies that needles deleted via VolumeEcBlobDelete are excluded from the decoded volume.

  TestEcDecodeCollectsEcjFromPeer: regression test for the fix in collectEcShards. Deletes a needle only on a peer server that holds no new data shards, then verifies the deletion survives decode via .ecj collection.

* ec: address review nits in decode and tests

  - Remove double error wrapping in mountDecodedVolume
  - Check VolumeUnmount error in peer ecj test
  - Assert 404 specifically for deleted needles, fail on 5xx

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
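
The .ecj collection rule in the message above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical and is not SeaweedFS code: `shardLocation` and `collectDeletions` are made-up names, and each server's journal is modeled as a plain slice of needle ids. It only shows why a server that contributes no new data shards must still contribute its journal.

```go
package main

import "fmt"

// shardLocation is a hypothetical stand-in for one server holding EC shards.
type shardLocation struct {
	name      string
	newShards int      // data shards this server would add to the decode
	deleted   []uint64 // needle ids recorded in this server's local journal
}

// collectDeletions returns the union of journal entries across locations.
// With skipIdle set it mimics the old behavior described in the message:
// servers with no new shards to contribute are ignored, and their
// deletions are lost.
func collectDeletions(locations []shardLocation, skipIdle bool) map[uint64]bool {
	merged := make(map[uint64]bool)
	for _, loc := range locations {
		if skipIdle && loc.newShards == 0 {
			continue
		}
		for _, id := range loc.deleted {
			merged[id] = true
		}
	}
	return merged
}

func main() {
	locations := []shardLocation{
		{name: "a", newShards: 10, deleted: []uint64{1}},
		{name: "b", newShards: 0, deleted: []uint64{2}},
	}
	fmt.Println(len(collectDeletions(locations, true)))  // 1: needle 2's deletion is lost
	fmt.Println(len(collectDeletions(locations, false))) // 2: full set of deletions
}
```

In this model the fix corresponds to always calling the helper with `skipIdle` set to false, so the merged set is what a later index rebuild would see.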
@@ -609,6 +609,17 @@ func (vs *VolumeServer) VolumeEcShardsToVolume(ctx context.Context, req *volume_
	}

	dataBaseFileName, indexBaseFileName := v.DataBaseFileName(), v.IndexBaseFileName()
	if !util.FileExists(indexBaseFileName + ".ecx") {
		indexBaseFileName = dataBaseFileName
	}

	// Merge .ecj deletions into .ecx so that HasLiveNeedles and FindDatFileSize
	// see the full set of deleted needles. Without this, needles deleted after the
	// last ecx rebuild would still appear live, causing the decoded .dat to include
	// data that should be skipped and HasLiveNeedles to return a false positive.
	if err := erasure_coding.RebuildEcxFile(indexBaseFileName); err != nil {
		return nil, fmt.Errorf("RebuildEcxFile %s: %v", indexBaseFileName, err)
	}

	// If the EC index contains no live entries, decoding should be a no-op:
	// just allow the caller to purge EC shards and do not generate an empty normal volume.
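
As a rough illustration of why the journal has to be merged before the liveness and size checks, here is a simplified model. It does not reproduce the real .ecx/.ecj formats or the behavior of `RebuildEcxFile`, `HasLiveNeedles`, or `FindDatFileSize`; the names `indexEntry`, `applyJournal`, `hasLive`, and `liveExtent` are invented for this sketch.

```go
package main

import "fmt"

// indexEntry is a hypothetical, simplified index record.
type indexEntry struct {
	id      uint64
	offset  int64
	size    int64
	deleted bool
}

// applyJournal marks every entry whose id appears in the journal as deleted.
func applyJournal(index []indexEntry, journal []uint64) {
	tombstones := make(map[uint64]bool, len(journal))
	for _, id := range journal {
		tombstones[id] = true
	}
	for i := range index {
		if tombstones[index[i].id] {
			index[i].deleted = true
		}
	}
}

// hasLive reports whether at least one entry is still live.
func hasLive(index []indexEntry) bool {
	for _, e := range index {
		if !e.deleted {
			return true
		}
	}
	return false
}

// liveExtent returns the end offset of the furthest live entry,
// which is how much data a decoded file would need to hold in this model.
func liveExtent(index []indexEntry) int64 {
	var end int64
	for _, e := range index {
		if !e.deleted && e.offset+e.size > end {
			end = e.offset + e.size
		}
	}
	return end
}

func main() {
	index := []indexEntry{
		{id: 1, offset: 0, size: 100},
		{id: 2, offset: 100, size: 50},
	}
	journal := []uint64{2}

	fmt.Println(hasLive(index), liveExtent(index)) // true 150 (journal not applied yet)
	applyJournal(index, journal)
	fmt.Println(hasLive(index), liveExtent(index)) // true 100 (deleted needle excluded)
}
```

If the journal named both ids, `hasLive` would return false only after `applyJournal` runs, which is the false-positive case the comment in the hunk describes.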
@@ -636,6 +647,29 @@ func (vs *VolumeServer) VolumeEcShardsToVolume(ctx context.Context, req *volume_
		return nil, fmt.Errorf("WriteIdxFileFromEcIndex %s: %v", v.IndexBaseFileName(), err)
	}

	var volumeLocation *storage.DiskLocation
	for _, location := range vs.store.Locations {
		if candidate, found := location.FindEcVolume(needle.VolumeId(req.VolumeId)); found && candidate == v {
			volumeLocation = location
			break
		}
	}
	if volumeLocation == nil {
		return nil, fmt.Errorf("ec volume %d location not found for offline compaction", req.VolumeId)
	}

	if err := vs.store.CompactVolumeFiles(
		needle.VolumeId(req.VolumeId),
		v.Collection,
		volumeLocation,
		vs.needleMapKind,
		vs.ldbTimout,
		0,
		vs.compactionBytePerSecond,
	); err != nil {
		glog.Errorf("CompactVolumeFiles %s: %v", dataBaseFileName, err)
	}

	return &volume_server_pb.VolumeEcShardsToVolumeResponse{}, nil
}