Adds volume.merge command with deduplication and disk-based backend (#8441)

* Enhance volume.merge command with deduplication and disk-based backend

* Fix copyVolume function call with correct argument order and missing bool parameter

* Revert "Fix copyVolume function call with correct argument order and missing bool parameter"

This reverts commit 7b4a190643576fec11f896b26bcad03dd02da2f7.

* Fix critical issues: per-replica writable tracking, tail goroutine cancellation via done channel, and debug logging for allocation failures
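The done-channel cancellation mentioned in this item is a standard Go pattern. The tail goroutine selects on a `done` channel alongside its input, so closing `done` stops it even while it is blocked. Below is a minimal sketch built around a hypothetical `tailNeedles` helper that forwards needle IDs. It is not the actual volume.merge code.

```go
package main

import "fmt"

// tailNeedles is a hypothetical stand-in for a goroutine that tails needles
// from a replica. It forwards IDs from src until src is closed or done is
// closed, and always closes the returned channel on exit.
func tailNeedles(done <-chan struct{}, src <-chan uint64) <-chan uint64 {
	out := make(chan uint64)
	go func() {
		defer close(out)
		for {
			select {
			case <-done:
				return // caller cancelled the tail
			case id, ok := <-src:
				if !ok {
					return // source exhausted
				}
				// Also watch done while sending, so a consumer that stopped
				// reading cannot leave this goroutine blocked forever.
				select {
				case out <- id:
				case <-done:
					return
				}
			}
		}
	}()
	return out
}

func main() {
	src := make(chan uint64, 3)
	for _, id := range []uint64{7, 8, 9} {
		src <- id
	}
	close(src)

	done := make(chan struct{})
	defer close(done) // stops the tailer if main returns early
	for id := range tailNeedles(done, src) {
		fmt.Println("tailed needle", id)
	}
}
```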

* Optimize memory usage with watermark approach for duplicate detection

* Fix critical issues: swap copyVolume arguments, increase idle timeout, remove file double-close, use glog for logging

* Replace temporary file with in-memory buffer for needle blob serialization

* test(volume.merge): Add comprehensive unit and integration tests

Add 7 unit tests covering:
- Ordering by timestamp
- Cross-stream duplicate deduplication
- Empty stream handling
- Complex multi-stream deduplication
- Single stream passthrough
- Large needle ID support
- LastModified fallback when timestamp unavailable

Add 2 integration validation tests:
- TestMergeWorkflowValidation: Documents the 9-stage merge workflow
- TestMergeEdgeCaseHandling: Validates handling of 10 edge cases

All tests passing (9/9)
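Two of the unit tests listed above cover ordering by timestamp and a LastModified fallback. One common way to order several already-sorted needle streams is a k-way merge over a min-heap. The sketch below rests on three assumptions: each stream is sorted, each needle carries a nanosecond append timestamp that is zero when unavailable, and each needle has a LastModified value in seconds. The `needle` type, its fields, and `mergeByTime` are hypothetical, not the command's actual types.

```go
package main

import (
	"container/heap"
	"fmt"
)

// needle is a hypothetical, simplified record; the real needle type differs.
type needle struct {
	ID           uint64
	AppendAtNs   uint64 // 0 when the timestamp is unavailable
	LastModified uint64 // seconds since the Unix epoch
}

// timestampNs prefers AppendAtNs and falls back to LastModified.
func timestampNs(n needle) uint64 {
	if n.AppendAtNs != 0 {
		return n.AppendAtNs
	}
	return n.LastModified * 1_000_000_000
}

// cursor tracks the next unread position in one input stream.
type cursor struct {
	stream int
	pos    int
}

// cursorHeap orders cursors by the timestamp of the needle they point at.
type cursorHeap struct {
	streams [][]needle
	items   []cursor
}

func (h *cursorHeap) Len() int { return len(h.items) }
func (h *cursorHeap) Less(i, j int) bool {
	a := h.streams[h.items[i].stream][h.items[i].pos]
	b := h.streams[h.items[j].stream][h.items[j].pos]
	if ta, tb := timestampNs(a), timestampNs(b); ta != tb {
		return ta < tb
	}
	return h.items[i].stream < h.items[j].stream // deterministic tie-break
}
func (h *cursorHeap) Swap(i, j int) { h.items[i], h.items[j] = h.items[j], h.items[i] }
func (h *cursorHeap) Push(x any)   { h.items = append(h.items, x.(cursor)) }
func (h *cursorHeap) Pop() any {
	old := h.items
	last := old[len(old)-1]
	h.items = old[:len(old)-1]
	return last
}

// mergeByTime merges individually sorted streams into one time-ordered slice.
func mergeByTime(streams [][]needle) []needle {
	h := &cursorHeap{streams: streams}
	for i, s := range streams {
		if len(s) > 0 {
			h.items = append(h.items, cursor{stream: i})
		}
	}
	heap.Init(h)
	var out []needle
	for h.Len() > 0 {
		c := h.items[0]
		out = append(out, streams[c.stream][c.pos])
		if c.pos+1 < len(streams[c.stream]) {
			h.items[0].pos++ // advance this stream and restore heap order
			heap.Fix(h, 0)
		} else {
			heap.Pop(h) // stream exhausted
		}
	}
	return out
}

func main() {
	a := []needle{{ID: 1, AppendAtNs: 10}, {ID: 3, AppendAtNs: 30}}
	b := []needle{{ID: 2, AppendAtNs: 20}, {ID: 4, LastModified: 1}} // falls back to 1s
	for _, n := range mergeByTime([][]needle{a, b}) {
		fmt.Println(n.ID, timestampNs(n))
	}
}
```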

* fix(volume.merge): Use time window for deduplication to handle clock skew

The same needle ID can have different timestamps on different servers due to
clock skew and replication lag. Needles with the same ID within a 5-second
time window are now treated as duplicates (same write with timestamp variance).

Key changes:
- Add mergeDeduplicationWindowNs constant (5 seconds)
- Replace exact timestamp matching with time window comparison
- Use windowInitialized flag to properly detect window transitions
- Add TestMergeNeedleStreamsTimeWindowDeduplication test

This ensures that replicated writes with slight timestamp differences are
properly deduplicated during merge, while separate updates to the same file
ID (outside the window) are preserved.

All tests passing (10/10)
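The time-window rule can be sketched as a single pass over the merged, time-ordered needles. A window opens at the first needle, a repeated needle ID inside the window is skipped, and the first needle beyond the window opens a new window with an empty ID set. This minimal sketch assumes the input is sorted by timestamp and that each window is anchored at its first needle. `record` and `dedupWithinWindow` are hypothetical names. The sketch keys only on the needle ID, as this item describes; the last item in this PR changes that to also track the originating stream.

```go
package main

import "fmt"

// windowNs mirrors the 5-second window described in the commit message.
const windowNs = uint64(5_000_000_000)

// record is a hypothetical merged needle: an ID plus a timestamp in ns.
type record struct {
	ID uint64
	Ns uint64
}

// dedupWithinWindow drops a needle when the same ID already appeared in the
// current window. Input must be sorted by Ns.
func dedupWithinWindow(in []record) []record {
	var (
		out               []record
		windowStart       uint64
		windowInitialized bool
		seen              = map[uint64]bool{}
	)
	for _, r := range in {
		// Sorted input guarantees r.Ns >= windowStart, so no underflow.
		if !windowInitialized || r.Ns-windowStart > windowNs {
			windowStart = r.Ns
			windowInitialized = true
			seen = map[uint64]bool{} // forget IDs from the previous window
		}
		if seen[r.ID] {
			continue // treated as the same write seen on another replica
		}
		seen[r.ID] = true
		out = append(out, r)
	}
	return out
}

func main() {
	in := []record{{1, 0}, {1, 2_000_000_000}, {2, 3_000_000_000}, {1, 9_000_000_000}}
	fmt.Println(dedupWithinWindow(in))
}
```

Anchoring the window at its first needle keeps the pass O(n) with a small map, at the cost that two copies just under 5 seconds apart can straddle a window boundary.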

* test: Add volume.merge integration tests with 5 comprehensive test cases

* test: integration tests for volume.merge command

* Fix integration tests: use TripleVolumeCluster for volume.merge testing

- Created new TripleVolumeCluster framework (cluster_triple.go) with 3 volume servers
- Rebuilt weed binary with volume.merge command compiled in
- Updated all 5 integration tests to use TripleVolumeCluster instead of DualVolumeCluster
- Tests now allocate volumes on 2 servers and let merge allocate on the 3rd
- All 5 integration tests now pass:
  - TestVolumeMergeBasic
  - TestVolumeMergeReadonly
  - TestVolumeMergeRestore
  - TestVolumeMergeTailNeedles
  - TestVolumeMergeDivergentReplicas

* Refactor test framework: use parameterized server count instead of hardcoded

- Renamed TripleVolumeCluster to MultiVolumeCluster with serverCount parameter
- Replaced hardcoded volumePort0/1/2 with slices for flexible server count
- Updated StartTripleVolumeCluster as backward-compatible wrapper calling StartMultiVolumeCluster(t, profile, 3)
- Made directory creation, port allocation, and server startup loop-based
- Updated accessor methods (VolumeAdminAddress, VolumeGRPCAddress, etc.) to support any server count
- All 5 integration tests continue to pass with new parameterized cluster framework
- Enables future testing with 2, 4, 5+ volume servers by calling StartMultiVolumeCluster directly
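Loop-based port allocation amounts to slicing one flat list of free ports into per-server groups. The sketch below follows the behavior of the original two-server cluster in this commit's diff: the public port reuses the admin port unless the profile sets `SplitPublicPort`. `serverPorts` and `layoutPorts` are hypothetical names.

```go
package main

import "fmt"

// serverPorts groups the ports one volume server listens on.
type serverPorts struct {
	Admin, GRPC, Public int
}

// layoutPorts carves per-server ports out of a flat, pre-allocated list.
// Each server takes an admin and a gRPC port, plus a dedicated public port
// only when splitPublic is true; otherwise Public reuses the admin port.
func layoutPorts(ports []int, serverCount int, splitPublic bool) ([]serverPorts, error) {
	perServer := 2
	if splitPublic {
		perServer = 3
	}
	if len(ports) < serverCount*perServer {
		return nil, fmt.Errorf("need %d ports, got %d", serverCount*perServer, len(ports))
	}
	out := make([]serverPorts, serverCount)
	for i := range out {
		base := i * perServer
		out[i] = serverPorts{Admin: ports[base], GRPC: ports[base+1], Public: ports[base]}
		if splitPublic {
			out[i].Public = ports[base+2]
		}
	}
	return out, nil
}

func main() {
	layout, err := layoutPorts([]int{9001, 9002, 9003, 9004, 9005, 9006}, 3, false)
	if err != nil {
		panic(err)
	}
	for i, p := range layout {
		fmt.Printf("volume%d: admin=%d grpc=%d public=%d\n", i, p.Admin, p.GRPC, p.Public)
	}
}
```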

* Consolidate cluster frameworks: StartDualVolumeCluster now uses MultiVolumeCluster

- Made DualVolumeCluster a type alias for MultiVolumeCluster
- Updated StartDualVolumeCluster to call StartMultiVolumeCluster(t, profile, 2)
- Removed duplicate code from cluster_dual.go (now just 17 lines)
- All existing tests using StartDualVolumeCluster continue to work without changes
- Backward compatible: existing code continues to use the old function signatures
- Added a StartTripleVolumeCluster wrapper function in cluster_multi.go
- Enables unified cluster management across all test suites
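`type DualVolumeCluster = MultiVolumeCluster` is a type alias, not a new defined type. Both names therefore denote the identical type: values pass between them without conversion, and methods declared on one are available through the other. Here is a minimal, self-contained illustration. The names are hypothetical stand-ins for the framework types.

```go
package main

import "fmt"

// multiCluster stands in for MultiVolumeCluster.
type multiCluster struct{ servers int }

func (c *multiCluster) ServerCount() int { return c.servers }

// dualCluster is an alias, so it is the same type as multiCluster.
type dualCluster = multiCluster

func startMulti(n int) *multiCluster { return &multiCluster{servers: n} }

// startDual keeps the old constructor signature by delegating.
func startDual() *dualCluster { return startMulti(2) }

func main() {
	var c *multiCluster = startDual() // assignable: the types are identical
	fmt.Println(c.ServerCount())
}
```

With `type dualCluster multiCluster` (no `=`), `dualCluster` would be a distinct type without `ServerCount`, and `startDual` would need an explicit conversion.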

* Address PR review comments: improve error handling and clean up code

- Return parse errors instead of silently swallowing them
- Log cleanup and restoration errors instead of silently discarding them
- Remove unused offset field from memoryBackendFile struct
- Fix WriteAt buffer truncation bug to preserve trailing bytes
- All unit tests passing (10/10)
- Code compiles successfully
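The in-memory backend replaces a temporary file, so its `WriteAt` must behave like a file's. Writing past the end grows the buffer, and writing inside it overwrites only the target range while preserving any trailing bytes. Below is a minimal sketch using a hypothetical `memFile` type. The actual `memoryBackendFile` is not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// memFile is a hypothetical in-memory backing store.
type memFile struct {
	buf []byte
}

// WriteAt writes p at off, growing the buffer if needed. Bytes beyond
// off+len(p) are left untouched rather than truncated.
func (m *memFile) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, errors.New("memFile.WriteAt: negative offset")
	}
	end := int(off) + len(p)
	if end > len(m.buf) {
		grown := make([]byte, end) // any gap is zero-filled, like a sparse file
		copy(grown, m.buf)
		m.buf = grown
	}
	copy(m.buf[off:end], p)
	return len(p), nil
}

// ReadAt reads from the current buffer, returning io.EOF on short reads.
func (m *memFile) ReadAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, errors.New("memFile.ReadAt: negative offset")
	}
	if off >= int64(len(m.buf)) {
		return 0, io.EOF
	}
	n := copy(p, m.buf[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}

// Compile-time checks that memFile satisfies the standard interfaces.
var (
	_ io.WriterAt = (*memFile)(nil)
	_ io.ReaderAt = (*memFile)(nil)
)

func main() {
	m := &memFile{}
	m.WriteAt([]byte("hello world"), 0)
	m.WriteAt([]byte("HELLO"), 0) // overwrite the prefix only
	fmt.Printf("%q\n", m.buf)
}
```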

* Fix PR review findings: test improvements and code quality

- Add timeout to runWeedShell to prevent hanging
- Add server 1 readonly status verification in tests
- Assert that merge fails when replicas are writable (instead of only logging the output)
- Replace sleep with polling for writable restoration check
- Fix WriteAt stale data snapshot bug in memoryBackendFile
- Fix startVolume error logging to show current server log
- Fix volumePubPorts double assignment in port allocation
- Rename test to reflect behavior: DoesNotDeduplicateAcrossWindows
- Fix misleading dedup window comment

Unit tests: 10/10 passing
Binary: Compiles successfully

* Fix test assumption: merge command marks volumes readonly automatically

TestVolumeMergeReadonly expected merge to fail on writable volumes, but the
merge command is designed to mark volumes readonly as part of its operation.
Fixed the test to verify that merge succeeds on writable volumes and restores
their writable state afterward. Removed the redundant Test 2 code that
duplicated the new behavior.

* fmt

* Fix deduplication logic to correctly handle same-stream vs cross-stream duplicates

The dedup map previously used only NeedleId as key, causing same-stream
overwrites to be incorrectly skipped as duplicates. Changed to track which
stream first processed each needle ID in the current window:

- Cross-stream duplicates (same ID from different streams, within window) are skipped
- Same-stream duplicates (overwrites from same stream) are kept
- Map now stores: needleId -> streamIndex of first occurrence in window

Added TestMergeNeedleStreamsSameStreamDuplicates to verify same-stream
overwrites are preserved while cross-stream duplicates are skipped.

All unit tests passing (11/11)
Binary compiles successfully
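The refined rule can be sketched by storing, for each needle ID seen in the current window, the index of the stream that produced it first. A repeat from another stream is treated as a replica copy and skipped. A repeat from the same stream is an overwrite and is kept. This minimal sketch assumes the input is already merged in timestamp order and uses a 5-second window anchored at its first needle. `tagged` and `dedupCrossStream` are hypothetical names.

```go
package main

import "fmt"

const dedupWindowNs = uint64(5_000_000_000) // 5-second window, per the commit message

// tagged is a hypothetical merged needle annotated with its source stream.
type tagged struct {
	ID     uint64
	Ns     uint64
	Stream int
}

// dedupCrossStream keeps same-stream repeats (overwrites) but drops a needle
// whose ID was first seen in a different stream within the current window.
func dedupCrossStream(in []tagged) []tagged {
	var (
		out         []tagged
		windowStart uint64
		initialized bool
		firstStream = map[uint64]int{} // needleId -> stream of first occurrence
	)
	for _, n := range in {
		if !initialized || n.Ns-windowStart > dedupWindowNs {
			windowStart, initialized = n.Ns, true
			firstStream = map[uint64]int{}
		}
		if s, ok := firstStream[n.ID]; ok {
			if s != n.Stream {
				continue // replica copy of a write already emitted
			}
		} else {
			firstStream[n.ID] = n.Stream
		}
		out = append(out, n)
	}
	return out
}

func main() {
	in := []tagged{
		{ID: 1, Ns: 0, Stream: 0},
		{ID: 1, Ns: 1_000_000_000, Stream: 1}, // cross-stream copy
		{ID: 1, Ns: 2_000_000_000, Stream: 0}, // same-stream overwrite
	}
	fmt.Println(dedupCrossStream(in))
}
```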
Chris Lu, 2026-02-25 10:12:09 -08:00 (committed by GitHub)
Commit b565a0cc86, parent da4edb5fe6
7 changed files with 1760 additions and 286 deletions

Binary file not shown.

Binary file not shown.

test/volume_server/framework/cluster_dual.go

@@ -1,297 +1,17 @@
 package framework
 import (
-	"fmt"
-	"net"
-	"os"
-	"os/exec"
-	"path/filepath"
-	"strconv"
-	"sync"
 	"testing"
 	"github.com/seaweedfs/seaweedfs/test/volume_server/matrix"
 )
-type DualVolumeCluster struct {
-	testingTB testing.TB
-	profile matrix.Profile
-	weedBinary string
-	baseDir string
-	configDir string
-	logsDir string
-	keepLogs bool
-	masterPort int
-	masterGrpcPort int
-	volumePort0 int
-	volumeGrpcPort0 int
-	volumePubPort0 int
-	volumePort1 int
-	volumeGrpcPort1 int
-	volumePubPort1 int
-	masterCmd *exec.Cmd
-	volumeCmd0 *exec.Cmd
-	volumeCmd1 *exec.Cmd
-	cleanupOnce sync.Once
-}
+// DualVolumeCluster is deprecated. Use MultiVolumeCluster instead.
+// For backward compatibility, it's a type alias for MultiVolumeCluster.
+type DualVolumeCluster = MultiVolumeCluster
+// StartDualVolumeCluster starts a cluster with 2 volume servers.
+// Deprecated: Use StartMultiVolumeCluster(t, profile, 2) directly.
 func StartDualVolumeCluster(t testing.TB, profile matrix.Profile) *DualVolumeCluster {
-	t.Helper()
-	weedBinary, err := FindOrBuildWeedBinary()
-	if err != nil {
-		t.Fatalf("resolve weed binary: %v", err)
-	}
-	baseDir, keepLogs, err := newWorkDir()
-	if err != nil {
-		t.Fatalf("create temp test directory: %v", err)
-	}
-	configDir := filepath.Join(baseDir, "config")
-	logsDir := filepath.Join(baseDir, "logs")
-	masterDataDir := filepath.Join(baseDir, "master")
-	volumeDataDir0 := filepath.Join(baseDir, "volume0")
-	volumeDataDir1 := filepath.Join(baseDir, "volume1")
-	for _, dir := range []string{configDir, logsDir, masterDataDir, volumeDataDir0, volumeDataDir1} {
-		if mkErr := os.MkdirAll(dir, 0o755); mkErr != nil {
-			t.Fatalf("create %s: %v", dir, mkErr)
-		}
-	}
-	if err = writeSecurityConfig(configDir, profile); err != nil {
-		t.Fatalf("write security config: %v", err)
-	}
-	masterPort, masterGrpcPort, err := allocateMasterPortPair()
-	if err != nil {
-		t.Fatalf("allocate master port pair: %v", err)
-	}
-	ports, err := allocatePorts(6)
-	if err != nil {
-		t.Fatalf("allocate volume ports: %v", err)
-	}
-	c := &DualVolumeCluster{
-		testingTB: t,
-		profile: profile,
-		weedBinary: weedBinary,
-		baseDir: baseDir,
-		configDir: configDir,
-		logsDir: logsDir,
-		keepLogs: keepLogs,
-		masterPort: masterPort,
-		masterGrpcPort: masterGrpcPort,
-		volumePort0: ports[0],
-		volumeGrpcPort0: ports[1],
-		volumePubPort0: ports[0],
-		volumePort1: ports[2],
-		volumeGrpcPort1: ports[3],
-		volumePubPort1: ports[2],
-	}
-	if profile.SplitPublicPort {
-		c.volumePubPort0 = ports[4]
-		c.volumePubPort1 = ports[5]
-	}
-	if err = c.startMaster(masterDataDir); err != nil {
-		c.Stop()
-		t.Fatalf("start master: %v", err)
-	}
-	if err = c.waitForHTTP(c.MasterURL() + "/dir/status"); err != nil {
-		masterLog := c.tailLog("master.log")
-		c.Stop()
-		t.Fatalf("wait for master readiness: %v\nmaster log tail:\n%s", err, masterLog)
-	}
-	if err = c.startVolume(0, volumeDataDir0); err != nil {
-		masterLog := c.tailLog("master.log")
-		c.Stop()
-		t.Fatalf("start first volume server: %v\nmaster log tail:\n%s", err, masterLog)
-	}
-	if err = c.waitForHTTP(c.VolumeAdminURL(0) + "/status"); err != nil {
-		volumeLog := c.tailLog("volume0.log")
-		c.Stop()
-		t.Fatalf("wait for first volume readiness: %v\nvolume log tail:\n%s", err, volumeLog)
-	}
-	if err = c.waitForTCP(c.VolumeGRPCAddress(0)); err != nil {
-		volumeLog := c.tailLog("volume0.log")
-		c.Stop()
-		t.Fatalf("wait for first volume grpc readiness: %v\nvolume log tail:\n%s", err, volumeLog)
-	}
-	if err = c.startVolume(1, volumeDataDir1); err != nil {
-		volumeLog := c.tailLog("volume0.log")
-		c.Stop()
-		t.Fatalf("start second volume server: %v\nfirst volume log tail:\n%s", err, volumeLog)
-	}
-	if err = c.waitForHTTP(c.VolumeAdminURL(1) + "/status"); err != nil {
-		volumeLog := c.tailLog("volume1.log")
-		c.Stop()
-		t.Fatalf("wait for second volume readiness: %v\nvolume log tail:\n%s", err, volumeLog)
-	}
-	if err = c.waitForTCP(c.VolumeGRPCAddress(1)); err != nil {
-		volumeLog := c.tailLog("volume1.log")
-		c.Stop()
-		t.Fatalf("wait for second volume grpc readiness: %v\nvolume log tail:\n%s", err, volumeLog)
-	}
-	t.Cleanup(func() {
-		c.Stop()
-	})
-	return c
-}
-func (c *DualVolumeCluster) Stop() {
-	if c == nil {
-		return
-	}
-	c.cleanupOnce.Do(func() {
-		stopProcess(c.volumeCmd1)
-		stopProcess(c.volumeCmd0)
-		stopProcess(c.masterCmd)
-		if !c.keepLogs && !c.testingTB.Failed() {
-			_ = os.RemoveAll(c.baseDir)
-		} else if c.baseDir != "" {
-			c.testingTB.Logf("volume server integration logs kept at %s", c.baseDir)
-		}
-	})
-}
-func (c *DualVolumeCluster) startMaster(dataDir string) error {
-	logFile, err := os.Create(filepath.Join(c.logsDir, "master.log"))
-	if err != nil {
-		return err
-	}
-	args := []string{
-		"-config_dir=" + c.configDir,
-		"master",
-		"-ip=127.0.0.1",
-		"-port=" + strconv.Itoa(c.masterPort),
-		"-port.grpc=" + strconv.Itoa(c.masterGrpcPort),
-		"-mdir=" + dataDir,
-		"-peers=none",
-		"-volumeSizeLimitMB=" + strconv.Itoa(testVolumeSizeLimitMB),
-		"-defaultReplication=000",
-	}
-	c.masterCmd = exec.Command(c.weedBinary, args...)
-	c.masterCmd.Dir = c.baseDir
-	c.masterCmd.Stdout = logFile
-	c.masterCmd.Stderr = logFile
-	return c.masterCmd.Start()
-}
-func (c *DualVolumeCluster) startVolume(index int, dataDir string) error {
-	logName := fmt.Sprintf("volume%d.log", index)
-	logFile, err := os.Create(filepath.Join(c.logsDir, logName))
-	if err != nil {
-		return err
-	}
-	volumePort := c.volumePort0
-	volumeGrpcPort := c.volumeGrpcPort0
-	volumePubPort := c.volumePubPort0
-	if index == 1 {
-		volumePort = c.volumePort1
-		volumeGrpcPort = c.volumeGrpcPort1
-		volumePubPort = c.volumePubPort1
-	}
-	args := []string{
-		"-config_dir=" + c.configDir,
-		"volume",
-		"-ip=127.0.0.1",
-		"-port=" + strconv.Itoa(volumePort),
-		"-port.grpc=" + strconv.Itoa(volumeGrpcPort),
-		"-port.public=" + strconv.Itoa(volumePubPort),
-		"-dir=" + dataDir,
-		"-max=16",
-		"-master=127.0.0.1:" + strconv.Itoa(c.masterPort),
-		"-readMode=" + c.profile.ReadMode,
-		"-concurrentUploadLimitMB=" + strconv.Itoa(c.profile.ConcurrentUploadLimitMB),
-		"-concurrentDownloadLimitMB=" + strconv.Itoa(c.profile.ConcurrentDownloadLimitMB),
-	}
-	if c.profile.InflightUploadTimeout > 0 {
-		args = append(args, "-inflightUploadDataTimeout="+c.profile.InflightUploadTimeout.String())
-	}
-	if c.profile.InflightDownloadTimeout > 0 {
-		args = append(args, "-inflightDownloadDataTimeout="+c.profile.InflightDownloadTimeout.String())
-	}
-	cmd := exec.Command(c.weedBinary, args...)
-	cmd.Dir = c.baseDir
-	cmd.Stdout = logFile
-	cmd.Stderr = logFile
-	if err = cmd.Start(); err != nil {
-		return err
-	}
-	if index == 1 {
-		c.volumeCmd1 = cmd
-	} else {
-		c.volumeCmd0 = cmd
-	}
-	return nil
-}
-func (c *DualVolumeCluster) waitForHTTP(url string) error {
-	return (&Cluster{}).waitForHTTP(url)
-}
-func (c *DualVolumeCluster) waitForTCP(addr string) error {
-	return (&Cluster{}).waitForTCP(addr)
-}
-func (c *DualVolumeCluster) tailLog(logName string) string {
-	return (&Cluster{logsDir: c.logsDir}).tailLog(logName)
-}
-func (c *DualVolumeCluster) MasterAddress() string {
-	return net.JoinHostPort("127.0.0.1", strconv.Itoa(c.masterPort))
-}
-func (c *DualVolumeCluster) MasterURL() string {
-	return "http://" + c.MasterAddress()
-}
-func (c *DualVolumeCluster) VolumeAdminAddress(index int) string {
-	if index == 1 {
-		return net.JoinHostPort("127.0.0.1", strconv.Itoa(c.volumePort1))
-	}
-	return net.JoinHostPort("127.0.0.1", strconv.Itoa(c.volumePort0))
-}
-func (c *DualVolumeCluster) VolumePublicAddress(index int) string {
-	if index == 1 {
-		return net.JoinHostPort("127.0.0.1", strconv.Itoa(c.volumePubPort1))
-	}
-	return net.JoinHostPort("127.0.0.1", strconv.Itoa(c.volumePubPort0))
-}
-func (c *DualVolumeCluster) VolumeGRPCAddress(index int) string {
-	if index == 1 {
-		return net.JoinHostPort("127.0.0.1", strconv.Itoa(c.volumeGrpcPort1))
-	}
-	return net.JoinHostPort("127.0.0.1", strconv.Itoa(c.volumeGrpcPort0))
-}
-func (c *DualVolumeCluster) VolumeAdminURL(index int) string {
-	return "http://" + c.VolumeAdminAddress(index)
-}
-func (c *DualVolumeCluster) VolumePublicURL(index int) string {
-	return "http://" + c.VolumePublicAddress(index)
-}
-func (c *DualVolumeCluster) BaseDir() string {
-	return c.baseDir
+	return StartMultiVolumeCluster(t, profile, 2)
 }

test/volume_server/framework/cluster_multi.go

@@ -0,0 +1,303 @@
package framework

import (
	"fmt"
	"net"
	"os"
	"os/exec"
	"path/filepath"
	"strconv"
	"sync"
	"testing"

	"github.com/seaweedfs/seaweedfs/test/volume_server/matrix"
)

type MultiVolumeCluster struct {
	testingTB testing.TB
	profile matrix.Profile
	weedBinary string
	baseDir string
	configDir string
	logsDir string
	keepLogs bool
	volumeServerCount int
	masterPort int
	masterGrpcPort int
	volumePorts []int
	volumeGrpcPorts []int
	volumePubPorts []int
	masterCmd *exec.Cmd
	volumeCmds []*exec.Cmd
	cleanupOnce sync.Once
}

// StartMultiVolumeCluster starts a cluster with a specified number of volume servers
func StartMultiVolumeCluster(t testing.TB, profile matrix.Profile, serverCount int) *MultiVolumeCluster {
	t.Helper()
	if serverCount < 1 {
		t.Fatalf("serverCount must be at least 1, got %d", serverCount)
	}
	weedBinary, err := FindOrBuildWeedBinary()
	if err != nil {
		t.Fatalf("resolve weed binary: %v", err)
	}
	baseDir, keepLogs, err := newWorkDir()
	if err != nil {
		t.Fatalf("create temp test directory: %v", err)
	}
	configDir := filepath.Join(baseDir, "config")
	logsDir := filepath.Join(baseDir, "logs")
	masterDataDir := filepath.Join(baseDir, "master")
	// Create directories for master and all volume servers
	dirs := []string{configDir, logsDir, masterDataDir}
	for i := 0; i < serverCount; i++ {
		dirs = append(dirs, filepath.Join(baseDir, fmt.Sprintf("volume%d", i)))
	}
	for _, dir := range dirs {
		if mkErr := os.MkdirAll(dir, 0o755); mkErr != nil {
			t.Fatalf("create %s: %v", dir, mkErr)
		}
	}
	if err = writeSecurityConfig(configDir, profile); err != nil {
		t.Fatalf("write security config: %v", err)
	}
	masterPort, masterGrpcPort, err := allocateMasterPortPair()
	if err != nil {
		t.Fatalf("allocate master port pair: %v", err)
	}
	// Allocate ports for all volume servers: an admin and a gRPC port per server,
	// plus a dedicated public port per server only when SplitPublicPort is set.
	portsPerServer := 2
	if profile.SplitPublicPort {
		portsPerServer = 3
	}
	totalPorts := serverCount * portsPerServer
	ports, err := allocatePorts(totalPorts)
	if err != nil {
		t.Fatalf("allocate volume ports: %v", err)
	}
	c := &MultiVolumeCluster{
		testingTB: t,
		profile: profile,
		weedBinary: weedBinary,
		baseDir: baseDir,
		configDir: configDir,
		logsDir: logsDir,
		keepLogs: keepLogs,
		volumeServerCount: serverCount,
		masterPort: masterPort,
		masterGrpcPort: masterGrpcPort,
		volumePorts: make([]int, serverCount),
		volumeGrpcPorts: make([]int, serverCount),
		volumePubPorts: make([]int, serverCount),
		volumeCmds: make([]*exec.Cmd, serverCount),
	}
	// Assign ports to each volume server
	for i := 0; i < serverCount; i++ {
		baseIdx := i * portsPerServer
		c.volumePorts[i] = ports[baseIdx]
		c.volumeGrpcPorts[i] = ports[baseIdx+1]
		if profile.SplitPublicPort {
			c.volumePubPorts[i] = ports[baseIdx+2]
		} else {
			// Without a split public port, public traffic shares the admin port.
			c.volumePubPorts[i] = ports[baseIdx]
		}
	}
	// Start master
	if err = c.startMaster(masterDataDir); err != nil {
		c.Stop()
		t.Fatalf("start master: %v", err)
	}
	if err = c.waitForHTTP(c.MasterURL() + "/dir/status"); err != nil {
		masterLog := c.tailLog("master.log")
		c.Stop()
		t.Fatalf("wait for master readiness: %v\nmaster log tail:\n%s", err, masterLog)
	}
	// Start all volume servers
	for i := 0; i < serverCount; i++ {
		volumeDataDir := filepath.Join(baseDir, fmt.Sprintf("volume%d", i))
		if err = c.startVolume(i, volumeDataDir); err != nil {
			// Log current server's log for debugging startup failures
			volumeLog := fmt.Sprintf("volume%d.log", i)
			c.Stop()
			t.Fatalf("start volume server %d: %v\nvolume log tail:\n%s", i, err, c.tailLog(volumeLog))
		}
		if err = c.waitForHTTP(c.VolumeAdminURL(i) + "/status"); err != nil {
			volumeLog := fmt.Sprintf("volume%d.log", i)
			c.Stop()
			t.Fatalf("wait for volume server %d readiness: %v\nvolume log tail:\n%s", i, err, c.tailLog(volumeLog))
		}
		if err = c.waitForTCP(c.VolumeGRPCAddress(i)); err != nil {
			volumeLog := fmt.Sprintf("volume%d.log", i)
			c.Stop()
			t.Fatalf("wait for volume server %d grpc readiness: %v\nvolume log tail:\n%s", i, err, c.tailLog(volumeLog))
		}
	}
	t.Cleanup(func() {
		c.Stop()
	})
	return c
}

// StartTripleVolumeCluster is a convenience wrapper that starts a cluster with 3 volume servers
func StartTripleVolumeCluster(t testing.TB, profile matrix.Profile) *MultiVolumeCluster {
	return StartMultiVolumeCluster(t, profile, 3)
}

func (c *MultiVolumeCluster) Stop() {
	if c == nil {
		return
	}
	c.cleanupOnce.Do(func() {
		// Stop volume servers in reverse order
		for i := len(c.volumeCmds) - 1; i >= 0; i-- {
			stopProcess(c.volumeCmds[i])
		}
		stopProcess(c.masterCmd)
		if !c.keepLogs && !c.testingTB.Failed() {
			_ = os.RemoveAll(c.baseDir)
		} else if c.baseDir != "" {
			c.testingTB.Logf("volume server integration logs kept at %s", c.baseDir)
		}
	})
}

func (c *MultiVolumeCluster) startMaster(dataDir string) error {
	logFile, err := os.Create(filepath.Join(c.logsDir, "master.log"))
	if err != nil {
		return err
	}
	args := []string{
		"-config_dir=" + c.configDir,
		"master",
		"-ip=127.0.0.1",
		"-port=" + strconv.Itoa(c.masterPort),
		"-port.grpc=" + strconv.Itoa(c.masterGrpcPort),
		"-mdir=" + dataDir,
		"-peers=none",
		"-volumeSizeLimitMB=" + strconv.Itoa(testVolumeSizeLimitMB),
		"-defaultReplication=000",
	}
	c.masterCmd = exec.Command(c.weedBinary, args...)
	c.masterCmd.Dir = c.baseDir
	c.masterCmd.Stdout = logFile
	c.masterCmd.Stderr = logFile
	return c.masterCmd.Start()
}

func (c *MultiVolumeCluster) startVolume(index int, dataDir string) error {
	logName := fmt.Sprintf("volume%d.log", index)
	logFile, err := os.Create(filepath.Join(c.logsDir, logName))
	if err != nil {
		return err
	}
	args := []string{
		"-config_dir=" + c.configDir,
		"volume",
		"-ip=127.0.0.1",
		"-port=" + strconv.Itoa(c.volumePorts[index]),
		"-port.grpc=" + strconv.Itoa(c.volumeGrpcPorts[index]),
		"-port.public=" + strconv.Itoa(c.volumePubPorts[index]),
		"-dir=" + dataDir,
		"-max=16",
		"-master=127.0.0.1:" + strconv.Itoa(c.masterPort),
		"-readMode=" + c.profile.ReadMode,
		"-concurrentUploadLimitMB=" + strconv.Itoa(c.profile.ConcurrentUploadLimitMB),
		"-concurrentDownloadLimitMB=" + strconv.Itoa(c.profile.ConcurrentDownloadLimitMB),
	}
	if c.profile.InflightUploadTimeout > 0 {
		args = append(args, "-inflightUploadDataTimeout="+c.profile.InflightUploadTimeout.String())
	}
	if c.profile.InflightDownloadTimeout > 0 {
		args = append(args, "-inflightDownloadDataTimeout="+c.profile.InflightDownloadTimeout.String())
	}
	cmd := exec.Command(c.weedBinary, args...)
	cmd.Dir = c.baseDir
	cmd.Stdout = logFile
	cmd.Stderr = logFile
	if err = cmd.Start(); err != nil {
		return err
	}
	c.volumeCmds[index] = cmd
	return nil
}

func (c *MultiVolumeCluster) waitForHTTP(url string) error {
	return (&Cluster{}).waitForHTTP(url)
}

func (c *MultiVolumeCluster) waitForTCP(addr string) error {
	return (&Cluster{}).waitForTCP(addr)
}

func (c *MultiVolumeCluster) tailLog(logName string) string {
	return (&Cluster{logsDir: c.logsDir}).tailLog(logName)
}

func (c *MultiVolumeCluster) MasterAddress() string {
	return net.JoinHostPort("127.0.0.1", strconv.Itoa(c.masterPort))
}

func (c *MultiVolumeCluster) MasterURL() string {
	return "http://" + c.MasterAddress()
}

func (c *MultiVolumeCluster) VolumeAdminAddress(index int) string {
	if index < 0 || index >= len(c.volumePorts) {
		return ""
	}
	return net.JoinHostPort("127.0.0.1", strconv.Itoa(c.volumePorts[index]))
}

func (c *MultiVolumeCluster) VolumePublicAddress(index int) string {
	if index < 0 || index >= len(c.volumePubPorts) {
		return ""
	}
	return net.JoinHostPort("127.0.0.1", strconv.Itoa(c.volumePubPorts[index]))
}

func (c *MultiVolumeCluster) VolumeGRPCAddress(index int) string {
	if index < 0 || index >= len(c.volumeGrpcPorts) {
		return ""
	}
	return net.JoinHostPort("127.0.0.1", strconv.Itoa(c.volumeGrpcPorts[index]))
}

func (c *MultiVolumeCluster) VolumeAdminURL(index int) string {
	return "http://" + c.VolumeAdminAddress(index)
}

func (c *MultiVolumeCluster) VolumePublicURL(index int) string {
	return "http://" + c.VolumePublicAddress(index)
}

func (c *MultiVolumeCluster) BaseDir() string {
	return c.baseDir
}


@@ -0,0 +1,454 @@
package volume_server_merge_test

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"testing"
	"time"

	"github.com/seaweedfs/seaweedfs/test/volume_server/framework"
	"github.com/seaweedfs/seaweedfs/test/volume_server/matrix"
	"github.com/seaweedfs/seaweedfs/weed/pb/volume_server_pb"
)

// runWeedShell executes a weed shell command by providing commands via stdin with lock/unlock.
// It uses a timeout to prevent hanging if the weed shell process becomes unresponsive.
func runWeedShell(t *testing.T, weedBinary, masterAddr, shellCommand string) (output string, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	cmd := exec.CommandContext(ctx, weedBinary, "shell", "-master="+masterAddr)
	// Wrap command in lock/unlock for cluster-wide operations
	shellCommands := "lock\n" + shellCommand + "\nunlock\nexit\n"
	cmd.Stdin = strings.NewReader(shellCommands)
	outputBytes, err := cmd.CombinedOutput()
	output = string(outputBytes)
	if err != nil {
		if ctx.Err() == context.DeadlineExceeded {
			t.Logf("weed shell command '%s' timed out after 30s", shellCommand)
		} else {
			t.Logf("weed shell command '%s' output: %s, error: %v", shellCommand, output, err)
		}
	}
	return output, err
}

// TestVolumeMergeBasic verifies the basic volume.merge workflow using the weed shell command
func TestVolumeMergeBasic(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping integration test in short mode")
	}
	// Start a triple cluster with 3 volume servers (needed for merge which allocates to a third location)
	cluster := framework.StartTripleVolumeCluster(t, matrix.P1())
	// Connect to volume servers to allocate volumes
	conn0, volumeClient0 := framework.DialVolumeServer(t, cluster.VolumeGRPCAddress(0))
	defer conn0.Close()
	conn1, volumeClient1 := framework.DialVolumeServer(t, cluster.VolumeGRPCAddress(1))
	defer conn1.Close()
	const volumeID = uint32(100)
	// Allocate volume on only 2 servers (replicas)
	// The merge command will allocate on the 3rd server as a temporary location
	framework.AllocateVolume(t, volumeClient0, volumeID, "")
	framework.AllocateVolume(t, volumeClient1, volumeID, "")
	t.Logf("Successfully allocated volume %d on servers 0 and 1 as replicas", volumeID)
	// Get weed binary
	weedBinary := os.Getenv("WEED_BINARY")
	if weedBinary == "" {
		var err error
		weedBinary, err = framework.FindOrBuildWeedBinary()
		if err != nil {
			t.Fatalf("failed to find weed binary: %v", err)
		}
	}
	// Execute volume.merge command via weed shell
	output, err := runWeedShell(t, weedBinary, cluster.MasterAddress(), fmt.Sprintf("volume.merge -volumeId %d", volumeID))
	t.Logf("volume.merge command output:\n%s", output)
	if err != nil {
		t.Fatalf("volume.merge command failed: %v\noutput: %s", err, output)
	}
	// Verify the success message in output
	if !strings.Contains(output, fmt.Sprintf("merged volume %d", volumeID)) {
		t.Fatalf("expected success message in output, got: %s", output)
	}
	t.Logf("Successfully executed volume.merge command for volume %d", volumeID)
}

// TestVolumeMergeReadonly verifies that volume.merge succeeds on writable replicas
// (the command marks them readonly itself) and restores their writable state afterward.
func TestVolumeMergeReadonly(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping integration test in short mode")
	}
	cluster := framework.StartTripleVolumeCluster(t, matrix.P1())
	// Connect to volume servers
	conn0, volumeClient0 := framework.DialVolumeServer(t, cluster.VolumeGRPCAddress(0))
	defer conn0.Close()
	conn1, volumeClient1 := framework.DialVolumeServer(t, cluster.VolumeGRPCAddress(1))
	defer conn1.Close()
	const volumeID = uint32(101)
	// Allocate volumes on only 2 servers (the merge will allocate on the 3rd)
	framework.AllocateVolume(t, volumeClient0, volumeID, "")
	framework.AllocateVolume(t, volumeClient1, volumeID, "")
	// Get weed binary
	weedBinary := os.Getenv("WEED_BINARY")
	if weedBinary == "" {
		var err error
		weedBinary, err = framework.FindOrBuildWeedBinary()
		if err != nil {
			t.Fatalf("failed to find weed binary: %v", err)
		}
	}
	// Merge while both replicas are still writable; the command marks them readonly itself.
	output, err := runWeedShell(t, weedBinary, cluster.MasterAddress(), fmt.Sprintf("volume.merge -volumeId %d", volumeID))
	if err != nil {
		t.Fatalf("volume.merge should succeed on writable volumes (it marks them readonly internally): %v\noutput: %s", err, output)
	}
	if !strings.Contains(output, fmt.Sprintf("merged volume %d", volumeID)) {
		t.Fatalf("expected success message in output, got: %s", output)
	}
	// Verify that both replicas are writable again after the merge.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	status0, err := volumeClient0.VolumeStatus(ctx, &volume_server_pb.VolumeStatusRequest{
		VolumeId: volumeID,
	})
	if err != nil {
		t.Fatalf("failed to get status from server 0 after merge: %v", err)
	}
	status1, err := volumeClient1.VolumeStatus(ctx, &volume_server_pb.VolumeStatusRequest{
		VolumeId: volumeID,
	})
	if err != nil {
		t.Fatalf("failed to get status from server 1 after merge: %v", err)
	}
	if status0.GetIsReadOnly() {
		t.Fatalf("expected volume to be writable again after merge on server 0")
	}
	if status1.GetIsReadOnly() {
		t.Fatalf("expected volume to be writable again after merge on server 1")
	}
	t.Logf("Successfully tested merge on writable volumes and writable restoration")
}

// TestVolumeMergeRestore verifies that merge restores writable state for originally-writable replicas
func TestVolumeMergeRestore(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping integration test in short mode")
	}
	cluster := framework.StartTripleVolumeCluster(t, matrix.P1())
	conn0, volumeClient0 := framework.DialVolumeServer(t, cluster.VolumeGRPCAddress(0))
	defer conn0.Close()
	conn1, volumeClient1 := framework.DialVolumeServer(t, cluster.VolumeGRPCAddress(1))
	defer conn1.Close()
	const volumeID = uint32(102)
	// Allocate volume on only 2 servers (the merge will allocate on the 3rd)
	framework.AllocateVolume(t, volumeClient0, volumeID, "")
	framework.AllocateVolume(t, volumeClient1, volumeID, "")
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	// Mark both as readonly
	_, err := volumeClient0.VolumeMarkReadonly(ctx, &volume_server_pb.VolumeMarkReadonlyRequest{
		VolumeId: volumeID,
		Persist: false,
	})
	if err != nil {
		t.Fatalf("failed to mark readonly: %v", err)
	}
	_, err = volumeClient1.VolumeMarkReadonly(ctx, &volume_server_pb.VolumeMarkReadonlyRequest{
		VolumeId: volumeID,
		Persist: false,
	})
	if err != nil {
		t.Fatalf("failed to mark readonly on server 1: %v", err)
	}
	// Get weed binary
	weedBinary := os.Getenv("WEED_BINARY")
	if weedBinary == "" {
		var err error
		weedBinary, err = framework.FindOrBuildWeedBinary()
		if err != nil {
			t.Fatalf("failed to find weed binary: %v", err)
		}
	}
	// Execute volume.merge via shell
	output, err := runWeedShell(t, weedBinary, cluster.MasterAddress(), fmt.Sprintf("volume.merge -volumeId %d", volumeID))
	t.Logf("volume.merge output: %s, error: %v", output, err)
	if err != nil {
		t.Fatalf("volume.merge failed: %v\noutput: %s", err, output)
	}
	if !strings.Contains(output, fmt.Sprintf("merged volume %d", volumeID)) {
		t.Fatalf("expected success message in output, got: %s", output)
	}
	// Both replicas started writable and were then marked readonly (Persist: false)
	// before the merge; the test expects both to be writable again afterward.
	// Poll for the writable state instead of sleeping for a fixed time.
	maxRetries := 50 // ~5s total with 100ms sleeps
	for retries := 0; retries < maxRetries; retries++ {
		status0, err := volumeClient0.VolumeStatus(ctx, &volume_server_pb.VolumeStatusRequest{
			VolumeId: volumeID,
		})
		if err == nil && !status0.GetIsReadOnly() {
			// Server 0 is writable, check server 1
			status1, err := volumeClient1.VolumeStatus(ctx, &volume_server_pb.VolumeStatusRequest{
				VolumeId: volumeID,
			})
			if err == nil && !status1.GetIsReadOnly() {
				// Both are writable, break out
				break
			}
		}
		if retries < maxRetries-1 {
			time.Sleep(100 * time.Millisecond)
		}
	}
	status0Final, err := volumeClient0.VolumeStatus(ctx, &volume_server_pb.VolumeStatusRequest{
		VolumeId: volumeID,
	})
	if err != nil {
		t.Fatalf("failed to get final status for server 0: %v", err)
	}
	status1Final, err := volumeClient1.VolumeStatus(ctx, &volume_server_pb.VolumeStatusRequest{
		VolumeId: volumeID,
	})
	if err != nil {
		t.Fatalf("failed to get final status for server 1: %v", err)
	}
	if status0Final.GetIsReadOnly() {
		t.Fatalf("expected volume %d to be writable on server 0 after merge, but it's still readonly", volumeID)
	}
	if status1Final.GetIsReadOnly() {
		t.Fatalf("expected volume %d to be writable on server 1 after merge, but it's still readonly", volumeID)
	}
	t.Logf("After merge - volume %d on server 0: readonly=%v, server 1: readonly=%v", volumeID, status0Final.GetIsReadOnly(), status1Final.GetIsReadOnly())
	t.Logf("Successfully tested merge and restore workflow for volume %d", volumeID)
}
// TestVolumeMergeTailNeedles verifies the volume.merge command with empty volumes
func TestVolumeMergeTailNeedles(t *testing.T) {
if testing.Short() {
t.Skip("skipping integration test in short mode")
}
cluster := framework.StartTripleVolumeCluster(t, matrix.P1())
conn0, volumeClient0 := framework.DialVolumeServer(t, cluster.VolumeGRPCAddress(0))
defer conn0.Close()
conn1, volumeClient1 := framework.DialVolumeServer(t, cluster.VolumeGRPCAddress(1))
defer conn1.Close()
const volumeID = uint32(200)
// Allocate empty volumes on only 2 servers (the merge will allocate on the 3rd)
framework.AllocateVolume(t, volumeClient0, volumeID, "")
framework.AllocateVolume(t, volumeClient1, volumeID, "")
// Mark both replicas readonly (not persisted), matching the merge precondition
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
_, err := volumeClient0.VolumeMarkReadonly(ctx, &volume_server_pb.VolumeMarkReadonlyRequest{
VolumeId: volumeID,
Persist: false,
})
if err != nil {
t.Fatalf("failed to mark readonly on server 0: %v", err)
}
_, err = volumeClient1.VolumeMarkReadonly(ctx, &volume_server_pb.VolumeMarkReadonlyRequest{
VolumeId: volumeID,
Persist: false,
})
if err != nil {
t.Fatalf("failed to mark readonly on server 1: %v", err)
}
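// Resolve the weed binary: prefer $WEED_BINARY, otherwise find or build it.
// err is already declared above, so assign to it instead of shadowing it.
weedBinary := os.Getenv("WEED_BINARY")
if weedBinary == "" {
weedBinary, err = framework.FindOrBuildWeedBinary()
if err != nil {
t.Fatalf("failed to find weed binary: %v", err)
}
}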
// Execute volume.merge command on empty volumes
output, err := runWeedShell(t, weedBinary, cluster.MasterAddress(), fmt.Sprintf("volume.merge -volumeId %d", volumeID))
t.Logf("merge empty volumes - output: %s, error: %v", output, err)
if err != nil {
t.Fatalf("volume.merge failed on empty volumes: %v\noutput: %s", err, output)
}
// Verify merge completed successfully
if !strings.Contains(output, fmt.Sprintf("merged volume %d", volumeID)) {
t.Fatalf("expected success message in output, got: %s", output)
}
t.Logf("Successfully merged empty volumes %d", volumeID)
}
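
// resolveBinaryPath is a hypothetical helper (not in the existing framework).
// It sketches the binary lookup that each merge test repeats: use the path in
// envVar when it is set and non-empty, otherwise defer to fallback. In these
// tests the arguments would presumably be "WEED_BINARY" and
// framework.FindOrBuildWeedBinary.
func resolveBinaryPath(envVar string, fallback func() (string, error)) (string, error) {
	if p := os.Getenv(envVar); p != "" {
		return p, nil
	}
	// Only build or search for the binary when no explicit path was given.
	return fallback()
}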
// TestVolumeMergeDivergentReplicas runs volume.merge through the weed shell on two replicas
// of the same volume, after checking they start writable and then marking them readonly
func TestVolumeMergeDivergentReplicas(t *testing.T) {
if testing.Short() {
t.Skip("skipping integration test in short mode")
}
cluster := framework.StartTripleVolumeCluster(t, matrix.P1())
// Connect to servers 0 and 1, which hold the replicas
conn0, volumeClient0 := framework.DialVolumeServer(t, cluster.VolumeGRPCAddress(0))
defer conn0.Close()
conn1, volumeClient1 := framework.DialVolumeServer(t, cluster.VolumeGRPCAddress(1))
defer conn1.Close()
const volumeID = uint32(201)
// Allocate the same volume on only 2 servers (the merge will allocate on the 3rd)
framework.AllocateVolume(t, volumeClient0, volumeID, "")
framework.AllocateVolume(t, volumeClient1, volumeID, "")
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
// Verify both volumes are initially writable
status0, err := volumeClient0.VolumeStatus(ctx, &volume_server_pb.VolumeStatusRequest{
VolumeId: volumeID,
})
if err != nil {
t.Fatalf("failed to get status for server 0: %v", err)
}
status1, err := volumeClient1.VolumeStatus(ctx, &volume_server_pb.VolumeStatusRequest{
VolumeId: volumeID,
})
if err != nil {
t.Fatalf("failed to get status for server 1: %v", err)
}
if status0.GetIsReadOnly() || status1.GetIsReadOnly() {
t.Fatalf("expected both volumes to be writable initially")
}
// Mark both as readonly to simulate merge precondition
_, err = volumeClient0.VolumeMarkReadonly(ctx, &volume_server_pb.VolumeMarkReadonlyRequest{
VolumeId: volumeID,
Persist: false,
})
if err != nil {
t.Fatalf("failed to mark readonly on server 0: %v", err)
}
_, err = volumeClient1.VolumeMarkReadonly(ctx, &volume_server_pb.VolumeMarkReadonlyRequest{
VolumeId: volumeID,
Persist: false,
})
if err != nil {
t.Fatalf("failed to mark readonly on server 1: %v", err)
}
// Verify server 0 is readonly
status0Again, err := volumeClient0.VolumeStatus(ctx, &volume_server_pb.VolumeStatusRequest{
VolumeId: volumeID,
})
if err != nil {
t.Fatalf("failed to get status on server 0 after readonly: %v", err)
}
if !status0Again.GetIsReadOnly() {
t.Fatalf("expected volume %d to be readonly on server 0", volumeID)
}
// Verify server 1 is readonly
status1Again, err := volumeClient1.VolumeStatus(ctx, &volume_server_pb.VolumeStatusRequest{
VolumeId: volumeID,
})
if err != nil {
t.Fatalf("failed to get status on server 1 after readonly: %v", err)
}
if !status1Again.GetIsReadOnly() {
t.Fatalf("expected volume %d to be readonly on server 1", volumeID)
}
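// Resolve the weed binary: prefer $WEED_BINARY, otherwise find or build it.
// err is already declared above, so assign to it instead of shadowing it.
weedBinary := os.Getenv("WEED_BINARY")
if weedBinary == "" {
weedBinary, err = framework.FindOrBuildWeedBinary()
if err != nil {
t.Fatalf("failed to find weed binary: %v", err)
}
}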
// Execute volume.merge command via shell
output, err := runWeedShell(t, weedBinary, cluster.MasterAddress(), fmt.Sprintf("volume.merge -volumeId %d", volumeID))
t.Logf("merge divergent replicas - output: %s, error: %v", output, err)
if err != nil {
t.Fatalf("volume.merge failed: %v\noutput: %s", err, output)
}
// Verify merge completed successfully
if !strings.Contains(output, fmt.Sprintf("merged volume %d", volumeID)) {
t.Fatalf("expected success message in output, got: %s", output)
}
t.Logf("Successfully merged divergent replicas for volume %d using shell command", volumeID)
}
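
// mergeOutputReportsSuccess is a hypothetical helper. It sketches a stricter
// form of the success check both merge tests perform. Like those assertions,
// it assumes volume.merge prints "merged volume <id>" on success. A plain
// strings.Contains would also accept "merged volume 2010" when checking volume
// 201, so this version rejects matches that are followed by another digit.
func mergeOutputReportsSuccess(output string, volumeID uint32) bool {
	marker := fmt.Sprintf("merged volume %d", volumeID)
	for rest := output; ; {
		i := strings.Index(rest, marker)
		if i < 0 {
			return false
		}
		end := i + len(marker)
		// Accept only if the marker ends the output or is not followed by a digit.
		if end == len(rest) || rest[end] < '0' || rest[end] > '9' {
			return true
		}
		rest = rest[end:]
	}
}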