Files

Chris Lu 5f85bf5e8a Batch volume balance: run multiple moves per job (#8561 )

* proto: add BalanceMoveSpec and batch fields to BalanceTaskParams

Add BalanceMoveSpec message for encoding individual volume moves,
and max_concurrent_moves + repeated moves fields to BalanceTaskParams
to support batching multiple volume moves in a single job.

* balance handler: add batch execution with concurrent volume moves

Refactor Execute() into executeSingleMove() (backward compatible) and
executeBatchMoves() which runs multiple volume moves concurrently using
a semaphore-bounded goroutine pool. When BalanceTaskParams.Moves is
populated, the batch path is taken; otherwise the single-move path.

Includes aggregate progress reporting across concurrent moves,
per-move error collection, and partial failure support.

* balance handler: add batch config fields to Descriptor and worker config

Add max_concurrent_moves and batch_size fields to the worker config
form and deriveBalanceWorkerConfig(). These control how many volume
moves run concurrently within a batch job and the maximum batch size.

* balance handler: group detection proposals into batch jobs

When batch_size > 1, the Detect method groups detection results into
batch proposals where each proposal encodes multiple BalanceMoveSpec
entries in BalanceTaskParams.Moves. Single-result batches fall back
to the existing single-move proposal format for backward compatibility.

* admin UI: add volume balance execution plan and batch badge

Add renderBalanceExecutionPlan() for rich rendering of volume balance
jobs in the job detail modal. Single-move jobs show source/target/volume
info; batch jobs show a moves table with all volume moves.

Add batch badge (e.g., "5 moves") next to job type in the execution
jobs table when the job has batch=true label.

* Update plugin_templ.go

* fix: detection algorithm uses greedy target instead of divergent topology scores

The detection loop tracked effective volume counts via an adjustments map,
but createBalanceTask independently called planBalanceDestination which used
the topology's LoadCount — a separate, unadjusted source of truth. This
divergence caused multiple moves to pile onto the same server.

Changes:
- Add resolveBalanceDestination to resolve the detection loop's greedy
  target (minServer) rather than independently picking a destination
- Add oscillation guard: stop when max-min <= 1 since no single move
  can improve the balance beyond that point
- Track unseeded destinations: if a target server wasn't in the initial
  serverVolumeCounts, add it so subsequent iterations include it
- Add TestDetection_UnseededDestinationDoesNotOverload

* fix: handler force_move propagation, partial failure, deterministic dedupe

- Propagate ForceMove from outer BalanceTaskParams to individual move
  TaskParams so batch moves respect the force_move flag
- Fix partial failure: mark job successful if at least one move
  succeeded (succeeded > 0 || failed == 0) to avoid re-running
  already-completed moves on retry
- Use SHA-256 hash for deterministic dedupe key fallback instead of
  time.Now().UnixNano() which is non-deterministic
- Remove unused successDetails variable
- Extract maxProposalStringLength constant to replace magic number 200

* admin UI: use template literals in balance execution plan rendering

* fix: integration test handles batch proposals from batched detection

With batch_size=20, all moves are grouped into a single proposal
containing BalanceParams.Moves instead of top-level Sources/Targets.
Update assertions to handle both batch and single-move proposal formats.

* fix: verify volume size on target before deleting source during balance

Add a pre-delete safety check that reads the volume file status on both
source and target, then compares .dat file size and file count. If they
don't match, the move is aborted — leaving the source intact rather than
risking irreversible data loss.

Also removes the redundant mountVolume call since VolumeCopy already
mounts the volume on the target server.

* fix: clamp maxConcurrent, serialize progress sends, validate config as int64

- Clamp maxConcurrentMoves to defaultMaxConcurrentMoves before creating
  the semaphore so a stale or malicious job cannot request unbounded
  concurrent volume moves
- Extend progressMu to cover sender.SendProgress calls since the
  underlying gRPC stream is not safe for concurrent writes
- Perform bounds checks on max_concurrent_moves and batch_size in int64
  space before casting to int, avoiding potential overflow on 32-bit

* fix: check disk capacity in resolveBalanceDestination

Skip disks where VolumeCount >= MaxVolumeCount so the detection loop
does not propose moves to a full disk that would fail at execution time.

* test: rename unseeded destination test to match actual behavior

The test exercises a server with 0 volumes that IS seeded from topology
(matching disk type), not an unseeded destination. Rename to
TestDetection_ZeroVolumeServerIncludedInBalance and fix comments.

* test: tighten integration test to assert exactly one batch proposal

With default batch_size=20, all moves should be grouped into a single
batch proposal. Assert len(proposals)==1 and require BalanceParams with
Moves, removing the legacy single-move else branch.

* fix: propagate ctx to RPCs and restore source writability on abort

- All helper methods (markVolumeReadonly, copyVolume, tailVolume,
  readVolumeFileStatus, deleteVolume) now accept a context parameter
  instead of using context.Background(), so Execute's ctx propagates
  cancellation and timeouts into every volume server RPC
- Add deferred cleanup that restores the source volume to writable if
  any step after markVolumeReadonly fails, preventing the source from
  being left permanently readonly on abort
- Add markVolumeWritable helper using VolumeMarkWritableRequest

* fix: deep-copy protobuf messages in test recording sender

Use proto.Clone in recordingExecutionSender to store immutable snapshots
of JobProgressUpdate and JobCompleted, preventing assertions from
observing mutations if the handler reuses message pointers.

* fix: add VolumeMarkWritable and ReadVolumeFileStatus to fake volume server

The balance task now calls ReadVolumeFileStatus for pre-delete
verification and VolumeMarkWritable to restore writability on abort.
Add both RPCs to the test fake, and drop the mountCalls assertion since
BalanceTask no longer calls VolumeMount directly (VolumeCopy handles it).

* fix: use maxConcurrentMovesLimit (50) for clamp, not defaultMaxConcurrentMoves

defaultMaxConcurrentMoves (5) is the fallback when the field is unset,
not an upper bound. Clamping to it silently overrides valid config
values like 10/20/50. Introduce maxConcurrentMovesLimit (50) matching
the descriptor's MaxValue and clamp to that instead.

* fix: cancel batch moves on progress stream failure

Derive a cancellable batchCtx from the caller's ctx. If
sender.SendProgress returns an error (client disconnect, context
cancelled), capture it, skip further sends, and cancel batchCtx so
in-flight moves abort via their propagated context rather than running
blind to completion.

* fix: bound cleanup timeout and validate batch move fields

- Use a 30-second timeout for the deferred markVolumeWritable cleanup
  instead of context.Background() which can block indefinitely if the
  volume server is unreachable
- Validate required fields (VolumeID, SourceNode, TargetNode) before
  appending moves to a batch proposal, skipping invalid entries
- Fall back to a single-move proposal when filtering leaves only one
  valid move in a batch

* fix: cancel task execution on SendProgress stream failure

All handler progress callbacks previously ignored SendProgress errors,
allowing tasks to continue executing after the client disconnected.
Now each handler creates a derived cancellable context and cancels it
on the first SendProgress error, stopping the in-flight task promptly.

Handlers fixed: erasure_coding, vacuum, volume_balance (single-move),
and admin_script (breaks command loop on send failure).

* fix: validate batch moves before scheduling in executeBatchMoves

Reject empty batches, enforce a hard upper bound (100 moves), and
filter out nil or incomplete move specs (missing source/target/volume)
before allocating progress tracking and launching goroutines.

* test: add batch balance execution integration test

Tests the batch move path with 3 volumes, max concurrency 2, using
fake volume servers. Verifies all moves complete with correct readonly,
copy, tail, and delete RPC counts.

* test: add MarkWritableCount and ReadFileStatusCount accessors

Expose the markWritableCalls and readFileStatusCalls counters on the
fake volume server, following the existing MarkReadonlyCount pattern.

* fix: oscillation guard uses global effective counts for heterogeneous capacity

The oscillation guard (max-min <= 1) previously used maxServer/minServer
which are determined by utilization ratio. With heterogeneous capacity,
maxServer by utilization can have fewer raw volumes than minServer,
producing a negative diff and incorrectly triggering the guard.

Now scans all servers' effective counts to find the true global max/min
volume counts, so the guard works correctly regardless of whether
utilization-based or raw-count balancing is used.

* fix: admin script handler breaks outer loop on SendProgress failure

The break on SendProgress error inside the shell.Commands scan only
exited the inner loop, letting the outer command loop continue
executing commands on a broken stream. Use a sendBroken flag to
propagate the break to the outer execCommands loop.

2026-03-09 19:30:08 -07:00

config

Fix maintenance worker panic and add EC integration tests (#8068 )

2026-01-20 15:07:43 -08:00

dash

fix: paginate bucket listing in Admin UI to show all buckets (#8585 )

2026-03-09 18:55:47 -07:00

handlers

fix: paginate bucket listing in Admin UI to show all buckets (#8585 )

2026-03-09 18:55:47 -07:00

internal/httputil

Admin UI: replace gin with mux (#8420 )

2026-02-23 19:11:17 -08:00

maintenance

reduce logs

2026-03-09 12:14:25 -07:00

plugin

admin: expose per-job-type detection interval in plugin UI (#8552 )

2026-03-08 14:03:51 -07:00

static

admin: fix access key creation UX (#8579 )

2026-03-09 14:03:41 -07:00

topology

fix: volume balance detection returns multiple tasks per run (#8559 )

2026-03-08 21:34:03 -07:00

view

Batch volume balance: run multiple moves per job (#8561 )

2026-03-09 19:30:08 -07:00

Makefile

consistent template generation

2026-02-22 13:34:06 -08:00

README.md

Add s3tables shell and admin UI (#8172 )

2026-01-30 22:57:05 -08:00

static_embed.go

Admin UI: Add message queue to admin UI (#6958 )

2025-07-11 10:19:27 -07:00

README.md

SeaweedFS Admin Component

A modern web-based administration interface for SeaweedFS clusters built with Go, Gin, Templ, and Bootstrap.

Features

Dashboard: Real-time cluster status and metrics
Master Management: Monitor master nodes and leadership status
Volume Server Management: View volume servers, capacity, and health
Object Store Bucket Management: Create, delete, and manage Object Store buckets with web interface
S3 Tables Management: Manage table buckets, namespaces, tables, tags, and policies via the admin UI
System Health: Overall cluster health monitoring
Responsive Design: Bootstrap-based UI that works on all devices
Authentication: Optional user authentication with sessions
TLS Support: HTTPS support for production deployments

Building

Using the Admin Makefile

The admin component has its own Makefile for development and building:

# Navigate to admin directory
cd weed/admin

# View all available targets
make help

# Generate templates and build
make build

# Development mode with template watching
make dev

# Run the admin server
make run

# Clean build artifacts
make clean

Using the Root Makefile

The root SeaweedFS Makefile automatically integrates the admin component:

# From the root directory
make install          # Builds weed with admin component
make full_install     # Full build with all tags
make test            # Runs tests including admin component

# Admin-specific targets from root
make admin-generate  # Generate admin templates
make admin-build     # Build admin component
make admin-run       # Run admin server
make admin-dev       # Development mode
make admin-clean     # Clean admin artifacts

Manual Building

If you prefer to build manually:

# Install templ compiler
go install github.com/a-h/templ/cmd/templ@latest

# Generate templates
templ generate

# Build the main weed binary
cd ../../../
go build -o weed ./weed

Development

Template Development

The admin interface uses Templ for type-safe HTML templates:

# Watch for template changes and auto-regenerate
make watch

# Or manually generate templates
make generate

# Format templates
make fmt

File Structure

weed/admin/
├── Makefile              # Admin-specific build tasks
├── README.md             # This file
├── admin.go              # Main application entry point
├── dash/                 # Server and handler logic
│   ├── admin_server.go   # HTTP server setup
│   ├── handler_admin.go  # Admin dashboard handlers
│   ├── handler_auth.go   # Authentication handlers
│   └── middleware.go     # HTTP middleware
├── static/               # Static assets
│   ├── css/admin.css     # Admin-specific styles
│   └── js/admin.js       # Admin-specific JavaScript
└── view/                 # Templates
    ├── app/              # Application templates
    │   ├── admin.templ   # Main dashboard template
    │   ├── s3_buckets.templ # Object Store bucket management template
    │   ├── s3tables_*.templ # S3 Tables management templates
    │   └── *_templ.go    # Generated Go code
    └── layout/           # Layout templates
        ├── layout.templ  # Base layout template
        └── layout_templ.go # Generated Go code

Object Store Management

The admin interface includes Object Store and S3 Tables management capabilities:

Create/delete Object Store buckets and adjust quotas or ownership.
Manage S3 Tables buckets, namespaces, and tables.
Update S3 Tables policies and tags via the UI and API endpoints.

Usage

Basic Usage

# Start admin interface on default port (23646)
weed admin

# Start with custom configuration
weed admin -port=8080 -masters="master1:9333,master2:9333"

# Start with authentication
weed admin -adminUser=admin -adminPassword=secret123

# Start with HTTPS
weed admin -port=443 -tlsCert=/path/to/cert.pem -tlsKey=/path/to/key.pem

Configuration Options

Option	Default	Description
`-port`	23646	Admin server port
`-masters`	localhost:9333	Comma-separated master servers
`-adminUser`	admin	Admin username (if auth enabled)
`-adminPassword`	""	Admin password (empty = no auth)
`-tlsCert`	""	Path to TLS certificate
`-tlsKey`	""	Path to TLS private key

Docker Usage

# Build Docker image with admin component
make docker-build

# Run with Docker
docker run -p 23646:23646 seaweedfs/seaweedfs:latest admin -masters=host.docker.internal:9333

Development Workflow

Quick Start

# Clone and setup
git clone <seaweedfs-repo>
cd seaweedfs/weed/admin

# Install dependencies and build
make install-deps
make build

# Start development server
make dev

Making Changes

Template Changes: Edit .templ files in view/
- Templates auto-regenerate in development mode
- Use make generate to manually regenerate
Go Code Changes: Edit .go files
- Restart the server to see changes
- Use make build to rebuild
Static Assets: Edit files in static/
- Changes are served immediately

Testing

# Run admin component tests
make test

# Run from root directory
make admin-test

# Lint code
make lint

# Format code
make fmt

Production Deployment

Security Considerations

Authentication: Always set adminPassword in production
HTTPS: Use TLS certificates for encrypted connections
Firewall: Restrict admin interface access to authorized networks

Example Production Setup

# Production deployment with security
weed admin \
  -port=443 \
  -masters="master1:9333,master2:9333,master3:9333" \
  -adminUser=admin \
  -adminPassword=your-secure-password \
  -tlsCert=/etc/ssl/certs/admin.crt \
  -tlsKey=/etc/ssl/private/admin.key

Monitoring

The admin interface provides endpoints for monitoring:

GET /health - Health check endpoint
GET /metrics - Prometheus metrics (if enabled)
GET /api/status - JSON status information

Troubleshooting

Common Issues

Templates not found: Run make generate to create template files
Build errors: Ensure templ is installed with make install-templ
Static files not loading: Check that static/ directory exists and has proper files
Connection errors: Verify master and filer addresses are correct

Debug Mode

# Enable debug logging
weed -v=2 admin

# Check generated templates
ls -la view/app/*_templ.go view/layout/*_templ.go

Contributing

Fork the repository
Create a feature branch
Make your changes
Run tests: make test
Format code: make fmt
Submit a pull request

Architecture

The admin component follows a clean architecture:

Presentation Layer: Templ templates + Bootstrap CSS
HTTP Layer: Gin router with middleware
Business Logic: Handler functions in dash/ package
Data Layer: Communicates with SeaweedFS masters and filers

This separation makes the code maintainable and testable.