Admin: misc improvements on admin server and workers. EC now works. (#7055)

* initial design

* added simulation as tests

* reorganized the codebase to move the simulation framework and tests into their own dedicated package

* integration test. ec worker task

* remove "enhanced" reference

* start master, volume servers, filer

Current Status
 Master: Healthy and running (port 9333)
 Filer: Healthy and running (port 8888)
 Volume Servers: All 6 servers running (ports 8080-8085)
🔄 Admin/Workers: Will start when dependencies are ready

* generate write load

* tasks are assigned

* admin start wtih grpc port. worker has its own working directory

* Update .gitignore

* working worker and admin. Task detection is not working yet.

* compiles, detection uses volumeSizeLimitMB from master

* compiles

* worker retries connecting to admin

* build and restart

* rendering pending tasks

* skip task ID column

* sticky worker id

* test canScheduleTaskNow

* worker reconnect to admin

* clean up logs

* worker register itself first

* worker can run ec work and report status

but:
1. one volume should not be repeatedly worked on.
2. ec shards needs to be distributed and source data should be deleted.

* move ec task logic

* listing ec shards

* local copy, ec. Need to distribute.

* ec is mostly working now

* distribution of ec shards needs improvement
* need configuration to enable ec

* show ec volumes

* interval field UI component

* rename

* integration test with vauuming

* garbage percentage threshold

* fix warning

* display ec shard sizes

* fix ec volumes list

* Update ui.go

* show default values

* ensure correct default value

* MaintenanceConfig use ConfigField

* use schema defined defaults

* config

* reduce duplication

* refactor to use BaseUIProvider

* each task register its schema

* checkECEncodingCandidate use ecDetector

* use vacuumDetector

* use volumeSizeLimitMB

* remove

remove

* remove unused

* refactor

* use new framework

* remove v2 reference

* refactor

* left menu can scroll now

* The maintenance manager was not being initialized when no data directory was configured for persistent storage.

* saving config

* Update task_config_schema_templ.go

* enable/disable tasks

* protobuf encoded task configurations

* fix system settings

* use ui component

* remove logs

* interface{} Reduction

* reduce interface{}

* reduce interface{}

* avoid from/to map

* reduce interface{}

* refactor

* keep it DRY

* added logging

* debug messages

* debug level

* debug

* show the log caller line

* use configured task policy

* log level

* handle admin heartbeat response

* Update worker.go

* fix EC rack and dc count

* Report task status to admin server

* fix task logging, simplify interface checking, use erasure_coding constants

* factor in empty volume server during task planning

* volume.list adds disk id

* track disk id also

* fix locking scheduled and manual scanning

* add active topology

* simplify task detector

* ec task completed, but shards are not showing up

* implement ec in ec_typed.go

* adjust log level

* dedup

* implementing ec copying shards and only ecx files

* use disk id when distributing ec shards

🎯 Planning: ActiveTopology creates DestinationPlan with specific TargetDisk
📦 Task Creation: maintenance_integration.go creates ECDestination with DiskId
🚀 Task Execution: EC task passes DiskId in VolumeEcShardsCopyRequest
💾 Volume Server: Receives disk_id and stores shards on specific disk (vs.store.Locations[req.DiskId])
📂 File System: EC shards and metadata land in the exact disk directory planned

* Delete original volume from all locations

* clean up existing shard locations

* local encoding and distributing

* Update docker/admin_integration/EC-TESTING-README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* check volume id range

* simplify

* fix tests

* fix types

* clean up logs and tests

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
This commit is contained in:
Chris Lu
2025-07-30 12:38:03 -07:00
committed by GitHub
parent 64198dad83
commit 891a2fb6eb
130 changed files with 27737 additions and 4429 deletions

View File

@@ -70,43 +70,51 @@ templ MaintenanceQueue(data *maintenance.MaintenanceQueueData) {
</div>
</div>
<!-- Simple task queue display -->
<div class="row">
<!-- Pending Tasks -->
<div class="row mb-4">
<div class="col-12">
<div class="card">
<div class="card-header">
<h5 class="mb-0">Task Queue</h5>
<div class="card-header bg-primary text-white">
<h5 class="mb-0">
<i class="fas fa-clock me-2"></i>
Pending Tasks
</h5>
</div>
<div class="card-body">
if len(data.Tasks) == 0 {
if data.Stats.PendingTasks == 0 {
<div class="text-center text-muted py-4">
<i class="fas fa-clipboard-list fa-3x mb-3"></i>
<p>No maintenance tasks in queue</p>
<small>Tasks will appear here when the system detects maintenance needs</small>
<p>No pending maintenance tasks</p>
<small>Pending tasks will appear here when the system detects maintenance needs</small>
</div>
} else {
<div class="table-responsive">
<table class="table table-hover">
<thead>
<tr>
<th>ID</th>
<th>Type</th>
<th>Status</th>
<th>Priority</th>
<th>Volume</th>
<th>Server</th>
<th>Reason</th>
<th>Created</th>
</tr>
</thead>
<tbody>
for _, task := range data.Tasks {
<tr>
<td><code>{task.ID[:8]}...</code></td>
<td>{string(task.Type)}</td>
<td>{string(task.Status)}</td>
<td>{fmt.Sprintf("%d", task.VolumeID)}</td>
<td>{task.Server}</td>
<td>{task.CreatedAt.Format("2006-01-02 15:04")}</td>
</tr>
if string(task.Status) == "pending" {
<tr>
<td>
@TaskTypeIcon(task.Type)
{string(task.Type)}
</td>
<td>@PriorityBadge(task.Priority)</td>
<td>{fmt.Sprintf("%d", task.VolumeID)}</td>
<td><small>{task.Server}</small></td>
<td><small>{task.Reason}</small></td>
<td>{task.CreatedAt.Format("2006-01-02 15:04")}</td>
</tr>
}
}
</tbody>
</table>
@@ -117,36 +125,171 @@ templ MaintenanceQueue(data *maintenance.MaintenanceQueueData) {
</div>
</div>
<!-- Workers Summary -->
<div class="row mt-4">
<!-- Active Tasks -->
<div class="row mb-4">
<div class="col-12">
<div class="card">
<div class="card-header">
<h5 class="mb-0">Active Workers</h5>
<div class="card-header bg-warning text-dark">
<h5 class="mb-0">
<i class="fas fa-running me-2"></i>
Active Tasks
</h5>
</div>
<div class="card-body">
if len(data.Workers) == 0 {
if data.Stats.RunningTasks == 0 {
<div class="text-center text-muted py-4">
<i class="fas fa-robot fa-3x mb-3"></i>
<p>No workers are currently active</p>
<small>Start workers using: <code>weed worker -admin=localhost:9333</code></small>
<i class="fas fa-tasks fa-3x mb-3"></i>
<p>No active maintenance tasks</p>
<small>Active tasks will appear here when workers start processing them</small>
</div>
} else {
<div class="row">
for _, worker := range data.Workers {
<div class="col-md-4 mb-3">
<div class="card">
<div class="card-body">
<h6 class="card-title">{worker.ID}</h6>
<p class="card-text">
<small class="text-muted">{worker.Address}</small><br/>
Status: {worker.Status}<br/>
Load: {fmt.Sprintf("%d/%d", worker.CurrentLoad, worker.MaxConcurrent)}
</p>
</div>
</div>
</div>
}
<div class="table-responsive">
<table class="table table-hover">
<thead>
<tr>
<th>Type</th>
<th>Status</th>
<th>Progress</th>
<th>Volume</th>
<th>Worker</th>
<th>Started</th>
</tr>
</thead>
<tbody>
for _, task := range data.Tasks {
if string(task.Status) == "assigned" || string(task.Status) == "in_progress" {
<tr>
<td>
@TaskTypeIcon(task.Type)
{string(task.Type)}
</td>
<td>@StatusBadge(task.Status)</td>
<td>@ProgressBar(task.Progress, task.Status)</td>
<td>{fmt.Sprintf("%d", task.VolumeID)}</td>
<td>
if task.WorkerID != "" {
<small>{task.WorkerID}</small>
} else {
<span class="text-muted">-</span>
}
</td>
<td>
if task.StartedAt != nil {
{task.StartedAt.Format("2006-01-02 15:04")}
} else {
<span class="text-muted">-</span>
}
</td>
</tr>
}
}
</tbody>
</table>
</div>
}
</div>
</div>
</div>
</div>
<!-- Completed Tasks -->
<div class="row mb-4">
<div class="col-12">
<div class="card">
<div class="card-header bg-success text-white">
<h5 class="mb-0">
<i class="fas fa-check-circle me-2"></i>
Completed Tasks
</h5>
</div>
<div class="card-body">
if data.Stats.CompletedToday == 0 && data.Stats.FailedToday == 0 {
<div class="text-center text-muted py-4">
<i class="fas fa-check-circle fa-3x mb-3"></i>
<p>No completed maintenance tasks today</p>
<small>Completed tasks will appear here after workers finish processing them</small>
</div>
} else {
<div class="table-responsive">
<table class="table table-hover">
<thead>
<tr>
<th>Type</th>
<th>Status</th>
<th>Volume</th>
<th>Worker</th>
<th>Duration</th>
<th>Completed</th>
</tr>
</thead>
<tbody>
for _, task := range data.Tasks {
if string(task.Status) == "completed" || string(task.Status) == "failed" || string(task.Status) == "cancelled" {
if string(task.Status) == "failed" {
<tr class="table-danger">
<td>
@TaskTypeIcon(task.Type)
{string(task.Type)}
</td>
<td>@StatusBadge(task.Status)</td>
<td>{fmt.Sprintf("%d", task.VolumeID)}</td>
<td>
if task.WorkerID != "" {
<small>{task.WorkerID}</small>
} else {
<span class="text-muted">-</span>
}
</td>
<td>
if task.StartedAt != nil && task.CompletedAt != nil {
{formatDuration(task.CompletedAt.Sub(*task.StartedAt))}
} else {
<span class="text-muted">-</span>
}
</td>
<td>
if task.CompletedAt != nil {
{task.CompletedAt.Format("2006-01-02 15:04")}
} else {
<span class="text-muted">-</span>
}
</td>
</tr>
} else {
<tr>
<td>
@TaskTypeIcon(task.Type)
{string(task.Type)}
</td>
<td>@StatusBadge(task.Status)</td>
<td>{fmt.Sprintf("%d", task.VolumeID)}</td>
<td>
if task.WorkerID != "" {
<small>{task.WorkerID}</small>
} else {
<span class="text-muted">-</span>
}
</td>
<td>
if task.StartedAt != nil && task.CompletedAt != nil {
{formatDuration(task.CompletedAt.Sub(*task.StartedAt))}
} else {
<span class="text-muted">-</span>
}
</td>
<td>
if task.CompletedAt != nil {
{task.CompletedAt.Format("2006-01-02 15:04")}
} else {
<span class="text-muted">-</span>
}
</td>
</tr>
}
}
}
</tbody>
</table>
</div>
}
</div>
@@ -156,6 +299,9 @@ templ MaintenanceQueue(data *maintenance.MaintenanceQueueData) {
</div>
<script>
// Debug output to browser console
console.log("DEBUG: Maintenance Queue Template loaded");
// Auto-refresh every 10 seconds
setInterval(function() {
if (!document.hidden) {
@@ -163,7 +309,8 @@ templ MaintenanceQueue(data *maintenance.MaintenanceQueueData) {
}
}, 10000);
function triggerScan() {
window.triggerScan = function() {
console.log("triggerScan called");
fetch('/api/maintenance/scan', {
method: 'POST',
headers: {
@@ -182,7 +329,12 @@ templ MaintenanceQueue(data *maintenance.MaintenanceQueueData) {
.catch(error => {
alert('Error: ' + error.message);
});
}
};
window.refreshPage = function() {
console.log("refreshPage called");
window.location.reload();
};
</script>
}
@@ -243,32 +395,13 @@ templ ProgressBar(progress float64, status maintenance.MaintenanceTaskStatus) {
}
}
templ WorkerStatusBadge(status string) {
switch status {
case "active":
<span class="badge bg-success">Active</span>
case "busy":
<span class="badge bg-warning">Busy</span>
case "inactive":
<span class="badge bg-secondary">Inactive</span>
default:
<span class="badge bg-light text-dark">Unknown</span>
}
}
// Helper functions (would be defined in Go)
func getWorkerStatusColor(status string) string {
switch status {
case "active":
return "success"
case "busy":
return "warning"
case "inactive":
return "secondary"
default:
return "light"
func formatDuration(d time.Duration) string {
if d < time.Minute {
return fmt.Sprintf("%.0fs", d.Seconds())
} else if d < time.Hour {
return fmt.Sprintf("%.1fm", d.Minutes())
} else {
return fmt.Sprintf("%.1fh", d.Hours())
}
}