Admin: misc improvements on admin server and workers. EC now works. (#7055)

* initial design

* added simulation as tests

* reorganized the codebase to move the simulation framework and tests into their own dedicated package

* integration test. ec worker task

* remove "enhanced" reference

* start master, volume servers, filer

Current Status
 Master: Healthy and running (port 9333)
 Filer: Healthy and running (port 8888)
 Volume Servers: All 6 servers running (ports 8080-8085)
🔄 Admin/Workers: Will start when dependencies are ready

* generate write load

* tasks are assigned

* admin start with grpc port. worker has its own working directory

* Update .gitignore

* working worker and admin. Task detection is not working yet.

* compiles, detection uses volumeSizeLimitMB from master

* compiles

* worker retries connecting to admin

* build and restart

* rendering pending tasks

* skip task ID column

* sticky worker id

* test canScheduleTaskNow

* worker reconnect to admin

* clean up logs

* worker register itself first

* worker can run ec work and report status

but:
1. one volume should not be repeatedly worked on.
2. ec shards need to be distributed and source data should be deleted.

* move ec task logic

* listing ec shards

* local copy, ec. Need to distribute.

* ec is mostly working now

* distribution of ec shards needs improvement
* need configuration to enable ec

* show ec volumes

* interval field UI component

* rename

* integration test with vacuuming

* garbage percentage threshold

* fix warning

* display ec shard sizes

* fix ec volumes list

* Update ui.go

* show default values

* ensure correct default value

* MaintenanceConfig use ConfigField

* use schema defined defaults

* config

* reduce duplication

* refactor to use BaseUIProvider

* each task register its schema

* checkECEncodingCandidate use ecDetector

* use vacuumDetector

* use volumeSizeLimitMB

* remove

remove

* remove unused

* refactor

* use new framework

* remove v2 reference

* refactor

* left menu can scroll now

* The maintenance manager was not being initialized when no data directory was configured for persistent storage.

* saving config

* Update task_config_schema_templ.go

* enable/disable tasks

* protobuf encoded task configurations

* fix system settings

* use ui component

* remove logs

* interface{} Reduction

* reduce interface{}

* reduce interface{}

* avoid from/to map

* reduce interface{}

* refactor

* keep it DRY

* added logging

* debug messages

* debug level

* debug

* show the log caller line

* use configured task policy

* log level

* handle admin heartbeat response

* Update worker.go

* fix EC rack and dc count

* Report task status to admin server

* fix task logging, simplify interface checking, use erasure_coding constants

* factor in empty volume server during task planning

* volume.list adds disk id

* track disk id also

* fix locking scheduled and manual scanning

* add active topology

* simplify task detector

* ec task completed, but shards are not showing up

* implement ec in ec_typed.go

* adjust log level

* dedup

* implementing ec copying shards and only ecx files

* use disk id when distributing ec shards

🎯 Planning: ActiveTopology creates DestinationPlan with specific TargetDisk
📦 Task Creation: maintenance_integration.go creates ECDestination with DiskId
🚀 Task Execution: EC task passes DiskId in VolumeEcShardsCopyRequest
💾 Volume Server: Receives disk_id and stores shards on specific disk (vs.store.Locations[req.DiskId])
📂 File System: EC shards and metadata land in the exact disk directory planned

* Delete original volume from all locations

* clean up existing shard locations

* local encoding and distributing

* Update docker/admin_integration/EC-TESTING-README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* check volume id range

* simplify

* fix tests

* fix types

* clean up logs and tests
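
The disk-id flow described under "use disk id when distributing ec shards" can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `DestinationPlan`, `ShardCopyRequest`, `DiskLocation`, `buildCopyRequest`, and `resolveLocation` are hypothetical stand-ins for the planning result, `VolumeEcShardsCopyRequest`, and `vs.store.Locations`, and the bounds check is an assumption about how an invalid disk id might be rejected.

```go
package main

import "fmt"

// DestinationPlan is a hypothetical stand-in for the plan produced by the
// active topology: it names a target node and a specific disk on that node.
type DestinationPlan struct {
	TargetNode string
	TargetDisk uint32
}

// ShardCopyRequest is a hypothetical stand-in for VolumeEcShardsCopyRequest,
// reduced to the fields needed to show how the disk id travels.
type ShardCopyRequest struct {
	VolumeID uint32
	ShardIDs []uint32
	DiskID   uint32
}

// DiskLocation is a hypothetical stand-in for one entry of the volume
// server's store locations (one directory per disk).
type DiskLocation struct {
	Directory string
}

// buildCopyRequest carries the planned disk through to the copy request.
func buildCopyRequest(plan DestinationPlan, volumeID uint32, shardIDs []uint32) ShardCopyRequest {
	return ShardCopyRequest{VolumeID: volumeID, ShardIDs: shardIDs, DiskID: plan.TargetDisk}
}

// resolveLocation picks the disk directory named by the request and rejects
// disk ids that fall outside the configured locations (assumed behavior).
func resolveLocation(locations []DiskLocation, req ShardCopyRequest) (DiskLocation, error) {
	if int(req.DiskID) >= len(locations) {
		return DiskLocation{}, fmt.Errorf("disk id %d out of range (have %d locations)", req.DiskID, len(locations))
	}
	return locations[req.DiskID], nil
}

func main() {
	plan := DestinationPlan{TargetNode: "volume-server-1", TargetDisk: 1}
	req := buildCopyRequest(plan, 42, []uint32{0, 1, 2})

	locations := []DiskLocation{{Directory: "/data/disk0"}, {Directory: "/data/disk1"}}
	loc, err := resolveLocation(locations, req)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("volume %d shards %v -> %s\n", req.VolumeID, req.ShardIDs, loc.Directory)
}
```

The point of the sketch is only that the disk chosen at planning time is the same index used on the receiving server, so shards land in the planned directory rather than on whichever disk the server would pick by default.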

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Chris Lu
2025-07-30 12:38:03 -07:00
committed by GitHub
parent 64198dad83
commit 891a2fb6eb
130 changed files with 27737 additions and 4429 deletions
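
The collections template in this commit reads `data.TotalVolumes`, `data.TotalEcVolumes`, and `data.TotalSize` alongside per-collection `VolumeCount`, `EcVolumeCount`, and `TotalSize`. A minimal sketch of how such totals might be aggregated is shown here. The struct layouts and the `summarize` helper are assumptions for illustration only; the actual `dash.ClusterCollectionsData` type may be populated differently.

```go
package main

import "fmt"

// CollectionInfo is a hypothetical per-collection record using the field
// names that appear in the template (VolumeCount, EcVolumeCount, TotalSize).
type CollectionInfo struct {
	Name          string
	VolumeCount   int
	EcVolumeCount int
	TotalSize     int64
}

// ClusterCollectionsData is a hypothetical, reduced version of the data
// passed to the template; only the fields used in this sketch are included.
type ClusterCollectionsData struct {
	Collections    []CollectionInfo
	TotalVolumes   int
	TotalEcVolumes int
	TotalSize      int64
}

// summarize adds up regular and EC volume counts separately, so the
// "Regular Volumes" and "EC Volumes" cards can show distinct numbers.
func summarize(collections []CollectionInfo) ClusterCollectionsData {
	data := ClusterCollectionsData{Collections: collections}
	for _, c := range collections {
		data.TotalVolumes += c.VolumeCount
		data.TotalEcVolumes += c.EcVolumeCount
		data.TotalSize += c.TotalSize
	}
	return data
}

func main() {
	data := summarize([]CollectionInfo{
		{Name: "photos", VolumeCount: 3, EcVolumeCount: 1, TotalSize: 4096},
		{Name: "logs", VolumeCount: 0, EcVolumeCount: 2, TotalSize: 1024},
	})
	fmt.Printf("regular=%d ec=%d size=%d\n", data.TotalVolumes, data.TotalEcVolumes, data.TotalSize)
}
```

Keeping the two counts separate matches the relabeling from "Total Volumes" to "Regular Volumes" in the template, where EC volumes get their own card and table column.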


@@ -22,7 +22,7 @@ templ ClusterCollections(data dash.ClusterCollectionsData) {
<div id="collections-content">
<!-- Summary Cards -->
<div class="row mb-4">
<div class="col-xl-3 col-md-6 mb-4">
<div class="col-xl-2 col-lg-3 col-md-4 col-sm-6 mb-4">
<div class="card border-left-primary shadow h-100 py-2">
<div class="card-body">
<div class="row no-gutters align-items-center">
@@ -42,13 +42,13 @@ templ ClusterCollections(data dash.ClusterCollectionsData) {
</div>
</div>
<div class="col-xl-3 col-md-6 mb-4">
<div class="col-xl-2 col-lg-3 col-md-4 col-sm-6 mb-4">
<div class="card border-left-info shadow h-100 py-2">
<div class="card-body">
<div class="row no-gutters align-items-center">
<div class="col mr-2">
<div class="text-xs font-weight-bold text-info text-uppercase mb-1">
Total Volumes
Regular Volumes
</div>
<div class="h5 mb-0 font-weight-bold text-gray-800">
{fmt.Sprintf("%d", data.TotalVolumes)}
@@ -62,7 +62,27 @@ templ ClusterCollections(data dash.ClusterCollectionsData) {
</div>
</div>
<div class="col-xl-3 col-md-6 mb-4">
<div class="col-xl-2 col-lg-3 col-md-4 col-sm-6 mb-4">
<div class="card border-left-success shadow h-100 py-2">
<div class="card-body">
<div class="row no-gutters align-items-center">
<div class="col mr-2">
<div class="text-xs font-weight-bold text-success text-uppercase mb-1">
EC Volumes
</div>
<div class="h5 mb-0 font-weight-bold text-gray-800">
{fmt.Sprintf("%d", data.TotalEcVolumes)}
</div>
</div>
<div class="col-auto">
<i class="fas fa-th-large fa-2x text-gray-300"></i>
</div>
</div>
</div>
</div>
</div>
<div class="col-xl-2 col-lg-3 col-md-4 col-sm-6 mb-4">
<div class="card border-left-warning shadow h-100 py-2">
<div class="card-body">
<div class="row no-gutters align-items-center">
@@ -76,19 +96,19 @@ templ ClusterCollections(data dash.ClusterCollectionsData) {
</div>
<div class="col-auto">
<i class="fas fa-file fa-2x text-gray-300"></i>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="col-xl-3 col-md-6 mb-4">
<div class="col-xl-2 col-lg-3 col-md-4 col-sm-6 mb-4">
<div class="card border-left-secondary shadow h-100 py-2">
<div class="card-body">
<div class="row no-gutters align-items-center">
<div class="col mr-2">
<div class="text-xs font-weight-bold text-secondary text-uppercase mb-1">
Total Storage Size
Total Storage Size (Logical)
</div>
<div class="h5 mb-0 font-weight-bold text-gray-800">
{formatBytes(data.TotalSize)}
@@ -117,9 +137,10 @@ templ ClusterCollections(data dash.ClusterCollectionsData) {
<thead>
<tr>
<th>Collection Name</th>
<th>Volumes</th>
<th>Regular Volumes</th>
<th>EC Volumes</th>
<th>Files</th>
<th>Size</th>
<th>Size (Logical)</th>
<th>Disk Types</th>
<th>Actions</th>
</tr>
@@ -128,7 +149,7 @@ templ ClusterCollections(data dash.ClusterCollectionsData) {
for _, collection := range data.Collections {
<tr>
<td>
<a href={templ.SafeURL(fmt.Sprintf("/cluster/volumes?collection=%s", collection.Name))} class="text-decoration-none">
<a href={templ.SafeURL(fmt.Sprintf("/cluster/collections/%s", collection.Name))} class="text-decoration-none">
<strong>{collection.Name}</strong>
</a>
</td>
@@ -136,7 +157,23 @@ templ ClusterCollections(data dash.ClusterCollectionsData) {
<a href={templ.SafeURL(fmt.Sprintf("/cluster/volumes?collection=%s", collection.Name))} class="text-decoration-none">
<div class="d-flex align-items-center">
<i class="fas fa-database me-2 text-muted"></i>
{fmt.Sprintf("%d", collection.VolumeCount)}
if collection.VolumeCount > 0 {
{fmt.Sprintf("%d", collection.VolumeCount)}
} else {
<span class="text-muted">0</span>
}
</div>
</a>
</td>
<td>
<a href={templ.SafeURL(fmt.Sprintf("/cluster/ec-shards?collection=%s", collection.Name))} class="text-decoration-none">
<div class="d-flex align-items-center">
<i class="fas fa-th-large me-2 text-muted"></i>
if collection.EcVolumeCount > 0 {
{fmt.Sprintf("%d", collection.EcVolumeCount)}
} else {
<span class="text-muted">0</span>
}
</div>
</a>
</td>
@@ -171,6 +208,7 @@ templ ClusterCollections(data dash.ClusterCollectionsData) {
data-name={collection.Name}
data-datacenter={collection.DataCenter}
data-volume-count={fmt.Sprintf("%d", collection.VolumeCount)}
data-ec-volume-count={fmt.Sprintf("%d", collection.EcVolumeCount)}
data-file-count={fmt.Sprintf("%d", collection.FileCount)}
data-total-size={fmt.Sprintf("%d", collection.TotalSize)}
data-disk-types={formatDiskTypes(collection.DiskTypes)}>
@@ -223,6 +261,7 @@ templ ClusterCollections(data dash.ClusterCollectionsData) {
name: button.getAttribute('data-name'),
datacenter: button.getAttribute('data-datacenter'),
volumeCount: parseInt(button.getAttribute('data-volume-count')),
ecVolumeCount: parseInt(button.getAttribute('data-ec-volume-count')),
fileCount: parseInt(button.getAttribute('data-file-count')),
totalSize: parseInt(button.getAttribute('data-total-size')),
diskTypes: button.getAttribute('data-disk-types')
@@ -260,19 +299,25 @@ templ ClusterCollections(data dash.ClusterCollectionsData) {
'<div class="col-md-6">' +
'<h6 class="text-primary"><i class="fas fa-chart-bar me-1"></i>Storage Statistics</h6>' +
'<table class="table table-sm">' +
'<tr><td><strong>Total Volumes:</strong></td><td>' +
'<tr><td><strong>Regular Volumes:</strong></td><td>' +
'<div class="d-flex align-items-center">' +
'<i class="fas fa-database me-2 text-muted"></i>' +
'<span>' + collection.volumeCount.toLocaleString() + '</span>' +
'</div>' +
'</td></tr>' +
'<tr><td><strong>EC Volumes:</strong></td><td>' +
'<div class="d-flex align-items-center">' +
'<i class="fas fa-th-large me-2 text-muted"></i>' +
'<span>' + collection.ecVolumeCount.toLocaleString() + '</span>' +
'</div>' +
'</td></tr>' +
'<tr><td><strong>Total Files:</strong></td><td>' +
'<div class="d-flex align-items-center">' +
'<i class="fas fa-file me-2 text-muted"></i>' +
'<span>' + collection.fileCount.toLocaleString() + '</span>' +
'</div>' +
'</td></tr>' +
'<tr><td><strong>Total Size:</strong></td><td>' +
'<tr><td><strong>Total Size (Logical):</strong></td><td>' +
'<div class="d-flex align-items-center">' +
'<i class="fas fa-hdd me-2 text-muted"></i>' +
'<span>' + formatBytes(collection.totalSize) + '</span>' +
@@ -288,6 +333,9 @@ templ ClusterCollections(data dash.ClusterCollectionsData) {
'<a href="/cluster/volumes?collection=' + encodeURIComponent(collection.name) + '" class="btn btn-outline-primary">' +
'<i class="fas fa-database me-1"></i>View Volumes' +
'</a>' +
'<a href="/cluster/ec-shards?collection=' + encodeURIComponent(collection.name) + '" class="btn btn-outline-secondary">' +
'<i class="fas fa-th-large me-1"></i>View EC Volumes' +
'</a>' +
'<a href="/files?collection=' + encodeURIComponent(collection.name) + '" class="btn btn-outline-info">' +
'<i class="fas fa-folder me-1"></i>Browse Files' +
'</a>' +
@@ -295,6 +343,7 @@ templ ClusterCollections(data dash.ClusterCollectionsData) {
'</div>' +
'</div>' +
'</div>' +
'</div>' +
'<div class="modal-footer">' +
'<button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Close</button>' +
'</div>' +