Admin: misc improvements on admin server and workers. EC now works. (#7055)
* initial design
* added simulation as tests
* reorganized the codebase to move the simulation framework and tests into their own dedicated package
* integration test. ec worker task
* remove "enhanced" reference
* start master, volume servers, filer
  Current Status
  ✅ Master: Healthy and running (port 9333)
  ✅ Filer: Healthy and running (port 8888)
  ✅ Volume Servers: All 6 servers running (ports 8080-8085)
  🔄 Admin/Workers: Will start when dependencies are ready
* generate write load
* tasks are assigned
* admin start with grpc port. worker has its own working directory
* Update .gitignore
* working worker and admin. Task detection is not working yet.
* compiles, detection uses volumeSizeLimitMB from master
* compiles
* worker retries connecting to admin
* build and restart
* rendering pending tasks
* skip task ID column
* sticky worker id
* test canScheduleTaskNow
* worker reconnect to admin
* clean up logs
* worker register itself first
* worker can run ec work and report status but:
  1. one volume should not be repeatedly worked on.
  2. ec shards need to be distributed and source data should be deleted.
* move ec task logic
* listing ec shards
* local copy, ec. Need to distribute.
* ec is mostly working now
* distribution of ec shards needs improvement
* need configuration to enable ec
* show ec volumes
* interval field UI component
* rename
* integration test with vacuuming
* garbage percentage threshold
* fix warning
* display ec shard sizes
* fix ec volumes list
* Update ui.go
* show default values
* ensure correct default value
* MaintenanceConfig use ConfigField
* use schema defined defaults
* config
* reduce duplication
* refactor to use BaseUIProvider
* each task register its schema
* checkECEncodingCandidate use ecDetector
* use vacuumDetector
* use volumeSizeLimitMB
* remove remove
* remove unused
* refactor
* use new framework
* remove v2 reference
* refactor
* left menu can scroll now
* The maintenance manager was not being initialized when no data directory was configured for persistent storage.
* saving config
* Update task_config_schema_templ.go
* enable/disable tasks
* protobuf encoded task configurations
* fix system settings
* use ui component
* remove logs
* interface{} Reduction
* reduce interface{}
* reduce interface{}
* avoid from/to map
* reduce interface{}
* refactor
* keep it DRY
* added logging
* debug messages
* debug level
* debug
* show the log caller line
* use configured task policy
* log level
* handle admin heartbeat response
* Update worker.go
* fix EC rack and dc count
* Report task status to admin server
* fix task logging, simplify interface checking, use erasure_coding constants
* factor in empty volume server during task planning
* volume.list adds disk id
* track disk id also
* fix locking scheduled and manual scanning
* add active topology
* simplify task detector
* ec task completed, but shards are not showing up
* implement ec in ec_typed.go
* adjust log level
* dedup
* implementing ec copying shards and only ecx files
* use disk id when distributing ec shards
  🎯 Planning: ActiveTopology creates DestinationPlan with specific TargetDisk
  📦 Task Creation: maintenance_integration.go creates ECDestination with DiskId
  🚀 Task Execution: EC task passes DiskId in VolumeEcShardsCopyRequest
  💾 Volume Server: Receives disk_id and stores shards on specific disk (vs.store.Locations[req.DiskId])
  📂 File System: EC shards and metadata land in the exact disk directory planned
* Delete original volume from all locations
* clean up existing shard locations
* local encoding and distributing
* Update docker/admin_integration/EC-TESTING-README.md
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* check volume id range
* simplify
* fix tests
* fix types
* clean up logs and tests

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@@ -47,63 +47,70 @@ templ MaintenanceConfig(data *maintenance.MaintenanceConfigData) {
 <div class="mb-3">
 <label for="scanInterval" class="form-label">Scan Interval (minutes)</label>
 <input type="number" class="form-control" id="scanInterval"
-value={fmt.Sprintf("%.0f", float64(data.Config.ScanIntervalSeconds)/60)} min="1" max="1440">
+value={fmt.Sprintf("%.0f", float64(data.Config.ScanIntervalSeconds)/60)}
+placeholder="30 (default)" min="1" max="1440">
 <small class="form-text text-muted">
-How often to scan for maintenance tasks (1-1440 minutes).
+How often to scan for maintenance tasks (1-1440 minutes). <strong>Default: 30 minutes</strong>
 </small>
 </div>
 
 <div class="mb-3">
 <label for="workerTimeout" class="form-label">Worker Timeout (minutes)</label>
 <input type="number" class="form-control" id="workerTimeout"
-value={fmt.Sprintf("%.0f", float64(data.Config.WorkerTimeoutSeconds)/60)} min="1" max="60">
+value={fmt.Sprintf("%.0f", float64(data.Config.WorkerTimeoutSeconds)/60)}
+placeholder="5 (default)" min="1" max="60">
 <small class="form-text text-muted">
-How long to wait for worker heartbeat before considering it inactive (1-60 minutes).
+How long to wait for worker heartbeat before considering it inactive (1-60 minutes). <strong>Default: 5 minutes</strong>
 </small>
 </div>
 
 <div class="mb-3">
 <label for="taskTimeout" class="form-label">Task Timeout (hours)</label>
 <input type="number" class="form-control" id="taskTimeout"
-value={fmt.Sprintf("%.0f", float64(data.Config.TaskTimeoutSeconds)/3600)} min="1" max="24">
+value={fmt.Sprintf("%.0f", float64(data.Config.TaskTimeoutSeconds)/3600)}
+placeholder="2 (default)" min="1" max="24">
 <small class="form-text text-muted">
-Maximum time allowed for a single task to complete (1-24 hours).
+Maximum time allowed for a single task to complete (1-24 hours). <strong>Default: 2 hours</strong>
 </small>
 </div>
 
 <div class="mb-3">
 <label for="globalMaxConcurrent" class="form-label">Global Concurrent Limit</label>
 <input type="number" class="form-control" id="globalMaxConcurrent"
-value={fmt.Sprintf("%d", data.Config.Policy.GlobalMaxConcurrent)} min="1" max="20">
+value={fmt.Sprintf("%d", data.Config.Policy.GlobalMaxConcurrent)}
+placeholder="4 (default)" min="1" max="20">
 <small class="form-text text-muted">
-Maximum number of maintenance tasks that can run simultaneously across all workers (1-20).
+Maximum number of maintenance tasks that can run simultaneously across all workers (1-20). <strong>Default: 4</strong>
 </small>
 </div>
 
 <div class="mb-3">
 <label for="maxRetries" class="form-label">Default Max Retries</label>
 <input type="number" class="form-control" id="maxRetries"
-value={fmt.Sprintf("%d", data.Config.MaxRetries)} min="0" max="10">
+value={fmt.Sprintf("%d", data.Config.MaxRetries)}
+placeholder="3 (default)" min="0" max="10">
 <small class="form-text text-muted">
-Default number of times to retry failed tasks (0-10).
+Default number of times to retry failed tasks (0-10). <strong>Default: 3</strong>
 </small>
 </div>
 
 <div class="mb-3">
 <label for="retryDelay" class="form-label">Retry Delay (minutes)</label>
 <input type="number" class="form-control" id="retryDelay"
-value={fmt.Sprintf("%.0f", float64(data.Config.RetryDelaySeconds)/60)} min="1" max="120">
+value={fmt.Sprintf("%.0f", float64(data.Config.RetryDelaySeconds)/60)}
+placeholder="15 (default)" min="1" max="120">
 <small class="form-text text-muted">
-Time to wait before retrying failed tasks (1-120 minutes).
+Time to wait before retrying failed tasks (1-120 minutes). <strong>Default: 15 minutes</strong>
 </small>
 </div>
 
 <div class="mb-3">
 <label for="taskRetention" class="form-label">Task Retention (days)</label>
 <input type="number" class="form-control" id="taskRetention"
-value={fmt.Sprintf("%.0f", float64(data.Config.TaskRetentionSeconds)/(24*3600))} min="1" max="30">
+value={fmt.Sprintf("%.0f", float64(data.Config.TaskRetentionSeconds)/(24*3600))}
+placeholder="7 (default)" min="1" max="30">
 <small class="form-text text-muted">
-How long to keep completed/failed task records (1-30 days).
+How long to keep completed/failed task records (1-30 days). <strong>Default: 7 days</strong>
 </small>
 </div>
 
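The form above stores durations in seconds but edits them in whole minutes, hours, or days. On the templ side, each value is divided by 60, 3600, or 24*3600 and formatted with `%.0f`, and `saveConfiguration` multiplies the parsed value back. Below is a minimal JavaScript sketch of that round trip. It uses hypothetical names (`FIELDS`, `toDisplayUnits`, `toSeconds`, `defaultSeconds`) that are not part of this commit, and takes the defaults from the help text above. `Math.round` stands in for Go's `%.0f` and may not match it on exact halves.

```javascript
// Hypothetical table of the duration fields in the form above:
// seconds per displayed unit, plus the default quoted in each help text.
const FIELDS = {
  scanInterval:  { unitSeconds: 60,        defaultUnits: 30 }, // minutes
  workerTimeout: { unitSeconds: 60,        defaultUnits: 5 },  // minutes
  taskTimeout:   { unitSeconds: 3600,      defaultUnits: 2 },  // hours
  retryDelay:    { unitSeconds: 60,        defaultUnits: 15 }, // minutes
  taskRetention: { unitSeconds: 24 * 3600, defaultUnits: 7 },  // days
};

// Seconds -> whole display units, like the templ `value` attribute.
// Math.round approximates Go's %.0f; the two may disagree on exact halves.
function toDisplayUnits(fieldId, seconds) {
  return Math.round(seconds / FIELDS[fieldId].unitSeconds);
}

// Display units -> seconds, like parseInt(...) * unit in saveConfiguration.
function toSeconds(fieldId, rawValue) {
  return parseInt(rawValue, 10) * FIELDS[fieldId].unitSeconds;
}

// Default value of a field, expressed in seconds.
function defaultSeconds(fieldId) {
  return FIELDS[fieldId].defaultUnits * FIELDS[fieldId].unitSeconds;
}

// A stored value that is not a whole number of units is rounded for display,
// so saving the form unchanged writes back a different number of seconds.
const stored = 90;                                      // 1.5 minutes
const shown = toDisplayUnits('scanInterval', stored);   // rendered as 2
const saved = toSeconds('scanInterval', String(shown)); // saved as 120
console.log(`stored=${stored}s shown=${shown}min saved=${saved}s`);
```

The example shows why whole-unit display is lossy. A scan interval stored as 90 seconds renders as 2 minutes, and saving the form without touching it writes back 120 seconds. Values entered through the form are always whole units, so this only affects configurations written by other means.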
@@ -143,7 +150,7 @@ templ MaintenanceConfig(data *maintenance.MaintenanceConfigData) {
 <i class={menuItem.Icon + " me-2"}></i>
 {menuItem.DisplayName}
 </h6>
-if data.Config.Policy.IsTaskEnabled(menuItem.TaskType) {
+if menuItem.IsEnabled {
 <span class="badge bg-success">Enabled</span>
 } else {
 <span class="badge bg-secondary">Disabled</span>
@@ -200,44 +207,60 @@ templ MaintenanceConfig(data *maintenance.MaintenanceConfigData) {
 
 <script>
 function saveConfiguration() {
-const config = {
-enabled: document.getElementById('enabled').checked,
-scan_interval_seconds: parseInt(document.getElementById('scanInterval').value) * 60, // Convert to seconds
-policy: {
-vacuum_enabled: document.getElementById('vacuumEnabled').checked,
-vacuum_garbage_ratio: parseFloat(document.getElementById('vacuumGarbageRatio').value) / 100,
-replication_fix_enabled: document.getElementById('replicationFixEnabled').checked,
-}
-};
+// First, get current configuration to preserve existing values
+fetch('/api/maintenance/config')
+.then(response => response.json())
+.then(currentConfig => {
+// Update only the fields from the form
+const updatedConfig = {
+...currentConfig.config, // Preserve existing config
+enabled: document.getElementById('enabled').checked,
+scan_interval_seconds: parseInt(document.getElementById('scanInterval').value) * 60, // Convert to seconds
+worker_timeout_seconds: parseInt(document.getElementById('workerTimeout').value) * 60, // Convert to seconds
+task_timeout_seconds: parseInt(document.getElementById('taskTimeout').value) * 3600, // Convert to seconds
+retry_delay_seconds: parseInt(document.getElementById('retryDelay').value) * 60, // Convert to seconds
+max_retries: parseInt(document.getElementById('maxRetries').value),
+task_retention_seconds: parseInt(document.getElementById('taskRetention').value) * 24 * 3600, // Convert to seconds
+policy: {
+...currentConfig.config.policy, // Preserve existing policy
+global_max_concurrent: parseInt(document.getElementById('globalMaxConcurrent').value)
+}
+};
 
-fetch('/api/maintenance/config', {
-method: 'PUT',
-headers: {
-'Content-Type': 'application/json',
-},
-body: JSON.stringify(config)
-})
-.then(response => response.json())
-.then(data => {
-if (data.success) {
-alert('Configuration saved successfully');
-} else {
-alert('Failed to save configuration: ' + (data.error || 'Unknown error'));
-}
-})
-.catch(error => {
-alert('Error: ' + error.message);
-});
+// Send the updated configuration
+return fetch('/api/maintenance/config', {
+method: 'PUT',
+headers: {
+'Content-Type': 'application/json',
+},
+body: JSON.stringify(updatedConfig)
+});
+})
+.then(response => response.json())
+.then(data => {
+if (data.success) {
+alert('Configuration saved successfully');
+location.reload(); // Reload to show updated values
+} else {
+alert('Failed to save configuration: ' + (data.error || 'Unknown error'));
+}
+})
+.catch(error => {
+alert('Error: ' + error.message);
+});
 }
 
 function resetToDefaults() {
 if (confirm('Are you sure you want to reset to default configuration? This will overwrite your current settings.')) {
-// Reset form to defaults
+// Reset form to defaults (matching DefaultMaintenanceConfig values)
 document.getElementById('enabled').checked = false;
 document.getElementById('scanInterval').value = '30';
-document.getElementById('vacuumEnabled').checked = false;
-document.getElementById('vacuumGarbageRatio').value = '30';
-document.getElementById('replicationFixEnabled').checked = false;
+document.getElementById('workerTimeout').value = '5';
+document.getElementById('taskTimeout').value = '2';
+document.getElementById('globalMaxConcurrent').value = '4';
+document.getElementById('maxRetries').value = '3';
+document.getElementById('retryDelay').value = '15';
+document.getElementById('taskRetention').value = '7';
 }
 }
 </script>
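The new `saveConfiguration` is a read-modify-write. It GETs `/api/maintenance/config`, spreads `currentConfig.config` and then `currentConfig.config.policy` so settings the form does not edit are kept, overwrites the form-backed fields, and PUTs the result. The sketch below pulls that merge into a pure function so it can be checked without a browser. It is illustrative only. `mergeMaintenanceConfig`, `parseField`, and the `other_setting` key are hypothetical. Falling back to the default for a blank input is a suggested hardening, not what the commit does. As committed, a blank field makes `parseInt` return `NaN`, which `JSON.stringify` sends as `null`.

```javascript
// Hypothetical helper: parse a numeric form value and scale it to the stored
// unit, falling back to a default when the input is blank or not a number.
// (The committed code calls parseInt directly, so a blank field yields NaN.)
function parseField(rawValue, multiplier, defaultValue) {
  const n = parseInt(rawValue, 10);
  return Number.isNaN(n) ? defaultValue * multiplier : n * multiplier;
}

// Overlay form values onto the current server config. Top-level fields are
// replaced; `policy` is spread one level deeper so policy fields the form
// does not edit are preserved, mirroring the two spreads in saveConfiguration.
function mergeMaintenanceConfig(current, form) {
  return {
    ...current,
    enabled: form.enabled,
    scan_interval_seconds: parseField(form.scanInterval, 60, 30),
    worker_timeout_seconds: parseField(form.workerTimeout, 60, 5),
    task_timeout_seconds: parseField(form.taskTimeout, 3600, 2),
    retry_delay_seconds: parseField(form.retryDelay, 60, 15),
    max_retries: parseField(form.maxRetries, 1, 3),
    task_retention_seconds: parseField(form.taskRetention, 24 * 3600, 7),
    policy: {
      ...current.policy,
      global_max_concurrent: parseField(form.globalMaxConcurrent, 1, 4),
    },
  };
}

// Example input; `other_setting` is a hypothetical stand-in for any
// policy field that the form does not show.
const current = {
  enabled: false,
  scan_interval_seconds: 1800,
  policy: { global_max_concurrent: 4, other_setting: 'kept' },
};
const form = {
  enabled: true, scanInterval: '10', workerTimeout: '', taskTimeout: '2',
  retryDelay: '15', maxRetries: '0', taskRetention: '7', globalMaxConcurrent: '8',
};
const merged = mergeMaintenanceConfig(current, form);
console.log(JSON.stringify(merged));
```

Checking `Number.isNaN` rather than falsiness matters for `maxRetries`, where `0` is a valid value. The merge is also only one level deep for `policy`. Nested objects inside it are carried over as-is, not merged field by field.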