20 Commits

Author SHA1 Message Date
Chris Lu
72a8f598f2 Fix Maintenance Task Sorting and Refactor Log Persistence (#8199)
* fix float stepping

* do not auto refresh

* only logs when non 200 status

* fix maintenance task sorting and clean up redundant handler logic

* Refactor log retrieval to persist to disk and fix slowness

- Move log retrieval to disk-based persistence in GetMaintenanceTaskDetail
- Implement background log fetching on task completion in worker_grpc_server.go
- Implement async background refresh for in-progress tasks
- Completely remove blocking gRPC calls from the UI path to fix 10s timeouts
- Clean up debug logs and performance profiling code

* Ensure consistent deterministic sorting in config_persistence cleanup

* Replace magic numbers with constants and remove debug logs

- Added descriptive constants for truncation limits and timeouts in admin_server.go and worker_grpc_server.go
- Replaced magic numbers with these constants throughout the codebase
- Verified removal of stdout debug printing
- Ensured consistent truncation logic during log persistence

* Address code review feedback on history truncation and logging logic

- Fix AssignmentHistory double-serialization by copying task in GetMaintenanceTaskDetail
- Fix handleTaskCompletion logging logic (mutually exclusive success/failure logs)
- Remove unused Timeout field from LogRequestContext and sync select timeouts with constants
- Ensure AssignmentHistory is only provided in the top-level field for better JSON structure

* Implement goroutine leak protection and request deduplication

- Add request deduplication in RequestTaskLogs to prevent multiple concurrent fetches for the same task
- Implement safe cleanup in timeout handlers to avoid race conditions in pendingLogRequests map
- Add a 10s cooldown for background log refreshes in GetMaintenanceTaskDetail to prevent spamming
- Ensure all persistent log-fetching goroutines are bounded and efficiently managed

* Fix potential nil pointer panics in maintenance handlers

- Add nil checks for adminServer in ShowTaskDetail, ShowMaintenanceWorkers, and UpdateTaskConfig
- Update getMaintenanceQueueData to return a descriptive error instead of nil when adminServer is uninitialized
- Ensure internal helper methods consistently check for adminServer initialization before use

* Strictly enforce disk-only log reading

- Remove background log fetching from GetMaintenanceTaskDetail to prevent timeouts and network calls during page view
- Remove unused lastLogFetch tracking fields to clean up dead code
- Ensure logs are only updated upon task completion via handleTaskCompletion

* Refactor GetWorkerLogs to read from disk

- Update /api/maintenance/workers/:id/logs endpoint to use configPersistence.LoadTaskExecutionLogs
- Remove synchronous gRPC call RequestTaskLogs to prevent timeouts and bad gateway errors
- Ensure consistent log retrieval behavior across the application (disk-only)

* Fix timestamp parsing in log viewer

- Update task_detail.templ JS to handle both ISO 8601 strings and Unix timestamps
- Fix "Invalid time value" error when displaying logs fetched from disk
- Regenerate templates

* master: fallback to HDD if SSD volumes are full in Assign

* worker: improve EC detection logging and fix skip counters

* worker: add Sync method to TaskLogger interface

* worker: implement Sync and ensure logs are flushed before task completion

* admin: improve task log retrieval with retries and better timeouts

* admin: robust timestamp parsing in task detail view
2026-02-04 08:48:55 -08:00
Dmitriy Pavlov
cd78e653e1 add disable volume_growth flag (#7196) 2025-09-04 05:39:56 -07:00
chrislu
da728750be follow grow volume option version 2025-06-19 13:54:54 -07:00
chrislu
d2be5822a1 refactoring 2025-06-16 22:25:22 -07:00
chrislu
96632a34b1 add version to volume proto 2025-06-16 22:05:06 -07:00
Konstantin Lebedev
e2e97db917 [master] avoid timeout when assigning for main request with filter by DC or rack (#6291)
* avoid timeout when assigning for main request with filter by DC or rack

https://github.com/seaweedfs/seaweedfs/issues/6290

* use constant NoWritableVolumes
2024-11-26 08:33:31 -08:00
chrislu
ff3d46637d better logging for volume growth 2024-09-07 12:38:34 -07:00
chrislu
3b7bb62e38 logs on error 2024-08-26 09:09:11 -07:00
Konstantin Lebedev
b2ffcdaab2 [master] do sync grow request only if absolutely necessary (#5821)
* do sync grow request only if absolutely necessary
https://github.com/seaweedfs/seaweedfs/pull/5819

* remove check VolumeGrowStrategy Threshold on PickForWrite

* fix fmt.Errorf
2024-07-30 13:21:35 -07:00
chrislu
e2a07d11d5 Revert "Check ShouldGrowVolumes before returning error in assign. (#5819)"
This reverts commit 98d66338d0333cd955f7840c64ef95e3c4807a17.
2024-07-26 11:21:50 -07:00
Ruoxi
d15966ae8e Check ShouldGrowVolumes before returning error in assign. (#5819) 2024-07-26 11:04:38 -07:00
Konstantin Lebedev
67edf1d014 [master] Do Automatic Volume Grow in background (#5781)
* Do Automatic Volume Grow in background

* pass lastGrowCount to master

* fix build

* fix count to uint64
2024-07-16 08:03:40 -07:00
Konstantin Lebedev
04f4b10884 fix: avoid timeout if datacenter does not exist in topology (#5772)
* fix: avoid timeout if datacenter does not exist in topology

* fix: error msg

* fix: rm duplicate check

* fix: compare

* revert minor change
2024-07-12 11:19:08 -07:00
chrislu
d9490c5e1f rename 2024-04-18 08:47:45 -07:00
Konstantin Lebedev
df40908e57 fix panic 5435 (#5436) 2024-03-28 16:17:59 -07:00
chrislu
bebbc9fe44 create volume grow request if the selected volume is close to full 2023-12-27 11:45:44 -08:00
chrislu
756bcc032d adjust logs 2023-11-27 12:57:29 -08:00
Konstantin Lebedev
dd78397fea logging PickForWrite error
https://github.com/seaweedfs/seaweedfs/issues/3886
2023-11-27 12:56:15 -08:00
chrislu
94b7e2a37c add stream assign server side implementation 2023-08-22 09:59:04 -07:00
chrislu
ccedad5196 refactor files 2023-08-22 09:54:06 -07:00