4 Commits

Author SHA1 Message Date
Chris Lu
995dfc4d5d chore: remove ~50k lines of unreachable dead code (#8913)
* chore: remove unreachable dead code across the codebase

Remove ~50,000 lines of unreachable code identified by static analysis.

Major removals:
- weed/filer/redis_lua: entire unused Redis Lua filer store implementation
- weed/wdclient/net2, resource_pool: unused connection/resource pool packages
- weed/plugin/worker/lifecycle: unused lifecycle plugin worker
- weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy,
  multipart IAM, key rotation, and various SSE helper functions
- weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions
- weed/mq/offset: unused SQL storage and migration code
- weed/worker: unused registry, task, and monitoring functions
- weed/query: unused SQL engine, parquet scanner, and type functions
- weed/shell: unused EC proportional rebalance functions
- weed/storage/erasure_coding/distribution: unused distribution analysis functions
- Individual unreachable functions removed from 150+ files across admin,
  credential, filer, iam, kms, mount, mq, operation, pb, s3api, server,
  shell, storage, topology, and util packages

* fix(s3): reset shared memory store in IAM test to prevent flaky failure

TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because
the MemoryStore credential backend is a singleton registered via init().
Earlier tests that create anonymous identities pollute the shared store,
causing LookupAnonymous() to unexpectedly return true.

Fix by calling Reset() on the memory store before the test runs.

* style: run gofmt on changed files

* fix: restore KMS functions used by integration tests

* fix(plugin): prevent panic on send to closed worker session channel

The Plugin.sendToWorker method could panic with "send on closed channel"
when a worker disconnected while a message was being sent. The race was
between streamSession.close() closing the outgoing channel and sendToWorker
writing to it concurrently.

Add a done channel to streamSession that is closed before the outgoing
channel, and check it in sendToWorker's select to safely detect closed
sessions without panicking.
2026-04-03 16:04:27 -07:00
Chris Lu
47ddf05d95 feat(plugin): DC/rack/node filtering for volume balance (#8621)
* feat(plugin): add DC/rack/node filtering for volume balance detection

Add scoping filters so balance detection can be limited to specific data
centers, racks, or nodes. Filters are applied both at the metrics level
(in the handler) and at the topology seeding level (in detection) to
ensure only the targeted infrastructure participates in balancing.

* address PR review: use set lookups, deduplicate test helpers, add target checks

* address review: assert non-empty tasks in filter tests

Prevent vacuous test passes by requiring len(tasks) > 0
before checking source/target exclusions.

* address review: enforce filter scope in fallback, clarify DC filter

- Thread allowedServers into createBalanceTask so the fallback
  planner cannot produce out-of-scope targets when DC/rack/node
  filters are active
- Update data_center_filter description to clarify single-DC usage

* address review: centralize parseCSVSet, fix filter scope leak, iterate all targets

- Extract ParseCSVSet to shared weed/worker/tasks/util package,
  remove duplicates from detection.go and volume_balance_handler.go
- Fix metric accumulation re-introducing filtered-out servers by
  only counting metrics for servers that passed DC/rack/node filters
- Trim DataCenterFilter before matching to handle trailing spaces
- Iterate all task.TypedParams.Targets in filter tests, not just [0]

* remove useless descriptor string test
2026-03-13 17:03:37 -07:00
Chris Lu
998c8d2702 Worker maintenance tasks now use non-default grpcPort if configured (#8407)
Fixes #8401

When creating balance/vacuum tasks, the worker maintenance scheduler was
accidentally discarding the custom grpcPort defined on the DataNodeInfo
by using just its HTTP Address string, which defaults to +10000
during grpc dialing.

By using pb.NewServerAddressFromDataNode, the grpcPort suffix is correctly
encoded in the server address string, preventing connection refused errors
for users running volume servers with custom gRPC ports.
2026-02-22 22:40:14 -08:00
Chris Lu
c260e6a22e Fix issue #7880: Tasks use Volume IDs instead of ip:port (#7881)
* Fix issue #7880: Tasks use Volume IDs instead of ip:port

When volume servers are registered with custom IDs, tasks were attempting
to connect using the ID instead of the actual ip:port address, causing
connection failures.

Modified task detection logic in balance, erasure coding, and vacuum tasks
to resolve volume server IDs to their actual ip:port addresses using
ActiveTopology information.

* Use server addresses directly instead of translating from IDs

Modified VolumeHealthMetrics to include ServerAddress field populated
directly from topology DataNodeInfo.Address. Updated task detection
logic to use addresses directly without runtime lookups.

Changes:
- Added ServerAddress field to VolumeHealthMetrics
- Updated maintenance scanner to populate ServerAddress
- Modified task detection to use ServerAddress for Node fields
- Updated DestinationPlan to include TargetAddress
- Removed runtime address lookups in favor of direct address usage

* Address PR comments: add ServerAddress field, improve error handling

- Add missing ServerAddress field to VolumeHealthMetrics struct
- Add warning in vacuum detection when server not found in topology
- Improve error handling in erasure coding to abort task if sources missing
- Make vacuum task stricter by skipping if server not found in topology

* Refactor: Extract common address resolution logic into shared utility

- Created weed/worker/tasks/util/address.go with ResolveServerAddress function
- Updated balance, erasure_coding, and vacuum detection to use the shared utility
- Removed code duplication and improved maintainability
- Consistent error handling across all task types

* Fix critical issues in task address resolution

- Vacuum: Require topology availability and fail if server not found (no fallback to ID)
- Ensure all task types consistently fail early when topology is incomplete
- Prevent creation of tasks that would fail due to missing server addresses

* Address additional PR feedback

- Add validation for empty addresses in ResolveServerAddress
- Remove redundant serverAddress variable in vacuum detection
- Improve robustness of address resolution

* Improve error logging in vacuum detection

- Include actual error details in log message for better diagnostics
- Make error messages consistent with other task types
2025-12-25 16:14:05 -08:00