Commit Graph

13320 Commits

Author SHA1 Message Date
Chris Lu
45ce18266a Disable master maintenance scripts when admin server runs (#8499)
* Disable master maintenance scripts when admin server runs

* Stop defaulting master maintenance scripts

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Clarify master scripts are disabled by default

* Skip master maintenance scripts when admin server is connected

* Restore default master maintenance scripts

* Document admin server skip for master maintenance scripts

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-04 00:40:40 -08:00
Chris Lu
18ccc9b773 Plugin scheduler: sequential iterations with max runtime (#8496)
* pb: add job type max runtime setting

* plugin: default job type max runtime

* plugin: redesign scheduler loop

* admin ui: update scheduler settings

* plugin: fix scheduler loop state name

* plugin scheduler: restore backlog skip

* plugin scheduler: drop legacy detection helper

* admin api: require scheduler config body

* admin ui: preserve detection interval on save

* plugin scheduler: use job context and drain cancels

* plugin scheduler: respect detection intervals

* plugin scheduler: gate runs and drain queue

* ec test: reuse req/resp vars

* ec test: add scheduler debug logs

* Adjust scheduler idle sleep and initial run delay

* Clear pending job queue before scheduler runs

* Log next detection time in EC integration test

* Improve plugin scheduler debug logging in EC test

* Expose scheduler next detection time

* Log scheduler next detection time in EC test

* Wake scheduler on config or worker updates

* Expose scheduler sleep interval in UI

* Fix scheduler sleep save value selection

* Set scheduler idle sleep default to 613s

* Show scheduler next run time in plugin UI

---------

Co-authored-by: Copilot <copilot@github.com>
2026-03-03 23:09:49 -08:00
Chris Lu
e1e5b4a8a6 add admin script worker (#8491)
* admin: add plugin lock coordination

* shell: allow bypassing lock checks

* plugin worker: add admin script handler

* mini: include admin_script in plugin defaults

* admin script UI: drop name and enlarge text

* admin script: add default script

* admin_script: make run interval configurable

* plugin: gate other jobs during admin_script runs

* plugin: use last completed admin_script run

* admin: backfill plugin config defaults

* templ

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* comparable to default version

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* default to run

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* format

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* shell: respect pre-set noLock for fix.replication

* shell: add force no-lock mode for admin scripts

* volume balance worker already exists

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* admin: expose scheduler status JSON

* shell: add sleep command

* shell: restrict sleep syntax

* Revert "shell: respect pre-set noLock for fix.replication"

This reverts commit 2b14e8b82602a740d3a473c085e3b3a14f1ddbb3.

* templ

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* fix import

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* less logs

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* Reduce master client logs on canceled contexts

* Update mini default job type count

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-03 15:10:40 -08:00
Peter Dodd
16f2269a33 feat(filer): lazy metadata pulling (#8454)
* Add remote storage index for lazy metadata pull

Introduces remoteStorageIndex, which maintains a map of filer directory
to remote storage client/location, refreshed periodically from the
filer's mount mappings. Provides lazyFetchFromRemote, ensureRemoteEntryInFiler,
and isRemoteBacked on S3ApiServer as integration points for handler-level
work in a follow-up PR. Nothing is wired into the server yet.

Made-with: Cursor

* Add unit tests for remote storage index and wire field into S3ApiServer

Adds tests covering isEmpty, findForPath (including longest-prefix
resolution), and isRemoteBacked. Also removes a stray PR review
annotation from the index file and adds the remoteStorageIdx field
to S3ApiServer so the package compiles ahead of the wiring PR.

Made-with: Cursor

* Address review comments on remote storage index

- Use filer_pb.CreateEntry helper so resp.Error is checked, not just the RPC error
- Extract keepPrev closure to remove duplicated error-handling in refresh loop
- Add comment explaining availability-over-consistency trade-off on filer save failure

Made-with: Cursor

* Move lazy metadata pull from S3 API to filer

- Add maybeLazyFetchFromRemote in filer: on FindEntry miss, stat remote
  and CreateEntry when path is under a remote mount
- Use singleflight for dedup; context guard prevents CreateEntry recursion
- Availability-over-consistency: return in-memory entry if CreateEntry fails
- Add longest-prefix test for nested mounts in remote_storage_test.go
- Remove remoteStorageIndex, lazyFetchFromRemote, ensureRemoteEntryInFiler,
  doLazyFetch from s3api; filer now owns metadata operations
- Add filer_lazy_remote_test.go with tests for hit, miss, not-found,
  CreateEntry failure, longest-prefix, and FindEntry integration

Made-with: Cursor

* Address review: fix context guard test, add FindMountDirectory comment, remove dead code

Made-with: Cursor

* Nitpicks: restore prev maker in registerStubMaker, instance-scope lazyFetchGroup, nil-check remoteEntry

Made-with: Cursor

* Fix remotePath when mountDir is root: ensure relPath has leading slash

Made-with: Cursor

* filer: decouple lazy-fetch persistence from caller context

Use context.Background() inside the singleflight closure for CreateEntry
so persistence is not cancelled when the winning request's context is
cancelled. Fixes CreateEntry failing for all waiters when the first
caller times out.

Made-with: Cursor

* filer: remove redundant Mode bitwise OR with zero

Made-with: Cursor

* filer: use bounded context for lazy-fetch persistence

Replace context.Background() with context.WithTimeout(30s) and defer
cancel() to prevent indefinite blocking and release resources.

Made-with: Cursor

* filer: use checked type assertion for singleflight result

Made-with: Cursor

* filer: rename persist context vars to avoid shadowing function parameter

Made-with: Cursor
2026-03-03 13:01:10 -08:00
Chris Lu
1a3e3100d0 Helm: set serviceAccountName independent of cluster role (#8495)
* Add stale job expiry and expire API

* Add expire job button

* helm: decouple serviceAccountName from cluster role

---------

Co-authored-by: Copilot <copilot@github.com>
2026-03-03 12:13:18 -08:00
Chris Lu
a61a2affe3 Expire stuck plugin jobs (#8492)
* Add stale job expiry and expire API

* Add expire job button

* Add test hook and coverage for ExpirePluginJobAPI

* Document scheduler filtering side effect and reuse helper

* Restore job spec proposal test

* Regenerate plugin template output

---------

Co-authored-by: Copilot <copilot@github.com>
2026-03-03 01:27:25 -08:00
Surote
3db05f59f0 Feat: update openshift helm value to support seaweed s3 (#8494)
feat: update openshift helm values

Update helm values for openshift to enable/disable s3 and change log to `emptydir` instead of `hostpath`
2026-03-03 01:11:01 -08:00
Chris Lu
2644816692 helm: avoid duplicate env var keys in workload env lists (#8488)
* helm: dedupe merged extraEnvironmentVars in workloads

* address comments

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* range

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* helm: reuse merge helper for extraEnvironmentVars

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-02 12:10:57 -08:00
Chris Lu
fb944f0071 test: add Polaris S3 tables integration tests (#8489)
* test: add polaris integration test harness

* test: add polaris integration coverage

* ci: run polaris s3 tables tests

* test: harden polaris harness

* test: DRY polaris integration tests

* ci: pre-pull Polaris image

* test: extend Polaris pull timeout

* test: refine polaris credentials selection

* test: keep Polaris tables inside allowed location

* test: use fresh context for polaris cleanup

* test: prefer specific Polaris storage credential

* test: tolerate Polaris credential variants

* test: request Polaris vended credentials

* test: load Polaris table credentials

* test: allow polaris vended access via bucket policy

* test: align Polaris object keys with table location

* test: rename Polaris vended role references

* test: simplify Polaris vended credential extraction

* test: marshal Polaris bucket policy
2026-03-02 12:10:03 -08:00
dependabot[bot]
479da50433 build(deps): bump gocloud.dev/pubsub/natspubsub from 0.44.0 to 0.45.0 (#8487)
Bumps [gocloud.dev/pubsub/natspubsub](https://github.com/google/go-cloud) from 0.44.0 to 0.45.0.
- [Release notes](https://github.com/google/go-cloud/releases)
- [Commits](https://github.com/google/go-cloud/compare/v0.44.0...v0.45.0)

---
updated-dependencies:
- dependency-name: gocloud.dev/pubsub/natspubsub
  dependency-version: 0.45.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 10:28:52 -08:00
dependabot[bot]
f7909b8ebd build(deps): bump golang.org/x/oauth2 from 0.34.0 to 0.35.0 (#8486)
Bumps [golang.org/x/oauth2](https://github.com/golang/oauth2) from 0.34.0 to 0.35.0.
- [Commits](https://github.com/golang/oauth2/compare/v0.34.0...v0.35.0)

---
updated-dependencies:
- dependency-name: golang.org/x/oauth2
  dependency-version: 0.35.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 10:28:43 -08:00
dependabot[bot]
2a3ecee28b build(deps): bump github.com/shirou/gopsutil/v4 from 4.26.1 to 4.26.2 (#8485)
Bumps [github.com/shirou/gopsutil/v4](https://github.com/shirou/gopsutil) from 4.26.1 to 4.26.2.
- [Release notes](https://github.com/shirou/gopsutil/releases)
- [Commits](https://github.com/shirou/gopsutil/compare/v4.26.1...v4.26.2)

---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v4
  dependency-version: 4.26.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 10:28:33 -08:00
dependabot[bot]
f9cf3f3791 build(deps): bump github.com/linkedin/goavro/v2 from 2.14.1 to 2.15.0 (#8484)
Bumps [github.com/linkedin/goavro/v2](https://github.com/linkedin/goavro) from 2.14.1 to 2.15.0.
- [Release notes](https://github.com/linkedin/goavro/releases)
- [Commits](https://github.com/linkedin/goavro/compare/v2.14.1...v2.15.0)

---
updated-dependencies:
- dependency-name: github.com/linkedin/goavro/v2
  dependency-version: 2.15.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 10:28:20 -08:00
dependabot[bot]
5d0667221b build(deps): bump github.com/go-redsync/redsync/v4 from 4.15.0 to 4.16.0 (#8483)
Bumps [github.com/go-redsync/redsync/v4](https://github.com/go-redsync/redsync) from 4.15.0 to 4.16.0.
- [Release notes](https://github.com/go-redsync/redsync/releases)
- [Commits](https://github.com/go-redsync/redsync/compare/v4.15.0...v4.16.0)

---
updated-dependencies:
- dependency-name: github.com/go-redsync/redsync/v4
  dependency-version: 4.16.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 10:28:08 -08:00
dependabot[bot]
74593f7065 build(deps): bump actions/upload-artifact from 6 to 7 (#8482)
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 6 to 7.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v6...v7)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 10:27:57 -08:00
Chris Lu
340339f678 Add Apache Polaris integration tests (#8478)
* test: add polaris integration test harness

* test: add polaris integration coverage

* ci: run polaris s3 tables tests

* test: harden polaris harness

* test: DRY polaris integration tests

* ci: pre-pull Polaris image

* test: extend Polaris pull timeout

* test: refine polaris credentials selection

* test: keep Polaris tables inside allowed location

* test: use fresh context for polaris cleanup

* test: prefer specific Polaris storage credential

* test: tolerate Polaris credential variants

* test: request Polaris vended credentials

* test: load Polaris table credentials

* test: allow polaris vended access via bucket policy

* test: align Polaris object keys with table location
2026-03-01 23:08:50 -08:00
dependabot[bot]
2fc47a48ec build(deps): bump com.fasterxml.jackson.core:jackson-core from 2.18.2 to 2.18.6 in /test/java/spark (#8476) 2026-03-01 13:14:04 -08:00
dependabot[bot]
623450a0d4 build(deps): bump go.opentelemetry.io/otel/sdk from 1.38.0 to 1.40.0 (#8475) 2026-03-01 12:46:43 -08:00
Chris Lu
f5c35240be Add volume dir tags and EC placement priority (#8472)
* Add volume dir tags to topology

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add preferred tag config for EC

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Prioritize EC destinations by tags

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add EC placement planner tag tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Refactor EC placement tests to reuse buildActiveTopology

Remove buildActiveTopologyWithDiskTags helper function and consolidate
tag setup inline in test cases. Tests now use UpdateTopology to apply
tags after topology creation, reusing the existing buildActiveTopology
function rather than duplicating its logic.

All tag scenario tests pass:
- TestECPlacementPlannerPrefersTaggedDisks
- TestECPlacementPlannerFallsBackWhenTagsInsufficient

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Consolidate normalizeTagList into shared util package

Extract normalizeTagList from three locations (volume.go,
detection.go, erasure_coding_handler.go) into new weed/util/tag.go
as exported NormalizeTagList function. Replace all duplicate
implementations with imports and calls to util.NormalizeTagList.

This improves code reuse and maintainability by centralizing
tag normalization logic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add PreferredTags to EC config persistence

Add preferred_tags field to ErasureCodingTaskConfig protobuf with field
number 5. Update GetConfigSpec to include preferred_tags field in the
UI configuration schema. Add PreferredTags to ToTaskPolicy to serialize
config to protobuf. Add PreferredTags to FromTaskPolicy to deserialize
from protobuf with defensive copy to prevent external mutation.

This allows EC preferred tags to be persisted and restored across
worker restarts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add defensive copy for Tags slice in DiskLocation

Copy the incoming tags slice in NewDiskLocation instead of storing
by reference. This prevents external callers from mutating the
DiskLocation.Tags slice after construction, improving encapsulation
and preventing unexpected changes to disk metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add doc comment to buildCandidateSets method

Document the tiered candidate selection and fallback behavior. Explain
that for a planner with preferredTags, it accumulates disks matching
each tag in order into progressively larger tiers, emits a candidate
set once a tier reaches shardsNeeded, and finally falls back to the
full candidates set if preferred-tag tiers are insufficient.

This clarifies the intended semantics for future maintainers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Apply final PR review fixes

1. Update parseVolumeTags to replicate single tag entry to all folders
   instead of leaving some folders with nil tags. This prevents nil
   pointer dereferences when processing folders without explicit tags.

2. Add defensive copy in ToTaskPolicy for PreferredTags slice to match
   the pattern used in FromTaskPolicy, preventing external mutation of
   the returned TaskPolicy.

3. Add clarifying comment in buildCandidateSets explaining that the
   shardsNeeded <= 0 branch is a defensive check for direct callers,
   since selectDestinations guarantees shardsNeeded > 0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix nil pointer dereference in parseVolumeTags

Ensure all folder tags are initialized to either normalized tags or
empty slices, not nil. When multiple tag entries are provided and there
are more folders than entries, remaining folders now get empty slices
instead of nil, preventing nil pointer dereference in downstream code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix NormalizeTagList to return empty slice instead of nil

Change NormalizeTagList to always return a non-nil slice. When all tags
are empty or whitespace after normalization, return an empty slice
instead of nil. This prevents nil pointer dereferences in downstream
code that expects a valid (possibly empty) slice.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add nil safety check for v.tags pointer

Add a safety check to handle the case where v.tags might be nil,
preventing a nil pointer dereference. If v.tags is nil, use an empty
string instead. This is defensive programming to prevent panics in
edge cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add volume.tags flag to weed server and weed mini commands

Add the volume.tags CLI option to both the 'weed server' and 'weed mini'
commands. This allows users to specify disk tags when running the
combined server modes, just like they can with 'weed volume'.

The flag uses the same format and description as the volume command:
comma-separated tag groups per data dir with ':' separators
(e.g. fast:ssd,archive).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-01 10:22:00 -08:00
Chris Lu
c5d5b517f6 Add lakekeeper table bucket integration test (#8470)
* Add lakekeeper table bucket integration test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Convert lakekeeper table bucket test to Go

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add multipart upload to lakekeeper table bucket test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Remove lakekeeper test skips

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Convert lakekeeper repro to Go

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-02-28 12:06:41 -08:00
Chris Lu
2dd3944819 Respect -minFreeSpace during ec.decode (#8467)
* shell: add ec.decode ignoreMinFreeSpace flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* shell: respect minFreeSpace in ec.decode

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* shell: rename ec.decode minFreeSpace flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* shell: error when ec.decode has no shards

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* shell: select ec.decode target with zero shards

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* shell: adjust free counts across ec.decode

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* unused

* Update weed/shell/command_ec_decode.go

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2026-02-27 23:54:30 -08:00
Chris Lu
7354fa87f1 refactor ec shard distribution (#8465)
* refactor ec shard distribution

* fix shard assignment merge and mount errors

* fix mount error aggregation scope

* make WithFields compatible and wrap errors
2026-02-27 17:21:13 -08:00
Chris Lu
e8946e59ca fix(s3api): correctly extract host header port in extractHostHeader (#8464)
* Prevent concurrent maintenance tasks per volume

* fix panic

* fix(s3api): correctly extract host header port when X-Forwarded-Port is present

* test(s3api): add test cases for misreported X-Forwarded-Port
2026-02-27 13:41:45 -08:00
Chris Lu
b9e560dcf1 Prevent overlapping maintenance tasks per volume (#8463)
* Prevent concurrent maintenance tasks per volume

* fix panic
2026-02-27 13:14:52 -08:00
Chris Lu
4f647e1036 Worker set its working directory (#8461)
* set working directory

* consolidate to worker directory

* working directory

* correct directory name

* refactoring to use wildcard matcher

* simplify

* cleaning ec working directory

* fix reference

* clean

* adjust test
2026-02-27 12:22:21 -08:00
Chris Lu
cf3b7b3ad7 adjust weight 2026-02-26 19:46:38 -08:00
Chris Lu
09a1ace53a adjust display name 2026-02-26 19:33:52 -08:00
Chris Lu
c73e65ad5e Add customizable plugin display names and weights (#8459)
* feat: add customizable plugin display names and weights

- Add weight field to JobTypeCapability proto message
- Modify ListKnownJobTypes() to return JobTypeInfo with display names and weights
- Modify ListPluginJobTypes() to return JobTypeInfo instead of string
- Sort plugins by weight (descending) then alphabetically
- Update admin API to return enriched job type metadata
- Update plugin UI template to display names instead of IDs
- Consolidate API by reusing existing function names instead of suffixed variants

* perf: optimize plugin job type capability lookup and add null-safe parsing

- Pre-calculate job type capabilities in a map to reduce O(n*m) nested loops
  to O(n+m) lookup time in ListKnownJobTypes()
- Add parseJobTypeItem() helper function for null-safe job type item parsing
- Refactor plugin.templ to use parseJobTypeItem() in all job type access points
  (hasJobType, applyInitialNavigation, ensureActiveNavigation, renderTopTabs)
- Deterministic capability resolution by using first worker's capability

* templ

* refactor: use parseJobTypeItem helper consistently in plugin.templ

Replace duplicated job type extraction logic at line 1296-1298 with
parseJobTypeItem() helper function for consistency and maintainability.

* improve: prefer richer capability metadata and add null-safety checks

- Improve capability selection in ListKnownJobTypes() to prefer capabilities
  with non-empty DisplayName and higher Weight across all workers instead of
  first-wins approach. Handles mixed-version clusters better.
- Add defensive null checks in renderJobTypeSummary() to safely access
  parseJobTypeItem() result before property access
- Ensures malformed or missing entries won't break the rendering pipeline

* fix: preserve existing DisplayName when merging capabilities

Fix capability merge logic to respect existing DisplayName values:
- If existing has DisplayName but candidate doesn't, preserve existing
- If existing doesn't have DisplayName but candidate does, use candidate
- Only use Weight comparison if DisplayName status is equal
- Prevents higher-weight capabilities with empty DisplayName from
  overriding capabilities with non-empty DisplayName
2026-02-26 19:20:48 -08:00
Chris Lu
8eba7ba5b2 feat: drop table location mapping support (#8458)
* feat: drop table location mapping support

Disable external metadata locations for S3 Tables and remove the table location
mapping index entirely. Table metadata must live under the table bucket paths,
so lookups no longer use mapping directories.

Changes:
- Remove mapping lookup and cache from bucket path resolution
- Reject metadataLocation in CreateTable and UpdateTable
- Remove mapping helpers and tests

* compile

* refactor

* fix: accept metadataLocation in S3 Tables API requests

We removed the external table location mapping feature, but still need to
accept and store metadataLocation values from clients like Trino. The mapping
feature was an internal implementation detail that mapped external buckets to
internal table paths. The metadataLocation field itself is part of the S3 Tables
API and should be preserved.

* fmt

* fix: handle MetadataLocation in UpdateTable requests

Mirror handleCreateTable behavior by updating metadata.MetadataLocation
when req.MetadataLocation is provided in UpdateTable requests. This ensures
table metadata location can be updated, not just set during creation.
2026-02-26 16:36:24 -08:00
Chris Lu
641351da78 fix: table location mappings to /etc/s3tables (#8457)
* fix: move table location mappings to /etc/s3tables to avoid bucket name validation

Fixes #8362 - table location mappings were stored under /buckets/.table-location-mappings
which fails bucket name validation because it starts with a dot. Moving them to
/etc/s3tables resolves the migration error for upgrades.

Changes:
- Table location mappings now stored under /etc/s3tables
- Ensure parent /etc directory exists before creating /etc/s3tables
- Normal writes go to new location only (no legacy compatibility)
- Removed bucket name validation exception for old location

* refactor: simplify lookupTableLocationMapping by removing redundant mappingPath parameter

The mappingPath function parameter was redundant as the path can be derived
from mappingDir and bucket using path.Join. This simplifies the code and
reduces the risk of path mismatches between parameters.
2026-02-26 15:35:13 -08:00
blitt001
3d81d5bef7 Fix S3 signature verification behind reverse proxies (#8444)
* Fix S3 signature verification behind reverse proxies

When SeaweedFS is deployed behind a reverse proxy (e.g. nginx, Kong,
Traefik), AWS S3 Signature V4 verification fails because the Host header
the client signed with (e.g. "localhost:9000") differs from the Host
header SeaweedFS receives on the backend (e.g. "seaweedfs:8333").

This commit adds a new -s3.externalUrl parameter (and S3_EXTERNAL_URL
environment variable) that tells SeaweedFS what public-facing URL clients
use to connect. When set, SeaweedFS uses this host value for signature
verification instead of the Host header from the incoming request.

New parameter:
  -s3.externalUrl  (flag) or S3_EXTERNAL_URL (environment variable)
  Example: -s3.externalUrl=http://localhost:9000
  Example: S3_EXTERNAL_URL=https://s3.example.com

The environment variable is particularly useful in Docker/Kubernetes
deployments where the external URL is injected via container config.
The flag takes precedence over the environment variable when both are set.

At startup, the URL is parsed and default ports are stripped to match
AWS SDK behavior (port 80 for HTTP, port 443 for HTTPS), so
"http://s3.example.com:80" and "http://s3.example.com" are equivalent.

Bugs fixed:
- Default port stripping was removed by a prior PR, causing signature
  mismatches when clients connect on standard ports (80/443)
- X-Forwarded-Port was ignored when X-Forwarded-Host was not present
- Scheme detection now uses proper precedence: X-Forwarded-Proto >
  TLS connection > URL scheme > "http"
- Test expectations for standard port stripping were incorrect
- expectedHost field in TestSignatureV4WithForwardedPort was declared
  but never actually checked (self-referential test)

* Add Docker integration test for S3 proxy signature verification

Docker Compose setup with nginx reverse proxy to validate that the
-s3.externalUrl parameter (or S3_EXTERNAL_URL env var) correctly
resolves S3 signature verification when SeaweedFS runs behind a proxy.

The test uses nginx proxying port 9000 to SeaweedFS on port 8333,
with X-Forwarded-Host/Port/Proto headers set. SeaweedFS is configured
with -s3.externalUrl=http://localhost:9000 so it uses "localhost:9000"
for signature verification, matching what the AWS CLI signs with.

The test can be run with aws CLI on the host or without it by using
the amazon/aws-cli Docker image with --network host.

Test covers: create-bucket, list-buckets, put-object, head-object,
list-objects-v2, get-object, content round-trip integrity,
delete-object, and delete-bucket — all through the reverse proxy.

* Create s3-proxy-signature-tests.yml

* fix CLI

* fix CI

* Update s3-proxy-signature-tests.yml

* address comments

* Update Dockerfile

* add user

* no need for fuse

* Update s3-proxy-signature-tests.yml

* debug

* weed mini

* fix health check

* health check

* fix health checking

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-02-26 14:20:42 -08:00
Kirill Ilin
ae02d47433 helm: add optional parameters to COSI BucketClass (#8453)
Add cosi.bucketClassParameters to allow passing arbitrary parameters
to the default BucketClass resource. This enables use cases like
tiered storage where a diskType parameter needs to be set on the
BucketClass to route objects to specific volume servers.

When bucketClassParameters is empty (default), the BucketClass is
rendered without a parameters block, preserving backward compatibility.

Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
Co-authored-by: Claude <noreply@anthropic.com>
2026-02-26 12:19:07 -08:00
Chris Lu
9b6fc49946 Chart createBuckets config #8368: Add TTL, Object Lock, and Versioning support (#8375)
* Chart createBuckets config #8368: Add TTL, Object Lock, and Versioning support

* Update weed/shell/command_s3_bucket_versioning.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* address comments

* address comments

* go fmt

* fix failures are still treated like “bucket not found”

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-26 11:56:10 -08:00
Lars Lehtonen
0fac6e39ea weed/s3api/s3tables: fix dropped errors (#8456)
* weed/s3api/s3tables: fix dropped errors

* enhance errors

* fail fast when listing tables

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-02-26 11:12:10 -08:00
Chris Lu
453310b057 Add plugin worker integration tests for erasure coding (#8450)
* test: add plugin worker integration harness

* test: add erasure coding detection integration tests

* test: add erasure coding execution integration tests

* ci: add plugin worker integration workflow

* test: extend fake volume server for vacuum and balance

* test: expand erasure coding detection topologies

* test: add large erasure coding detection topology

* test: add vacuum plugin worker integration tests

* test: add volume balance plugin worker integration tests

* ci: run plugin worker tests per worker

* fixes

* erasure coding: stop after placement failures

* erasure coding: record hasMore when early stopping

* erasure coding: relax large topology expectations
2026-02-25 22:11:41 -08:00
Chris Lu
9b6bbf7d45 Remove volumePreallocate option from docker containers (#8451)
Some filesystems, such as XFS, may over-allocate disk spaces when using
volume preallocation. Remove this option from the default docker entrypoint
scripts to allow volumes to use only the necessary disk space.

Fixes: https://github.com/seaweedfs/seaweedfs/issues/6465#issuecomment-3964174718

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-02-25 22:11:15 -08:00
Chris Lu
d2b92938ee Make EC detection context aware (#8449)
* Make EC detection context aware

* Update register.go

* Speed up EC detection planning

* Add tests for EC detection planner

* optimizations

detection.go: extracted ParseCollectionFilter (exported) and feed it into the detection loop so both detection and tracing share the same parsing/whitelisting logic; the detection loop now iterates on a sorted list of volume IDs, checks the context at every iteration, and only sets hasMore when there are still unprocessed groups after hitting maxResults, keeping runtime bounded while still scheduling planned tasks before returning the results.
erasure_coding_handler.go: dropped the duplicated inline filter parsing in emitErasureCodingDetectionDecisionTrace and now reuse erasurecodingtask.ParseCollectionFilter, and the summary suffix logic now only accounts for the hasMore case that can actually happen.
detection_test.go: updated the helper topology builder to use master_pb.VolumeInformationMessage (matching the current protobuf types) and tightened the cancellation/max-results tests so they reliably exercise the detection logic (cancel before calling Detection, and provide enough disks so one result is produced before the limit).

* use working directory

* fix compilation

* fix compilation

* rename

* go vet

* fix getenv

* address comments, fix error
2026-02-25 18:02:35 -08:00
Chris Lu
7f6e58b791 Fix SFTP file upload failures with JWT filer tokens (#8448)
* Fix SFTP file upload failures with JWT filer tokens (issue #8425)

When JWT authentication is enabled for filer operations via jwt.filer_signing.*
configuration, SFTP server file upload requests were rejected because they lacked
JWT authorization headers.

Changes:
- Added JWT signing key and expiration fields to SftpServer struct
- Modified putFile() to generate and include JWT tokens in upload requests
- Enhanced SFTPServiceOptions with JWT configuration fields
- Updated SFTP command startup to load and pass JWT config to service

This allows SFTP uploads to authenticate with JWT-enabled filers, consistent
with how other SeaweedFS components (S3 API, file browser) handle filer auth.

Fixes #8425

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-25 14:30:21 -08:00
Chris Lu
a92e9baddf Add integration test for multipart operations inheriting s3:PutObject permissions
TestS3MultipartOperationsInheritPutObjectPermissions verifies that multipart
upload operations (CreateMultipartUpload, UploadPart, ListParts,
CompleteMultipartUpload, AbortMultipartUpload, ListMultipartUploads) work
correctly when a user has only s3:PutObject permission granted.

This test validates the behavior where multipart operations are implicitly
granted when s3:PutObject is authorized, as multipart upload is an
implementation detail of putting objects in S3.
2026-02-25 13:38:14 -08:00
dependabot[bot]
52be2edcb1 build(deps): bump github.com/parquet-go/parquet-go from 0.26.4 to 0.27.0 (#8410)
* build(deps): bump github.com/parquet-go/parquet-go from 0.26.4 to 0.27.0

Bumps [github.com/parquet-go/parquet-go](https://github.com/parquet-go/parquet-go) from 0.26.4 to 0.27.0.
- [Release notes](https://github.com/parquet-go/parquet-go/releases)
- [Changelog](https://github.com/parquet-go/parquet-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/parquet-go/parquet-go/compare/v0.26.4...v0.27.0)

---
updated-dependencies:
- dependency-name: github.com/parquet-go/parquet-go
  dependency-version: 0.27.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix parser

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-02-25 13:15:36 -08:00
Chris Lu
b9fa05153a Allow multipart upload operations when s3:PutObject is authorized (#8445)
* Allow multipart upload operations when s3:PutObject is authorized

Multipart upload is an implementation detail of putting objects, not a separate
permission. When a policy grants s3:PutObject, it should implicitly allow:
- s3:CreateMultipartUpload
- s3:UploadPart
- s3:CompleteMultipartUpload
- s3:AbortMultipartUpload
- s3:ListParts

This fixes a compatibility issue where clients like PyArrow that use multipart
uploads by default would fail even though the role had s3:PutObject permission.
The session policy intersection still applies - both the identity-based policy
AND session policy must allow s3:PutObject for multipart operations to work.

Implementation:
- Added constants for S3 multipart action strings
- Added multipartActionSet to efficiently check if action is multipart-related
- Updated MatchesAction method to implicitly grant multipart when PutObject allowed

* Update weed/s3api/policy_engine/types.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Add s3:ListMultipartUploads to multipart action set

Include s3:ListMultipartUploads in the multipartActionSet so that listing
multipart uploads is implicitly granted when s3:PutObject is authorized.
ListMultipartUploads is a critical part of the multipart upload workflow,
allowing clients to query in-progress uploads before completing them.

Changes:
- Added s3ListMultipartUploads constant definition
- Included s3ListMultipartUploads in multipartActionSet initialization
- Existing references to multipartActionSet automatically now cover ListMultipartUploads

All policy engine tests pass (0.351s execution time)

* Refactor: reuse multipart action constants from s3_constants package

Remove duplicate constant definitions from policy_engine/types.go and import
the canonical definitions from s3api/s3_constants/s3_actions.go instead.
This eliminates duplication and ensures a single source of truth for
multipart action strings:

- ACTION_CREATE_MULTIPART_UPLOAD
- ACTION_UPLOAD_PART
- ACTION_COMPLETE_MULTIPART
- ACTION_ABORT_MULTIPART
- ACTION_LIST_PARTS
- ACTION_LIST_MULTIPART_UPLOADS

All policy engine tests pass (0.350s execution time)

* Fix S3_ACTION_LIST_MULTIPART_UPLOADS constant value

Move S3_ACTION_LIST_MULTIPART_UPLOADS from bucket operations to multipart
operations section and change value from 's3:ListBucketMultipartUploads' to
's3:ListMultipartUploads' to match the action strings used in policy_engine
and s3_actions.go.

This ensures consistent action naming across all S3 constant definitions.

* refactor names

* Fix S3 action constant mismatches and MatchesAction early return bug

Fix two critical issues in policy engine:

1. S3Actions map had incorrect multipart action mappings:
   - 'ListMultipartUploads' was 's3:ListMultipartUploads' (should be 's3:ListBucketMultipartUploads')
   - 'ListParts' was 's3:ListParts' (should be 's3:ListMultipartUploadParts')
   These mismatches caused authorization checks to fail for list operations

2. CompiledStatement.MatchesAction() had early return bug:
   - Previously returned true immediately upon first direct action match
   - This prevented scanning remaining matchers for s3:PutObject permission
   - Now scans ALL matchers before returning, tracking both direct match and PutObject grant
   - Ensures multipart operations inherit s3:PutObject authorization even when
     explicitly requested action doesn't match (e.g., s3:ListMultipartUploadParts)

Changes:
- Track matchedAction flag to defer
Fix two critical issues in policy engine:

1. S3Actions map had incorrect multipart action mappings:
   - 'ListMultipartUploads' was 's3:ListMultipartUplPer
1. S3Actions map had incorrect multiparAll   - 'ListMultipartUploads(0.334s execution time)

* Refactor S3Actions map to use s3_constants

Replace hardcoded action strings in the S3Actions map with references to
canonical S3_ACTION_* constants from s3_constants/s3_action_strings.go.

Benefits:
- Single source of truth for S3 action values
- Eliminates string duplication across codebase
- Ensures consistency between policy engine and middleware
- Reduces maintenance burden when action strings need updates

All policy engine tests pass (0.334s execution time)

* Remove unused S3Actions map

The S3Actions map in types.go was never referenced anywhere in the codebase.
All action mappings are handled by GetActionMappings() in integration.go instead.
This removes 42 lines of dead code.

* Fix test: reload configuration function must also reload IAM state

TestEmbeddedIamAttachUserPolicyRefreshesIAM was failing because the test's
reloadConfigurationFunc only updated mockConfig but didn't reload the actual IAM
state. When AttachUserPolicy calls refreshIAMConfiguration(), it would use the
test's incomplete reload function instead of the real LoadS3ApiConfigurationFromCredentialManager().

Fixed by making the test's reloadConfigurationFunc also call
e.iam.LoadS3ApiConfigurationFromCredentialManager() so lookupByIdentityName()
sees the updated policy attachments.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-25 12:31:04 -08:00
Chris Lu
d5e71eb0d8 Revert "s3api: preserve Host header port in signature verification (#8434)"
This reverts commit 98d89ffad7.
2026-02-25 12:28:44 -08:00
dependabot[bot]
f82ee08972 build(deps): bump github.com/cloudflare/circl from 1.6.1 to 1.6.3 (#8446)
Bumps [github.com/cloudflare/circl](https://github.com/cloudflare/circl) from 1.6.1 to 1.6.3.
- [Release notes](https://github.com/cloudflare/circl/releases)
- [Commits](https://github.com/cloudflare/circl/compare/v1.6.1...v1.6.3)

---
updated-dependencies:
- dependency-name: github.com/cloudflare/circl
  dependency-version: 1.6.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 12:23:15 -08:00
dependabot[bot]
96078bc87e build(deps): bump github.com/cloudflare/circl from 1.6.1 to 1.6.3 in /test/kafka (#8447)
build(deps): bump github.com/cloudflare/circl in /test/kafka

Bumps [github.com/cloudflare/circl](https://github.com/cloudflare/circl) from 1.6.1 to 1.6.3.
- [Release notes](https://github.com/cloudflare/circl/releases)
- [Commits](https://github.com/cloudflare/circl/compare/v1.6.1...v1.6.3)

---
updated-dependencies:
- dependency-name: github.com/cloudflare/circl
  dependency-version: 1.6.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 12:23:02 -08:00
Chris Lu
8c0c7248b3 Refresh IAM config after policy attachments (#8439)
* Refresh IAM cache after policy attachments

* error handling
2026-02-25 10:30:05 -08:00
Chris Lu
e3decd2e3b go fmt 2026-02-25 10:25:44 -08:00
Chris Lu
7296f51f48 Merge branch 'master' of https://github.com/seaweedfs/seaweedfs 2026-02-25 10:25:31 -08:00
Chris Lu
a3cb7fa8cc go fmt 2026-02-25 10:25:23 -08:00
Peter Dodd
0910252e31 feat: add statfile remote storage (#8443)
* feat: add statfile; add error for remote storage misses

* feat: statfile implementations for storage providers

* test: add unit tests for StatFile method across providers

Add comprehensive unit tests for the StatFile implementation covering:
- S3: interface compliance and error constant accessibility
- Azure: interface compliance, error constants, and field population
- GCS: interface compliance, error constants, error detection, and field population

Also fix variable shadowing issue in S3 and Azure StatFile implementations where
named return parameters were being shadowed by local variable declarations.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: address StatFile review feedback

- Use errors.New for ErrRemoteObjectNotFound sentinel
- Fix S3 HeadObject 404 detection to use awserr.Error code check
- Remove hollow field-population tests that tested nothing
- Remove redundant stdlib error detection tests
- Trim verbose doc comment on ErrRemoteObjectNotFound

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: address second round of StatFile review feedback

- Rename interface assertion tests to TestXxxRemoteStorageClientImplementsInterface
- Delegate readFileRemoteEntry to StatFile in all three providers
- Revert S3 404 detection to RequestFailure.StatusCode() check
- Fix double-slash in GCS error message format string
- Add storage type prefix to S3 error message for consistency

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: comments

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-25 10:24:06 -08:00
Chris Lu
b565a0cc86 Adds volume.merge command with deduplication and disk-based backend (#8441)
* Enhance volume.merge command with deduplication and disk-based backend

* Fix copyVolume function call with correct argument order and missing bool parameter

* Revert "Fix copyVolume function call with correct argument order and missing bool parameter"

This reverts commit 7b4a190643576fec11f896b26bcad03dd02da2f7.

* Fix critical issues: per-replica writable tracking, tail goroutine cancellation via done channel, and debug logging for allocation failures

* Optimize memory usage with watermark approach for duplicate detection

* Fix critical issues: swap copyVolume arguments, increase idle timeout, remove file double-close, use glog for logging

* Replace temporary file with in-memory buffer for needle blob serialization

* test(volume.merge): Add comprehensive unit and integration tests

Add 7 unit tests covering:
- Ordering by timestamp
- Cross-stream duplicate deduplication
- Empty stream handling
- Complex multi-stream deduplication
- Single stream passthrough
- Large needle ID support
- LastModified fallback when timestamp unavailable

Add 2 integration validation tests:
- TestMergeWorkflowValidation: Documents 9-stage merge workflow
- TestMergeEdgeCaseHandling: Validates 10 edge case handling

All tests passing (9/9)

* fix(volume.merge): Use time window for deduplication to handle clock skew

The same needle ID can have different timestamps on different servers due to
clock skew and replication lag. Needles with the same ID within a 5-second
time window are now treated as duplicates (same write with timestamp variance).

Key changes:
- Add mergeDeduplicationWindowNs constant (5 seconds)
- Replace exact timestamp matching with time window comparison
- Use windowInitialized flag to properly detect window transitions
- Add TestMergeNeedleStreamsTimeWindowDeduplication test

This ensures that replicated writes with slight timestamp differences are
properly deduplicated during merge, while separate updates to the same file
ID (outside the window) are preserved.

All tests passing (10/10)

* test: Add volume.merge integration tests with 5 comprehensive test cases

* test: integration tests for volume.merge command

* Fix integration tests: use TripleVolumeCluster for volume.merge testing

- Created new TripleVolumeCluster framework (cluster_triple.go) with 3 volume servers
- Rebuilt weed binary with volume.merge command compiled in
- Updated all 5 integration tests to use TripleVolumeCluster instead of DualVolumeCluster
- Tests now properly allocate volumes on 2 servers and let merge allocate on 3rd
- All 5 integration tests now pass:
  - TestVolumeMergeBasic
  - TestVolumeMergeReadonly
  - TestVolumeMergeRestore
  - TestVolumeMergeTailNeedles
  - TestVolumeMergeDivergentReplicas

* Refactor test framework: use parameterized server count instead of hardcoded

- Renamed TripleVolumeCluster to MultiVolumeCluster with serverCount parameter
- Replaced hardcoded volumePort0/1/2 with slices for flexible server count
- Updated StartTripleVolumeCluster as backward-compatible wrapper calling StartMultiVolumeCluster(t, profile, 3)
- Made directory creation, port allocation, and server startup loop-based
- Updated accessor methods (VolumeAdminAddress, VolumeGRPCAddress, etc.) to support any server count
- All 5 integration tests continue to pass with new parameterized cluster framework
- Enables future testing with 2, 4, 5+ volume servers by calling StartMultiVolumeCluster directly

* Consolidate cluster frameworks: StartDualVolumeCluster now uses MultiVolumeCluster

- Made DualVolumeCluster a type alias for MultiVolumeCluster
- Updated StartDualVolumeCluster to call StartMultiVolumeCluster(t, profile, 2)
- Removed duplicate code from cluster_dual.go (now just 17 lines)
- All existing tests using StartDualVolumeCluster continue to work without changes
- Backward compatible: existing code continues to use the old function signatures
- Added wrapper functions in cluster_multi.go for StartTripleVolumeCluster
- Enables unified cluster management across all test suites

* Address PR review comments: improve error handling and clean up code

- Replace parse error swallow with proper error return
- Log cleanup and restoration errors instead of silently discarding them
- Remove unused offset field from memoryBackendFile struct
- Fix WriteAt buffer truncation bug to preserve trailing bytes
- All unit tests passing (10/10)
- Code compiles successfully

* Fix PR review findings: test improvements and code quality

- Add timeout to runWeedShell to prevent hanging
- Add server 1 readonly status verification in tests
- Assert merge fails when replicas writable (not just log output)
- Replace sleep with polling for writable restoration check
- Fix WriteAt stale data snapshot bug in memoryBackendFile
- Fix startVolume error logging to show current server log
- Fix volumePubPorts double assignment in port allocation
- Rename test to reflect behavior: DoesNotDeduplicateAcrossWindows
- Fix misleading dedup window comment

Unit tests: 10/10 passing
Binary: Compiles successfully

* Fix test assumption: merge command marks volumes readonly automatically

TestVolumeMergeReadonly was expecting merge to fail on writable volumes, but the
merge command is designed to mark volumes readonly as part of its operation. Fixed
test to verify merge succeeds on writable volumes and properly restores writable
state afterward. Removed redundant Test 2 code that duplicated the new behavior.

* fmt

* Fix deduplication logic to correctly handle same-stream vs cross-stream duplicates

The dedup map previously used only NeedleId as key, causing same-stream
overwrites to be incorrectly skipped as duplicates. Changed to track which
stream first processed each needle ID in the current window:

- Cross-stream duplicates (same ID from different streams, within window) are skipped
- Same-stream duplicates (overwrites from same stream) are kept
- Map now stores: needleId -> streamIndex of first occurrence in window

Added TestMergeNeedleStreamsSameStreamDuplicates to verify same-stream
overwrites are preserved while cross-stream duplicates are skipped.

All unit tests passing (11/11)
Binary compiles successfully
2026-02-25 10:12:09 -08:00