Commit Graph

1377 Commits

Author SHA1 Message Date
Chris Lu
e1e5b4a8a6 add admin script worker (#8491)
* admin: add plugin lock coordination

* shell: allow bypassing lock checks

* plugin worker: add admin script handler

* mini: include admin_script in plugin defaults

* admin script UI: drop name and enlarge text

* admin script: add default script

* admin_script: make run interval configurable

* plugin: gate other jobs during admin_script runs

* plugin: use last completed admin_script run

* admin: backfill plugin config defaults

* templ

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* comparable to default version

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* default to run

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* format

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* shell: respect pre-set noLock for fix.replication

* shell: add force no-lock mode for admin scripts

* volume balance worker already exists

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* admin: expose scheduler status JSON

* shell: add sleep command

* shell: restrict sleep syntax

* Revert "shell: respect pre-set noLock for fix.replication"

This reverts commit 2b14e8b82602a740d3a473c085e3b3a14f1ddbb3.

* templ

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* fix import

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* less logs

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

* Reduce master client logs on canceled contexts

* Update mini default job type count

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-03 15:10:40 -08:00
Chris Lu
f5c35240be Add volume dir tags and EC placement priority (#8472)
* Add volume dir tags to topology

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add preferred tag config for EC

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Prioritize EC destinations by tags

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add EC placement planner tag tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Refactor EC placement tests to reuse buildActiveTopology

Remove buildActiveTopologyWithDiskTags helper function and consolidate
tag setup inline in test cases. Tests now use UpdateTopology to apply
tags after topology creation, reusing the existing buildActiveTopology
function rather than duplicating its logic.

All tag scenario tests pass:
- TestECPlacementPlannerPrefersTaggedDisks
- TestECPlacementPlannerFallsBackWhenTagsInsufficient

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Consolidate normalizeTagList into shared util package

Extract normalizeTagList from three locations (volume.go,
detection.go, erasure_coding_handler.go) into new weed/util/tag.go
as exported NormalizeTagList function. Replace all duplicate
implementations with imports and calls to util.NormalizeTagList.

This improves code reuse and maintainability by centralizing
tag normalization logic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add PreferredTags to EC config persistence

Add preferred_tags field to ErasureCodingTaskConfig protobuf with field
number 5. Update GetConfigSpec to include preferred_tags field in the
UI configuration schema. Add PreferredTags to ToTaskPolicy to serialize
config to protobuf. Add PreferredTags to FromTaskPolicy to deserialize
from protobuf with defensive copy to prevent external mutation.

This allows EC preferred tags to be persisted and restored across
worker restarts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add defensive copy for Tags slice in DiskLocation

Copy the incoming tags slice in NewDiskLocation instead of storing
by reference. This prevents external callers from mutating the
DiskLocation.Tags slice after construction, improving encapsulation
and preventing unexpected changes to disk metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add doc comment to buildCandidateSets method

Document the tiered candidate selection and fallback behavior. Explain
that for a planner with preferredTags, it accumulates disks matching
each tag in order into progressively larger tiers, emits a candidate
set once a tier reaches shardsNeeded, and finally falls back to the
full candidates set if preferred-tag tiers are insufficient.

This clarifies the intended semantics for future maintainers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Apply final PR review fixes

1. Update parseVolumeTags to replicate single tag entry to all folders
   instead of leaving some folders with nil tags. This prevents nil
   pointer dereferences when processing folders without explicit tags.

2. Add defensive copy in ToTaskPolicy for PreferredTags slice to match
   the pattern used in FromTaskPolicy, preventing external mutation of
   the returned TaskPolicy.

3. Add clarifying comment in buildCandidateSets explaining that the
   shardsNeeded <= 0 branch is a defensive check for direct callers,
   since selectDestinations guarantees shardsNeeded > 0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix nil pointer dereference in parseVolumeTags

Ensure all folder tags are initialized to either normalized tags or
empty slices, not nil. When multiple tag entries are provided and there
are more folders than entries, remaining folders now get empty slices
instead of nil, preventing nil pointer dereference in downstream code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix NormalizeTagList to return empty slice instead of nil

Change NormalizeTagList to always return a non-nil slice. When all tags
are empty or whitespace after normalization, return an empty slice
instead of nil. This prevents nil pointer dereferences in downstream
code that expects a valid (possibly empty) slice.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add nil safety check for v.tags pointer

Add a safety check to handle the case where v.tags might be nil,
preventing a nil pointer dereference. If v.tags is nil, use an empty
string instead. This is defensive programming to prevent panics in
edge cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add volume.tags flag to weed server and weed mini commands

Add the volume.tags CLI option to both the 'weed server' and 'weed mini'
commands. This allows users to specify disk tags when running the
combined server modes, just like they can with 'weed volume'.

The flag uses the same format and description as the volume command:
comma-separated tag groups per data dir with ':' separators
(e.g. fast:ssd,archive).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-01 10:22:00 -08:00
Chris Lu
4f647e1036 Worker set its working directory (#8461)
* set working directory

* consolidate to worker directory

* working directory

* correct directory name

* refactoring to use wildcard matcher

* simplify

* cleaning ec working directory

* fix reference

* clean

* adjust test
2026-02-27 12:22:21 -08:00
blitt001
3d81d5bef7 Fix S3 signature verification behind reverse proxies (#8444)
* Fix S3 signature verification behind reverse proxies

When SeaweedFS is deployed behind a reverse proxy (e.g. nginx, Kong,
Traefik), AWS S3 Signature V4 verification fails because the Host header
the client signed with (e.g. "localhost:9000") differs from the Host
header SeaweedFS receives on the backend (e.g. "seaweedfs:8333").

This commit adds a new -s3.externalUrl parameter (and S3_EXTERNAL_URL
environment variable) that tells SeaweedFS what public-facing URL clients
use to connect. When set, SeaweedFS uses this host value for signature
verification instead of the Host header from the incoming request.

New parameter:
  -s3.externalUrl  (flag) or S3_EXTERNAL_URL (environment variable)
  Example: -s3.externalUrl=http://localhost:9000
  Example: S3_EXTERNAL_URL=https://s3.example.com

The environment variable is particularly useful in Docker/Kubernetes
deployments where the external URL is injected via container config.
The flag takes precedence over the environment variable when both are set.

At startup, the URL is parsed and default ports are stripped to match
AWS SDK behavior (port 80 for HTTP, port 443 for HTTPS), so
"http://s3.example.com:80" and "http://s3.example.com" are equivalent.

Bugs fixed:
- Default port stripping was removed by a prior PR, causing signature
  mismatches when clients connect on standard ports (80/443)
- X-Forwarded-Port was ignored when X-Forwarded-Host was not present
- Scheme detection now uses proper precedence: X-Forwarded-Proto >
  TLS connection > URL scheme > "http"
- Test expectations for standard port stripping were incorrect
- expectedHost field in TestSignatureV4WithForwardedPort was declared
  but never actually checked (self-referential test)

* Add Docker integration test for S3 proxy signature verification

Docker Compose setup with nginx reverse proxy to validate that the
-s3.externalUrl parameter (or S3_EXTERNAL_URL env var) correctly
resolves S3 signature verification when SeaweedFS runs behind a proxy.

The test uses nginx proxying port 9000 to SeaweedFS on port 8333,
with X-Forwarded-Host/Port/Proto headers set. SeaweedFS is configured
with -s3.externalUrl=http://localhost:9000 so it uses "localhost:9000"
for signature verification, matching what the AWS CLI signs with.

The test can be run with aws CLI on the host or without it by using
the amazon/aws-cli Docker image with --network host.

Test covers: create-bucket, list-buckets, put-object, head-object,
list-objects-v2, get-object, content round-trip integrity,
delete-object, and delete-bucket — all through the reverse proxy.

* Create s3-proxy-signature-tests.yml

* fix CLI

* fix CI

* Update s3-proxy-signature-tests.yml

* address comments

* Update Dockerfile

* add user

* no need for fuse

* Update s3-proxy-signature-tests.yml

* debug

* weed mini

* fix health check

* health check

* fix health checking

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-02-26 14:20:42 -08:00
Chris Lu
d2b92938ee Make EC detection context aware (#8449)
* Make EC detection context aware

* Update register.go

* Speed up EC detection planning

* Add tests for EC detection planner

* optimizations

detection.go: extracted ParseCollectionFilter (exported) and feed it into the detection loop so both detection and tracing share the same parsing/whitelisting logic; the detection loop now iterates on a sorted list of volume IDs, checks the context at every iteration, and only sets hasMore when there are still unprocessed groups after hitting maxResults, keeping runtime bounded while still scheduling planned tasks before returning the results.
erasure_coding_handler.go: dropped the duplicated inline filter parsing in emitErasureCodingDetectionDecisionTrace and now reuse erasurecodingtask.ParseCollectionFilter, and the summary suffix logic now only accounts for the hasMore case that can actually happen.
detection_test.go: updated the helper topology builder to use master_pb.VolumeInformationMessage (matching the current protobuf types) and tightened the cancellation/max-results tests so they reliably exercise the detection logic (cancel before calling Detection, and provide enough disks so one result is produced before the limit).

* use working directory

* fix compilation

* fix compilation

* rename

* go vet

* fix getenv

* address comments, fix error
2026-02-25 18:02:35 -08:00
Chris Lu
7f6e58b791 Fix SFTP file upload failures with JWT filer tokens (#8448)
* Fix SFTP file upload failures with JWT filer tokens (issue #8425)

When JWT authentication is enabled for filer operations via jwt.filer_signing.*
configuration, SFTP server file upload requests were rejected because they lacked
JWT authorization headers.

Changes:
- Added JWT signing key and expiration fields to SftpServer struct
- Modified putFile() to generate and include JWT tokens in upload requests
- Enhanced SFTPServiceOptions with JWT configuration fields
- Updated SFTP command startup to load and pass JWT config to service

This allows SFTP uploads to authenticate with JWT-enabled filers, consistent
with how other SeaweedFS components (S3 API, file browser) handle filer auth.

Fixes #8425

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-25 14:30:21 -08:00
Anton
427c975ff3 fix(plugin/worker): make VacuumHandler report MaxExecutionConcurrency from worker startup flag (#8435)
* fix(plugin/worker): make VacuumHandler report MaxExecutionConcurrency from worker startup flag

Previously, MaxExecutionConcurrency was hardcoded to 2 in VacuumHandler.Capability().
The scheduler's schedulerWorkerExecutionLimit() takes the minimum of the UI-configured
PerWorkerExecutionConcurrency and the worker-reported capability limit, so the hardcoded
value silently capped each worker to 2 concurrent vacuum executions regardless of the
--max-execute flag passed at worker startup.

Pass maxExecutionConcurrency into NewVacuumHandler() and wire it through
buildPluginWorkerHandler/buildPluginWorkerHandlers so the capability reflects the actual
worker configuration. The default falls back to 2 when the value is unset or zero.

* Update weed/command/worker_runtime.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Anton Ustyugov <anton@devops>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-24 15:13:00 -08:00
Chris Lu
8d59ef41d5 Admin UI: replace gin with mux (#8420)
* Replace admin gin router with mux

* Update layout_templ.go

* Harden admin handlers

* Add login CSRF handling

* Fix filer copy naming conflict

* address comments

* address comments
2026-02-23 19:11:17 -08:00
Chris Lu
e596542295 Move SQL engine and PostgreSQL server to their own binaries (#8417)
* Drop SQL engine and PostgreSQL server

* Split SQL tooling into weed-db and weed-sql

* move

* fix building
2026-02-23 16:27:08 -08:00
Chris Lu
e4b70c2521 go fix 2026-02-20 18:42:00 -08:00
Chris Lu
8ec9ff4a12 Refactor plugin system and migrate worker runtime (#8369)
* admin: add plugin runtime UI page and route wiring

* pb: add plugin gRPC contract and generated bindings

* admin/plugin: implement worker registry, runtime, monitoring, and config store

* admin/dash: wire plugin runtime and expose plugin workflow APIs

* command: add flags to enable plugin runtime

* admin: rename remaining plugin v2 wording to plugin

* admin/plugin: add detectable job type registry helper

* admin/plugin: add scheduled detection and dispatch orchestration

* admin/plugin: prefetch job type descriptors when workers connect

* admin/plugin: add known job type discovery API and UI

* admin/plugin: refresh design doc to match current implementation

* admin/plugin: enforce per-worker scheduler concurrency limits

* admin/plugin: use descriptor runtime defaults for scheduler policy

* admin/ui: auto-load first known plugin job type on page open

* admin/plugin: bootstrap persisted config from descriptor defaults

* admin/plugin: dedupe scheduled proposals by dedupe key

* admin/ui: add job type and state filters for plugin monitoring

* admin/ui: add per-job-type plugin activity summary

* admin/plugin: split descriptor read API from schema refresh

* admin/ui: keep plugin summary metrics global while tables are filtered

* admin/plugin: retry executor reservation before timing out

* admin/plugin: expose scheduler states for monitoring

* admin/ui: show per-job-type scheduler states in plugin monitor

* pb/plugin: rename protobuf package to plugin

* admin/plugin: rename pluginRuntime wiring to plugin

* admin/plugin: remove runtime naming from plugin APIs and UI

* admin/plugin: rename runtime files to plugin naming

* admin/plugin: persist jobs and activities for monitor recovery

* admin/plugin: lease one detector worker per job type

* admin/ui: show worker load from plugin heartbeats

* admin/plugin: skip stale workers for detector and executor picks

* plugin/worker: add plugin worker command and stream runtime scaffold

* plugin/worker: implement vacuum detect and execute handlers

* admin/plugin: document external vacuum plugin worker starter

* command: update plugin.worker help to reflect implemented flow

* command/admin: drop legacy Plugin V2 label

* plugin/worker: validate vacuum job type and respect min interval

* plugin/worker: test no-op detect when min interval not elapsed

* command/admin: document plugin.worker external process

* plugin/worker: advertise configured concurrency in hello

* command/plugin.worker: add jobType handler selection

* command/plugin.worker: test handler selection by job type

* command/plugin.worker: persist worker id in workingDir

* admin/plugin: document plugin.worker jobType and workingDir flags

* plugin/worker: support cancel request for in-flight work

* plugin/worker: test cancel request acknowledgements

* command/plugin.worker: document workingDir and jobType behavior

* plugin/worker: emit executor activity events for monitor

* plugin/worker: test executor activity builder

* admin/plugin: send last successful run in detection request

* admin/plugin: send cancel request when detect or execute context ends

* admin/plugin: document worker cancel request responsibility

* admin/handlers: expose plugin scheduler states API in no-auth mode

* admin/handlers: test plugin scheduler states route registration

* admin/plugin: keep worker id on worker-generated activity records

* admin/plugin: test worker id propagation in monitor activities

* admin/dash: always initialize plugin service

* command/admin: remove plugin enable flags and default to enabled

* admin/dash: drop pluginEnabled constructor parameter

* admin/plugin UI: stop checking plugin enabled state

* admin/plugin: remove docs for plugin enable flags

* admin/dash: remove unused plugin enabled check method

* admin/dash: fallback to in-memory plugin init when dataDir fails

* admin/plugin API: expose worker gRPC port in status

* command/plugin.worker: resolve admin gRPC port via plugin status

* split plugin UI into overview/configuration/monitoring pages

* Update layout_templ.go

* add volume_balance plugin worker handler

* wire plugin.worker CLI for volume_balance job type

* add erasure_coding plugin worker handler

* wire plugin.worker CLI for erasure_coding job type

* support multi-job handlers in plugin worker runtime

* allow plugin.worker jobType as comma-separated list

* admin/plugin UI: rename to Workers and simplify config view

* plugin worker: queue detection requests instead of capacity reject

* Update plugin_worker.go

* plugin volume_balance: remove force_move/timeout from worker config UI

* plugin erasure_coding: enforce local working dir and cleanup

* admin/plugin UI: rename admin settings to job scheduling

* admin/plugin UI: persist and robustly render detection results

* admin/plugin: record and return detection trace metadata

* admin/plugin UI: show detection process and decision trace

* plugin: surface detector decision trace as activities

* mini: start a plugin worker by default

* admin/plugin UI: split monitoring into detection and execution tabs

* plugin worker: emit detection decision trace for EC and balance

* admin workers UI: split monitoring into detection and execution pages

* plugin scheduler: skip proposals for active assigned/running jobs

* admin workers UI: add job queue tab

* plugin worker: add dummy stress detector and executor job type

* admin workers UI: reorder tabs to detection queue execution

* admin workers UI: regenerate plugin template

* plugin defaults: include dummy stress and add stress tests

* plugin dummy stress: rotate detection selections across runs

* plugin scheduler: remove cross-run proposal dedupe

* plugin queue: track pending scheduled jobs

* plugin scheduler: wait for executor capacity before dispatch

* plugin scheduler: skip detection when waiting backlog is high

* plugin: add disk-backed job detail API and persistence

* admin ui: show plugin job detail modal from job id links

* plugin: generate unique job ids instead of reusing proposal ids

* plugin worker: emit heartbeats on work state changes

* plugin registry: round-robin tied executor and detector picks

* add temporary EC overnight stress runner

* plugin job details: persist and render EC execution plans

* ec volume details: color data and parity shard badges

* shard labels: keep parity ids numeric and color-only distinction

* admin: remove legacy maintenance UI routes and templates

* admin: remove dead maintenance endpoint helpers

* Update layout_templ.go

* remove dummy_stress worker and command support

* refactor plugin UI to job-type top tabs and sub-tabs

* migrate weed worker command to plugin runtime

* remove plugin.worker command and keep worker runtime with metrics

* update helm worker args for jobType and execution flags

* set plugin scheduling defaults to global 16 and per-worker 4

* stress: fix RPC context reuse and remove redundant variables in ec_stress_runner

* admin/plugin: fix lifecycle races, safe channel operations, and terminal state constants

* admin/dash: randomize job IDs and fix priority zero-value overwrite in plugin API

* admin/handlers: implement buffered rendering to prevent response corruption

* admin/plugin: implement debounced persistence flusher and optimize BuildJobDetail memory lookups

* admin/plugin: fix priority overwrite and implement bounded wait in scheduler reserve

* admin/plugin: implement atomic file writes and fix run record side effects

* admin/plugin: use P prefix for parity shard labels in execution plans

* admin/plugin: enable parallel execution for cancellation tests

* admin: refactor time.Time fields to pointers for better JSON omitempty support

* admin/plugin: implement pointer-safe time assignments and comparisons in plugin core

* admin/plugin: fix time assignment and sorting logic in plugin monitor after pointer refactor

* admin/plugin: update scheduler activity tracking to use time pointers

* admin/plugin: fix time-based run history trimming after pointer refactor

* admin/dash: fix JobSpec struct literal in plugin API after pointer refactor

* admin/view: add D/P prefixes to EC shard badges for UI consistency

* admin/plugin: use lifecycle-aware context for schema prefetching

* Update ec_volume_details_templ.go

* admin/stress: fix proposal sorting and log volume cleanup errors

* stress: refine ec stress runner with math/rand and collection name

- Added Collection field to VolumeEcShardsDeleteRequest for correct filename construction.
- Replaced crypto/rand with seeded math/rand PRNG for bulk payloads.
- Added documentation for EcMinAge zero-value behavior.
- Added logging for ignored errors in volume/shard deletion.

* admin: return internal server error for plugin store failures

Changed error status code from 400 Bad Request to 500 Internal Server Error for failures in GetPluginJobDetail to correctly reflect server-side errors.

* admin: implement safe channel sends and graceful shutdown sync

- Added sync.WaitGroup to Plugin struct to manage background goroutines.
- Implemented safeSendCh helper using recover() to prevent panics on closed channels.
- Ensured Shutdown() waits for all background operations to complete.

* admin: robustify plugin monitor with nil-safe time and record init

- Standardized nil-safe assignment for *time.Time pointers (CreatedAt, UpdatedAt, CompletedAt).
- Ensured persistJobDetailSnapshot initializes new records correctly if they don't exist on disk.
- Fixed debounced persistence to trigger immediate write on job completion.

* admin: improve scheduler shutdown behavior and logic guards

- Replaced brittle error string matching with explicit r.shutdownCh selection for shutdown detection.
- Removed redundant nil guard in buildScheduledJobSpec.
- Standardized WaitGroup usage for schedulerLoop.

* admin: implement deep copy for job parameters and atomic write fixes

- Implemented deepCopyGenericValue and used it in cloneTrackedJob to prevent shared state.
- Ensured atomicWriteFile creates parent directories before writing.

* admin: remove unreachable branch in shard classification

Removed an unreachable 'totalShards <= 0' check in classifyShardID as dataShards and parityShards are already guarded.

* admin: secure UI links and use canonical shard constants

- Added rel="noopener noreferrer" to external links for security.
- Replaced magic number 14 with erasure_coding.TotalShardsCount.
- Used renderEcShardBadge for missing shard list consistency.

* admin: stabilize plugin tests and fix regressions

- Composed a robust plugin_monitor_test.go to handle asynchronous persistence.
- Updated all time.Time literals to use timeToPtr helper.
- Added explicit Shutdown() calls in tests to synchronize with debounced writes.
- Fixed syntax errors and orphaned struct literals in tests.

* Potential fix for code scanning alert no. 278: Slice memory allocation with excessive size value

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Potential fix for code scanning alert no. 283: Uncontrolled data used in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* admin: finalize refinements for error handling, scheduler, and race fixes

- Standardized HTTP 500 status codes for store failures in plugin_api.go.
- Tracked scheduled detection goroutines with sync.WaitGroup for safe shutdown.
- Fixed race condition in safeSendDetectionComplete by extracting channel under lock.
- Implemented deep copy for JobActivity details.
- Used defaultDirPerm constant in atomicWriteFile.

* test(ec): migrate admin dockertest to plugin APIs

* admin/plugin_api: fix RunPluginJobTypeAPI to return 500 for server-side detection/filter errors

* admin/plugin_api: fix ExecutePluginJobAPI to return 500 for job execution failures

* admin/plugin_api: limit parseProtoJSONBody request body to 1MB to prevent unbounded memory usage

* admin/plugin: consolidate regex to package-level validJobTypePattern; add char validation to sanitizeJobID

* admin/plugin: fix racy Shutdown channel close with sync.Once

* admin/plugin: track sendLoop and recv goroutines in WorkerStream with r.wg

* admin/plugin: document writeProtoFiles atomicity — .pb is source of truth, .json is human-readable only

* admin/plugin: extract activityLess helper to deduplicate nil-safe OccurredAt sort comparators

* test/ec: check http.NewRequest errors to prevent nil req panics

* test/ec: replace deprecated ioutil/math/rand, fix stale step comment 5.1→3.1

* plugin(ec): raise default detection and scheduling throughput limits

* topology: include empty disks in volume list and EC capacity fallback

* topology: remove hard 10-task cap for detection planning

* Update ec_volume_details_templ.go

* adjust default

* fix tests

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2026-02-18 13:42:41 -08:00
Chris Lu
3300874cb5 filer: add default log purging to master maintenance scripts (#8359)
* filer: add default log purging to master maintenance scripts

* filer: fix default maintenance scripts to include full set of tasks

* filer: refactor maintenance scripts to avoid duplication
2026-02-16 16:58:15 -08:00
Chris Lu
0d8588e3ae S3: Implement IAM defaults and STS signing key fallback (#8348)
* S3: Implement IAM defaults and STS signing key fallback logic

* S3: Refactor startup order to init SSE-S3 key manager before IAM

* S3: Derive STS signing key from KEK using HKDF for security isolation

* S3: Document STS signing key fallback in security.toml

* fix(s3api): refine anonymous access logic and secure-by-default behavior

- Initialize anonymous identity by default in `NewIdentityAccessManagement` to prevent nil pointer exceptions.
- Ensure `ReplaceS3ApiConfiguration` preserves the anonymous identity if not present in the new configuration.
- Update `NewIdentityAccessManagement` signature to accept `filerClient`.
- In legacy mode (no policy engine), anonymous defaults to Deny (no actions), preserving secure-by-default behavior.
- Use specific `LookupAnonymous` method instead of generic map lookup.
- Update tests to accommodate signature changes and verify improved anonymous handling.

* feat(s3api): make IAM configuration optional

- Start S3 API server without a configuration file if `EnableIam` option is set.
- Default to `Allow` effect for policy engine when no configuration is provided (Zero-Config mode).
- Handle empty configuration path gracefully in `loadIAMManagerFromConfig`.
- Add integration test `iam_optional_test.go` to verify empty config behavior.

* fix(iamapi): fix signature mismatch in NewIdentityAccessManagementWithStore

* fix(iamapi): properly initialize FilerClient instead of passing nil

* fix(iamapi): properly initialize filer client for IAM management

- Instead of passing `nil`, construct a `wdclient.FilerClient` using the provided `Filers` addresses.
- Ensure `NewIdentityAccessManagementWithStore` receives a valid `filerClient` to avoid potential nil pointer dereferences or limited functionality.

* clean: remove dead code in s3api_server.go

* refactor(s3api): improve IAM initialization, safety and anonymous access security

* fix(s3api): ensure IAM config loads from filer after client init

* fix(s3): resolve test failures in integration, CORS, and tagging tests

- Fix CORS tests by providing explicit anonymous permissions config
- Fix S3 integration tests by setting admin credentials in init
- Align tagging test credentials in CI with IAM defaults
- Added goroutine to retry IAM config load in iamapi server

* fix(s3): allow anonymous access to health targets and S3 Tables when identities are present

* fix(ci): use /healthz for Caddy health check in awscli tests

* iam, s3api: expose DefaultAllow from IAM and Policy Engine

This allows checking the global "Open by Default" configuration from
other components like S3 Tables.

* s3api/s3tables: support DefaultAllow in permission logic and handler

Updated CheckPermissionWithContext to respect the DefaultAllow flag
in PolicyContext. This enables "Open by Default" behavior for
unauthenticated access in zero-config environments. Added a targeted
unit test to verify the logic.

* s3api/s3tables: propagate DefaultAllow through handlers

Propagated the DefaultAllow flag to individual handlers for
namespaces, buckets, tables, policies, and tagging. This ensures
consistent "Open by Default" behavior across all S3 Tables API
endpoints.

* s3api: wire up DefaultAllow for S3 Tables API initialization

Updated registerS3TablesRoutes to query the global IAM configuration
and set the DefaultAllow flag on the S3 Tables API server. This
completes the end-to-end propagation required for anonymous access in
zero-config environments. Added a SetDefaultAllow method to
S3TablesApiServer to facilitate this.

* s3api: fix tests by adding DefaultAllow to mock IAM integrations

The IAMIntegration interface was updated to include DefaultAllow(),
breaking several mock implementations in tests. This commit fixes
the build errors by adding the missing method to the mocks.

* env

* ensure ports

* env

* env

* fix default allow

* add one more test using non-anonymous user

* debug

* add more debug

* less logs
2026-02-16 13:59:13 -08:00
Lisandro Pin
0721e3c1e9 Rework volume compaction (a.k.a vacuuming) logic to cleanly support new parameters. (#8337)
We'll leverage on this to support a "ignore broken needles" option, necessary
to properly recover damaged volumes, as described in
https://github.com/seaweedfs/seaweedfs/issues/7442#issuecomment-3897784283 .
2026-02-16 02:15:14 -08:00
Chris Lu
b57429ef2e Switch empty-folder cleanup to bucket policy (#8292)
* Fix Spark _temporary cleanup and add issue #8285 regression test

* Generalize empty folder cleanup for Spark temp artifacts

* Revert synchronous folder pruning and add cleanup diagnostics

* Add actionable empty-folder cleanup diagnostics

* Fix Spark temp marker cleanup in async folder cleaner

* Fix Spark temp cleanup with implicit directory markers

* Keep explicit directory markers non-implicit

* logging

* more logs

* Switch empty-folder cleanup to bucket policy

* Seaweed-X-Amz-Allow-Empty-Folders

* less logs

* go vet

* less logs

* refactoring
2026-02-10 18:38:38 -08:00
Chris Lu
ba8e2aaae9 Fix master leader election when grpc ports change (#8272)
* Fix master leader detection when grpc ports change

* Canonicalize self peer entry to avoid raft self-alias panic

* Normalize and deduplicate master peer addresses
2026-02-09 18:13:02 -08:00
Chris Lu
e6ee293c17 Add table operations test (#8241)
* Add Trino blog operations test

* Update test/s3tables/catalog_trino/trino_blog_operations_test.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* feat: add table bucket path helpers and filer operations

- Add table object root and table location mapping directories
- Implement ensureDirectory, upsertFile, deleteEntryIfExists helpers
- Support table location bucket mapping for S3 access

* feat: manage table bucket object roots on creation/deletion

- Create .objects directory for table buckets on creation
- Clean up table object bucket paths on deletion
- Enable S3 operations on table bucket object roots

* feat: add table location mapping for Iceberg REST

- Track table location bucket mappings when tables are created/updated/deleted
- Enable location-based routing for S3 operations on table data

* feat: route S3 operations to table bucket object roots

- Route table-s3 bucket names to mapped table paths
- Route table buckets to object root directories
- Support table location bucket mapping lookup

* feat: emit table-s3 locations from Iceberg REST

- Generate unique table-s3 bucket names with UUID suffix
- Store table metadata under table bucket paths
- Return table-s3 locations for Trino compatibility

* fix: handle missing directories in S3 list operations

- Propagate ErrNotFound from ListEntries for non-existent directories
- Treat missing directories as empty results for list operations
- Fixes Trino non-empty location checks on table creation

* test: improve Trino CSV parsing for single-value results

- Sanitize Trino output to skip jline warnings
- Handle single-value CSV results without header rows
- Strip quotes from numeric values in tests

* refactor: use bucket path helpers throughout S3 API

- Replace direct bucket path operations with helper functions
- Leverage centralized table bucket routing logic
- Improve maintainability with consistent path resolution

* fix: add table bucket cache and improve filer error handling

- Cache table bucket lookups to reduce filer overhead on repeated checks
- Use filer_pb.CreateEntry and filer_pb.UpdateEntry helpers to check resp.Error
- Fix delete order in handler_bucket_get_list_delete: delete table object before directory
- Make location mapping errors best-effort: log and continue, don't fail API
- Update table location mappings to delete stale prior bucket mappings on update
- Add 1-second sleep before timestamp time travel query to ensure timestamps are in past
- Fix CSV parsing: examine all lines, not skip first; handle single-value rows

* fix: properly handle stale metadata location mapping cleanup

- Capture oldMetadataLocation before mutation in handleUpdateTable
- Update updateTableLocationMapping to accept both old and new locations
- Use passed-in oldMetadataLocation to detect location changes
- Delete stale mapping only when location actually changes
- Pass empty string for oldLocation in handleCreateTable (new tables have no prior mapping)
- Improve logging to show old -> new location transitions

* refactor: cleanup imports and cache design

- Remove unused 'sync' import from bucket_paths.go
- Use filer_pb.UpdateEntry helper in setExtendedAttribute and deleteExtendedAttribute for consistent error handling
- Add dedicated tableBucketCache map[string]bool to BucketRegistry instead of mixing concerns with metadataCache
- Improve cache separation: table buckets cache is now separate from bucket metadata cache

* fix: improve cache invalidation and add transient error handling

Cache invalidation (critical fix):
- Add tableLocationCache to BucketRegistry for location mapping lookups
- Clear tableBucketCache and tableLocationCache in RemoveBucketMetadata
- Prevents stale cache entries when buckets are deleted/recreated

Transient error handling:
- Only cache table bucket lookups when conclusive (found or ErrNotFound)
- Skip caching on transient errors (network, permission, etc)
- Prevents marking real table buckets as non-table due to transient failures

Performance optimization:
- Cache tableLocationDir results to avoid repeated filer RPCs on hot paths
- tableLocationDir now checks cache before making expensive filer lookups
- Cache stores empty string for 'not found' to avoid redundant lookups

Code clarity:
- Add comment to deleteDirectory explaining DeleteEntry response lacks Error field

* go fmt

* fix: mirror transient error handling in tableLocationDir and optimize bucketDir

Transient error handling:
- tableLocationDir now only caches definitive results
- Mirrors isTableBucket behavior to prevent treating transient errors as permanent misses
- Improves reliability on flaky systems or during recovery

Performance optimization:
- bucketDir avoids redundant isTableBucket call via bucketRoot
- Directly use s3a.option.BucketsPath for regular buckets
- Saves one cache lookup for every non-table bucket operation

* fix: revert bucketDir optimization to preserve bucketRoot logic

The optimization to directly use BucketsPath bypassed bucketRoot's logic
and caused issues with S3 list operations on delimiter+prefix cases.

Revert to using path.Join(s3a.bucketRoot(bucket), bucket) which properly
handles all bucket types and ensures consistent path resolution across
the codebase.

The slight performance cost of an extra cache lookup is worth the correctness
and consistency benefits.

* feat: move table buckets under /buckets

Add a table-bucket marker attribute, reuse bucket metadata cache for table bucket detection, and update list/validation/UI/test paths to treat table buckets as /buckets entries.

* Fix S3 Tables code review issues

- handler_bucket_create.go: Fix bucket existence check to properly validate
  entryResp.Entry before setting s3BucketExists flag (nil Entry should not
  indicate existing bucket)
- bucket_paths.go: Add clarifying comment to bucketRoot() explaining unified
  buckets root path for all bucket types
- file_browser_data.go: Optimize by extracting table bucket check early to
  avoid redundant WithFilerClient call

* Fix list prefix delimiter handling

* Handle list errors conservatively

* Fix Trino FOR TIMESTAMP query - use past timestamp

Iceberg requires the timestamp to be strictly in the past.
Use current_timestamp - interval '1' second instead of current_timestamp.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-07 13:27:47 -08:00
Chris Lu
a3b83f8808 test: add Trino Iceberg catalog integration test (#8228)
* test: add Trino Iceberg catalog integration test

- Create test/s3/catalog_trino/trino_catalog_test.go with TestTrinoIcebergCatalog
- Tests integration between Trino SQL engine and SeaweedFS Iceberg REST catalog
- Starts weed mini with all services and Trino in Docker container
- Validates Iceberg catalog schema creation and listing operations
- Uses native S3 filesystem support in Trino with path-style access
- Add workflow job to s3-tables-tests.yml for CI execution

* fix: preserve AWS environment credentials when replacing S3 configuration

When S3 configuration is loaded from filer/db, it replaces the identities list
and inadvertently removes AWS_ACCESS_KEY_ID credentials that were added from
environment variables. This caused auth to remain disabled even though valid
credentials were present.

Fix by preserving environment-based identities when replacing the configuration
and re-adding them after the replacement. This ensures environment credentials
persist across configuration reloads and properly enable authentication.

* fix: use correct ServerAddress format with gRPC port encoding

The admin server couldn't connect to master because the master address
was missing the gRPC port information. Use pb.NewServerAddress() which
properly encodes both HTTP and gRPC ports in the address string.

Changes:
- weed/command/mini.go: Use pb.NewServerAddress for master address in admin
- test/s3/policy/policy_test.go: Store and use gRPC ports for master/filer addresses

This fix applies to:
1. Admin server connection to master (mini.go)
2. Test shell commands that need master/filer addresses (policy_test.go)

* move

* move

* fix: always include gRPC port in server address encoding

The NewServerAddress() function was omitting the gRPC port from the address
string when it matched the port+10000 convention. However, gRPC port allocation
doesn't always follow this convention - when the calculated port is busy, an
alternative port is allocated.

This caused a bug where:
1. Master's gRPC port was allocated as 50661 (sequential, not port+10000)
2. Address was encoded as '192.168.1.66:50660' (gRPC port omitted)
3. Admin client called ToGrpcAddress() which assumed port+10000 offset
4. Admin tried to connect to 60660 but master was on 50661 → connection failed

Fix: Always include explicit gRPC port in address format (host:httpPort.grpcPort)
unless gRPC port is 0. This makes addresses unambiguous and works regardless of
the port allocation strategy used.

Impacts: All server-to-server gRPC connections now use properly formatted addresses.

* test: fix Iceberg REST API readiness check

The Iceberg REST API endpoints require authentication. When checked without
credentials, the API returns 403 Forbidden (not 401 Unauthorized).  The
readiness check now accepts both auth error codes (401/403) as indicators
that the service is up and ready, it just needs credentials.

This fixes the 'Iceberg REST API did not become ready' test failure.

* Fix AWS SigV4 signature verification for base64-encoded payload hashes

   AWS SigV4 canonical requests must use hex-encoded SHA256 hashes,
   but the X-Amz-Content-Sha256 header may be transmitted as base64.

   Changes:
   - Added normalizePayloadHash() function to convert base64 to hex
   - Call normalizePayloadHash() in extractV4AuthInfoFromHeader()
   - Added encoding/base64 import

   Fixes 403 Forbidden errors on POST requests to Iceberg REST API
   when clients send base64-encoded content hashes in the header.

   Impacted services: Iceberg REST API, S3Tables

* Fix AWS SigV4 signature verification for base64-encoded payload hashes

   AWS SigV4 canonical requests must use hex-encoded SHA256 hashes,
   but the X-Amz-Content-Sha256 header may be transmitted as base64.

   Changes:
   - Added normalizePayloadHash() function to convert base64 to hex
   - Call normalizePayloadHash() in extractV4AuthInfoFromHeader()
   - Added encoding/base64 import
   - Removed unused fmt import

   Fixes 403 Forbidden errors on POST requests to Iceberg REST API
   when clients send base64-encoded content hashes in the header.

   Impacted services: Iceberg REST API, S3Tables

* pass sigv4

* s3api: fix identity preservation and logging levels

- Ensure environment-based identities are preserved during config replacement
- Update accessKeyIdent and nameToIdentity maps correctly
- Downgrade informational logs to V(2) to reduce noise

* test: fix trino integration test and s3 policy test

- Pin Trino image version to 479
- Fix port binding to 0.0.0.0 for Docker connectivity
- Fix S3 policy test hang by correctly assigning MiniClusterCtx
- Improve port finding robustness in policy tests

* ci: pre-pull trino image to avoid timeouts

- Pull trinodb/trino:479 after Docker setup
- Ensure image is ready before integration tests start

* iceberg: remove unused checkAuth and improve logging

- Remove unused checkAuth method
- Downgrade informational logs to V(2)
- Ensure loggingMiddleware uses a status writer for accurate reported codes
- Narrow catch-all route to avoid interfering with other subsystems

* iceberg: fix build failure by removing unused s3api import

* Update iceberg.go

* use warehouse

* Update trino_catalog_test.go
2026-02-06 13:12:25 -08:00
Chris Lu
a04e8dd00b Support Linux file/dir ACL in weed mount (#8233)
* Support Linux file/dir ACL in weed mount #8229

* Update weed/command/mount_std.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-06 11:33:36 -08:00
Chris Lu
e39a4c2041 fix flaky test 2026-02-04 23:16:31 -08:00
Chris Lu
bd4e7ff14e command: fix s3 panic in filer command (#8208)
command: fix s3 panic in filer command due to uninitialized options

This fixes a nil pointer dereference panic when starting the S3 server
via the `weed filer` command by correctly initializing `iamReadOnly`
and `portIceberg` flags.

Relates to #8200
2026-02-04 10:37:20 -08:00
Chris Lu
72a8f598f2 Fix Maintenance Task Sorting and Refactor Log Persistence (#8199)
* fix float stepping

* do not auto refresh

* only logs when non 200 status

* fix maintenance task sorting and cleanup redundant handler logic

* Refactor log retrieval to persist to disk and fix slowness

- Move log retrieval to disk-based persistence in GetMaintenanceTaskDetail
- Implement background log fetching on task completion in worker_grpc_server.go
- Implement async background refresh for in-progress tasks
- Completely remove blocking gRPC calls from the UI path to fix 10s timeouts
- Cleanup debug logs and performance profiling code

* Ensure consistent deterministic sorting in config_persistence cleanup

* Replace magic numbers with constants and remove debug logs

- Added descriptive constants for truncation limits and timeouts in admin_server.go and worker_grpc_server.go
- Replaced magic numbers with these constants throughout the codebase
- Verified removal of stdout debug printing
- Ensured consistent truncation logic during log persistence

* Address code review feedback on history truncation and logging logic

- Fix AssignmentHistory double-serialization by copying task in GetMaintenanceTaskDetail
- Fix handleTaskCompletion logging logic (mutually exclusive success/failure logs)
- Remove unused Timeout field from LogRequestContext and sync select timeouts with constants
- Ensure AssignmentHistory is only provided in the top-level field for better JSON structure

* Implement goroutine leak protection and request deduplication

- Add request deduplication in RequestTaskLogs to prevent multiple concurrent fetches for the same task
- Implement safe cleanup in timeout handlers to avoid race conditions in pendingLogRequests map
- Add a 10s cooldown for background log refreshes in GetMaintenanceTaskDetail to prevent spamming
- Ensure all persistent log-fetching goroutines are bounded and efficiently managed

* Fix potential nil pointer panics in maintenance handlers

- Add nil checks for adminServer in ShowTaskDetail, ShowMaintenanceWorkers, and UpdateTaskConfig
- Update getMaintenanceQueueData to return a descriptive error instead of nil when adminServer is uninitialized
- Ensure internal helper methods consistently check for adminServer initialization before use

* Strictly enforce disk-only log reading

- Remove background log fetching from GetMaintenanceTaskDetail to prevent timeouts and network calls during page view
- Remove unused lastLogFetch tracking fields to clean up dead code
- Ensure logs are only updated upon task completion via handleTaskCompletion

* Refactor GetWorkerLogs to read from disk

- Update /api/maintenance/workers/:id/logs endpoint to use configPersistence.LoadTaskExecutionLogs
- Remove synchronous gRPC call RequestTaskLogs to prevent timeouts and bad gateway errors
- Ensure consistent log retrieval behavior across the application (disk-only)

* Fix timestamp parsing in log viewer

- Update task_detail.templ JS to handle both ISO 8601 strings and Unix timestamps
- Fix "Invalid time value" error when displaying logs fetched from disk
- Regenerate templates

* master: fallback to HDD if SSD volumes are full in Assign

* worker: improve EC detection logging and fix skip counters

* worker: add Sync method to TaskLogger interface

* worker: implement Sync and ensure logs are flushed before task completion

* admin: improve task log retrieval with retries and better timeouts

* admin: robust timestamp parsing in task detail view
2026-02-04 08:48:55 -08:00
Chris Lu
2ff1cd9fc9 format 2026-02-03 18:39:01 -08:00
Chris Lu
1274cf038c s3: enforce authentication and JSON error format for Iceberg REST Catalog (#8192)
* s3: enforce authentication and JSON error format for Iceberg REST Catalog

* s3/iceberg: align error exception types with OpenAPI spec examples

* s3api: refactor AuthenticateRequest to return identity object

* s3/iceberg: propagate full identity object to request context

* s3/iceberg: differentiate NotAuthorizedException and ForbiddenException

* s3/iceberg: reject requests if authenticator is nil to prevent auth bypass

* s3/iceberg: refactor Auth middleware to build context incrementally and use switch for error mapping

* s3api: update misleading comment for authRequestWithAuthType

* s3api: return ErrAccessDenied if IAM is not configured to prevent auth bypass

* s3/iceberg: optimize context update in Auth middleware

* s3api: export CanDo for external authorization use

* s3/iceberg: enforce identity-based authorization in all API handlers

* s3api: fix compilation errors by updating internal CanDo references

* s3/iceberg: robust identity validation and consistent action usage in handlers

* s3api: complete CanDo rename across tests and policy engine integration

* s3api: fix integration tests by allowing admin access when auth is disabled and explicit gRPC ports

* duckdb

* create test bucket
2026-02-03 11:55:12 -08:00
Chris Lu
2bb21ea276 feat: Add Iceberg REST Catalog server and admin UI (#8175)
* feat: Add Iceberg REST Catalog server

Implement Iceberg REST Catalog API on a separate port (default 8181)
that exposes S3 Tables metadata through the Apache Iceberg REST protocol.

- Add new weed/s3api/iceberg package with REST handlers
- Implement /v1/config endpoint returning catalog configuration
- Implement namespace endpoints (list/create/get/head/delete)
- Implement table endpoints (list/create/load/head/delete/update)
- Add -port.iceberg flag to S3 standalone server (s3.go)
- Add -s3.port.iceberg flag to combined server mode (server.go)
- Add -s3.port.iceberg flag to mini cluster mode (mini.go)
- Support prefix-based routing for multiple catalogs

The Iceberg REST server reuses S3 Tables metadata storage under
/table-buckets and enables DuckDB, Spark, and other Iceberg clients
to connect to SeaweedFS as a catalog.

* feat: Add Iceberg Catalog pages to admin UI

Add admin UI pages to browse Iceberg catalogs, namespaces, and tables.

- Add Iceberg Catalog menu item under Object Store navigation
- Create iceberg_catalog.templ showing catalog overview with REST info
- Create iceberg_namespaces.templ listing namespaces in a catalog
- Create iceberg_tables.templ listing tables in a namespace
- Add handlers and routes in admin_handlers.go
- Add Iceberg data provider methods in s3tables_management.go
- Add Iceberg data types in types.go

The Iceberg Catalog pages provide visibility into the same S3 Tables
data through an Iceberg-centric lens, including REST endpoint examples
for DuckDB and PyIceberg.

* test: Add Iceberg catalog integration tests and reorg s3tables tests

- Reorganize existing s3tables tests to test/s3tables/table-buckets/
- Add new test/s3tables/catalog/ for Iceberg REST catalog tests
- Add TestIcebergConfig to verify /v1/config endpoint
- Add TestIcebergNamespaces to verify namespace listing
- Add TestDuckDBIntegration for DuckDB connectivity (requires Docker)
- Update CI workflow to use new test paths

* fix: Generate proper random UUIDs for Iceberg tables

Address code review feedback:
- Replace placeholder UUID with crypto/rand-based UUID v4 generation
- Add detailed TODO comments for handleUpdateTable stub explaining
  the required atomic metadata swap implementation

* fix: Serve Iceberg on localhost listener when binding to different interface

Address code review feedback: properly serve the localhost listener
when the Iceberg server is bound to a non-localhost interface.

* ci: Add Iceberg catalog integration tests to CI

Add new job to run Iceberg catalog tests in CI, along with:
- Iceberg package build verification
- Iceberg unit tests
- Iceberg go vet checks
- Iceberg format checks

* fix: Address code review feedback for Iceberg implementation

- fix: Replace hardcoded account ID with s3_constants.AccountAdminId in buildTableBucketARN()
- fix: Improve UUID generation error handling with deterministic fallback (timestamp + PID + counter)
- fix: Update handleUpdateTable to return HTTP 501 Not Implemented instead of fake success
- fix: Better error handling in handleNamespaceExists to distinguish 404 from 500 errors
- fix: Use relative URL in template instead of hardcoded localhost:8181
- fix: Add HTTP timeout to test's waitForService function to avoid hangs
- fix: Use dynamic ephemeral ports in integration tests to avoid flaky parallel failures
- fix: Add Iceberg port to final port configuration logging in mini.go

* fix: Address critical issues in Iceberg implementation

- fix: Cache table UUIDs to ensure persistence across LoadTable calls
  The UUID now remains stable for the lifetime of the server session.
  TODO: For production, UUIDs should be persisted in S3 Tables metadata.

- fix: Remove redundant URL-encoded namespace parsing
  mux router already decodes %1F to \x1F before passing to handlers.
  Redundant ReplaceAll call could cause bugs with literal %1F in namespace.

* fix: Improve test robustness and reduce code duplication

- fix: Make DuckDB test more robust by failing on unexpected errors
  Instead of silently logging errors, now explicitly check for expected
  conditions (extension not available) and skip the test appropriately.

- fix: Extract username helper method to reduce duplication
  Created getUsername() helper in AdminHandlers to avoid duplicating
  the username retrieval logic across Iceberg page handlers.

* fix: Add mutex protection to table UUID cache

Protects concurrent access to the tableUUIDs map with sync.RWMutex.
Uses read-lock for fast path when UUID already cached, and write-lock
for generating new UUIDs. Includes double-check pattern to handle race
condition between read-unlock and write-lock.

* style: fix go fmt errors

* feat(iceberg): persist table UUID in S3 Tables metadata

* feat(admin): configure Iceberg port in Admin UI and commands

* refactor: address review comments (flags, tests, handlers)

- command/mini: fix tracking of explicit s3.port.iceberg flag
- command/admin: add explicit -iceberg.port flag
- admin/handlers: reuse getUsername helper
- tests: use 127.0.0.1 for ephemeral ports and os.Stat for file size check

* test: check error from FileStat in verify_gc_empty_test
2026-02-02 23:12:13 -08:00
Chris Lu
2ee6e4f391 mount: refresh and evict hot dir cache (#8174)
* mount: refresh and evict hot dir cache

* mount: guard dir update window and extend TTL

* mount: reuse timestamp for cache mark

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* mount: make dir cache tuning configurable

* mount: dedupe dir update notices

* mount: restore invalidate-all cache helper

* mount: keep hot dir tuning constants

* mount: centralize cache state reset

* mount: mark refresh completion time

* mount: allow disabling idle eviction

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-31 13:46:37 -08:00
Chris Lu
c106532b79 fix: prevent MiniClusterCtx race conditions in command shutdown
Capture global MiniClusterCtx into local variables before goroutine/select
evaluation to prevent nil dereference/data race when context is reset to nil
after nil check. Applied to filer, master, volume, and s3 commands.
2026-01-28 19:42:16 -08:00
Chris Lu
580c2b4ad4 command: fix stale error variable logging in filer serving goroutines
- Use local 'err' variable instead of stale 'e' from outer scope
- Applied to both TLS and non-TLS paths for local listener
2026-01-28 11:27:18 -08:00
Chris Lu
01c17478ae command: implement graceful shutdown for mini cluster
- Introduce MiniClusterCtx to coordinate shutdown across mini services
- Update Master, Volume, Filer, S3, and WebDAV servers to respect context cancellation
- Ensure all resources are cleaned up properly during test teardown
- Integrate MiniClusterCtx in s3tables integration tests
2026-01-28 10:36:19 -08:00
Chris Lu
6542d1e0aa Enable weed fuse on FreeBSD (#8146)
* Enable weed fuse on FreeBSD

* Update weed/command/fuse_notsupported.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update weed/command/fuse_std.go

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-27 21:37:23 -08:00
MorezMartin
20952aa514 Fix jwt error in admin UI (#8140)
* add jwt token in weed admin headers requests

* add jwt token to header for download

* :s/upload/download

* filer_signing.read despite of filer_signing key

* finalize filer_browser_handlers.go

* admin: add JWT authorization to file browser handlers

* security: fix typos in JWT read validation descriptions

* Move security.toml to example and secure keys

* security: address PR feedback on JWT enforcement and example keys

* security: refactor JWT logic and improve example keys readability

* Update docker/Dockerfile.local

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 17:27:02 -08:00
Chris Lu
551a31e156 Implement IAM propagation to S3 servers (#8130)
* Implement IAM propagation to S3 servers

- Add PropagatingCredentialStore to propagate IAM changes to S3 servers via gRPC
- Add Policy management RPCs to S3 proto and S3ApiServer
- Update CredentialManager to use PropagatingCredentialStore when MasterClient is available
- Wire FilerServer to enable propagation

* Implement parallel IAM propagation and fix S3 cluster registration

- Parallelized IAM change propagation with 10s timeout.
- Refined context usage in PropagatingCredentialStore.
- Added S3Type support to cluster node management.
- Enabled S3 servers to register with gRPC address to the master.
- Ensured IAM configuration reload after policy updates via gRPC.

* Optimize IAM propagation with direct in-memory cache updates

* Secure IAM propagation: Use metadata to skip persistence only on propagation

* pb: refactor IAM and S3 services for unidirectional IAM propagation

- Move SeaweedS3IamCache service from iam.proto to s3.proto.
- Remove legacy IAM management RPCs and empty SeaweedS3 service from s3.proto.
- Enforce that S3 servers only use the synchronization interface.

* pb: regenerate Go code for IAM and S3 services

Updated generated code following the proto refactoring of IAM synchronization services.

* s3api: implement read-only mode for Embedded IAM API

- Add readOnly flag to EmbeddedIamApi to reject write operations via HTTP.
- Enable read-only mode by default in S3ApiServer.
- Handle AccessDenied error in writeIamErrorResponse.
- Embed SeaweedS3IamCacheServer in S3ApiServer.

* credential: refactor PropagatingCredentialStore for unidirectional IAM flow

- Update to use s3_pb.SeaweedS3IamCacheClient for propagation to S3 servers.
- Propagate full Identity object via PutIdentity for consistency.
- Remove redundant propagation of specific user/account/policy management RPCs.
- Add timeout context for propagation calls.

* s3api: implement SeaweedS3IamCacheServer for unidirectional sync

- Update S3ApiServer to implement the cache synchronization gRPC interface.
- Methods (PutIdentity, RemoveIdentity, etc.) now perform direct in-memory cache updates.
- Register SeaweedS3IamCacheServer in command/s3.go.
- Remove registration for the legacy and now empty SeaweedS3 service.

* s3api: update tests for read-only IAM and propagation

- Added TestEmbeddedIamReadOnly to verify rejection of write operations in read-only mode.
- Update test setup to pass readOnly=false to NewEmbeddedIamApi in routing tests.
- Updated EmbeddedIamApiForTest helper with read-only checks matching production behavior.

* s3api: add back temporary debug logs for IAM updates

Log IAM updates received via:
- gRPC propagation (PutIdentity, PutPolicy, etc.)
- Metadata configuration reloads (LoadS3ApiConfigurationFromCredentialManager)
- Core identity management (UpsertIdentity, RemoveIdentity)

* IAM: finalize propagation fix with reduced logging and clarified architecture

* Allow configuring IAM read-only mode for S3 server integration tests

* s3api: add defensive validation to UpsertIdentity

* s3api: fix log message to reference correct IAM read-only flag

* test/s3/iam: ensure WaitForS3Service checks for IAM write permissions

* test: enable writable IAM in Makefile for integration tests

* IAM: add GetPolicy/ListPolicies RPCs to s3.proto

* S3: add GetBucketPolicy and ListBucketPolicies helpers

* S3: support storing generic IAM policies in IdentityAccessManagement

* S3: implement IAM policy RPCs using IdentityAccessManagement

* IAM: fix stale user identity on rename propagation
2026-01-26 22:59:43 -08:00
Chris Lu
5a7c74feac migrate IAM policies to multi-file storage (#8114)
* Add IAM gRPC service definition

- Add GetConfiguration/PutConfiguration for config management
- Add CreateUser/GetUser/UpdateUser/DeleteUser/ListUsers for user management
- Add CreateAccessKey/DeleteAccessKey/GetUserByAccessKey for access key management
- Methods mirror existing IAM HTTP API functionality

* Add IAM gRPC handlers on filer server

- Implement IamGrpcServer with CredentialManager integration
- Handle configuration get/put operations
- Handle user CRUD operations
- Handle access key create/delete operations
- All methods delegate to CredentialManager for actual storage

* Wire IAM gRPC service to filer server

- Add CredentialManager field to FilerOption and FilerServer
- Import credential store implementations in filer command
- Initialize CredentialManager from credential.toml if available
- Register IAM gRPC service on filer gRPC server
- Enable credential management via gRPC alongside existing filer services

* Regenerate IAM protobuf with gRPC service methods

* fix: compilation error in DeleteUser

* fix: address code review comments for IAM migration

* feat: migrate policies to multi-file layout and fix identity duplicated content

* refactor: remove configuration.json and migrate Service Accounts to multi-file layout

* refactor: standardize Service Accounts as distinct store entities and fix Admin Server persistence

* config: set ServiceAccountsDirectory to /etc/iam/service_accounts

* Fix Chrome dialog auto-dismiss with Bootstrap modals

- Add modal-alerts.js library with Bootstrap modal replacements
- Replace all 15 confirm() calls with showConfirm/showDeleteConfirm
- Auto-override window.alert() for all alert() calls
- Fixes Chrome 132+ aggressively blocking native dialogs

* Upgrade Bootstrap from 5.3.2 to 5.3.8

* Fix syntax error in object_store_users.templ - remove duplicate closing braces

* create policy

* display errors

* migrate to multi-file policies

* address PR feedback: use showDeleteConfirm and showErrorMessage in policies.templ, refine migration check

* Update policies_templ.go

* add service account to iam grpc

* iam: fix potential path traversal in policy names by validating name pattern

* iam: add GetServiceAccountByAccessKey to CredentialStore interface

* iam: implement service account support for PostgresStore

Includes full CRUD operations and efficient lookup by access key.

* iam: implement GetServiceAccountByAccessKey for filer_etc, grpc, and memory stores

Provides efficient lookup of service accounts by access key where possible,
with linear scan fallbacks for file-based stores.

* iam: remove filer_multiple support

Deleted its implementation and references in imports, scaffold config,
and core interface constants. Redundant with filer_etc.

* clear comment

* dash: robustify service account construction

- Guard against nil sa.Credential when constructing responses
- Fix Expiration logic to only set if > 0, avoiding Unix epoch 1970
- Ensure consistency across Get, Create, and Update handlers

* credential/filer_etc: improve error propagation in configuration handlers

- Return error from loadServiceAccountsFromMultiFile to callers
- Ensure listEntries errors in SaveConfiguration (cleanup logic) are
  propagated unless they are "not found" failures.
- Fixes potential silent failures during IAM configuration sync.

* credential/filer_etc: add existence check to CreateServiceAccount

Ensures consistency with other stores by preventing accidental overwrite
of existing service accounts during creation.

* credential/memory: improve store robustness and Reset logic

- Enforce ID immutability in UpdateServiceAccount to prevent orphans
- Update Reset() to also clear the policies map, ensuring full state
  cleanup for tests.

* dash: improve service account robustness and policy docs

- Wrap parent user lookup errors to preserve context
- Strictly validate Status field in UpdateServiceAccount
- Add deprecation comments to legacy policy management methods

* credential/filer_etc: protect against path traversal in service accounts

Implemented ID validation (alphanumeric, underscores, hyphens) and applied
it to Get, Save, and Delete operations to ensure no directory traversal
via saId.json filenames.

* credential/postgres: improve robustness and cleanup comments

- Removed brainstorming comments in GetServiceAccountByAccessKey
- Added missing rows.Err() check during iteration
- Properly propagate Scan and Unmarshal errors instead of swallowing them

* admin: unify UI alerts and confirmations using Bootstrap modals

- Updated modal-alerts.js with improved automated alert type detection
- Replaced native alert() and confirm() with showAlert(), showConfirm(),
  and showDeleteConfirm() across various Templ components
- Improved UX for delete operations by providing better context and styling
- Ensured consistent error reporting across IAM and Maintenance views

* admin: additional UI consistency fixes for alerts and confirmations

- Replaced native alert() and confirm() with Bootstrap modals in:
  - EC volumes (repair flow)
  - Collection details (repair flow)
  - File browser (properties and delete)
  - Maintenance config schema (save and reset)
- Improved delete confirmation in file browser with item context
- Ensured consistent success/error/info styling for all feedbacks

* make

* iam: add GetServiceAccountByAccessKey RPC and update GetConfiguration

* iam: implement GetServiceAccountByAccessKey on server and client

* iam: centralize policy and service account validation

* iam: optimize MemoryStore service account lookups with indexing

* iam: fix postgres service_accounts table and optimize lookups

* admin: refactor modal alerts and clean up dashboard logic

* admin: fix EC shards table layout mismatch

* admin: URL-encode IAM path parameters for safety

* admin: implement pauseWorker logic in maintenance view

* iam: add rows.Err() check to postgres ListServiceAccounts

* iam: standardize ErrServiceAccountNotFound across credential stores

* iam: map ErrServiceAccountNotFound to codes.NotFound in DeleteServiceAccount

* iam: refine service account store logic, errors and schema

* iam: add validation to GetServiceAccountByAccessKey

* admin: refine modal titles and ensure URL safety

* admin: address bot review comments for alerts and async usage

* iam: fix syntax error by restoring missing function declaration

* [FilerEtcStore] improve error handling in CreateServiceAccount

Refine error handling to provide clearer messages when checking for
existing service accounts.

* [PostgresStore] add nil guards and validation to service account methods

Ensure input parameters are not nil and required IDs are present
to prevent runtime panics and ensure data integrity.

* [JS] add shared IAM utility script

Consolidate common IAM operations like deleteUser and deleteAccessKey
into a shared utility script for better maintainability.

* [View] include shared IAM utilities in layout

Include iam-utils.js in the main layout to make IAM functions
available across all administrative pages.

* [View] refactor IAM logic and restore async in EC Shards view

Remove redundant local IAM functions and ensure that delete
confirmation callbacks are properly marked as async.

* [View] consolidate IAM logic in Object Store Users view

Remove redundant local definitions of deleteUser and deleteAccessKey,
relying on the shared utilities instead.

* [View] update generated templ files for UI consistency

* credential/postgres: remove redundant name column from service_accounts table

The id is already used as the unique identifier and was being copied to the name column.
This removes the name column from the schema and updates the INSERT/UPDATE queries.

* credential/filer_etc: improve logging for policy migration failures

Added Errorf log if AtomicRenameEntry fails during migration to ensure visibility of common failure points.

* credential: allow uppercase characters in service account ID username

Updated ServiceAccountIdPattern to allow [A-Za-z0-9_-]+ for the username component,
matching the actual service account creation logic which uses the parent user name directly.

* Update object_store_users_templ.go

* admin: fix ec_shards pagination to handle numeric page arguments

Updated goToPage in cluster_ec_shards.templ to accept either an Event
or a numeric page argument. This prevents errors when goToPage(1)
is called directly. Corrected both the .templ source and generated Go code.

* credential/filer_etc: improve service account storage robustness

Added nil guard to saveServiceAccount, updated GetServiceAccount
to return ErrServiceAccountNotFound for empty data, and improved
deleteServiceAccount to handle response-level Filer errors.
2026-01-26 11:28:23 -08:00
Chris Lu
6bf088cec9 IAM Policy Management via gRPC (#8109)
* Add IAM gRPC service definition

- Add GetConfiguration/PutConfiguration for config management
- Add CreateUser/GetUser/UpdateUser/DeleteUser/ListUsers for user management
- Add CreateAccessKey/DeleteAccessKey/GetUserByAccessKey for access key management
- Methods mirror existing IAM HTTP API functionality

* Add IAM gRPC handlers on filer server

- Implement IamGrpcServer with CredentialManager integration
- Handle configuration get/put operations
- Handle user CRUD operations
- Handle access key create/delete operations
- All methods delegate to CredentialManager for actual storage

* Wire IAM gRPC service to filer server

- Add CredentialManager field to FilerOption and FilerServer
- Import credential store implementations in filer command
- Initialize CredentialManager from credential.toml if available
- Register IAM gRPC service on filer gRPC server
- Enable credential management via gRPC alongside existing filer services

* Regenerate IAM protobuf with gRPC service methods

* iam_pb: add Policy Management to protobuf definitions

* credential: implement PolicyManager in credential stores

* filer: implement IAM Policy Management RPCs

* shell: add s3.policy command

* test: add integration test for s3.policy

* test: fix compilation errors in policy_test

* pb

* fmt

* test

* weed shell: add -policies flag to s3.configure

This allows linking/unlinking IAM policies to/from identities
directly from the s3.configure command.

* test: verify s3.configure policy linking and fix port allocation

- Added test case for linking policies to users via s3.configure
- Implemented findAvailablePortPair to ensure HTTP and gRPC ports
  are both available, avoiding conflicts with randomized port assignments.
- Updated assertion to match jsonpb output (policyNames)

* credential: add StoreTypeGrpc constant

* credential: add IAM gRPC store boilerplate

* credential: implement identity methods in gRPC store

* credential: implement policy methods in gRPC store

* admin: use gRPC credential store for AdminServer

This ensures that all IAM and policy changes made through the Admin UI
are persisted via the Filer's IAM gRPC service instead of direct file manipulation.

* shell: s3.configure use granular IAM gRPC APIs instead of full config patching

* shell: s3.configure use granular IAM gRPC APIs

* shell: replace deprecated ioutil with os in s3.policy

* filer: use gRPC FailedPrecondition for unconfigured credential manager

* test: improve s3.policy integration tests and fix error checks

* ci: add s3 policy shell integration tests to github workflow

* filer: fix LoadCredentialConfiguration error handling

* credential/grpc: propagate unmarshal errors in GetPolicies

* filer/grpc: improve error handling and validation

* shell: use gRPC status codes in s3.configure

* credential: document PutPolicy as create-or-replace

* credential/postgres: reuse CreatePolicy in PutPolicy to deduplicate logic

* shell: add timeout context and strictly enforce flags in s3.policy

* iam: standardize policy content field naming in gRPC and proto

* shell: extract slice helper functions in s3.configure

* filer: map credential store errors to gRPC status codes

* filer: add input validation for UpdateUser and CreateAccessKey

* iam: improve validation in policy and config handlers

* filer: ensure IAM service registration by defaulting credential manager

* credential: add GetStoreName method to manager

* test: verify policy deletion in integration test
2026-01-25 13:39:30 -08:00
Chris Lu
e559b8df37 Refactor Admin UI to use unified IAM storage and add Shutdown hook 2026-01-23 20:29:21 -08:00
Chris Lu
f6318edbc9 Refactor Admin UI to use unified IAM storage and add MultipleFileStore (#8101)
* Refactor Admin UI to use unified IAM storage and add MultipleFileStore

* Address PR feedback: fix renames, error handling, and sync logic in FilerMultipleStore

* Address refined PR feedback: safe rename order, rollback logic, and structural sync refinement

* Optimize LoadConfiguration: use streaming callback for memory efficiency

* Refactor UpdateUser: log rollback failures during rename

* Implement PolicyManager for FilerMultipleStore

* include the filer_multiple backend configuration

* Implement cross-S3 synchronization and proper shutdown for all IAM backends

* Extract Admin UI refactoring to a separate PR
2026-01-23 20:12:59 -08:00
KyoungYun-K
59dfe047b6 Support for cacheMetaTtlSec option in fuse command (#8063) 2026-01-19 22:52:47 -08:00
Jaehoon Kim
f2e7af257d Fix volume.fsck -forcePurging -reallyDeleteFromVolume to fail fast on filer traversal errors (#8015)
* Add TraverseBfsWithContext and fix race conditions in error handling

- Add TraverseBfsWithContext function to support context cancellation
- Fix race condition in doTraverseBfsAndSaving using atomic.Bool and sync.Once
- Improve error handling with fail-fast behavior and proper error propagation
- Update command_volume_fsck to use error-returning saveFn callback
- Enhance error messages in readFilerFileIdFile with detailed context

* refactoring

* fix error format

* atomic

* filer_pb: make enqueue return void

* shell: simplify fs.meta.save error handling

* filer_pb: handle enqueue return value

* Revert "atomic"

This reverts commit 712648bc354b186d6654fdb8a46fd4848fdc4e00.

* shell: refine fs.meta.save logic

---------

Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-01-14 21:37:50 -08:00
Walnuts
691aea84c3 feat: add TLS configuration options for Cassandra2 store (#7998)
* feat: add TLS configuration options for Cassandra2 store

Signed-off-by: walnuts1018 <r.juglans.1018@gmail.com>

* fix: use 9142 port in tls connection

Signed-off-by: walnuts1018 <r.juglans.1018@gmail.com>

* Align the setting field names with gocql's SSLOpts.

Signed-off-by: walnuts1018 <r.juglans.1018@gmail.com>

* Removed: store.cluster.Port = 9142

* chore: update gocql dependency to v2

* refactor: improve Cassandra TLS configuration and port logic

* docs: update filer.toml scaffold with ssl_enable_host_verification

---------

Signed-off-by: walnuts1018 <r.juglans.1018@gmail.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-01-14 17:59:59 -08:00
Chris Lu
1ea6b0c0d9 cleanup: deduplicate environment variable credential loading
Previously, `weed mini` logic duplicated the credential loading process
by creating a temporary IAM config file from environment variables.
`auth_credentials.go` also had fallback logic to load these variables.

This change:
1. Updates `auth_credentials.go` to *always* check for and merge
   AWS environment variable credentials (`AWS_ACCESS_KEY_ID`, etc.)
   into the identity list. This ensures they are available regardless
   of whether other configurations (static file or filer) are loaded.
2. Removes the redundant file creation logic from `weed/command/mini.go`.
3. Updates `weed mini` user messages to accurately reflect that
   credentials are loaded from environment variables in-memory.

This results in a cleaner implementation where `weed/s3api` manages
all credential loading logic, and `weed mini` simply relies on it.
2026-01-08 20:35:37 -08:00
Chris Lu
bd237999bb weed mini can optionally skip s3 2026-01-08 10:05:42 -08:00
promalert
9012069bd7 chore: execute goimports to format the code (#7983)
* chore: execute goimports to format the code

Signed-off-by: promalert <promalert@outlook.com>

* goimports -w .

---------

Signed-off-by: promalert <promalert@outlook.com>
Co-authored-by: Chris Lu <chris.lu@gmail.com>
2026-01-07 13:06:08 -08:00
Chris Lu
d4ecfaeda7 Enable writeback_cache and async_dio FUSE options (#7980)
* Enable writeback_cache and async_dio FUSE options

Fixes #7978

- Update mount_std.go to use EnableWriteback and EnableAsyncDio from go-fuse
- Add go.mod replace directive to use local go-fuse with capability support
- Remove temporary workaround that disabled these options

This enables proper FUSE kernel capability negotiation for writeback cache
and async direct I/O, improving performance for small writes and concurrent
direct I/O operations.

* Address PR review comments

- Remove redundant nil checks for writebackCache and asyncDio flags
- Update go.mod replace directive to use seaweedfs/go-fuse fork instead of local path

* Add TODO comment for go.mod replace directive

The replace directive must use a local path until seaweedfs/go-fuse#1 is merged.
After merge, this should be updated to use the proper version.

* Use seaweedfs/go-fuse v2.9.0 instead of local repository

Replace local path with seaweedfs/go-fuse v2.9.0 fork which includes
the writeback_cache and async_dio capability support.

* Use github.com/seaweedfs/go-fuse/v2 directly without replace directive

- Updated all imports to use github.com/seaweedfs/go-fuse/v2
- Removed replace directive from go.mod
- Using seaweedfs/go-fuse v2.0.0-20260106181308-87f90219ce09 which includes:
  * writeback_cache and async_dio support
  * Corrected module path

* Update to seaweedfs/go-fuse v2.9.1

Use v2.9.1 tag which includes the corrected module path
(github.com/seaweedfs/go-fuse/v2) along with writeback_cache
and async_dio support.
2026-01-06 10:50:54 -08:00
Chris Lu
e10f11b480 opt: reduce ShardsInfo memory usage with bitmap and sorted slice (#7974)
* opt: reduce ShardsInfo memory usage with bitmap and sorted slice

- Replace map[ShardId]*ShardInfo with sorted []ShardInfo slice
- Add ShardBits (uint32) bitmap for O(1) existence checks
- Use binary search for O(log n) lookups by shard ID
- Maintain sorted order for efficient iteration
- Add comprehensive unit tests and benchmarks

Memory savings:
- Map overhead: ~48 bytes per entry eliminated
- Pointers: 8 bytes per entry eliminated
- Total: ~56 bytes per shard saved

Performance improvements:
- Has(): O(1) using bitmap
- Size(): O(log n) using binary search (was O(1), acceptable tradeoff)
- Count(): O(1) using popcount on bitmap
- Iteration: Faster due to cache locality

* refactor: add methods to ShardBits type

- Add Has(), Set(), Clear(), and Count() methods to ShardBits
- Simplify ShardsInfo methods by using ShardBits methods
- Improves code readability and encapsulation

* opt: use ShardBits directly in ShardsCountFromVolumeEcShardInformationMessage

Avoid creating a full ShardsInfo object just to count shards.
Directly cast vi.EcIndexBits to ShardBits and use Count() method.

* opt: use strings.Builder in ShardsInfo.String() for efficiency

* refactor: change AsSlice to return []ShardInfo (values instead of pointers)

This completes the memory optimization by avoiding unnecessary pointer slices and potential allocations.

* refactor: rename ShardsCountFromVolumeEcShardInformationMessage to GetShardCount

* fix: prevent deadlock in Add and Subtract methods

Copy shards data from 'other' before releasing its lock to avoid
potential deadlock when a.Add(b) and b.Add(a) are called concurrently.

The previous implementation held other's lock while calling si.Set/Delete,
which acquires si's lock. This could deadlock if two goroutines tried to
add/subtract each other concurrently.

* opt: avoid unnecessary locking in constructor functions

ShardsInfoFromVolume and ShardsInfoFromVolumeEcShardInformationMessage
now build shards slice and bitmap directly without calling Set(), which
acquires a lock on every call. Since the object is local and not yet
shared, locking is unnecessary and adds overhead.

This improves performance during object construction.

* fix: rename 'copy' variable to avoid shadowing built-in function

The variable name 'copy' in TestShardsInfo_Copy shadowed the built-in
copy() function, which is confusing and bad practice. Renamed to 'siCopy'.

* opt: use math/bits.OnesCount32 and reorganize types

1. Replace manual popcount loop with math/bits.OnesCount32 for better
   performance and idiomatic Go code
2. Move ShardSize type definition to ec_shards_info.go for better code
   organization since it's primarily used there

* refactor: Set() now accepts ShardInfo for future extensibility

Changed Set(id ShardId, size ShardSize) to Set(shard ShardInfo) to
support future additions to ShardInfo without changing the API.

This makes the code more extensible as new fields can be added to
ShardInfo (e.g., checksum, location, etc.) without breaking the Set API.

* refactor: move ShardInfo and ShardSize to separate file

Created ec_shard_info.go to hold the basic shard types (ShardInfo and
ShardSize) for better code organization and separation of concerns.

* refactor: add ShardInfo constructor and helper functions

Added NewShardInfo() constructor and IsValid() method to better
encapsulate ShardInfo creation and validation. Updated code to use
the constructor for cleaner, more maintainable code.

* fix: update remaining Set() calls to use NewShardInfo constructor

Fixed compilation errors in storage and shell packages where Set() calls
were not updated to use the new NewShardInfo() constructor.

* fix: remove unreachable code in filer backup commands

Removed unreachable return statements after infinite loops in
filer_backup.go and filer_meta_backup.go to fix compilation errors.

* fix: rename 'new' variable to avoid shadowing built-in

Renamed 'new' to 'result' in MinusParityShards, Plus, and Minus methods
to avoid shadowing Go's built-in new() function.

* fix: update remaining test files to use NewShardInfo constructor

Fixed Set() calls in command_volume_list_test.go and
ec_rebalance_slots_test.go to use NewShardInfo() constructor.
2026-01-06 00:09:52 -08:00
Chris Lu
d75162370c Fix trust policy wildcard principal handling (#7970)
* Fix trust policy wildcard principal handling

This change fixes the trust policy validation to properly support
AWS-standard wildcard principals like {"Federated": "*"}.

Previously, the evaluatePrincipalValue() function would check for
context existence before evaluating wildcards, causing wildcard
principals to fail when the context key didn't exist. This forced
users to use the plain "*" workaround instead of the more specific
{"Federated": "*"} format.

Changes:
- Modified evaluatePrincipalValue() to check for "*" FIRST before
  validating against context
- Added support for wildcards in principal arrays
- Added comprehensive tests for wildcard principal handling
- All existing tests continue to pass (no regressions)

This matches AWS IAM behavior where "*" in a principal field means
"allow any value" without requiring context validation.

Fixes: https://github.com/seaweedfs/seaweedfs/issues/7917

* Refactor: Move Principal matching to PolicyEngine

This refactoring consolidates all policy evaluation logic into the
PolicyEngine, improving code organization and eliminating duplication.

Changes:
- Added matchesPrincipal() and evaluatePrincipalValue() to PolicyEngine
- Added EvaluateTrustPolicy() method for direct trust policy evaluation
- Updated statementMatches() to check Principal field when present
- Made resource matching optional (trust policies don't have Resources)
- Simplified evaluateTrustPolicy() in iam_manager.go to delegate to PolicyEngine
- Removed ~170 lines of duplicate code from iam_manager.go

Benefits:
- Single source of truth for all policy evaluation
- Better code reusability and maintainability
- Consistent evaluation rules for all policy types
- Easier to test and debug

All tests pass with no regressions.

* Make PolicyEngine AWS-compatible and add unit tests

Changes:
1. AWS-Compatible Context Keys:
   - Changed "seaweed:FederatedProvider" -> "aws:FederatedProvider"
   - Changed "seaweed:AWSPrincipal" -> "aws:PrincipalArn"
   - Changed "seaweed:ServicePrincipal" -> "aws:PrincipalServiceName"
   - This ensures 100% AWS compatibility for trust policies

2. Added Comprehensive Unit Tests:
   - TestPrincipalMatching: 8 test cases for Principal matching
   - TestEvaluatePrincipalValue: 7 test cases for value evaluation
   - TestTrustPolicyEvaluation: 6 test cases for trust policy evaluation
   - TestGetPrincipalContextKey: 4 test cases for context key mapping
   - Total: 25 new unit tests for PolicyEngine

All tests pass:
- Policy engine tests: 54 passed
- Integration tests: 9 passed
- Total: 63 tests passing

* Update context keys to standard AWS/OIDC formats

Replaced remaining seaweed: context keys with standard AWS and OIDC
keys to ensure 100% compatibility with AWS IAM policies.

Mappings:
- seaweed:TokenIssuer -> oidc:iss
- seaweed:Issuer -> oidc:iss
- seaweed:Subject -> oidc:sub
- seaweed:SourceIP -> aws:SourceIp

Also updated unit tests to reflect these changes.

All 63 tests pass successfully.

* Add advanced policy tests for variable substitution and conditions

Added comprehensive tests inspired by AWS IAM patterns:
- TestPolicyVariableSubstitution: Tests ${oidc:sub} variable in resources
- TestConditionWithNumericComparison: Tests sts:DurationSeconds condition
- TestMultipleConditionOperators: Tests combining StringEquals and StringLike

Results:
- TestMultipleConditionOperators:  All 3 subtests pass
- Other tests reveal need for sts:DurationSeconds context population

These tests validate the PolicyEngine's ability to handle complex
AWS-compatible policy scenarios.

* Fix federated provider context and add DurationSeconds support

Changes:
- Use iss claim as aws:FederatedProvider (AWS standard)
- Add sts:DurationSeconds to trust policy evaluation context
- TestPolicyVariableSubstitution now passes 

Remaining work:
- TestConditionWithNumericComparison partially works (1/3 pass)
- Need to investigate NumericLessThanEquals evaluation

* Update trust policies to use issuer URL for AWS compatibility

Changed trust policy from using provider name ("test-oidc") to
using the issuer URL ("https://test-issuer.com") to match AWS
standard behavior where aws:FederatedProvider contains the OIDC
issuer URL.

Test Results:
- 10/12 test suites passing
- TestFullOIDCWorkflow:  All subtests pass
- TestPolicyEnforcement:  All subtests pass
- TestSessionExpiration:  Pass
- TestPolicyVariableSubstitution:  Pass
- TestMultipleConditionOperators:  All subtests pass

Remaining work:
- TestConditionWithNumericComparison needs investigation
- One subtest in TestTrustPolicyValidation needs fix

* Fix S3 API tests for AWS compatibility

Updated all S3 API tests to use AWS-compatible context keys and
trust policy principals:

Changes:
- seaweed:SourceIP → aws:SourceIp (IP-based conditions)
- Federated: "test-oidc" → "https://test-issuer.com" (trust policies)

Test Results:
- TestS3EndToEndWithJWT:  All 13 subtests pass
- TestIPBasedPolicyEnforcement:  All 3 subtests pass

This ensures policies are 100% AWS-compatible and portable.

* Fix ValidateTrustPolicy for AWS compatibility

Updated ValidateTrustPolicy method to check for:
- OIDC: issuer URL ("https://test-issuer.com")
- LDAP: provider name ("test-ldap")
- Wildcard: "*"

Test Results:
- TestTrustPolicyValidation:  All 3 subtests pass

This ensures trust policy validation uses the same AWS-compatible
principals as the PolicyEngine.

* Fix multipart and presigned URL tests for AWS compatibility

Updated trust policies in:
- s3_multipart_iam_test.go
- s3_presigned_url_iam_test.go

Changed "Federated": "test-oidc" → "https://test-issuer.com"

Test Results:
- TestMultipartIAMValidation:  All 7 subtests pass
- TestPresignedURLIAMValidation:  All 4 subtests pass
- TestPresignedURLGeneration:  All 4 subtests pass
- TestPresignedURLExpiration:  All 4 subtests pass
- TestPresignedURLSecurityPolicy:  All 4 subtests pass

All S3 API tests now use AWS-compatible trust policies.

* Fix numeric condition evaluation and trust policy validation interface

Major updates to ensure robust AWS-compatible policy evaluation:
1.  **Policy Engine**: Added support for `int` and `int64` types in `evaluateNumericCondition`, fixing issues where raw numbers in policy documents caused evaluation failures.
2.  **Trust Policy Validation**: Updated `TrustPolicyValidator` interface and `STSService` to propagate `DurationSeconds` correctly during the double-validation flow (Validation -> STS -> Validation callback).
3.  **IAM Manager**: Updated implementation to match the new interface and correctly pass `sts:DurationSeconds` context key.

Test Results:
- TestConditionWithNumericComparison:  All 3 subtests pass
- All IAM and S3 integration tests pass (100%)

This resolves the final edge case with DurationSeconds numeric conditions.

* Fix MockTrustPolicyValidator interface and unreachable code warnings

Updates:
1. Updated MockTrustPolicyValidator.ValidateTrustPolicyForWebIdentity to match new interface signature with durationSeconds parameter
2. Removed unreachable code after infinite loops in filer_backup.go and filer_meta_backup.go to satisfy linter

Test Results:
- All STS tests pass 
- Build warnings resolved 

* Refactor matchesPrincipal to consolidate array handling logic

Consolidated duplicated logic for []interface{} and []string types by converting them to a unified []interface{} upfront.

* Fix malformed AWS docs URL in iam_manager.go comment

* dup

* Enhance IAM integration tests with negative cases and interface array support

Added test cases to TestTrustPolicyWildcardPrincipal to:
1. Verify rejection of roles when principal context does not match (negative test)
2. Verify support for principal arrays as []interface{} (simulating JSON unmarshaled roles)

* Fix syntax errors in filer_backup and filer_meta_backup

Restored missing closing braces for for-loops and re-added return statements.
The previous attempt to remove unreachable code accidentally broke the function structure.
Build now passes successfully.
2026-01-05 15:55:24 -08:00
Chris Lu
d15f32ae46 feat: add flags to disable WebDAV and Admin UI in weed mini (#7971)
* feat: add flags to disable WebDAV and Admin UI in weed mini

- Add -webdav flag (default: true) to optionally disable WebDAV server
- Add -admin.ui flag (default: true) to optionally disable Admin UI only (server still runs)
- Conditionally skip WebDAV service startup based on flag
- Pass disableUI flag to SetupRoutes to skip UI route registration
- Admin server still runs for gRPC and API access when UI is disabled

Addresses issue from https://github.com/seaweedfs/seaweedfs/pull/7833#issuecomment-3711924150

* refactor: use positive enableUI parameter instead of disableUI across admin server and handlers

* docs: update mini welcome message to list enabled components

* chore: remove unused welcomeMessageTemplate constant

* docs: split S3 credential message into separate sb.WriteString calls
2026-01-05 13:10:11 -08:00
Chris Lu
8269dc136d simplify 2026-01-04 11:26:21 -08:00
Taylor Jasko
6a9860098f fix: correcting S3 nil cipher dereference in filer init (#7952)
Resolves the following error reported in #7949:
```
I0103 21:38:30.230662 s3.go:275 Starting S3 API Server with standard IAM
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x38ca961]

goroutine 102 [running]:
github.com/seaweedfs/seaweedfs/weed/command.(*S3Options).startS3Server(0x7caf840)
    /go/src/github.com/seaweedfs/seaweedfs/weed/command/s3.go:295 +0x741
github.com/seaweedfs/seaweedfs/weed/command.runFiler.func1(...)
    /go/src/github.com/seaweedfs/seaweedfs/weed/command/filer.go:244
created by github.com/seaweedfs/seaweedfs/weed/command.runFiler in goroutine 1
    /go/src/github.com/seaweedfs/seaweedfs/weed/command/filer.go:242 +0x353
```
2026-01-03 18:53:38 -08:00
Chris Lu
25975bacfb fix(gcs): resolve credential conflict and improve backup logging (#7951)
* fix(gcs): resolve credential conflict and improve backup logging

- Workaround GCS SDK's "multiple credential options" error by manually constructing an authenticated HTTP client.
- Include source entry path in filer backup error logs for better visibility on missing volumes/404s.

* fix: address PR review feedback

- Add nil check for EventNotification in getSourceKey
- Avoid reassigning google_application_credentials parameter in gcs_sink.go

* fix(gcs): return errors instead of calling glog.Fatalf in initialize

Adheres to Go best practices and allows for more graceful failure handling by callers.

* read from bind ip
2026-01-03 14:41:25 -08:00
Chris Lu
b97d17f79f Standardize -ip.bind flags to default to empty and fall back to -ip (#7945)
* Add documentation for issue #7941 fix

* rm FIX_ISSUE_7941.md

* Standardize -ip.bind flags to default to empty string and fall back to -ip option

- Change s3 command -ip.bind default logic to use -ip instead of localhost
- Change sftp command -ip.bind default to empty and fall back to 0.0.0.0
- Update help text for consistency

* Fix compilation error: add -ip flag to s3 command and update bindIp fallback

* Revert -ip flag addition for s3 command, set bindIp fallback to 0.0.0.0

* Update s3 command -ip.bind help text to reflect correct default behavior
2026-01-02 18:23:17 -08:00