seaweedFS

Author	SHA1	Message	Date
Chris Lu	78a3441b30	fix: volume balance detection returns multiple tasks per run (#8559 ) * fix: volume balance detection now returns multiple tasks per run (#8551) Previously, detectForDiskType() returned at most 1 balance task per disk type, making the MaxJobsPerDetection setting ineffective. The detection loop now iterates within each disk type, planning multiple moves until the imbalance drops below threshold or maxResults is reached. Effective volume counts are adjusted after each planned move so the algorithm correctly re-evaluates which server is overloaded. * fix: factor pending tasks into destination scoring and use UnixNano for task IDs - Use UnixNano instead of Unix for task IDs to avoid collisions when multiple tasks are created within the same second - Adjust calculateBalanceScore to include LoadCount (pending + assigned tasks) in the utilization estimate, so the destination picker avoids stacking multiple planned moves onto the same target disk * test: add comprehensive balance detection tests for complex scenarios Cover multi-server convergence, max-server shifting, destination spreading, pre-existing pending task skipping, no-duplicate-volume invariant, and parameterized convergence verification across different cluster shapes and thresholds. * fix: address PR review findings in balance detection - hasMore flag: compute from len(results) >= maxResults so the scheduler knows more pages may exist, matching vacuum/EC handler pattern - Exhausted server fallthrough: when no eligible volumes remain on the current maxServer (all have pending tasks) or destination planning fails, mark the server as exhausted and continue to the next overloaded server instead of stopping the entire detection loop - Return canonical destination server ID directly from createBalanceTask instead of resolving via findServerIDByAddress, eliminating the fragile address→ID lookup for adjustment tracking - Fix bestScore sentinel: use math.Inf(-1) instead of -1.0 so disks with negative scores (high pending load, same rack/DC) are still selected as the best available destination - Add TestDetection_ExhaustedServerFallsThrough covering the scenario where the top server's volumes are all blocked by pre-existing tasks * test: fix computeEffectiveCounts and add len guard in no-duplicate test - computeEffectiveCounts now takes a servers slice to seed counts for all known servers (including empty ones) and uses an address→ID map from the topology spec instead of scanning metrics, so destination servers with zero initial volumes are tracked correctly - TestDetection_NoDuplicateVolumesAcrossIterations now asserts len > 1 before checking duplicates, so the test actually fails if Detection regresses to returning a single task * fix: remove redundant HasAnyTask check in createBalanceTask The HasAnyTask check in createBalanceTask duplicated the same check already performed in detectForDiskType's volume selection loop. Since detection runs single-threaded (MaxDetectionConcurrency: 1), no race can occur between the two points. * fix: consistent hasMore pattern and remove double-counted LoadCount in scoring - Adopt vacuum_handler's hasMore pattern: over-fetch by 1, check len > maxResults, and truncate — consistent truncation semantics - Remove direct LoadCount penalty in calculateBalanceScore since LoadCount is already factored into effectiveVolumeCount for utilization scoring; bump utilization weight from 40 to 50 to compensate for the removed 10-point load penalty * fix: handle zero maxResults as no-cap, emit trace after trim, seed empty servers - When MaxResults is 0 (omitted), treat as no explicit cap instead of defaulting to 1; only apply the +1 over-fetch probe when caller supplies a positive limit - Move decision trace emission after hasMore/trim so the trace accurately reflects the returned proposals - Seed serverVolumeCounts from ActiveTopology so servers that have a matching disk type but zero volumes are included in the imbalance calculation and MinServerCount check * fix: nil-guard clusterInfo, uncap legacy DetectionFunc, deterministic disk type order - Add early nil guard for clusterInfo in Detection to prevent panics in downstream helpers (detectForDiskType, createBalanceTask) - Change register.go DetectionFunc wrapper from maxResults=1 to 0 (no cap) so the legacy code path returns all detected tasks - Sort disk type keys before iteration so results are deterministic when maxResults spans multiple disk types (HDD/SSD) * fix: don't over-fetch in stateful detection to avoid orphaned pending tasks Detection registers planned moves in ActiveTopology via AddPendingTask, so requesting maxResults+1 would create an extra pending task that gets discarded during trim. Use len(results) >= maxResults as the hasMore signal instead, which is correct since Detection already caps internally. * fix: return explicit truncated flag from Detection instead of approximating Detection now returns (results, truncated, error) where truncated is true only when the loop stopped because it hit maxResults, not when it ran out of work naturally. This eliminates false hasMore signals when detection happens to produce exactly maxResults results by resolving the imbalance. * cleanup: simplify detection logic and remove redundancies - Remove redundant clusterInfo nil check in detectForDiskType since Detection already guards against nil clusterInfo - Remove adjustments loop for destination servers not in serverVolumeCounts — topology seeding ensures all servers with matching disk type are already present - Merge two-loop min/max calculation into a single loop: min across all servers, max only among non-exhausted servers - Replace magic number 100 with len(metrics) for minC initialization in convergence test * fix: accurate truncation flag, deterministic server order, indexed volume lookup - Track balanced flag to distinguish "hit maxResults cap" from "cluster balanced at exactly maxResults" — truncated is only true when there's genuinely more work to do - Sort servers for deterministic iteration and tie-breaking when multiple servers have equal volume counts - Pre-index volumes by server with per-server cursors to avoid O(maxResults * volumes) rescanning on each iteration - Add truncation flag assertions to RespectsMaxResults test: true when capped, false when detection finishes naturally * fix: seed trace server counts from ActiveTopology to match detection logic The decision trace was building serverVolumeCounts only from metrics, missing zero-volume servers seeded from ActiveTopology by Detection. This could cause the trace to report wrong server counts, incorrect imbalance ratios, or spurious "too few servers" messages. Pass activeTopology into the trace function and seed server counts the same way Detection does. * fix: don't exhaust server on per-volume planning failure, sort volumes by ID - When createBalanceTask returns nil, continue to the next volume on the same server instead of marking the entire server as exhausted. The failure may be volume-specific (not found in topology, pending task registration failed) and other volumes on the server may still be viable candidates. - Sort each server's volume slice by VolumeID after pre-indexing so volume selection is fully deterministic regardless of input order. * fix: use require instead of assert to prevent nil dereference panic in CORS test The test used assert.NoError (non-fatal) for GetBucketCors, then immediately accessed getResp.CORSRules. When the API returns an error, getResp is nil causing a panic. Switch to require.NoError/NotNil/Len so the test stops before dereferencing a nil response. * fix: deterministic disk tie-breaking and stronger pre-existing task test - Sort available disks by NodeID then DiskID before scoring so destination selection is deterministic when two disks score equally - Add task count bounds assertion to SkipsPreExistingPendingTasks test: with 15 of 20 volumes already having pending tasks, at most 5 new tasks should be created and at least 1 (imbalance still exists) * fix: seed adjustments from existing pending/assigned tasks to prevent over-scheduling Detection now calls ActiveTopology.GetTaskServerAdjustments() to initialize the adjustments map with source/destination deltas from existing pending and assigned balance tasks. This ensures effectiveCounts reflects in-flight moves, preventing the algorithm from planning additional moves in the same direction when prior moves already address the imbalance. Added GetTaskServerAdjustments(taskType) to ActiveTopology which iterates pending and assigned tasks, decrementing source servers and incrementing destination servers for the given task type.	2026-03-08 21:34:03 -07:00
Chris Lu	0d8588e3ae	S3: Implement IAM defaults and STS signing key fallback (#8348 ) * S3: Implement IAM defaults and STS signing key fallback logic * S3: Refactor startup order to init SSE-S3 key manager before IAM * S3: Derive STS signing key from KEK using HKDF for security isolation * S3: Document STS signing key fallback in security.toml * fix(s3api): refine anonymous access logic and secure-by-default behavior - Initialize anonymous identity by default in `NewIdentityAccessManagement` to prevent nil pointer exceptions. - Ensure `ReplaceS3ApiConfiguration` preserves the anonymous identity if not present in the new configuration. - Update `NewIdentityAccessManagement` signature to accept `filerClient`. - In legacy mode (no policy engine), anonymous defaults to Deny (no actions), preserving secure-by-default behavior. - Use specific `LookupAnonymous` method instead of generic map lookup. - Update tests to accommodate signature changes and verify improved anonymous handling. * feat(s3api): make IAM configuration optional - Start S3 API server without a configuration file if `EnableIam` option is set. - Default to `Allow` effect for policy engine when no configuration is provided (Zero-Config mode). - Handle empty configuration path gracefully in `loadIAMManagerFromConfig`. - Add integration test `iam_optional_test.go` to verify empty config behavior. * fix(iamapi): fix signature mismatch in NewIdentityAccessManagementWithStore * fix(iamapi): properly initialize FilerClient instead of passing nil * fix(iamapi): properly initialize filer client for IAM management - Instead of passing `nil`, construct a `wdclient.FilerClient` using the provided `Filers` addresses. - Ensure `NewIdentityAccessManagementWithStore` receives a valid `filerClient` to avoid potential nil pointer dereferences or limited functionality. * clean: remove dead code in s3api_server.go * refactor(s3api): improve IAM initialization, safety and anonymous access security * fix(s3api): ensure IAM config loads from filer after client init * fix(s3): resolve test failures in integration, CORS, and tagging tests - Fix CORS tests by providing explicit anonymous permissions config - Fix S3 integration tests by setting admin credentials in init - Align tagging test credentials in CI with IAM defaults - Added goroutine to retry IAM config load in iamapi server * fix(s3): allow anonymous access to health targets and S3 Tables when identities are present * fix(ci): use /healthz for Caddy health check in awscli tests * iam, s3api: expose DefaultAllow from IAM and Policy Engine This allows checking the global "Open by Default" configuration from other components like S3 Tables. * s3api/s3tables: support DefaultAllow in permission logic and handler Updated CheckPermissionWithContext to respect the DefaultAllow flag in PolicyContext. This enables "Open by Default" behavior for unauthenticated access in zero-config environments. Added a targeted unit test to verify the logic. * s3api/s3tables: propagate DefaultAllow through handlers Propagated the DefaultAllow flag to individual handlers for namespaces, buckets, tables, policies, and tagging. This ensures consistent "Open by Default" behavior across all S3 Tables API endpoints. * s3api: wire up DefaultAllow for S3 Tables API initialization Updated registerS3TablesRoutes to query the global IAM configuration and set the DefaultAllow flag on the S3 Tables API server. This completes the end-to-end propagation required for anonymous access in zero-config environments. Added a SetDefaultAllow method to S3TablesApiServer to facilitate this. * s3api: fix tests by adding DefaultAllow to mock IAM integrations The IAMIntegration interface was updated to include DefaultAllow(), breaking several mock implementations in tests. This commit fixes the build errors by adding the missing method to the mocks. * env * ensure ports * env * env * fix default allow * add one more test using non-anonymous user * debug * add more debug * less logs	2026-02-16 13:59:13 -08:00
Chris Lu	7064ad420d	Refactor S3 integration tests to use weed mini (#7877 ) * Refactor S3 integration tests to use weed mini * Fix weed mini flags for sse and parquet tests * Fix IAM test startup: remove -iam.config flag from weed mini * Enhance logging in IAM Makefile to debug startup failure * Simplify weed mini flags and checks in S3 tests (IAM, Parquet, SSE, Copying) * Simplify weed mini flags and checks in all S3 tests * Fix IAM tests: use -s3.iam.config for weed mini * Replace timeout command with portable loop in IAM Makefile * Standardize portable loop-based readiness checks in all S3 Makefiles * Define SERVER_DIR in retention Makefile * Fix versioning and retention Makefiles: remove unsupported weed mini flags * fix filer_group test * fix cors * emojis * fix sse * fix retention * fixes * fix * fixes * fix parquet * fixes * fix * clean up * avoid duplicated debug server * Update .gitignore * simplify * clean up * add credentials * bind * delay * Update Makefile * Update Makefile * check ready * delay * update remote credentials * Update Makefile * clean up * kill * Update Makefile * update credentials	2025-12-25 11:00:54 -08:00
chrislu	5167bbd2a9	Remove deprecated allowEmptyFolder CLI option The allowEmptyFolder option is no longer functional because: 1. The code that used it was already commented out 2. Empty folder cleanup is now handled asynchronously by EmptyFolderCleaner The CLI flags are kept for backward compatibility but marked as deprecated and ignored. This removes: - S3ApiServerOption.AllowEmptyFolder field - The actual usage in s3api_object_handlers_list.go - Helm chart values and template references - References in test Makefiles and docker-compose files	2025-12-06 21:54:12 -08:00
Chris Lu	eaecd8328b	S3: add fallback for CORS (#7404 ) * add fallback for cors * refactor * expose aws headers * add fallback to test * refactor * Only falls back to global config when there's explicitly no bucket-level config. * fmt * Update s3_cors_http_test.go * refactoring	2025-10-29 13:43:27 -07:00
Chris Lu	33b9017b48	fix listing objects (#7008 ) * fix listing objects * add more list testing * address comments * fix next marker * fix isTruncated in listing * fix tests * address tests * Update s3api_object_handlers_multipart.go * fixes * store json into bucket content, for tagging and cors * switch bucket metadata from json to proto * fix * Update s3api_bucket_config.go * fix test issue * fix test_bucket_listv2_delimiter_prefix * Update cors.go * skip special characters * passing listing * fix test_bucket_list_delimiter_prefix * ok. fix the xsd generated go code now * fix cors tests * fix test * fix test_bucket_list_unordered and test_bucket_listv2_unordered do not accept the allow-unordered and delimiter parameter combination * fix test_bucket_list_objects_anonymous and test_bucket_listv2_objects_anonymous The tests test_bucket_list_objects_anonymous and test_bucket_listv2_objects_anonymous were failing because they try to set bucket ACL to public-read, but SeaweedFS only supported private ACL. Updated PutBucketAclHandler to use the existing ExtractAcl function which already supports all standard S3 canned ACLs Replaced the hardcoded check for only private ACL with proper ACL parsing that handles public-read, public-read-write, authenticated-read, bucket-owner-read, bucket-owner-full-control, etc. Added unit tests to verify all standard canned ACLs are accepted * fix list unordered The test is expecting the error code to be InvalidArgument instead of InvalidRequest * allow anonymous listing( and head, get) * fix test_bucket_list_maxkeys_invalid Invalid values: max-keys=blah → Returns ErrInvalidMaxKeys (HTTP 400) * updating IsPublicRead when parsing acl * more logs * CORS Test Fix * fix test_bucket_list_return_data * default to private * fix test_bucket_list_delimiter_not_skip_special * default no acl * add debug logging * more logs * use basic http client remove logs also * fixes * debug * Update stats.go * debugging * fix anonymous test expectation anonymous user can read, as configured in s3 json.	2025-07-22 01:07:15 -07:00
Chris Lu	a524b4f485	Object locking need to persist the tags and set the headers (#6994 ) * fix object locking read and write No logic to include object lock metadata in HEAD/GET response headers No logic to extract object lock metadata from PUT request headers * add tests for object locking * Update weed/s3api/s3api_object_handlers_put.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/s3api/s3api_object_handlers.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * refactor * add unit tests * sync versions * Update s3_worm_integration_test.go * fix legal hold values * lint * fix tests * racing condition when enable versioning * fix tests * validate put object lock header * allow check lock permissions for PUT * default to OFF legal hold * only set object lock headers for objects that are actually from object lock-enabled buckets fix --- FAIL: TestAddObjectLockHeadersToResponse/Handle_entry_with_no_object_lock_metadata (0.00s) * address comments * fix tests * purge * fix * refactoring * address comment * address comment * Update weed/s3api/s3api_object_handlers_put.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/s3api/s3api_object_handlers_put.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/s3api/s3api_object_handlers.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * avoid nil * ensure locked objects cannot be overwritten --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-16 23:00:25 -07:00
Chris Lu	4b040e8a87	adding cors support (#6987 ) * adding cors support * address some comments * optimize matchesWildcard * address comments * fix for tests * address comments * address comments * address comments * path building * refactor * Update weed/s3api/s3api_bucket_config.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * address comment Service-level responses need both Access-Control-Allow-Methods and Access-Control-Allow-Headers. After setting Access-Control-Allow-Origin and Access-Control-Expose-Headers, also set Access-Control-Allow-Methods: * and Access-Control-Allow-Headers: * so service endpoints satisfy CORS preflight requirements. * Update weed/s3api/s3api_bucket_config.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/s3api/s3api_object_handlers.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/s3api/s3api_object_handlers.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix * refactor * Update weed/s3api/s3api_bucket_config.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/s3api/s3api_object_handlers.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update weed/s3api/s3api_server.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * simplify * add cors tests * fix tests * fix tests --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-15 00:23:54 -07:00

8 Commits