* fix: volume balance detection now returns multiple tasks per run (#8551)

  Previously, detectForDiskType() returned at most 1 balance task per disk type, making the MaxJobsPerDetection setting ineffective. The detection loop now iterates within each disk type, planning multiple moves until the imbalance drops below threshold or maxResults is reached. Effective volume counts are adjusted after each planned move so the algorithm correctly re-evaluates which server is overloaded.

* fix: factor pending tasks into destination scoring and use UnixNano for task IDs

  - Use UnixNano instead of Unix for task IDs to avoid collisions when multiple tasks are created within the same second
  - Adjust calculateBalanceScore to include LoadCount (pending + assigned tasks) in the utilization estimate, so the destination picker avoids stacking multiple planned moves onto the same target disk

* test: add comprehensive balance detection tests for complex scenarios

  Cover multi-server convergence, max-server shifting, destination spreading, pre-existing pending task skipping, no-duplicate-volume invariant, and parameterized convergence verification across different cluster shapes and thresholds.
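The multi-move loop described above can be sketched in isolation. This is a minimal model, not the SeaweedFS implementation: `planMoves`, the threshold semantics, and the max/min selection are illustrative stand-ins for detectForDiskType's real logic.

```go
package main

import "fmt"

type move struct{ from, to string }

// planMoves repeatedly shifts one volume from the fullest server toward the
// emptiest one, adjusting the effective counts after each planned move so the
// next iteration re-evaluates which server is overloaded. It stops when the
// max/avg imbalance ratio drops below threshold, no single move can help, or
// maxResults is reached (maxResults <= 0 means no cap).
func planMoves(counts map[string]int, threshold float64, maxResults int) []move {
	if len(counts) == 0 {
		return nil
	}
	// work on a copy so the caller's counts are untouched
	eff := make(map[string]int, len(counts))
	total := 0
	for s, c := range counts {
		eff[s] = c
		total += c
	}
	avg := float64(total) / float64(len(eff))

	var moves []move
	for maxResults <= 0 || len(moves) < maxResults {
		var maxS, minS string
		for s := range eff {
			if maxS == "" || eff[s] > eff[maxS] {
				maxS = s
			}
			if minS == "" || eff[s] < eff[minS] {
				minS = s
			}
		}
		// balanced enough, or a move would just swap the roles
		if float64(eff[maxS])/avg <= threshold || eff[maxS]-eff[minS] <= 1 {
			break
		}
		moves = append(moves, move{from: maxS, to: minS})
		eff[maxS]-- // adjust effective counts so later iterations
		eff[minS]++ // see the move as already done
	}
	return moves
}

func main() {
	moves := planMoves(map[string]int{"srv1": 9, "srv2": 1, "srv3": 2}, 1.1, 0)
	fmt.Println(len(moves))
}
```

With 9/1/2 volumes and a 1.1 threshold the sketch plans five moves, converging to 4/4/4; passing a positive maxResults caps the result, which is the behavior the fix restores.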
* fix: address PR review findings in balance detection

  - hasMore flag: compute from len(results) >= maxResults so the scheduler knows more pages may exist, matching vacuum/EC handler pattern
  - Exhausted server fallthrough: when no eligible volumes remain on the current maxServer (all have pending tasks) or destination planning fails, mark the server as exhausted and continue to the next overloaded server instead of stopping the entire detection loop
  - Return canonical destination server ID directly from createBalanceTask instead of resolving via findServerIDByAddress, eliminating the fragile address→ID lookup for adjustment tracking
  - Fix bestScore sentinel: use math.Inf(-1) instead of -1.0 so disks with negative scores (high pending load, same rack/DC) are still selected as the best available destination
  - Add TestDetection_ExhaustedServerFallsThrough covering the scenario where the top server's volumes are all blocked by pre-existing tasks

* test: fix computeEffectiveCounts and add len guard in no-duplicate test

  - computeEffectiveCounts now takes a servers slice to seed counts for all known servers (including empty ones) and uses an address→ID map from the topology spec instead of scanning metrics, so destination servers with zero initial volumes are tracked correctly
  - TestDetection_NoDuplicateVolumesAcrossIterations now asserts len > 1 before checking duplicates, so the test actually fails if Detection regresses to returning a single task

* fix: remove redundant HasAnyTask check in createBalanceTask

  The HasAnyTask check in createBalanceTask duplicated the same check already performed in detectForDiskType's volume selection loop. Since detection runs single-threaded (MaxDetectionConcurrency: 1), no race can occur between the two points.
* fix: consistent hasMore pattern and remove double-counted LoadCount in scoring

  - Adopt vacuum_handler's hasMore pattern: over-fetch by 1, check len > maxResults, and truncate, for consistent truncation semantics
  - Remove the direct LoadCount penalty in calculateBalanceScore since LoadCount is already factored into effectiveVolumeCount for utilization scoring; bump the utilization weight from 40 to 50 to compensate for the removed 10-point load penalty

* fix: handle zero maxResults as no-cap, emit trace after trim, seed empty servers

  - When MaxResults is 0 (omitted), treat it as no explicit cap instead of defaulting to 1; only apply the +1 over-fetch probe when the caller supplies a positive limit
  - Move decision trace emission after hasMore/trim so the trace accurately reflects the returned proposals
  - Seed serverVolumeCounts from ActiveTopology so servers that have a matching disk type but zero volumes are included in the imbalance calculation and MinServerCount check

* fix: nil-guard clusterInfo, uncap legacy DetectionFunc, deterministic disk type order

  - Add an early nil guard for clusterInfo in Detection to prevent panics in downstream helpers (detectForDiskType, createBalanceTask)
  - Change the register.go DetectionFunc wrapper from maxResults=1 to 0 (no cap) so the legacy code path returns all detected tasks
  - Sort disk type keys before iteration so results are deterministic when maxResults spans multiple disk types (HDD/SSD)

* fix: don't over-fetch in stateful detection to avoid orphaned pending tasks

  Detection registers planned moves in ActiveTopology via AddPendingTask, so requesting maxResults+1 would create an extra pending task that gets discarded during trim. Use len(results) >= maxResults as the hasMore signal instead, which is correct since Detection already caps internally.
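The two hasMore conventions discussed above can be contrasted in a compact sketch. Function names and the `produce` callback are hypothetical; the point is the pagination shape, not the actual handler APIs.

```go
package main

import "fmt"

// statelessPage over-fetches by one: ask the producer for maxResults+1 items,
// then truncate. Safe only when producing an extra item has no side effects.
func statelessPage(produce func(limit int) []int, maxResults int) (results []int, hasMore bool) {
	results = produce(maxResults + 1)
	if len(results) > maxResults {
		return results[:maxResults], true
	}
	return results, false
}

// statefulPage never over-fetches: each produced item registers state (like
// AddPendingTask), so an extra probe item would be orphaned on trim. hasMore
// is approximated as "the producer filled the whole page".
func statefulPage(produce func(limit int) []int, maxResults int) (results []int, hasMore bool) {
	results = produce(maxResults)
	return results, len(results) >= maxResults
}

func main() {
	// a toy producer with 5 items available
	produce := func(limit int) []int {
		all := []int{1, 2, 3, 4, 5}
		if limit < len(all) {
			return all[:limit]
		}
		return all
	}
	r1, more1 := statelessPage(produce, 3)
	r2, more2 := statefulPage(produce, 3)
	fmt.Println(len(r1), more1, len(r2), more2)
}
```

Note the trade-off: the stateful approximation reports hasMore=true even when the page happens to end exactly at the last item, which is why a later change moved to an explicit truncated flag.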
* fix: return explicit truncated flag from Detection instead of approximating

  Detection now returns (results, truncated, error) where truncated is true only when the loop stopped because it hit maxResults, not when it ran out of work naturally. This eliminates false hasMore signals when detection happens to produce exactly maxResults results by resolving the imbalance.

* cleanup: simplify detection logic and remove redundancies

  - Remove redundant clusterInfo nil check in detectForDiskType since Detection already guards against nil clusterInfo
  - Remove the adjustments loop for destination servers not in serverVolumeCounts; topology seeding ensures all servers with a matching disk type are already present
  - Merge the two-loop min/max calculation into a single loop: min across all servers, max only among non-exhausted servers
  - Replace magic number 100 with len(metrics) for minC initialization in convergence test

* fix: accurate truncation flag, deterministic server order, indexed volume lookup

  - Track a balanced flag to distinguish "hit maxResults cap" from "cluster balanced at exactly maxResults"; truncated is only true when there's genuinely more work to do
  - Sort servers for deterministic iteration and tie-breaking when multiple servers have equal volume counts
  - Pre-index volumes by server with per-server cursors to avoid O(maxResults * volumes) rescanning on each iteration
  - Add truncation flag assertions to RespectsMaxResults test: true when capped, false when detection finishes naturally

* fix: seed trace server counts from ActiveTopology to match detection logic

  The decision trace was building serverVolumeCounts only from metrics, missing zero-volume servers seeded from ActiveTopology by Detection. This could cause the trace to report wrong server counts, incorrect imbalance ratios, or spurious "too few servers" messages. Pass activeTopology into the trace function and seed server counts the same way Detection does.
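The truncated-flag semantics can be isolated into a tiny sketch (hypothetical `drain`, standing in for the detection loop): the flag is set only when the cap stops the loop while work remains, so producing exactly maxResults by reaching balance reports false.

```go
package main

import "fmt"

// drain consumes work items until either the work runs out (the cluster is
// balanced) or maxResults items were produced. truncated is true only when
// the cap fired with work still left over, never when the work simply ended.
func drain(pending int, maxResults int) (results int, truncated bool) {
	for pending > 0 {
		if maxResults > 0 && results >= maxResults {
			return results, true // hit the cap with work left over
		}
		pending--
		results++
	}
	return results, false // ran out of work naturally
}

func main() {
	fmt.Println(drain(5, 3)) // capped: more work remains
	fmt.Println(drain(3, 3)) // exactly the cap, but balanced: not truncated
	fmt.Println(drain(2, 3)) // under the cap
}
```

The middle case is the one a naive `len(results) >= maxResults` check gets wrong: three results out of three pending items is a complete run, not a truncated one.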
* fix: don't exhaust server on per-volume planning failure, sort volumes by ID

  - When createBalanceTask returns nil, continue to the next volume on the same server instead of marking the entire server as exhausted. The failure may be volume-specific (not found in topology, pending task registration failed) and other volumes on the server may still be viable candidates.
  - Sort each server's volume slice by VolumeID after pre-indexing so volume selection is fully deterministic regardless of input order.

* fix: use require instead of assert to prevent nil dereference panic in CORS test

  The test used assert.NoError (non-fatal) for GetBucketCors, then immediately accessed getResp.CORSRules. When the API returns an error, getResp is nil, causing a panic. Switch to require.NoError/NotNil/Len so the test stops before dereferencing a nil response.

* fix: deterministic disk tie-breaking and stronger pre-existing task test

  - Sort available disks by NodeID then DiskID before scoring so destination selection is deterministic when two disks score equally
  - Add task count bounds assertion to SkipsPreExistingPendingTasks test: with 15 of 20 volumes already having pending tasks, at most 5 new tasks should be created and at least 1 (imbalance still exists)

* fix: seed adjustments from existing pending/assigned tasks to prevent over-scheduling

  Detection now calls ActiveTopology.GetTaskServerAdjustments() to initialize the adjustments map with source/destination deltas from existing pending and assigned balance tasks. This ensures effectiveCounts reflects in-flight moves, preventing the algorithm from planning additional moves in the same direction when prior moves already address the imbalance. Added GetTaskServerAdjustments(taskType) to ActiveTopology, which iterates pending and assigned tasks, decrementing source servers and incrementing destination servers for the given task type.
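The adjustment seeding in the last item can be modeled minimally. This is a hypothetical reduction: the real ActiveTopology tracks far richer task state, while here tasks are plain structs and `taskServerAdjustments` only mirrors the source/destination delta idea.

```go
package main

import "fmt"

type task struct {
	taskType string
	source   string
	dest     string
}

// taskServerAdjustments mirrors the idea behind GetTaskServerAdjustments:
// for every in-flight task of the given type, the source server is about to
// lose a volume (-1) and the destination is about to gain one (+1). Seeding
// effective counts with these deltas keeps detection from re-planning moves
// that prior tasks already cover.
func taskServerAdjustments(tasks []task, taskType string) map[string]int {
	adj := make(map[string]int)
	for _, t := range tasks {
		if t.taskType != taskType {
			continue
		}
		adj[t.source]--
		adj[t.dest]++
	}
	return adj
}

func main() {
	inFlight := []task{
		{taskType: "balance", source: "srv1", dest: "srv2"},
		{taskType: "balance", source: "srv1", dest: "srv3"},
		{taskType: "vacuum", source: "srv2", dest: "srv2"}, // ignored: wrong type
	}
	fmt.Println(taskServerAdjustments(inFlight, "balance"))
}
```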
CORS Integration Tests for SeaweedFS S3 API
This directory contains comprehensive integration tests for the CORS (Cross-Origin Resource Sharing) functionality in SeaweedFS S3 API.
Overview
The CORS integration tests validate the complete CORS implementation including:
- CORS configuration management (PUT/GET/DELETE)
- CORS rule validation
- CORS middleware behavior
- Caching functionality
- Error handling
- Real-world CORS scenarios
Prerequisites
- Go 1.19+: For building SeaweedFS and running tests
- Network Access: Tests use `localhost:8333` by default
- System Dependencies: `curl` and `netstat` for health checks
Quick Start
The tests now automatically start their own SeaweedFS server, so you don't need to manually start one.
1. Run All Tests with Managed Server
```bash
# Run all tests with automatic server management
make test-with-server

# Run core CORS tests only
make test-cors-quick

# Run comprehensive CORS tests
make test-cors-comprehensive
```
2. Manual Server Management
If you prefer to manage the server manually:
```bash
# Start server
make start-server

# Run tests (assuming server is running)
make test-cors-simple

# Stop server
make stop-server
```
3. Individual Test Categories
```bash
# Run specific test types
make test-basic-cors      # Basic CORS configuration
make test-preflight-cors  # Preflight OPTIONS requests
make test-actual-cors     # Actual CORS request handling
make test-origin-matching # Origin matching logic
make test-header-matching # Header matching logic
make test-method-matching # Method matching logic
make test-multiple-rules  # Multiple CORS rules
make test-validation      # CORS validation
make test-caching         # CORS caching behavior
make test-error-handling  # Error handling
```
Test Server Management
The tests use a comprehensive server management system similar to other SeaweedFS integration tests:
Server Configuration
- S3 Port: 8333 (configurable via `S3_PORT`)
- Master Port: 9333
- Volume Port: 8080
- Filer Port: 8888
- Metrics Port: 9324
- Data Directory: `./test-volume-data` (auto-created)
- Log File: `weed-test.log`
Server Lifecycle
- Build: Automatically builds `../../../weed/weed_binary`
- Start: Launches SeaweedFS with S3 API enabled
- Health Check: Waits up to 90 seconds for server to be ready
- Test: Runs the requested tests
- Stop: Gracefully shuts down the server
- Cleanup: Removes temporary files and data
Available Commands
```bash
# Server management
make start-server            # Start SeaweedFS server
make stop-server             # Stop SeaweedFS server
make health-check            # Check server health
make logs                    # View server logs

# Test execution
make test-with-server        # Full test cycle with server management
make test-cors-simple        # Run tests without server management
make test-cors-quick         # Run core tests only
make test-cors-comprehensive # Run all tests

# Development
make dev-start               # Start server for development
make dev-test                # Run development tests
make build-weed              # Build SeaweedFS binary
make check-deps              # Check dependencies

# Maintenance
make clean                   # Clean up all artifacts
make coverage                # Generate coverage report
make fmt                     # Format code
make lint                    # Run linter
```
Test Configuration
Default Configuration
The tests use these default settings (configurable via environment variables):
```bash
WEED_BINARY=../../../weed/weed_binary
S3_PORT=8333
TEST_TIMEOUT=10m
TEST_PATTERN=TestCORS
```
Configuration File
The `test_config.json` file contains the S3 client configuration:
```json
{
  "endpoint": "http://localhost:8333",
  "access_key": "some_access_key1",
  "secret_key": "some_secret_key1",
  "region": "us-east-1",
  "bucket_prefix": "test-cors-",
  "use_ssl": false,
  "skip_verify_ssl": true
}
```
Troubleshooting
Compilation Issues
If you encounter compilation errors, the most common issues are:
- AWS SDK v2 Type Mismatches: The `MaxAgeSeconds` field in `types.CORSRule` expects `int32`, not `*int32`. Use direct values like `3600` instead of `aws.Int32(3600)`.
- Field Name Issues: The `GetBucketCorsOutput` type has a `CORSRules` field directly, not a `CORSConfiguration` field.
Example fix:
```go
// ❌ Incorrect
MaxAgeSeconds: aws.Int32(3600),
assert.Len(t, getResp.CORSConfiguration.CORSRules, 1)

// ✅ Correct
MaxAgeSeconds: 3600,
assert.Len(t, getResp.CORSRules, 1)
```
Server Issues
- Server Won't Start

  ```bash
  # Check for port conflicts
  netstat -tlnp | grep 8333

  # View server logs
  make logs

  # Force cleanup
  make clean
  ```

- Test Failures

  ```bash
  # Run with server management
  make test-with-server

  # Run specific test
  make test-basic-cors

  # Check server health
  make health-check
  ```

- Connection Issues

  ```bash
  # Verify server is running
  curl -s http://localhost:8333

  # Check server logs
  tail -f weed-test.log
  ```
Performance Issues
If tests are slow or timing out:
```bash
# Increase timeout
export TEST_TIMEOUT=30m
make test-with-server

# Run quick tests only
make test-cors-quick

# Check server resources
make debug-status
```
Test Coverage
Core Functionality Tests
1. CORS Configuration Management (TestCORSConfigurationManagement)
- PUT CORS configuration
- GET CORS configuration
- DELETE CORS configuration
- Configuration updates
- Error handling for non-existent configurations
2. Multiple CORS Rules (TestCORSMultipleRules)
- Multiple rules in single configuration
- Rule precedence and ordering
- Complex rule combinations
3. CORS Validation (TestCORSValidation)
- Invalid HTTP methods
- Empty origins validation
- Negative MaxAge validation
- Rule limit validation
4. Wildcard Support (TestCORSWithWildcards)
- Wildcard origins (`*`, `https://*.example.com`)
- Wildcard headers (`*`)
- Wildcard expose headers
5. Rule Limits (TestCORSRuleLimit)
- Maximum 100 rules per configuration
- Rule limit enforcement
- Large configuration handling
6. Error Handling (TestCORSErrorHandling)
- Non-existent bucket operations
- Invalid configurations
- Malformed requests
HTTP-Level Tests
1. Preflight Requests (TestCORSPreflightRequest)
- OPTIONS request handling
- CORS headers in preflight responses
- Access-Control-Request-Method validation
- Access-Control-Request-Headers validation
2. Actual Requests (TestCORSActualRequest)
- CORS headers in actual responses
- Origin validation for real requests
- Proper expose headers handling
3. Origin Matching (TestCORSOriginMatching)
- Exact origin matching
- Wildcard origin matching (`*`)
- Subdomain wildcard matching (`https://*.example.com`)
- Non-matching origins (should be rejected)
4. Header Matching (TestCORSHeaderMatching)
- Wildcard header matching (`*`)
- Specific header matching
- Case-insensitive matching
- Disallowed headers
5. Method Matching (TestCORSMethodMatching)
- Allowed methods verification
- Disallowed methods rejection
- Method-specific CORS behavior
6. Multiple Rules (TestCORSMultipleRulesMatching)
- Rule precedence and selection
- Multiple rules with different configurations
- Complex rule interactions
Integration Tests
1. Caching (TestCORSCaching)
- CORS configuration caching
- Cache invalidation
- Cache performance
2. Object Operations (TestCORSObjectOperations)
- CORS with actual S3 operations
- PUT/GET/DELETE objects with CORS
- CORS headers in object responses
3. Without Configuration (TestCORSWithoutConfiguration)
- Behavior when no CORS configuration exists
- Default CORS behavior
- Graceful degradation
Development
Running Tests During Development
```bash
# Start server for development
make dev-start

# Run quick test
make dev-test

# View logs in real-time
make logs
```
Adding New Tests
- Follow the existing naming convention (`TestCORSXxxYyy`)
- Use the helper functions (`getS3Client`, `createTestBucket`, etc.)
- Add cleanup with `defer cleanupTestBucket(t, client, bucketName)`
- Include proper error checking with `require.NoError(t, err)`
- Use assertions with `assert.Equal(t, expected, actual)`
- Add the test to the appropriate Makefile target
Code Quality
```bash
# Format code
make fmt

# Run linter
make lint

# Generate coverage report
make coverage
```
Performance Notes
- Tests create and destroy buckets for each test case
- Large configuration tests may take several minutes
- Server startup typically takes 15-30 seconds
- Tests run in parallel where possible for efficiency
Integration with SeaweedFS
These tests validate the CORS implementation in:
- `weed/s3api/cors/` - Core CORS package
- `weed/s3api/s3api_bucket_cors_handlers.go` - HTTP handlers
- `weed/s3api/s3api_server.go` - Router integration
- `weed/s3api/s3api_bucket_config.go` - Configuration management
The tests ensure AWS S3 API compatibility and proper CORS behavior across all supported scenarios.