Chris Lu 31cb28d9d3 feat: auto-configure optimal volume size limit based on available disk space (#7833)

* feat: auto-configure optimal volume size limit based on available disk space

- Add calculateOptimalVolumeSizeMB() function with OS-independent disk detection
- Reuses existing stats.NewDiskStatus() which works across Linux, macOS, Windows, BSD, Solaris
- Algorithm: available disk / 100, rounded up to nearest power of 2 (64MB, 128MB, 256MB, 512MB, 1024MB)
- Volume size capped to maximum of 1GB (1024MB) for better stability
- Minimum volume size is 64MB
- Uses efficient bits.Len() for power-of-2 rounding instead of floating-point operations
- Only auto-calculates volume size if user didn't specify a custom value via -master.volumeSizeLimitMB
- Respects user-specified values without override
- Master logs whether value was auto-calculated or user-specified
- Welcome message displays the configured volume size with correct format string ordering
- Removed unused autoVolumeSizeMB variable (logging handles source tracking)

Fixes: #0

* Refactor: Consolidate volume size constants and use robust flag detection for mini mode

This commit addresses all code review feedback on the auto-optimal volume size feature:

1. **Consolidate hardcoded defaults into package-level constants**
   - Moved minVolumeSizeMB=64 and maxVolumeSizeMB=1024 from local function-scope
     constants to package-level constants for consistency and maintainability
   - All three volume size constants (min, default, max) now defined in one place

2. **Implement robust flag detection using flag.Visit()**
   - Added isFlagPassed() helper function using flag.Visit() to check if a CLI
     flag was explicitly passed on the command line
   - Replaces the previous implementation that checked if current value equals
     default (which could incorrectly assume user intent if default was specified)
   - Now correctly detects user override regardless of the actual value

3. **Restructure power-of-2 rounding logic for clarity**
   - Changed from 'only round if above min threshold' to 'always round to power-of-2
     first, then apply min/max constraints'
   - More robust: works correctly even if min/max constants are adjusted in future
   - Clearer intent: all non-zero values go through consistent rounding logic

4. **Fix import ordering**
   - Added 'flag' import (aliased to fla9 package) to support isFlagPassed()
   - Added 'math/bits' import to support power-of-2 rounding

Benefits:
- Better code organization with all volume size limits in package constants
- Correct user override detection that doesn't rely on value equality checks
- More maintainable rounding logic that's easier to understand and modify
- Consistent with SeaweedFS conventions (uses fla9 package like other commands)

* fix: Address code review feedback for volume size calculation

This commit resolves three code review comments for better code quality and robustness:

1. **Handle comma-separated directories in -dir flag**
   - The -dir flag accepts comma-separated list of directories, but the volume size
     calculation was passing the entire string to util.ResolvePath()
   - Now splits on comma and uses the first directory for disk space calculation
   - Added explanatory comment about the multi-directory support
   - Ensures the optimal size calculation works correctly in all scenarios

2. **Change disk detection failure from verbose log to warning**
   - When disk status cannot be determined, the warning is now logged via
     glog.Warningf() instead of glog.V(1).Infof()
   - Makes the event visible in default logs without requiring verbose mode
   - Better alerting for operators about fallback to default values

3. **Avoid recalculating availableMB/100 and define bytesPerMB constant**
   - Added bytesPerMB = 1024*1024 constant for clarity and reusability
   - Replaced hardcoded (1024 * 1024) with bytesPerMB constant
   - Store availableMB/100 in initialOptimalMB variable to avoid recalculation
   - Log message now references initialOptimalMB instead of recalculating
   - Improves maintainability and reduces redundant computation

All three changes maintain the same logic while improving code quality and
robustness as requested by the reviewer.
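
As a rough sketch of items 1 and 3, the snippet below picks the first entry from a comma-separated -dir value and converts a byte count to MB with a named constant. The helper names firstDir and bytesToMB are hypothetical, and trimming whitespace is an extra assumption not mentioned in the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// bytesPerMB mirrors the constant named in the commit message.
const bytesPerMB = 1024 * 1024

// firstDir returns the first entry of a comma-separated directory list.
// Hypothetical helper; whitespace trimming is an assumption.
func firstDir(dirs string) string {
	// Split always returns at least one element, so index 0 is safe.
	return strings.TrimSpace(strings.Split(dirs, ",")[0])
}

// bytesToMB converts a byte count to whole megabytes (hypothetical helper).
func bytesToMB(b uint64) uint64 {
	return b / bytesPerMB
}

func main() {
	fmt.Println(firstDir("/data/disk1,/data/disk2"))
	fmt.Println(bytesToMB(512 * bytesPerMB))
}
```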

* fix: Address rounding logic, logging clarity, and disk capacity measurement issues

This commit resolves three additional code review comments to improve robustness
and clarity of the volume size calculation:

1. **Fix power-of-2 rounding logic for edge cases**
   - The previous condition 'if optimalMB > 0' created a bug: when optimalMB=1,
     bits.Len(0)=0, resulting in 1<<0=1, which is below minimum (64MB)
   - Changed to explicitly handle zero case first: 'if optimalMB == 0'
   - Separate zero-handling from power-of-2 rounding ensures correct behavior:
     * optimalMB=0 → set to minVolumeSizeMB (64)
     * optimalMB>=1 → apply power-of-2 rounding
   - Then apply min/max constraints unconditionally
   - More explicit and easier to reason about correctness

2. **Use total disk capacity instead of free space for stable configuration**
   - Changed from diskStatus.Free (available space) to diskStatus.All (total capacity)
   - Free space varies based on current disk usage at startup time
   - This caused inconsistent volume sizes: same disk could get different sizes
     depending on how full it is when the service starts
   - Using total capacity ensures predictable, stable configuration across restarts
   - Better aligns with the intended behavior of sizing based on disk capacity
   - Added explanatory comments about why total capacity is more appropriate

3. **Improve log message clarity and accuracy**
   - Updated message to clearly show:
     * 'total disk capacity' instead of vague 'available disk'
     * 'capacity/100 before rounding' to match actual calculation
     * 'clamped to [min,max]' instead of 'capped to max' to show both bounds
     * Includes min and max values in log for context
   - More accurate and helpful for operators troubleshooting volume sizing

These changes ensure the volume size calculation is both correct and predictable.
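
The calculation described above (capacity/100, zero handled first, round up to a power of 2, then clamp) might look roughly like this. It is a sketch under stated assumptions: the function name optimalVolumeSizeMB is hypothetical, and it takes the total capacity in MB as a parameter instead of calling stats.NewDiskStatus().

```go
package main

import (
	"fmt"
	"math/bits"
)

const (
	minVolumeSizeMB = 64   // lower bound named in the commit message
	maxVolumeSizeMB = 1024 // upper bound (1GB) named in the commit message
)

// optimalVolumeSizeMB is a hypothetical stand-in for
// calculateOptimalVolumeSizeMB. totalMB is the total disk capacity in MB.
func optimalVolumeSizeMB(totalMB uint64) uint64 {
	optimalMB := totalMB / 100
	if optimalMB == 0 {
		// Handle zero explicitly so optimalMB-1 below cannot underflow.
		return minVolumeSizeMB
	}
	// Round up to the next power of 2. bits.Len64(n-1) is the number of
	// bits needed to represent n-1, so 1<<that is the smallest power of
	// 2 that is >= n. For n=1 this yields 1, which the clamp raises.
	optimalMB = uint64(1) << bits.Len64(optimalMB-1)

	// Apply the min/max constraints unconditionally.
	if optimalMB < minVolumeSizeMB {
		optimalMB = minVolumeSizeMB
	}
	if optimalMB > maxVolumeSizeMB {
		optimalMB = maxVolumeSizeMB
	}
	return optimalMB
}

func main() {
	for _, totalGB := range []uint64{1, 10, 40, 100, 1024} {
		fmt.Printf("%d GB total -> %d MB volume size limit\n",
			totalGB, optimalVolumeSizeMB(totalGB*1024))
	}
}
```

Rounding first and clamping afterwards keeps the result a power of 2 as long as both bounds are powers of 2, which is the case for 64 and 1024.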

* feat: Save mini configuration to file for persistence and documentation

This commit adds persistent configuration storage for the 'weed mini' command,
saving all non-default parameters to a JSON configuration file for:

1. **Configuration Documentation**
   - All parameters actually passed on the command line are saved
   - Provides a clear record of the running configuration
   - Useful for auditing and understanding how the system is configured

2. **Persistence of Auto-Calculated Values**
   - The auto-calculated optimal volume size (master.volumeSizeLimitMB) is saved
     with a note indicating it was auto-calculated
   - On restart, if the auto-calculated value exists, it won't be recalculated
   - Users can delete the auto-calculated entry to force recalculation on next startup
   - Provides stable, predictable configuration across restarts

3. **Configuration File Location**
   - Saved to: <data-folder>/.seaweedfs/mini.config.json
   - Uses the first directory from comma-separated -dir list
   - Directory is created automatically if it doesn't exist
   - JSON format for easy parsing and manual editing

4. **Implementation Details**
   - Uses flag.Visit() to collect only explicitly passed flags
   - Distinguishes between user-specified and auto-calculated values
   - Includes helpful notes in the JSON file
   - Graceful handling of save errors (logs warnings, doesn't fail startup)

The configuration file includes all parameters such as:
- IP and port settings (master, filer, volume, admin)
- Data directories and metadata folders
- Replication and collection settings
- S3 and IAM configurations
- Performance tuning parameters (concurrency limits, timeouts, etc.)
- Auto-calculated volume size (if applicable)

Example mini.config.json output:
{
  "debug": "true",
  "dir": "/data/seaweedfs",
  "master.port": "9333",
  "filer.port": "8888",
  "volume.port": "9340",
  "master.volumeSizeLimitMB.auto": "256",
  "_note_auto_calculated": "This value was auto-calculated. Remove it to recalculate on next startup."
}

This allows operators to:
- Review what configuration was active
- Replicate the configuration on other systems
- Understand the startup behavior
- Control when auto-calculation occurs

* refactor: Change configuration file format to match command-line options format

Update the saved configuration format from JSON to shell-compatible options format
that matches how options are expected to be passed on the command line.

Configuration file: .seaweedfs/mini.options

Format: Each line contains a command-line option in the format -name=value

Benefits:
- Format is compatible with shell scripts and can be sourced
- Can be easily converted to command-line options
- Human-readable and editable
- Values with spaces are properly quoted
- Includes helpful comments explaining auto-calculated values
- Directly usable with weed mini command

The file can be used in multiple ways:
1. Extract options: cat .seaweedfs/mini.options | grep -v '^#' | tr '\n' ' '
2. Inline in command: weed mini $(cat .seaweedfs/mini.options | grep -v '^#')
3. Manual review: cat .seaweedfs/mini.options

* refactor: Save mini.options directly to -dir folder

* docs: Update PR description with accurate algorithm and examples

Update the function documentation comments to accurately reflect the implemented
algorithm and provide real-world examples with actual calculated outputs.

Changes:
- Clarify that algorithm uses total disk capacity (not free space)
- Document exact calculation: capacity/100, round to power of 2, clamp to [64,1024]
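- Add realistic examples showing input disk sizes and resulting volume sizes
  (values below follow the documented capacity/100 rule, with capacity in MB):
  * 10GB disk → 10240MB/100 = 102MB → 128MB
  * 100GB disk → 102400MB/100 = 1024MB → 1024MB (maximum)
  * 1TB disk → 1024MB (maximum)
  * 6.4TB disk → 1024MB (maximum)
  * 12.8TB disk → 1024MB (maximum)
  * 100TB disk → 1024MB (maximum)
  * 1PB disk → 1024MB (maximum)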
- Include note that values are rounded to next power of 2 and capped at 1GB

This helps users understand the volume size calculation and predict what size
will be set for their specific disk configurations.

* feat: integrate configuration file loading into mini startup

- Load mini.options file at startup if it exists
- Apply loaded configuration options before normal initialization
- CLI flags override file-based configuration
- Exclude 'dir' option from being saved (environment-specific)
- Configuration file format: option=value without leading dashes
- Auto-calculated volume size persists with recalculation marker

2025-12-21 12:47:27 -08:00

SeaweedFS

SeaweedFS is an independent Apache-licensed open source project with its ongoing development made possible entirely thanks to the support of these awesome backers. If you'd like to grow SeaweedFS even stronger, please consider joining our sponsors on Patreon.

Your support will be really appreciated by me and other supporters!

Quick Start

Quick Start for S3 API on Docker

docker run -p 8333:8333 chrislusf/seaweedfs server -s3

Quick Start with Single Binary

- Download the latest binary from https://github.com/seaweedfs/seaweedfs/releases and unzip the single binary file weed or weed.exe. Alternatively, run go install github.com/seaweedfs/seaweedfs/weed@latest.
- Run export AWS_ACCESS_KEY_ID=admin ; export AWS_SECRET_ACCESS_KEY=key to set the admin credentials for accessing the object store.
- Run weed server -dir=/some/data/dir -s3 to start one master, one volume server, one filer, and one S3 gateway.

Also, to increase capacity, just add more volume servers by running weed volume -dir="/some/data/dir2" -master="<master_host>:9333" -port=8081 locally, or on a different machine, or on thousands of machines. That is it!

Quick Start SeaweedFS S3 on AWS

Set up a fast, production-ready SeaweedFS S3 on AWS with CloudFormation.

Introduction

SeaweedFS is a simple and highly scalable distributed file system. There are two objectives:

1. to store billions of files!
2. to serve the files fast!

SeaweedFS started as an Object Store to handle small files efficiently. Instead of managing all file metadata in a central master, the central master only manages volumes on volume servers, and these volume servers manage files and their metadata. This relieves concurrency pressure from the central master and spreads file metadata into volume servers, allowing faster file access (O(1), usually just one disk read operation).

There are only 40 bytes of disk storage overhead for each file's metadata. It is so simple with O(1) disk reads that you are welcome to challenge the performance with your actual use cases.

SeaweedFS started by implementing Facebook's Haystack design paper. SeaweedFS also implements erasure coding with ideas from f4: Facebook’s Warm BLOB Storage System, and has many similarities with Facebook’s Tectonic Filesystem.

On top of the object store, the optional Filer supports directories and POSIX attributes. Filer is a separate, linearly scalable, stateless server with customizable metadata stores, e.g., MySql, Postgres, Redis, Cassandra, HBase, Mongodb, Elastic Search, LevelDB, RocksDB, Sqlite, MemSql, TiDB, Etcd, CockroachDB, YDB, etc.

For any distributed key-value store, large values can be offloaded to SeaweedFS. With its fast access speed and linearly scalable capacity, SeaweedFS can work as a distributed Key-Large-Value store.

SeaweedFS can transparently integrate with the cloud. With hot data on the local cluster and warm data in the cloud with O(1) access time, SeaweedFS can achieve both fast local access and elastic cloud storage capacity. What's more, the cloud storage access API cost is minimized. Faster and cheaper than direct cloud storage!

| System | File Metadata | File Content Read | POSIX | REST API | Optimized for large number of small files |
| --- | --- | --- | --- | --- | --- |
| SeaweedFS | lookup volume id, cacheable | O(1) disk seek | | Yes | Yes |
| SeaweedFS Filer | Linearly Scalable, Customizable | O(1) disk seek | FUSE | Yes | Yes |
| GlusterFS | hashing | | FUSE, NFS | | |
| Ceph | hashing + rules | | FUSE | Yes | |
| MooseFS | in memory | | FUSE | | No |
| MinIO | separate meta file for each file | | | Yes | No |

| SeaweedFS | comparable to Ceph | advantage |
| --- | --- | --- |
| Master | MDS | simpler |
| Volume | OSD | optimized for small files |
| Filer | Ceph FS | linearly scalable, Customizable, O(1) or O(logN) |
