filer: improve FoundationDB performance by disabling batch by default (#7770)

* filer: improve FoundationDB performance by disabling batch by default

This PR addresses a performance issue where FoundationDB filer was achieving
only ~757 ops/sec with 12 concurrent S3 clients, despite FDB being capable
of 17,000+ ops/sec.

Root cause: the write batcher held each operation for up to 5ms while it
waited for a batch to fill. Because S3 semantics require every request to
wait for durability confirmation, callers blocked for that entire interval,
so batching added latency instead of improving throughput.

Changes:
- Disable write batching by default (batch_enabled = false)
- Each write now commits immediately in its own transaction
- Reduce batch interval from 5ms to 1ms when batching is enabled
- Add batch_enabled config option to toggle behavior
- Improve batcher to collect available ops without blocking
- Add benchmarks comparing batch vs no-batch performance
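
A minimal sketch of what "collect available ops without blocking" can look
like, assuming the batcher reads pending writes from a channel. The names
writeOp and collectAvailable are hypothetical and do not come from the
SeaweedFS source; the real batcher may be structured differently.

```go
package main

import "fmt"

// writeOp is a hypothetical stand-in for one queued filer write.
type writeOp struct {
	key string
}

// collectAvailable blocks only for the first op, then drains whatever is
// already queued (up to maxBatch) without waiting for more to arrive.
func collectAvailable(queue <-chan writeOp, maxBatch int) []writeOp {
	first, ok := <-queue
	if !ok {
		return nil // queue closed and empty
	}
	batch := []writeOp{first}
	for len(batch) < maxBatch {
		select {
		case op, ok := <-queue:
			if !ok {
				return batch // queue closed, flush what we have
			}
			batch = append(batch, op)
		default:
			return batch // nothing else ready, commit immediately
		}
	}
	return batch
}

func main() {
	queue := make(chan writeOp, 8)
	for i := 0; i < 3; i++ {
		queue <- writeOp{key: fmt.Sprintf("/dir/file-%d", i)}
	}
	batch := collectAvailable(queue, 100)
	fmt.Println(len(batch)) // prints 3
}
```

With this shape a lone writer never pays a timer delay, while concurrent
writers that are already queued still share one commit.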

Benchmark results (16 concurrent goroutines):
- With batch:    2,924 ops/sec (342,032 ns/op)
- Without batch: 4,625 ops/sec (216,219 ns/op)
- Improvement:   +58% throughput (about 37% lower ns/op)
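
The ops/sec figures above follow directly from the reported ns/op values.
The short check below reproduces the arithmetic; opsPerSec is a helper
introduced here for illustration only.

```go
package main

import "fmt"

// opsPerSec converts a Go benchmark ns/op figure into operations per second.
func opsPerSec(nsPerOp float64) float64 {
	return 1e9 / nsPerOp
}

func main() {
	withBatch := opsPerSec(342032)
	withoutBatch := opsPerSec(216219)

	fmt.Printf("with batch:    %.0f ops/sec\n", withBatch)    // 2924
	fmt.Printf("without batch: %.0f ops/sec\n", withoutBatch) // 4625
	fmt.Printf("throughput gain: %.0f%%\n", (withoutBatch/withBatch-1)*100) // 58%
}
```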

Configuration:
- Default: batch_enabled = false (optimal for S3 PUT latency)
- For bulk ingestion: set batch_enabled = true

Also fixes ARM64 Docker test setup (shell compatibility, fdbserver path).

* fix: address review comments - use atomic counter and remove duplicate batcher

- Use sync/atomic.Uint64 for unique filenames in concurrent benchmarks
- Remove duplicate batcher creation in createBenchmarkStoreWithBatching
  (initialize() already creates batcher when batchEnabled=true)
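
As an illustration of the atomic-counter fix, the sketch below shows how a
sync/atomic.Uint64 (Go 1.19+) yields distinct names across goroutines. The
names fileCounter, nextFileName, and uniqueNames are hypothetical; the
actual benchmark code may differ.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fileCounter is shared by all goroutines; Add is atomic, so no two
// callers ever observe the same value.
var fileCounter atomic.Uint64

// nextFileName returns a name that is unique for the process lifetime.
func nextFileName() string {
	return fmt.Sprintf("file-%d", fileCounter.Add(1))
}

// uniqueNames generates names from several goroutines and reports how
// many distinct names were produced.
func uniqueNames(workers, perWorker int) int {
	var mu sync.Mutex
	seen := make(map[string]struct{})
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				name := nextFileName()
				mu.Lock()
				seen[name] = struct{}{}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return len(seen)
}

func main() {
	fmt.Println(uniqueNames(16, 1000)) // prints 16000
}
```

A plain integer incremented without synchronization could hand the same
value to two goroutines, making concurrent inserts overwrite each other
and skewing the benchmark.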

* fix: add realistic default values to benchmark store helper

Set directoryPrefix, timeout, and maxRetryDelay to reasonable defaults
for more realistic benchmark conditions.
Chris Lu
2025-12-15 13:03:34 -08:00
committed by GitHub
parent 44beb42eb9
commit 5a03b5538f
11 changed files with 321 additions and 36 deletions


@@ -5,13 +5,14 @@ ARG FOUNDATIONDB_VERSION=7.4.5
ENV FOUNDATIONDB_VERSION=${FOUNDATIONDB_VERSION}
# Install build dependencies and download prebuilt FoundationDB clients
RUN apt-get update && apt-get install -y \
SHELL ["/bin/bash", "-c"]
RUN set -euo pipefail && \
apt-get update && apt-get install -y \
build-essential \
git \
wget \
ca-certificates \
&& rm -rf /var/lib/apt/lists/* && \
set -euo pipefail && \
case "${FOUNDATIONDB_VERSION}" in \
"7.4.5") EXPECTED_SHA256="f2176b86b7e1b561c3632b4e6e7efb82e3b8f57c2ff0d0ac4671e742867508aa" ;; \
*) echo "ERROR: No known ARM64 client checksum for FoundationDB ${FOUNDATIONDB_VERSION}. Please update this Dockerfile." >&2; exit 1 ;; \


@@ -15,6 +15,7 @@ RUN apt-get update && apt-get install -y \
&& rm -rf /var/lib/apt/lists/*
# Install FoundationDB server + client debs with checksum verification
SHELL ["/bin/bash", "-c"]
RUN set -euo pipefail && \
apt-get update && \
case "${FOUNDATIONDB_VERSION}" in \


@@ -50,6 +50,24 @@ test-benchmark: ## Run performance benchmarks
@echo "$(YELLOW)Running FoundationDB performance benchmarks...$(NC)"
@cd ../../ && go test -v -timeout=$(TEST_TIMEOUT) -tags foundationdb -bench=. ./test/foundationdb/...
test-benchmark-filer: ## Run filer store benchmarks (batch vs no-batch comparison)
@echo "$(YELLOW)Running FoundationDB filer store benchmarks...$(NC)"
@echo "$(BLUE)Comparing batched vs non-batched write performance$(NC)"
@cd ../../ && go test -v -timeout=$(TEST_TIMEOUT) -tags foundationdb \
-bench='BenchmarkFoundationDBStore_.*' \
-benchmem \
-benchtime=5s \
./weed/filer/foundationdb/...
test-benchmark-concurrent: ## Run concurrent operation benchmarks
@echo "$(YELLOW)Running concurrent operation benchmarks...$(NC)"
@cd ../../ && go test -v -timeout=$(TEST_TIMEOUT) -tags foundationdb \
-bench='BenchmarkFoundationDBStore_Concurrent.*' \
-benchmem \
-benchtime=10s \
-cpu=1,2,4,8 \
./weed/filer/foundationdb/...
# ARM64 specific targets (Apple Silicon / M1/M2/M3 Macs)
setup-arm64: ## Set up ARM64-native FoundationDB cluster (builds from source)
@echo "$(YELLOW)Setting up ARM64-native FoundationDB cluster...$(NC)"


@@ -353,12 +353,31 @@ The tests are designed to be reliable in CI environments with:
Run performance benchmarks:
```bash
make test-benchmark
make test-benchmark # Run all benchmarks
make test-benchmark-filer # Filer store benchmarks with batch comparison
make test-benchmark-concurrent # Concurrent operation benchmarks (varies CPU count)
# Sample expected results:
# BenchmarkFoundationDBStore_InsertEntry-8 1000 1.2ms per op
# BenchmarkFoundationDBStore_FindEntry-8 5000 0.5ms per op
# BenchmarkFoundationDBStore_KvOperations-8 2000 0.8ms per op
# BenchmarkFoundationDBStore_InsertEntry-8 1000 1.2ms per op
# BenchmarkFoundationDBStore_FindEntry-8 5000 0.5ms per op
# BenchmarkFoundationDBStore_KvOperations-8 2000 0.8ms per op
# BenchmarkFoundationDBStore_InsertEntry_NoBatch-8 1000 1.0ms per op (optimal for S3)
# BenchmarkFoundationDBStore_InsertEntry_WithBatch-8 1200 0.9ms per op (bulk ingestion)
# BenchmarkFoundationDBStore_ConcurrentInsert_*-8 5000 0.3ms per op (parallel writes)
```
### Batch vs Non-Batch Performance
The FoundationDB filer store supports two write modes:
| Mode | Config | Best For | Latency | Throughput |
|------|--------|----------|---------|------------|
| **Direct Commit** (default) | `batch_enabled = false` | S3 API, low-latency workloads | ~1-5ms per op | Good |
| **Batched** | `batch_enabled = true` | Bulk ingestion, high throughput | Variable | Higher |
Run the comparison benchmark:
```bash
make test-benchmark-filer
```
## Contributing


@@ -28,8 +28,8 @@ services:
echo 'testing:testing@fdb1:4500,fdb2:4502,fdb3:4504' > /var/fdb/config/fdb.cluster
fi
# Start FDB processes
/usr/bin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb1:4501 --listen_address=0.0.0.0:4501 --coordination=fdb1:4500 &
/usr/bin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb1:4500 --listen_address=0.0.0.0:4500 --coordination=fdb1:4500 --class=coordination &
/usr/sbin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb1:4501 --listen_address=0.0.0.0:4501 --coordination=fdb1:4500 &
/usr/sbin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb1:4500 --listen_address=0.0.0.0:4500 --coordination=fdb1:4500 --class=coordination &
wait
"
@@ -59,8 +59,8 @@ services:
# Wait for cluster file from fdb1
while [ ! -f /var/fdb/config/fdb.cluster ]; do sleep 1; done
# Start FDB processes
/usr/bin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb2:4503 --listen_address=0.0.0.0:4503 --coordination=fdb1:4500 &
/usr/bin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb2:4502 --listen_address=0.0.0.0:4502 --coordination=fdb1:4500 --class=coordination &
/usr/sbin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb2:4503 --listen_address=0.0.0.0:4503 --coordination=fdb1:4500 &
/usr/sbin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb2:4502 --listen_address=0.0.0.0:4502 --coordination=fdb1:4500 --class=coordination &
wait
"
@@ -90,8 +90,8 @@ services:
# Wait for cluster file from fdb1
while [ ! -f /var/fdb/config/fdb.cluster ]; do sleep 1; done
# Start FDB processes
/usr/bin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb3:4505 --listen_address=0.0.0.0:4505 --coordination=fdb1:4500 &
/usr/bin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb3:4504 --listen_address=0.0.0.0:4504 --coordination=fdb1:4500 --class=coordination &
/usr/sbin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb3:4505 --listen_address=0.0.0.0:4505 --coordination=fdb1:4500 &
/usr/sbin/fdbserver --config_path=/var/fdb/config --datadir=/var/fdb/data --logdir=/var/fdb/logs --public_address=fdb3:4504 --listen_address=0.0.0.0:4504 --coordination=fdb1:4500 --class=coordination &
wait
"


@@ -1,19 +1,8 @@
# FoundationDB Filer Configuration
# FoundationDB Filer Configuration for Testing
[foundationdb]
enabled = true
cluster_file = "/var/fdb/config/fdb.cluster"
api_version = 740
timeout = "5s"
max_retry_delay = "1s"
directory_prefix = "seaweedfs"
# For testing different configurations
[foundationdb.test]
enabled = false
cluster_file = "/var/fdb/config/fdb.cluster"
api_version = 740
timeout = "10s"
max_retry_delay = "2s"
directory_prefix = "seaweedfs_test"
location = "/test"
# api_version = 740
# timeout = "5s"
# directory_prefix = "seaweedfs"