add telemetry (#6926)
* add telemetry * fix go mod * add default telemetry server url * Update README.md * replace with broker count instead of s3 count * Update telemetry.pb.go * github action to deploy
This commit is contained in:
351
telemetry/README.md
Normal file
351
telemetry/README.md
Normal file
@@ -0,0 +1,351 @@
|
||||
# SeaweedFS Telemetry System
|
||||
|
||||
A privacy-respecting telemetry system for SeaweedFS that collects cluster-level usage statistics and provides visualization through Prometheus and Grafana.
|
||||
|
||||
## Features
|
||||
|
||||
- **Privacy-First Design**: Uses in-memory cluster IDs (regenerated on restart), no personal data collection
|
||||
- **Prometheus Integration**: Native Prometheus metrics for monitoring and alerting
|
||||
- **Grafana Dashboards**: Pre-built dashboards for data visualization
|
||||
- **Protocol Buffers**: Efficient binary data transmission for optimal performance
|
||||
- **Opt-in Only**: Disabled by default, requires explicit configuration
|
||||
- **Docker Compose**: Complete monitoring stack deployment
|
||||
- **Automatic Cleanup**: Configurable data retention policies
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
SeaweedFS Cluster → Telemetry Client → Telemetry Server → Prometheus → Grafana
|
||||
(protobuf) (metrics) (queries)
|
||||
```
|
||||
|
||||
## Data Transmission
|
||||
|
||||
The telemetry system uses **Protocol Buffers exclusively** for efficient binary data transmission:
|
||||
|
||||
- **Compact Format**: 30-50% smaller than JSON
|
||||
- **Fast Serialization**: Better performance than text-based formats
|
||||
- **Type Safety**: Strong typing with generated Go structs
|
||||
- **Schema Evolution**: Built-in versioning support
|
||||
|
||||
### Protobuf Schema
|
||||
|
||||
```protobuf
|
||||
message TelemetryData {
|
||||
string cluster_id = 1; // In-memory generated UUID
|
||||
string version = 2; // SeaweedFS version
|
||||
string os = 3; // Operating system
|
||||
repeated string features = 4; // Enabled features
|
||||
string deployment = 5; // Deployment type
|
||||
int32 volume_server_count = 6; // Number of volume servers
|
||||
uint64 total_disk_bytes = 7; // Total disk usage
|
||||
int32 total_volume_count = 8; // Total volume count
|
||||
int64 timestamp = 9; // Collection timestamp
|
||||
}
|
||||
```
|
||||
|
||||
## Privacy Approach
|
||||
|
||||
- **No Personal Data**: No hostnames, IP addresses, or user information
|
||||
- **In-Memory IDs**: Cluster IDs are generated in-memory and change on restart
|
||||
- **Aggregated Data**: Only cluster-level statistics, no individual file/user data
|
||||
- **Opt-in Only**: Telemetry is disabled by default
|
||||
- **Transparent**: Open source implementation, clear data collection policy
|
||||
|
||||
## Collected Data
|
||||
|
||||
| Field | Description | Example |
|
||||
|-------|-------------|---------|
|
||||
| `cluster_id` | In-memory UUID (changes on restart) | `a1b2c3d4-...` |
|
||||
| `version` | SeaweedFS version | `3.45` |
|
||||
| `os` | Operating system and architecture | `linux/amd64` |
|
||||
| `features` | Enabled components | `["filer", "s3api"]` |
|
||||
| `deployment` | Deployment type | `cluster` |
|
||||
| `volume_server_count` | Number of volume servers | `5` |
|
||||
| `total_disk_bytes` | Total disk usage across cluster | `1073741824` |
|
||||
| `total_volume_count` | Total number of volumes | `120` |
|
||||
| `timestamp` | When data was collected | `1640995200` |
|
||||
|
||||
## Quick Start
|
||||
|
||||
### 1. Deploy Telemetry Server
|
||||
|
||||
```bash
|
||||
# Clone and start the complete monitoring stack
|
||||
git clone https://github.com/seaweedfs/seaweedfs.git
|
||||
cd seaweedfs/telemetry
|
||||
docker-compose up -d
|
||||
|
||||
# Or run the server directly
|
||||
cd server
|
||||
go run . -port=8080 -dashboard=true
|
||||
```
|
||||
|
||||
### 2. Configure SeaweedFS
|
||||
|
||||
```bash
|
||||
# Enable telemetry in SeaweedFS master (uses default telemetry.seaweedfs.com:3091)
|
||||
weed master -telemetry=true
|
||||
|
||||
# Or in server mode
|
||||
weed server -telemetry=true
|
||||
|
||||
# Or specify custom telemetry server
|
||||
weed master -telemetry=true -telemetry.url=http://localhost:8080/api/collect
|
||||
```
|
||||
|
||||
### 3. Access Dashboards
|
||||
|
||||
- **Telemetry Server**: http://localhost:8080
|
||||
- **Prometheus**: http://localhost:9090
|
||||
- **Grafana**: http://localhost:3000 (admin/admin)
|
||||
|
||||
## Configuration
|
||||
|
||||
### SeaweedFS Master/Server
|
||||
|
||||
```bash
|
||||
# Enable telemetry
|
||||
-telemetry=true
|
||||
|
||||
# Set custom telemetry server URL (optional, defaults to telemetry.seaweedfs.com:3091)
|
||||
-telemetry.url=http://your-telemetry-server:8080/api/collect
|
||||
```
|
||||
|
||||
### Telemetry Server
|
||||
|
||||
```bash
|
||||
# Server configuration
|
||||
-port=8080 # Server port
|
||||
-dashboard=true # Enable built-in dashboard
|
||||
-cleanup=24h # Cleanup interval
|
||||
-max-age=720h # Maximum data retention (30 days)
|
||||
|
||||
# Example
|
||||
./telemetry-server -port=8080 -dashboard=true -cleanup=24h -max-age=720h
|
||||
```
|
||||
|
||||
## Prometheus Metrics
|
||||
|
||||
The telemetry server exposes these Prometheus metrics:
|
||||
|
||||
### Cluster Metrics
|
||||
- `seaweedfs_telemetry_total_clusters`: Total unique clusters (30 days)
|
||||
- `seaweedfs_telemetry_active_clusters`: Active clusters (7 days)
|
||||
|
||||
### Per-Cluster Metrics
|
||||
- `seaweedfs_telemetry_volume_servers{cluster_id, version, os, deployment}`: Volume servers per cluster
|
||||
- `seaweedfs_telemetry_disk_bytes{cluster_id, version, os, deployment}`: Disk usage per cluster
|
||||
- `seaweedfs_telemetry_volume_count{cluster_id, version, os, deployment}`: Volume count per cluster
|
||||
- `seaweedfs_telemetry_filer_count{cluster_id, version, os, deployment}`: Filer servers per cluster
|
||||
- `seaweedfs_telemetry_broker_count{cluster_id, version, os, deployment}`: Broker servers per cluster
|
||||
- `seaweedfs_telemetry_cluster_info{cluster_id, version, os, deployment, features}`: Cluster metadata
|
||||
|
||||
### Server Metrics
|
||||
- `seaweedfs_telemetry_reports_received_total`: Total telemetry reports received
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### Data Collection
|
||||
```bash
|
||||
# Submit telemetry data (protobuf only)
|
||||
POST /api/collect
|
||||
Content-Type: application/x-protobuf
|
||||
[TelemetryRequest protobuf data]
|
||||
```
|
||||
|
||||
### Statistics (JSON for dashboard/debugging)
|
||||
```bash
|
||||
# Get aggregated statistics
|
||||
GET /api/stats
|
||||
|
||||
# Get recent cluster instances
|
||||
GET /api/instances?limit=100
|
||||
|
||||
# Get metrics over time
|
||||
GET /api/metrics?days=30
|
||||
```
|
||||
|
||||
### Monitoring
|
||||
```bash
|
||||
# Prometheus metrics
|
||||
GET /metrics
|
||||
```
|
||||
|
||||
## Docker Deployment
|
||||
|
||||
### Complete Stack (Recommended)
|
||||
|
||||
```yaml
|
||||
# docker-compose.yml
|
||||
version: '3.8'
|
||||
services:
|
||||
telemetry-server:
|
||||
build: ./server
|
||||
ports:
|
||||
- "8080:8080"
|
||||
command: ["-port=8080", "-dashboard=true", "-cleanup=24h"]
|
||||
|
||||
prometheus:
|
||||
image: prom/prometheus:latest
|
||||
ports:
|
||||
- "9090:9090"
|
||||
volumes:
|
||||
- ./prometheus.yml:/etc/prometheus/prometheus.yml
|
||||
|
||||
grafana:
|
||||
image: grafana/grafana:latest
|
||||
ports:
|
||||
- "3000:3000"
|
||||
environment:
|
||||
- GF_SECURITY_ADMIN_PASSWORD=admin
|
||||
volumes:
|
||||
- ./grafana-provisioning:/etc/grafana/provisioning
|
||||
- ./grafana-dashboard.json:/var/lib/grafana/dashboards/seaweedfs.json
|
||||
```
|
||||
|
||||
```bash
|
||||
# Deploy the stack
|
||||
docker-compose up -d
|
||||
|
||||
# Scale telemetry server if needed
|
||||
docker-compose up -d --scale telemetry-server=3
|
||||
```
|
||||
|
||||
### Server Only
|
||||
|
||||
```bash
|
||||
# Build and run telemetry server
|
||||
cd server
|
||||
docker build -t seaweedfs-telemetry .
|
||||
docker run -p 8080:8080 seaweedfs-telemetry -port=8080 -dashboard=true
|
||||
```
|
||||
|
||||
## Development
|
||||
|
||||
### Protocol Buffer Development
|
||||
|
||||
```bash
|
||||
# Generate protobuf code
|
||||
cd telemetry
|
||||
protoc --go_out=. --go_opt=paths=source_relative proto/telemetry.proto
|
||||
|
||||
# The generated code is already included in the repository
|
||||
```
|
||||
|
||||
### Build from Source
|
||||
|
||||
```bash
|
||||
# Build telemetry server
|
||||
cd telemetry/server
|
||||
go build -o telemetry-server .
|
||||
|
||||
# Build SeaweedFS with telemetry support
|
||||
cd ../..
|
||||
go build -o weed ./weed
|
||||
```
|
||||
|
||||
### Testing
|
||||
|
||||
```bash
|
||||
# Test telemetry server
|
||||
cd telemetry/server
|
||||
go test ./...
|
||||
|
||||
# Test protobuf communication (requires protobuf tools)
|
||||
# See telemetry client code for examples
|
||||
```
|
||||
|
||||
## Grafana Dashboard
|
||||
|
||||
The included Grafana dashboard provides:
|
||||
|
||||
- **Overview**: Total and active clusters, version distribution
|
||||
- **Resource Usage**: Volume servers and disk usage over time
|
||||
- **Deployments**: Deployment type and OS distribution
|
||||
- **Growth Trends**: Historical growth patterns
|
||||
|
||||
### Custom Queries
|
||||
|
||||
```promql
|
||||
# Total active clusters
|
||||
seaweedfs_telemetry_active_clusters
|
||||
|
||||
# Disk usage by version
|
||||
sum by (version) (seaweedfs_telemetry_disk_bytes)
|
||||
|
||||
# Volume servers by deployment type
|
||||
sum by (deployment) (seaweedfs_telemetry_volume_servers)
|
||||
|
||||
# Filer servers by version
|
||||
sum by (version) (seaweedfs_telemetry_filer_count)
|
||||
|
||||
# Broker servers across all clusters
|
||||
sum(seaweedfs_telemetry_broker_count)
|
||||
|
||||
# Growth rate (weekly)
|
||||
increase(seaweedfs_telemetry_total_clusters[7d])
|
||||
```
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- **Network Security**: Use HTTPS in production environments
|
||||
- **Access Control**: Implement authentication for Grafana and Prometheus
|
||||
- **Data Retention**: Configure appropriate retention policies
|
||||
- **Monitoring**: Monitor the telemetry infrastructure itself
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
**SeaweedFS not sending data:**
|
||||
```bash
|
||||
# Check telemetry configuration
|
||||
weed master -h | grep telemetry
|
||||
|
||||
# Verify connectivity
|
||||
curl -v http://your-telemetry-server:8080/api/collect
|
||||
```
|
||||
|
||||
**Server not receiving data:**
|
||||
```bash
|
||||
# Check server logs
|
||||
docker-compose logs telemetry-server
|
||||
|
||||
# Verify metrics endpoint
|
||||
curl http://localhost:8080/metrics
|
||||
```
|
||||
|
||||
**Prometheus not scraping:**
|
||||
```bash
|
||||
# Check Prometheus targets
|
||||
curl http://localhost:9090/api/v1/targets
|
||||
|
||||
# Verify configuration
|
||||
docker-compose logs prometheus
|
||||
```
|
||||
|
||||
### Debugging
|
||||
|
||||
```bash
|
||||
# Enable verbose logging in SeaweedFS
|
||||
weed master -v=2 -telemetry=true
|
||||
|
||||
# Check telemetry server metrics
|
||||
curl http://localhost:8080/metrics | grep seaweedfs_telemetry
|
||||
|
||||
# Test data flow
|
||||
curl http://localhost:8080/api/stats
|
||||
```
|
||||
|
||||
## Contributing
|
||||
|
||||
1. Fork the repository
|
||||
2. Create a feature branch
|
||||
3. Make your changes
|
||||
4. Add tests if applicable
|
||||
5. Submit a pull request
|
||||
|
||||
## License
|
||||
|
||||
This telemetry system is part of SeaweedFS and follows the same Apache 2.0 license.
|
||||
Reference in New Issue
Block a user