Overview
QuestDB provides comprehensive monitoring through:
- Prometheus Metrics: Detailed operational metrics
- Health Check Endpoints: HTTP endpoints for liveness/readiness probes
- Logging: Structured application logs
- Telemetry: Anonymous usage statistics
- Query Tracing: Execution plan analysis
Prometheus Metrics
Enable Metrics
- Endpoint: http://localhost:9000/metrics
- Format: Prometheus text format
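To inspect the endpoint without a Prometheus server, the text format can be parsed directly. The following is a minimal sketch, not QuestDB tooling: the function names are illustrative, and the default URL is simply the endpoint listed above. It handles plain samples with an optional label set and an optional trailing timestamp, and skips `# HELP` and `# TYPE` lines.

```python
import urllib.request


def parse_prometheus_text(body: str) -> dict[str, float]:
    """Map each sample (name plus optional {labels}) to its float value."""
    samples = {}
    for line in body.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labelled sample: everything up to the last "}" is the key.
            name, _, rest = line.rpartition("}")
            name += "}"
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        # First field is the value; an optional timestamp may follow.
        samples[name] = float(fields[0])
    return samples


def fetch_metrics(url: str = "http://localhost:9000/metrics") -> dict[str, float]:
    """Fetch and parse the metrics endpoint (URL taken from the section above)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running instance returns a dictionary that can be filtered by metric name prefix.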
Key Metrics
System Metrics
Connection Metrics
Query Metrics
Write Metrics
Table Metrics
Health Metrics
Prometheus Configuration
prometheus.yml:
Health Check Endpoints
HTTP Health Check
Standard endpoint:
HTTP MIN Server
Dedicated minimal health check endpoint (doesn’t log requests):
Kubernetes Probes
Logging
Log Configuration
Log Files
Logs are written to <root>/log/:
Log Format
- I: Info
- W: Warning
- E: Error
- C: Critical
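The level letters make it easy to summarise a log file. The sketch below counts lines per level; it assumes each line has the layout `<timestamp> <level letter> <message>`, which is an assumption about the log format rather than something stated here, so adjust the split if your logs differ. The function name is illustrative.

```python
from collections import Counter

# Level letters as listed above.
LEVELS = {"I": "Info", "W": "Warning", "E": "Error", "C": "Critical"}


def count_levels(lines):
    """Count log lines per level, ignoring lines that do not match."""
    counts = Counter()
    for line in lines:
        # Assumed layout: "<timestamp> <level letter> <message>".
        parts = line.split(maxsplit=2)
        if len(parts) >= 2 and parts[1] in LEVELS:
            counts[LEVELS[parts[1]]] += 1
    return counts
```

Passing an open file object works as well as a list, since the function only iterates over lines.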
JVM Logging
Enable verbose JVM logging:
Log Rotation
QuestDB rotates logs daily. Configure external log rotation:
/etc/logrotate.d/questdb:
Query Tracing
Enable Tracing
Trace Query Execution
Query Execution Logs
With log.sql.query.progress.exe=true, QuestDB logs:
Performance Monitoring
System Metrics
Monitor CPU usage:
QuestDB System Tables
Table metadata:
Connection Monitoring
Telemetry
Configuration
Telemetry Data
View telemetry events:
- DB_START: Server startup
- DB_STOP: Server shutdown
- TABLE_CREATE: Table creation
- QUERY_EXEC: Query execution
Alerting
Prometheus Alerting Rules
alerts.yml:
Monitoring Dashboard
Grafana Dashboard JSON:
Custom Monitoring Scripts
Monitor Query Performance
Monitor WAL Lag
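One possible approach is to query the `wal_tables()` function over the REST `/exec` endpoint and compare the sequencer and writer transaction numbers per table. The sketch below rests on several assumptions: that the REST API is reachable at `http://localhost:9000`, that the response is JSON with `columns` and `dataset` keys, and that the result includes `name`, `writerTxn`, and `sequencerTxn` columns. Verify these against your QuestDB version. The lag it reports is a count of unapplied transactions, not seconds.

```python
import json
import urllib.parse
import urllib.request


def wal_lag(payload: dict) -> dict[str, int]:
    """Return {table name: sequencerTxn - writerTxn} from an /exec response."""
    cols = [c["name"] for c in payload["columns"]]
    lag = {}
    for row in payload["dataset"]:
        rec = dict(zip(cols, row))
        # Column names are assumptions; check them on your server.
        lag[rec["name"]] = int(rec["sequencerTxn"]) - int(rec["writerTxn"])
    return lag


def fetch_wal_tables(base_url: str = "http://localhost:9000") -> dict:
    """Run the WAL status query over REST (base URL is an assumption)."""
    query = urllib.parse.urlencode({"query": "SELECT * FROM wal_tables()"})
    with urllib.request.urlopen(f"{base_url}/exec?{query}", timeout=5) as resp:
        return json.load(resp)
```

A periodic job could call `wal_lag(fetch_wal_tables())` and report any table whose lag keeps growing between runs.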
Monitor Disk Space
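A minimal sketch of a disk space check using only the Python standard library. The 85% threshold and the function names are illustrative assumptions; point `path` at the volume that holds the QuestDB root directory.

```python
import shutil


def disk_used_percent(path: str) -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def is_disk_low(path: str, max_used_percent: float = 85.0) -> bool:
    """True when usage has reached the (illustrative) threshold."""
    return disk_used_percent(path) >= max_used_percent
```

The check can run from cron or a sidecar process and feed an alerting channel when it returns `True`.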
Best Practices
- Enable Metrics: Always run with metrics.enabled=true in production
- Health Checks: Use HTTP MIN endpoint for Kubernetes probes
- Log Retention: Rotate logs to prevent disk exhaustion
- Alert Thresholds: Set alerts based on baseline metrics
- Dashboard: Create Grafana dashboard for real-time visibility
- Query Tracing: Enable temporarily for debugging, disable in production
- Monitor Leaks: Alert on non-zero reader_leak_counter
- WAL Lag: Keep below 60 seconds for real-time applications
- Telemetry: Review periodically for usage patterns
- Baseline: Establish performance baseline during testing
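Two of the practices above translate directly into threshold checks. The sketch below is illustrative only: it takes a dictionary of metric values and a WAL lag measurement in seconds, both obtained by whatever means you use. The `reader_leak_counter` key follows the name used in the list; the name actually exported by your server may carry a prefix.

```python
def check_thresholds(metrics, wal_lag_seconds, max_wal_lag_seconds=60.0):
    """Return a list of human-readable alerts for violated thresholds."""
    alerts = []
    # Any non-zero leak counter is treated as an alert condition.
    if metrics.get("reader_leak_counter", 0) > 0:
        alerts.append("reader leak detected")
    # Lag should stay below the limit, so reaching it also alerts.
    if wal_lag_seconds >= max_wal_lag_seconds:
        alerts.append(
            f"WAL lag {wal_lag_seconds:.0f}s is not below {max_wal_lag_seconds:.0f}s"
        )
    return alerts
```

An empty list means both checks passed; in production the same conditions are usually better expressed as Prometheus alerting rules.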
Troubleshooting
High Memory Usage
- Check memory metrics by tag
- Look for reader leaks
- Verify symbol cache settings
- Review page frame sizes
Slow Queries
- Enable query tracing
- Use EXPLAIN to analyze plan
- Check for missing indexes
- Review parallel execution settings
Connection Issues
- Check active connection count
- Verify connection limits
- Review timeout settings
- Monitor network errors
WAL Apply Lag
- Check WAL writer worker count
- Verify disk I/O performance
- Review commit interval settings
- Monitor WAL segment sizes