OpenClaw Best Practices: Security, Performance, and Optimization
Running OpenClaw in production requires attention to security, performance, and reliability. This guide covers essential best practices for deploying and maintaining OpenClaw at scale.
Security Best Practices
1. API Key Management
Never Hardcode Secrets:
# Bad
openclaw:
  ai:
    api_key: "sk-abc123..."
# Good
openclaw:
  ai:
    api_key: ${OPENAI_API_KEY}
Use Environment Variables:
# .env file (add to .gitignore!)
OPENAI_API_KEY=sk-...
DISCORD_TOKEN=...
SLACK_TOKEN=...
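To make the `${OPENAI_API_KEY}` placeholder concrete, here is a minimal Python sketch of how such a value could be resolved from the environment at load time. How OpenClaw itself performs this substitution is not shown in this guide, so `resolve_placeholders` is a hypothetical helper, not OpenClaw's API.

```python
import os
import re

# Matches placeholders such as ${OPENAI_API_KEY}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_placeholders(value, env=None):
    """Replace each ${NAME} in value with env[NAME]; fail loudly if NAME is unset."""
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            # Failing at startup is safer than running with an empty secret.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)
```

Raising on a missing variable surfaces a misconfigured deployment immediately instead of sending requests with a blank key.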
Secret Management Tools:
# Using 1password skill
openclaw skill run 1password --get-secret OPENAI_API_KEY
# Using AWS Secrets Manager
aws secretsmanager get-secret-value --secret-id openclaw/prod
2. Access Control
Configure Allowed Hosts:
security:
  allowed_hosts:
    - "openclaw.yourdomain.com"
    - "localhost"
  cors_origins:
    - "https://yourdomain.com"
Enable Authentication:
security:
  auth:
    type: jwt
    secret: ${JWT_SECRET}
    expiry: 24h
  rate_limiting:
    enabled: true
    requests_per_minute: 60
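The `requests_per_minute: 60` setting can be pictured as a per-client sliding window. The sketch below is one common way to implement such a limit; the algorithm OpenClaw actually uses is not stated here, and `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each client."""

    def __init__(self, limit=60, window_s=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock          # injectable so the limiter can be tested
        self._hits = {}             # client id -> deque of request timestamps

    def allow(self, client_id):
        now = self.clock()
        hits = self._hits.setdefault(client_id, deque())
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A caller would typically answer with HTTP 429 when `allow()` returns `False`.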
3. Network Security
Use HTTPS:
server:
  tls:
    enabled: true
    cert: /path/to/cert.pem
    key: /path/to/key.pem
Firewall Rules:
# Allow only necessary ports
sudo ufw allow 8080/tcp # OpenClaw
sudo ufw deny 3306/tcp # Block direct DB access
4. Skill Sandboxing
skills:
  security:
    sandbox: true
    network_access:
      - "api.openai.com"
      - "api.github.com"
    file_access: "read-only"
    max_memory: "512MB"
    max_cpu: "50%"
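The `network_access` list reads as an outbound host allowlist. As a rough illustration of the check such a sandbox might perform (an assumption, since the enforcement mechanism is not described here), the hypothetical `is_allowed` helper below compares the exact hostname of a URL against the list.

```python
from urllib.parse import urlsplit

# Hosts taken from the network_access example above.
ALLOWED_HOSTS = frozenset({"api.openai.com", "api.github.com"})


def is_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only if the URL's hostname exactly matches an allowed host."""
    host = urlsplit(url).hostname  # already lowercased; None if no host is present
    return host is not None and host in allowed_hosts
```

Matching the full hostname, rather than testing for a prefix or substring, keeps look-alike domains such as `api.openai.com.evil.example` out.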
Performance Optimization
1. Caching Strategy
Enable Response Caching:
cache:
  enabled: true
  type: redis # or memory, filesystem
  ttl: 3600
  patterns:
    - pattern: "weather:*"
      ttl: 1800
    - pattern: "github:*"
      ttl: 300
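The pattern list above implies that a key-specific TTL overrides the global default. A small sketch of that lookup follows, assuming glob-style matching and first-match-wins ordering; neither is confirmed by this guide, and `ttl_for` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

DEFAULT_TTL = 3600  # seconds, the global ttl from the example
PATTERN_TTLS = [
    ("weather:*", 1800),
    ("github:*", 300),
]


def ttl_for(key):
    """Return the TTL of the first pattern that matches key, else the default."""
    for pattern, ttl in PATTERN_TTLS:
        if fnmatchcase(key, pattern):
            return ttl
    return DEFAULT_TTL
```

Fast-changing data (GitHub state) gets a short TTL, while anything unmatched falls back to one hour.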
Skill-Level Caching:
skills:
  weather:
    cache:
      enabled: true
      ttl: 1800
  content-research:
    cache:
      enabled: true
      ttl: 86400
2. Connection Pooling
database:
  pool:
    min: 5
    max: 20
    acquire_timeout: 5000
    idle_timeout: 30000
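To show what `max` and `acquire_timeout` control, here is a simplified pool sketch. It assumes the timeouts above are in milliseconds (so 5000 becomes 5 seconds) and leaves out `min` pre-warming and `idle_timeout` eviction for brevity; `ConnectionPool` is a hypothetical class, not OpenClaw's implementation.

```python
import queue
import threading


class ConnectionPool:
    """Hand out at most `max_size` connections; wait up to a timeout for one."""

    def __init__(self, factory, max_size=20):
        self._factory = factory              # callable that opens a new connection
        self._idle = queue.LifoQueue()       # connections ready for reuse
        self._slots = threading.BoundedSemaphore(max_size)

    def acquire(self, timeout_s=5.0):
        # Block until a slot frees up, or give up after timeout_s seconds.
        if not self._slots.acquire(timeout=timeout_s):
            raise TimeoutError("no connection available within the timeout")
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        try:
            return self._factory()
        except Exception:
            self._slots.release()  # do not leak the slot if opening fails
            raise

    def release(self, conn):
        self._idle.put(conn)
        self._slots.release()
```

When every slot is taken, callers wait at most `timeout_s` and then fail fast instead of piling up behind a saturated database.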
3. Async Processing
workers:
  enabled: true
  count: 4
  queue: redis
  retry:
    max_attempts: 3
    backoff: exponential
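With `max_attempts: 3` and exponential backoff, a failing job is retried after progressively longer pauses. The sketch below assumes a 1-second base delay that doubles each time and is capped at 30 seconds; those numbers are illustrative, since the guide does not state OpenClaw's actual base or cap.

```python
import time


def backoff_delays(max_attempts=3, base_s=1.0, factor=2.0, cap_s=30.0):
    """Delays to wait between attempts: base, base*factor, ... capped at cap_s."""
    return [min(cap_s, base_s * factor ** i) for i in range(max_attempts - 1)]


def run_with_retry(task, max_attempts=3, base_s=1.0, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is exhausted."""
    delays = backoff_delays(max_attempts, base_s)
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
```

In production it is common to add random jitter to each delay so that many workers do not retry in lockstep.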
4. Monitoring
monitoring:
  metrics:
    enabled: true
    endpoint: /metrics
  tracing:
    enabled: true
    sample_rate: 0.1
  alerting:
    rules:
      - condition: "error_rate > 0.05"
        channel: "slack"
        target: "#alerts"
Resource Management
1. Memory Limits
resources:
  memory:
    max: "2GB"
    warning: "1.5GB"
  skills:
    max_concurrent: 10
    timeout: 30000
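The `max_concurrent` and `timeout` settings together bound how much work runs at once and for how long. A minimal asyncio sketch of that combination follows, assuming `timeout: 30000` means 30 seconds; `run_limited` and `run_all` are hypothetical names.

```python
import asyncio


async def run_limited(skill, limiter, timeout_s=30.0):
    """Run one skill coroutine under the shared concurrency limit and a timeout."""
    async with limiter:
        return await asyncio.wait_for(skill(), timeout=timeout_s)


async def run_all(skills, max_concurrent=10, timeout_s=30.0):
    """Run every skill, never more than max_concurrent at a time."""
    limiter = asyncio.Semaphore(max_concurrent)
    tasks = [run_limited(skill, limiter, timeout_s) for skill in skills]
    # return_exceptions=True keeps one timed-out skill from cancelling the rest.
    return await asyncio.gather(*tasks, return_exceptions=True)
```

Each entry in the result list is either the skill's return value or the exception it raised, so a caller can log timeouts without losing successful results.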
2. CPU Throttling
resources:
  cpu:
    limit: "2 cores"
    priority: "normal"
3. Disk Management
storage:
  logs:
    max_size: "1GB"
    retention: "30d"
  cache:
    max_size: "500MB"
    cleanup_interval: "1h"
Logging and Debugging
1. Structured Logging
logging:
  level: info
  format: json
  fields:
    - timestamp
    - level
    - skill
    - action
    - duration
    - error
  output:
    - type: file
      path: /var/log/openclaw/app.log
    - type: stdout
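For a sense of what one JSON log line with these fields looks like, here is a sketch using Python's standard `logging` module. It illustrates the log shape only; OpenClaw's own logger and exact field encoding are not documented here, and the extra `message` field is an addition of this sketch.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the fields listed above."""

    # Optional fields, supplied by callers through logging's `extra=` argument.
    EXTRA_FIELDS = ("skill", "action", "duration", "error")

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        }
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)
```

A handler configured with `handler.setFormatter(JsonFormatter())` would then emit one parseable object per line, for example after `logger.info("skill finished", extra={"skill": "weather", "duration": 120})`.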
2. Audit Logging
audit:
  enabled: true
  events:
    - skill_execution
    - config_change
    - authentication
    - authorization_failure
  retention: "90d"
3. Debug Mode (Development Only)
# Only enable in development!
debug:
  enabled: false
  verbose_skills: false
  log_requests: false
Backup and Recovery
1. Configuration Backup
#!/bin/bash
# backup.sh
set -euo pipefail # Abort on errors and unset variables
BACKUP_DATE="$(date +%Y%m%d)"
BACKUP_DIR="/backups/openclaw/$BACKUP_DATE"
mkdir -p "$BACKUP_DIR"
# Backup config
cp ~/.openclaw/config.yaml "$BACKUP_DIR/"
# Backup skill configs
tar czf "$BACKUP_DIR/skills.tar.gz" ~/.openclaw/skills/
# Backup databases (if using local)
# pg_dump openclaw > "$BACKUP_DIR/database.sql"
# Upload to S3 under a dated prefix so daily backups do not overwrite each other
aws s3 sync "$BACKUP_DIR" "s3://my-backups/openclaw/$BACKUP_DATE/"
2. Disaster Recovery
recovery:
  auto_restart: true
  max_restarts: 5
  restart_window: 3600
  health_check:
    interval: 30
    timeout: 10
    path: /health
Scaling Strategies
1. Horizontal Scaling
cluster:
  enabled: true
  nodes: 3
  load_balancer:
    type: round_robin
    health_check: /health
  shared_state:
    type: redis
    url: redis://redis-cluster:6379
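A `round_robin` load balancer paired with a health check usually means "take the next node in order, skipping any that are currently unhealthy". The sketch below shows that selection logic under that assumption; `RoundRobin` and the node names are hypothetical.

```python
from itertools import cycle


class RoundRobin:
    """Rotate through nodes in order, skipping ones marked unhealthy."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._cycle = cycle(self._nodes)
        self.healthy = set(self._nodes)  # updated by a health checker elsewhere

    def next_node(self):
        # Look at each node at most once per call before giving up.
        for _ in range(len(self._nodes)):
            node = next(self._cycle)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")
```

A health checker polling `/health` would remove a node from `healthy` on failure and add it back once it recovers.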
2. Skill Distribution
skill_routing:
  rules:
    - skill: "heavy-processing"
      nodes: ["worker-1", "worker-2"]
    - skill: "quick-tasks"
      nodes: ["api-1", "api-2", "api-3"]
Maintenance
1. Regular Updates
# Update OpenClaw
npm update -g openclaw
# Update skills
openclaw skill update --all
# Security audit
npm audit
2. Health Checks
health:
  checks:
    - name: disk_space
      command: "df -h / | awk 'NR==2 {print $5}' | sed 's/%//'"
      warning: 80
      critical: 90
    - name: memory_usage
      command: "free | grep Mem | awk '{print $3/$2 * 100}'"
      warning: 80
      critical: 95
    - name: skill_response_time
      endpoint: /health/skills
      warning: 1000
      critical: 5000
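Each check above produces a number that is compared against a warning and a critical threshold. The sketch below shows that classification, assuming a value at or above a threshold triggers it (the guide does not say whether the comparison is inclusive). `disk_usage_percent` is a rough stand-in for the `df` command and can differ slightly from `df`'s figure, which excludes reserved blocks.

```python
import shutil


def classify(value, warning, critical):
    """Map a measured value to 'ok', 'warning', or 'critical'."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


def disk_usage_percent(path="/"):
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100
```

For example, `classify(disk_usage_percent(), warning=80, critical=90)` applies the `disk_space` thresholds from the config.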
3. Cleanup Jobs
maintenance:
  cleanup:
    schedule: "0 2 * * *" # Daily at 2 AM
    tasks:
      - name: old_logs
        action: delete
        path: /var/log/openclaw/*.log
        older_than: "30d"
      - name: temp_files
        action: delete
        path: /tmp/openclaw/*
        older_than: "1d"
      - name: cache_cleanup
        action: trim
        size: "500MB"
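The `delete` tasks with `older_than` amount to removing files whose modification time is past a cutoff. A minimal sketch of that logic follows, assuming age is judged by mtime (the guide does not specify); `delete_older_than` is a hypothetical helper.

```python
import time
from pathlib import Path


def delete_older_than(directory, pattern, max_age_days, now=None):
    """Delete files in `directory` matching `pattern` older than max_age_days.

    Returns the list of removed paths so the caller can log them.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in Path(directory).glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

For the `old_logs` task this would be `delete_older_than("/var/log/openclaw", "*.log", 30)`; running it first against a scratch directory is a sensible precaution.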
Production Checklist
Pre-Deployment
- Security audit completed
- Secrets externalized
- HTTPS enabled
- Rate limiting configured
- Logging configured
- Monitoring enabled
- Backups configured
- Health checks added
- Documentation updated
Post-Deployment
- Verify all skills work
- Check error rates
- Monitor resource usage
- Test failover
- Verify backups
- Set up alerts
Troubleshooting Common Issues
High Memory Usage
# Check skill memory limits
skills:
  memory_limits:
    enabled: true
    default: "256MB"
Slow Response Times
# Enable profiling
performance:
  profiling:
    enabled: true
    slow_query_threshold: 1000
Skill Failures
# Add retry logic
skills:
  retry:
    enabled: true
    max_attempts: 3
    backoff: exponential
Recommended Tools
Monitoring
- Prometheus + Grafana: Metrics and dashboards
- Sentry: Error tracking
- Datadog: Full-stack monitoring
Security
- 1Password: Secret management
- Vault: Dynamic secrets
- Snyk: Vulnerability scanning
Performance
- Redis: Caching and sessions
- Nginx: Reverse proxy and load balancing
- Cloudflare: CDN and DDoS protection
Deploy OpenClaw with confidence.