OpenClaw Media Processing: Images, Video, Audio, and Voice
OpenClaw extends beyond text into multimedia. Its skills can generate AI images, transcribe audio, process video, and drive voice applications, and each can run as an automated step.
Image Generation Skills
1. image-generator
Create AI images:
# Generate image from prompt
openclaw skill run image-generator \
--prompt "Modern tech office with AI theme, blue gradient" \
--style "minimalist" \
--size "1024x1024" \
--output "hero-image.png"
# Batch generation
openclaw skill run image-generator \
--prompts "prompts.txt" \
--variations 3 \
--output-dir "./images/"
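As a rough illustration of what the batch flags imply, the sketch below enumerates the files a run over prompts.txt would be expected to produce: one image per prompt per variation. The one-prompt-per-line layout and the slug-based file names are assumptions made for this example, not documented behavior of image-generator.

```python
import re
from pathlib import Path


def slugify(prompt: str, max_len: int = 40) -> str:
    """Turn a prompt into a filesystem-safe stem (hypothetical naming scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "image"


def plan_batch(prompts: list[str], variations: int, output_dir: str) -> list[Path]:
    """List the output paths a batch run would be expected to produce."""
    out = Path(output_dir)
    return [
        out / f"{slugify(prompt)}-{variation}.png"
        for prompt in prompts
        for variation in range(1, variations + 1)
    ]


# Stand-in for the lines of prompts.txt (assumed: one prompt per line).
prompts = ["Modern tech office with AI theme", "Futuristic city skyline"]
planned = plan_batch(prompts, variations=3, output_dir="./images/")
print(len(planned))  # 2 prompts x 3 variations = 6 files
```

Planning the file list up front makes it easy to estimate cost before a large batch is submitted.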
2. openai-image-gen
Using DALL-E:
openclaw skill run openai-image-gen \
--prompt "Futuristic city skyline, cyberpunk style" \
--quality "hd" \
--style "vivid"
Audio Processing Skills
3. openai-whisper
Transcribe audio:
# Transcribe file
openclaw skill run openai-whisper \
--file "meeting.mp3" \
--language "en" \
--output "transcript.txt"
# Real-time transcription
openclaw skill run openai-whisper \
--stream \
--microphone
4. openai-whisper-api
API-based transcription:
openclaw skill run openai-whisper-api \
--url "https://example.com/audio.mp3" \
--translate \
--output-format "srt"
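For readers unfamiliar with the "srt" output format, here is a minimal sketch of how timed transcript segments map onto SubRip blocks: a running index, a start and end time written as HH:MM:SS,mmm, and the text. The (start, end, text) tuple layout is an assumption for this example; the skill's internal segment representation is not described here.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 90.5 -> 00:01:30,500."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as SRT blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "Hello"), (2.5, 4.0, "World")]))
```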
5. sag (Text-to-Speech)
Generate voice:
# Convert text to speech
openclaw skill run sag \
--text "Hello from OpenClaw" \
--voice "nova" \
--output "greeting.mp3"
# Long-form narration
openclaw skill run sag \
--file "article.txt" \
--voice "onyx" \
--chunk-size 4000
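Long-form narration has to be split because a TTS request can only take a limited amount of text at once. The sketch below shows one way such chunking might work: pack whole sentences into chunks no longer than the limit, and hard-split only a sentence that is itself too long. It assumes --chunk-size counts characters, which the original does not state, and chunk_text is a hypothetical helper rather than part of sag.

```python
import re


def chunk_text(text: str, chunk_size: int = 4000) -> list[str]:
    """Split text into chunks of at most chunk_size characters, preferring sentence breaks."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit has to be cut mid-sentence.
        while len(sentence) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:chunk_size])
            sentence = sentence[chunk_size:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", chunk_size=10))  # ['One. Two.', 'Three.']
```

Breaking at sentence boundaries keeps the pauses between synthesized chunks from landing in the middle of a phrase.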
6. sherpa-onnx-tts
Local TTS:
openclaw skill run sherpa-onnx-tts \
--text "This runs locally on your machine" \
--model "en-US" \
--speed 1.2
Video Processing
7. video-frames
Extract frames:
# Extract key frames
openclaw skill run video-frames \
--video "presentation.mp4" \
--rate "1fps" \
--output "frames/"
# Generate thumbnail
openclaw skill run video-frames \
--video "video.mp4" \
--frame "00:01:30" \
--output "thumbnail.jpg"
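To make the two parameters concrete, this sketch converts a position such as "00:01:30" into seconds and lists the moments a rate such as "1fps" would sample from a clip of known length. The accepted timestamp forms and the "<number>fps" rate syntax are assumptions inferred from the examples above, not a specification of video-frames.

```python
import math


def timestamp_to_seconds(timestamp: str) -> float:
    """Convert "HH:MM:SS", "MM:SS", or "SS" into seconds."""
    parts = timestamp.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported timestamp: {timestamp!r}")
    seconds = 0.0
    for part in parts:
        seconds = seconds * 60 + float(part)
    return seconds


def sample_times(duration_s: float, rate: str) -> list[float]:
    """Timestamps (in seconds) sampled at a rate such as "1fps" or "0.5fps"."""
    fps = float(rate.removesuffix("fps"))
    if fps <= 0:
        raise ValueError("rate must be positive")
    count = math.floor(duration_s * fps)
    return [i / fps for i in range(count)]


print(timestamp_to_seconds("00:01:30"))  # 90.0
print(sample_times(5, "1fps"))           # [0.0, 1.0, 2.0, 3.0, 4.0]
```

At 1fps a ten-minute presentation yields about 600 frames, so a lower rate is worth considering when only slide changes matter.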
Media Workflows
Podcast Production
# podcast-production.yaml
name: "Podcast Episode Creation"
steps:
  - name: record-intro
    skill: sag
    action: generate
    params:
      script: "Welcome to the AI Podcast..."
      voice: "nova"
  - name: transcribe-interview
    skill: openai-whisper
    action: transcribe
    params:
      audio: "interview.mp3"
      speakers: 2
  - name: generate-show-notes
    skill: summarize
    action: extract-key-points
    params:
      transcript: "{{transcribe-interview.output}}"
  - name: create-cover-art
    skill: image-generator
    action: create
    params:
      prompt: "Podcast cover, AI theme, episode 42"
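The workflow above passes data between steps with placeholders such as {{transcribe-interview.output}}. A minimal sketch of how a placeholder like that could be resolved against earlier step results follows. The results dictionary, the resolve helper, and the error behavior are all hypothetical; OpenClaw's actual templating rules are not described in this article.

```python
import re

# Matches {{step-name.field}}, allowing hyphens and underscores in both parts.
PLACEHOLDER = re.compile(r"\{\{\s*([\w-]+)\.([\w-]+)\s*\}\}")


def resolve(value: str, results: dict[str, dict[str, str]]) -> str:
    """Replace each {{step.field}} in value with the matching earlier result."""

    def lookup(match: re.Match) -> str:
        step, field = match.group(1), match.group(2)
        try:
            return str(results[step][field])
        except KeyError:
            raise KeyError(f"unresolved reference: {match.group(0)}") from None

    return PLACEHOLDER.sub(lookup, value)


# Pretend the transcription step has already run.
results = {"transcribe-interview": {"output": "Full interview transcript"}}
params = {"transcript": "{{transcribe-interview.output}}"}
resolved = {key: resolve(value, results) for key, value in params.items()}
print(resolved)  # {'transcript': 'Full interview transcript'}
```

Failing loudly on an unknown reference is a deliberate choice here: a typo in a step name should stop the run rather than pass an empty transcript downstream.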
Video Content Pipeline
# video-pipeline.yaml
name: "YouTube Video Production"
steps:
  - name: generate-thumbnail
    skill: image-generator
    params:
      prompt: "{{video.title}}, eye-catching thumbnail"
      size: "1280x720"
  - name: extract-clips
    skill: video-frames
    params:
      video: "{{video.file}}"
      extract_highlights: true
  - name: generate-description
    skill: summarize
    params:
      transcript: "{{video.transcript}}"
      format: "youtube-description"
  - name: create-chapters
    skill: openai-whisper
    action: detect-chapters
    params:
      audio: "{{video.audio}}"
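Unlike the podcast workflow, every placeholder in this pipeline refers to video, which is not a step name, so those values have to be supplied when the workflow starts. The sketch below lists such external inputs by scanning step parameters. The external_inputs helper and the way steps are represented as dictionaries are assumptions made for illustration.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*([\w-]+)\.([\w-]+)\s*\}\}")


def external_inputs(steps: list[dict]) -> set[str]:
    """Return placeholder references whose root is not the name of any step."""
    step_names = {step["name"] for step in steps}
    needed: set[str] = set()
    for step in steps:
        for value in step.get("params", {}).values():
            if not isinstance(value, str):
                continue  # booleans and numbers cannot hold placeholders
            for root, field in PLACEHOLDER.findall(value):
                if root not in step_names:
                    needed.add(f"{root}.{field}")
    return needed


# The pipeline above, written out as plain Python data.
steps = [
    {"name": "generate-thumbnail",
     "params": {"prompt": "{{video.title}}, eye-catching thumbnail", "size": "1280x720"}},
    {"name": "extract-clips",
     "params": {"video": "{{video.file}}", "extract_highlights": True}},
    {"name": "generate-description",
     "params": {"transcript": "{{video.transcript}}", "format": "youtube-description"}},
    {"name": "create-chapters", "params": {"audio": "{{video.audio}}"}},
]
print(sorted(external_inputs(steps)))
# ['video.audio', 'video.file', 'video.title', 'video.transcript']
```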
Voice Applications
8. voice-call
Make voice calls:
# Place call
openclaw skill run voice-call \
--to "+1234567890" \
--message "This is an automated reminder..."
# Interactive call
openclaw skill run voice-call \
--to "+1234567890" \
--script "appointment-confirmation.yaml"
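Since a mistyped number means a call to the wrong person, it can be worth checking the --to value before placing a call. The sketch below checks for the international E.164 shape (a plus sign, then up to 15 digits with no leading zero), which is what the example numbers above resemble. Whether voice-call actually requires E.164 is an assumption, and is_e164 is a hypothetical helper.

```python
import re

# E.164: "+" followed by 2 to 15 digits, the first of which is not 0.
E164 = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number: str) -> bool:
    """Return True when number looks like an E.164 phone number."""
    return E164.fullmatch(number) is not None


for candidate in ["+1234567890", "1234567890", "+0123456789"]:
    print(candidate, is_e164(candidate))
```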
Media Management
Organizing Assets
# media-organization.yaml
name: "Asset Organization"
steps:
  - name: scan-folder
    skill: file-manager
    action: scan
    params:
      path: "./media"
      types: ["jpg", "png", "mp4", "mp3"]
  - name: tag-content
    skill: image-generator
    action: analyze
    params:
      images: "{{scan-folder.images}}"
      generate-tags: true
  - name: organize-by-date
    skill: file-manager
    action: organize
    params:
      files: "{{scan-folder.all}}"
      structure: "YYYY/MM"
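The "YYYY/MM" structure amounts to deriving a year and month for each file and using them as nested folders. A minimal sketch of that mapping is below. It uses the file's modification time in UTC, which is an assumption; file-manager may use a different date source, such as embedded capture metadata.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def dated_destination(file: Path, root: Path, mtime: Optional[float] = None) -> Path:
    """Return root/YYYY/MM/<file name>, based on the file's modification time."""
    # mtime can be passed in directly, which avoids touching the disk in tests.
    timestamp = file.stat().st_mtime if mtime is None else mtime
    stamp = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return root / stamp.strftime("%Y") / stamp.strftime("%m") / file.name


# 1,700,000,000 seconds after the epoch falls in November 2023 (UTC).
target = dated_destination(Path("clip.mp4"), Path("media"), mtime=1_700_000_000)
print(target.as_posix())  # media/2023/11/clip.mp4
```

Computing the destination separately from moving the file makes a dry run possible before anything is reorganized.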
Best Practices
1. Optimize Costs
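- Use local models when possible (for example, sherpa-onnx-tts instead of a hosted TTS API)
- Cache generated media so identical requests are not paid for twice
- Batch process when applicable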
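One way to cache generated media is to derive a file name from the skill and its parameters, so that an identical request maps to the same file and can be reused. The sketch below shows that idea. The cache_key and cached_path helpers and the .cache/media directory are hypothetical, and nothing here is built into OpenClaw.

```python
import hashlib
import json
from pathlib import Path


def cache_key(skill: str, params: dict) -> str:
    """Stable SHA-256 key for a skill call; parameter order does not matter."""
    payload = json.dumps({"skill": skill, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_path(cache_dir: Path, skill: str, params: dict, suffix: str) -> Path:
    """Where the output of this exact call would live in the cache."""
    return cache_dir / f"{cache_key(skill, params)}{suffix}"


params = {"prompt": "Podcast cover, AI theme, episode 42", "size": "1024x1024"}
path = cached_path(Path(".cache/media"), "image-generator", params, ".png")
if path.exists():
    print("reuse", path)
else:
    print("generate, then save to", path)
```

Any change to the prompt or to a parameter produces a different key, so stale images are never returned for a modified request.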
2. Quality Control
- Review AI-generated content
- Maintain brand consistency
- Use appropriate licenses
3. File Management
- Organize by project
- Version control assets
- Automate cleanup
Recommended Skills
For Content Creators
- image-generator - Featured images
- sag - Voiceovers
- video-frames - Thumbnails
For Podcasters
- openai-whisper - Transcription
- sag - Intro/outro
- summarize - Show notes
For Video Production
- video-frames - Clip extraction
- image-generator - Thumbnails
- openai-whisper - Subtitles
Create media at scale with OpenClaw.