Behavioral Comparisons
MMSkills Case Studies
Each case runs the same OSWorld task under three conditions: no skills, text-only skill guidance, and multimodal MMSkills. The clips keep their original 1080p resolution, are encoded as browser-compatible MP4, and load lazily.
No Skills
Baseline visual-agent run without skill retrieval.
Text-only
Branch planner receives procedural text but no visual state evidence.
MMSkills
Runtime branch loads state cards and visual references before acting.