MMSkills

Behavioral Comparisons

MMSkills Case Studies

Each case compares the same OSWorld task under three conditions: no skills, text-only skill guidance, and multimodal MMSkills. The clips keep their original 1080p resolution, are encoded as browser-compatible MP4, and load lazily.

No Skills: Baseline visual-agent run without skill retrieval.
Text-only: Branch planner receives procedural text but no visual state evidence.
MMSkills: Runtime branch loads state cards and visual references before acting.