🖼️ Multimodal Vision: Your agent can see what you see. It can view the scene, look through any camera, watch play mode, and inspect asset thumbnails. 🔎 Powerful Search: Go beyond the project panel ...
GUI grounding, which maps natural-language instructions to actionable UI elements, is a core capability of GUI agents. Prior works largely treats instructions as a static proxy for user intent, ...