We thank the following projects for their trust in using OSWorld to benchmark the progress of multimodal agents!
Cradle: Empowering Foundation Agents Towards General Computer Control
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding
…