General
=====================

1. **An Empirical Study & Evaluation of Modern CAPTCHAs** Arxiv *Andrew Searles, Yoshimichi Nakatsuka, Ercan Ozturk, Andrew Paverd, Gene Tsudik, Ai Enkoji* `pdf `_, 2023.7
2. **The Unsolved Challenges of LLMs as Generalist Web Agents: A Case Study** Arxiv *Rim Assouel, Tom Marty, Massimo Caccia, Issam H. Laradji, Alexandre Drouin, Sai Rajeswar, Hector Palacios, Quentin Cappart, David Vazquez, Nicolas Chapados, Maxime Gasse, Alexandre Lacoste* `pdf `_, 2023.12
3. **"What's important here?": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces** Arxiv *Faria Huq, Jeffrey P. Bigham, Nikolas Martelaro* `pdf `_, 2023.12
4. **Autonomous Evaluation and Refinement of Digital Agents** Arxiv *Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr* `pdf `_, 2024.4
5. **IDs for AI Systems** Arxiv *Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, Markus Anderljung* `pdf `_, 2024.6
6. **OS-ATLAS: A Foundation Action Model For Generalist GUI Agents** Arxiv *Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao* `pdf `_, 2024.10
7. **Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction** Arxiv *Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, Caiming Xiong* `pdf `_, 2024.12
8. **AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials** Arxiv *Yiheng Xu, Dunjie Lu, Zhennan Shen, Junli Wang, Zekun Wang, Yuchen Mao, Caiming Xiong, Tao Yu* `pdf `_, 2024.12
9. **Aria-UI: Visual Grounding for GUI Instructions** Arxiv *Yuhao Yang, Yue Wang, Dongxu Li, Ziyang Luo, Bei Chen, Chao Huang, Junnan Li* `pdf `_, 2024.12