General
An Empirical Study & Evaluation of Modern CAPTCHAs Arxiv
Andrew Searles, Yoshimichi Nakatsuka, Ercan Ozturk, Andrew Paverd, Gene Tsudik, Ai Enkoji pdf, 2023.7
The Unsolved Challenges of LLMs as Generalist Web Agents: A Case Study Arxiv
Rim Assouel, Tom Marty, Massimo Caccia, Issam H. Laradji, Alexandre Drouin, Sai Rajeswar, Hector Palacios, Quentin Cappart, David Vazquez, Nicolas Chapados, Maxime Gasse, Alexandre Lacoste pdf, 2023.12
“What’s important here?”: Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces Arxiv
Faria Huq, Jeffrey P. Bigham, Nikolas Martelaro pdf, 2023.12
Autonomous Evaluation and Refinement of Digital Agents Arxiv
Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr pdf, 2024.4
IDs for AI Systems Arxiv
Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, Markus Anderljung pdf, 2024.6
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents Arxiv
Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao pdf, 2024.10
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Arxiv
Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, Caiming Xiong pdf, 2024.12
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials Arxiv
Yiheng Xu, Dunjie Lu, Zhennan Shen, Junli Wang, Zekun Wang, Yuchen Mao, Caiming Xiong, Tao Yu pdf, 2024.12
Aria-UI: Visual Grounding for GUI Instructions Arxiv
Yuhao Yang, Yue Wang, Dongxu Li, Ziyang Luo, Bei Chen, Chao Huang, Junnan Li pdf, 2024.12