Vision and Roadmap
==================

Vision
------

Roadmap
-------

Here we provide a high-level roadmap for the project. We will update this roadmap as we make progress.
If you are interested in contributing to the project, please check `CONTRIBUTING.md` for more details.

Road Map for Environment Infrastructure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- ✓ Explore VMware, and whether it can be connected to and controlled through the mouse package
- ✓ Explore Windows and macOS, and whether they can be installed

  - macOS is closed source and cannot be legally installed
  - Windows is available legally and can be installed

- ✓ Build a gym-like Python interface for controlling the VM
- ✓ Record human actions (mouse movement, clicks, keyboard input) for annotation, with support for replaying and compressing the recordings
- ✓ Build a simple task, e.g. open a browser, open a website, click a button, and close the browser
- ✓ Set up a pipeline and build a zero-shot agent implementation for the task
- ✓ Start to decide which tasks inside DesktopEnv to focus on, and start to wrap up the environment for public release
- ✓ Start to annotate the examples for ~~training~~ and testing
- ✓ Error handling during file passing, file opening, etc.
- ✓ Add the accessibility tree from the OS to the observation space
- ✓ Add pre-process and post-process action support for benchmarking setup and evaluation
- ✓ Experiment logging and visualization system
- ✓ Add more tasks, maybe scale to 300 for v1.0.0, and create a dynamic leaderboard
- ✓ Multiprocess support, which can make reinforcement learning more efficient
- ✓ Add support for automatic VM download and configuration, and enable auto-scaling management
- ✓ VPN setup doc for those who need it
- ✓ Support running on platforms that have nested virtualization: AWS
- ✓ Support running without the VMware Pro virtual machine platform, e.g. on VirtualBox or other platforms
- ☐ Support running on platforms that have nested virtualization: GCP
- ☐ Prepare for the first release of the Windows VM image for the environment
- ☐ Add VNC-based video streaming as observations/actions, so that potential online video understanding models can tackle tasks

Road Map of Annotation Tool
^^^^^^^^^^^^^^^^^^^^^^^^^^^

- ☐ Improve the annotation tool based on DuckTrack/OpenAdapt, and make it more robust and aligned with the accessibility tree
- ☐ Annotate the steps of doing the task
- ☐ Crawl all the resources we explored from the internet, and make them easy to access
- ☐ Set up ways for the crowdsourcing community to contribute new examples