Google Account Guideline

Some OSWorld examples require a Google account. Follow these steps to run them:

Real Accounts

For tasks including Google or Google Drive, we need a real Google account as well as configured OAuth2.0 secrets.

Attention

We do not provide public test accounts, because several people using the same Google account simultaneously would cause conflicts in environment reset and result evaluation. Please register a private Google account.

Register A Blank Google Account

  1. Go to the Google website and register a new blank account

  • In this testbed, you do not need to provide a recovery email or phone number, since the account is used only for testing

  • Just IGNORE any security recommendations

  • Shut OFF the 2-Step Verification to avoid failure in environment setup (requesting phone verification code)

Shut Off 2-Step Verification

Attention

We strongly recommend that you register a new blank account instead of using an existing one, in order to avoid messing up your personal workspace.

  2. Next, copy the template file settings.json.template to settings.json in the folder evaluation_examples/settings/google/, then replace the two fields email and password:

  • These two fields are used to simulate a real user logging in to the Chrome browser during environment setup for the relevant examples in the virtual machine

{
    "email": "your_google_account@gmail.com",
    "password": "your_google_account_password"
}
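
The copy-and-fill step above can also be scripted. The following is a minimal sketch, not part of OSWorld itself: the helper name write_google_settings is hypothetical, and it assumes the template is a JSON object containing the email and password fields shown above.

```python
import json
import shutil
from pathlib import Path


def write_google_settings(settings_dir, email, password):
    """Copy settings.json.template to settings.json and fill in the credentials.

    Hypothetical helper for illustration; it is not an OSWorld API.
    """
    settings_dir = Path(settings_dir)
    template = settings_dir / "settings.json.template"
    target = settings_dir / "settings.json"

    # Start from the template so any extra fields it contains are preserved.
    shutil.copyfile(template, target)
    settings = json.loads(target.read_text(encoding="utf-8"))

    # Replace the two placeholder fields with the real account credentials.
    settings["email"] = email
    settings["password"] = password

    target.write_text(json.dumps(settings, indent=4) + "\n", encoding="utf-8")
    return target


# Example call, run from the OSWorld project root:
# write_google_settings("evaluation_examples/settings/google",
#                       "your_google_account@gmail.com",
#                       "your_google_account_password")
```

The same helper can be rerun whenever Google forces a password change, so that settings.json stays in sync with the account.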

Create A Google Cloud Project

  1. Navigate to the Google Cloud Project Creation page and create a new GCP project (see Create a Google Cloud Project for detailed steps). You can use any project name.

  2. Go to the Google Drive API console and enable the Google Drive API for the created project (see Enable and disable APIs for detailed steps).

Create OAuth2.0 Credentials

  1. Go to the credentials page and click “CREATE CREDENTIALS -> OAuth client ID”.

  2. For Application type, choose “Desktop app”. You can use any Name. Then click “CREATE”.

  3. In the pop-up window, download the JSON file client_secret_xxxxx.json. Move it into the OSWorld project and rename it to evaluation_examples/settings/googledrive/client_secrets.json. The folder should look like:

- evaluation_examples/
  - settings/
    - google/
      - settings.json
      - settings.json.template
    - googledrive/
      - settings.yml
      - client_secrets.json

  4. The first time you run a task involving Google Drive, a URL will appear requesting your permission. Open the link in unsafe mode using the Google account you filled in evaluation_examples/settings/google/settings.json, authorize, and confirm your choice once and for all. When the flow finishes, you will see the message “The authentication flow has completed.” on a blank web page.
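
Before running a Google Drive task for the first time, it can help to confirm that the files in the folder layout above are in place. The following is a minimal sketch under that assumption; the helper name missing_settings_files is hypothetical, and the check covers only file existence, not file contents.

```python
from pathlib import Path

# Files the guide says should exist, relative to the OSWorld project root.
EXPECTED_FILES = [
    "evaluation_examples/settings/google/settings.json",
    "evaluation_examples/settings/googledrive/settings.yml",
    "evaluation_examples/settings/googledrive/client_secrets.json",
]


def missing_settings_files(project_root="."):
    """Return the expected settings files that are absent under project_root.

    Hypothetical helper for illustration; it is not an OSWorld API.
    """
    root = Path(project_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling missing_settings_files() from the project root returns an empty list when everything is in place; otherwise it lists the paths that still need to be created.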

Potential Issues

Due to strict checks by Google safety teams, Google may still flag your account as risky even with 2-Step Verification turned off, especially when you frequently change the login device. You may encounter the following issues:

Phone Verification Code Required

When the VM tries to log into the Google Drive page, Google asks you to provide a phone number and a verification code. This may occur the first time you change your IP or device.

To resolve this, enter any phone number at which you can receive a code (2-Step Verification is off and no recovery phone number is set, so no particular number is required), then fill in the verification code you receive. After that, Google will hopefully remember this new login IP or device. You can then restart the task, and this time it should work.

Identity Verification

In this case, Google does not offer the option of a phone verification code. Since no recovery email or phone was provided and 2-Step Verification is off, there is no way to log in from the new device. We hypothesize that this problem occurs when you frequently change login IPs or devices, so that Google detects unusual usage. The only solution is to reset the password from the device on which you registered this Google account.

Attention

Sadly, we do not have a permanent solution. The only suggestion is not to change your login IP or device frequently. If you encounter either problem above, Google may urge you to change the password; if you do, remember to update the password in evaluation_examples/settings/google/settings.json.

Task Count and Evaluation Options

Total Task Overview: The OSWorld benchmark contains 369 tasks in total, including 8 tasks that specifically require Google Drive integration and the OAuth2.0 setup described in this guide.

Impact of Google Drive Configuration Issues:

Due to Google’s increasingly strict security policies, you may encounter persistent setup issues that prevent the 8 Google Drive tasks from running properly, even after following all configuration steps. This is a known limitation, particularly when:

  • Running evaluations from different IP addresses or geographic locations

  • Using virtual machines or cloud infrastructure

  • Encountering Google’s automated bot detection systems

  • Facing account security restrictions for new devices

Acceptable Evaluation Approaches:

The OSWorld benchmark officially supports two evaluation methodologies:

  1. Full Evaluation (369 tasks): Complete the Google Drive setup successfully and evaluate all tasks

  2. Standard Evaluation (361 tasks): Skip the 8 Google Drive tasks and evaluate the remaining tasks

Attention

Both approaches are acceptable for benchmark evaluation and result reporting. Choosing 369 or 361 tasks does not invalidate your evaluation; simply specify which approach you used when reporting results.

Recommendation: If you encounter persistent Google Drive setup issues after following this guide, consider using the 361-task evaluation approach rather than spending excessive time on these 8 tasks, especially during initial development and testing phases.
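
The 361-task variant is simply the full task list minus the 8 Google Drive tasks. The following is a minimal sketch of that filtering step. It assumes the task list is held as a mapping from domain name to a list of task IDs; that shape and the helper names drop_tasks and count_tasks are assumptions for illustration, and this guide does not list the 8 task IDs, so they must be supplied by you.

```python
def drop_tasks(task_index, skip_ids):
    """Return a copy of task_index without the task IDs listed in skip_ids.

    task_index is assumed to map a domain name to a list of task IDs.
    Hypothetical helper for illustration; it is not an OSWorld API.
    """
    skip = set(skip_ids)
    return {
        domain: [task_id for task_id in task_ids if task_id not in skip]
        for domain, task_ids in task_index.items()
    }


def count_tasks(task_index):
    """Count the tasks across all domains."""
    return sum(len(task_ids) for task_ids in task_index.values())
```

Applied to the full 369-task list with the 8 Google Drive task IDs as skip_ids, count_tasks on the result should report 361; any other number suggests that an ID was mistyped or is missing from the list.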