Google’s Jarvis, reportedly powered by a future version of the Gemini model, is designed to operate only within a web browser and is optimized for Chrome. The tool aims to automate everyday web-based tasks. According to The Information, it takes screenshots of what is on screen, interprets them, and then performs actions such as clicking buttons or entering text. It reportedly needs a few seconds to complete each action.
The project fits a broader push among leading AI companies to automate digital tasks. Microsoft is set to release Copilot Vision, which will let users talk with the assistant about the webpages they are viewing. Apple Intelligence is expected to gain the ability to take screen-based actions across multiple applications within the next year. Anthropic has released a beta update to Claude that lets it use a computer on a user’s behalf, although it has been described as “cumbersome and error-prone.” OpenAI is also rumored to be developing a similar model.
The Information notes that Google’s plan to preview the tool in December is subject to change. The company may decide to release it first to a small group of testers, which would allow it to find and fix problems before a wider rollout. As competition in AI-driven task automation intensifies, Jarvis could be a significant step for Google, but its success will ultimately depend on user feedback and refinement during testing.