Initially launched in December 2023, Google Gemini recently received a substantial upgrade with the early December 2024 release of Gemini 2.0. It's built for what Google calls the "agentic era," with capabilities that allow it to act more independently on complex, multistep tasks.
Other core improvements include native image and audio processing, faster response times, improved coding capabilities, and new integrations under development with other Google apps and services to help power your Android smartphone, computer, and other connected devices.
A head-spinning onslaught of new Gemini models
Google has been spinning out a ton of different AI models lately, with multiple new versions released in the past few weeks. Some improvements, such as the speed of 2.0 Flash, are readily noticeable; others target more specialized areas, like coding. Meanwhile, 2.0 Pro is still under development.
The new 2.0 models are available on the desktop, and more recently in the Gemini mobile app, where you’ll find a selector to choose between them. And let’s not forget the on-device Nano model, which is already powering certain Google Pixel features such as call summaries. It’s also worth noting that another new model, 2.0 Experimental Advanced, popped up on the desktop in the last few days.
As Taylor Kerns points out, though, Gemini is becoming more complex, and it’s getting hard to keep track of all the variants. Since there’s not a lot of information available on Experimental Advanced, I’ve stuck with the two in the comparison below.
| Feature | Gemini 1.5 Pro | Gemini 2.0 Flash Experimental |
| --- | --- | --- |
| Context Window | 1 million tokens (around 750,000 words or 1,500 pages of text) | 1 million tokens (around 750,000 words or 1,500 pages of text) |
| Speed | Responses within a few seconds | Around 2x faster |
| Power Consumption | Higher | Lower |
| Reasoning/Logic | Strong reasoning and collaboration | Claims improved reasoning and adds agentic capabilities |
| Multimodal | Image and audio converted to text for processing | Native image and audio processing; can "speak" using AI voices |
| Image Creation | Suspended | Supported |
| Coding | Can generate code | Can generate and execute code, parse API responses, and integrate data into external applications |
Gemini 2.0 Flash is all about speed and efficiency
As the name suggests, Gemini 2.0 Flash is engineered for speed. Google claims it doubles the speed of its predecessor, and as a user of both 1.5 Pro and 2.0 Flash Experimental, I can attest to its snappiness.
2.0 provides nearly instantaneous responses to the same queries that might take 1.5 Pro a few seconds. That might not sound like a massive impact, but instantaneous response unlocks new potential for real-time applications such as speech interactions. It also makes the overall user experience feel more refined. Despite its increased power, Gemini 2.0 Flash is also designed to be more energy-efficient, which could directly translate into better battery life on your smartphone.
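For developers, one way that speed shows up is through streaming, where text arrives as it's generated rather than after the full response is complete. Here's a minimal sketch using Google's google-generativeai Python SDK; the model ID is an assumption, so substitute whichever 2.0 variant is currently available:

```python
# Minimal streaming sketch with the google-generativeai SDK.
# Assumptions: an API key from Google AI Studio, and the experimental
# model ID "gemini-2.0-flash-exp" (swap in the current model name).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-exp")

# stream=True yields partial chunks as the model produces them,
# so the first words appear almost immediately.
for chunk in model.generate_content("Plan a quick day trip in Kyoto.", stream=True):
    print(chunk.text, end="", flush=True)
```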
Gemini 2.0 Flash brings enhanced capabilities in other core areas. Google says it outperforms Gemini 1.5 Pro in complex tasks such as coding, math, and logical reasoning. Furthermore, Gemini 2.0 Flash can now directly execute code, autonomously process API responses, and call user-defined functions. 2.0 is starting to look more like an end-to-end development solution than a simple code generator.
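To give a sense of what calling a user-defined function looks like in practice, here's a minimal sketch using the google-generativeai Python SDK. The get_weather helper and the model ID are assumptions for illustration; a real app would query an actual weather service:

```python
# Minimal function-calling sketch with the google-generativeai SDK.
# Assumptions: get_weather is a stand-in for a real data source, and
# "gemini-2.0-flash-exp" should be swapped for the current model ID.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def get_weather(city: str) -> str:
    """Return a forecast for a city (hard-coded here for the demo)."""
    return f"Sunny and 72F in {city}"

# Passing a Python callable as a tool lets the SDK derive the function
# declaration from its signature and docstring.
model = genai.GenerativeModel("gemini-2.0-flash-exp", tools=[get_weather])

# With automatic function calling on, the chat session runs the tool
# and feeds its result back to the model in a single round trip.
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("What's the weather like in Tokyo right now?")
print(response.text)
```

That loop, where the model decides to call a function, your code runs it, and the model folds the result into its answer, is the basic building block of the agentic behavior described below.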
Gemini wants to be your AI agent
Agentic AI moves Gemini towards proactive assistance. This means Gemini can now act as an agent, carrying out multistep tasks on your behalf. Future applications will include everything from gaming and robotics to travel planning.
Let’s say you’re planning a trip to Tokyo. Instead of simply asking Gemini for sightseeing suggestions, you could ask it to “create a detailed itinerary for a 5-day trip to Tokyo, including must-see attractions, local restaurant recommendations, and estimated costs.” I tried this exact prompt and the platform generated a compelling day-by-day itinerary for me. But there are still missing components.
Theoretically, Gemini could even go further by booking flights and accommodations, reserving tables at restaurants, and a lot more. In fact, 2.0 Flash does integrate with Google Flights and can show hotel availability at your destination, but that final step of automating the whole process is still to come. It’s easy to see how that could be a tough one to solve, as booking the wrong flight, for instance, can quite literally carry a steep price. Imagine an AI booking you a journey to the wrong Springfield!
Gemini 2.0 can see, hear, and speak
The advancements in multimodal input and output within Gemini 2.0 are another key feature. By seamlessly integrating information from various sources like text, images, video, and audio, Gemini 2.0 can experience the world more like we do. This paves the way for more human-like communication.
Gemini 2.0 can now converse by using an AI voice. In the mobile app, I found several different voices to choose from, selected one I liked, and had a surprisingly natural, flowing conversation where I asked the AI questions about a city I'd like to visit. The level of effort was definitely lower than typing queries and reading responses. While this functionality isn't new to the industry (think AI "companion" apps), it is new to Gemini.
Native image and audio processing deliver noticeable improvements
A cool improvement in Gemini 2.0 is its ability to directly process images and audio. In contrast, its predecessors converted these inputs to text, leading to more information loss. Direct processing allows a deeper understanding of the input. Gemini 2.0 can not only identify elements within an image or audio but can also understand interrelationships and the scene as a whole.
During testing, I fed Gemini 2.0 Flash an image I took looking out from my office. In the foreground is a window screen, while there are shrubs and other objects in the mid-ground. The AI knew right away that the photo was shot through a screen, and described in great detail other elements in the scene. On the whole, I’ve found that the 2.0 model does offer more nuanced and detailed analysis of images than the previous version.
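If you'd like to run a similar test against the API rather than the app, here's a minimal sketch with the google-generativeai Python SDK; the image filename and model ID are placeholders:

```python
# Minimal multimodal sketch: send a photo plus a question in one request.
# Assumptions: office_window.jpg is a placeholder path, and the model ID
# should be replaced with the current 2.0 variant.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-exp")

photo = Image.open("office_window.jpg")
response = model.generate_content(
    [photo, "Describe this scene, including anything between the camera and the subject."]
)
print(response.text)
```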
Gemini’s image generation is back, but does anyone care?
Despite the fanfare around Gemini 2.0’s improved capabilities, the return of its Imagen image generation feature was a bit dull. After the initial controversy and subsequent disabling of the feature due to biases and inaccuracies, the re-release feels uninteresting. Maybe Imagen has been watered down to avoid additional controversy, or maybe it’s just that the novelty of AI image generation wore off during Google’s long pause.
The above image is what Gemini 2.0 Flash Experimental created when prompted to “create the most interesting image you can conjure.” While I understand that’s a subjective prompt, I would nonetheless call the result underwhelming. At best, it looks like a scene in a video game.
Experimenting further, I prompted 2.0 Flash Experimental simply to "create an image of people," and it refused. Switching back to 1.5 Pro and giving the same prompt did result in a stock photo-like image of a group of friends in vibrant colors. With Imagen, we see through the eyes of Google's AI, and its perspective isn't very inspiring.
New integrations foreshadow the future
By weaving Gemini’s capabilities into core services like Search, Maps, and Workspace, Google aims to deliver a more unified user experience.
In the future, your search queries on Google will yield dynamic, AI-powered responses that will likely draw on information from your emails, documents, and even your location history to provide more personally relevant results. Google is already experimenting with AI search summaries that feature Audio Overviews in the style of its sister product, NotebookLM.
Early initiatives like Project Astra and Project Mariner are finally seeing the light of day in the latest Gemini models. Astra is Google's research prototype for a universal, multimodal AI assistant, while Mariner explores agents that act in the browser, enabling tasks like automatically filling out forms or summarizing web pages. Alongside them, Google is experimenting with AI-powered code agents such as Jules. These projects are essentially the philosophical pillars on which Google is developing its AI applications and services.
Google is building a solid AI foundation with Gemini
Gemini 2.0 is a significant step forward for Google AI, offering faster speeds, enhanced reasoning, and seamless multimodal integration. The lackluster return of image generation and the confusing array of model variants highlight the complexities of this fast-moving category.
However, the advancements in agentic AI, new coding, voice and image capabilities, along with the deeper integration with core Google services foreshadow good things to come in 2025.