Google is trying to make a splash with Gemini, a new generative AI platform that recently debuted. But while Gemini shows promise in some aspects, it falls short in others. So what is Gemini? How can it be used? How does it measure up to the competition?
To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll continue to update as new Gemini models and features are released.
What is Gemini?
Gemini is Google’s long-promised family of next-generation AI models, developed by the company’s AI research labs DeepMind and Google Research. It comes in three flavors:
- Gemini Ultra, the flagship Gemini model.
- Gemini Pro, a “lite” version of Gemini.
- Gemini Nano, a smaller “distilled” model that runs on mobile devices like the Pixel 8 Pro.
All Gemini models are trained to be “natively multimodal” – in other words, able to work with and use more than just text. They were pre-trained and fine-tuned on a variety of audio, images and videos, a large set of codebases, and text in a range of languages.
This sets Gemini apart from models like Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything beyond text (e.g. essays, email drafts), but that isn’t the case with the Gemini models. Their ability to understand images, audio and other modalities is still limited, but it’s better than nothing.
What’s the difference between Bard and Gemini?
Google, once again proving that it lacks a knack for branding, didn’t make it clear from the outset that Gemini is separate and distinct from Bard. Bard is simply an interface through which certain Gemini models can be accessed; think of it as an app or client for Gemini and other AI models. Gemini, on the other hand, is a family of models, not an app or a front end. There’s no standalone Gemini experience, and there likely never will be. To compare it to OpenAI’s products: Bard is analogous to ChatGPT, OpenAI’s popular conversational AI app, while Gemini is analogous to the language model that powers it, which in ChatGPT’s case is GPT-3.5 or GPT-4.
Incidentally, Gemini is also completely separate from Imagen 2, Google’s text-to-image model, which may or may not fit into the company’s overall AI strategy. Don’t worry, you’re not the only one confused by this!
What can Gemini do?
Because Gemini models are multimodal, they can theoretically perform a range of tasks, from transcribing speech to annotating photos and videos and even creating works of art. Only a few of these capabilities have made it to the product stage so far (more on that later), but Google promises all of them — and more — at some point in the not-so-distant future.
Of course, it’s a bit difficult to take the company at its word.
Google under-delivered with the original Bard launch, and more recently it circulated a video purporting to show Gemini’s capabilities that turned out to be heavily doctored and more aspirational than real. To the tech giant’s credit, Gemini is available in some form today, albeit a somewhat limited one.
However, assuming Google is being more or less truthful in its claims, here’s what the different tiers of Gemini models will be able to do once they’re released:
Gemini Ultra
Few people have gotten their hands on Gemini Ultra, the “base” model on which the others are built; so far, just a “select set” of customers across a few Google apps and services. That won’t change until later this year, when Google launches its largest model more broadly. Most of the information about Ultra comes from Google-led product demos, so it’s best taken with a grain of salt.
Google says Gemini Ultra can be used to help with things like physics homework, solving problems step-by-step on a worksheet, and pointing out potential errors in answers already filled out. Gemini Ultra can also be applied to tasks such as identifying scientific papers relevant to a particular problem, Google says, extracting information from those papers and “updating” a chart from one of them by creating the formulas needed to recreate the chart with more recent data.
Gemini Ultra technically supports image generation, as noted earlier. But that capability won’t make its way into the production version of the model at launch, according to Google, perhaps because the mechanism is more complex than how apps like ChatGPT generate images: rather than feeding prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively”, without an intermediate step.
Gemini Pro
Unlike Gemini Ultra, Gemini Pro is available to the general public today. But confusingly, its capabilities vary depending on where it’s used.
Google says that in Bard, where Gemini Pro first launched in text-only form, the model is an improvement over LaMDA in its reasoning, planning and understanding capabilities. But an independent study by researchers from Carnegie Mellon University and BerriAI found that Gemini Pro actually handles longer, more complex reasoning chains worse than OpenAI’s GPT-3.5.
The study also found that, like all large language models, Gemini Pro particularly struggles with math problems involving several digits, and users have found plenty of examples of bad reasoning and outright mistakes. The model made a number of factual errors on simple queries, like who won the most recent Oscars. Google has promised improvements, but it’s not clear when they’ll arrive.
Gemini Pro is also available via an API in Vertex AI, Google’s fully managed AI developer platform. The base endpoint accepts text as input and generates text as output. A second endpoint, Gemini Pro Vision, can process text and imagery, including photos and video, and output text, along the lines of OpenAI’s GPT-4 with Vision model.
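As a rough illustration (not an official snippet from Google), here’s a minimal sketch of calling both endpoints through the Vertex AI Python SDK; the project ID, bucket URI and prompts are made-up placeholders, and the preview SDK’s module paths could shift over time:

```python
import vertexai
from vertexai.preview.generative_models import GenerativeModel, Part

# Placeholder project and region; swap in your own GCP settings.
vertexai.init(project="my-gcp-project", location="us-central1")

# Text in, text out via the base Gemini Pro endpoint.
model = GenerativeModel("gemini-pro")
response = model.generate_content("Summarize the plot of Hamlet in two sentences.")
print(response.text)

# Gemini Pro Vision also accepts images (and video) alongside text.
vision_model = GenerativeModel("gemini-pro-vision")
photo = Part.from_uri("gs://my-bucket/photo.jpg", mime_type="image/jpeg")
response = vision_model.generate_content([photo, "What's in this photo?"])
print(response.text)
```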
Within Vertex AI, developers can customize Gemini Pro for specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also be connected to external third-party APIs to perform certain actions.
Sometime in “early 2024,” Vertex customers will be able to tap into Gemini Pro to run custom-designed voice and chat agents (i.e. chatbots). Gemini Pro will also become an option to drive Vertex AI’s search summarization, recommendation and answer generation features, drawing on cross-modal documents (such as PDFs and images) from different sources (such as OneDrive and Salesforce) to fulfill queries.
In AI Studio, Google’s web-based tool for app and platform developers, there are workflows for creating freeform, structured and chat prompts with Gemini Pro. Developers have access to both the Gemini Pro and Gemini Pro Vision endpoints, and can adjust the model temperature to control the output’s creative range, provide examples to give tone and style instructions, and tune the safety settings.
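The knobs you set in AI Studio map onto a small SDK surface. Here’s an unofficial sketch using the google.generativeai Python package with an AI Studio API key; the key, prompt and parameter values are placeholders, and the safety-setting strings are an assumption mirroring the REST API:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_AI_STUDIO_API_KEY")  # key generated in AI Studio

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(
    "Suggest three taglines for a solar-powered camping lantern.",
    generation_config=genai.types.GenerationConfig(
        temperature=0.9,        # higher values widen the creative range of the output
        max_output_tokens=256,  # cap the length of the reply
    ),
    # Safety thresholds can be tightened or relaxed per harm category.
    safety_settings=[
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
    ],
)
print(response.text)
```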
Gemini Nano
Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) phones rather than sending the task off to a server somewhere. So far, it powers two features on the Pixel 8 Pro: Summarize in Recorder and Smart Reply in Gboard.
The Recorder app, which lets users press a button to record and transcribe audio, includes a Gemini-powered summary of your recorded conversations, interviews, presentations, and other excerpts. Users get these summaries even if they don’t have a signal or Wi-Fi connection available — and in a nod to privacy, no data leaves their phone in the process.
Gemini Nano is also in Gboard, Google’s keyboard app, as a developer preview. There, it powers a feature called Smart Reply, which helps suggest the next thing you’ll want to say when having a conversation in a messaging app. The feature initially works only with WhatsApp, Google says, but it will come to more apps in 2024.
Is Gemini better than OpenAI’s GPT-4?
There’s no way to know how the Gemini family truly stacks up until Google releases Ultra later this year, but the company has claimed improvements over the current state of the art, which typically means OpenAI’s GPT-4.
Google has repeatedly touted Gemini’s superiority on benchmarks, claiming that Gemini Ultra exceeds current state-of-the-art results on “30 of the 32 widely used academic benchmarks used in large language model research and development.” The company says Gemini Pro, meanwhile, is more capable than GPT-3.5 at tasks like summarizing content, brainstorming and writing.
But leaving aside the question of whether benchmarks really indicate a better model, the scores Google points to appear only marginally better than those of OpenAI’s corresponding models. And, as mentioned earlier, some early impressions haven’t been great, with users and academics pointing out that Gemini Pro tends to get basic facts wrong, struggles with translations and makes poor code suggestions.
How much will Gemini cost?
Gemini Pro is free to use in Bard and, at the moment, AI Studio and Vertex AI.
However, once Gemini Pro exits preview in Vertex, input will cost $0.0025 per character and output will cost $0.00005 per character, with Vertex customers billed in 1,000-character increments (about 140 to 250 words). For models like Gemini Pro Vision, there’s also a per-image charge ($0.0025).
Let’s say a 500-word article contains 2,000 characters. Summarizing that article with Gemini Pro would cost $5, while generating an article of similar length would cost $0.10.
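As a quick sanity check of those figures, using the per-character prices quoted above and the simplifying assumption that summarization cost is dominated by input while generation cost is dominated by output:

```python
# Prices quoted above, in dollars per character.
INPUT_PRICE = 0.0025
OUTPUT_PRICE = 0.00005

article_chars = 2_000  # roughly a 500-word article

# Summarizing: cost is dominated by feeding the article in as input.
print(f"Summarize: ${article_chars * INPUT_PRICE:.2f}")   # -> $5.00
# Generating: cost is dominated by the characters Gemini writes out.
print(f"Generate:  ${article_chars * OUTPUT_PRICE:.2f}")  # -> $0.10
```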
Where can you try Gemini?
Gemini Pro
The easiest place to try Gemini Pro is in Bard. A fine-tuned version of Pro currently answers text-based Bard queries in English in the U.S., with support for additional languages and countries set to arrive in the future.
Gemini Pro is also accessible in preview in Vertex AI via an API. The API is free to use “within limits” for now and supports 38 languages and regions, including Europe, as well as features such as chat functionality and filtering.
Elsewhere, Gemini Pro can be found in AI Studio. Using the service, developers can prototype prompts and Gemini-based chatbots, then obtain API keys to use them in their apps, or export the code to a more fully featured IDE.
Duet AI for Developers, Google’s suite of AI-powered assistance tools for code completion and generation, will begin using a Gemini model in the coming weeks. And Google plans to bring Gemini models to dev tools for Chrome and its Firebase mobile development platform around the same time, in early 2024.
Gemini Nano
Gemini Nano is on the Pixel 8 Pro and will be available on other devices in the future. Developers interested in building the model into their Android apps can sign up for a sneak peek.
We will keep this post updated with the latest developments.