- ‘Project Ellmann’ is an internal Google proposal to use AI to give users a ‘bird’s eye view’ of their life stories.
- The idea is to use an LLM like Gemini to ingest search results, identify patterns in users’ photos, and create a chatbot to “answer previously impossible questions” about a person’s life.
- The team also demoed “Ellmann Chat” with the explanation, “Imagine opening ChatGPT, but it already knows everything about your life.”
A team at Google has proposed using AI technology to create a “bird’s eye view” of users’ lives using mobile phone data such as photos and searches.
The idea, named “Project Ellmann” after the biographer and literary critic Richard David Ellmann, is to use a Gemini-like LLM to ingest search results, spot patterns in a user’s photos, and create a chatbot that can “answer questions that were previously impossible,” according to a copy of the presentation viewed by CNBC. Ellmann’s goal, the presentation says, is to be “your life storyteller.”
It’s unclear whether the company plans to bring these capabilities to Google Photos or other products. Google Photos has more than 1 billion users and 4 trillion photos and videos, according to a company blog post.
Project Ellmann is just one of many ways Google is proposing to use AI technology to create or improve its products. On Wednesday, Google announced Gemini, its latest and “most capable” AI model, which outperformed OpenAI’s GPT-4 in some cases. The company plans to license Gemini to a range of customers through Google Cloud so they can use it in their own applications. One of Gemini’s distinguishing features is that it is multimodal, meaning it can process and understand information beyond text, such as images, video, and audio.
Google Photos product managers presented Project Ellmann alongside the Gemini team at a recent internal summit, according to documents viewed by CNBC. They wrote that the teams have spent the past few months determining that large language models are the ideal technology to make this bird’s-eye approach to a person’s life story a reality.
Ellmann could draw on biographies, past moments, and subsequent photos to derive context and describe a user’s photos more deeply than “just pixels with labels and metadata,” the presentation states. It suggests being able to pinpoint a series of moments, such as a user’s college years, Bay Area years, or years as a parent.
“You can’t answer difficult questions or tell a good story if you can’t see your life in perspective,” one caption reads, accompanied by a photo of a small boy playing with a dog in the dirt.
“We comb through your photos, examining their tags and locations to identify meaningful moments,” a presentation slide reads. “When we step back and understand your life as a whole, your whole story becomes clear.”
The presentation said large language models could potentially infer moments like the birth of a user’s child: “This LLM can use knowledge from higher up the tree to deduce that this is the birth of Jack and that he is James and Gemma’s first and only child.”
“One of the reasons an LLM is so powerful for this bird’s-eye approach is that it’s able to take unstructured context from all the different elevations across this tree and use it to improve how it understands other regions of the tree,” one slide reads, alongside illustrations of various “moments” and “chapters” in a user’s life.
Presenters gave another example: determining that a user had recently attended a class reunion. “It’s probably 10 years since he graduated, and the photos are full of faces he hasn’t seen in 10 years, so it’s probably a reunion,” the team inferred in its presentation.
The team also demonstrated “Ellmann Chat,” with the description: “Imagine opening ChatGPT, but it already knows everything about your life. What would you ask it?”
It displayed a sample chat in which a user asked, “Do I have a pet?” The chatbot answered that yes, the user has a dog that wore a red raincoat, then offered the dog’s name and the names of the two family members it’s most often seen with.
In another example chat, a user asked when their sibling last visited. Another user said they were thinking of moving and asked the chatbot to list towns similar to where they live. Ellmann offered answers to both.
Ellmann also presented a summary of the user’s eating habits, other slides showed. “You seem to like Italian food. There are several photos of pasta dishes, and there’s also a photo of pizza,” it said. It added that the user seemed to enjoy trying new foods, because one of their photos included a menu with a dish it didn’t recognize.
The technology also determined what products a user was considering purchasing, along with their interests, work, and travel plans, based on their screenshots, the presentation said. It also suggested it could identify users’ favorite websites and apps, citing Google Docs, Reddit, and Instagram as examples.
“Google Photos has always leveraged AI to help people find photos and videos, and we’re excited about the potential of LLMs to unlock even more helpful experiences,” a Google spokesperson told CNBC. “This is a brainstorming concept the team is in the early stages of exploring. As always, we’ll take the time necessary to work responsibly, with the privacy of our users as our top priority.”
The proposed Project Ellmann comes amid an arms race among tech giants to create more personalized memories of users’ lives.
Google Photos and Apple Photos have been providing “memories” for years, generating albums based on photo trends.
In November, Google announced that, with the help of AI, Google Photos can now group similar photos and organize screenshots into easy-to-find albums.
Apple announced in June that its latest software update would include the ability for its Photos app to recognize people, dogs, and cats in photos. The app already sorts faces and lets users search for them by name.
Apple also announced its upcoming Journal app, which uses on-device AI to create personalized suggestions that prompt users to write passages describing memories and experiences, based on recent photos, locations, music, and workouts.
But Apple, Google, and other tech giants are still grappling with the complex challenge of properly displaying and identifying images.
For example, Apple and Google continue to avoid labeling gorillas after 2015 reports that the companies had incorrectly labeled Black people as gorillas. A New York Times investigation this year found that Apple and Google’s Android software, which powers most of the world’s smartphones, had turned off the ability to visually search for primates for fear of classifying a person as an animal.
Over time, companies like Google, Facebook, and Apple have added controls to minimize unwanted memories, but users report that such memories sometimes still surface and that they have to toggle through several settings to minimize them.