Meta CEO Mark Zuckerberg gives a speech while the letters “AI”, for artificial intelligence, are displayed on screen at the Meta Connect event held at the company’s headquarters in Menlo Park, California, USA, on September 27, 2023. Reuters/Carlos Barria/File photo
MENLO PARK, Calif., Sept. 28 (Reuters) - Meta Platforms (META.O) used public Facebook and Instagram posts to train parts of its new Meta AI virtual assistant, but excluded private posts shared only with friends out of regard for consumer privacy, the company’s top policy executive said in an interview with Reuters.
Meta also did not use private chats on its messaging services as training data for the model and took steps to filter personal details from public datasets used for training, said Nick Clegg, Meta’s president of global affairs, speaking on the sidelines of the company’s annual Connect conference this week.
“We have tried to exclude datasets that are predominantly private,” Clegg said, adding that “the vast majority” of the data Meta used for training was publicly available.
He cited LinkedIn as an example of a website whose content Meta intentionally chose not to use due to privacy concerns.
Clegg’s comments come as tech companies such as Meta, OpenAI and Alphabet’s Google (GOOGL.O) face criticism for using information collected from the internet without permission to train AI models, which ingest large amounts of data in order to summarize information and generate images.
The companies are weighing how to handle the private or copyrighted material that is swept up in that process and that their AI systems may reproduce, while also facing lawsuits from authors accusing them of copyright infringement.
Meta AI was the most significant of the company’s first consumer-facing AI tools, which CEO Mark Zuckerberg announced at Meta’s annual Connect product conference on Wednesday. Unlike past conferences, which focused on augmented reality and virtual reality, this year’s event was dominated by topics related to artificial intelligence.
Meta created the assistant using a custom model based on the powerful Llama 2 large language model, which the company released for general commercial use in July, and a new model called Emu that generates images in response to text prompts, the company said.
The product can generate text, audio and images, and will provide access to real-time information through a partnership with Microsoft’s (MSFT.O) Bing search engine.
Clegg said the public Facebook and Instagram posts used to train Meta AI included both text and photos.
Those posts were used to train Emu for the product’s image generation elements, while the chat functionality was based on Llama 2 with the addition of several publicly available, annotated datasets, a Meta spokesperson told Reuters.
Interactions with Meta AI may also be used to improve its features in the future, the spokesperson said.
Clegg said Meta imposes safety restrictions on the content Meta AI tools can produce, such as prohibiting the creation of photorealistic images of public figures.
Regarding copyrighted material, Clegg said he expected a “substantial amount of litigation” over the question of whether such material is “subject to existing fair use doctrine,” which allows limited use of protected works for purposes such as commentary, research and parody.
“We think it is, but I strongly suspect that’s going to play out in litigation,” Clegg said.
Some companies with image generation tools make it easy to recreate iconic characters like Mickey Mouse, while others have paid for the materials or deliberately avoided including them in training data.
For example, OpenAI signed a six-year deal this summer with content provider Shutterstock to use its image, video, and music library for training.
Asked whether Meta had taken any such steps to avoid reproducing copyrighted images, a Meta spokesperson pointed to new terms of service that prohibit users from creating content that violates privacy and intellectual property rights.
Reporting by Katie Paul in Menlo Park, California; Editing by Kenneth Lee, Matthew Lewis and Lincoln Feast