“It wasn’t so easy to get a good answer.”
pick and choose
Microsoft "handpicked" examples of its generative AI's output after the system frequently "hallucinated" erroneous responses, Business Insider reports.
The scoop comes from leaked audio of an internal presentation about an early version of Microsoft’s Security Copilot, a ChatGPT-like AI tool designed to help cybersecurity professionals.
According to BI, the audio includes Microsoft researchers discussing the results of a "Threat Hunter" test, in which the AI analyzed Windows security logs for potentially malicious activity.
"I had to be a little selective to get examples that looked good, because it can deviate; because it's a probabilistic model, you can ask the same question and get different answers," said Lloyd Greenwald, a Microsoft Security partner who gave the presentation, as quoted by BI.
“It wasn’t that easy to get a good answer,” he added.
hallucination nation
Functioning like a chatbot (you enter queries in a chat window and get answers in the style of a customer service representative), Security Copilot is built primarily on OpenAI's GPT-4 large language model, which also underpins Microsoft's other generative AI efforts, including its Bing search assistant. Greenwald said Microsoft had early access to GPT-4, and that these demos were an "initial exploration" of the technology's capabilities.
Much like Bing AI, which in the early days of its release returned responses so unhinged that it had to be "lobotomized," Security Copilot frequently "hallucinated" incorrect responses in its early iterations, the researchers said.
"Hallucination is a big problem with LLMs, and we do a lot at Microsoft to try to eliminate hallucinations, and part of that is grounding them in real data," Greenwald said in the audio.
In other words, the LLM Microsoft used to build Security Copilot, GPT-4, had not been trained on cybersecurity-specific data at the time. Instead, it was used out of the box, relying solely on its standard (but still vast) general-purpose training data.
cherry on top
Greenwald shared another set of the AI's answers to security questions and clarified: "This is just what we demonstrated to the government."
It's unclear whether Microsoft used these "handpicked" examples in its presentations to governments and other potential customers, or whether the company was as forthcoming in those settings about how the examples were chosen.
A Microsoft spokesperson told BI that "the technology discussed at the meeting was exploratory work that predates Security Copilot and was tested with simulations created from public datasets for model evaluation," adding that "no customer data was used."
More on AI: Microsoft packs talking generative AI into your car