Catching weak signals across endpoints and predicting potential intrusion patterns is an ideal challenge for large language models (LLMs). The goal is to fine-tune LLMs on attack data, mining it for new threat patterns and correlations.
Major endpoint detection and response (EDR) and extended detection and response (XDR) vendors are addressing this challenge. Nikesh Arora, Palo Alto Networks chairman and CEO, said: “We collect the most endpoint data in the industry from our XDR. We collect approximately 200 megabytes per endpoint, which is often 10 to 20 times more than most participants in the industry. Why do we do that? We take that raw data and apply automated attack surface management using XDR to correlate signals and harden firewalls.”
CrowdStrike co-founder and CEO George Kurtz, speaking to the keynote audience at the company’s annual Fal.Con event last year, described how the platform links weak signals: “You can then link these to find new detections. We’re now extending that to third-party partners so they can look at other weak signals across the domain, not just endpoints, and devise new detection methods.”
XDR has proven successful in reducing noise and providing a better signal. Major XDR platform providers include Broadcom, Cisco, CrowdStrike, Fortinet, Microsoft, Palo Alto Networks, SentinelOne, Sophos, TEHTRIS, Trend Micro and VMware.
Why LLMs are the new DNA of endpoint security
Augmenting LLMs with telemetry and human-annotated data defines the future of endpoint security. In Gartner’s latest Hype Cycle for Endpoint Security, the authors write: “Endpoint security innovations are focused on faster, automated detection, prevention and remediation of threats, powered by integrated extended detection and response (XDR) to correlate data points and telemetry from endpoint, network, web, email and identity solutions.”
Spending on EDR and XDR is growing faster than the broader information security and risk management market, intensifying competition among EDR and XDR vendors. Gartner forecasts that the endpoint protection platform market will grow from $14.45 billion today to $26.95 billion by 2027, a compound annual growth rate (CAGR) of 16.8%. The global information security and risk management market is projected to grow from $164 billion in 2022 to $287 billion in 2027, a CAGR of 11%.
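The growth figures above can be sanity-checked with the standard CAGR formula, (end/start)^(1/years) − 1. A minimal sketch (the year spans are inferred from the dates in the forecasts):

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate: (end/start)^(1/years) - 1."""
    return (end / start) ** (1 / years) - 1

# Endpoint protection platforms: $14.45B -> $26.95B, assumed 4-year span to 2027
epp = cagr(14.45, 26.95, 4)
print(f"EPP CAGR: {epp:.1%}")  # ~16.9%, matching the cited 16.8%

# Security and risk management: $164B (2022) -> $287B (2027), 5 years
srm = cagr(164, 287, 5)
print(f"SRM CAGR: {srm:.1%}")  # ~11.8%, close to the cited 11%
```

The small gap on the second figure suggests the forecast rounds down or uses slightly different base years.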
CrowdStrike’s CTO on how LLMs strengthen cybersecurity
VentureBeat recently spoke (virtually) with Elia Zaitsev, CrowdStrike’s CTO, to understand why training LLMs with endpoint data improves cybersecurity. His insights reflect how LLMs are rapidly becoming the new DNA of endpoint security.
VentureBeat: What made you start looking at endpoint telemetry data as a source of insights that you could ultimately use to train your LLM?
Elia Zaitsev: “When we started the company, one of the reasons we built it as a cloud-native business was that we wanted to use AI and ML technology to solve difficult problems for our customers. With legacy technology, everything happened at the edge: all the decisions were made there, and all the data lived there. But if you want to use AI technology effectively, even the older ML-style approaches, you need a great deal of information, and to get it you need cloud technology to capture all of it.
You can train these powerful classifiers in the cloud and deploy them to the edge. So: train in the cloud, deploy to the edge, make smarter decisions. What’s interesting is that the same pattern is playing out now that generative AI is on the rise, even though they are different technologies. Rather than deciding what’s good and what’s bad, these models focus on empowering humans, embracing and accelerating their workflows.”
VentureBeat: What do you think about LLMs and Gen AI tools replacing cybersecurity professionals?
Zaitsev: “It’s not about replacing humans; it’s about augmenting them. AI-assisted humans. I think this is a very important concept. Too many people in technology say all the focus should be on the technology itself, wanting it to replace humans, and I think that’s very misplaced, especially in the cyber space. When you think about how the underlying AI actually works, it’s not necessarily about quantity; it’s more about quality. To create these models in the first place you need a lot of data, but then comes the point where you teach them to do something specific, which matters if you want to move from a general model that can speak English, or whatever language you trained it on, to a specialized one. That’s what’s called fine-tuning: how you summarize incidents for security analysts, how you interact with the platform. Those are the kinds of things our generative product, Charlotte AI, does.”
VentureBeat: How do automation technologies like LLMs affect the role of humans in cybersecurity, especially in the context of adversaries’ use of AI and the ongoing arms race in cyber threats?
Zaitsev: “Most of these automation technologies, whether it’s LLMs or anything like that, don’t really replace humans. They free up valuable time so you can focus on harder problems. People usually start by asking what happens when the adversary uses AI. For me, it’s a very simple conversation: in a typical arms race, adversaries use AI and other technologies to automate baseline-level threats. Great. We use AI to counter this. So, on balance, what’s left? We still have very savvy, smart human attackers who can cut through the noise, so we’re going to continue to need really smart human defenders.”
VentureBeat: What was the most valuable lesson you learned from training LLMs on telemetry data?
Zaitsev: “When building LLMs, it’s actually easier to train many smaller models around specific use cases. Take the Falcon OverWatch dataset, for example, or the [threat] intel dataset. In practice, it’s easier and less prone to hallucination to use dedicated small language models, or small large language models, if you like.
Rather than trying to build one big monolithic jack-of-all-trades, working with smaller, purpose-built models lets you tune them for greater accuracy and fewer hallucinations. So what we use is a concept called mixture of experts. Specialization often increases effectiveness when using these LLM technologies: two purpose-built LLMs working together beat one supposedly very smart LLM that does nothing particularly well. A generalist gets many things wrong; a specialist does certain things particularly well.
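The mixture-of-experts idea Zaitsev describes can be sketched as a router that sends each query to a small, purpose-built model instead of one monolith. Everything here is illustrative: the model names and the keyword-based `classify()` heuristic are hypothetical stand-ins, not CrowdStrike’s implementation (a real system would use a learned gating model).

```python
# Hypothetical registry of small, specialized models per use case.
SPECIALISTS = {
    "incident_summary": "summarizer-llm",  # tuned on annotated incident write-ups
    "threat_intel":     "intel-llm",       # tuned on threat-intelligence reports
    "platform_action":  "action-llm",      # tuned to emit structured platform tasks
}

def classify(query: str) -> str:
    """Toy router: keyword heuristic standing in for a learned gating model."""
    q = query.lower()
    if "summarize" in q or "incident" in q:
        return "incident_summary"
    if "actor" in q or "campaign" in q:
        return "threat_intel"
    return "platform_action"

def route(query: str) -> str:
    """Pick the specialist model that should handle this query."""
    return SPECIALISTS[classify(query)]

print(route("Summarize this incident for the SOC lead"))  # summarizer-llm
print(route("Which actor is behind this campaign?"))      # intel-llm
```

The payoff of this design is that each specialist can be fine-tuned on a few thousand high-quality examples from its own domain, which is exactly the small-model advantage described above.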
We also apply validation. We let the LLM do some processing, but then we check the output against the platform’s operation. Ultimately, responses are grounded in telemetry from the platform APIs, which lets us place some level of trust in the underlying data. It doesn’t just come out of the ether, out of the LLM’s brain, so to speak. It is rooted in a foundation of truth.”
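The validation step can be sketched as a gate that only returns an LLM answer whose claims match the ground-truth telemetry it cites. This is a minimal illustration of the pattern, not a real CrowdStrike API: `fetch_telemetry()` and the record fields are hypothetical placeholders.

```python
from typing import Optional

def fetch_telemetry(host: str) -> dict:
    """Stand-in for a platform API call returning ground-truth data."""
    return {"host": host, "open_alerts": 3, "last_seen": "2023-12-01"}

def validated_answer(llm_claim: dict, host: str) -> Optional[dict]:
    """Return the LLM's answer only if its factual claims match telemetry."""
    truth = fetch_telemetry(host)
    if llm_claim.get("open_alerts") != truth["open_alerts"]:
        return None  # contradiction: treat as hallucination, re-query or escalate
    return llm_claim

good = validated_answer({"open_alerts": 3, "summary": "3 alerts open"}, "host-1")
bad = validated_answer({"open_alerts": 7, "summary": "7 alerts open"}, "host-1")
print(good is not None, bad is None)  # True True
```

The key design choice is that the LLM never becomes the source of record: numbers and entities in its output are checked back against the platform before anything reaches the user.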
VentureBeat: Can you elaborate on the importance and role of human expert teams in the development and training of AI systems, especially given your company’s long-standing focus on AI assisting humans rather than replacing them?
Zaitsev: When you start on these kinds of use cases, you don’t need millions, billions or trillions of samples. What you actually need is often thousands or tens of thousands of samples, but they should be of very high quality: ideally what we call human-annotated datasets. Basically, you want an expert telling the AI system, “This is what I would do; learn from my example.” I won’t claim we knew 11 or 12 years ago that there was going to be a generative AI boom, but we’ve always believed passionately in the idea of AI helping humans rather than replacing them, and that’s why we stood up all these expert human teams from day one.
Because we invested in human capability in so many ways, and built so much high-quality, human-annotated platform data, we suddenly found ourselves sitting on a goldmine: exactly the right kind of information needed to fine-tune generative AI large language models for cybersecurity use cases on our platform. So there’s a little bit of luck there.
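The small, expert-curated dataset Zaitsev describes has a recognizable shape. This sketch uses the common instruction-tuning record format (instruction/input/output); the field names and the example record are illustrative, not taken from CrowdStrike’s data.

```python
# Illustrative human-annotated fine-tuning set: thousands of expert examples
# rather than billions of generic ones.
annotated = [
    {
        "instruction": "Summarize this detection for an analyst.",
        "input": "PowerShell spawned by winword.exe; outbound DNS to rare domain",
        "output": "Likely macro-based initial access; isolate host, review DNS logs.",
    },
    # ... thousands more expert-written examples
]

def is_usable(example: dict) -> bool:
    """Quality gate: every field must be present and non-empty."""
    return all(example.get(k, "").strip() for k in ("instruction", "input", "output"))

usable = [ex for ex in annotated if is_usable(ex)]
print(len(usable))  # 1
```

At this scale, gates like `is_usable` (and stricter expert review behind them) matter more than raw volume, which is the "quality over quantity" point made above.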
VentureBeat: How are advances in LLM training reflected in current and future products?
Zaitsev: Our approach avoids the old adage: when all you have is a hammer, everything looks like a nail. And that doesn’t just apply to AI technology; it’s also how we approach the data storage layer. We have always supported the concept of using every technology, because if you aren’t constrained to one thing, you don’t have to force-fit it. In other words, Charlotte is a multimodal system. We use multiple LLMs, but we also use non-LLM technologies. LLMs are good at following instructions: they take a natural-language interface and turn it into structured tasks.
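The “natural language in, structured task out” pattern can be sketched as follows: the LLM’s only job is to emit a structured request that the platform then executes. The JSON schema and the `parse_intent()` stub below are hypothetical; a real system would prompt a model rather than hard-code the mapping.

```python
import json

def parse_intent(natural_language: str) -> dict:
    """Stand-in for an LLM call that turns a user request into a structured task."""
    q = natural_language.lower()
    if "vulnerabilities" in q:
        return {
            "action": "query",
            "dataset": "spotlight_vulns",          # hypothetical dataset name
            "filter": {"severity": "critical"},
        }
    return {"action": "unknown"}  # fall through to a human or a clarifying question

task = parse_intent("Show me critical vulnerabilities on my hosts")
print(json.dumps(task))
```

Keeping the LLM on the language side and the platform on the execution side is what makes the multi-model, multi-technology design above workable: each structured task can be validated and executed by non-LLM components.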
VentureBeat: Are your LLMs trained on customer data or vulnerability data?
Zaitsev: The output users see from Charlotte is almost always based on some kind of platform data, for example vulnerability information from our Spotlight product. We might take that data and have Charlotte summarize it for a layperson; again, that’s what LLMs are good at, and that capability could be trained from internal data. To address the privacy aspect: this is not customer-specific. No customer-specific data is trained into Charlotte, only general knowledge about vulnerabilities. Customer-specific data is served by the platform. That’s how we maintain the separation of church and state, so to speak. Private data resides on the Falcon platform; the LLMs are trained on, and retain, general cybersecurity knowledge. You never expose a naked LLM to end users, so that validation can always be applied.
VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.