Kolena, a startup building tools for testing, measuring, and validating the performance of AI models, today announced that it has raised $15 million in a funding round led by Lobby Capital with participation from SignalFire and Bloomberg Beta.
The new funds bring the total raised by Kolena to $21 million, which will go toward growing the company’s research team, partnering with regulatory agencies, and expanding Kolena’s sales and marketing efforts, Mohamed Elgendy, co-founder and CEO, told TechCrunch in an email interview.
“The use cases for AI are huge, but AI lacks the trust of both developers and the public,” Elgendy said. “This technology needs to be deployed in a way that makes digital experiences better, not worse. The genie won’t go back into the bottle, but as an industry we can make sure we get the right outcomes.”
Elgendy launched Kolena in 2021 with Andrew Shi and Gordon Hart, with whom he worked for about six years in AI departments at companies including Amazon, Palantir, Rakuten, and Synapse. Through Kolena, the trio set out to build a “model quality framework” that delivers unit testing and end-to-end model testing in a customizable, enterprise-friendly package.
“First and foremost, we wanted to provide a new framework for model quality – not just a tool to simplify existing methods,” Elgendy said. “Kolena enables continuous scenario or unit testing. It also provides end-to-end testing of the entire AI and ML product, not just the sub-components.”
To that end, Kolena can surface insights that identify gaps in AI model test data coverage, Elgendy says. The platform includes risk management features that help track the risks associated with deploying a particular AI system (or systems, as the case may be). Using Kolena’s interface, users can create test cases to evaluate model performance, dig into possible reasons for poor performance, and compare a model against several others.
“With Kolena, teams can manage and run tests for specific scenarios that the AI product will have to handle, rather than applying a blanket ‘aggregate’ metric such as an accuracy score, which can obscure the details of a model’s performance,” Elgendy said. “For example, a model that is 95% accurate in detecting cars is not necessarily better than one that is 89% accurate. Each has its own strengths and weaknesses – for example, detecting cars in different weather conditions or at different levels of occlusion, determining the direction a car is facing, and so on.”
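To make that argument concrete, here is a minimal sketch in plain Python, using made-up numbers and not Kolena’s actual SDK or API, of how scenario-level evaluation can contradict a single aggregate accuracy score: the hypothetical model_a wins on the blanket metric but is far weaker in the heavy-rain scenario.

```python
# Illustrative only: hypothetical (correct, total) detection counts per scenario
# for two car-detection models. Not Kolena's API.
model_a = {"clear weather": (950, 1000), "heavy rain": (30, 50)}
model_b = {"clear weather": (890, 1000), "heavy rain": (48, 50)}

def report(name, scenarios):
    # Aggregate accuracy pools every prediction into one number...
    correct = sum(c for c, _ in scenarios.values())
    total = sum(t for _, t in scenarios.values())
    print(f"{name}: aggregate accuracy = {correct / total:.1%}")
    # ...while the per-scenario breakdown exposes where each model struggles.
    for scenario, (c, t) in scenarios.items():
        print(f"  {scenario}: {c / t:.1%}")

report("model_a", model_a)  # ~93% aggregate, but only 60% in heavy rain
report("model_b", model_b)  # ~89% aggregate, yet 96% in heavy rain
```

The same stratified breakdown can be run for whatever scenarios matter to a given product, which is the kind of test coverage Kolena says it tracks.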
If Kolena works as advertised, it could be genuinely useful to data scientists, who spend a great deal of their time building the models that power AI applications.
Image Credits: Kolena
According to one survey, AI engineers report that they devote only 20% of their time to analyzing and developing models, with the rest going to sourcing and cleaning the data used to train them. And a recent report found that, because of the challenges of building models that perform accurately, only about 54% of models eventually move from pilot to production.
But there are other players building tools to test, monitor, and validate models. Beyond established companies like Amazon, Google, and Microsoft, a host of startups are experimenting with new ways to measure the accuracy of models before and after they enter production.
Prolific recently raised $32 million for its platform for training and testing AI models using a network of testers. Meanwhile, Robust Intelligence and Deepchecks are creating their own toolkits for companies to prevent AI models from failing — and to continually validate them. Bobidi rewards developers who test companies’ AI models.
But Elgendy says Kolena’s platform is one of the few that gives customers “complete control” over the data types, evaluation logic, and other components that make up AI model testing. He also points to Kolena’s approach to privacy, which eliminates the need for customers to upload their data or models to the platform; Kolena stores only model test results, which it uses to measure performance over time and which can be deleted upon request.
“Minimizing risks from AI and ML systems requires rigorous testing before deployment, yet organizations do not have robust tools or processes around model validation, and many machine learning proofs of concept stall as a result,” Elgendy said. “Kolena focuses on comprehensive and accurate model evaluation. We give machine learning managers, product managers, and executives unparalleled visibility into model testing coverage and product-specific functional requirements, allowing them to effectively influence product quality from the start.”
San Francisco-based Kolena, which has 28 full-time employees, would not share how many customers it currently works with. But Elgendy said the company is taking an “eclectic approach” to partnering with “mission-critical” companies for now, and plans to roll out team packages for mid-sized enterprises and early-stage AI startups in the second quarter of 2024.