The conflict between open source software and proprietary software is well understood. But the tensions that have permeated software circles for decades have now spilled over into the burgeoning field of artificial intelligence, and controversy has followed close behind.
The New York Times recently published a glowing profile of Meta CEO Mark Zuckerberg, noting how his embrace of "open source AI" has made him popular once again in Silicon Valley. The problem is that Meta's Llama-branded large language models aren't really open source.
Or are they?
By most estimates, they are not. But it highlights how the notion of "open source AI" will only stoke more controversy in the years ahead. This is something the Open Source Initiative (OSI) is trying to address, led by executive director Stefano Maffulli (pictured above), who has been working on the problem for more than two years through a global effort spanning conferences, workshops, panels, webinars, reports, and more.
AI is not software code
OSI's stewardship of the Open Source Definition (OSD) stretches back more than a quarter of a century, establishing how the term "open source" can, or should, be applied to software. A license that meets this definition can legitimately be considered "open source," though it recognizes a spectrum of licenses ranging from extremely permissive to not-quite-so-permissive.
But transposing legacy licensing and naming conventions from software onto AI is problematic. Joseph Jacks, open source evangelist and founder of VC firm OSS Capital, goes so far as to say that there is "no such thing as open source AI," noting that "open source was invented explicitly for software source code."
In contrast, "neural network weights" (NNWs), a term used in the AI world for the parameters or coefficients a network learns during the training process, cannot be compared in any meaningful way to software.
"Neural net weights are not software source code; they are not human-readable, nor are they patchable," Jacks points out. "Furthermore, the fundamental rights of open source also do not translate to neural network weights in any analogous way."
This led Jacks and OSS Capital colleague Heather Meeker to come up with a definition of sorts of their own, around the concept of "open weights."
So, before we get to a meaningful definition of “open source AI,” we can already see some of the tensions inherent in trying to reach that goal. How can we agree on a definition if we cannot agree that the “thing” we define exists?
For what it's worth, Maffulli agrees.
“The point is valid,” he told TechCrunch. “One of the initial discussions we had was whether to call it open source AI at all, but everyone was already using that term.”
This reflects some of the challenges in the broader field of AI, where debates abound over whether what we call "artificial intelligence" today really is artificial intelligence, or just powerful systems taught to detect patterns among vast swaths of data. But the naysayers have mostly resigned themselves to the fact that the term "artificial intelligence" is here to stay, and there is no point in fighting it.
![Llama illustration](https://techcrunch.com/wp-content/uploads/2024/06/GettyImages-959993436-e1718640411389.jpg?w=680)
Founded in 1998, OSI is a non-profit public benefit corporation engaged in a myriad of open source-related activities around advocacy, education, and its core raison d'être: defining open source. Today, the organization relies on sponsorships for funding, with backers including Amazon, Google, Microsoft, Cisco, Intel, Salesforce, and Meta.
Meta's involvement with OSI is particularly notable right now as it relates to the notion of "open source AI." Although Meta hangs its AI hat on the open source peg, the company places notable restrictions on how its Llama models can be used: sure, they can be used for free for research and commercial purposes, but app developers with more than 700 million monthly users must request a special license from Meta, which the company will grant purely at its own discretion.
Simply put, Meta's Big Tech brethren can whistle if they want in.
Meta's language around its LLMs is somewhat malleable. While the company called its Llama 2 model open source, with the arrival of Llama 3 in April it retreated somewhat from the terminology, using phrases such as "openly available" and "openly accessible" instead. But in some places, it still refers to the model as "open source."
"Everyone else involved in the conversation completely agrees that Llama itself cannot be considered open source," Maffulli said. "People I've talked to who work at Meta know that it's a bit of a stretch."
Moreover, some might argue that there is a conflict of interest here: a company that has shown a desire to piggyback on the open source brand is also providing funding to the stewards of the definition.
This is one of the reasons why OSI has been trying to diversify its funding, recently securing a grant from the Sloan Foundation, which is helping fund its global multi-stakeholder push to reach a definition of open source AI. TechCrunch can reveal that the grant is worth around $250,000, and Maffulli hopes it will change perceptions around OSI's reliance on corporate funding.
"That's one of the things that the Sloan grant makes even more clear: we could say bye-bye to Meta's money at any time," Maffulli said. "We could do this even before the Sloan grant, because I know we'll get donations from others. And Meta knows that very well. They're not interfering in any of this [process], and neither is Microsoft, GitHub, Amazon, or Google; they absolutely know that they cannot interfere, because the structure of the organization doesn't allow it."
The working definition of open source artificial intelligence
![Illustration of the concept depicting finding a definition](https://techcrunch.com/wp-content/uploads/2024/06/GettyImages-1383744480-e1718635120202.jpg?w=680)
The current draft definition of open source AI sits at version 0.0.8, and consists of three core parts: the "preamble," which lays out the document's remit; the definition of open source AI itself; and a checklist that runs through the components required for an open source-compliant AI system.
According to the current draft, an open source AI system should grant the freedom to use the system for any purpose without asking permission; to allow others to study how the system works and inspect its components; and to modify and share the system for any purpose.
But one of the biggest challenges has been around data: namely, can an AI system be classed as "open source" if the company hasn't made the training dataset available for others to use? According to Maffulli, it's more important to know where the data comes from, and how the developer sorted, de-duplicated, and filtered it, as well as to have access to the code that was used to assemble the dataset from its various sources.
"Knowing that information is much better than having the dataset itself without the rest," Maffulli said.
While access to the full dataset would be nice (OSI makes this an "optional" component), Maffulli says it isn't possible or practical in many cases. That may be because there is confidential or copyrighted information embedded in the dataset that the developer has no permission to redistribute. Moreover, there are techniques for training machine learning models in which the data itself isn't actually shared with the system, through approaches such as federated learning, differential privacy, and homomorphic encryption.
And this perfectly highlights the fundamental difference between "open source software" and "open source AI": the intentions may be similar, but they are not directly comparable, and it is this disparity that the OSI is trying to capture in its definition.
In software, source code and binary code are two views of the same artifact: they reflect the same program in different forms. But training datasets and subsequent trained models are different things: you can take the same dataset, and you won’t necessarily be able to recreate the same model consistently.
"There is a variety of statistical and random logic that occurs during training which means it cannot be replicated in the same way as software," Maffulli added.
So an open source AI system should be easy to replicate, with clear instructions. This is where the open source AI definition's checklist comes into play, which is based on a recently published academic paper called "The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency, and Usability in Artificial Intelligence."
This paper proposes the Model Openness Framework (MOF), a classification system that rates machine learning models "based on their completeness and openness." The MOF demands that "specific components of AI model development be included and released under appropriate open licences," including training methodologies and details on model parameters.
Stable state
![Stefano Maffulli gives a presentation at the Digital Public Goods Alliance (DPGA) Member Summit in Addis Ababa](https://techcrunch.com/wp-content/uploads/2024/06/Stef-OSAID-e1718873379686.jpg?w=680)
OSI is calling the official launch of the definition a "stable release," much as a company would for an application that has undergone extensive testing and debugging ahead of prime time. OSI deliberately isn't calling it a "final release," because parts of it will likely evolve.
"We can't expect this definition to last 26 years like the open source definition," Maffulli said. "I don't expect the top part of the definition (like 'What is an AI system?') to change much. But the parts that we refer to in the checklist, those lists of components, are technology-dependent. Tomorrow, who knows what the technology will look like."
The stable definition of open source AI is expected to be rubber-stamped by the board at the All Things Open conference at the end of October, with OSI embarking on a global roadshow in the intervening months spanning five continents, seeking more "diverse input" on how "open source AI" will be defined moving forward. But any final changes are likely to be little more than "small tweaks" here and there.
"This is the final stretch," Maffulli said. "We have arrived at a feature-complete version of the definition; we have all the elements we need. Now we have a checklist, so we are checking that there are no surprises in it; that there are no systems that should be included or excluded."