"How Wikipedia uses bots and how bots use Wikipedia are extremely different, however. For years it has been clear that fledgling A.I. systems were being trained on the site’s articles, as part of the process whereby engineers “scrape” the web to create enormous data sets for that purpose. In the early days of these models, about a decade ago, Wikipedia represented a large percentage of the scraped data used to train machines. The encyclopedia was crucial not only because it’s free and accessible, but also because it contains a mother lode of facts and so much of its material is consistently formatted. In more recent years, as so-called Large Language Models, or L.L.M.s, increased in size and functionality — these are the models that power chatbots like ChatGPT and Google’s Bard — they began to take in far larger amounts of information. In some cases, their meals added up to well over a trillion words. The sources included not just Wikipedia but also Google’s patent database, government documents, Reddit’s Q. and A. corpus, books from online libraries and vast numbers of news articles on the web. But while Wikipedia’s contribution in terms of overall volume is shrinking — and even as tech companies have stopped disclosing what data sets go into their A.I. models — it remains one of the largest single sources for L.L.M.s. Jesse Dodge, a computer scientist at the Allen Institute for AI in Seattle, told me that Wikipedia might now make up between 3 and 5 percent of the scraped data an L.L.M. uses for its training. “Wikipedia going forward will forever be super valuable,” Dodge points out, “because it’s one of the largest well-curated data sets out there.” There is generally a link, he adds, between the quality of data a model trains on and the accuracy and coherence of its responses."
Quote Details
Added by wikiquote-import-bot
Unverified quote
Original Language: English
Sources
Imported from EN Wikiquote
https://en.wikiquote.org/wiki/Wikipedia
Categories
Wikipedia