Wikimedia projects

"How Wikipedia uses bots and how bots use Wikipedia are extremely different, however. For years it has been clear that fledgling A.I. systems were being trained on the site’s articles, as part of the process whereby engineers “scrape” the web to create enormous data sets for that purpose. In the early days of these models, about a decade ago, Wikipedia represented a large percentage of the scraped data used to train machines. The encyclopedia was crucial not only because it’s free and accessible, but also because it contains a mother lode of facts and so much of its material is consistently formatted. In more recent years, as so-called Large Language Models, or L.L.M.s, increased in size and functionality — these are the models that power chatbots like ChatGPT and Google’s Bard — they began to take in far larger amounts of information. In some cases, their meals added up to well over a trillion words. The sources included not just Wikipedia but also Google’s patent database, government documents, Reddit’s Q. and A. corpus, books from online libraries and vast numbers of news articles on the web. But while Wikipedia’s contribution in terms of overall volume is shrinking — and even as tech companies have stopped disclosing what data sets go into their A.I. models — it remains one of the largest single sources for L.L.M.s. Jesse Dodge, a computer scientist at the Allen Institute for AI in Seattle, told me that Wikipedia might now make up between 3 and 5 percent of the scraped data an L.L.M. uses for its training. “Wikipedia going forward will forever be super valuable,” Dodge points out, “because it’s one of the largest well-curated data sets out there.” There is generally a link, he adds, between the quality of data a model trains on and the accuracy and coherence of its responses."

- Wikipedia

"Within the Wikipedia community, there is a cautious sense of hope that A.I., if managed right, will help the organization improve rather than crash. Selena Deckelmann, the chief tech officer, expresses that perspective most optimistically. “What we’ve proven over 22 years now is: We have a volunteer model that is sustainable,” she told me. “I would say there are some threats to it. Is it an insurmountable threat? I don’t think so.” The longtime Wikipedia editor who wrote “Death of Wikipedia” told me that he feels there is a case to be made for a good outcome in the coming years, even if the longer term seems far less certain. The Wikimedia plug-in is the first significant move toward protecting its future. Projects are also in the works to use recent advances in A.I. internally. Albon says that he and his colleagues are in the process of adapting A.I. models that are “off the shelf” — essentially models that have been made available by researchers for anyone to freely customize — so that Wikipedia’s editors can use them for their work. One focus is to have A.I. models aid new volunteers, say, with step-by-step chatbot instructions as they begin working on new articles, a process that involves many rules and protocols and often alienates Wikipedia’s newcomers. Leila Zia, the head of research at the Wikimedia Foundation, told me that her team was likewise working on tools that could help the encyclopedia by predicting, for example, whether a new article or edit would be overruled. Or, she said, perhaps a contributor “doesn’t know how to use citations” — in that case, another tool would indicate that. I asked whether it could help contributors maintain a neutral point of view as they wrote Wikipedia entries. “Absolutely,” she says."

- Wikipedia

"While Wikipedia’s licensing policy lets anyone tap its knowledge and text — to “reuse and remix” it however they might like — it does have several conditions. These include the requirements that users must “share alike,” meaning any information they do something with must subsequently be made readily available, and that users must give credit and attribution to Wikipedia contributors. Mixing Wikipedia’s corpus into a chatbot model that gives answers to queries without explaining the sourcing may thus violate Wikipedia’s terms of use, two people in the open-source software community told me. It is now a topic of conversation inside the Wikimedia community whether some legal recourse exists. Data providers may be able to exert other kinds of leverage as well. In April, Reddit announced that it would not make its corpus available for scraping by big tech companies without compensation. It seems very unlikely that the Wikimedia Foundation could issue the same dictum and close its sites off — an action that Nicholas Vincent has called a “data strike” — because its terms of service are more open. But the foundation could make arguments in the name of fairness and appeal to firms to pay for its A.P.I., just as Google does now. It could further insist that chatbots give Wikipedia prominent attribution and offer citations in their answers, something Selena Deckelmann told me the foundation is discussing with various firms. Vincent says that A.I. companies would be foolhardy to try to build a global encyclopedia themselves, with individual contractors. Instead, he told me, “there might be an intermediary stage here where Wikipedia says, ‘Hey, look at how important we’ve been to you.’”"

- Wikipedia

"A few times a week, Alastair Haines, a grad student at the Presbyterian Theological Centre in Sydney, sits down with a Greek version of the New Testament and translates a bit of Paul's first letter to the Corinthians. Haines doesn't speak Greek, but he can read it. When he's done, he loads his work onto a Wikipedia page as part of the Wiki Bible Project, a take-all-comers effort launched in January to create "an original, open content translation of the Bible's source texts," which by most counts includes about 30,000 manuscripts. Along with Haines, who admits to signing up for duty as a way to put off finishing his dissertation, 21 others have answered Wikipedia's call to "claim a chapter!" The eclectic group includes a liberal Christian living in the United Arab Emirates and a Methodist financial counselor in Texas. Some claim to be formally trained in Biblical Hebrew and classical Greek; others, such as user John Kloosterman, admit to being "without qualifications of any kind." The project will take a few years to complete and require constant refinement, says John Vandenberg, one of the project's main administrators. But "that is part of the beauty," he writes. "It's a laissez-faire translation." But Biblical scholars see the potential for an inaccurate, bias-filled mess. "Democratization isn't necessarily good for scholarship," says Bart Ehrman, a professor of religious studies at the University of North Carolina at Chapel Hill, who worked on the most recent translation of the New Revised Standard Version in 1988. "Those were the best Greek and Hebrew scholars in the country, and it took them 20 years.""

- Wikisource
