First Quote Added: April 10, 2026
Latest Quote Added
"“Editing Wikipedia from a bomb shelter is difficult,” said Mykola Kozlenko, the vice president of the Wikimedia Ukraine user group. “To be honest, covering the invasion is not our main priority now. People are mainly trying to put in place their plan B, either by evacuating to a safer place, by joining the army, or by joining volunteer organizations.”"
"Having that protection there is what has allowed Wikipedia to be written by thousands of volunteer editors around the world over the last 22 years. So without the protections of Section 230, that becomes a much more difficult scenario for us."
"The U.S. Supreme Court has declined to hear a bid by the operator of the popular Wikipedia internet encyclopedia to resurrect its lawsuit against the National Security Agency challenging mass online surveillance."
"Every day, millions of people gather on Wikipedia to fight over the facts displayed. Wikipedia doesn’t just preserve the roughly 22.14 GB of text displayed on its pages, it’s also preserving years of edits and arguments about how those words were written. Every page on the site is argued over, fussed about, and tweaked constantly. All those discussions are here, and can be pored over. Running an ad-free website where millions of people gather every day to discuss facts and update scores of pages is a monumental task. It’s incredible that Wikipedia doesn’t often go down and has few technical problems. Most of the time, Wikipedia works without issue. The same is not true for X, formerly Twitter. And where does the money go? The Wikimedia Foundation is a 501(c)(3) nonprofit organization that publishes its financial records, which are routinely audited by third parties. Every year, it publishes portions of this financial audit for the public. According to its 2022 report, it received about $160 million in donations. It spent $88 million of this on salaries and wages for its employees, $2.7 million on internet hosting, and about $1.2 million on travel. It’s very easy to find these reports with a cursory search. A community note on Musk’s own post says as much."
"The civic tech expert Ed Saperia used as his parable the difference between Wikipedia and Facebook. Jimmy Wales’s big experiment, which started life in 1999 as Nupedia, has created an open-source collection of human knowledge in hundreds of languages that is essentially trustworthy. If a mistake creeps in through the gates of human generosity, it gets corrected in the same way. If malicious actors try to slander their foes, the punishment is not cancellation, but more like lifelong ridicule, which is proportionate, given how long a slanderous person is likely to carry on doing ridiculous things. In other words, it is the best of humanity, all natural desire to help each other with cross-pollinated knowledge concentrated in one place. Facebook, for brevity, takes the same raw material – all the people in the world – and finds the worst in it. Facebook manages to winkle out things we didn’t know we were capable of – levels of vitriol, gullibility and hysteria – in between a scare ad for dark politics and a mesmerising video of five types of mince baked around a kilo of cheese. (I am paraphrasing a bit; I don’t think civic tech gurus dwell much on the cheese.)"
"The new A.I. chatbots have typically swallowed Wikipedia’s corpus, too. Embedded deep within their responses to queries is Wikipedia data and Wikipedia text, knowledge that has been compiled over years of painstaking work by human contributors. While estimates of its influence can vary, Wikipedia is probably the most important single source in the training of A.I. models. “Without Wikipedia, generative A.I. wouldn’t exist,” says Nicholas Vincent, who will be joining the faculty of Simon Fraser University in British Columbia this month and who has studied how Wikipedia helps support Google searches and other information businesses. Yet as bots like ChatGPT become increasingly popular and sophisticated, Vincent and some of his colleagues wonder what will happen if Wikipedia, out-flanked by A.I. that has cannibalized it, suffers from disuse and dereliction. In such a future, a “Death of Wikipedia” outcome is perhaps not so far-fetched. A computer intelligence — it might not need to be as good as Wikipedia, merely good enough — is plugged into the web and seizes the opportunity to summarize source materials and news articles instantly, the way humans now do with argument and deliberation."
"How Wikipedia uses bots and how bots use Wikipedia are extremely different, however. For years it has been clear that fledgling A.I. systems were being trained on the site’s articles, as part of the process whereby engineers “scrape” the web to create enormous data sets for that purpose. In the early days of these models, about a decade ago, Wikipedia represented a large percentage of the scraped data used to train machines. The encyclopedia was crucial not only because it’s free and accessible, but also because it contains a mother lode of facts and so much of its material is consistently formatted. In more recent years, as so-called Large Language Models, or L.L.M.s, increased in size and functionality — these are the models that power chatbots like ChatGPT and Google’s Bard — they began to take in far larger amounts of information. In some cases, their meals added up to well over a trillion words. The sources included not just Wikipedia but also Google’s patent database, government documents, Reddit’s Q. and A. corpus, books from online libraries and vast numbers of news articles on the web. But while Wikipedia’s contribution in terms of overall volume is shrinking — and even as tech companies have stopped disclosing what data sets go into their A.I. models — it remains one of the largest single sources for L.L.M.s. Jesse Dodge, a computer scientist at the Allen Institute for AI in Seattle, told me that Wikipedia might now make up between 3 and 5 percent of the scraped data an L.L.M. uses for its training. “Wikipedia going forward will forever be super valuable,” Dodge points out, “because it’s one of the largest well-curated data sets out there.” There is generally a link, he adds, between the quality of data a model trains on and the accuracy and coherence of its responses."
"Wikipedia’s fundamental goal is to spread knowledge as broadly and freely as possible, by whatever means. About 10 years ago, when site administrators focused on how Google was using Wikipedia, they were in a situation that presaged the advent of A.I. chatbots. Google’s search engine was able, at the top of its query results, to present Wikipedians’ work to users all over the world, giving the encyclopedia far greater reach than before — an apparent virtue. In 2017, three academic computer scientists, Connor McMahon, Isaac Johnson and Brent Hecht, conducted an experiment that tested how random users would react if just part of the contributions made to Google’s search results by Wikipedia were removed. The academics perceived an “extensive interdependence”: Wikipedia makes Google a “significantly better” search engine for many queries, and Wikipedia, in turn, gets most of its traffic from Google."
"Aaron Halfaker, who led the machine-learning research team at the Wikimedia Foundation for several years (and who now works for Microsoft), told me that search-engine summaries at least offer users links and citations and a way to click back to Wikipedia. The responses from large language models can resemble an information smoothie that goes down easy but contains mysterious ingredients. “The ability to generate an answer has fundamentally shifted,” he says, noting that in a ChatGPT answer there is “literally no citation, and no grounding in the literature as to where that information came from.” He contrasts it with the Google or Bing search engines: “This is different. This is way more powerful than what we had before.”"
"Wikipedia’s most devoted supporters will readily acknowledge that it has plenty of flaws. The Wikimedia Foundation estimates that its English-language site has about 40,000 active editors — meaning they make at least five edits a month to the encyclopedia. According to recent data from the Wikimedia Foundation, about 80 percent of that cohort is male, and about 75 percent of those from the United States are white, which has led to some gender and racial gaps in Wikipedia’s coverage. And lingering doubts about reliability remain. For a popular article that might have thousands of contributors, “Wikipedia is literally the most accurate form of information ever created by humans,” Amy Bruckman, a professor at the Georgia Institute of Technology, told me. But Wikipedia’s short articles can sometimes be hit or miss. “They could be total garbage,” says Bruckman, who is the author of the recent book “Should You Believe Wikipedia?”"
"Within the Wikipedia community, there is a cautious sense of hope that A.I., if managed right, will help the organization improve rather than crash. Selena Deckelmann, the chief tech officer, expresses that perspective most optimistically. “What we’ve proven over 22 years now is: We have a volunteer model that is sustainable,” she told me. “I would say there are some threats to it. Is it an insurmountable threat? I don’t think so.” The longtime Wikipedia editor who wrote “Death of Wikipedia” told me that he feels there is a case to be made for a good outcome in the coming years, even if the longer term seems far less certain. The Wikimedia plug-in is the first significant move toward protecting its future. Projects are also in the works to use recent advances in A.I. internally. Albon says that he and his colleagues are in the process of adapting A.I. models that are “off the shelf” — essentially models that have been made available by researchers for anyone to freely customize — so that Wikipedia’s editors can use them for their work. One focus is to have A.I. models aid new volunteers, say, with step-by-step chatbot instructions as they begin working on new articles, a process that involves many rules and protocols and often alienates Wikipedia’s newcomers. Leila Zia, the head of research at the Wikimedia Foundation, told me that her team was likewise working on tools that could help the encyclopedia by predicting, for example, whether a new article or edit would be overruled. Or, she said, perhaps a contributor “doesn’t know how to use citations” — in that case, another tool would indicate that. I asked whether it could help contributors maintain a neutral point of view as they were writing. “Absolutely,” she says."
"Three years ago, in anticipation of Wikipedia’s 20th anniversary, Joseph Reagle, a professor at Northeastern University, wrote a historical essay exploring how the death of the site had been predicted again and again. Wikipedia has nevertheless found ways to adapt and endure. Reagle told me that the recent debates over A.I. recall for him the early days of Wikipedia, when its quality was unflatteringly compared to that of other encyclopedias. “It served as a proxy in this larger culture war about information and knowledge and quality and authority and legitimacy. So I take a sort of similar model to thinking about ChatGPT, which is going to improve. Just like Wikipedia is not perfect, it’s not perfect — it’s never going to be perfect — but what is the relative value given the other information that’s out there?”"
"While Wikipedia’s licensing policy lets anyone tap its knowledge and text — to “reuse and remix” it however they might like — it does have several conditions. These include the requirements that users must “share alike,” meaning any information they do something with must subsequently be made readily available, and that users must give credit and attribution to Wikipedia contributors. Mixing Wikipedia’s corpus into a chatbot model that gives answers to queries without explaining the sourcing may thus violate Wikipedia’s terms of use, two people in the open-source software community told me. It is now a topic of conversation inside the Wikimedia community whether some legal recourse exists. Data providers may be able to exert other kinds of leverage as well. In April, Reddit announced that it would not make its corpus available for scraping by big tech companies without compensation. It seems very unlikely that the Wikimedia Foundation could issue the same dictum and close its sites off — an action that Nicholas Vincent has called a “data strike” — because its terms of service are more open. But the foundation could make arguments in the name of fairness and appeal to firms to pay for its A.P.I., just as Google does now. It could further insist that chatbots give Wikipedia prominent attribution and offer citations in their answers, something Selena Deckelmann told me the foundation is discussing with various firms. Vincent says that A.I. companies would be foolhardy to try to build a global encyclopedia themselves, with individual contractors. Instead, he told me, “there might be an intermediary stage here where Wikipedia says, ‘Hey, look at how important we’ve been to you.’”"
"Without ingesting the growing millions of Wikipedia pages or vacuuming up Reddit arguments about plot twists in “The Bear,” new L.L.M.s can’t be adequately trained. In fact, no one I spoke with in the tech community seemed to know if it would even be possible to build a good A.I. model without Wikipedia."
"We love the amount of support — from the trainings to the course guides to the assistance available via email. With the use of Wikipedia, students are thinking critically about the knowledge gaps and inequities found in public information sources and resources — and they work to improve these conditions with each assignment. It's great that students get to see the impact of their work so quickly too, as the number of page views grows far faster than the number of scholars and colleagues who may otherwise read their published work."
"Wikipedia’s articles about history and religion have real-life impact on the world. What people read on Wikipedia shapes the opinions they form about politics, social justice and so forth. Therefore we need to make sure Wikipedia gets it right, and this project is going to help that goal."
"If you know how to navigate the site, Wikipedia is a uniquely transparent knowledge-sharing platform. So students get to see how the articles are developed in ways that are typically black-boxed in academia’s peer-review process or in what happens in the office of news media organizations. This makes it a great learning opportunity for identifying how bias can shape Wikipedia content, and for practicing how to intervene in those processes."
"I'm a passionate believer in what Wikipedia does. I tell people over and over again, Wikipedia is where facts are going to live. Wikipedia is a critically important platform, because it's built on lifting up reliable sources, something which is getting harder and harder to find. Wikimedia New York City is a great resource for the New York City community both to learn about Wikipedia and to learn how to contribute to Wikipedia. It's one of the places where anyone can access free knowledge that's trustworthy. And that's increasingly rare in today's world."
"When you choose to become an editor, it’s because you’re passionate about an issue or you’re passionate about making sure that knowledge exists and is free for people to use. You don’t get paid to do this, and you didn’t sign up to be attacked."
"Ex Oriente Lux. (From the East comes the light.)"
"A picture is worth a thousand words."