{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibeti6yvhfrxigw6t5cdhtuj5r2s36tf3bvq7n6obgjf36beil3yi",
    "uri": "at://did:plc:46ti67tc37qcmwp2vaynk6fq/app.bsky.feed.post/3mhpy4lzlej22"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreie4eqa4vfa22vmlx4t3pey7avydzuz323ca7amvrrgjba7ygm26gm"
    },
    "mimeType": "image/png",
    "size": 72614
  },
  "path": "/copyrighteous/how-taboo-shapes-knowledge-production-on-wikipedia",
  "publishedAt": "2026-03-23T11:41:48.256Z",
  "site": "https://mako.cc",
  "tags": [
    "a previously published post",
    "Kaylea Champion",
    "the Community Data Science Blog",
    "Clitoris",
    "Menstruation",
    "Cell membrane",
    "Philip Pullman",
    "https://doi.org/10.1145/3610090",
    "https://doi.org/10.1145/3687044",
    "replication materials for the paper",
    "Benjamin Mako Hill"
  ],
  "textContent": "_**Note:** I have not published blog posts about my academic papers over the past few years. To ensure that my blog contains a more comprehensive record of my published papers and to surface them for folks who missed them, I will periodically (re)publish blog posts about some “older” published projects. This post draws material from a previously published post by Kaylea Champion on the Community Data Science Blog._\n\nTaboo subjects—such as sexuality and mental health—are as important to discuss as they are difficult to raise in conversation. Although many people turn to online resources for information on taboo subjects, censorship and low-quality information are common in search results. In two papers I recently published at CSCW—both led by Kaylea Champion—we presented a series of analyses showing how taboo shapes the process of collaborative knowledge building on English Wikipedia.\n\nThe first study is a quantitative analysis showing that articles on taboo subjects are much more popular and are the subject of more vandalism than articles on non-taboo topics. Surprisingly, we also found that they were edited more often and were of higher quality!\n\nShort video of Kaylea’s presentation of the work at Wikimania in August 2023.\n\nThe first challenge we faced in conducting this work was identifying taboo articles. Kaylea had a brilliant idea for a new computational approach to doing so without relying on our individual intuitions about what qualifies as taboo (something we understood would be highly specific to our own culture, class, etc.). Her approach was to make use of an insight from linguistics: _people develop euphemisms as ways to talk about taboos_ (think, for example, of all the euphemisms we’ve devised for death, sex, menstruation, or mental health).\n\nWe used this insight to build a new machine-learning classifier based on English Wiktionary definitions. 
If a ‘sense’ of a word was tagged as euphemistic, we treated the words in the definition as indicators of taboo. The end result was a list of words and phrases that most powerfully differentiate taboo from non-taboo. We then matched those words and phrases against the titles of Wikipedia articles. The resulting topics were taboo enough that we were a little uncomfortable discussing them in our meetings! We built a comparison sample of articles whose titles are words that, like our taboo articles, appear in Wiktionary definitions.\n\nIn the first paper, we used this new dataset to test a series of hypotheses about how taboo shapes collaborative production on Wikipedia. Our initial hypotheses were based on the idea that taboo information is often in high demand but that Wikipedians might be reluctant to associate their names (or usernames) with taboo topics. The result, we argued, would be articles that were in high demand but of low quality.\n\nWe found that taboo articles are thriving on Wikipedia! In summary, we found that in comparison to non-taboo articles:\n\n  * Taboo articles are more popular (_as expected_).\n  * Taboo articles receive more contributions (_contrary to expectations_).\n  * Taboo articles receive more low-quality contributions (_as expected_).\n  * Taboo articles are higher quality (_contrary to expectations_).\n  * Contributors to taboo articles are more likely to contribute without an account (_as expected_) and have less experience (_as expected_), but account holders are more likely to make themselves identifiable by having a user page, disclosing their gender, and making themselves emailable (_all three of these are contrary to expectations_!).\n\nImage of the estimated quality of the four articles examined in the second, mixed-methods paper. 
Extreme dips reflect periods of frequent vandalism.\n\nKaylea set out to understand these somewhat confusing results by designing a fantastic mixed-methods study. It unpacks nuance missing from the quantitative analysis by delving deep into the “life histories” of four articles on English Wikipedia: two on taboo topics related to women’s anatomy (Clitoris and Menstruation) and two non-taboo articles chosen for comparison (Cell membrane and Philip Pullman).\n\nAlthough the findings from the analysis can be difficult to summarize succinctly (as with many qualitative studies), we showed how the taboo example articles’ success was hard-won amid real challenges and attacks. The paper describes how these challenges were overcome through resilient leadership, often provided by a single dedicated individual. The paper provides a template for how taboo can be—and frequently is—overcome by dedicated Wikipedians in ways that provide useful knowledge resources in real demand.\n\nFor details, visualizations, statistics, and more, we hope you’ll take a look at our papers, both linked below.\n\n* * *\n\nThe full citations for the papers are: (1) Champion, Kaylea, and Benjamin Mako Hill. 2023. “Taboo and Collaborative Knowledge Production: Evidence from Wikipedia.” _Proceedings of the ACM on Human-Computer Interaction_ 7 (CSCW2): 299:1-299:25. https://doi.org/10.1145/3610090. (2) Champion, Kaylea, and Benjamin Mako Hill. 2024. “Life Histories of Taboo Knowledge Artifacts.” _Proceedings of the ACM on Human-Computer Interaction_ 8 (CSCW2): 505:1-505:32. https://doi.org/10.1145/3687044.\n\nWe have also released replication materials for the paper, including all the data and code used to conduct the analyses.\n\nThis blog post and the papers it describes are collaborative work by Kaylea Champion and Benjamin Mako Hill.",
  "title": "Benjamin Mako Hill: How taboo shapes knowledge production on Wikipedia",
  "updatedAt": "2026-03-23T09:33:53.000Z"
}