Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiehhkr2zxfoc3maslgkzr64o2sm3filcogk6vdqja6kzc6b75veti",
    "uri": "at://did:plc:akauiygo3cboznlozms62vqw/app.bsky.feed.post/3mm6xwyhmi4f2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreicwqmapm2udljoqsxzpzly63ub5jbv3t2qlyrf3dyvlczzt4n5q2q"
    },
    "mimeType": "image/jpeg",
    "size": 75754
  },
  "description": "On language, cognition, and what we are encoding when we build AI in English.",
  "path": "/language-is-not-neutral/",
  "publishedAt": "2026-05-19T08:29:19.000Z",
  "site": "https://julie.beliao.fr",
  "tags": [
    "documented",
    "study",
    "analysis",
    "2025 paper",
    "report",
    "article"
  ],
  "textContent": "More than twenty years ago, I spent several years inside languages that are nothing like French or English. Quechua. Aymara. Tzeltal. Yucatec Maya. I was at INALCO in Paris, the Institut national des langues et civilisations orientales, doing a bachelor's degree. I was not just studying these languages from the outside. I was also learning to think inside them, slowly, imperfectly, but genuinely. Then life moved on. A PhD. A tech career. Twenty years working exclusively in Indo-European languages. French, Portuguese, Spanish, and English. I forgot what it felt like to be on the outside of that family.\n\nA year ago, I started learning Mandarin, not as a hobby and not as a side project. I want to understand this part of the world from the inside, across tech, business, culture, and the way decisions get made. What I did not expect was that it would put me back inside something I had forgotten.\n\nIt is putting me back inside a different way of seeing again. And that is making me ask questions about the technology we are building that I had stopped asking.\n\nTwo things happen when you step into a radically different language. The first is structural.\n\nGrammar encodes what a culture has decided matters enough to track automatically, in every sentence, without thinking. What you are forced to specify. What you are allowed to leave unsaid. And that is not a small thing, because what a language forces you to specify, you end up thinking about more carefully.\n\nQuechua, the language of the Andean highlands, requires you to signal every time you make a statement how you know what you know. Did you see it yourself, or did someone tell you, or are you inferring? Some varieties have up to six formal ways to encode this, built into the verb itself. You cannot make a claim without declaring its source. It is not a choice of the speaker; it is just grammar.\n\nIn English, I can say \"the model is improving\" and move on. 
In Quechua, the grammar stops me before I finish and asks: how do you know that? Did you measure it? Did someone report it to you? Are you inferring it from the behaviour you observed? The language builds accountability for knowledge into the act of speaking itself.\n\nThink about what that does to a culture's relationship with information. With rumour. With the difference between knowing and assuming. The languages dominating our media, our platforms, and our model training data treat epistemic sourcing as optional. That is not a coincidence. It is perhaps part of why we are where we are with disinformation, with systems that assert confidently and source nothing. It was never built into the grammar.\n\nTzeltal, a Mayan language spoken in Chiapas, does not use the body as a reference point for space. In English, and in most European languages, things are to your left or your right, in front of you or behind. You are the center of the map. In Tzeltal, the center is the terrain. Specifically, the slope of the land. Things are uphill or downhill, regardless of which way you are facing. If you put a cup on a table and ask a Tzeltal speaker where it is relative to a plate, they will tell you it is on the uphill side. Not to the left. The orientation is fixed to the world, not to the observer. Stephen Levinson at the Max Planck Institute documented this over decades, showing that the cognitive effect is real: Tzeltal speakers recall and navigate spatial information differently from European language speakers.\n\nRobotics and spatial AI researchers have long debated whether systems should represent space egocentrically, anchored to the agent, or allocentrically, anchored to the environment. Both approaches exist, and neither has fully won. But the question of what it would mean to design spatial reasoning from the ground up with a terrain-centered, environment-anchored grammar as the default assumption has not been seriously asked. Not because it is unanswerable. 
Because it did not occur to us to ask it.\n\nAymara, spoken in the Andes, orients time the opposite way from English. The past is in front, because you can see it. The future is behind, because it is unknown. Researchers Núñez and Sweetser confirmed this not just in language but in gesture, in a study published in Cognitive Science in 2006: Aymara speakers wave backward over their shoulders when speaking of what is to come, and gesture forward when describing what has already happened, the exact inverse of what English speakers do. A culture that orients toward the known and visible, that treats the future as the unseen thing at your back, will approach risk differently. Will approach planning differently. Most leadership frameworks, most forecasting tools, are built on a forward-facing orientation. An Aymara-oriented framework would ask first: what do we actually know? What is visible? What are we calling insight that is really just projection? In conditions of genuine uncertainty, that prior question might be the more rigorous one. English does not force you to ask it.\n\nAnd this is where it connects to something I cannot stop thinking about.\n\nAlmost everything being built in AI right now is being built in English. English makes up around 92% of GPT-3's training data, and roughly 90% of that for Llama 2 and Claude 2. A 2025 arXiv analysis put it clearly: this is not a technical necessity but a historical artifact. The fairness argument around this is well known: more languages, more inclusion, and it matters. But that is not what I keep thinking about. What I keep coming back to is more uncomfortable. If language shapes how a model reasons, not just what it says, then building in English is really not a neutral choice. It is a conceptual one, and we have barely started asking what that means.\n\nA 2025 paper by researchers at MBZUAI examined how large language models reason differently when operating in Chinese versus English. What they found goes beyond output differences. 
When reasoning causally in Chinese, models show measurably different attention patterns inside the network than when reasoning in English, focusing on different parts of sentences, applying different logical structures. The connections activate differently. The same model, the same weights, but different internal behaviour depending on the language it is working in. The authors conclude that LLMs do not just mimic surface linguistic forms. They internalise the reasoning biases shaped by the language itself.\n\nThis means the language a model is trained in does not just determine what it says. It also shapes how it reasons. The grammatical structures it learns to predict become the conceptual structures it uses to think. A model trained primarily on English is not a neutral engine that happens to speak English. Its assumptions about how knowledge works, how space is oriented, and how time moves are all inherited from the language it was built on. That is not a software problem.\n\nAnd that brings the obvious question. What would a model look like if it had been built from the ground up in Quechua, or in Tzeltal, or in a language that treats epistemic sourcing as non-negotiable, that anchors space in the world rather than the observer, that faces the known past rather than the projected future?\n\nThe answer is: we do not know. And not only because no one tried, but because the data barely exists. For languages like Quechua or Tzeltal, most of what has ever been said lives in oral traditions, in communities, in forms that were never written down, let alone digitised. The tokenisation systems we use were built for English morphology and perform poorly on languages where a single word does the work of an entire sentence. The benchmarks do not exist. The infrastructure points in one direction and has for decades. A Stanford report from 2025 mapped how deep that gap runs. 
It is not something you close with good intentions.\n\nThat does not make the question less important. It makes it more so.\n\nThe second thing that happens when you step inside a radically different language is conceptual. Some languages have words for things that other languages leave unnamed, not in a vague or romantic sense, but in a precise and practical one. The concept exists, people experience it, but the language never carved it out, never made it holdable.\n\nI came across 委屈 (wěiqū) through an article written by Linda Fu, who was herself marvelling at it. It is the Mandarin word for the specific feeling of swallowing a justified grievance, of being wronged and absorbing it in silence because the situation does not allow you to speak. English circles around this with phrases and approximations. Mandarin names it in two characters. Having the word changes something. It makes the experience visible to yourself. You can locate it, examine it, and decide what to do with it.\n\nAnd then there is 落地 (luòdì). Literally, landing. In China, when people talk about taking AI from concept to reality, they do not say strategy or implementation. They say AI landing. 落地. As in: something that was in the air has touched the ground, has made contact with the earth, has arrived. We say AI strategy. They say AI landing. That is not a translation difference. It is a different theory of what execution means, what done looks like, what you are actually aiming for. Strategy as posture. Execution as physical contact with reality.\n\nA model trained on Mandarin internalises one way of organising reality. A model trained on Quechua, another. One built on Tzeltal, another still. They are all different. We have spent the past decade, at enormous scale, building only the English one.",
  "title": "Language is not neutral",
  "updatedAt": "2026-05-19T08:44:00.798Z"
}