Can a Language Model Identify Social Patterns in Shared Archives?
On pattern discovery, life-history, and democracy beyond institutions
I shared two samples from my life-history collection with a colleague: runaway-slave narratives and Holocaust rescue files. He ran them through a language model with minimal prompting. The model surfaced a pattern of social organization the two sets shared across a century and two continents. Then it pointed to where that pattern is most likely still active today.
Why life-histories?
I grew up with two accounts of the same history. My grandmother spoke from lived experience. My state taught a pedagogy of collective catastrophe. I thought the two would never converge. Her account located the source of danger in the state itself. The national pedagogy located it in the “other.” As a child, I trusted the pedagogy. I loved my grandmother, but I assumed she spoke from the wound of childhood trauma.
I changed my mind when something she had warned me about actually happened. At fourteen, on my way out the door, I heard the leader of the opposition at the time, Benjamin Netanyahu, caught on camera whispering to a Mizrahi rabbi: “the lefties forgot what it means to be Jews.”
I froze.
My grandmother’s words came back to me:
“I was a German citizen. I was the majority.
Until the idiot decided I no longer belonged.”
I looked at the screen and thought: here is my idiot.
That was the first time her account and the official one met, and not in the direction the pedagogy had prepared me for. After that, I stopped trusting the compressed version of history and started collecting life-history narratives instead. Over the years, one thing became impossible to ignore: the grammar of the successful cases was always the same.
My grandmother’s narrative had better predictive power than the curriculum, because her model was grounded in experience. It described what people do when they can no longer depend on institutions.
The grammar of democratic movements:
A sense of self worthy of life and freedom.
A symmetric tie with a citizen willing to take the risk of helping.
A network of like-minded people who share that risk and widen access to rights, until the rule of law aligns with a society’s values.
That is a model no government has any incentive to make legible.
The experiment
In May of this year, at the Beautiful Business Forum in Athens, I joined a workshop run by Carlos Henestrosa of Cloud District on working better with generative AI. Until then it had not occurred to me that a model could help with my inquiry. The components of the pattern I was after sit nowhere on the surface of the life-histories; they have to be inferred from meaning, and there was no scholarship to lean on. The workshop made a different case: a model can infer not only structure and procedure, but culture.
Afterward, I described my fascination with life-history archives to Carlos. He was direct: “If there is a pattern in the data, the model will recognize it.” He offered to run the inquiry himself, with as little steering as possible, and let the model find what it could.
The samples were small. The first held 291 first-person narratives from the Underground Railroad; the second, 24 Yad Vashem dossiers of Belgian citizens recognized as Righteous Among the Nations. What connects them is stated nowhere in the texts. No scholarship links them, and the pattern is not yet named in the literature. I had carried the hypothesis for years without finding an academic willing to test it.
“What patterns recur in these texts?”
Carlos started with “what is in the folder?”, then asked the model to read the slavery files as an anthropologist and identify meaningful patterns. The model returned four recurring themes and weighted them equally:
Personhood under denial.
Kinship under attack.
The body as evidence.
Movement against immobility.
The first theme already held one of the components I was tracking: a sense of self worthy of freedom. Of 291 files, 182 open with the phrase “I was born,” an act of self-naming forbidden under slavery. The model also identified literacy as the genre’s hinge: ‘the moment the narrator becomes a person who can no longer be enslaved in mind.’ This maps onto the ignition condition behind most escapes: perceiving oneself as free while still enslaved.
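A count like the one above can be checked independently of the model. The following is a minimal sketch, assuming the narratives are available as plain-text strings; the function names, the 200-character window, and the toy sample are illustrative assumptions, not the procedure the model used.

```python
def opens_with(text: str, phrase: str = "I was born", window: int = 200) -> bool:
    """Return True if `phrase` occurs within the first `window` characters.

    The window size is an assumption: it decides what counts as "opening with".
    """
    head = " ".join(text[:window].split())  # collapse line breaks and extra spaces
    return phrase.lower() in head.lower()


def share_opening_with(narratives, phrase="I was born", window=200):
    """Return (hits, total, share) for narratives that open with `phrase`."""
    hits = sum(opens_with(t, phrase, window) for t in narratives)
    total = len(narratives)
    return hits, total, (hits / total if total else 0.0)


# Toy sample (invented for illustration, not taken from the archive).
sample = [
    "I was born in a small cabin near the river.",
    "My mother was sold when I was six years old.",
    "  I WAS\nborn a slave, and never knew it until later.",
]
hits, total, share = share_opening_with(sample)
```

On the figures reported above, 182 of 291 files is a share of about 62.5 percent.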
The relationship I most wanted to understand, the one between escapees and the people who helped them, sat inside the fourth theme, “movement against immobility.” The model treated it as one theme among four, so Carlos asked directly:
“I am particularly interested in the relationship between the escapees and those who helped them escape. What patterns can be found in this relationship?”
The question was leading, but the fix for the next sample is simple: loop the open pattern-discovery question on the fourth theme alone, rather than naming the relationship. Even so, the answer went well beyond the prompt.
The model inferred the coercive environment in which escape took place by calculating the prevalence of betrayal narratives in the files (98%). Survival and freedom depended on reading intentions correctly, often in seconds. Reading the environment was an acquired skill, and allies were recognized by a learned grammar: Quaker plain dress as a quiet signal, careful verbal probes, names passed from one escapee to the next like passwords. Telling friend from foe was the precondition of a successful escape.
The model also traced the symmetric ties in the sample, naming them as fictive kinship. What I had described as a bond between risk-sharers across a gap in power that neither side had any normative reason to cross, the model described as:
Helpers are addressed and remembered in kinship terms. Mrs. Bruce becomes a mother-figure to Jacobs. Captain Minner is ‘my good master,’ the word lifted from the slavery lexicon into a new register of chosen patronage. Quaker helpers are ‘Friends.’ Free Black hosts are ‘aunt’ and ‘uncle.’ The helper network is the social tissue with which the escapee reconstitutes a family destroyed by sale.
The model also recognized that ‘almost no escapee was helped by a single person. A successful flight ran as a chain, sometimes ten or twelve links of strangers, each performing one task and handing the escapee on.’ It called this a cellular network: compartmentalized, deniable, each node ignorant of the others.
“Do the same analysis with the second set. Are there similarities?”
Despite the leading move and the small sample, the model found the same grammar at work. Transcribing each result myself, I could confirm it held across all the cases: the same coercive world where telling friend from foe came first, the same symmetric tie recognized again as fictive kinship, the same relay of strangers passing people to safety, and the same refusal to give up a self.
What makes the convergence striking is that the model recognized the same pattern even though the two sets invert each other in almost every respect. In the slave narrative, the rescued person speaks; in the Yad Vashem dossier, the rescued person speaks for someone else, to honor the rescuer. One set was written forward, to help abolish an institution that still existed; the other backward, decades later, to honor resistance to an institution already destroyed. The pattern surfaced through the inversion itself.
Slavery and the Shoah remain distinct, and must never be flattened into each other. But the infrastructure of resistance to them, built ad hoc by ordinary people, converges on the same social pattern.
“Could you recognize this in a text you had never seen?”
The most striking moment came when Carlos asked the model to make the pattern portable: a way to recognize the same grammar in a text it had never seen. The model reached straight for the present. The cases it offered were today’s: asylum files, refugee oral histories, testimony of people fleeing persecution for who they love or who they are. It read the pattern as a live grammar, still operating in testimony produced right now.
How does a pattern travel when no one is carrying it?
How can the same pattern appear across different periods, places, and contexts without being propagated by any central authority, the way a custom would be?
An interesting direction for a hypothesis comes from biology. Prof. Michael Levin, a biologist at Tufts, studies how living tissue builds and repairs itself with no central controller telling each cell what to do. What he finds is that the cells improvise toward a goal. Block the normal route and they reach the same outcome by a different one: scramble the features on a developing tadpole’s face, and the eyes and mouth still migrate into the shape of a normal frog, arriving at the right final form by a path the organism had never used before.
Levin treats this as a kind of collective intelligence. A crowd of simple parts, none of them in charge, converges on the same end again and again, even when the way there has to change. These archives suggest a social version of the same principle. People in antebellum America and wartime Belgium, a century apart and on two continents, with no knowledge of one another, organized toward the same goal in the same shape. Each faced the same contradiction: a society’s stated values set against a legal system that violated them.
What is at stake for the archives, and for AI
The usual worry about AI and our information environment is contamination: AI-generated content makes firsthand accounts harder to tell from inventions, whether fabricated Holocaust “victims” circulating online or deepfaked footage designed to meddle in an election. That worry is real, but it obscures a deeper one: calcification. A single compressed interpretation of history becomes the only version children are taught and machines read. If AI systems already index the compressed versions of our archives at scale, then whoever trains them, and with whatever framework, decides which interpretation becomes legible and which ones disappear.
This matters beyond the archive. Identifying the source of unpredictability in our environment is central to our capacity to respond and adapt. Knowing where danger comes from is the most basic condition of survival, and of building societies that last. If the record we inherit, and the models we train on it, misidentify the source of danger, they send us looking for solutions in the wrong direction at the moment it matters most.
Taught that the danger is the “other,” we grow more hateful and more extreme, and lose our capacity to orient collective action.
Consensus Lab
That is what I am building. Consensus Lab aims to democratize the archives. This small exercise already points to two practical uses: a model can be trained to locate the grammar across an archive at scale, and to generate an expert-level reading that feeds a computed-consensus tool.
A purpose-built annotation tool lets readers mark the components of this grammar directly in the excerpts. Their readings are then set against expert and peer readings. The gap between them is not a quality check. It is the knowledge product. It becomes material others can trust and build from.
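One way to picture the gap between readings is as a per-component comparison of marks. This is only a sketch under stated assumptions: the component labels ("self", "tie", "network") are shorthand for the three parts of the grammar described earlier, and the data shape, the function name `reading_gap`, and the overlap measure are hypothetical, not the design of the actual tool.

```python
# Hypothetical shorthand for the three components of the grammar.
COMPONENTS = ("self", "tie", "network")


def reading_gap(reader, expert):
    """Compare two readings, each a set of (excerpt_id, component) marks.

    Returns, per component, which excerpts both marked, which only one
    side marked, and a simple overlap ratio (shared / marked by either).
    """
    gap = {}
    for comp in COMPONENTS:
        r = {ex for ex, c in reader if c == comp}
        e = {ex for ex, c in expert if c == comp}
        either = r | e
        gap[comp] = {
            "shared": sorted(r & e),
            "reader_only": sorted(r - e),
            "expert_only": sorted(e - r),
            "overlap": len(r & e) / len(either) if either else 1.0,
        }
    return gap


# Invented marks for illustration only.
reader = {("ex1", "self"), ("ex2", "tie"), ("ex3", "network")}
expert = {("ex1", "self"), ("ex2", "network"), ("ex3", "network")}
gap = reading_gap(reader, expert)
```

In this toy case the two readings agree on "self" but disagree on whether the second excerpt shows a tie or a network. The sketch keeps that disagreement as output rather than scoring it as an error, in keeping with the idea that the gap is the product.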
The second model learns from that map of agreement and disagreement rather than from any single authority. Models trained this way need not stay in the historical archive. Every asylum file, every oral history, every account of ordinary people organizing to meet a shared threat becomes something you can test for the pattern, at a scale no single researcher could reach.
To sum up
My grandmother’s teaching prepared me for the troubling times we face today. I took her teaching and deepened it by reading life-histories readily available in shared archives.
I am writing this to share a simple message: a good story of how we overcome upheaval can turn our attention in the right direction, toward our heartfelt desires, toward our sense of self-worth and shared responsibility, toward one another and the beautiful planet we share.
Democracy, peace, and data sovereignty rest on our conviction that a better future is possible, and on our capacity to enact new patterns of organization from which better institutions are built.
With thanks to Carlos Henestrosa for running the experiment, and for the years’ worth of curiosity he answered in an afternoon.


