On August 4, 2006, AOL published the complete three-month search histories of 657,426 of its users. The company replaced screen names with numbers and called the result anonymous. It was not. Within days, journalists had put a real name and face to User #4417749. What the internet found in the rest of the dataset was considerably darker: a user researching how to kill his wife, a man whose church youth group searches sat next to something the internet did not forget, and (now confirmed against the raw dataset by reader Andrew) a user whose search history is directly connected to a real 1983 cold case murder victim. Rita has read the dataset record. Here is what is documented, what is confirmed, and what the primary record actually shows.
AOL Had a Great Idea. It Was a Catastrophic Idea.
In the summer of 2006, AOL was not doing well. Google had eaten its search business. AIM was in decline. The brand that had once sent floppy disks to every household in America was trying to reinvent itself as a serious technology company with serious research ambitions. So when Abdur Chowdhury, AOL’s chief research scientist, authorized the release of a massive dataset of user search queries for academic study, it probably felt like exactly the kind of open, forward-thinking gesture that a company in crisis needed to make.
On August 4, 2006, AOL Research published a compressed text file containing 21,011,340 search queries from 657,426 users, collected over a three-month window from March to May of that year. The announcement from Chowdhury framed it as a gift to the research community — “anyone with a desire to work on interesting problems.” The file included the search term, the date and time it was made, and whether the user clicked a result. Each user’s screen name had been replaced with a random number. AOL considered this sufficient. The internet was about to demonstrate, in excruciating detail, that it was not.
The file covered searches conducted at search.aol.com from March 1 through May 31, 2006. Its 21,011,340 query records were spread across 657,426 unique user IDs, and each record included the anonymized user ID, the exact search string as typed, the date and time of the query, and the URL of any result the user clicked. AOL released supplemental datasets at the same time: 2 million queries about .gov domains, 20,000 queries from a 2004 sample, and 3.5 million additional categorized queries.
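For readers who want to see what one of those records looks like to a program, here is a minimal Python sketch of a parser. The layout is an assumption on my part, not something AOL's announcement spells out in this article: tab-separated rows with a header line, the columns in the order user ID, query, timestamp, result rank, click URL, and timestamps written as YYYY-MM-DD HH:MM:SS. The names QueryRecord and parse_records are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class QueryRecord:
    user_id: str              # the number that replaced the screen name
    query: str                # the search string exactly as typed
    timestamp: datetime       # date and time of the query
    click_url: Optional[str]  # None when no result was clicked


def parse_records(lines: Iterable[str]) -> Iterator[QueryRecord]:
    """Yield one QueryRecord per data row, skipping headers and short rows.

    ASSUMPTION: rows are tab-separated and the click URL is the fifth
    column, after a rank column. Adjust the indexes if your copy differs.
    """
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # A header row or a truncated row has no numeric ID in column one.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        click_url = fields[4] if len(fields) > 4 and fields[4] else None
        yield QueryRecord(
            user_id=fields[0],
            query=fields[1],
            timestamp=datetime.strptime(fields[2], "%Y-%m-%d %H:%M:%S"),
            click_url=click_url,
        )
```

Nothing in that record is a name. Everything in it, taken together and repeated across three months, points at one.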
The file was available on AOL’s public research website for approximately three days before being removed. By that point it had been downloaded hundreds of times and mirrored across the internet. It has never truly disappeared.
Why “Anonymous” Was Always the Wrong Word
The problem was not that AOL had bad intentions. The problem was that AOL — like most of the technology industry in 2006 — did not understand what anonymization actually requires. They removed the one field that explicitly said “this is a person” and declared the job done. What they left in the file was something far more revealing: an unbroken thread of every question a person had typed into a search box over three months, in sequence, with timestamps.
A search history is not a list of facts. It is a diary. It contains the things you are afraid of, the things you want, the things you are ashamed of, the questions you would never ask out loud. People searched for their own names to see what the internet knew about them. They searched for their addresses, their doctors, their exes, their symptoms. They searched for things they had not yet told their families. And because each query was linked to a consistent user ID, anyone who read the file could follow a single person’s mind across ninety days of their life.
Latanya Sweeney, the Harvard researcher who had already demonstrated in the 1990s that 87% of Americans can be uniquely identified using only their zip code, date of birth, and gender, had been warning about exactly this failure mode for a decade before the AOL leak. Removing a name is not anonymization. It is pseudonymization — and pseudonyms collapse under sustained attention.
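The idea behind that 87% figure can be sketched in a few lines of Python: count how many people are the only ones with their particular combination of zip code, date of birth, and gender. The four people below are invented for illustration and the function name unique_fraction is mine; this is not Sweeney's method or data, only the shape of the measurement.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def unique_fraction(records: Sequence[Tuple[Hashable, ...]]) -> float:
    """Fraction of records whose combination of fields appears exactly once."""
    if not records:
        return 0.0
    counts = Counter(records)
    unique = sum(1 for record in records if counts[record] == 1)
    return unique / len(records)


# HYPOTHETICAL people: (zip code, date of birth, gender). No names anywhere.
people = [
    ("10001", "1950-01-15", "F"),
    ("10001", "1950-01-15", "F"),  # shares all three fields with the row above
    ("10001", "1962-07-02", "M"),
    ("10002", "1971-11-30", "F"),
]

print(unique_fraction(people))  # 0.5: two of the four stand alone
```

Anyone who stands alone in that count can be matched to a public record that carries the same three fields plus a name.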
The Electronic Frontier Foundation called it the Data Valdez, invoking the Exxon Valdez oil spill: a disaster caused not by malice but by stunning institutional negligence. The World Privacy Forum filed a complaint with the FTC within four days. TechCrunch founder Michael Arrington, who was among the first to write about the leak, called the release “staggering” in its stupidity. He was right, and he was also somewhat burying the more disturbing lede: the searches themselves.
User #4417749: Thelma Arnold, Lilburn, Georgia
Reporters Michael Barbaro and Tom Zeller Jr. at the New York Times obtained the dataset and did what a careful reader does: they picked a user and followed the thread. User #4417749 had searched for “numb fingers,” “60 single men,” “dog that urinates on everything,” “landscapers in Lilburn, Ga,” and “homes sold in shadow lake subdivision gwinnett county georgia.” They cross-referenced with a phone book. They found Thelma Arnold.
Thelma Arnold was 62 years old, a widow, a dog lover who spent considerable energy researching her friends’ medical ailments. She was not a criminal. She was not doing anything wrong. She was just using a search engine the way millions of people use search engines — as a private thinking space, a place to ask questions she did not want to ask aloud. When the Times reporter read her searches back to her over the phone, she said: “Those are my searches.” She agreed to be named. She said she felt violated.
Thelma Arnold became the human face of the AOL leak because she was willing to be. She is the story the internet tells itself when it discusses what happened. But she is the least alarming person in that dataset. She is, in some ways, the alibi — the evidence that most of those 657,426 users were just ordinary people going about their ordinary, private lives. The rest of the file was considerably more complicated.
The People the Internet Found in the Rest of the File
Within hours of the file going public, bloggers and forum users had begun combing through it. What they found ranged from poignant to genuinely disturbing. Below are the documented notable users — their attributions, what is known, and where the record ends.
Source file: user-ct-test-collection-02.txt. User #5342598 searched for Tara Marowski’s name and the details of her death across two separate sessions: a 34-minute burst on the night of March 27–28, 2006, and a follow-up session on May 25, 2006. The March session begins with name searches, escalates to specific detail queries (“tara found dead in car in san jose”), and ends with “unsolved mysteries tara marowski.” The May session returns to the name twice across a span of five minutes. Additional related searches from this user include “unsolved murders in san jose” and “edward beaton questioned about the murder” (capitalization original). The dataset log is reproduced in full below.
User #5342598 and the Name Tara Marowski: What the Record Actually Shows
The original version of this article noted that the Something Awful and blameitonjorge attribution for User #5342598 could not be independently confirmed against the raw dataset. That caveat no longer stands.
Reader Andrew went to the primary source — the actual AOL dataset file user-ct-test-collection-02.txt — and confirmed programmatically that User #5342598 searched for Tara Marowski’s name and the circumstances of her death. The specific queries and their line numbers within the original file are reproduced in the log below. Every entry is verbatim from the dataset, including original capitalization and spelling variants.
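A check like this can be repeated by anyone with a copy of the file. The sketch below is not Andrew's script, which I have not seen. It is one minimal way to do it in Python, assuming tab-separated rows with the user ID in the first column, the query in the second, and the timestamp in the third. The function name find_user_queries is invented for illustration.

```python
from typing import Iterable, List, Tuple


def find_user_queries(
    lines: Iterable[str], user_id: str, needle: str
) -> List[Tuple[int, str, str]]:
    """Return (line_number, timestamp, query) for one user's matching queries.

    Matching is a case-insensitive substring test, so the query text comes
    back verbatim, capitalization and spelling variants included.
    ASSUMPTION: columns are tab-separated in the order ID, query, timestamp.
    """
    hits = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3 or fields[0] != user_id:
            continue
        if needle.lower() in fields[1].lower():
            hits.append((line_no, fields[2], fields[1]))
    return hits


# Usage against a local copy of the file named in this article:
#
# with open("user-ct-test-collection-02.txt", encoding="utf-8",
#           errors="replace") as handle:
#     for line_no, when, query in find_user_queries(handle, "5342598", "marowski"):
#         print(line_no, when, query)
```

Line numbers are counted from 1 and include any header row, so they only line up with someone else's findings if both people are reading the same unmodified file.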
This is exactly what this kind of work is supposed to look like: someone with access to the primary record goes and reads it, then brings the findings back. Thank you, Andrew.
Read the log. Look at the timestamps. Session 1 starts at 11:57 p.m. on March 27, 2006, and runs until after midnight. The first three queries are identical — the name, repeated three times, as though testing whether the search engine knows it. Then the queries escalate: “unsolved murder of tara marowski,” “murder of tara marowski,” “tara marowski found dead in car.” Then twelve consecutive attempts at “tara found dead in car,” each a minute or so apart, a person who is not finding what they are looking for and cannot stop looking. Then the geographic specification begins: “in san jose.” Then “young woman named tara found dead in car in san jose.” Then a pause of eight minutes, then “unsolved mysteries tara marowski,” “unsolved mysteries tara,” “unsolved mysteries tara.”
Fifty-eight days later, on May 25, 2006, the same user returns. Session 2 is different in character: shorter, more focused, and notably includes the full middle name. “tara lynn marowski.” Then a spelling variant: “markowski” with a k. Someone who is not sure how to spell her name. Someone who, in the two months since March, has been thinking about this enough to try the alternate spelling.
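"Session" here just means a run of queries with no long silence in between. A minimal Python sketch of that grouping follows, with two loud caveats: the 30-minute cutoff is my assumption, not a property of the dataset, and the sample timestamps are invented rather than taken from any user's log. Only the two calendar dates in the final line come from this article.

```python
from datetime import date, datetime, timedelta
from typing import List, Sequence


def split_sessions(
    timestamps: Sequence[datetime],
    max_gap: timedelta = timedelta(minutes=30),  # ASSUMED cutoff
) -> List[List[datetime]]:
    """Group timestamps into sessions, starting a new one after a long gap."""
    sessions: List[List[datetime]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= max_gap:
            sessions[-1].append(ts)  # still inside the current session
        else:
            sessions.append([ts])    # silence was too long: new session
    return sessions


# HYPOTHETICAL timestamps, not from the dataset.
stamps = [
    datetime(2006, 4, 1, 23, 50),
    datetime(2006, 4, 2, 0, 10),   # crosses midnight, still the same session
    datetime(2006, 4, 2, 0, 20),
    datetime(2006, 5, 30, 21, 0),
    datetime(2006, 5, 30, 21, 5),
]

print(len(split_sessions(stamps)))  # 2

# The gap described in this article, from the early hours of March 28
# to May 25, 2006:
print((date(2006, 5, 25) - date(2006, 3, 28)).days)  # 58
```

A session that runs past midnight stays one session, which is why the March searches count as a single sitting even though they carry two calendar dates.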
Among the additional related queries that Andrew flagged — not reproduced in the name-specific log above but confirmed in the surrounding entries — are “unsolved murders in san jose” and “edward beaton questioned about the murder,” with the capitalization preserved as it appears in the original file. That last one is significant. Edward Beaton is a name that appears in contemporaneous San Jose Mercury News coverage of the Marowski investigation. It is not a name that an average curious person in 2006 would have happened upon. The case was largely out of public view. Someone searching for Beaton by name, in connection with this murder, had done more than casual research.
What the confirmed dataset entries establish is that someone, using AOL’s search engine on the nights of March 27–28 and May 25, 2006, searched repeatedly for Tara Marowski, the circumstances of her death, and by name the suspect who had been questioned in connection with it. The search pattern is methodical and returns to the subject across two months.
What the dataset cannot establish is who that person was, what their relationship to the case was, or what they intended by searching. People search for crime victims for all manner of reasons: they knew them, they knew someone who knew them, they are journalists, they are researchers, they are family members, they are true crime readers, they are the person who committed the crime monitoring what the internet has found. The record shows the behavior. It does not answer the question the behavior raises.
What the Leak Actually Proved — and Why It Still Matters
The AOL Data Valdez is taught in computer science courses, law schools, and data ethics programs because it was the moment the technology industry was forced to confront something it had been pretending was not true: that behavioral data is identity data, and that stripping a name off a record does not make it safe to publish.
Latanya Sweeney had shown in 1997 that 87% of Americans can be uniquely identified from just three data points — zip code, date of birth, and sex — available in public records. The AOL dataset gave researchers not three data points but thousands, all linked to a single consistent pseudonym, all timestamped. The failure was not technical. It was conceptual. AOL’s researchers believed that identity lived in names. It does not. Identity lives in patterns.
Every search query is a confession. Aggregated across time, those confessions form a portrait more complete than most people would share with their closest friends. The question was never whether the data was anonymous. The question was what happens when someone reads it carefully enough. In 2006, the answer arrived in about seventy-two hours. In 2026, a reader named Andrew sat down with the file and confirmed it in an afternoon.
AOL’s CTO Maureen Govern resigned on August 21, 2006. Two employees were fired — the researcher who authorized the release and his direct supervisor. A class action lawsuit, Doe v. AOL, was filed in September 2006 in the Northern District of California, alleging violations of the Electronic Communications Privacy Act and fraudulent business practices. It settled in 2013 for $5 million, with affected users eligible for up to $100 each.
The data was never truly deleted. It is still out there — mirrored, archived, downloadable. User #4417749 is still Thelma Arnold. User #17556639 still searched for how to kill his wife. User #5342598 — now confirmed — searched for Tara Marowski across two sessions in the spring of 2006, when her case was cold and Christopher Holland had not yet been charged with anything. They searched for Edward Beaton by name. They spelled her middle name and tried alternate spellings of her last name.
AOL thought it was releasing a research dataset. What it actually released was a window into the private lives of 657,426 people who had no idea anyone was watching. Some of those people were perfectly ordinary. Some of them were not. And twenty years later, a reader named Andrew opened the file and confirmed what the file said all along.
Sources — Updated May 2026
Rita Williams, Rita Ruins Everything: The AOL Data Leak, User #5342598, and the Name Tara Marowski, Clutch Justice (May 2, 2026; updated May 2026), https://clutchjustice.com/2025/05/02/rita-ruins-everything-aol-data-leak/.