Big Data Journalism: Not as New as You’d Expect

We live at a time when nearly everything seems to be driven by big data. Whether you’re on social media, watching streaming video, or simply shopping online, your actions generate data, and that data shapes how companies interact with you.

Cities are becoming smarter as they connect traffic lights, street signs, and even buildings to the internet to measure traffic patterns. Data now seems to reach into every industry and profession, even those that might not seem like a natural fit for big data.

Take journalism, for instance. In recent years, journalists have hopped on the big data bandwagon, using vast amounts of data to craft stories intended to inform and educate. Modern technology makes it far easier to sift through such large data sets, so it’s no surprise many view data journalism as a relatively new phenomenon. That assumption, however, would be mistaken.

Big data journalism is no new fad. In fact, the tradition stretches back more than 150 years, and while its early practitioners did not use big data analytics as we know it now, they were pioneers of their time. The technology has changed, but many of the principles behind the practice remain the same.

One of the most frequently cited uses of data in journalism came not from a journalist but from a physician. In 1854, one district of London suffered a severe outbreak of cholera. At the time the disease was badly misunderstood, and most people thought it spread through polluted air. Physician John Snow rejected that common wisdom, believing instead that cholera was waterborne. Testing the hypothesis was difficult with the technology of the day, but when cholera hit London, he set out to check it by collecting data. That meant doing what a reporter would do: interviewing residents of the affected district and keeping careful notes on who contracted the disease, where they lived, and who died from it.

Snow eventually produced a detailed cholera map that also showed the location of the water pumps in the area. The map made it easy to see the role water played in spreading the disease. Investigators later discovered that the water had been contaminated with a sick baby’s fecal matter. Although the prevailing theory that cholera spread through the air would persist into the 1870s, Snow’s work was an important first step in dispelling it. His use of data to tell a truthful story proved powerfully convincing.

Data journalism goes back even further than that. In 1848, Horace Greeley, a New York newspaper editor who was also serving as a U.S. congressman, took on the practice of reimbursing congressional representatives for their travel to and from Washington. The policy at the time paid congressmen 40 cents for every mile they traveled. After looking at the mileage records, Greeley concluded the policy was archaic and a waste of taxpayer money, but he had to prove it and sell the idea to the American people.

So, with the help of his newspaper staff, Greeley compiled data on how much each congressman had been reimbursed and compared it with the cost of the shortest route to Washington, based on postal routes. The goal was to show the difference and how much extra each congressman had claimed. The story was published with tables detailing the findings. Few congressional representatives were spared. Even Abraham Lincoln, a congressman at the time, was found to have received around $677 (more than $18,000 in today’s money) in extra mileage. Word spread quickly across the country, drawing outrage from the public, while politicians defended the practice. Even so, the House later voted to change the policy to reflect the shorter routes.
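The arithmetic behind Greeley’s tables is simple enough to sketch in a few lines of Python. What follows is only an illustration, not a reconstruction of his actual method or figures: the 40-cent rate and the $677 total come from the account above, while the function names, the member names, and the mileage values are hypothetical placeholders.

```python
RATE_PER_MILE = 0.40  # dollars per mile, the rate cited in the article


def excess_payment(claimed_miles, shortest_miles, rate=RATE_PER_MILE):
    """Dollars reimbursed beyond what the shortest route would justify."""
    extra_miles = max(claimed_miles - shortest_miles, 0)
    return round(extra_miles * rate, 2)


def implied_excess_miles(excess_dollars, rate=RATE_PER_MILE):
    """Work backward from an excess payment to the extra miles it represents."""
    return excess_dollars / rate


# Hypothetical mileage records (NOT historical data), for illustration only.
records = [
    {"member": "Member A", "claimed_miles": 2000, "shortest_miles": 1500},
    {"member": "Member B", "claimed_miles": 900, "shortest_miles": 900},
]

# One row per member: (name, excess dollars), as in a published table.
table = [
    (r["member"], excess_payment(r["claimed_miles"], r["shortest_miles"]))
    for r in records
]

for member, excess in table:
    print(f"{member}: ${excess:.2f} in excess mileage")

# The article's figure of about $677 at 40 cents a mile implies
# roughly 1,690 miles claimed beyond the shortest route.
print(f"Implied extra miles for $677: {implied_excess_miles(677):.0f}")
```

Laying the results out as one row per member mirrors the point of the original story: the per-person difference, not the raw mileage, is what makes the numbers readable.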

These stories show how long data has been part of journalism. It is not a recent trend that reporters have picked up but a core principle many strive to follow. The examples above also show how important it is to make data accessible to a casual audience. Simply throwing numbers at people isn’t enough to make an impact. Putting the data in a visual form, such as a map or a table, is essential, and even in the mid-1800s, journalists knew that.

Today’s reporters can learn from these early examples of data journalism about the value of data and its role in telling a story.