How do societies decide what sorts of information are publicly accessible? Ancient history provides a clear answer: transparency grows in response to civil unrest.
Three centuries before democracy emerged in ancient Greece, in the 8th century BC, landholding aristocrats in Athens began to consolidate their power across the surrounding Attica peninsula, creating some of humanity's first class divisions. As elsewhere, wealthy elites in the city struggled to control impoverished outlying areas where people regarded their leaders as corrupt and prejudiced. The legal scholar Deirdre Dionysia von Dornum detailed this fragile pre-democracy period in a 1997 Columbia Law Review article, "The Straight and the Crooked: Legal Accountability in Ancient Greece."
Von Dornum describes an annual public audit hearing called the euthyna, in which anyone could question the decisions and behavior of government leaders, including treasurers, judges and military generals. Legitimate complaints would result in criminal charges and trials of accused bureaucrats. This practice, von Dornum argues, created a social fabric rich in accountability, paving the way to the first elections. Once the public had the power to remove officials, the right to choose them in the first place was only a short step away.
From that early moment nearly three millennia ago, the notion of government accountability has been readily accepted in free societies. Information about the financial health of companies remained secret for much longer. The 1622 audit of the Dutch East India Company, amid shareholder accusations of grift, is generally cited as the first ever audit of a publicly traded company. While the audit did appease shareholders, its findings were not shared in public. We have to fast-forward all the way to 1903 to find the first published annual report from a public company, produced by Price, Waterhouse & Co. for U.S. Steel.
So both public institutions and private industry have long track records of disclosing information only under duress. From the 15th century, the printing press would present new challenges, as newspapers fed the public ever more information. Copyright wasn't entirely a new idea then, but protections had previously been extended mostly to the church. In the 6th century, nearly 3,000 people died at the battle of Cúl Dreimhne, in what is now County Sligo in the Republic of Ireland, in a bloody dispute over the unauthorized copying of a book of psalms. But those writings weren't in public circulation at the time; they belonged to the clergy. Not until 1710 does the first Western copyright law appear, with the British Statute of Anne, designed to deter piracy in the early days of newsprint and to ensure professional writers would be fairly paid for their work.
New communication media would bring new controls, such as decency standards on broadcast television. But mostly, the public flow of information remained in the hands of governments, companies and news organizations until the user-generated social media evolution of the late 2000s and 2010s. Now, for the first time, it wasn't only institutional information being regulated and monetized, but the broad voice of the populace as well. Twitter, for example, began formally licensing data to researchers via a network of resellers in 2012. Facebook, Reddit and others would soon follow suit. In a digital world where several million news articles are published online each day globally, along with billions of daily public social media posts, and with companies using this data to train powerful AI tools, what sorts of data should be made available for commercial purposes?
US case law paints a complex picture in this area. Cases from the past decade brought by news organizations against media intelligence firms clearly established that using copyrighted content in business analytics requires a license from the publisher. More recently, hiQ Labs v. LinkedIn established civil protections for data scrapers under the US Computer Fraud and Abuse Act, provided they do not collect copyrighted content or emulate regular users in violation of a social platform's terms of use. These rulings have created divergent landscapes for news and social data acquisition. With new scraping cases being considered today, such as Meta v. Voyager, companies are trying to evaluate how much data is safe for them to collect. This much is certain: we should expect the topography to continue to evolve quickly.