By Chris Wood, Santa Fe Institute
Note: This article appeared in the Santa Fe New Mexican on December 2, 2013.
What do the National Security Agency, the National Science Foundation, Google, Netflix, Amazon, and even your local grocery have in common?
Big Data, that’s what.
Big Data is a loose term for the collection, storage, and sophisticated analysis of massive amounts of data, far larger and from many more kinds of sources than ever before. Organizations like those above, and more every day, are collecting and analyzing the myriad electronic bread crumbs we generate in our daily activities, and they’re exploiting that data to predict our actions and behaviors to help accomplish their objectives.
The Economist recently enthused: “Big data is the electricity of the 21st century – a new kind of power that changes everything it touches in business, government, and private life.”
In the biggest Big Data effort of all, the NSA’s goal is to be able to acquire intelligence data from “anyone, anytime, anywhere.” The classified documents leaked by whistle-blower Edward Snowden make clear that NSA’s penetration of the telecommunications and computer industries is far broader and deeper than even the agency’s most extreme critics imagined.
In addition to being the hottest new trend in business and government, Big Data is fast becoming a pervasive force in modern science. Last year the Obama administration launched a $200 million Big Data in Science initiative with the goals of enhancing economic growth and job creation, education and health, clean energy and environmental sustainability, public safety, and global development.
What are we to make of all this? Are Big Data and predictive analytics truly a gold mine for business, science, and government? Or are they a serious threat to our privacy and freedom?
This is a highly complex problem with enormous consequences for both science and society. That’s why my colleagues and I at the Santa Fe Institute and its Business Network recently brought together more than a hundred experts from industry, science and government at Bishop’s Lodge in Santa Fe to give careful thought to Big Data’s opportunities and threats. Here are some highlights of what we learned:
- Kenneth Cukier, data editor for The Economist, made the case stated in the title of his recent book Big Data: A Revolution That Will Transform How We Live, Work, and Think.
- Computer technologist and author Jaron Lanier summarized the key idea of his recent book Who Owns the Future?, namely that the internet is an engine of increasing inequality in wealth and power. Unless we act quickly to stem this trend, he contends, our economy and society will grow increasingly extreme, polarized, and dysfunctional.
- Dan Wagner, CEO of the start-up Civis Analytics and data analytics lead of President Obama’s 2012 campaign, described how he and his colleagues helped transform political campaigning from a focus on traditional voting blocs based on age, gender, ethnicity, etc., to campaigns targeted to specific individual citizens.
- Astrophysicist Alex Szalay of Johns Hopkins University contended that science is moving rapidly toward a “Fourth Paradigm: Data-Intensive Scientific Discovery,” the title of his talk.
- In his talk “Big Data, from Galileo to Gödel,” Simon DeDeo, a former SFI Omidyar Postdoctoral Fellow, showed how the constructive interplay of big data, theory, and computation can reveal underlying truths, not only in the physical and biological sciences, but also in the social sciences and even the humanities.
- Noted historian of the NSA James Bamford addressed the “Dangerous Duo: When Big Brother and Big Data Come Together.” Without needed legal constraints and congressional and court oversight, he argued, the NSA’s ever more sophisticated data collection, analysis, and code-breaking capabilities pose serious threats to privacy and freedom.
So what should we conclude? Is Big Data the opportunity its proponents contend? Or is it a threat whose costs outweigh its potential benefits? Based on our assessment, Big Data is quite clearly both, depending upon the specific application being considered.
In business, the mix varies with the type of business and the degree to which customers perceive Big Data to be in their own interest rather than just the interest of those trying to sell them something. For example, some of us will find that the “free” services and the convenience of “you might be interested in…” offered by Google, Facebook and Amazon are well worth the costs of giving them extensive information about ourselves and viewing the ads they relentlessly deliver to us. Others will decide the benefits are not worth those costs and will “just say no.” But at least in the cases of Google, Facebook, Amazon, and their kin we have the opportunity to choose. In other cases (e.g., auto insurance, credit histories, law enforcement, the NSA), we do not.
In science the mix of opportunity and threat varies too. A number of our speakers, including SFI Distinguished Professor and past president Geoffrey West, emphasized the essential role of theory in using and understanding Big Data. In a world where scientists are drinking from the data fire hose, the data are of little use without theory. And the data need to be the right data for the scientific questions at hand. For example, the availability of large-scale social network data from Twitter and Facebook has captured the attention of social scientists. But are the conclusions drawn from studies of our behavior on social media networks likely to generalize to the real world of everyday interpersonal interactions? We shall see. What is clear is that the scientific questions need to drive the collection of data and not vice versa.
The tension between opportunity and threat is most acute for the NSA. General Keith Alexander, NSA Director, has argued that hunting for terrorists in the deluge of telecommunications and internet data is like trying to find a needle in a haystack and “you need the haystack to find the needle.” This “collect it all” strategy ignores the fact that as the total amount of data grows without bound while the number of real threats does not, the ratio of true-positive “needles” to false-positive “chaff” shrinks accordingly. A data collection approach targeted at suspected individuals and groups is likely to be more productive, not to mention more constitutional.
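To make the needle-and-haystack arithmetic concrete, here is a minimal sketch in Python. Every number in it is hypothetical and chosen only for illustration: a fixed pool of 100 real targets, a screening method that flags 99 percent of them, and a false-alarm rate of one in a thousand for everyone else. None of these figures comes from the NSA or from any speaker at our meeting. The sketch simply asks what fraction of flagged records are real targets as the haystack grows.

```python
def flagged_precision(population, true_targets, hit_rate, false_alarm_rate):
    """Return the fraction of flagged records that are real targets."""
    true_hits = true_targets * hit_rate
    false_alarms = (population - true_targets) * false_alarm_rate
    return true_hits / (true_hits + false_alarms)


# Hypothetical values: the number of real targets stays fixed
# while the amount of collected data grows.
TRUE_TARGETS = 100
HIT_RATE = 0.99
FALSE_ALARM_RATE = 0.001

for population in (10_000, 1_000_000, 100_000_000):
    share = flagged_precision(population, TRUE_TARGETS, HIT_RATE, FALSE_ALARM_RATE)
    print(f"{population:>11,} records: {share:.4%} of flags are real targets")
```

Under these assumed rates, the share of flags that point to a real target falls steadily as the population grows, even though the screening method itself never gets any worse. Different assumed rates would change the numbers but not the direction of the trend.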
Telecommunications and internet companies are starting to push back against unfettered data collection by the NSA, and members of Congress on both sides of the aisle are beginning to question the “you need the haystack” rationale. Whatever your own views on Big Data and the NSA, I believe we can achieve a better balance between our government’s legitimate role of protecting its citizens and its equally important role of ensuring the constitutional guarantees of privacy and freedom.
I also believe that the proliferation of Big Data and predictive analytics in all their manifestations is an urgent matter requiring our immediate attention. In the Steven Spielberg film Minority Report, individuals could be arrested for crimes they had not yet committed but were deemed likely to commit by psychics called “pre-cogs.” Without adequate legal and regulatory protection, Big Data and predictive data analytics threaten to become the Minority Report of the all-too-near future.
This column is part of the "Science in a Complex World" series written by researchers at the Santa Fe Institute and published in The Santa Fe New Mexican.