Friday, June 20, 2014

Poor Data, Rich Data, Big Data, Chief

Over the past 2 years, Big Data has worked its way into public consciousness, courtesy of widespread news exposure and a series of popular books by Big Data scientists with hyperbolic evocations of the analytic power of their methods.  There seems to be nothing that Big Data cannot do: predict health and wellness, illuminate culture change, stop poverty, foil terrorists.  And, of course, tighten the noose of Foucauldian surveillance from governments and corporations.  But what all of these accounts promise (or threaten) is a transparent window onto truth: our social lives, behaviors, hopes and dreams all rendered transparent through the analysis of vast datasets.
Visualization of all editing activity by user. Image courtesy Fernanda B. Viégas and wikicommons
Many qualitative researchers, including anthropologists, have sounded an alarm over this drive to datafication, where, as Chris Anderson has famously concluded, “numbers speak for themselves.”  If Data Scientists can tell us what everyone is doing and what everyone is thinking, what need is there for 60 in-depth interviews and two years of participant observation?  As Tricia Wang asks, “What are ethnographers to do when our research is seen as insignificant?”  What are we to do, in other words, when community relationships that we painstakingly elucidate over months of field research can be scraped from social media in a few minutes?
For Wang, the answer is to engage Big Data, and to make ethnographic research relevant in a world of hyperquantification.  Some of the same points are made by danah boyd and Kate Crawford (2012), who also go on the offensive by exploring the assumptions underlying the drive to Big Data.  Do numbers really speak for themselves?  And does having all the data mean that you have privileged access to all the facts?
But these questions should be familiar to cultural anthropologists; we are no strangers to Big Data.  While we haven’t generally dealt with millions of data points, the hyperbolic claims of Big Data echo the hubris of anthropology in its contact with small societies.  By looking back on these earlier methodologies, we might reconceptualize Big Data as another chapter in what Walter Mignolo has called the “enduring enchantment” of modernity.
In 1898, Alfred Cort Haddon and his team (which included Charles Seligman and W.H.R. Rivers) set out on an expedition to the Torres Strait islands off the coast of New Guinea.  With broad goals for their field surveys, including salvage anthropology, experimental psychology, linguistics and physical anthropology, the team quickly amassed huge amounts of field data, enough for 6 huge volumes.  Along with these compendia, the team also developed novel methodologies, with W.H.R. Rivers’s “genealogical method” being the best remembered (as well as the most excoriated).
In order to compensate for his ignorance of native languages, and for the shallowness of the expedition’s contact, Rivers began asking people (in pidgin English and through interpreters) for the names of their “father,” “mother,” “husband,” “wife,” etc., never mind that these terms were a priori mired in his British, middle-class assumptions about filiation and descent.  Surprised by the impressive genealogical memories of his informants, he was able to generate vast amounts of “data” using this ham-fisted approach, including “complete” records for some of the islands the Torres Straits team surveyed.  From that data, he was able to generate numerous insights into marriage, naming practices, fertility, “totemistic systems,” and even history and culture change.  In other words, without engaging people in real conversations about their lives, and without actually observing islander life, Rivers believed he could apprehend the “whole” of Torres Strait culture and society through applications of his “concrete” method.
The Genealogical Method of Anthropological Inquiry by  W. H. R. Rivers, 1910. Image courtesy the Sociological Review
Big Data starts from many of the same assumptions.  Without direct windows onto people themselves, Big Data scientists harvest proxy data from the residue of our complex lives in information society.  Do you want to know if people are getting sick?  You could ask people and observe their behavior, but you could also (as with Google Flu Trends) compile search data on symptoms.  Or do you want to know about the mobility of people in cities?  You could interview people and follow them on their daily rounds, or you could, as Barabási and his team did, analyze the billing records from 100,000 cell phone users in order to generate maps of movements over a 6-month period.
Is it specious to compare huge datasets from Google with Rivers’s collected genealogies?  Both proceed from the same assumptions about the whole.  After all, anthropological research on small populations of people living in putative isolation on islands was premised on the assumption that one could collect and understand everything about a simple society.  Big Data builds a similar edifice upon massive computing power and the integration of networks.  For Google, Flu Trends provides a window onto vectors of illness because it collects the whole of Google search data: an island, as it were, secured by a near-monopoly over Internet traffic.  In addition, the problems of the genealogical method are the problems of proxy data in general.  Massive data can be collected, analyzed and correlated, but what do these data describe?  When Rivers asks the Torres Strait islanders who their “proper” father is, how useful are those data?  And if he’s managed to solicit genealogies out to five generations, what insights might he derive from these facts?
Of course, Big Data scientists debate the suitability of data proxies, but it would be a mistake to assume that we have nothing to add to that argument.  Moreover, anthropologists have a long history of questioning the synecdochic fallacy.  Is kinship the foundation for society?  Can we understand the whole of society by considering key institutions like kinship, subsistence and exchange?  And what does it mean to understand the “whole” to begin with?  These are ultimately the questions to pose to Big Data: if I collect all of the tweets (as the Library of Congress is doing), can I now understand how people live in the city?  Or how they relate to other people?  Or is there always some destabilizing meaning that lies between these hundreds of terabytes?
Most of all, we can utilize our own experiences to reflect on Big Data as a technological imaginary.  Why do we think it’s desirable to collect all of the data?  What do we imagine the truth of the whole to be?