New Wine And New Bottles: The Promise Of Research In The Age Of Big Data

Originally published in Forbes on February 20, 2018.

A review of Bit by Bit: Social Research in the Digital Age, Matthew J. Salganik, Princeton University Press, 2017. 445 pp.

Bit by Bit: Social Research in the Digital Age (2017).

The Promise of Research for Business

Popular consumer web platforms use online controlled experiments known as A/B tests to compare the effect of different features on users. At Microsoft’s Bing, for example, there are over 200 concurrent experiments running on any given day. Google infamously tested 41 shades of blue on users to gauge their reactions before deciding on the Pantone shade of its hyperlink. Because the platforms make research so easy to conduct, companies can repeatedly divide customers into groups and give them different experiences, then use the results to develop products and services that improve customer experience and increase sales. These kinds of experiments have become commonplace in online business. Randomized controlled trials, used in medicine, business, and other fields, test effectiveness by comparing the outcomes of an intervention in one group against those of a randomly selected group that does not receive the intervention. They are an inexpensive way to know what works.
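For readers who want to see the mechanics, an A/B test of this kind can be sketched in a few lines of Python. This is a minimal illustration, not how Bing or Google run their experiments: the function names, the 50/50 split, and the choice of a two-proportion z-test are assumptions made for the example, and the conversion counts in the usage lines are made up.

```python
import math
import random
from statistics import NormalDist


def assign_groups(user_ids, seed=0):
    """Randomly split users into a control group (A) and a treatment group (B)."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of A and B; returns (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Nobody (or everybody) converted in both groups: no detectable difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical usage: 2,000 users, made-up conversion counts per group.
control, treatment = assign_groups(range(2000), seed=7)
z, p = two_proportion_z_test(conv_a=100, n_a=len(control),
                             conv_b=150, n_b=len(treatment))
print(f"z = {z:.2f}, p = {p:.4f}")
```

Random assignment is what makes the comparison fair: because chance alone decides who sees which version, a large difference in outcomes can reasonably be attributed to the feature rather than to differences between the users.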

Why Big Data Research is Stalled in Government

However, what is becoming standard practice among companies is still foreign in public policy. It is inexcusable, laments software entrepreneur and political commentator Jim Manzi in Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society, that we lack the opportunity in policymaking “to experiment and discover workable arrangements through an open-ended process of trial and error.” We typically make macroeconomic decisions with global consequences with no means to know what works and why.[1]

In part, the slow uptake of a culture of experimentation in government is because these digital social research techniques are not without controversy, even in business. Facebook as well as Uber have come under scrutiny for regularly running social and behavioral experiments without people’s consent. There are also questions about the limitations of using data from such micro-interventions to drive business strategy let alone policymaking. With the exception of new nudge units in the UK, Australian, US, and Singaporean governments, there has been a lack of knowledge within government of how to create these digital research experiments.

The reality is that traditional policy and social sciences are not well-equipped with the tools and methods (and ethical guidelines) to know how to wrangle the deluge of data now available. At the same time, computer and data scientists with the technical know-how to manage the data may know how to test whether people prefer this shade of blue over that shade of blue, but they often lack access to, and understanding of how to construct, more complex social science experiments.

A Guide to Doing Digital Social Research

In an effort to bridge the gap between social and data science, tech-savvy Princeton sociology professor Matt Salganik has published an ambitious new book: Bit by Bit: Social Research in the Digital Age. The book is an enticing and important field guide to the new frontier of digital social research that will be of interest whether one is trying to figure out how to do more evidence-based policymaking or simply sell more toothpaste online.

Impeccably organized and beautifully written in clear and accessible prose, the book doubles as a methods textbook for university students studying social and data sciences (or any field where research is at the center). Textbook or not, I read the book cover-to-cover on a (long) plane ride. Replete with interesting examples of path-breaking action research from a variety of fields, the book demonstrates what it means to do research today in ways that take advantage of big data and the technologies of collective intelligence.

The book opens with the paradigmatic example of Joshua Blumenstock’s study of distribution of wealth and poverty in Rwanda. The case illustrates the idea at the center of the book, namely that new data science techniques, although enabling researchers to gather more and new kinds of data from more diverse populations, are not replacing but complementing older approaches like phone surveys to improve the quality of research results for business, government, healthcare and policy. Blumenstock, a researcher at the University of California, Berkeley, called a random sample of 1,000 residents culled from a database of 1.5 million mobile phone users. So far, so traditional.

But then Blumenstock’s team used what they learned from the phone survey to develop and train a machine learning model to predict wealth and applied it to the complete calling data from those 1.5 million users to create a detailed map of the wealth levels of the whole country. When mapped, their model very closely approximated the government’s national Demographic and Health Survey, previously created through manual surveys. However, Blumenstock’s approach achieved these results 10 times faster and 50 times cheaper, paving the way to apply the method in other places where data is absent!
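The general pattern described here (survey a small sample, fit a model, then predict for everyone else) can be illustrated with a toy sketch. This is not Blumenstock’s actual model or data: the single "calls per day" feature, the linear fit, the sample sizes, and every number below are invented for illustration, and the data is synthetic.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


rng = random.Random(42)

# SYNTHETIC population: one made-up phone-usage feature per subscriber.
calls_per_day = {uid: rng.uniform(0, 20) for uid in range(10_000)}

# Step 1: "survey" a small random sample (the answers here are simulated).
surveyed = rng.sample(sorted(calls_per_day), 100)
survey_wealth = {uid: 2.0 * calls_per_day[uid] + 5.0 + rng.gauss(0, 1)
                 for uid in surveyed}

# Step 2: fit a model on the surveyed sample only.
slope, intercept = fit_line([calls_per_day[u] for u in surveyed],
                            [survey_wealth[u] for u in surveyed])

# Step 3: apply the model to every subscriber, surveyed or not.
predicted_wealth = {uid: slope * x + intercept
                    for uid, x in calls_per_day.items()}
print(len(predicted_wealth), "subscribers scored from", len(surveyed), "surveys")
```

The point of the design is economic: the expensive step (asking people questions) is done for only a small sample, while the cheap step (scoring existing records) covers the whole population.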

The Impact of Technology on Research

Salganik devotes a chapter to illustrating the impact of technology on every stage of research, from observing behavior to asking questions to conducting experiments, and he thoughtfully assesses the limitations as much as the hopeful possibilities of using big data to improve research.

First, technology helps to connect subjects and their data to studies faster than ever before. Whereas a clinician would previously have needed a year to find a few hundred study participants for a clinical trial, devices like mobile phones and smart watches can recruit participants in a few hours. “We have moved from a world where behavioral data was scarce to one in which it is plentiful,” says Salganik. The proliferation of data-collecting devices, from web browsers to mobile phones, is giving rise to enormous quantities of observational data and information. Interestingly, most of this data has not been collected for purposes of research. This has the advantage that these call records, clickstreams, and dating profiles are “non-reactive.” That is to say, users are not adjusting their behavior because they are being watched, as they would if they were conscious of being part of a study. However, the fact that the data is “naturally occurring” is not without its complications. Johan Ugander’s “finding” that an unusually large number of Facebook users have around 20 friends is a classic case of “algorithmic confounding”: Facebook is engineered to push users to make more friends until they have 20.

Second, social scientists always ran experiments, but new digital platforms make it cheap and easy to conduct large-scale A/B tests with ever bigger populations, as customer engagement tech provider OPower (now part of Oracle) discovered when it collaborated with power companies to provide customers with a picture of their own energy consumption and run social psychology experiments to encourage energy saving.

Third, researchers have long wielded clipboards to conduct surveys and interviews, but now mobile phones make it possible to reach more people more quickly with micro-surveys. Surveys can also be combined with big data from other sources to develop a more comprehensive picture of on-the-ground conditions.

Fourth, the internet is not only changing our ability to collect more data for research, it is also enabling more people to participate in the process of data analysis. Mass collaboration and crowdsourcing projects and citizen science platforms like Crowdcrafting are democratizing the process of both data collection and data analysis, bringing more people into the process of doing research.

New Rules for the Road

In a case of “eating his own dog food,” Salganik crowdsourced the production of the book. He subjected his manuscript to his publisher’s traditional closed-door peer review process, but he also posted it openly online and received hundreds of annotations from dozens of people. Then he combined this feedback with the feedback from the publisher to improve the book. The book has even been machine translated into 100 languages! It seems only fitting that a book about social research in the digital age was produced in a digital way. The software to enable this process has been published online as the Open Review Toolkit.

But the most important contribution of the book for researchers in business, academe and policy is its discussion of how to navigate the ethical challenges involved in digital social research, such as the new-found ability to observe and gather data about millions of people and to do so over time without their knowledge, that create “situations, where reasonable well-meaning people will disagree.” I cannot do justice here to his balanced and thoughtful reflection on the implications of digital for traditional principles of research ethics (respect for persons, beneficence, justice, respect for law). But by unpacking the issues, Salganik shows that they are surmountable.

Moreover, Salganik makes the case successfully that the ability to do more research faster about stuff that matters is too important not to grapple with these hard questions. Thus, he explains at some length why research with populations that might not be perfectly representative, such as using Twitter or Facebook data, is still important. He illustrates using the story of John Snow and his work on the causes of cholera in 19th century London. Although the study did not allow him to describe the prevalence of cholera in London, mapping the incidence of cholera in relation to water pumps enabled him to uncover that contaminated water and not bad air was the cause. His research led to changing residents’ source of drinking water and thereby saved lives.

The best part of Bit by Bit, and where I wanted to read even more, is precisely the discussion of how to balance methodological rigor with, or forgo it in favor of, real-world impact. As MIT Professor Kurt Lewin said, “research that produces nothing but books will not suffice.” We must do more action research about important topics.

Digital tools and big data create the opportunity to get out of the lab and to study and address our most urgent, impactful and socially relevant problems in more rigorous ways than we have ever been able to do before. Those ways may be imperfect, messy and challenging but, thanks to Bit by Bit, we have ethical guideposts, helpful instructions, and optimistic encouragement to press on.

[1] Jim Manzi, Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society (New York: Basic Books, 2012).