Big Data. Big Obstacles.


Dalton Conley et al. in the Chronicle of Higher Education: “After decades of fretting over declining response rates to traditional surveys (the mainstay of 20th-century social research), an exciting new era would appear to be dawning thanks to the rise of big data. Social contagion can be studied by scraping Twitter feeds; peer effects are tested on Facebook; long-term trends in inequality and mobility can be assessed by linking tax records across years and generations; social-psychology experiments can be run on Amazon’s Mechanical Turk service; and cultural change can be mapped by studying the rise and fall of specific Google search terms. In many ways there has been no better time to be a scholar in sociology, political science, economics, or related fields.

However, what should be an opportunity for social science is now threatened by a three-headed monster of privatization, amateurization, and Balkanization. A coordinated public effort is needed to overcome all of these obstacles.

While the availability of social-media data may obviate the problem of declining response rates, it introduces all sorts of problems with the level of access that researchers enjoy. Although some data can be culled from the web—Twitter feeds and Google searches—other data sit behind proprietary firewalls. And as individual users tune up their privacy settings, the typical university or independent researcher is increasingly locked out. Unlike federally funded studies, there is no mandate for Yahoo or Alibaba to make its data publicly available. The result, we fear, is a two-tiered system of research. Scientists working for or with big Internet companies will feast on humongous data sets—and even conduct experiments—and scholars who do not work in Silicon Valley (or Alley) will be left with proverbial scraps….

To address this triple threat of privatization, amateurization, and Balkanization, public social science needs to be bolstered for the 21st century. In the current political and economic climate, social scientists are not waiting for huge government investment like we saw during the Cold War. Instead, researchers have started to knit together disparate data sources by scraping, harmonizing, and geo­coding any and all information they can get their hands on.

Currently, many firms employ some well-trained social and behavioral scientists free to pursue their own research; likewise, some companies have programs by which scholars can apply to be in residence or work with their data extramurally. However, as Facebook states, its program is “by invitation only and requires an internal Facebook champion.” And while Google provides services like Ngram to the public, such limited efforts at data sharing are not enough for truly transparent and replicable science….(More)”