Raw data won't solve our problems – asking the right questions will

05 September 2019

This article was written by Stefaan G. Verhulst, co-founder of The GovLab, for Apolitical.

“If I had only one hour to save the world, I would spend fifty-five minutes defining the questions, and only five minutes finding the answers,” is a famous aphorism attributed to Albert Einstein.
Behind this quote is an important insight about human nature: Too often, we leap to answers without first pausing to examine our questions. We tout solutions without considering whether we are addressing real or relevant challenges or priorities. We advocate fixes for problems, or for aspects of society, that may not be broken at all.
This misordering of priorities is especially acute — and represents a missed opportunity — in our era of big data. Today’s data has enormous potential to solve important public challenges.
However, policymakers often fail to invest in defining the questions that matter, focusing mainly on the supply side of the data equation (“What data do we have or must have access to?”) rather than the demand side (“What is the core question and what data do we really need to answer it?” or “What data can or should we actually use to solve those problems that matter?”).
As a result, data initiatives often yield only marginal insights while generating unnecessary privacy risks, because they access and explore data that may not actually be needed to address the root of our most important societal problems.

A new science of questions

So what are the truly vexing questions that deserve attention and investment today? Toward what end should we strategically seek to leverage data and AI?
The truth is that policymakers and other stakeholders currently lack both a good way of defining questions or identifying priorities and a clear framework for leveraging the potential of data and data science toward the public good.
This is a situation we seek to remedy at The GovLab, an action research center based at New York University.
Our most recent project, the 100 Questions Initiative, seeks to begin developing a new science and practice of questions: one that identifies the most urgent questions in a participatory manner. Launched last month, the project aims to develop a process that draws on distributed and diverse expertise across a range of topics or domains in order to identify and prioritize questions that are high impact, novel and feasible.
Because we live in an age of data and much of our work focuses on the promises and perils of data, we seek to identify the 100 most pressing problems confronting the world that could be addressed by greater use of existing, often inaccessible, datasets through data collaboratives – new forms of cross-disciplinary collaboration beyond public-private partnerships focused on leveraging data for good.

Finding the right people to ask the right questions

To identify these questions, we take a hybrid “smarter crowdsourcing” approach. This process relies on finding and developing the talents of what we call “bilinguals”: experts who are proficient in one of ten domain fields we focus on — such as migration, gender and climate change — and who are also skilled in the still-emerging fields of data science and/or statistics.
Together with the Big Data for Migration Alliance (initiated by IOM and the European Commission), we have started identifying and engaging bilinguals in the migration space, where more than seventy have already agreed to participate, and we expect many more. They are tasked with identifying the ten most important questions in their respective fields. Once this step is complete, those questions will be shared with the public for feedback and comments, a process we expect to take place in the autumn for the domain of migration. Our next focus, together with Data2X, will be gender.
We and our bilinguals use several criteria in formulating and selecting questions. The first is that the questions should be highly impactful, either by directly changing people's lives or by advancing our current scientific knowledge.
Second, the questions should identify problems that our experts believe are amenable to a data solution. Other criteria we have asked our bilinguals to consider include originality (How novel is the problem or approach?) and clarity (How specifically can the question be formulated?).
Within these broad parameters, we focus on four types of questions that we believe can change our approach to contemporary problems. These four categories are:

Situational analysis: Questions that help us better understand the trends and geographic distributions of phenomena (often in real time, i.e. nowcasting).
Cause and effect: Questions that can help stakeholders better understand the key drivers and consequences of a situation. This category seeks to establish which variables can make a difference for a given problem.
Prediction: Questions that interrogate new predictive capabilities that would allow stakeholders to assess future risks, needs, and opportunities.
Impact assessment: Questions that try to determine the results (positive or negative) of various interventions.

From data-driven to question-driven

Although still at an early stage, the process itself has been enlightening.
It has made clear that many actors (governments, the private sector and communities) face similar dilemmas over what to prioritize, and how to prioritize data or AI initiatives. We have also found the return to core values, with its emphasis on defining problems rather than leaping to solutions, highly valuable. And for me, personally, it has reaffirmed my belief that any mission-driven organisation, whether focused on improving a particular borough in a single city or on addressing broader societal problems, should consider launching its own 100 Questions-style initiative. Toward that end, we are seeking to develop a roadmap for organisations that want to become more question-driven.

The first step is to identify a group of bilinguals who can help your organisation pinpoint the most important questions. This is a unique aspect of our approach, and we have found that the combination of domain and data expertise yields questions that are both relevant and amenable to data solutions. Once this community of experts has been created, it can also serve as a brain trust for the organisation across its data initiatives.
Outreach and engagement throughout the process are two further essential components of an end-to-end data initiative (from questions to data to knowledge to action). In particular, we recommend three forms of engagement: with the public, with problem owners and with data holders.
Public engagement (for example, through rankings and lists that citizens can access) has a valuable role to play in providing feedback to experts and in ensuring that the questions selected are legitimate and relevant to real problems felt by people and their communities.
By the same token, engaging with data holders is vital to ensuring that data to answer the questions identified can be made available and accessible.
We also recommend an ongoing conversation with "problem owners," whether in government, the private sector or civil society, who can act upon the insights generated. The ground covered by such conversations should include data availability and accessibility, as well as key principles and models for acting on those insights so as to serve the public good.

Becoming a data steward

One final recommendation is derived less from this particular exercise and more from previous research conducted by the GovLab: we strongly suggest that all organizations involved in a data-driven questions initiative designate a Data Steward whose task it is to steer the entire initiative.
We have written about Data Stewards before in Apolitical, pointing to their importance in managing organizational data assets. Data Stewards play a valuable role in matching data with priority questions. They also play a role in ensuring that data is handled responsibly, with attention to individual privacy, and in compliance with all applicable regulations.
Needless to say, these recommendations, like our questions themselves, remain a work in progress. Our goal is to iterate and be responsive as we learn.
More than the specifics of our initiative, what’s important is our effort to foster a new approach to data initiatives as well as problem-solving and policymaking. We aim for a process that is demand-driven and participatory in nature. We also believe that our approach of sourcing questions in a transparent and participatory manner may ultimately lead to a more legitimate, trusted and effective way of agenda-setting.
Finally, it is worth mentioning that our search for a new science of questions extends beyond the realms of big data and data science. It is relevant across domains and to various stakeholders.
We hope that this specific initiative can serve as a template for future efforts to reinvent not only how we ask questions, but also how we govern and make policy. In line with our crowdsourcing approach, we would welcome your views on any aspect of the above. You can share them through the 100 Questions Initiative website or by emailing me. – Stefaan G. Verhulst

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License