Data Ethics: Investing Wisely in Data at Scale

Report by David Robinson & Miranda Bogen prepared for the MacArthur and Ford Foundations: “‘Data at scale’ (digital information collected, stored and used in ways that are newly feasible) opens new avenues for philanthropic investment. At the same time, projects that leverage data at scale create new risks that are not addressed by existing regulatory, legal and best practice frameworks. Data-oriented projects funded by major foundations are a natural proving ground for the ethical principles and controls that should guide the ethical treatment of data in the social sector and beyond.

This project is an initial effort to map, for grantmakers at major foundations, the ways that data at scale may pose risks to philanthropic priorities and beneficiaries. It draws on desk research and unstructured interviews with key individuals involved in the grantmaking enterprise at major U.S. foundations. The resulting report was prepared at the joint request of the MacArthur and Ford Foundations.

Grantmakers are exploring data at scale, but currently have poor visibility into its benefits and risks. Rapid technological change, the scarcity of data science expertise, limited training and resources, and a lack of clear guideposts around emergent risks all contribute to this problem.

Funders have important opportunities to invest in, learn from, and innovate around data-intensive projects, in concert with their grantees. Grantmakers should not treat the new ethical risks of data at scale as a barrier to investment, but these risks also must not become a blind spot that threatens the success and effectiveness of philanthropic projects. Those working with data at scale in the philanthropic context have much to learn: throughout our conversations with stakeholders, we heard consistently that grantmakers and grantees lack baseline knowledge on using data at scale, and many said that they are unsure how to make better informed decisions, both about data’s benefits and about its risks. Existing frameworks address many risks introduced by data-intensive grantmaking, but leave some major gaps. In particular, we found that:

Some new data-intensive research projects involve meaningful risk to vulnerable populations, but are not covered by existing human subjects regimes, and lack a structured way to consider these risks. In the philanthropic and public sectors, human subjects review is not always required, and program officers, researchers, and implementers do not yet have a shared standard by which to evaluate the ethical implications of using public or existing data, which is often exempt from human subjects review.
Social sector projects often depend on data that reflects patterns of bias or discrimination against vulnerable groups, and face the challenge of how to avoid reinforcing existing disparities. Automated decisions can absorb and sanitize bias from input data, and responsibly funding or evaluating statistical models in data-intensive projects increasingly demands advanced mathematical literacy, which foundations lack.
Both data and the capacity to analyze it are being concentrated in the private sector, which could marginalize academic and civil society actors.

Some individuals and organizations have begun to call attention to these issues and create their own trainings, guidelines, and policies, but ad hoc solutions can only accomplish so much.

To address these and other challenges, we’ve identified eight key questions that program staff and grantees need to consider in data-intensive work:

For a given project, what data should be collected, and who should have access to it?
How can projects decide when more data will help — and when it won’t?
How can grantmakers best manage the reputational risk of data-oriented projects that may be at a frontier of social acceptability?
When concerns are recognized with respect to a data-intensive grant, how will those concerns get aired and addressed?
How can funders and grantees gain the insight they need in order to critique other institutions’ use of data at scale?
How can the social sector respond to the unique leverage and power that large technology companies are developing through their accumulation of data and data-related expertise?
How should foundations and nonprofits handle their own data?
How can foundations begin to make the needed long-term investments in training and capacity?

Newly emergent ethical issues inherent in using data at scale point to the need for both a broader understanding of the possibilities and challenges of using data in the philanthropic context and conscientious treatment of data ethics issues. Major foundations can play a meaningful role in building a broader understanding of these possibilities and challenges, and they can set a positive example in creating space for open and candid reflection on these issues. To those ends, we recommend that funders: …”