Data Stewards: A New Role and Responsibility for an AI and Data Age
As data grows increasingly prevalent in our economy and society, it is increasingly clear, too, that tremendous value can be derived from reusing and combining previously separate data sets. Of course, recombination poses certain risks, notably to privacy. At the GovLab, an action research center at the Tandon School of Engineering of New York University, we are researching ways that data can be reused and combined responsibly so that the benefits outweigh the dangers, and in a manner that makes a genuine contribution toward solving important public problems.
One avenue that holds particular promise is the emerging structure of Data Collaboratives. Data collaboratives occur when institutions or organizations provide access to data, usually across sectors, in new and innovative ways. Some of the most interesting Data Collaboratives result when private sector entities share access to their vast stores of (often siloed) data with civil society, public officials or others working in the public interest. Data Collaboratives have, for instance:
- Helped streamline aid delivery to victims of natural disasters by sharing Facebook data on people’s movements after such disasters with aid workers;
- Measured people’s economic resilience to natural disasters using sale payments and ATM cash withdrawal data from the bank BBVA in Mexico; and
- Provided better urban insights to city officials by leveraging Waze data through its Connected Citizen Program.
The importance of “Data Stewards”
Not all Data Collaboratives succeed. We have come across many failures over the course of our research. Success depends on several factors, including whether a problem statement has been clearly articulated, whether the recipient organization possesses the requisite technical capacity to handle the data, whether both parties trust one another, and whether the data itself is accessible and of acceptable and usable quality (e.g., free of errors, bias and inconsistencies).
Crucially, we have found that success also depends on the existence within sharing organizations of individuals or teams that are empowered to proactively initiate, facilitate and coordinate data collaboratives when they may be useful or necessary. We call such individuals and teams “data stewards”. Data stewards have the requisite expertise and authority to recognize opportunities for productive collaborations or to respond to external requests for data. They systematize the process of partnering, and help scale efforts when there are fledgling signs of success.
Several companies, including Mastercard, Facebook, LinkedIn, Uber, Cloudera, and Digital Globe, have recently put into place data steward roles (though they don’t necessarily call them by that title). These individuals or teams are tasked with the responsibility of identifying, structuring, guiding and evaluating collaborative opportunities.
Defining Data Steward(ship)
Yet even as we see more data steward-type roles defined within companies, there exists considerable confusion about just what they should be doing. In particular, we have noticed a tendency to conflate the roles of data stewards with those of individuals or groups who might be better described as chief privacy, chief data or security officers. This slippage is perhaps understandable, but our notion of the role is somewhat broader. While privacy and security are of course key components of trusted and effective data collaboratives, the real goal is to leverage private data for broader social goals — while preventing harm.
So what are the necessary attributes of data stewards? What are their roles, responsibilities, and goals? And how can they be most effective, both as champions of sharing within organizations and as facilitators for leveraging data with external entities? These are some of the questions we seek to address in our current research, and below we outline some key preliminary findings.
The following “Three Goals” and “Five Functions” can help define the aspirations of data stewards, and what is needed to achieve the goals. While clearly only a start, these attributes can help guide companies currently considering setting up sharing initiatives or establishing data steward-like roles.
The Three Goals of Data Stewards
- Collaborate: Data stewards are committed to working and collaborating with others, with the goal of unlocking the inherent value of data when a clear case exists that it serves the public good and that it can be used in a responsible manner.
- Protect: Data stewards are committed to managing private data ethically, which means sharing information responsibly, and preventing harm to potential customers, users, corporate interests, the wider public and of course those individuals whose data may be shared.
- Act: Data stewards are committed to acting proactively to identify partners who may be in a better position to unlock the value and insights contained within privately held data.
The Five Functions of Data Stewards
- Partnership and Community Engagement: A central function of data stewards involves developing and implementing a more proactive and responsive approach to reaching out to and vetting potential partners. In addition, data stewards should be informing potential beneficiaries (and others) of the insights generated as a result of data collaborative efforts. In general, data stewards are responsible for engaging with all actors (both within and without their companies) who may be affected by or otherwise have a stake in the innovative use of data.
- Internal Coordination and Staff Engagement: Establishing a successful data collaborative requires internal coordination and sign-off from various company actors — including, for instance, the legal, technical, data, marketing and sales teams. Data stewards are key to ensuring internal stakeholders and company leadership are informed and aligned. In addition, data stewards often play an important role in mapping and matching staff that have specific skills, such as data science abilities, or interest in data collaborative initiatives.
- Data Audit, Ethics and Assessment of Value and Risk: Data stewards are responsible for monitoring and assessing the value, potential, and risk of all data held within an organization. This responsibility includes knowing what data the organization collects, and what public interest questions that data could potentially help answer. Data stewards should also be involved in preparing data for analysis, assessing the challenges involved in sharing that data, and identifying steps to minimize those challenges and other risks at different stages of the data value chain. Finally, data stewards should consider the ethical implications of sharing and be involved in establishing externally validated measurements of the public impact (both benefits and harms, if any) that results from data collaboratives that leverage private data for the public good.
- Dissemination and Communication of Findings: Data stewards often act as the public face of their company’s data projects, and they are responsible for raising awareness, disseminating findings and communicating shared outcomes from Data Collaboratives. Data stewards may also be responsible for overall communication with customers, users, partners, government and other stakeholders about regulatory compliance, contractual obligations, how data is being shared and used, and what public benefits (potential and realized) that sharing has had.
- Nurture Data Collaboratives to Sustainability: Many ambitious data collaborative projects collapse after initial pilots or experiments. Data stewards can play a valuable role (partly through their dissemination and communication functions) in nurturing and helping scale these projects until they are sustainable. While data stewards may not themselves have the requisite budget to ensure long-term sustainability, they must work with a variety of stakeholders to gather the needed resources and support so as to ensure broad and long-term impact.
The Data Stewards Network
As part of our ongoing research, and in an effort to flesh out and better understand the attributes outlined above, the GovLab has recently launched a Data Stewards Network (datastewards.net). The primary goal of the network is to connect existing and aspiring corporate data stewards so as to jointly develop methodologies, tools, and frameworks that could help build more efficient, collaborative, impactful and safe data collaboratives. We are in effect building a directory of existing data stewards and a database of best practices and case studies to advance the cause of data sharing and public-private partnerships.
Whether you are a data steward yourself or just someone interested in the potential of data collaboratives, we invite you to visit the website and get in touch. Over the next few months, we will be launching a dedicated online platform to facilitate shared learning. We welcome your suggestions, thoughts, and ideas on any of the issues raised by this post, or on data sharing more generally.
(Many thanks to Claudia Juech, William Hoffman, Andrew Young, Andrew Zahuranec, Andrew Puddephat and Charles Bradley for comments on earlier versions of this primer)