Seven Principles for Big Data and Resilience Projects


PopTech & Rockefeler Bellagio Fellows: “The following is a draft “Code of Conduct” that seeks to provide guidance on best practices for resilience building projects that leverage Big Data and Advanced Computing. These seven core principles serve to guide data projects to ensure they are socially just, encourage local wealth- & skill-creation, require informed consent, and be maintainable over long timeframes. This document is a work in progress, so we very much welcome feedback. Our aim is not to enforce these principles on others but rather to hold ourselves accountable and in the process encourage others to do the same. Initial versions of this draft were written during the 2013 PopTech & Rockefeller Foundation workshop in Bellagio, August 2013.
Open Source Data Tools – Wherever possible, data analytics and manipulation tools should be open source, architecture independent and broadly prevalent (R, python, etc.). Open source, hackable tools are generative, and building generative capacity is an important element of resilience….
Transparent Data Infrastructure – Infrastructure for data collection and storage should operate based on transparent standards to maximize the number of users that can interact with the infrastructure. Data infrastructure should strive for built-in documentation, be extensive and provide easy access. Data is only as useful to the data scientist as her/his understanding of its collection is correct…
Develop and Maintain Local Skills – Make “Data Literacy” more widespread. Leverage local data labor and build on existing skills. The key and most constraint ingredient to effective data solutions remains human skill/knowledge and needs to be retained locally. In doing so, consider cultural issues and language. Catalyze the next generation of data scientists and generate new required skills in the cities where the data is being collected…
Local Data Ownership – Use Creative Commons and licenses that state that data is not to be used for commercial purposes. The community directly owns the data it generates, along with the learning algorithms (machine learning classifiers) and derivatives. Strong data protection protocols need to be in place to protect identities and personally identifying information…
Ethical Data Sharing – Adopt existing data sharing protocols like the ICRC’s (2013). Permission for sharing is essential. How the data will be used should be clearly articulated. An opt in approach should be the preference wherever possible, and the ability for individuals to remove themselves from a data set after it has been collected must always be an option. Projects should always explicitly state which third parties will get access to data, if any, so that it is clear who will be able to access and use the data…
Right Not To Be Sensed – Local communities have a right not to be sensed. Large scale city sensing projects must have a clear framework for how people are able to be involved or choose not to participate. All too often, sensing projects are established without any ethical framework or any commitment to informed consent. It is essential that the collection of any sensitive data, from social and mobile data to video and photographic records of houses, streets and individuals, is done with full public knowledge, community discussion, and the ability to opt out…
Learning from Mistakes – Big Data and Resilience projects need to be open to face, report, and discuss failures. Big Data technology is still very much in a learning phase. Failure and the learning and insights resulting from it should be accepted and appreciated. Without admitting what does not work we are not learning effectively as a community. Quality control and assessment for data-driven solutions is notably harder than comparable efforts in other technology fields. The uncertainty about quality of the solution is created by the uncertainty inherent in data…”