What Are Data? A Categorization of the Data Sensitivity Spectrum

Paper by John M.M. Rumbold and Barbara K. Pierscionek in Big Data Research: “The definition of data might at first glance seem prosaic, but formulating a definitive and useful definition is surprisingly difficult. This question is important because of the protection given to data in law and ethics. Healthcare data are universally considered sensitive (and confidential), so it might seem that the categorisation of less sensitive data is relatively unimportant for medical data research. This paper will explore the arguments that this is not necessarily the case and the relevance of recognizing this.

The categorization of data and information requires re-evaluation in the age of Big Data in order to ensure that the appropriate protections are given to different types of data. The aggregation of large amounts of data requires an assessment of the harms and benefits that pertain to large datasets linked together, rather than simply assessing each datum or dataset in isolation. Big Data produce new data via inferences, and this must be recognized in ethical assessments. We propose a schema for a granular assessment of data categories. The use of schemata such as this will assist decision-making by providing research ethics committees and information governance bodies with guidance about the relative sensitivities of data. This will ensure that appropriate and proportionate safeguards are provided for data research subjects and reduce inconsistency in decision making…(More)”.