How to Navigate the Nuances of Anonymous and De-Identified Data in AI-Driven Classrooms
As the Director of Quantitative Research and Data Science, as well as the Data Privacy Officer at Digital Promise, I aim to demystify the complex world of data privacy, particularly in the realm of education and AI tools. Having begun my journey as an Institutional Review Board (IRB) committee member during my graduate school years, I’ve been committed to upholding ethical principles in data usage, such as those outlined in The Belmont Report. Collaborating with researchers to ensure their work aligns with these principles has been a rewarding part of my career. Over the past decade, I’ve grappled with the nuances of anonymous and de-identified data, a challenge shared by many in this field. In a time when student data is being captured and used more prolifically than we know, understanding how privacy is maintained is crucial to protecting our learners.
Anonymous Versus De-Identified
The Department of Education defines de-identified data as information from which personally identifiable details have been sufficiently removed or obscured so that what remains does not identify a person. However, it may still contain a unique identifier, or re-identification code, that could be used to link the data back to an individual.
Similarly, the General Data Protection Regulation (GDPR) characterizes anonymous data as information that does not relate to any identified or identifiable individual, or as data that has been rendered anonymous to the extent that the data subject can no longer be identified.
These definitions, while seemingly similar, are often used without clarity or consistency in literature and research. A review of medical publications revealed that fewer than half of the papers discussing de-identification or anonymization provided clear definitions, and when definitions were provided, they frequently contradicted one another. Some argue that de-identified data can be considered anonymized if enough potentially identifiable information is removed, as suggested in HIPAA data de-identification methods. Conversely, others contend that anonymous data is data from which identifiers were never collected, implying that de-identified data can never be truly anonymous.
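For readers who work with data directly, the distinction can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a method prescribed by the Department of Education, GDPR, or HIPAA: the student records, the field names, and the functions de_identify and anonymize are all hypothetical. The de-identified version keeps a random code plus a separately stored key table that can link a record back to a student, while the anonymous version keeps no link at all. Note that dropping names and emails alone does not guarantee anonymity in practice, because combinations of the remaining fields can sometimes still single out a person.

```python
import secrets


def de_identify(records, direct_identifiers=("name", "email")):
    """Strip direct identifiers and attach a random re-identification code.

    Returns the de-identified records plus a separate key table that maps
    each code back to the identifiers that were removed. Anyone holding
    the key table can re-identify the records.
    """
    deidentified, key_table = [], {}
    for record in records:
        code = secrets.token_hex(8)  # random code, not derived from the student
        key_table[code] = {
            field: record[field] for field in direct_identifiers if field in record
        }
        kept = {k: v for k, v in record.items() if k not in direct_identifiers}
        deidentified.append({"code": code, **kept})
    return deidentified, key_table


def anonymize(records, direct_identifiers=("name", "email")):
    """Strip direct identifiers and keep no code and no key table."""
    return [
        {k: v for k, v in record.items() if k not in direct_identifiers}
        for record in records
    ]


# Hypothetical example records (not real student data).
students = [
    {"name": "Student A", "email": "a@example.org", "grade": 7, "quiz_score": 88},
    {"name": "Student B", "email": "b@example.org", "grade": 7, "quiz_score": 93},
]

deidentified, key_table = de_identify(students)
anonymous = anonymize(students)

# With the key table, a de-identified record can be traced back to a student.
first_code = deidentified[0]["code"]
print(key_table[first_code]["name"])

# The anonymous records carry no code, so there is nothing to look up.
print(anonymous)
```

When asking a vendor about its data, the practical question this sketch raises is whether anything like the key table exists and who has access to it.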
Simplifying Data Privacy: Three Key Strategies for Educators
As AI tools become more common in classrooms, it is easy to become overwhelmed by the nuances of these terms. Moreover, our news feeds are inundated with conversations about student privacy: Parents are concerned about data privacy, teachers reportedly don’t know enough about student privacy, and most school districts still lack data-privacy personnel.
In a time when the difference between anonymous and de-identified could matter greatly, what are educators to do about the data collected by AI tools they might use? I offer three overly simplified strategies.
1. Ask.
In 2020, Visual Capitalist developed a visualization of the length of the fine print for 14 popular apps and shared that the average American would need to set aside almost 250 hours to read all the digital contracts they accept while using online services.
If you do not want to spend hours researching whether a company collects and uses anonymous or de-identified data and how it defines those terms, you can always ask. A few example questions include:
- What data will you collect?
- Can that data be connected back to the students themselves?
- How will data be used?
- Can a student or parent/guardian request that their data be deleted (if you live in California, the answer is often Yes!), and how would they go about doing that?
2. Give Students Choice.
The Belmont Report states that in order to uphold the Respect for Persons principle, individuals should be given the opportunity to choose what shall and shall not happen to them and, by extension, their data. Whenever possible, giving students the opportunity to choose whether they want to use an AI tool that will make use of their data upholds this important ethical standard and gives students autonomy as they traverse this tech-rich world.
3. Allow Parents to Consent.
A further look at the Respect for Persons principle shows that individuals with diminished autonomy are entitled to protection. The Common Rule, the set of federal regulations that outline processes for ethical research in the United States, defines children as persons who have not yet attained the legal age for consent, and children are one of the many groups entitled to this protection. In practice, this means that permission is needed from parents or guardians for participation, in addition to the child’s own agreement (assent).
To the greatest extent possible, parents should also have the opportunity to understand and agree to a child’s data being gathered and used.
Let’s Navigate the Nuances Together
As someone who has been thinking about how to best protect students’ data since before you could wear your iPhone on your wrist, I regularly rely on these three strategies to uphold the ethical principles that have guided my career. I ask when I do not understand, I strive to give individuals autonomy over their choices and their data, and I seek consent when additional protection is needed. While these three practices won’t allay every fear one may have about the use of AI in classrooms, they will allow you to gather the information you need to make better choices for your students, and I have confidence that we can navigate the nuance together!