AI tools are increasingly being used to track and monitor us both online and in person, yet their effectiveness comes with big risks. Computer scientists at the Oxford Internet Institute, Imperial College London, and UCLouvain have developed a new mathematical model that could help people better understand the risks posed by AI and assist regulators in protecting people’s privacy. The findings have been published in Nature Communications.
For the first time, the method provides a robust scientific framework for evaluating identification techniques, especially when dealing with large-scale data. For instance, it could be used to measure how accurately advertising code and invisible trackers identify online users from small pieces of information such as their time zone or browser settings (a technique called ‘browser fingerprinting’).
Lead author Dr Luc Rocher, Senior Research Fellow, Oxford Internet Institute, part of the University of Oxford, said: “We see our method as a new approach to help assess the risk of re-identification in data release, but also to evaluate modern identification techniques in critical, high-risk environments. In places like hospitals, humanitarian aid delivery, or border control, the stakes are incredibly high, and the need for accurate, reliable identification is paramount.”
The method draws on Bayesian statistics to learn how identifiable individuals are at a small scale, and then extrapolates identification accuracy to larger populations, with predictions up to 10 times more accurate than previous heuristics and rules of thumb. This gives the method unique power in assessing how different identification techniques will perform at scale, across different applications and behavioural settings. It could also help explain why some AI identification techniques are highly accurate when tested in small case studies but then misidentify people in real-world conditions.
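To see why accuracy measured on a small group may not carry over to a large population, consider a deliberately simple toy model. This is not the authors’ Bayesian method; it is a hypothetical illustration that assumes every person’s fingerprint is drawn independently and uniformly from a fixed number of equally likely values. Under that assumption, a given person is uniquely identifiable only if none of the other people in the population happen to share their fingerprint. The function name `unique_match_probability` and the example numbers are illustrative choices, not taken from the study.

```python
def unique_match_probability(n_people: int, n_fingerprints: int) -> float:
    """Probability that a given person's fingerprint is shared by nobody else.

    Toy assumption (hypothetical, not the published model): each of the
    n_people has a fingerprint drawn independently and uniformly at random
    from n_fingerprints equally likely values.
    """
    if n_people < 1 or n_fingerprints < 1:
        raise ValueError("n_people and n_fingerprints must be positive")
    # Each of the other (n_people - 1) people must land on a different value,
    # which happens with probability (1 - 1/n_fingerprints) for each of them.
    return (1.0 - 1.0 / n_fingerprints) ** (n_people - 1)


if __name__ == "__main__":
    # Same fingerprint space (one million possible values), growing population.
    for population in (1_000, 100_000, 10_000_000):
        p = unique_match_probability(population, 1_000_000)
        print(f"population={population:>10,}  P(unique)={p:.6f}")
```

With one million possible fingerprints, almost everyone is unique in a group of a thousand people, but almost no one is unique in a population of ten million. Real fingerprints are neither uniform nor independent, which is one reason a principled model fitted to real data is needed rather than a formula like this one.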
The findings are highly timely, given the challenges that the rapid rise of AI-based identification techniques poses to anonymity and privacy. For instance, AI tools are being trialled to automatically identify people from their voices in online banking, their eyes in humanitarian aid delivery, or their faces in law enforcement.
According to the researchers, the new method could help organisations strike a better balance between the benefits of AI technologies and the need to protect people’s personal information, making daily interactions with technology safer and more secure. Their testing method can reveal potential weaknesses and areas for improvement before full-scale deployment, which is essential for maintaining safety and accuracy.
Co-author Associate Professor Yves-Alexandre de Montjoye (Data Science Institute, Imperial College London) said: “Our new scaling law provides, for the first time, a principled mathematical model to evaluate how identification techniques will perform at scale. Understanding the scalability of identification is essential to evaluate the risks posed by these re-identification techniques, including to ensure compliance with modern data protection legislation worldwide.”
Dr Luc Rocher concluded: “We believe that this work forms a crucial step towards the development of principled methods to evaluate the risks posed by ever more advanced AI techniques and the nature of identifiability in human traces online. We expect that this work will be of great help to researchers, data protection officers, ethics committees, and other practitioners aiming to find a balance between sharing data for research and protecting the privacy of patients, participants, and citizens.”