Text Wash, a new software tool that anonymizes personally identifiable text data, making it accessible to social scientists without compromising its usability for research, has just won the SAGE Concept Grant. This year’s award comes to roughly $30,000.
Text Wash is being developed by Bennett Kleinberg, Maximilian Mozes and Toby Davies from the Department of Security and Crime Science at University College London. The concept grant will enable the team to get the tool off the ground and promote ethical and intelligent data sharing practices.
When it comes to doing research with text data, many datasets (e.g., police reports or patient files) are protected by ethics boards’ restrictions and wider data protection frameworks such as the European Union’s General Data Protection Regulation. As a result, such unique datasets are rarely shared, and research using text data often focuses on readily available data at the expense of data that could help answer more pressing research questions.
Where they are shared, current approaches to anonymizing these data render the texts unusable for follow-up research. Text Wash solves this problem by enabling the anonymization of text data without compromising its quality. It does this by using natural language processing and machine learning to identify and replace sensitive information while preserving the semantic and grammatical structures of the text. Importantly, personally identifiable information is determined in close collaboration with data protection officers from the government and the police.
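To make the identify-and-replace idea concrete, here is a minimal sketch in Python. It is not Text Wash's method: the tool is described as using natural language processing and machine learning, whereas this stand-in uses a few hand-written regular expressions. The patterns, the `anonymize` function and the placeholder format are all hypothetical and chosen only for illustration. The point it shows is that swapping each sensitive item for a typed, numbered placeholder keeps sentences readable and keeps repeated mentions linked.

```python
import re

# Hypothetical patterns for illustration only; these are NOT Text Wash's rules.
# A real system would rely on trained NLP models rather than regular expressions.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s-]{7,}\d")),
]


def anonymize(text):
    """Replace matched items with typed placeholders such as [EMAIL_1].

    The same original value always maps to the same placeholder, so
    repeated mentions of one item stay linked after anonymization.
    """
    counters = {}  # label -> number of distinct values seen so far
    seen = {}      # (label, original value) -> placeholder

    for label, pattern in PATTERNS:
        def replace(match, label=label):
            key = (label, match.group(0))
            if key not in seen:
                counters[label] = counters.get(label, 0) + 1
                seen[key] = f"[{label}_{counters[label]}]"
            return seen[key]

        text = pattern.sub(replace, text)
    return text


if __name__ == "__main__":
    sample = (
        "Contact jane.doe@example.org or call 020 7946 0123. "
        "Again: jane.doe@example.org"
    )
    print(anonymize(sample))
```

Because the placeholder carries the type of the item it replaced, the sentence around it still parses as ordinary text, which is the property that keeps anonymized data usable for follow-up research.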
“The idea for Text Wash,” said Kleinberg, “came from the observation that many organizations are, in principle, willing to share raw text data for research purposes but are reluctant to do so due to data protection issues. We are excited to put our ideas into practice with this concept grant. Ultimately we hope that we can open up access to a yet-untapped treasure of data to make research more relevant.”
Text Wash will be available as an R package and as standalone software for non-technical users. For more information, contact: firstname.lastname@example.org
SAGE Ocean made the prize announcement on June 25.
SAGE Vice President of Product Innovation Katie Metzler noted that in the program’s second year, 31 percent of the 47 applications received were led by women, an increase from 21 percent in 2018. “As part of our commitment to encouraging diversity within computational social science, we would like to encourage more applications from women and diverse applicants in 2020.”
The Concept Grant program is a key part of the SAGE Ocean initiative to enable social scientists to work with big data and new technology. To stay up to date with the latest news and ensure you receive the next call for applications, subscribe to the Big Data Newsletter.