Hi folks, newbie on here, so bear with me !
Hope I can help contribute to others questions in the future but I naturally start with one of my own ! And wondering how any of you experienced practitioners have dealt with this in the past ?
We have a “live” database … and a research database that we want to create from this. The research database needs to be anonymised.
Does anyone know of any tools or best practice techniques that can fully anonymise a research data set ? In my example, we can easily discard all the personal details and address details. thats not the problem. But crudely put. we have tables A, B, C (with some example identifiers)
Table A – Patient table
- patient ID = 23667
- patient field 1
- …
- patient field N
Table B – Interview table
- patient ID = 23667
- interview session ID = 3474
- interview session field 1
- …
- interview session field N
Table C – Questionnaire table
- interview session ID = 3474
- questionnaire ID = 42766
- questionnaire field 1
- …
- questionnaire field N
and so on and so on…
These IDs uniquely identify specific events in the patients chronology in the database. And of course since its a pretty well normalised database, these IDs are also the keys that logically join the tables together.
I want to transform these keys in the anonymised research database, so that they no longer correspond to those in the live database (ie no one can be traced back). But I obviously want these transformed keys to maintain the logical relationship between the tables.
A long time ago I used to develop software, so I can think of some very crude ugly ways to do it. But there must be better smarter ways these days. And I can’t be the first person who wants to do this.
Does anyone have any hints, tips, techniques or tools they could point me to ?
thanks !
Paul Martin