We are kicking off a three-month focus on data analysis, starting with Analyzing Words, Pictures, and Numbers in July. This month we will have the opportunity to learn new ideas and practical skills from Mentors in Residence Stephen Gorard, Jean Breny, and Shannon McMorrow.
In statistics and the philosophy of science, “randomness” means something rather more than its everyday meaning of merely haphazard, or without apparent pattern. It is stronger, in the sense that something either is or is not random. There are no degrees of randomness, because a random event is one that is completely unpredictable in form, outcome or timing. Randomness is the quality of such an event – its unpredictability and lack of intention.
Applied to a set of numbers, randomness means that each number in the set is unrelated to any other. Knowing one or more numbers in the set will not help identify any other numbers in the set. By analogy, if a standard six-sided die roll has a random outcome, then knowing the previous 10, 100 or 1,000 results from that die will not help you predict the next one. The chance of guessing the result of the next roll correctly remains 1 in 6, however many rolls have been seen previously. Randomness is the characteristic of chance, as illustrated by a fair die.
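The die analogy can be checked with a short simulation. The Python sketch below is an illustration, not part of the original argument. It assumes that Python's seeded pseudo-random generator is an adequate stand-in for a fair die, and `conditional_frequencies` is a hypothetical helper written for this example. It rolls the simulated die many times and, for each face, records how often each face comes next. If the rolls are random, every entry should sit near 1/6 (about 0.167), so knowing the previous roll gives no advantage in guessing the next.

```python
import random
from collections import Counter, defaultdict


def conditional_frequencies(n_rolls: int, seed: int = 1) -> dict:
    """Roll a simulated fair die n_rolls times and, for each face, return the
    relative frequency of each face on the roll that immediately follows it."""
    rng = random.Random(seed)  # seeded so the output is repeatable
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    followers = defaultdict(Counter)  # previous face -> counts of the next face
    for previous, following in zip(rolls, rolls[1:]):
        followers[previous][following] += 1
    return {
        prev: {face: counts[face] / sum(counts.values()) for face in range(1, 7)}
        for prev, counts in sorted(followers.items())
    }


freqs = conditional_frequencies(300_000)
for prev, row in freqs.items():
    print(f"after a {prev}: " + " ".join(f"{p:.3f}" for p in row.values()))
```

Strictly, a seeded generator is deterministic, so this is pseudo-randomness that merely behaves like chance for this purpose.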
There have been various attempts to define randomness more formally than this over time, in terms of unpredictability, the non-computability of random events, and the inability to describe a set of random elements more efficiently than by repeating the entire set.
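The last of these ideas, that a random set cannot be described more briefly than by listing it, can be illustrated roughly with a general-purpose compressor. This Python sketch uses `zlib` as a crude, computable proxy for "shortest description" (the true shortest description is not computable in general). The data and the helper `compression_ratio` are invented for the example. A patterned sequence shrinks to a tiny fraction of its size, while the pseudo-random bytes do not shrink.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib at maximum compression."""
    return len(zlib.compress(data, 9)) / len(data)


n = 10_000
patterned = b"123456" * (n // 6)  # fully described by a short rule: "repeat 123456"
rng = random.Random(42)
pseudo_random = bytes(rng.getrandbits(8) for _ in range(n))  # no pattern zlib can exploit

print(f"patterned:     {compression_ratio(patterned):.3f}")
print(f"pseudo-random: {compression_ratio(pseudo_random):.3f}")
```

The pseudo-random bytes are not random in the formal sense: they can be regenerated exactly from the seed and a short program. `zlib` simply fails to find that description, which is why compression can suggest, but never prove, randomness.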
Randomness and social science
The term ‘random’ is widely used in social science in the context of sampling and statistical analysis. A sample is random if all of the cases in it were selected by chance from a larger set of cases known as the population, and if all of the cases in the population had a genuine chance of being in the sample. A population itself is clearly not a random sample, and nor is a sample selected by other means (ad hoc, convenience, purposive etc.). A sample selected at random but in which cases cannot be found, do not respond, or are otherwise not recorded is no longer a random sample. And there is no reason to believe that such missing cases are a random subset of the planned sample. Those refusing to take part in a piece of research, for example, can be predicted on the basis of their prior characteristics with more success than chance alone.
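The effect of non-random nonresponse on an otherwise random sample can be sketched in a few lines of Python. Everything here is hypothetical: a population of 100,000 normally distributed scores, a simple random sample of 2,000, and an assumed response rule in which people scoring above 50 respond with probability 0.9 and everyone else with probability 0.3. The planned sample tracks the population mean closely. The achieved sample, made up only of those who responded, is pulled upward because response depends on the very characteristic being measured.

```python
import random
import statistics

rng = random.Random(2024)

# Hypothetical population of 100,000 people with a score on some measure.
population = [rng.gauss(50, 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# A genuinely random sample: every case had an equal chance of selection.
planned_sample = rng.sample(population, 2_000)


def responds(score: float) -> bool:
    """Assumed (hypothetical) response rule: higher scorers are far more likely to respond."""
    return rng.random() < (0.9 if score > 50 else 0.3)


# Only the cases that respond end up in the data actually analysed.
achieved_sample = [score for score in planned_sample if responds(score)]

print(f"population mean:      {population_mean:.2f}")
print(f"planned sample mean:  {statistics.fmean(planned_sample):.2f} (n={len(planned_sample)})")
print(f"achieved sample mean: {statistics.fmean(achieved_sample):.2f} (n={len(achieved_sample)})")
```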
None of the statistical techniques predicated on working with a random sample can or should be used with populations, or with samples that are non-random or incomplete. This means that significance tests, p-values, standard errors, or confidence intervals should not be calculated or reported in such cases. It also means that anything like these should be ignored in the work of others unless the study is based on a complete random sample. I have never seen a real study that had a complete random sample, which means that such statistical techniques should rarely, if ever, be reported. However, their abuse remains widespread in the literature.
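One way to see why standard errors and confidence intervals mislead with incomplete samples is a small Monte Carlo sketch in Python. The population, the sample size (500), the response rule and the helper `ci_coverage` are all invented for illustration, and the interval is a simple normal approximation (mean plus or minus 1.96 standard errors). With complete random samples, roughly 95% of intervals contain the true population mean, as the calculation assumes. Once cases go missing in a way related to the outcome, almost none do: the standard error reflects only chance variation in which cases were selected, not the bias introduced by who is missing.

```python
import math
import random
import statistics


def ci_coverage(reps: int, with_nonresponse: bool, seed: int = 7) -> float:
    """Share of nominal 95% confidence intervals that contain the true population mean."""
    rng = random.Random(seed)
    population = [rng.gauss(50, 10) for _ in range(20_000)]  # hypothetical population
    true_mean = statistics.fmean(population)
    hits = 0
    for _ in range(reps):
        sample = rng.sample(population, 500)  # simple random sample
        if with_nonresponse:
            # Assumed response rule: scores above 50 respond with p = 0.9, others with p = 0.3.
            sample = [x for x in sample if rng.random() < (0.9 if x > 50 else 0.3)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(len(sample))
        # Normal-approximation interval: mean +/- 1.96 standard errors.
        if mean - 1.96 * se <= true_mean <= mean + 1.96 * se:
            hits += 1
    return hits / reps


print(f"complete random samples: {ci_coverage(1_000, with_nonresponse=False):.1%}")
print(f"after nonresponse:       {ci_coverage(1_000, with_nonresponse=True):.1%}")
```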
Real-life samples cannot and should not be used with any technique predicated on random samples. Given that randomness is an ideal not often seen in practice in social science samples, it is most important that the much more common non-random (and incomplete “random”) samples are analysed properly. We need to teach more about the appropriate techniques for handling these real-life samples that are not random, even though these approaches are largely ignored in most statistical texts. These techniques will be addressed in future blogs, and are covered in my new book.
Stephen Gorard is the author of How to Make Sense of Statistics, Professor of Education and Public Policy, and Director of the Evidence Centre for Education, at Durham University. He is a Fellow of the Academy of Social Sciences, and a member of the Cabinet Office Trials Advice Panel as part of the Prime Minister’s Implementation Unit. His work concerns the robust evaluation of education as a lifelong process. He is author of around 30 other books and over 1,000 other publications. Stephen is currently funded by the British Academy to look at the impact of schooling in India and Pakistan, by the Economic and Social Research Council to work out how to improve the supply and retention of teachers, and by the Education Endowment Foundation to evaluate the impact of reduced teacher marking in schools. Follow him on Twitter @SGorard.