The fundamental difference between data and statistics (because who knew!)
Before I started working on SAGE Stats, the idea of working with a large data set was quite intimidating. Shout out to the USDA’s Food Access Research Atlas! In the two years since, working regularly with our platform has really opened my eyes to how empowering and beautiful data is once you understand how to pull usable information from it.
My experience has also taught me how overwhelming and confusing data can be. What is a data set and how is it different from a time series? How can I tell if data content is reliable or not? What the heck is a data dictionary and why do I need it? Unless you are consistently elbow-deep in data, it can be difficult to know where to even start. So let’s begin with the very basics: what is the difference between data and statistics?
The two terms are often used interchangeably, even in the same breath. I have caught myself using both terms without a second thought when explaining SAGE Stats to team members and close friends. Although it is easy to treat the two as synonyms, they are in fact very different.
Data are collected and organized information typically provided in massive files with detailed records and a data dictionary to decode the variable information. The records in those data files do not communicate significant meaning to the naked eye, so time and analysis are needed to read through the data collection methodology, decipher variable information, and determine which variables are of interest to you.
Statistics are clear and understandable explanations or summaries of data based on analysis. Statistics are generally presented in tables or represented graphically. For example, the median state unemployment rate in the U.S. was 4.0 percent in 2016. This is a statistic derived from analysis of sample data collected by the U.S. federal government.
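To make the distinction concrete, here is a minimal sketch in Python. The records, state names, and the `median_rate` helper are all made up for illustration; they are not real unemployment figures and not how any agency actually computes its numbers. The list of records plays the role of the data, and the single number that comes out at the end plays the role of the statistic.

```python
from statistics import median

# Hypothetical records, invented purely for illustration (not real figures).
# On their own, rows like these do not tell you much at a glance.
records = [
    {"state": "State A", "year": 2016, "unemployment_rate": 3.1},
    {"state": "State B", "year": 2016, "unemployment_rate": 4.7},
    {"state": "State C", "year": 2016, "unemployment_rate": 3.9},
    {"state": "State D", "year": 2016, "unemployment_rate": 5.2},
    {"state": "State E", "year": 2016, "unemployment_rate": 4.4},
]


def median_rate(rows, year):
    """Summarize the records for one year as a single median value."""
    rates = [row["unemployment_rate"] for row in rows if row["year"] == year]
    return median(rates)


# The analysis step turns many records into one readable statistic.
summary = median_rate(records, 2016)
print(f"Median state unemployment rate, 2016: {summary} percent")
```

With real data there would be far more rows and variables, and you would lean on the data dictionary to know which column holds the rate in the first place.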
So statistics are better than data, right?
Not necessarily. Whether you need data or statistics really depends on your research question. Data are needed when your research question addresses a new issue that hasn’t been explained or thoroughly explored yet. Answering it requires a deep dive into the data: you must do the analysis yourself and derive the knowledge that answers your question.
A more straightforward research question, however, can be answered more quickly with statistics: the question has been asked before, so the analysis needed to answer it has already been done. For instance, a student who needs information on unemployment across the Rust Belt states can easily find an answer because that information is frequently processed by the federal government for its own assessment of the economic climate.
The difference between data and statistics lies in the analysis. Data need to be analyzed to be understood, but a statistic can be understood right away. The next question is: how do I begin to analyze data to get the statistics I need? Stay tuned for my next blog post for tips on just that!