# A question about preparing questionnaire data for statistical analysis?

Methodspace discussion forum

#888
Marko Sobol
Member

The question may seem trivial, but although I have some experience with statistical analysis, I’m facing a dilemma. Let’s say I want to test statistically for differences between two or more groups, and the dependent variable was measured with a questionnaire, let’s say on an interval scale. The construct I want to test is measured with more than one question or statement (e.g. extraversion measured with several different questions or statements: “I consider myself a talkative person.”, “I like to attend parties.” and so on), and each of these items is answered on a five-point Likert scale. The dilemma is this: when I combine each participant’s answers to these questions/statements to obtain a total score (e.g. for extraversion), should I use the raw values and simply sum the answers to all the items that measure extraversion (e.g. 3+5+4+3+5), or should I calculate the mean or some other value and then proceed with the analysis using that value as the data?

What would the answer be if some of the items were answered on a five-point Likert scale and some on another type of scale, and they all need to be combined somehow to define one underlying construct?
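A minimal Python sketch of the two scoring options raised in the question, using made-up responses from three hypothetical participants to five extraversion items on a 1-5 scale. When every participant answers the same k items, the mean is simply the sum divided by k, so the two scores rank participants identically and differ only in their units.

```python
# Made-up responses: one list per participant, one value per extraversion
# item, each answered on a 1-5 Likert scale (all values are hypothetical).
responses = {
    "P1": [3, 5, 4, 3, 5],
    "P2": [2, 2, 3, 1, 2],
    "P3": [4, 4, 5, 4, 3],
}


def sum_score(items):
    """Total score: the plain sum of the item responses."""
    return sum(items)


def mean_score(items):
    """Mean score: the sum divided by the number of items (stays on the 1-5 metric)."""
    return sum(items) / len(items)


sums = {pid: sum_score(items) for pid, items in responses.items()}
means = {pid: mean_score(items) for pid, items in responses.items()}

for pid in responses:
    print(pid, sums[pid], means[pid])  # first line printed: P1 20 4.0
```

Because the mean is a fixed rescaling of the sum here, statistics that are unaffected by multiplying every score by a positive constant (the t statistic, a correlation coefficient) come out the same for either version; the mean has the practical advantage that it can be read on the original response scale.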

#893
Dave Collingridge
Participant

Marko,

Anytime you want to combine certain questions into a single score representing a construct like extroversion, you should first check whether those questions actually measure that construct. This can be done with principal components analysis (PCA). In a PCA, the questions believed to measure extroversion should “load” onto the same component or factor with adequate factor loadings. I once had a researcher bring me questionnaire data that she wanted analyzed by combining questions and then comparing the aggregate scores by gender. A PCA showed that none of the questions she wanted to combine loaded onto the same factor, so she ended up analyzing the questions separately.
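The check described above can be sketched in plain Python. This is a simplified illustration, not a full PCA workflow: it computes the item correlation matrix, finds its first principal component by power iteration, and scales that component into loadings. The data are made up; items 1-3 are meant to tap one construct, while item 4 was deliberately constructed to be uncorrelated with them, so it should get a loading near zero. A real analysis would normally use a statistics package and examine more than one component.

```python
import math

# Hypothetical data: rows are participants, columns are items 1-4 (1-5 scale).
data = [
    [1, 1, 2, 5],
    [2, 2, 1, 1],
    [2, 3, 3, 3],
    [4, 3, 3, 3],
    [4, 4, 5, 1],
    [5, 5, 4, 5],
]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_component_loadings(rows, iterations=500):
    """Loadings of each item on the first principal component of the item
    correlation matrix, using power iteration for the leading eigenvector."""
    items = list(zip(*rows))  # transpose: one tuple per item
    k = len(items)
    corr = [[pearson(items[i], items[j]) for j in range(k)] for i in range(k)]

    v = [1.0] * k  # starting vector for power iteration
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Leading eigenvalue via the Rayleigh quotient v'Rv (v has unit length).
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(k) for j in range(k))
    # Loading = eigenvector element * sqrt(eigenvalue).
    return [x * math.sqrt(eigenvalue) for x in v]


for item, loading in enumerate(first_component_loadings(data), start=1):
    print(f"item {item}: {loading:.2f}")
```

With these made-up numbers, items 1-3 load strongly on the first component and item 4 loads near zero, which is the pattern that would argue against adding item 4 into the same composite.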

As for computing an aggregate score, I recommend adding up the responses to the questions and then computing their mean; do this separately for each participant, so that each person ends up with a mean score representing extroversion. You can then run a between-groups t-test comparing these mean component scores.
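A minimal sketch of this recommendation, using made-up responses to three items from two hypothetical groups of three participants: each participant’s items are averaged into one composite, and the groups are compared with Student’s pooled-variance independent-samples t statistic. Only the statistic is computed here; a p-value would come from a t distribution with nA + nB - 2 degrees of freedom.

```python
import math
from statistics import mean, variance

# Hypothetical item responses (1-5 scale): one row per participant.
group_a_rows = [[4, 5, 4], [3, 4, 4], [5, 5, 4]]
group_b_rows = [[2, 3, 2], [3, 3, 4], [2, 2, 3]]


def composite_means(rows):
    """One composite per participant: the mean of that participant's items."""
    return [sum(row) / len(row) for row in rows]


def pooled_t(a, b):
    """Student's independent-samples t statistic with pooled variance."""
    na, nb = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se


scores_a = composite_means(group_a_rows)
scores_b = composite_means(group_b_rows)
t = pooled_t(scores_a, scores_b)
print(round(t, 3))  # prints 3.5
```

If SciPy is available, `scipy.stats.ttest_ind(scores_a, scores_b)` returns the same pooled statistic (its default is `equal_var=True`) together with a p-value. Using each participant’s sum instead of the mean gives the same t, because the mean difference and the standard error are scaled by the same factor.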

Just one more thing. If your Likert survey uses the typical strongly disagree to strongly agree scale, the data are ordinal. If you analyze the questions individually, you should use a nonparametric test such as the Wilcoxon rank-sum test. Fortunately, it is acceptable to use parametric tests like the t-test and ANOVA when comparing aggregated component scores, which is what you are doing.
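For the item-by-item case, a minimal sketch of the Wilcoxon rank-sum statistic: pool both groups, rank the responses (tied Likert responses share the average of their ranks), and sum the ranks of one group. The equivalent Mann-Whitney U is returned as well. The responses are hypothetical, and no p-value is computed.

```python
def average_ranks(values):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum(group_a, group_b):
    """Wilcoxon rank-sum W for group A, and Mann-Whitney U = W - nA(nA+1)/2."""
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[: len(group_a)])
    u = w - len(group_a) * (len(group_a) + 1) / 2
    return w, u


# Hypothetical answers to one Likert item in two groups.
print(rank_sum([4, 5, 4, 3], [2, 3, 2, 1]))  # prints (25.5, 15.5)
```

For a p-value, SciPy offers `scipy.stats.mannwhitneyu`, which handles ties; `scipy.stats.ranksums` does not correct for ties, which are common with Likert items.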

#892
Stephen Gorard
Participant

Remember, the use of a significance test (which never makes much sense) would only be justified if your data were based on a full random sample. I have never seen a survey that was. So forget that issue; there is no need for significance tests. Use correlations if both variables are real numbers, and quote R-squared as your ‘effect’ size.
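A minimal sketch of the alternative suggested here: Pearson’s correlation between two variables treated as real numbers, with R-squared (the squared correlation) reported as the effect size. The variables and values are hypothetical.

```python
import math

# Hypothetical paired measurements for five participants.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(x, y)
r_squared = r ** 2  # proportion of variance shared by x and y
print(f"r = {r:.3f}, R-squared = {r_squared:.2f}")  # prints r = 0.775, R-squared = 0.60
```

Python 3.10 and later also provide `statistics.correlation`, which returns the same Pearson coefficient.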

But if you believe the idea that there are ‘levels of measurement’, then you cannot use the agree/disagree responses as real numbers. They are at best ordinal, and you cannot add them up, average them, and so on.

My usual question to students is ‘what did you plan to do for analysis when you designed the instrument in this way?’. None of these issues should be a surprise, and research needs to be (at least loosely) planned beforehand.

I think these composite variables are nonsense (and the extract I have uploaded here shows why); the whole chapter might be a useful heads-up. The attached extract is from a chapter in the Sage Handbook of Measurement, which can be cited as:

Gorard, S. (2010) Measuring is more than assigning numbers. In G. Walford, E. Tucker and M. Viswanathan (Eds.), Sage Handbook of Measurement (pp. 389-408). Los Angeles: Sage.

Instead, I suggest you analyse the best single question from each set (if you could run a PCA, this would perhaps be the item with the highest loading, but PCA is designed to work with real numbers). Then look at the range of responses in the light of the other (interval) variable.
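One possible reading of this suggestion, sketched in Python: pick the item with the largest absolute loading, then, for each response category of that item, look at the range (minimum and maximum) of the interval variable among participants who gave that response. The item names, loadings, and values are all hypothetical, and the loadings are assumed to come from some earlier analysis.

```python
from collections import defaultdict

# Hypothetical loadings from an earlier analysis (made-up values).
loadings = {"talkative": 0.81, "parties": 0.64, "outgoing": 0.72}

# Hypothetical data: each participant's answer (1-5) to the chosen item, and
# the same participant's value on the other (interval) variable.
item_responses = [1, 2, 2, 4, 5, 5]
other_variable = [2.0, 3.0, 6.0, 8.0, 10.0, 12.0]


def best_item(loadings):
    """Name of the item with the largest absolute loading."""
    return max(loadings, key=lambda name: abs(loadings[name]))


def range_by_response(responses, values):
    """(min, max) of `values` for each response category, in category order."""
    groups = defaultdict(list)
    for response, value in zip(responses, values):
        groups[response].append(value)
    return {response: (min(vs), max(vs)) for response, vs in sorted(groups.items())}


print(best_item(loadings))  # prints talkative
for response, (low, high) in range_by_response(item_responses, other_variable).items():
    print(response, low, high)
```

Categories that nobody chose (3 in this made-up example) simply do not appear in the output, which is itself worth noting when judging the spread of responses.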

Next time, plan the analysis before using the instrument, and adjust items accordingly. OK?
