A question about preparing questionnaire data for statistical analysis?

    Marko Sobol

    The question may seem trivial, but although I have some experience with statistical analysis, I’m facing a dilemma nevertheless. Let’s say I want to test statistically for differences between two or more groups, and the dependent variable was measured with a questionnaire, say on an interval scale. The construct I want to test is measured with more than one question or statement (e.g. extraversion tested with several different statements: “I consider myself a talkative person.”, “I like to attend parties.” and so on), and each of these items is answered on a five-point Likert scale. The dilemma is this: when I combine each participant’s answers to these items to obtain a total score (e.g. for extraversion), should I simply sum the raw answers across all the items that measure extraversion (i.e. 3+5+4+3+5, etc.), or should I calculate the mean or some other value and then proceed with the analysis using that value as data?

    What would the answer be if some of the items were answered on a five-point Likert scale and some on another type of scale, and both needed to be totalled somehow to define one underlying construct?
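
    As a rough illustration of the two options in the question (not an answer given in the thread): with complete data, the sum and the mean differ only by a constant factor, so they order participants identically. For items on different response scales, one common option is to rescale each item to a 0-1 range before averaging. That option is an assumption here, not something proposed in the thread, and the data and the `rescale` helper below are hypothetical.

```python
# Hypothetical responses: one row per participant, five extraversion items (1-5).
responses = [
    [3, 5, 4, 3, 5],
    [2, 2, 3, 1, 2],
    [4, 4, 5, 4, 3],
]

# Option 1: total score per participant.
sum_scores = [sum(row) for row in responses]

# Option 2: mean score per participant (stays on the original 1-5 metric).
mean_scores = [sum(row) / len(row) for row in responses]


def rescale(value, low, high):
    """Map a response from the range [low, high] onto [0, 1]."""
    return (value - low) / (high - low)


# Mixed formats for one participant: (response, scale minimum, scale maximum).
# Two five-point items and one seven-point item, all hypothetical.
mixed_items = [(4, 1, 5), (6, 1, 7), (3, 1, 5)]
rescaled = [rescale(value, low, high) for value, low, high in mixed_items]
mixed_score = sum(rescaled) / len(rescaled)

print(sum_scores, mean_scores, round(mixed_score, 3))
```

    The mean has the practical advantage of staying on the original response metric, which makes scores easier to read when constructs have different numbers of items.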

    Dave Collingridge


    Anytime you want to combine certain questions into a single score representing a construct like extroversion, you should first check whether those questions are indeed representative of the construct. This can be done with principal components analysis (PCA). In a PCA, the questions believed to measure extroversion should “load” onto the same component or factor with adequate loadings. I once had a researcher bring me questionnaire data that she wanted analyzed by combining questions and then comparing the aggregate scores by gender. A PCA showed that none of the questions she wanted combined loaded onto the same factor. She ended up analyzing the questions separately.
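
    To make the idea of “loading” concrete, here is a minimal dependency-free sketch that extracts only the first principal component of the item correlation matrix by power iteration. The data, the item names and the function names are hypothetical; a real analysis would normally use a statistics package and inspect more than one component.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def first_component_loadings(items, iterations=200):
    """Return (eigenvalue, loadings) for the first principal component.

    `items` is a list of columns, one list of responses per question.
    Loadings are the eigenvector entries scaled by sqrt(eigenvalue).
    """
    k = len(items)
    corr = [[pearson(items[i], items[j]) for j in range(k)] for i in range(k)]
    vec = [1.0] * k
    for _ in range(iterations):
        # Multiply by the correlation matrix, then renormalise to unit length.
        nxt = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue.
    eigenvalue = sum(
        vec[i] * sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)
    )
    return eigenvalue, [v * sqrt(eigenvalue) for v in vec]


# Hypothetical data: six participants, three items.
q1 = [1, 2, 3, 4, 5, 4]
q2 = [2, 2, 3, 5, 5, 4]
q3 = [4, 2, 3, 3, 2, 4]  # an item that may not belong with the other two

eigenvalue, loadings = first_component_loadings([q1, q2, q3])
for name, loading in zip(["q1", "q2", "q3"], loadings):
    print(name, round(loading, 2))
```

    In this made-up example the first two items load strongly on the component and the third loads much more weakly, which is the kind of pattern that would argue against folding it into the same score.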

    As for computing an aggregate score, I recommend adding up the responses to each question and then computing the mean of those scores. Do this separately for each participant, so that each person ends up with a mean score representing extroversion. You can then run a between-groups t-test on the mean component scores.
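
    The recommendation above can be sketched as follows, assuming a classic pooled-variance (Student) t statistic; the thread does not say which variant of the t-test is meant. The data and function names are hypothetical, and the sketch stops at the t statistic and degrees of freedom, since a p-value would normally come from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def scale_score(item_responses):
    """Mean of one participant's responses across the items of a construct."""
    return sum(item_responses) / len(item_responses)


def pooled_t(x, y):
    """Student's independent-samples t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df


# Hypothetical data: three participants per group, five items each (1-5).
group_a = [[3, 5, 4, 3, 5], [4, 4, 5, 4, 3], [5, 4, 4, 5, 4]]
group_b = [[2, 2, 3, 1, 2], [3, 2, 2, 3, 2], [1, 2, 2, 2, 3]]

scores_a = [scale_score(p) for p in group_a]
scores_b = [scale_score(p) for p in group_b]

t, df = pooled_t(scores_a, scores_b)
print(scores_a, scores_b, round(t, 2), df)
```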

    Just one more thing. If your Likert survey uses the typical strongly disagree to strongly agree scale, this is ordinal data. If you analyze the questions individually, you should use a nonparametric test like the Wilcoxon Rank Sum. Fortunately, it is acceptable to use parametric tests like the t-test and ANOVA when comparing aggregated component scores, which is what you are doing.
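
    For the single-item case, here is a minimal sketch of the rank-sum statistic with a normal approximation. The data and function names are hypothetical, and the approximation below deliberately omits the tie correction, which matters for Likert items because they produce many ties; a statistics package would apply it.

```python
from math import sqrt


def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum(x, y):
    """Return (W, z): the rank sum of group x and its normal-approximation z."""
    nx, ny = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:nx])
    expected = nx * (nx + ny + 1) / 2
    sd = sqrt(nx * ny * (nx + ny + 1) / 12)  # no tie correction (assumption)
    return w, (w - expected) / sd


# Hypothetical responses of two groups to one five-point item.
group_a = [4, 5, 3, 4, 5]
group_b = [2, 3, 1, 2, 3]

w, z = rank_sum(group_a, group_b)
print(w, round(z, 2))
```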

    Stephen Gorard

    Remember, the use of a significance test (which does not make much sense ever) would only be justified if your data were based on a full random sample. I have never seen a survey that was. So, forget that issue. No need for sig tests. Use correlations if both variables are real numbers, and quote R-squared as your ‘effect’ size.
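
    A minimal sketch of the correlation and R-squared mentioned above, for two variables that are both real numbers. The data and the function name are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired measurements on five cases.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2  # proportion of variance in y linearly associated with x
print(round(r, 3), round(r_squared, 3))
```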

    But, if you believe the idea that there are ‘levels of measurement’, then you cannot use the agree/disagree responses as real numbers. They are at best ordinal and you cannot add them up etc. 

    My usual question to students is ‘what did you plan to do for analysis when you designed the instrument in this way?’. None of these issues should be a surprise, and research needs to be (at least loosely) planned beforehand. 

    I think these composite variables are nonsense (and the extract I have uploaded here shows why). The whole chapter might be a useful heads-up. I attach an extract from a chapter in the Sage Handbook of Measurement, that can be cited as:

    Gorard, S. (2010) Measuring is more than assigning numbers, pp.389-408 in Walford, G., Tucker, E. and Viswanathan, M. (Eds.) Sage Handbook of Measurement, Los Angeles: Sage

    Instead, I suggest you analyse the best single question from each set (which if you could do PCA would perhaps be the one with the highest loading, but PCA is designed to work with real numbers). And look at the range of response in the light of the other (interval) variable. 

    Next time, plan the analysis before using the instrument, and adjust items accordingly. OK?

    Marko Sobol

    Thank you for your answer, it was very informative.

    Marko Sobol

    Thank you for your answer, also. I’ll be glad to read through the documents and referenced texts you provided.

    John F Hall

    Check out the tutorials on my site, http://www.surveyresearch.weebly.com especially


    It’s worth doing a quick correlation to see how well the items hang together. Reverse-polarity items will need to be recoded before you create a scale score. Technically your items are ordinal, but old hands like me ignore the rules and add them up anyway. We need a report yesterday, not a PhD thesis in three years’ time! If you’re using SPSS, you can send me your questionnaire and *.sav file and I’ll check it out, but I’m not doing your homework for you.
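
    A minimal sketch of recoding a reverse-polarity item and checking how it correlates with another item before and after. The item names, the data and the `reverse_code` helper are hypothetical, and the sketch assumes a 1-5 response scale.

```python
from math import sqrt


def reverse_code(response, low=1, high=5):
    """Flip a response on a low..high scale, so 1 becomes 5, 2 becomes 4, etc."""
    return low + high - response


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical responses from five participants.
talkative = [5, 4, 2, 1, 3]     # positively worded item
prefer_alone = [2, 1, 4, 5, 3]  # negatively worded (reverse-polarity) item

r_before = pearson_r(talkative, prefer_alone)

prefer_alone_recoded = [reverse_code(v) for v in prefer_alone]
r_after = pearson_r(talkative, prefer_alone_recoded)

# Scale score per participant, computed only after recoding.
scale_scores = [(a + b) / 2 for a, b in zip(talkative, prefer_alone_recoded)]
print(round(r_before, 2), round(r_after, 2), scale_scores)
```

    A strongly negative correlation that turns positive after recoding is the expected signature of a reverse-polarity item; adding it to the scale without recoding would cancel out the other items.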
