Composite score

    #1046
    Mehdi Riazi
    Member

    I was wondering if anyone has experience calculating a composite score and could share it with me. I have in mind a construct (say, for example, cohesion) and am trying to calculate a composite score that represents this construct by adding the individual scores of some variables related to it. Simply put, if I have 5 scores for “connectives”, “non-overlap”, etc. which correlate with each other (implying they are related to the same construct), can I add them up to calculate a composite score for cohesion? The problem of the variables being on different scales would be resolved by first converting each into a z-score and then adding up the z-scores.
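
    To make this concrete, here is a minimal sketch of what I have in mind (Python; the column names and numbers are invented for illustration):

    ```python
    import pandas as pd

    # Invented data: five cohesion-related scores on different scales.
    df = pd.DataFrame({
        "connectives": [12, 15, 9, 20, 14],
        "non_overlap": [0.42, 0.55, 0.31, 0.61, 0.47],
        "referential": [3.1, 4.0, 2.2, 4.8, 3.5],
        "lexical":     [88, 95, 70, 102, 90],
        "semantic":    [1.9, 2.4, 1.1, 2.8, 2.0],
    })

    # Convert each variable to a z-score using its own mean and SD,
    # then add the z-scores to get one composite per essay.
    z = (df - df.mean()) / df.std(ddof=1)
    df["cohesion_composite"] = z.sum(axis=1)
    print(df["cohesion_composite"])
    ```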

    #1058
    Dave Collingridge
    Participant

    Mehdi,

    If the five scores are adequately correlated, then it sounds reasonable to combine them into a composite score. Another check is inter-item consistency, via Cronbach’s alpha. If the 5 scores are on different scales, it is a good idea to normalize across scales by converting them into z-scores using the separate mean and SD of each scale. You would not want to add up the 5 z-scores for each individual; rather, you might try finding the mean and/or median of an individual’s z-scores. This mean could be called a person’s average standardized composite score. For example, if someone has an average standardized composite score of 1.5, that would tell me that she, on average, scored 1.5 standard deviations above the group mean. Just one way to do it. Someone else might have a better idea.
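
    A rough sketch of both steps (Python, with made-up data; I am assuming the five scores arrive as columns of an array):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    # Made-up data: 50 people, 5 correlated scores, standardized per scale.
    shared = rng.standard_normal((50, 1))
    z = 0.7 * shared + 0.7 * rng.standard_normal((50, 5))
    z = (z - z.mean(axis=0)) / z.std(axis=0, ddof=1)

    def cronbach_alpha(items):
        """Cronbach's alpha for an (n_people, k_items) array."""
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1).sum()
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_vars / total_var)

    print("alpha:", cronbach_alpha(z))

    # Average standardized composite: the mean of each person's z-scores.
    composite = z.mean(axis=1)  # e.g. 1.5 => about 1.5 SDs above the group mean
    ```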

    #1057
    Rafael Garcia
    Participant

    If you standardize your scales (as z-scores) and take the mean, you have what some refer to as a unit-weighted factor score (see Factor Analysis by Gorsuch, 1983). It is similar to a weighted factor score (like the one you would get by doing an EFA), but tends to generalize better from study to study. By taking the part-whole correlations between the scales and the unit-weighted factor, you can get analogs of the standardized factor loadings (though they are usually inflated).

    I endorse Dave’s suggestions.
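
    In code, the unit-weighted factor score and the part-whole correlations might look like this (a Python sketch with simulated scales, not a real dataset):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    # Simulated standardized scales sharing one underlying construct.
    construct = rng.standard_normal((100, 1))
    z = 0.7 * construct + 0.7 * rng.standard_normal((100, 5))
    z = (z - z.mean(axis=0)) / z.std(axis=0, ddof=1)

    # Unit-weighted factor score: a plain mean across the scales.
    uwf = z.mean(axis=1)

    # Part-whole correlations: analogs of standardized loadings
    # (inflated, since each scale is itself part of the composite).
    loadings = [np.corrcoef(z[:, j], uwf)[0, 1] for j in range(z.shape[1])]
    print(np.round(loadings, 2))
    ```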

    #1056
    Mehdi Riazi
    Member

    Thank you Dave and Rafael for your prompt responses, which have proved useful. Let me elaborate a bit more and see what you think.

    1) In response to Dave: yes, the scores correlate adequately, with a minimum of r = 0.40, so I thought there was no need to run Cronbach’s alpha. Also, the 5 scores are on different scales, and I’ve converted them into z-scores.

    2) In response to Rafael: I did not go for EFA because there is enough theoretical background indicating that the five variables constitute the construct (cohesion). At the same time, I’d like to include all the variables because, as I said, each theoretically contributes to the construct, which is why I’m interested in the composite score.

    Having said that, I am trying to compare two groups (high and low) of language learners in terms of the “cohesion” construct, so I’m not concerned with individuals. Once I have a single score for “cohesion” (a reasonably calculated composite score), it will be easy to compare the two groups in terms of this construct.

    With this explanation, do you think the way I’ve calculated the composite score is justifiable, and will such a comparison based on a composite score make sense?
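
    For the group comparison, I am imagining something like the following (a Python sketch; the group sizes and scores are invented):

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    # Invented composite cohesion scores for the two proficiency groups.
    high = rng.normal(0.4, 1.0, 40)
    low = rng.normal(-0.4, 1.0, 40)

    # Independent-samples t-test on the composite score.
    t, p = stats.ttest_ind(high, low)

    # Cohen's d with a pooled SD, as an effect size for the difference.
    pooled_sd = np.sqrt((high.var(ddof=1) + low.var(ddof=1)) / 2)
    d = (high.mean() - low.mean()) / pooled_sd
    print(f"t = {t:.2f}, p = {p:.4f}, d = {d:.2f}")
    ```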

    #1055
    Stephen Gorard
    Participant

    Sorry to say, I think what you are proposing is scientifically meaningless. An R of 0.4 (i.e. 16% common variance) is pretty small. But even if R were higher, why would you do this anyway? Suppose a person’s height and foot size were strongly related, and you could imagine that each represents a variable like “size”: why would you conflate the two? What would be achieved over and above using the best single estimate of size? In fact, the latter approach is much safer and produces an answer with much lower error. I know this ‘construct’ approach is common, but then so were religion and belief in a flat earth at one time. Think about what you want to do and why. Sorry to self-reference, but I think the argument against is simply explained in:

    Gorard, S. (2010) Measuring is more than assigning numbers, pp. 389-408 in Walford, G., Tucker, E. and Viswanathan, M. (Eds.) Sage Handbook of Measurement, Los Angeles: Sage

    #1054
    Rafael Garcia
    Participant

    Stephen,
    Are you arguing against modeling latent constructs in general, or only in this specific case?

    I fail to see how using the single best predictor results in much lower error. A unit-weighted factor score would be composed of the common variance, the average specific variance, and the average error. With a sufficiently large number of indicators, the specific and error variances should cancel out. Am I wrong in that assumption?

    I think if the goal is straight prediction, using the best single estimate is appropriate.

    #1053
    Rafael Garcia
    Participant

    Not having a strong theoretical background is MORE of a reason to do an EFA. If there were a strong rationale, a CFA should be used.

    The calculation is defensible, but it only makes sense if there is reason to believe the scales are indicators of a latent construct. If you can, you should use a continuous predictor rather than dichotomous group membership.

    #1052
    Stephen Gorard
    Participant

    I am suggesting not aggregating things that cannot be aggregated. I did provide a reference. There are many more. The same kind of thing appears in:

    Gorard, S. (2008) Quantitative research in education: Volumes 1 to 3, London: Sage

    Here is a summary of part of the argument. If R = 1, or near it, then the measures are all, in effect, of the same thing; therefore use only one. If R is not near 1, the measures are of different things, and averaging them (or whatever) is nonsensical. The notion of composite constructs is based on the errors in each variable being random (if they are biased, then the thing falls apart anyway). If two or more measures with random error are aggregated, the result is obviously going to be less accurate. The more measures are used, the less accurate the results become (given these conditions). If all the measures are trying to get at the same underlying idea, then using the best single one (perhaps the one with the highest loading in a PCA) is safer, as well as simpler.
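
    For what it is worth, picking that single best measure could be done as follows (an illustrative Python sketch with simulated measures; I am not endorsing the construct approach):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    # Simulated standardized measures (fake data, for illustration only).
    shared = rng.standard_normal((100, 1))
    z = 0.7 * shared + 0.7 * rng.standard_normal((100, 5))
    z = (z - z.mean(axis=0)) / z.std(axis=0, ddof=1)

    # First principal component of the correlation matrix.
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    pc1 = eigvecs[:, -1]                     # loadings on the first component

    best = int(np.argmax(np.abs(pc1)))       # measure with the highest loading
    single_best = z[:, best]                 # use this one measure on its own
    print("best measure index:", best)
    ```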

    #1051
    Rafael Garcia
    Participant

    Stephen,

    I do not see how “If two or more measures with random error are aggregated the result is obviously going to be less accurate.” If this were the case, why would we not go to the extreme and use single-item scales? Isn’t one of the motivations behind multiple-item scales that by aggregating the items we negate the error of some of the items? Why would this principle not also apply to using multiple scales versus a single scale? 

    I agree that using a single measure is much simpler, and for straight prediction this is more than sensible. I, however, favor multiple operationalism (Campbell & Fiske, 1959) over single operationalism. Standardizing and averaging scales belonging to a single construct is a more easily digestible way of implementing that approach (in lieu of doing a CFA).
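
    That intuition is easy to check with a small simulation (a toy model assuming one true score and k indicators, each equal to the true score plus independent random error):

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    n = 10_000
    true = rng.standard_normal(n)  # the underlying construct

    for k in (1, 2, 5, 10):
        # k indicators = true score + independent random error
        indicators = true[:, None] + rng.standard_normal((n, k))
        composite = indicators.mean(axis=1)
        r = np.corrcoef(true, composite)[0, 1]
        print(f"k = {k:2d}: r(true, composite) = {r:.3f}")
    ```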

    Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

    #1050
    Mehdi Riazi
    Member

    Thank you, Stephen, for your explanation and your paper, which I will read with interest and use. However, like Rafael, I have difficulty understanding your argument. Take, for example, “writing ability” in language proficiency tests. There are individual components (variables), with descriptors in the rating scales, which underlie the construct of “writing ability”. These categories include, for example, “fluency”, “lexical sophistication”, “grammatical range & accuracy”, etc. The components may have equal or variable weights. Using such an analytic scale, testers then calculate a total score (a composite score?) for the test taker’s “writing ability”. Do you think this is scientifically meaningless?
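
    In the analytic-scale case, the total is just a weighted sum; for instance (weights and scores invented):

    ```python
    import numpy as np

    # Invented analytic-scale scores for one essay:
    # fluency, lexical sophistication, grammatical range & accuracy.
    scores = np.array([6.0, 5.5, 7.0])
    weights = np.array([0.4, 0.3, 0.3])  # component weights, summing to 1

    writing_ability = np.average(scores, weights=weights)
    print(writing_ability)
    ```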

    “Think about what you want to do and why.”

    I want to calculate a score for the latent variable “cohesion” based on the components of this latent variable. Where do I get the components (observable variables) from? As I indicated: from available theories on cohesion. Accordingly, I’ve had the test takers write essays; I have a score for each of the constituents of the latent variable “cohesion”, and I now want to calculate a score for the latent variable itself. What would you suggest?

    #1049
    Stephen Gorard
    Participant

    We are trying to measure accurately something we apparently have no direct measure of. We have a number of proxies, all of which are meant to be asking something similar (hence the R). Each will, according to the assumptions, contain random errors in practice, meaning that the given response by any individual may not be the one that would lead to the most accurate assessment of the underlying measure. Imagine only asking the first proxy question and getting 90% accurate responses (10% random error). If we use this item alone, we will be 90% accurate (to put it simply here). If we ask a second question and aggregate responses, the result will be less than 90% accurate even if the second item is also 90% accurate. Obviously. Most responses to the first item will be ok. Most responses to the second will be ok. If the two responses match (whether in error or not), then we have gained nothing by asking a second question. Of the ones that don’t match, 90% will have been ok on the first item, and so the second question will not be ok and worsens the quality. In only 10% of the non-matching cases will the second item yield a more accurate answer (but even then we will not know, since there is nothing to calibrate against). Thus, it is both simpler and more accurate to use the best single item.

    #1048
    Stephen Gorard
    Participant

    Yes, I think it is scientifically meaningless.

    You have a set of achieved measures or counts. Why not just analyse those, and compare the five means between your two groups using a simple effect size? Of course, as ever, it depends on what your RQ is.
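
    For example (with invented scores; Cohen’s d as the effect size):

    ```python
    import numpy as np

    rng = np.random.default_rng(6)
    # Invented data: 5 measures for each of two groups of 40 learners.
    high = rng.normal(0.3, 1.0, (40, 5))
    low = rng.normal(0.0, 1.0, (40, 5))

    # One simple effect size per measure, no composite needed.
    for j in range(5):
        pooled_sd = np.sqrt((high[:, j].var(ddof=1) + low[:, j].var(ddof=1)) / 2)
        d = (high[:, j].mean() - low[:, j].mean()) / pooled_sd
        print(f"measure {j + 1}: Cohen's d = {d:.2f}")
    ```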

    #1047

    Thank you for providing your very insightful article!
