Composite score
 This topic has 12 replies, 5 voices, and was last updated 6 years, 8 months ago by Dave Collingridge.


30th April 2014 at 1:39 pm #1046 Mehdi Riazi (Member)
I was wondering whether anyone has experience calculating a composite score and could share it with me. I am thinking of a construct (say, cohesion) and am trying to calculate a composite score that represents it by adding the individual scores of some variables related to it. Simply put, if I have 5 scores for “connectives”, “nonoverlap”, etc. that correlate with each other (implying they are related to the same construct), can I add them up to calculate a composite score for cohesion? The problem of the variables being on different scales would be resolved by first converting each into a z-score and then adding up the z-scores.
30th April 2014 at 4:01 pm #1058 Dave Collingridge (Participant)
Mehdi,
If the five scores are adequately correlated, then it sounds reasonable to combine them into a composite score. Another way to check is to examine the inter-question relationship with Cronbach’s alpha. If the 5 scores are on different scales, it is a good idea to normalize across scales by converting them into z-scores, using the separate mean and SD of each scale. Rather than adding up the 5 z-scores for each individual, you might take the mean and/or median of each individual’s z-scores. This mean could be called a person’s average standardized composite score. For example, if someone has an average standardized composite score of 1.5, that would tell me that she scored, on average across the five measures, 1.5 standard deviations above the group mean. Just one way to do it. Someone else might have a better idea.
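A minimal sketch of the two steps described above (averaging per-scale z-scores, and checking Cronbach’s alpha), assuming NumPy and a small made-up data set. The function names and the numbers are hypothetical illustrations, not anything from the posters’ data. Alpha is computed here on the standardized scores because the raw scales differ, which is an assumption about how one would want to apply it.

```python
import numpy as np

def zscore_columns(scores):
    """Standardize each column (scale) with its own mean and sample SD."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)

def standardized_composite(scores):
    """Average standardized composite score: mean z-score per person (row)."""
    return zscore_columns(scores).mean(axis=1)

def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    sum_item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Hypothetical data: 4 learners x 3 measures on different scales.
scores = np.array([
    [12, 0.40, 55],
    [15, 0.55, 60],
    [9, 0.35, 48],
    [18, 0.70, 66],
])
composite = standardized_composite(scores)
alpha = cronbach_alpha(zscore_columns(scores))
print(composite, alpha)
```

Because every column is centred before averaging, the composite scores have a group mean of zero, so a value of 1.5 reads as 1.5 SDs above the group mean on average.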
30th April 2014 at 5:41 pm #1057 Rafael Garcia (Participant)
If you standardize your scales (normal) and take the mean, you have what some refer to as a unit-weighted factor score (see Factor Analysis by Gorsuch, 1983). It is similar to a weighted factor score (like you would get by doing an EFA), but tends to generalize better from study to study. By taking the part-whole correlations between the scales and the unit-weighted factor, you can get analogs of the standardized factor loadings (though they are usually inflated).
I endorse Dave’s suggestions.
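The part-whole correlations mentioned above could be sketched as follows, assuming NumPy. The function name and the `exclude_self` option are hypothetical additions for illustration: correlating each scale with a composite that omits that scale is one common way to reduce the inflation that comes from a scale being part of its own composite.

```python
import numpy as np

def part_whole_correlations(scores, exclude_self=False):
    """Correlate each standardized scale with the unit-weighted composite.

    With exclude_self=True the composite is rebuilt without the scale in
    question, so the scale is not correlated with itself.
    """
    scores = np.asarray(scores, dtype=float)
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    k = z.shape[1]
    result = np.empty(k)
    for j in range(k):
        if exclude_self:
            whole = np.delete(z, j, axis=1).mean(axis=1)
        else:
            whole = z.mean(axis=1)
        result[j] = np.corrcoef(z[:, j], whole)[0, 1]
    return result
```

Calling it once with `exclude_self=False` and once with `exclude_self=True` on the same data gives a rough sense of how much of each correlation is due to the overlap.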
1st May 2014 at 12:57 am #1056 Mehdi Riazi (Member)
Thank you Dave and Rafael for your prompt responses which prove useful. Let me elaborate a bit more and see what you think.
1) In response to Dave: yes, the scores correlate adequately, with a minimum of r = 0.40, so I thought there was no need to run Cronbach’s alpha. Also, the 5 scores are on different scales and I’ve converted them into z-scores.
2) In response to Rafael, I did not go for EFA because there is enough theoretical background that the five variables constitute the construct (cohesion). On the other hand, I’d like to include all the variables because, as I said, theoretically each has a contribution to the construct; which explains why I’m interested in the composite score.
Having said that, I am trying to compare two groups (high and low) of language learners in terms of the “cohesion” construct, and so I’m not concerned with individuals. Once I have a single score for “cohesion” (a reasonably calculated composite score) then it’s easy to compare the two groups in terms of this construct.
With this explanation, do you think the way I’ve calculated the composite score is justifiable? And will a comparison based on such a composite score make sense?
1st May 2014 at 6:25 pm #1055 Stephen Gorard (Participant)
Sorry to say, I think what you are proposing is scientifically meaningless. R of 0.4 (i.e. 16% common variance) is pretty small. But even if R is higher, why would you do this anyway? Let’s say a person’s height and foot size were strongly related, and you could imagine that each represents a variable like size. Why would you conflate the two? What would be achieved over and above using the best single estimate of size? In fact the latter approach is much safer and produces an answer with much lower error. I know this ‘construct’ approach is common, but then so were religion and belief in a flat earth at one time. Think about what you want to do and why. Sorry to self-reference, but I think the argument against is simply explained in:
Gorard, S. (2010) Measuring is more than assigning numbers, pp. 389-408 in Walford, G., Tucker, E. and Viswanathan, M. (Eds.) Sage Handbook of Measurement, Los Angeles: Sage
1st May 2014 at 6:58 pm #1054 Rafael Garcia (Participant)
Stephen,
Are you arguing against modeling latent constructs in general, or only in the specific case here? I fail to see how using the single best predictor results in much lower error. A unit-weighted factor score would be comprised of the common variance, average specific variance, and average error. With a sufficiently large number of indicators, the specific and error variances should cancel out. Am I wrong on that assumption?
I think if the goal is straight prediction, using the best single estimate is appropriate.
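The reasoning about common, specific, and error variance can be written out as a short derivation. This is a sketch under assumptions not stated in the thread: each standardized indicator $z_j$ is a loading $\lambda_j$ times a common factor $F$ plus a specific part $s_j$ and an error part $e_j$, all mutually uncorrelated, with $\bar{\lambda}$ the mean loading and $\sigma^2$ an upper bound on each indicator's specific-plus-error variance.

```latex
\begin{align*}
z_j &= \lambda_j F + s_j + e_j, \qquad j = 1, \dots, k \\
C &= \frac{1}{k}\sum_{j=1}^{k} z_j
   = \bar{\lambda} F + \frac{1}{k}\sum_{j=1}^{k} s_j + \frac{1}{k}\sum_{j=1}^{k} e_j \\
\operatorname{Var}(C) &= \bar{\lambda}^{2}\operatorname{Var}(F)
   + \frac{1}{k^{2}}\sum_{j=1}^{k}\bigl(\operatorname{Var}(s_j) + \operatorname{Var}(e_j)\bigr)
   \le \bar{\lambda}^{2}\operatorname{Var}(F) + \frac{\sigma^{2}}{k}
\end{align*}
```

Under these assumptions the specific and error contribution does not vanish outright but shrinks in proportion to $1/k$. The derivation says nothing about whether the indicators really share one common factor, which is the separate point in dispute.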
1st May 2014 at 7:05 pm #1053 Rafael Garcia (Participant)
Not having a strong theoretical background is MORE of a reason to do an EFA. If there were a strong rationale, a CFA should be used.
The calculation is defensible, but only makes sense if there is reason to believe the scales are indicators of a latent construct. If you can, you should use a continuous predictor not dichotomous group membership.
1st May 2014 at 10:49 pm #1052 Stephen Gorard (Participant)
I am suggesting not aggregating things that cannot be aggregated. I did provide a reference. There are many more. The same kind of thing appears in:
Gorard, S. (2008) Quantitative research in education: Volumes 1 to 3, London: Sage
Here is a summary of part of the argument. If R = 1 or near it, then the measures are all of the same thing, in effect. Therefore use only one. If R is not near 1, the measures are of different things and averaging them (or whatever) is nonsensical. The notion of composite constructs is based on the errors in each variable being random (if they are biased then the thing falls apart anyway). If two or more measures with random error are aggregated the result is obviously going to be less accurate. The more measures are used the less accurate the results become (given these conditions). If all measures are trying to get at the same underlying idea then using the best single one (perhaps the highest loading in PCA) is safer (as well as simpler etc.).
2nd May 2014 at 12:00 am #1051 Rafael Garcia (Participant)
Stephen,
I do not see how “If two or more measures with random error are aggregated the result is obviously going to be less accurate.” If this were the case, why would we not go to the extreme and use single-item scales? Isn’t one of the motivations behind multiple-item scales that by aggregating the items we negate the error of some of the items? Why would this principle not also apply to using multiple scales versus a single scale?
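The multiple-item principle can be checked numerically with a small simulation. This is a sketch assuming NumPy and a simple model that is not taken from the thread: every measure equals the same true score plus independent, normally distributed random error of equal size. The function name and parameter values are hypothetical.

```python
import numpy as np

def accuracy_of_mean(k, n=5000, error_sd=1.0, seed=0):
    """Correlation between the true score and the mean of k noisy measures."""
    rng = np.random.default_rng(seed)
    true_score = rng.normal(size=n)
    # Assumed model: measure = true score + independent random error.
    measures = true_score[:, None] + rng.normal(scale=error_sd, size=(n, k))
    return np.corrcoef(true_score, measures.mean(axis=1))[0, 1]

# Compare a single measure with averages of 2, 5, and 10 measures.
for k in (1, 2, 5, 10):
    print(k, round(accuracy_of_mean(k), 3))
```

Under this model the expected correlation is $1/\sqrt{1 + \text{error\_sd}^2 / k}$, so it rises with k. The simulation does not cover measures that tap different things, biased errors, or categorical responses, so it speaks only to the random-error part of the disagreement.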
I agree that using a single measure is much simpler, and for straight prediction, this is more than sensible. I, however, favor multiple operationalism (Campbell & Fiske, 1959) over single operationalism. Standardizing and averaging scales belonging to a single construct is a way of implementing the approach in a more easily digestible way (in lieu of doing a CFA).
Campbell, D. T., & Fiske, D. W. “Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix.” Psychological Bulletin, 1959, 56, 81-105.
2nd May 2014 at 2:14 am #1050 Mehdi Riazi (Member)
Thank you Stephen for your explanation and your paper, which I will read with interest and use. However, like Rafael, I have difficulty understanding your argument. Take, for example, “writing ability” in language proficiency tests. There are individual components (variables) with descriptors in the rating scales which underlie the construct of “writing ability”. These categories include, for example, “fluency”, “lexical sophistication”, “grammatical range & accuracy” etc. The components may have equal or variable weights. Using such an analytic scale, testers then calculate a total score (composite score?) for the test taker’s “writing ability”. Do you think this is scientifically meaningless?
“Think about what you want to do and why.”
I want to calculate a score for the latent variable of “Cohesion” based on the components of this latent variable. Where do I get the components (observable variables) from? As I indicated, from available theories on cohesion. Accordingly, I’ve had the test takers write essays, I have a score for each of the constituents of the latent variable “cohesion”, and I now want to calculate a score for the latent variable. What would you suggest?
2nd May 2014 at 8:26 am #1049 Stephen Gorard (Participant)
We are trying to measure accurately something we apparently have no direct measure of. We have a number of proxies all of which are meant to be asking something similar (hence the R). Each will, according to assumptions, contain random errors in practice, meaning that the given response by any individual may not be the one that would lead to the most accurate assessment of the underlying measure. Imagine only asking the first proxy question and getting 90% accurate responses (10% random error). If we use this item alone we will be 90% accurate (to put it simply here). If we ask a second question and aggregate responses the result will be less than 90% accurate even if the second item is also 90% accurate. Obviously. Most responses to the first item will be ok. Most responses to the second will be ok. If the two responses match (whether in error or not) then we have gained nothing by asking a second question. Of the ones that don’t match 90% will have been ok on the first item, and so the second question will not be ok and worsens the quality. In only 10% of the nonmatching cases will the second item yield a more accurate answer (but even then we will not know since there is nothing to calibrate with). Thus it is both simpler and more accurate to use the best single item.
2nd May 2014 at 8:31 am #1048 Stephen Gorard (Participant)
Yes, I think it is scientifically meaningless.
You have a set of achieved measures or counts. Why not just analyse those, and compare the five means between your two groups using a simple effect size? Of course, as ever, it depends what your RQ is.
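Comparing the five means between the two groups could be sketched as follows, assuming NumPy. The post does not say which effect size is meant; Cohen's d with a pooled standard deviation is used here as one common choice, and the function names are hypothetical.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled sample SD."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def effect_sizes_by_measure(high, low):
    """One effect size per column (measure); rows are learners."""
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    return np.array([cohens_d(high[:, j], low[:, j]) for j in range(high.shape[1])])
```

Passing a learners-by-measures array for each group returns one d per measure, so the five measures are reported side by side rather than merged into a single score.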
12th May 2014 at 12:41 pm #1047 Ruth M Tappin (Member)
Thank you for providing your very insightful article!
