Test for significant difference between correlations



    Hi everyone!

    During the statistical analysis for my thesis I have calculated Spearman’s correlations, and I would now like to test whether two correlations differ significantly. Between samples, I think that would be Fisher’s z, but what is the equivalent within the same sample? Does anyone know?

    Thank you for your help!

    Kind regards

    Dave Collingridge

    There are different options for testing the Spearman correlation coefficient. Perhaps the easiest and best known is the t-test.

    t = r * sqrt[ (n - 2) / (1 - r^2) ]. Refer your t-value to the t-distribution with n - 2 degrees of freedom to find the probability of getting that t-value assuming the population correlation equals zero. If the probability is less than 0.05, then you may declare statistical significance.
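    As a sketch of that calculation in Python (hypothetical data; `numpy` and `scipy` assumed available):

    ```python
    import numpy as np
    from scipy import stats

    # Hypothetical paired observations
    x = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)
    y = np.array([2, 7, 1, 8, 2, 8, 1, 8, 2, 8], dtype=float)

    r, _ = stats.spearmanr(x, y)  # Spearman's rho
    n = len(x)

    # t = r * sqrt[(n - 2) / (1 - r^2)], with n - 2 degrees of freedom
    t = r * np.sqrt((n - 2) / (1 - r**2))
    p = 2 * stats.t.sf(abs(t), df=n - 2)  # two-sided p-value
    print(f"rho = {r:.3f}, t = {t:.3f}, p = {p:.3f}")
    ```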


    Hi Dave, thank you very much for your helpful answer!

    I didn’t know that the t-test could be applied within the same sample, but apparently I was wrong.

    Thank you


    I actually have another question I was hoping you could answer:

    I am running a regression analysis with two predictor variables (one of which is age); all variables are continuous. Based on previous research, both predictors should influence the outcome variable, but I can’t make any assumptions about which predictor makes the better prediction or explains more variance in the outcome variable.

    What kind of regression should I run?

    Should I maybe also consider an ANOVA, even if I have to dichotomise the predictors (I might justify this with the skewed distribution in my data)?

    Again, any help would be greatly appreciated!

    Kind regards

    Dave Collingridge

    Hi Laurine,

    If you have a continuous outcome variable you should run a basic linear regression with the two continuous variables entered as predictors.

    There is a statistic in most regression software packages that will tell you which predictor explains the most variance. It is called the standardized regression coefficient, or beta coefficient (it is not the same as the regression coefficient). If you find that both predictors are statistically significant and that one predictor has a standardized regression coefficient of .350 and the other has a standardized regression coefficient of .550, you will know that the second predictor explains more variance in the outcome.
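    To illustrate, here is a minimal sketch in Python with simulated data (the predictor names and effect sizes are hypothetical): z-scoring all variables before an ordinary least-squares fit yields the standardized (beta) coefficients.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 172
    # Simulated predictors and outcome (hypothetical effect sizes)
    pred_a = rng.normal(size=n)
    pred_b = rng.normal(size=n)
    outcome = 0.3 * pred_a + 0.6 * pred_b + rng.normal(size=n)

    def zscore(v):
        return (v - v.mean()) / v.std()

    # Regressing the z-scored outcome on z-scored predictors gives beta
    # coefficients; no intercept is needed since all means are zero.
    X = np.column_stack([zscore(pred_a), zscore(pred_b)])
    yz = zscore(outcome)
    betas, *_ = np.linalg.lstsq(X, yz, rcond=None)
    print(betas)  # the larger |beta| marks the predictor explaining more variance
    ```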

    As a general rule I do not like reducing continuous variables into dichotomous variables because it may reduce your ability to detect differences, but sometimes it is necessary. A regression is usually better than ANOVA.

    With a regression you don’t have to worry so much about whether your outcome and predictors are skewed, but you will want to check the shape of the distribution of residuals. The residuals are the observed scores minus the predicted scores from your regression model. The residuals should be normally distributed. You can check residual normality with a histogram or “normal P-P plot”. Most software packages have options for plotting residuals. Let me know if you can’t find these.
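    As a sketch of the residual check in Python (simulated data; the Shapiro-Wilk test is added here as a formal complement to the plots Dave mentions):

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 172
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = 1.0 + 0.4 * x1 + 0.7 * x2 + rng.normal(size=n)

    # Fit the regression and compute residuals (observed minus predicted)
    X = np.column_stack([np.ones(n), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef

    # Normality checks: a formal test plus the ingredients of a Q-Q plot
    stat, p = stats.shapiro(residuals)  # Shapiro-Wilk test of normality
    (osm, osr), (slope, intercept, r) = stats.probplot(residuals)
    print(f"Shapiro-Wilk p = {p:.3f}, Q-Q correlation = {r:.3f}")
    ```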




    Hey Dave,

    again thank you very much for your helpful answer!

    Great advice to check for residual normality. I checked with both a histogram and a P-P plot, and they both look fine to me.

    I now have another problem that just appeared, and I don’t know what to do about it. I ran the regression for the whole sample (N=172) with my two predictors, let’s call them A and B. For the whole sample, predictor A turns out to be the better predictor and to explain more variance. However, when I split the sample in two (as I have two age groups), predictor B turns out to be the better one in each of the two groups. The outcome variable is always the same! How is this even possible? Is this a known problem/phenomenon?

    I used the stepwise forward method, as I couldn’t make any assumptions about which predictor would be best.

    Again, thank you for your help!

    Dave Collingridge

    Hi Laurine,

    “How’s it going?”

    Separating the group by gender is like including gender as a variable in your regression. In fact I suggest re-running the regression and including gender as a third variable (coded, for example, as males =1 and females=0). Check if variables A and B switch in terms of accounting for more variability when you include gender as a predictor variable in the regression. If B becomes a better predictor when you include gender, then gender and variable A have shared variability that is being accounted for by gender and no longer by variable A. This exchange of variability from A to gender leaves variable B as the winner in terms of accounting for the most variability.

    This may be what is happening. If you are interested in gender, you should include it as a third variable in your regression.

    Dave C.



    Hi Dave

    Thanks a lot for your answer!

    So I re-ran the regression (I guess you meant age instead of gender? I have two age groups, first-graders and third-graders), and it turns out that when analyzing the entire sample, age is the best predictor (which makes complete sense, as my outcome variable is school achievement).

    I used the enter method this time and put age as the first predictor in the model. When doing this, my two predictors A and B explain about the same amount of variance (very little, actually) and the beta coefficient of both predictors is about .20. When adding age in the last step, both predictors’ beta coefficients are about .15, while age is at .62.

    So does that mean that my predictor A, which would account for the most variance in the entire sample, is confounded with age? How can I prove this? I did check the correlation between them, and it’s .48.

    Thank you for your advice!

    Kind regards


    Dave Collingridge

    Is variable A confounded with age? It depends on whether there is a logical relationship between variable A and age, and on whether A is still significant when age is added to the equation.

    For instance, I recently analyzed some data looking at the relationship between sodium levels and mortality in hospital patients. We entered sodium level as the main predictor along with other variables. One of these was hospital location, namely trauma intensive care unit (ICU) vs. non-trauma hospital patients. It turned out that patient location was a significant predictor; being in the trauma ICU was associated with a higher mortality rate. Initially this was a concern, as it seemed to suggest that one area of the hospital had higher mortality than another. It did not quite make sense. After asking some questions I realized that the trauma ICU dealt with traumatic brain injury (TBI) patients and the other areas did not. So I entered a TBI variable (yes/no) and guess what? Patient location was no longer significant, but the newly entered TBI variable was.

    In this example patient location in the hospital was confounded with traumatic brain injury. There is not a logical relationship between patient location in the hospital and mortality (at least not in a good hospital), but there is a logical relationship between patient location and whether they had a TBI (if patients had a TBI they went to trauma ICU). 

    Is there a logical relationship between variable A and school achievement? If there is a logical relationship then it will probably still be significant after adding age, even though the amount of variance it accounts for (beta coefficient) drops a little after adding age to the equation. A drop in variance accounted for just means that variable A shares some variability with age and in the regression age is accounting for some of that variability now. If variable A is still significant after adding age and the beta coefficient only drops from .20 to .15, and there is a logical relationship between A and the outcome, then I would say A is not confounded with age.

    You could separate your data into 2 groups (first and third graders) and run separate regression analyses. This would likely minimize the impact of age and may give you a more sensitive measure of the influence of variables A and B on school achievement.
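    A small simulation, under assumed effect sizes, can illustrate the shared-variability pattern described above: predictor A’s standardized beta shrinks, but does not vanish, once age enters the model.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    n = 172
    age = rng.normal(size=n)
    # Predictor A shares variance with age (their correlation is around .5,
    # similar to the .48 reported in the thread)
    pred_a = 0.5 * age + rng.normal(size=n)
    outcome = 0.4 * pred_a + 0.6 * age + rng.normal(size=n)

    def std_betas(predictors, y):
        """Standardized (beta) coefficients from an OLS fit on z-scored data."""
        cols = [np.ones(len(y))] + [(v - v.mean()) / v.std() for v in predictors]
        X = np.column_stack(cols)
        yz = (y - y.mean()) / y.std()
        return np.linalg.lstsq(X, yz, rcond=None)[0][1:]

    b_alone = std_betas([pred_a], outcome)[0]          # A by itself
    b_adjusted = std_betas([pred_a, age], outcome)[0]  # A controlling for age
    print(f"beta(A) alone = {b_alone:.2f}, with age = {b_adjusted:.2f}")
    ```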



    Hi Dave

    thanks a lot for your answer!!

    As you say, I don’t think that age is confounded with my variable A, but they share a good amount of variance.

    To get a better understanding, I divided the sample into two age groups (first-graders and third-graders). Age is the best predictor for the whole sample, but when dividing the sample in two, it no longer is, and predictor B explains more variance.

    However, I am not sure whether I should enter age as a control variable/covariate in my regression or not. Previous findings show that both predictors improve with age (between 6 and 9, which is exactly the age range of my sample). So if I enter age as a control variable, does that mean that I disregard the fact that age has an influence (on my predictors and on the outcome variable)? That I “pretend” age isn’t important and has no influence?

    On the other hand, I want to test the influence of my two predictor variables specifically on the outcome variable, which would be a reason to control for age. I am a bit confused here. What is your opinion? Is there any literature on this issue?

    Thank you very much for your help!

    Kind regards


    Dave Collingridge


    Age is the best predictor when you include all kids. When you divide the kids into first and third grades and run separate regressions, age is no longer a significant predictor because you have removed the effect of grade by dividing the kids into age groups. It appears that your outcome/dependent variable is sensitive to changes in ages across grades, but not sensitive to changes in age within grades. This is an interesting find.

    Anyway, these results show that the outcome variable significantly changes with age. However, since you did not include second grade kids, you really have a categorical age predictor (first and third grades). You might consider re-coding the “age” variable as a “grade” variable and code first graders as 0 (baseline/reference category) and third graders as 1. The regression coefficient for this categorical variable will then tell you the average increase in the outcome variable when you move from first to third grade.
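    A minimal sketch of that dummy coding in Python (simulated achievement scores; the group means are hypothetical):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    n_per = 86
    # Hypothetical achievement scores for first and third graders
    first = rng.normal(50, 10, n_per)
    third = rng.normal(58, 10, n_per)

    # Dummy-code grade: first grade = 0 (reference), third grade = 1
    grade = np.concatenate([np.zeros(n_per), np.ones(n_per)])
    score = np.concatenate([first, third])

    X = np.column_stack([np.ones(2 * n_per), grade])
    coef, *_ = np.linalg.lstsq(X, score, rcond=None)
    # coef[1] is the average difference in the outcome between the two grades
    print(coef)
    ```

    With a 0/1 dummy and an intercept, the slope reproduces the difference in group means exactly.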

    Based on what you’ve told me, I think you should include all the kids in the same regression with “grade” replacing the “age” variable. This will show that age (or grade level) affects the outcome while controlling for other predictors, and allow you to test whether the other predictors influence the outcome while controlling for age/grade level. 



    Hey Dave!

    Thank you very much for your help! I have yet another question. Suddenly I’m not sure anymore whether I should use Spearman or Kendall correlations. The reason I have to calculate nonparametric correlations is that most of my variables are not normally distributed at all (skewed: floor effects). I think both Spearman and Kendall have their pros and cons, and I’m really not sure which one to take. My sample is N=172. What is your opinion? Also: can I calculate partial correlations? I was thinking that they are based on Pearson correlations, which means they can’t be applied in this case. Is that true?

    Thank you very much for your advice!

    Kind regards


    Dave Collingridge

    Hi Laurine!

    It is best to use a nonparametric correlation when the data are not normally distributed. Kendall’s tau should be used when you have a small data set with a large proportion of tied ranks. N=172 is not really small, so I think you should use Spearman’s rank correlation.
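    Both coefficients are easy to compute side by side; here is a sketch in Python with simulated floor-effect data (`scipy` assumed available):

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n = 172
    # Skewed count data with a floor effect and many tied ranks
    x = rng.poisson(1.0, size=n).astype(float)
    y = x + rng.normal(size=n)

    rho, p_rho = stats.spearmanr(x, y)
    tau, p_tau = stats.kendalltau(x, y)  # tau-b, which adjusts for ties
    print(f"Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}")
    ```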

    I don’t run many partial correlations; in fact, I don’t think I have ever run one in my own research, so I don’t know much about them other than that they allow you to check the correlation between two variables while controlling for (holding constant) the effect of a third variable. Checking the relationship between two variables while holding a third constant is essentially what regression does. Anyway, if you need to compute partial correlations, check whether your software can run them. I can help with running partial correlations in the R statistics package.
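    If the software at hand lacks a built-in routine, a rank-based partial correlation can be sketched by hand: regress the ranks of each variable on the ranks of the control variable and correlate the residuals. `partial_spearman` below is a hypothetical helper written for illustration, not a library function:

    ```python
    import numpy as np
    from scipy import stats

    def partial_spearman(x, y, z):
        """Rank-based partial correlation of x and y controlling for z:
        correlate the residuals of ranked x and ranked y after regressing
        each on ranked z."""
        rx = stats.rankdata(x)
        ry = stats.rankdata(y)
        rz = np.column_stack([np.ones(len(z)), stats.rankdata(z)])
        res_x = rx - rz @ np.linalg.lstsq(rz, rx, rcond=None)[0]
        res_y = ry - rz @ np.linalg.lstsq(rz, ry, rcond=None)[0]
        return np.corrcoef(res_x, res_y)[0, 1]

    rng = np.random.default_rng(4)
    z = rng.normal(size=172)
    x = z + rng.normal(size=172)  # x and y correlate only through z
    y = z + rng.normal(size=172)
    r_partial = partial_spearman(x, y, z)
    print(r_partial)  # near zero once z is controlled for
    ```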


    Muir Houston

    Never use a stepwise entry method: then it is the computer package building the model, with no link to previous research or theory, just a fishing expedition.

    see here:




    Hey Dave!

    I am almost done with my statistical analyses. When presenting and interpreting my findings, I have some doubts about my regression analyses. I have two regression analyses with two different dependent variables. To what extent is it possible to compare the findings (e.g. the beta coefficients) between these two analyses? I have a formula to compare coefficients and check whether they differ significantly. Is it correct to apply this formula across two different regressions, or should it only be applied within the same regression? And to what extent can I compare and contrast the results in terms of content and draw implications?

    Another doubt I have is the following: I am analyzing different types of errors the children committed within a task. Unfortunately (for me) they made very few such errors, and the distributions are extremely skewed (which is why I don’t use parametric tests with these variables). What does this mean with reference to the variance or other parameters of these error variables? I just don’t have a lot of variance in these variables, do I? What does this mean for my analyses (e.g. correlations)?

    Thank you very much for your help!!

    Kind regards

