Should my control group be included in all analyses?


    Tessa Mearns

    My PhD research is concerned with motivational differences between learners in bilingual and ‘regular’ education. The largest part of my data comes from a survey administered to a total of nearly 900 pupils across 6 schools: 5 that offer both bilingual and regular education, and 1 control school, which only offers regular.

    My question is whether I should be including the control school in all of my analyses. I was doing so at first (analysing on the basis of 3 groups: bilingual, regular and control), but I’m concerned that it’s messing with my significance tests (Chi square), as the control group’s responses are sometimes surprisingly different to the others’. 

    Should I do every analysis twice (once including the control and once without it)? Otherwise I’m concerned about getting significant results, only because the control is so different. Obviously this is interesting in itself, but it isn’t the main point of my research, and after all it is only one school.

    I’ve tried looking this up, but to no avail…

    Any advice would be much appreciated!


    Dave Collingridge

    Greetings Tessa,

    If you are running chi-square goodness-of-fit tests, which compare the proportions in each group for a single characteristic or variable, it would be a good idea to include the control school in each analysis. I might add that the expected proportion for the control school may need to be adjusted to reflect the fact that it is just 1 of the 6 schools in your study.
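
    As a rough sketch of the goodness-of-fit case with adjusted expected proportions, the statistic can be computed by hand. All counts and group sizes below are made up for illustration, and the function name `goodness_of_fit` is hypothetical; expected counts are assumed to be proportional to group size.

```python
# Hypothetical counts of pupils giving one particular answer, by group.
observed = {"bilingual": 120, "regular": 95, "control": 25}

# Hypothetical group sizes, used to set the expected proportions.
# The control school is a single school, so its expected share is small.
group_sizes = {"bilingual": 400, "regular": 400, "control": 100}


def goodness_of_fit(observed, group_sizes):
    """Return the chi-square goodness-of-fit statistic and its degrees of freedom."""
    total_observed = sum(observed.values())
    total_size = sum(group_sizes.values())
    chi2 = 0.0
    for group, count in observed.items():
        # Expected count if answers were spread in proportion to group size.
        expected = total_observed * group_sizes[group] / total_size
        chi2 += (count - expected) ** 2 / expected
    return chi2, len(observed) - 1


chi2, df = goodness_of_fit(observed, group_sizes)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

    With equal expected proportions instead, the small control group would look badly under-represented simply because it is small.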

    If you are running chi-square tests of independence, you can include all three school types as one of your variables. This analysis will tell you whether educational style is associated with another variable.
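
    A minimal sketch of the test of independence, run once with all three groups and once without the control school, so you can see how much of the statistic the control school contributes. The counts are invented and `chi_square_independence` is a hypothetical helper; in practice a library routine such as `scipy.stats.chi2_contingency` would also return a p-value.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    `table` is a list of rows (one per group), each a list of counts per response.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Expected count under independence of group and response.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (count - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical counts: columns are agree / neutral / disagree.
counts = {
    "bilingual": [200, 120, 80],
    "regular": [160, 140, 100],
    "control": [20, 30, 50],
}

all_groups = chi_square_independence(list(counts.values()))
without_control = chi_square_independence([counts["bilingual"], counts["regular"]])
print("three groups (chi-square, df):", all_groups)
print("bilingual vs regular only (chi-square, df):", without_control)
```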

    By running multiple chi-square tests you might run into the problem of an inflated Type I error rate, or at least questions about it from members of your PhD committee. A Bonferroni correction could be used if concerns arise.
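
    The Bonferroni correction itself is simple: divide the overall alpha by the number of tests and compare each p-value against that stricter threshold. A small sketch, with made-up p-values and a hypothetical function name:

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold and a flag per test (True = still significant)."""
    threshold = alpha / len(p_values)
    return threshold, [p <= threshold for p in p_values]


# Hypothetical p-values from four separate chi-square tests.
p_values = [0.001, 0.012, 0.03, 0.2]
threshold, significant = bonferroni(p_values)
print("per-test threshold:", threshold)
print("significant after correction:", significant)
```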

    My last point is that there may be a better way to analyze your data, depending on what your data look like. Are all your variables categorical or do you have some continuous (ratio, interval) data? Binary logistic regression can be a great way to simultaneously evaluate the effect of multiple predictor variables on a binary outcome while keeping alpha inflation under control. The same goes for multiple regression.
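
    To make the logistic regression idea concrete, here is a bare-bones sketch that fits one model with two dummy predictors (bilingual stream, control school) to a binary outcome. The grouped counts are invented, the outcome "high motivation" is an assumed example, and the plain gradient-ascent fit is only for illustration; a statistics package would normally do this and report standard errors as well.

```python
import math

# Hypothetical grouped data: (bilingual, control_school, n_high, n_total).
cells = [
    (0, 0, 160, 400),  # regular stream in schools that also offer bilingual
    (1, 0, 240, 400),  # bilingual stream
    (0, 1, 30, 100),   # control school (regular only)
]


def fit_logistic(cells, learning_rate=1.0, iterations=5000):
    """Fit intercept + two dummy predictors by gradient ascent on the mean log-likelihood."""
    weights = [0.0, 0.0, 0.0]  # intercept, bilingual, control_school
    n_pupils = sum(n_total for _, _, _, n_total in cells)
    for _ in range(iterations):
        gradient = [0.0, 0.0, 0.0]
        for bilingual, control, n_high, n_total in cells:
            x = (1.0, bilingual, control)
            logit = sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-logit))
            residual = n_high - n_total * p  # observed minus predicted successes
            for k in range(3):
                gradient[k] += residual * x[k]
        weights = [w + learning_rate * g / n_pupils for w, g in zip(weights, gradient)]
    return weights


intercept, b_bilingual, b_control = fit_logistic(cells)
print("odds ratio, bilingual vs regular:", round(math.exp(b_bilingual), 3))
print("odds ratio, control school vs regular:", round(math.exp(b_control), 3))
```

    Both comparisons come out of a single model here, rather than from a separate test per pair of groups.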

    Stephen Gorard

    The answer is not a technical one. It must depend on your research questions and on what you intended to do when you set up your study design: in essence, what you are trying to find out using this approach. Do not dredge.

    The previous reply suggests a modelling approach in which both the type of school and the type of teaching can be used as ‘predictors’ of the outcome of interest. This makes sense and allows you to use all of the data without confusion.

    You do not have a ‘control’ since there is no suggestion of an intervention. Comparator?

    Perhaps most importantly, unless there has been some kind of hidden randomisation, your talk of ‘significance’ is inappropriate. The uncertainty you face is not probabilistic, so chi-square etc. are not to be used. Nor should significance be used to retain or reject predictors in a model. Simply describe the findings and the proportionate differences between sub-groups, and use effect sizes (perhaps variance explained) to select predictors.
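
    Describing proportionate differences between sub-groups needs nothing more than the group proportions themselves. A small sketch with invented counts for one survey item (the helper name `difference_in_points` is hypothetical):

```python
# Hypothetical counts of pupils agreeing with one survey item.
agree = {"bilingual": 240, "regular": 160, "comparator": 30}
totals = {"bilingual": 400, "regular": 400, "comparator": 100}

# Proportion agreeing within each sub-group.
proportions = {group: agree[group] / totals[group] for group in agree}


def difference_in_points(group_a, group_b):
    """Difference between two sub-groups, in percentage points."""
    return 100 * (proportions[group_a] - proportions[group_b])


for group, share in proportions.items():
    print(f"{group}: {share:.0%} agree")
print("bilingual minus regular:", round(difference_in_points("bilingual", "regular"), 1), "points")
print("regular minus comparator:", round(difference_in_points("regular", "comparator"), 1), "points")
```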

    Tessa Mearns

    Dear Dave,

    How awful that I didn’t respond to say thanks for taking the time to respond! My apologies.

    Thank you!


    Tessa Mearns

    Dear Stephen,

    Also to you, thanks for the advice!

