Question about sample size in multiple regression


Viewing 15 posts - 1 through 15 (of 20 total)

    I have a question about sample size calculations. I would like to look at predictors of stage of adoption (participants will be classified into 7 categories – unaware, unengaged by the issue, undecided, decided not to act, decided to act, action, and maintenance). I have 14 variables that I would like to examine as potential predictors. What, specifically, is the best way to determine the sample size for this type of analysis?

    Dave Collingridge

    Greetings Lisa,

    Power and sample size calculations for multiple regression usually require specifying the squared multiple correlation coefficient (R²) or the specific correlations between the predictors and the outcome variable. Either approach assumes that the outcome is a continuous (ratio or interval level) variable. However, unless I’ve misunderstood, your outcome (stage of adoption) does not look like a continuous variable. You might consider analyses in which the outcome is categorical.
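    For reference, a sketch of the standard relationship behind these calculations (Cohen's convention, which G*Power also follows), for u predictors and N cases. The overall test of R² = 0 is an F test; under the alternative the statistic is noncentral F with the noncentrality shown:

```latex
f^2 = \frac{R^2}{1 - R^2}, \qquad
F = \frac{R^2 / u}{(1 - R^2) / (N - u - 1)} \sim F(u,\; N - u - 1,\; \lambda), \qquad
\lambda = f^2 N
```

    For example, Cohen's "medium" value f² = 0.15 corresponds to R² = 0.15 / 1.15 ≈ 0.13.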


    Thank you, Dave, for your response. I see your point. Would you suggest logistic regression, then, since the outcome is actually categorical (though you could argue that it is an ordered category)?

    Dave Collingridge

    If you can reduce stage of adoption to two categories, then logistic regression might work. If your categories cannot be reduced and have no inherent ordering, multinomial regression may work. If the categories are ordered, you might consider ordinal regression. Beware that ordinal and multinomial regression are quite sophisticated. I have not seen power and sample size tools for these analyses.


    Thank you very much for the direction!  Much appreciated.

    Stephen Gorard

    As already stated, a standard multiple linear regression would not suit here. But how is the classification of adoption stage to be achieved? If it is based on a score, then that score could be used as the predicted (outcome) variable instead.

    In general, samples should be as large as feasible. If that is not large enough to be convincing, then the work cannot be done. A minimum to convince me might be around 50 times the number of predictors (counting each dummy variable as a predictor in its own right). ‘Power’, of course, is irrelevant, since it is based on the flawed logic of significance testing.

    I might try collapsing the classes into three (not really aware, not acting, acting, or similar), and then running two binary logistic comparisons: A vs B and B vs C. Note that you will need roughly the same number of cases in each outcome group after grouping (otherwise the skew will prevent much of the variation from being explained).
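    A minimal Python sketch of the regrouping suggested above, assuming one possible mapping of the seven stages onto three groups (the assignment in STAGE_TO_GROUP and the helper names are hypothetical, for illustration only). It builds the A vs B and B vs C outcomes and reports how balanced each one is; fitting the logistic models themselves is not shown.

```python
from collections import Counter

# Hypothetical assignment of the seven stages to three groups
# (A = not really aware, B = not acting, C = acting). Adjust to your own scheme.
STAGE_TO_GROUP = {
    "unaware": "A",
    "unengaged by the issue": "A",
    "undecided": "B",
    "decided not to act": "B",
    "decided to act": "B",
    "action": "C",
    "maintenance": "C",
}


def binary_comparisons(stages):
    """Build the two binary outcomes, A vs B and B vs C.

    Each comparison is a list of (case_index, outcome) pairs, with outcome 1
    for the later group and 0 for the earlier one; cases in the third group
    are left out of that comparison.
    """
    groups = [STAGE_TO_GROUP[s] for s in stages]
    comparisons = {}
    for early, late in (("A", "B"), ("B", "C")):
        comparisons[f"{early} vs {late}"] = [
            (i, int(g == late)) for i, g in enumerate(groups) if g in (early, late)
        ]
    return comparisons


def minority_share(pairs):
    """Share of cases in the smaller outcome category (0.5 = perfectly balanced)."""
    if not pairs:
        return 0.0
    counts = Counter(outcome for _, outcome in pairs)
    return min(counts[0], counts[1]) / len(pairs)
```

    A minority share well below 0.5 in either comparison is the kind of skew the post warns about.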


    As a rule of thumb: 10 to 15 cases per predictor.
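    The cases-per-predictor rules in this thread (10 to 15 here, about 50 in Stephen's post above) are simple multiplications, but the predictor count should include dummy variables: a categorical predictor with k levels contributes k − 1 dummies. A small Python sketch; the function names are hypothetical.

```python
def predictor_count(n_continuous, categorical_levels):
    """Count predictors, treating each dummy variable as a predictor in its own right.

    n_continuous: number of predictors entered as single terms.
    categorical_levels: list with the number of levels of each categorical predictor;
    a predictor with k levels is dummy-coded into k - 1 columns.
    """
    return n_continuous + sum(k - 1 for k in categorical_levels)


def rule_of_thumb_n(n_predictors, cases_per_predictor):
    """Minimum sample size implied by a cases-per-predictor rule."""
    return n_predictors * cases_per_predictor
```

    With the original poster's 14 predictors, all entered as single terms, the rules give 140 to 210 cases (10 to 15 per predictor) or 700 cases (50 per predictor). If, hypothetically, two of the 14 were categorical with four levels each, the count would rise to 12 + 3 + 3 = 18 predictors.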

    Robert Rieg

    You can use the following software to calculate the sample size needed to achieve a given level of power (typically at least 80%, preferably 90%):

    G*Power 3:

    It is free, and I have found it very useful.

    Select the test family “F tests” and the statistical test “Linear multiple regression: Fixed model, R² deviation from zero”.

    Parameters: effect size f² = 0.15 (a medium effect), alpha = 0.05, power = 0.90, number of predictors = 14 (your number). You will get 166 as the required sample size.
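    To show what such software computes for this test, here is a standard-library Python sketch, assuming the Cohen/G*Power convention for the fixed-model R² test (noncentrality λ = f² × N, numerator df = number of predictors, denominator df = N − predictors − 1). It is not G*Power's own code: the incomplete beta function uses the textbook continued fraction, the noncentral F is written as a Poisson mixture, and all function names are hypothetical. Alpha enters only through the critical value of the F test.

```python
import math


def _betacf(a, b, x, max_iter=1000, eps=1e-15):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def _beta_quantile(p, a, b, iters=60):
    """Inverse of I_x(a, b) by bisection (I_x is increasing in x)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if reg_inc_beta(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def _noncentral_beta_cdf(x, a, b, lam, tol=1e-12):
    """CDF of the noncentral beta underlying the noncentral F:
    a Poisson(lam / 2) mixture of I_x(a + j, b). Assumes a modest lam."""
    half = lam / 2.0
    w = math.exp(-half)  # Poisson weight for j = 0
    total, used = 0.0, 0.0
    for j in range(100_000):
        total += w * reg_inc_beta(x, a + j, b)
        used += w
        if j > half and 1.0 - used < tol:  # remaining weight is negligible
            break
        w *= half / (j + 1)
    return total


def regression_power(n, n_predictors, f2, alpha):
    """Power of the overall F test of R^2 = 0 (fixed model), lambda = f2 * n."""
    a = n_predictors / 2.0
    b = (n - n_predictors - 1) / 2.0
    # Critical value on the beta scale y = u*F / (u*F + v), which is monotone in F.
    y_crit = _beta_quantile(1.0 - alpha, a, b)
    return 1.0 - _noncentral_beta_cdf(y_crit, a, b, f2 * n)


def sample_size(n_predictors, f2, alpha, target_power):
    """Smallest n whose power reaches target_power (simple linear search)."""
    n = n_predictors + 2  # smallest n with at least one denominator df
    while regression_power(n, n_predictors, f2, alpha) < target_power:
        n += 1
    return n
```

    With the settings above, `sample_size(14, 0.15, 0.05, 0.90)` should land close to the 166 reported by G*Power; if it does not, suspect the sketch before the software.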

    Stephen Gorard

    The major problem with such software and such an approach is that it is predicated on significance testing. This cannot be used where the sample is not true, complete and random – and which sample ever is? And even if the sample is complete, significance does not provide the relevant answers here. ‘Effect’ sizes do.

    Aizazullah Khan

    I think you should take 5 observations per variable.

    Robert Rieg

    The point of this software is to calculate sample size given a predefined power, not statistical significance. Power is basically the probability of detecting an effect if there really is one. In that sense, since I work with effect sizes and confidence intervals, I do not see why this procedure should rely (solely) on statistical significance. One parameter is the proposed alpha value, but you also need that for confidence intervals (1 − alpha).

    Stephen Gorard

    Sorry Robert – but the calculations you describe ARE predicated on significance testing. The alpha used in power calculations is the same as the significance alpha, and means exactly the same thing. The software's algorithm works out how many cases would be needed to have a certain probability (often 80%) of detecting a difference (pattern or trend) at the alpha level of significance. Since significance testing does not work, nor do power calculations.


    Exactly the same problems arise with confidence intervals (which are just another form of significance test, for a range rather than a point). They would also require a full, true random sample even if they worked as intended. They do not work, for the same reasons that significance tests do not: they do not provide the probability we want, but are often misinterpreted as doing so. They give P(D|H), not P(H|D).
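    The difference between P(D|H) and P(H|D) can be illustrated with Bayes' theorem. The numbers below are purely hypothetical: H is "the null hypothesis is true", D is "a significant result", P(D|H) = alpha = 0.05, P(D|not H) = 0.8 (the power), and the prior P(H) = 0.9.

```python
def posterior(prior_h, p_d_given_h, p_d_given_not_h):
    """P(H | D) by Bayes' theorem, from the prior P(H) and the two likelihoods."""
    # Total probability of observing D under either hypothesis.
    p_d = p_d_given_h * prior_h + p_d_given_not_h * (1.0 - prior_h)
    return p_d_given_h * prior_h / p_d
```

    Here `posterior(0.9, 0.05, 0.8)` returns approximately 0.36 (0.045 / 0.125): even though P(D|H) is only 0.05, the probability that the null hypothesis is true given a significant result is far higher under these assumptions.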


    Effect sizes have no such assumptions. An effect size can be quoted validly for a population, convenience sample, incomplete random sample etc.


    Sample size is important, and is about how many cases would be needed to be convincing (to warrant a claim to knowledge).


    Dear Lisa Carter,


    There is software called a sample size calculator; you can use it to get an accurate sample size.


    You can also use one of the random sampling procedures, such as multi-stage or stratified sampling.
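    A minimal sketch of proportionate stratified random sampling using only Python's `random` module (multi-stage sampling is not shown). The function name and the `stratum_of` callback are hypothetical; the sketch draws the same fraction from each stratum and does not decide how large the sample should be.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, fraction, seed=None):
    """Proportionate stratified random sample.

    population: iterable of units; stratum_of: function giving each unit's stratum;
    fraction: share of each stratum to draw (rounded per stratum, so the total
    may differ slightly from fraction * len(population)).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(fraction * len(members))
        sample.extend(rng.sample(members, k))  # sampling without replacement
    return sample
```

    For instance, with 100 units split evenly into four strata, a fraction of 0.2 draws 5 units from each stratum.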


    Prof. Dr. Subhadra Iyengar, PhD

    HOD, Research Department, PSG, Coimbatore, Tamil Nadu, India

    Stephen Gorard

    Again, sorry Subhadra – the same applies as above. Do not use significance for sample size calculations (remember Meehl and all that). Just consider what would be convincing in relation to the number of variables.


    Thank you, Stephen, I will reconsider. I thought Lisa was concerned about calculating the sample size, so I made that suggestion.
