Regression Requires Big Samples

Dear Students,

Near the end of Chapter 19, I discuss 6 important issues related to regression. This discussion appears on pages 601-605 (in a section called "Final Comments"), and my introductory comments to that discussion included this sentence:

If you will keep these issues in mind as you encounter research reports based on bivariate, multiple, and logistic regression, you will be in a far better position to both decipher and critique such reports.

Although the 6 issues I did discuss are important, there's another important issue that I forgot to bring up: sample size. Via this email message, I hope to make amends for my oversight.

Simply stated, regression analyses do not work very well when they are based on small samples. I suspect that this idea is intuitively reasonable to you. However, I also suspect that you'd like to know how "small" is defined. Flipping this question around, it's my guess that you'd like to know the minimum number of "cases" needed for a sample to be "adequate."

If you're like most folks, you probably would like to hear something short and sweet, such as "Regression analyses are OK if based on an n of 30 or more (but not OK if based on an n below 30)." Unfortunately, such a simple answer cannot be given.

While reading about regression in a variety of statistics books, journal articles, and websites, I've come across several different "rules" specifying how large n must be for a regression analysis to work well. Most sources offered a rule of thumb that defined the "needed n" as a multiple of the number of independent (i.e., predictor) variables. On the conservative end of the continuum, I've seen one recommendation that n should be at least 100 times the number of independent variables. On the other (more liberal) end of the continuum, I've seen it said that regression can work with as few as 2 cases per independent variable.

The most frequently seen rule-of-thumb stipulates that n should be at least 10 times the number of independent variables. That's the criterion I use when conducting my own regression analyses or when evaluating someone else's regression study. To see whether you understand how this little rule works, turn to Excerpt 19.24 (on pages 588-589) and ask yourself whether the study's n was big enough.
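
For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of a cases-per-predictor rule of thumb. The function names (minimum_cases, meets_rule) and the 5-predictor study are hypothetical and chosen only for illustration; the ratios of 2, 10, and 100 are the ones mentioned in this message.

```python
def minimum_cases(num_predictors: int, cases_per_predictor: int = 10) -> int:
    """Smallest n that satisfies a cases-per-predictor rule of thumb.

    The default ratio of 10 is the 10-to-1 rule; pass 2 or 100 to see
    the most liberal and most conservative rules mentioned in the text.
    """
    if num_predictors < 1 or cases_per_predictor < 1:
        raise ValueError("both arguments must be positive integers")
    return num_predictors * cases_per_predictor


def meets_rule(n: int, num_predictors: int, cases_per_predictor: int = 10) -> bool:
    """True if a sample of size n is at least as large as the rule requires."""
    return n >= minimum_cases(num_predictors, cases_per_predictor)


if __name__ == "__main__":
    # Hypothetical study with 5 independent variables (an assumed figure).
    for ratio in (2, 10, 100):
        print(f"{ratio} cases per predictor -> n of at least {minimum_cases(5, ratio)}")
```

Under these assumptions, the same 5-predictor study would need anywhere from 10 to 500 cases depending on which rule is applied, which is why no single "magic n" can be given.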

Before closing, allow me to explain how I use the 10-to-1 rule in conjunction with logistic regression studies. In such investigations, there are actually 3 sample sizes: (1) the total number of cases from whom data are collected, (2) the number of cases in one of the two subgroups based on the binary dependent variable, and (3) the number of cases in the other of those two subgroups. The 10-to-1 rule applies to the smaller of the 2 subgroups. Thus, if logistic regression is used to predict surgery survival (yes/no) from 8 different demographic and psychological characteristics of patients, there should be no fewer than 80 people in the smaller of the two outcome categories.
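
The logistic regression version of the rule can be sketched in a few lines of Python. This is only an illustration: the function name logistic_rule_ok and the subgroup counts in the example are hypothetical, while the 8 predictors and the resulting minimum of 80 come from the surgery-survival example above.

```python
def logistic_rule_ok(n_outcome_a: int, n_outcome_b: int,
                     num_predictors: int, cases_per_predictor: int = 10) -> bool:
    """Apply the 10-to-1 rule to the smaller of the two outcome subgroups."""
    smaller_group = min(n_outcome_a, n_outcome_b)
    return smaller_group >= num_predictors * cases_per_predictor


if __name__ == "__main__":
    # Assumed counts: 300 patients survived, 60 did not; 8 predictors.
    # The total n of 360 looks large, but the smaller subgroup (60) is below 80.
    print(logistic_rule_ok(300, 60, num_predictors=8))

    # With 80 cases in the smaller subgroup, the rule is just satisfied.
    print(logistic_rule_ok(300, 80, num_predictors=8))
```

The point of taking the minimum is that a large total sample does not help if one of the two outcome categories is sparsely populated.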

Sky Huck

Copyright © 2012

Schuyler W. Huck
All rights reserved.
