# Load the packages used in this section
library(mosaic)
library(Stat2Data)
library(agricolae)
# Default lattice colors for superposed plot symbols
cols <- trellis.par.get()$superpose.symbol$col

Motivation to Correct for Multiple Testing

Let \(X\) and \(Y\) be two random vectors, and consider the model \(Y \sim \beta_0 + \beta_1 \cdot X\). The null hypothesis is that \(\beta_1 = 0\). In our hypothesis test, we compute the \(t\)-statistic for the estimated slope \(\hat{\beta}_1\) and ask how likely it would be to observe an estimate at least this extreme if the true slope were 0 (i.e. \(\beta_1 = 0\)). If that probability (the \(p\)-value) is sufficiently small (e.g. less than \(\alpha = 0.05\)), then we say that we have found a statistically significant association.
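As a rough illustration of this test, the sketch below fits a simple linear regression to simulated data in which the null hypothesis is true by construction. The variable names (`x`, `y`, `fit`), the sample size, and the seed are assumptions made only for this example and do not come from any data set used in these notes.

```r
# Simulated data (illustrative only): y is generated independently of x,
# so the true slope beta_1 is 0.
set.seed(42)
n <- 50
x <- rnorm(n)
y <- rnorm(n)

# Fit the model Y ~ beta_0 + beta_1 * X
fit <- lm(y ~ x)

# Pull the t-statistic and two-sided p-value for the slope
slope_row <- coef(summary(fit))["x", ]
t_stat <- slope_row["t value"]
p_val  <- slope_row["Pr(>|t|)"]

# The p-value is the two-tailed area of a t distribution with n - 2 df
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)

# Decision at the alpha = 0.05 level
alpha <- 0.05
significant <- unname(p_val < alpha)
```

Here `p_manual` reproduces the p-value that `summary()` reports, which makes explicit that the test compares the observed \(t\)-statistic against a \(t\) distribution with \(n - 2\) degrees of freedom.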

But what if we have 20 variables \(X_1,X_2, \ldots, X_{20}\) in our model? Remember the jelly beans!
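To see why 20 variables are a concern, suppose (as a simplifying assumption) that the 20 tests are independent and that every null hypothesis is true. The chance of at least one false positive at \(\alpha = 0.05\) is then \(1 - (1 - 0.05)^{20} \approx 0.64\). The sketch below computes that value and checks it with a small simulation. The helper `one_null_fit`, the sample size, and the number of replicates are hypothetical choices for illustration.

```r
set.seed(1)
alpha <- 0.05
m <- 20

# Analytic family-wise error rate under independence: about 0.64
fwer <- 1 - (1 - alpha)^m

# Hypothetical helper: simulate m pure-noise predictors, fit one model,
# and report whether any slope looks "significant" at level alpha.
one_null_fit <- function(n = 100, m = 20, alpha = 0.05) {
  X <- matrix(rnorm(n * m), nrow = n)   # predictors unrelated to y
  y <- rnorm(n)                         # response is pure noise
  p_values <- coef(summary(lm(y ~ X)))[-1, "Pr(>|t|)"]  # drop intercept
  any(p_values < alpha)
}

# Fraction of simulated data sets with at least one false positive
sim_fwer <- mean(replicate(500, one_null_fit()))
```

With these settings `sim_fwer` should land near `fwer`, although the exact value depends on the seed and the number of replicates.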