\documentclass[10pt]{article}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{fancyhdr,url,hyperref}
\usepackage{graphicx,xspace}
\usepackage{tikz}
\usetikzlibrary{shapes,arrows,decorations.pathmorphing,backgrounds,positioning,fit,through}
\oddsidemargin 0in %0.5in
\topmargin 0in
\leftmargin 0in
\rightmargin 0in
\textheight 9in
\textwidth 6in %6in
%\headheight 0in
%\headsep 0in
%\footskip 0.5in
\newtheorem{thm}{Theorem}
\newtheorem{cor}[thm]{Corollary}
\newtheorem{obs}{Observation}
\newtheorem{lemma}{Lemma}
\newtheorem{claim}{Claim}
\newtheorem{definition}{Definition}
\newtheorem{question}{Question}
\newtheorem{answer}{Answer}
\newtheorem{problem}{Problem}
\newtheorem{solution}{Solution}
\newtheorem{conjecture}{Conjecture}
\pagestyle{fancy}
\lhead{\textsc{Prof. McNamara}}
\chead{\textsc{SDS/MTH 291: Lecture notes}}
\lfoot{}
\cfoot{}
%\cfoot{\thepage}
\rfoot{}
\renewcommand{\headrulewidth}{0.2pt}
\renewcommand{\footrulewidth}{0.0pt}
\newcommand{\ans}{\vspace{0.25in}}
\newcommand{\R}{{\sf R}\xspace}
\newcommand{\cmd}[1]{\texttt{#1}}
\DeclareMathOperator{\Ex}{\mathbb{E}}
\DeclareMathOperator{\Var}{\text{Var}}
\newcommand{\shortans}{\vspace{1in}}
\newcommand{\mediumans}{\vspace{1.5in}}
\newcommand{\longans}{\vspace{2in}}
\rhead{\textsc{October 13, 2016}}
\begin{document}
\paragraph{Agenda}
\begin{enumerate}
\itemsep0em
\item Hypothesis testing and p-values in multiple regression
\item Parallel slopes
\item Interaction
\end{enumerate}
\paragraph{Hypothesis testing and p-values in multiple regression}
\begin{enumerate}
\itemsep1in
\item What was the null and alternative hypothesis we were testing in the simple linear regression case?
\item What are we now testing in multiple regression?
\item Is there a way we can think about a single p-value for an entire model in multiple regression?
\vspace{1in}
\end{enumerate}
<>=
require(mosaic)
require(openintro)
m1 <- lm(math~read+write+ses, data=hsb2)
summary(m1)
@
\paragraph{Parallel slopes}
Up to now, the most complicated model we have seen is something like the previous one. We thought about this model as a set of parallel planes in 3D. In fact, we could write equations for the three planes separately.
\begin{eqnarray*}
\text{Full equation: } \hat{y}&=&12.8+0.4*read+0.3*write+1.3*SES_m+1.9*SES_h \\
\widehat{math}| \text{SES = low} &=&12.8+0.4*read+0.3*write+1.3*0+1.9*0 \\
&=& 12.8+0.4*read+0.3*write \\
\widehat{math}| \text{SES = middle} &=&12.8+0.4*read+0.3*write+1.3*1+1.9*0 \\
&=& 12.8+0.4*read+0.3*write + 1.3 \\
&=& 14.1 +0.4*read+0.3*write\\
\widehat{math}| \text{SES = high}&=&12.8+0.4*read+0.3*write+1.3*0+1.9*1 \\
&=& 12.8+0.4*read+0.3*write + 1.9 \\
&=& 14.7 +0.4*read+0.3*write\\
\end{eqnarray*}
\paragraph{Separate models} But, we could have approached the modeling differently. If we didn't think that it made sense for the relationship between math scores and reading and writing scores to be the same for different socioeconomic statuses, we could have done something like this:
<>=
coef(lm(math~read+write, data=filter(hsb2, ses=="low")))
coef(lm(math~read+write, data=filter(hsb2, ses=="middle")))
coef(lm(math~read+write, data=filter(hsb2, ses=="high")))
@
\begin{enumerate}
\itemsep0.5in
\item Write out the three regression equations from the filtered data
\shortans
\end{enumerate}
\paragraph{Interaction}
Recall that a multiple linear regression model with two quantitative explanatory variables is:
$$
Y = \beta_0 + \beta_1 \cdot X_1 + \beta_2 \cdot X_2 + \epsilon
$$
This assumes that the effects upon $Y$ from $X_1$ are \emph{independent} of the value of $X_2$. That is, a one unit change in $X_1$ is associated with the same change in $Y$ \emph{regardless of the value of} $X_2$. Geometrically, the fitted values $\hat{Y}$ lie in a plane.
Consider the addition of an \emph{interaction} term between the quantitative explanatory variables:
$$
Y = \beta_0 + \beta_1 \cdot X_1 + \beta_2 \cdot X_2 + \beta_3 \cdot X_1 \cdot X_2 + \epsilon
$$
Now, the plane is no longer flat---it can be \emph{warped} as $X_1$ and $X_2$ vary together!
\paragraph{Simple example}
Recall the Italian restaurant data.
<>=
NYC <- read.csv("http://www.math.smith.edu/~bbaumer/mth241/nyc.csv")
mflat <- lm(Price ~ Food + Service, data=NYC)
mwarp <- lm(Price ~ Food + Service + Food * Service, data=NYC)
coef(mwarp)
NYC %>%
filter(Food == 23 & (Service > 23 | Service < 21))
@
\begin{enumerate}
\itemsep0.5in
\item Compute the expected price of a meal at \href{https://www.zagat.com/r/primola-new-york}{Primola}. If Primola hired additional waitstaff and raised their Service rating to 21, how much extra could they charge?
\item Compute the expected price of a meal at \href{http://www.yelp.com/biz/rughetta-ristorante-new-york}{Rughetta}. If Rughetta hired additional waitstaff and raised their Service rating to 25, how much extra could they charge?
\item Interpret the coefficient on \texttt{Food:Service}
\shortans
\end{enumerate}
<>=
require(rgl)
price.flat <- makeFun(mflat)
price.warp <- makeFun(mwarp)
opacity <- 0.4
with(NYC, plot3d(x = Food, y = Service, z = Price, type = "s", size = 1.5,
col = "blue", xlab = "Food Rating", ylab = "Service Rating",
zlab = "Price ($)"))
plot3d(price.flat, alpha = opacity, col = "lightgray", add=TRUE, xlim=c(0,30), ylim=c(0,30), zlim=c(0,80))
plot3d(price.warp, alpha = opacity, col = "pink", add=TRUE, xlim=c(0,30), ylim=c(0,30), zlim=c(0,80))
@
\clearpage
\paragraph{More complex example}
Back to our education data, we could also have added interaction terms to allow predictors to vary together. For this example, we would want two interaction terms:
<>=
m1 <- lm(math~read+write+ses+read*ses+write*ses, data=hsb2)
summary(m1)
@
\begin{enumerate}
\itemsep1in
\item Write out the three regression equations from the model with interaction terms
\item Interpret the coefficient \texttt{read:sesmiddle}
\shortans
\end{enumerate}
\end{document}