Data-Based Economics
Example of a Regression Table

                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.252
Model:                            OLS   Adj. R-squared:                  0.245
Method:                 Least Squares   F-statistic:                     33.08
Date:                Tue, 30 Mar 2021   Prob (F-statistic):           1.01e-07
Time:                        02:34:12   Log-Likelihood:                -111.39
No. Observations:                 100   AIC:                             226.8
Df Residuals:                      98   BIC:                             232.0
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.1750      0.162     -1.082      0.282      -0.496       0.146
x              0.1377      0.024      5.751      0.000       0.090       0.185
==============================================================================
Omnibus:                        2.673   Durbin-Watson:                   1.118
Prob(Omnibus):                  0.263   Jarque-Bera (JB):                2.654
Skew:                           0.352   Prob(JB):                        0.265
Kurtosis:                       2.626   Cond. No.                         14.9
==============================================================================

R²: provides an indication of predictive power; a high R² does not prevent overfitting.
adj. R²: predictive power corrected for excessive degrees of freedom.
p-value: the probability of obtaining a coefficient estimate at least as large as the one observed if the true coefficient were actually 0.
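The layout above is the standard summary output of Python's statsmodels OLS. A minimal sketch of how such a table can be produced, using made-up simulated data (the numbers will not reproduce the exact table above):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# simulated data (illustrative values only)
np.random.seed(0)
df = pd.DataFrame({"x": np.random.normal(size=100)})
df["y"] = -0.2 + 0.15 * df["x"] + np.random.normal(size=100)

results = smf.ols("y ~ x", data=df).fit()   # the formula interface adds the intercept automatically
print(results.summary())                    # prints a table in the same format as above
print(results.rsquared, results.rsquared_adj, results.pvalues["x"])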
Overfitting
To choose the right number of variables, find the combination that maximizes the adjusted R² or minimizes an information criterion, as in the sketch below.
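As an illustrative sketch (the dataset and the variable names x1 and x2 are invented), one can fit several candidate specifications and compare their adjusted R² and AIC:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical dataset: x1 actually drives y, x2 is pure noise
np.random.seed(2)
n = 100
df = pd.DataFrame({"x1": np.random.normal(size=n), "x2": np.random.normal(size=n)})
df["y"] = 0.3 + 0.8 * df["x1"] + np.random.normal(size=n)

for formula in ["y ~ x1", "y ~ x2", "y ~ x1 + x2"]:
    res = smf.ols(formula, data=df).fit()
    # adjusted R² and AIC penalize regressors that add little explanatory power
    print(f"{formula:12}  R2={res.rsquared:.3f}  adj.R2={res.rsquared_adj:.3f}  AIC={res.aic:.1f}")
```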
Collinearity
\(x\) is collinear with \(y\) if \(\text{cor}(x, y)\) is very close to 1
more generally, \(x\) is collinear with \(y_1, \ldots, y_n\) if \(x\) can be deduced linearly from \(y_1, \ldots, y_n\)
perfect collinearity is a problem: the coefficients are not well defined
\(\text{productivity} = 0.1 + 0.5\, \text{sleep} + 0.5\, \text{awake}\) or \(\text{productivity} = -11.9 + 1\, \text{sleep} + 1\, \text{awake}\)? Since hours asleep and hours awake always sum to 24, both equations produce exactly the same fitted values (see the numerical check after this list)
best regressions have regressors that:
explain the dependent variable
are independent from each other (as much as possible)
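A quick numerical check of the sleep/awake example, assuming only that hours asleep and hours awake sum to 24 (the data points themselves are made up): both coefficient vectors fit the data identically, so OLS has no way to prefer one over the other.

```python
import numpy as np

# made-up observations: hours asleep; hours awake are the rest of a 24-hour day
sleep = np.array([6.0, 7.0, 8.0, 5.5, 7.5])
awake = 24.0 - sleep                        # perfectly collinear with sleep

# the two candidate coefficient vectors from the example above
pred_1 = 0.1 + 0.5 * sleep + 0.5 * awake    # intercept 0.1, both slopes 0.5
pred_2 = -11.9 + 1.0 * sleep + 1.0 * awake  # intercept -11.9, both slopes 1.0

print(np.allclose(pred_1, pred_2))          # True: identical fits, so the coefficients are not identified
```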
What if you don’t have enough variables?
\(y = a + bx\)
\[y = a + b x + \epsilon\]
The error term \(\epsilon\) collects the influence of everything that affects \(y\) but is not included among the regressors.
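One way to see what the error term absorbs is a small simulation (the variable names and values here are hypothetical, not from the source): when a relevant variable z is left out, its effect ends up in \(\epsilon\), lowering R² and showing up in the residuals.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# illustrative simulation: z matters for y but is left out of the regression
np.random.seed(1)
n = 200
x = np.random.normal(size=n)
z = np.random.normal(size=n)
y = 0.5 + 1.0 * x + 2.0 * z + np.random.normal(scale=0.5, size=n)

df = pd.DataFrame({"y": y, "x": x, "z": z})
short = smf.ols("y ~ x", data=df).fit()      # z is missing from the model
print(short.rsquared)                        # well below 1: the omitted part stays unexplained
print(np.corrcoef(short.resid, z)[0, 1])     # the residuals move together with the omitted z
```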