Data-Based Economics, ESCP, 2024-2025
2025-02-05
occupation | type | income | education | prestige |
---|---|---|---|---|
accountant | prof | 62 | 86 | 82 |
pilot | prof | 72 | 76 | 83 |
architect | prof | 75 | 92 | 90 |
author | prof | 55 | 90 | 76 |
chemist | prof | 64 | 86 | 90 |
prestige has a higher \(R^2\) (\(0.83^2\)).

Now we are trying to fit a plane to a cloud of points.
\[Y = \begin{bmatrix} \text{income}_1 \\\\ \vdots \\\\ \text{income}_N \end{bmatrix}\] \[X = \begin{bmatrix} 1 & \text{education}_1 & \text{prestige}_1 \\\\ \vdots & \vdots & \vdots \\\\ 1 &\text{education}_N & \text{prestige}_N \end{bmatrix}\]
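To make the matrix formulation concrete, here is a minimal numpy sketch (using only the five table rows shown above) that stacks the data into \(Y\) and \(X\) and solves the least-squares problem:

```python
import numpy as np

# Five rows from the table above: income, education, prestige.
income    = np.array([62, 72, 75, 55, 64], dtype=float)
education = np.array([86, 76, 92, 90, 86], dtype=float)
prestige  = np.array([82, 83, 90, 76, 90], dtype=float)

# Design matrix: a column of ones for the intercept, then the regressors.
X = np.column_stack([np.ones_like(income), education, prestige])
Y = income

# Least-squares fit of the plane: beta = (X'X)^{-1} X'Y
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta)  # [intercept, coefficient on education, coefficient on prestige]
```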
Is education significant?

As in the 1d case we can compare:

- the variability of the model predictions (\(MSS\))
- the variance of the data (\(TSS\), T for total)
Coefficient of determination (same formula):
\[R^2 = \frac{MSS}{TSS}\]
Or:
\[R^2 = 1-\frac{RSS}{TSS}\]
where \(RSS\) is the unexplained variance (the residual sum of squares).
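Continuing the numpy sketch above, the two expressions for \(R^2\) can be checked numerically:

```python
# Continuing the numpy sketch above (X, Y, beta already defined).
Y_hat = X @ beta                       # model predictions
TSS = np.sum((Y - Y.mean()) ** 2)      # total sum of squares
MSS = np.sum((Y_hat - Y.mean()) ** 2)  # model (explained) sum of squares
RSS = np.sum((Y - Y_hat) ** 2)         # residual sum of squares

print(MSS / TSS, 1 - RSS / TSS)        # both give the same R^2
```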
Fact: adding a regressor never decreases \(R^2\), even when the regressor is irrelevant.

To penalise additional regressors, use the adjusted \(R^2\):

\[R^2_{adj} = 1-(1-R^2)\frac{N-1}{N-p-1}\]

where \(N\) is the number of observations and \(p\) the number of regressors.
In our example:
Regression | \(R^2\) | \(R^2_{adj}\) |
---|---|---|
education | 0.525 | 0.514 |
prestige | 0.702 | 0.695 |
education + prestige | 0.7022 | 0.688 |
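The adjusted values in this table can be reproduced directly from the formula; a minimal sketch:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# N = 45 observations (see the regression output below).
print(adjusted_r2(0.525, 45, 1))   # ~0.514  (education)
print(adjusted_r2(0.702, 45, 1))   # ~0.695  (prestige)
print(adjusted_r2(0.7022, 45, 2))  # ~0.688  (education + prestige)
```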
We use a special API, inspired by R: `statsmodels`.
```python
import statsmodels.formula.api as smf  # formula-based (R-style) API

# df: DataFrame holding the occupational data shown above
model = smf.ols('income ~ education', df)  # specify the model
res = model.fit()                          # perform the regression
print(res.summary())                       # display the results table
```
```
                            OLS Regression Results
==============================================================================
Dep. Variable:                 income   R-squared:                       0.525
Model:                            OLS   Adj. R-squared:                  0.514
Method:                 Least Squares   F-statistic:                     47.51
Date:                Tue, 02 Feb 2021   Prob (F-statistic):           1.84e-08
Time:                        05:21:25   Log-Likelihood:                -190.42
No. Observations:                  45   AIC:                             384.8
Df Residuals:                      43   BIC:                             388.5
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     10.6035      5.198      2.040      0.048       0.120      21.087
education      0.5949      0.086      6.893      0.000       0.421       0.769
==============================================================================
Omnibus:                        9.841   Durbin-Watson:                   1.736
Prob(Omnibus):                  0.007   Jarque-Bera (JB):               10.609
Skew:                           0.776   Prob(JB):                      0.00497
Kurtosis:                       4.802   Cond. No.                         123.
==============================================================================
```
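Rather than reading numbers off the printed table, the same quantities are available as attributes of the results object (a short sketch reusing `res` from above):

```python
# Scalar results are attributes of the fitted results object.
print(res.rsquared)      # 0.525
print(res.rsquared_adj)  # 0.514
print(res.params)        # estimated coefficients: Intercept, education
```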
`statsmodels` formulas can be supplied with R-style syntax:

Formula | Model |
---|---|
`income ~ education` | \(\text{income}_i = \alpha + \beta \text{education}_i\) |
`income ~ prestige` | \(\text{income}_i = \alpha + \beta \text{prestige}_i\) |
`income ~ prestige - 1` | \(\text{income}_i = \beta \text{prestige}_i\) (no intercept) |
`income ~ education + prestige` | \(\text{income}_i = \alpha + \beta_1 \text{education}_i + \beta_2 \text{prestige}_i\) |
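As an illustration, the last formula of the table can be estimated and compared with the single-regressor fits (a sketch reusing `smf` and `df` from before):

```python
# Fit the plane income ~ education + prestige.
res_multi = smf.ols('income ~ education + prestige', df).fit()
print(res_multi.rsquared, res_multi.rsquared_adj)  # ~0.702, ~0.688
```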
Formula | Model |
---|---|
`log(P) ~ log(M) + log(Y)` | \(\log(P_i) = \alpha + \alpha_1 \log(M_i) + \alpha_2 \log(Y_i)\) (log-log) |
`log(Y) ~ i` | \(\log(Y_i) = \alpha + \beta i_i\) (semi-log) |
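In statsmodels, such transformations can be written directly inside the formula: `np.log` is understood as long as numpy is imported. A sketch on made-up data (the column names P, M, Y are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: P, M, Y stand in for prices, money and output.
rng = np.random.default_rng(0)
data = pd.DataFrame({'M': rng.uniform(1, 10, 100),
                     'Y': rng.uniform(1, 10, 100)})
data['P'] = data['M'] ** 0.5 * data['Y'] ** 0.3 * rng.lognormal(0, 0.1, 100)

# np.log inside the formula applies the transformation on the fly.
res_loglog = smf.ols('np.log(P) ~ np.log(M) + np.log(Y)', data).fit()
print(res_loglog.params)  # elasticities: ~0.5 on log(M), ~0.3 on log(Y)
```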
Example (with `police_spending` and `prevention_policies` in million dollars):

\[\text{number\_of\_crimes} = 0.005\% - 0.001\, \text{pol\_spend} - 0.005\, \text{prev\_pol} + 0.002\, \text{population\_density}\]

reads: holding the other variables constant, a 1 million dollar increase in police spending reduces the crime rate by 0.001 percentage points.
Interpretation?

Take logs: \[\log(\text{number\_of\_crimes}) = 0.005 - 0.15 \log(\text{pol\_spend}) - 0.4 \log(\text{prev\_pol}) + 0.2 \log(\text{population\_density})\]

reads: holding the other variables constant, a 1% increase in police spending reduces the number of crimes by 0.15% (in a log-log model, coefficients are elasticities).
We need some hypotheses on the data generation process:

- the model is linear: \(y = X \beta + \epsilon\)
- the regressors are exogenous: \(E[\epsilon | X] = 0\)
- the errors are homoscedastic and uncorrelated: \(Var(\epsilon | X) = \sigma^2 I\)
- (for exact small-sample tests) the errors are normally distributed

Under these hypotheses, the OLS estimator is unbiased and the distributions of the test statistics below are known.
Fisher Criterion (F-test)

Statistic: \[F=\frac{MSR}{MSE}\] (same as in the 1d case)

Under the null hypothesis that all slope coefficients are zero, the distribution of \(F\) is known.

It is remarkable that it doesn't depend on \(\sigma\)!
One can produce a p-value.
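With statsmodels, the F-statistic and its p-value can be read from the fitted results (reusing `res` from above):

```python
# Overall significance of the regression.
print(res.fvalue)    # 47.51, as in the summary table
print(res.f_pvalue)  # 1.84e-08
```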
Student Test

Given a coefficient \(\beta_k\):

Statistic (Student t): \[t = \frac{\hat{\beta}_k}{\hat{\sigma}(\hat{\beta}_k)}\]

Under the inference hypotheses, the distribution of \(t\) is known.

Procedure: reject the null hypothesis \(H_0: \beta_k = 0\) at significance level \(\alpha\) if \(|t|\) is larger than the critical value of the Student distribution.

Or just look at the p-value: reject \(H_0\) if it is smaller than \(\alpha\) (e.g. 0.05).
Same as in the 1d case, one can produce confidence intervals at a given confidence level (e.g. 95%):

\[\hat{\beta}_k \pm t^{crit}\, \hat{\sigma}(\hat{\beta}_k)\]

where \(t^{crit}\) is the corresponding critical value of the Student distribution.

Interpretation: under repeated sampling, an interval built this way contains the true \(\beta_k\) with the chosen probability (e.g. 95% of the time).
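In statsmodels, the t-statistics, p-values and confidence intervals are again available on the results object (reusing `res`):

```python
print(res.tvalues)               # t-statistic of each coefficient
print(res.pvalues)               # corresponding p-values
print(res.conf_int(alpha=0.05))  # 95% confidence intervals, as in the summary
```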
Suppose you run a regression: \[y = \alpha + \beta_1 x_1 + \epsilon\] and are genuinely interested in coefficient \(\beta_1\)
But unknown to you, the actual model is \[y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \eta\]

Then the residual \(y - \alpha - \beta_1 x_1\) is not white noise, and the estimate of \(\beta_1\) is biased whenever the omitted \(x_2\) is correlated with \(x_1\) (omitted variable bias).
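A small simulation illustrates the resulting bias when the omitted \(x_2\) is correlated with \(x_1\) (all coefficients below are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)           # x2 correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Short regression omitting x2: x1 picks up part of x2's effect.
X = np.column_stack([np.ones(n), x1])
beta_short, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_short[1])  # ~4.4 instead of the true 2.0 (bias = 3.0 * 0.8)
```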
\[y = \alpha + \beta_1 x_1 + \dots + \beta_n x_n\]

Which regressors to choose?

Method 1: remove the coefficients with the lowest \(t\) (least significant) so as to maximize the adjusted R-squared.

Method 2: choose the combination that minimizes the Akaike Information Criterion (AIC).
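With statsmodels, each fitted result exposes its AIC as `.aic`, so the comparison can be automated (a sketch over the formulas used above, reusing `smf` and `df`):

```python
candidates = ['income ~ education',
              'income ~ prestige',
              'income ~ education + prestige']

# Fit every candidate and keep the one with the lowest AIC.
results = {f: smf.ols(f, df).fit() for f in candidates}
best = min(results, key=lambda f: results[f].aic)
print({f: round(r.aic, 1) for f, r in results.items()}, '-> best:', best)
```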