Data-Based Economics, ESCP, 2024-2025
2025-02-12
Our multilinear regression: \[y = \alpha + \beta_1 x_1 + \cdots + \beta_n x_n\]
So far, we have only considered real-valued variables (\(x_i \in \mathbb{R}\)).
For example: \[y_{\text{gdp}} = \alpha + \beta_1 x_{\text{unemployment}} + \beta_2 x_{\text{inflation}}\]
How do we deal with the following cases?
Look at the model: \[y_{\text{CO2 emission}} = \alpha + \beta x_{\text{banish cars}} \]
where \(y_{\text{CO2 emission}}\) is an individual's CO2 emissions and \(x_{\text{banish cars}}\) is the response to the question "Do you support banning petrol cars?"
The response is coded as:
If this variable were used directly, how would you interpret the coefficient \(\beta\)?
We use one dummy variable per possible answer.
Response | \(D_{\text{Strongly Disagree}}\) | \(D_{\text{Disagree}}\) | \(D_{\text{Neutral}}\) | \(D_{\text{Agree}}\) | \(D_{\text{Strongly Agree}}\) |
---|---|---|---|---|---|
Strongly Disagree | 1 | 0 | 0 | 0 | 0 |
Disagree | 0 | 1 | 0 | 0 | 0 |
Neutral | 0 | 0 | 1 | 0 | 0 |
Agree | 0 | 0 | 0 | 1 | 0 |
Strongly Agree | 0 | 0 | 0 | 0 | 1 |
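\[y_{\text{CO2 emission}} = \alpha + \beta_1 D_{\text{Strongly Disagree}} + \beta_2 D_{\text{Disagree}} + \beta_3 D_{\text{Agree}} + \beta_4 D_{\text{Strongly Agree}}\]
Including a dummy for every answer together with the constant \(\alpha\) would make the regressors perfectly collinear (the dummies sum to 1 for every respondent). So one answer, here Neutral, is left out and serves as the reference group: \(\alpha\) is the average emission of neutral respondents and each \(\beta_k\) is the difference between the corresponding group's average and that of neutral respondents.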
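A minimal numerical sketch of the dummy regression, using only numpy and a small invented dataset (two respondents per answer; the emission figures are hypothetical). It checks that a constant plus all five dummies is rank-deficient, then drops the Neutral dummy and fits OLS.

```python
import numpy as np

# Hypothetical survey data (invented for illustration): answer index and CO2 emissions
answers = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
responses = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
co2 = np.array([10.0, 12.0, 9.0, 9.0, 8.0, 6.0, 6.0, 5.0, 4.0, 4.0])

# One dummy column per possible answer: each row sums to 1
dummies = (responses[:, None] == np.arange(len(answers))).astype(float)

# Constant + all five dummies: 6 columns but only rank 5 (dummy variable trap)
full = np.column_stack([np.ones(len(co2)), dummies])
rank_full = np.linalg.matrix_rank(full)

# Drop the Neutral dummy (index 2): Neutral becomes the reference group
keep = [0, 1, 3, 4]
X = np.column_stack([np.ones(len(co2)), dummies[:, keep]])
coef, *_ = np.linalg.lstsq(X, co2, rcond=None)
alpha, betas = coef[0], coef[1:]  # alpha, then beta_1..beta_4
```

Here \(\alpha\) equals the average of the Neutral group (7.0) and each \(\beta_k\) is another group's average minus 7.0, e.g. 11.0 − 7.0 = 4.0 for Strongly Disagree.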
activity | code |
---|---|
massage therapist | 1 |
mortician | 2 |
archeologist | 3 |
financial clerks | 4 |
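activity | \(D_{\text{mortician}}\) | \(D_{\text{archeologist}}\) | \(D_{\text{financial clerks}}\) |
---|---|---|---|
massage therapist | 0 | 0 | 0 |
mortician | 1 | 0 | 0 |
archeologist | 0 | 1 | 0 |
financial clerks | 0 | 0 | 1 |
Massage therapist, with all dummies equal to 0, is the reference group.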
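A sketch of the same encoding with pandas, assuming the activity codes of the table. `pd.get_dummies(..., drop_first=True)` omits the first category of the categorical (massage therapist), which therefore gets zeros in every column.

```python
import pandas as pd

# Activity codes from the table; massage therapist (code 1) is the first category
labels = {1: "massage therapist", 2: "mortician", 3: "archeologist", 4: "financial clerks"}
codes = pd.Series([1, 2, 3, 4])
activity = pd.Categorical(codes.map(labels), categories=list(labels.values()))

# drop_first=True omits the first category, which becomes the reference group
dummies = pd.get_dummies(pd.Series(activity), prefix="D", drop_first=True, dtype=int)
print(dummies)
```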
Use statsmodels/linearmodels to create dummy variables with the formula API.
There is an option to choose the reference group.
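A minimal sketch with the statsmodels formula API, on invented data (two respondents per answer). `C(...)` turns a text column into dummies, and `Treatment(reference='Neutral')` selects the omitted group; the column names `co2` and `answer` are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data (invented for illustration)
df = pd.DataFrame({
    "co2": [10.0, 12.0, 9.0, 9.0, 8.0, 6.0, 6.0, 5.0, 4.0, 4.0],
    "answer": ["Strongly Disagree"] * 2 + ["Disagree"] * 2 + ["Neutral"] * 2
              + ["Agree"] * 2 + ["Strongly Agree"] * 2,
})

# C(answer, Treatment(reference=...)) creates one dummy per answer except the reference
model = smf.ols("co2 ~ C(answer, Treatment(reference='Neutral'))", data=df).fit()
print(model.params)
```

The intercept is the Neutral average, and each `[T.<answer>]` coefficient is measured relative to it. Without the `Treatment(...)` part, statsmodels uses the first level in alphabetical order ("Agree") as the reference.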
Clear? Huh! Why a four-year-old child could understand this report! Run out and find me a four-year-old child, I can’t make head or tail of it.
Spurious Correlation
But how do we define correlation and causality?
Both concepts are actually hard to define:
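For instance, the correlation between \(X\) and \(Y\) is built from an average taken over many draws \(\omega\) of the data: the covariance \[\text{Cov}(X,Y) = E_{\omega}\left[ (X-E[X])(Y-E[Y])\right]\] divided by the product of the standard deviations, \(\rho_{XY} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}\).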
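A numerical sketch of this definition, assuming a hypothetical data-generating process \(Y = 2X + \varepsilon\) with \(X\) and \(\varepsilon\) independent standard normals. Each array entry plays the role of one draw \(\omega\); with many draws the covariance should be close to 2 and the correlation close to \(2/\sqrt{5} \approx 0.89\).

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 100_000  # number of draws omega

# Hypothetical data-generating process: Y depends on X plus independent noise
x = rng.normal(0.0, 1.0, n_draws)
y = 2.0 * x + rng.normal(0.0, 1.0, n_draws)

# Covariance: average over draws of the product of deviations from the means
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
# Correlation: covariance normalized by both standard deviations
corr_xy = cov_xy / (x.std() * y.std())
```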
But causality in the real world is problematic
Usually, we observe \(A(\omega)\) only once…
Example:
Statistical definition of causality
Variable \(A\) causes \(B\) in a statistical sense if
An important task in econometrics is to construct a counterfactual: what would have happened to \(B\) without \(A\)?
Maybe the observed effect comes from which patients chose to take the medication rather than from the medication itself?
Example: cognitive dissonance
Experiment 2: subjects are randomly given either an Olympique de Marseille or a PSG shirt.
Result:
Cause (A): two groups of people
Possible consequence (B): performance
Take a given agent Alice: she performs well with a PSG shirt.
Let’s try to have her play again without the football shirt
Randomized Controlled Trial (RCT)
The best way to ensure that treatment is independent from other factors is to randomize it.
In medicine:
In economics:
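A simulated sketch of why randomization matters, with an entirely invented data-generating process: an unobserved confounder (called `health` here) drives both the decision to take a treatment and the outcome. Under self-selection, the naive difference in means mixes the treatment effect with the confounder; under random assignment it is close to the true effect (set to 1.0).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
true_effect = 1.0  # hypothetical effect of the treatment on the outcome

# Unobserved confounder (hypothetical), e.g. how health-conscious a patient is
health = rng.normal(0.0, 1.0, n)

def outcome(treated):
    # Outcome depends on the treatment AND on the confounder
    return true_effect * treated + 2.0 * health + rng.normal(0.0, 1.0, n)

# Self-selection: healthier people are more likely to take the treatment
self_selected = (health + rng.normal(0.0, 1.0, n) > 0).astype(float)
y_self = outcome(self_selected)
naive = y_self[self_selected == 1].mean() - y_self[self_selected == 0].mean()

# RCT: treatment assigned by a coin flip, independent of health
randomized = rng.integers(0, 2, n).astype(float)
y_rct = outcome(randomized)
rct_estimate = y_rct[randomized == 1].mean() - y_rct[randomized == 0].mean()
```

With these settings `naive` lands around 3.3, far from the true effect, while `rct_estimate` is close to 1.0.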
Natural Experiment
A natural experiment is a situation in which treatment is assigned (as good as) randomly, although not by the researcher.
An example of a natural experiment:
Result: yes, they get 1.5% fewer votes from right-wing voters.
What was the natural experiment?
Lifetime Earnings and the Vietnam Era Draft Lottery, by JD Angrist
Fact:
Problem (for the economist):
Genius idea:
Can we use the draft to generate randomness?
Choosing an instrumental variable
When trying to explain \(y\) by \(x\), a good instrument is a variable that is correlated with the treatment (\(x\)) but has no effect on the outcome of interest (\(y\)) apart from its effect through \(x\).
In particular, it should be uncorrelated with any potential confounding factor (whether observed or unobserved).
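A simulated sketch of why a good instrument helps, loosely inspired by the draft example. Every number, and the unobserved `ability` variable, is an assumption. With one instrument \(z\), one treatment \(x\) and a constant, 2SLS reduces to the Wald ratio \(\text{Cov}(z, y)/\text{Cov}(z, x)\), which is what the code computes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
true_effect = -2.0  # hypothetical effect of war service on salary

# Unobserved confounder (hypothetical): affects both service and salary
ability = rng.normal(0.0, 1.0, n)
# Instrument: random draft lottery, independent of ability
draft = rng.integers(0, 2, n).astype(float)
# Treatment: being drafted raises the chance of serving; so does low ability
war = ((0.8 * draft - 0.5 * ability + rng.normal(0.0, 1.0, n)) > 0.4).astype(float)
# Outcome: depends on war service and ability, but NOT directly on draft
salary = 30.0 + true_effect * war + 3.0 * ability + rng.normal(0.0, 1.0, n)

# OLS slope of salary on war: biased, because war is correlated with ability
ols = np.cov(war, salary)[0, 1] / np.var(war, ddof=1)
# IV (Wald) estimator with a single instrument: Cov(z, y) / Cov(z, x)
iv = np.cov(draft, salary)[0, 1] / np.cov(draft, war)[0, 1]
```

In this setup OLS comes out around −4, roughly twice the true effect, whereas the IV estimate is close to −2.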
statsmodels and linearmodels support instrumental variables.
linearmodels has a handy formula syntax: salary ~ 1 + [war ~ draft]