Data-Based Economics, ESCP, 2024-2025
2025-04-01
A rose is a rose is a rose
Gertrude Stein
Brexit means Brexit means Brexit
John Crace
Elementary my dear Watson
P.G. Wodehouse
There is an easy way for Europe to respond to the trade war started by Donald Trump
If we could complete any sentence… would we be able to solve any problem?
⮕ We are witnessing the advent of language-based AI.
All Generative Language Models so far perform text completion
They generate plausible text following a prompt.
The type of answer will depend on the kind of prompt.
To use AI, you have to experiment with the prompt.
It is the same as learning how to write Google queries:
+noir +film -"pinot noir"
“Prompt engineering” is becoming a discipline in itself…
By providing enough context, it is possible to perform amazing tasks
Even Chat Interfaces have a hidden prompt.
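For instance, with the OpenAI Python SDK the hidden prompt is passed as a "system" message; this is only a sketch, and the model name and the wording of the hidden prompt are placeholders:

```python
# Sketch: how a chat interface prepends a hidden "system" prompt.
# Requires the `openai` package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        # the hidden prompt: the user never sees this part
        {"role": "system", "content": "You are a helpful assistant. Answer concisely."},
        # the visible part typed by the user
        {"role": "user", "content": "There is an easy way for Europe to respond to the trade war..."},
    ],
)
print(response.choices[0].message.content)
```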
The Caesar cipher
Zodiac 408 Cipher
Later in 2001, in a prison, somewhere in California
Solved by Stanford’s Persi Diaconis and his students using Markov chain Monte Carlo (MCMC)
Take a letter \(x_n\): what is the probability that the next letter is \(x_{n+1}\)?
\[\pi_{X,Y} = P(x_{n+1}=Y \mid x_{n}=X)\]
for \(X, Y \in \{a, b, \ldots, z\}\)
The language model can be trained on a dataset of English text.
It can then be used to determine whether a given cipher key is consistent with English.
This yields a very efficient algorithm to decode any Caesar cipher, even from a very small sample (see the sketch below).
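A minimal sketch of the two ingredients in pure Python (the MCMC search over cipher keys is omitted, and `corpus.txt` is a hypothetical file of English text standing in for the training data): estimate the letter-transition probabilities, then use them to score how English-like a candidate decryption is.

```python
import math
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def transition_log_probs(text):
    """Estimate log P(next letter | current letter) from a training corpus."""
    counts = defaultdict(lambda: defaultdict(lambda: 1))  # start at 1: add-one smoothing
    text = [c for c in text.lower() if c in ALPHABET]
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    logp = {}
    for a in ALPHABET:
        total = sum(counts[a][b] for b in ALPHABET)
        logp[a] = {b: math.log(counts[a][b] / total) for b in ALPHABET}
    return logp

def score(text, logp):
    """Log-likelihood of a string under the bigram model: higher = more English-like."""
    text = [c for c in text.lower() if c in ALPHABET]
    return sum(logp[a][b] for a, b in zip(text, text[1:]))

# Hypothetical usage: compare a wrong and a right shift of a Caesar ciphertext.
# logp = transition_log_probs(open("corpus.txt").read())
# print(score("wkh hqhpb dwwdfnv", logp))   # gibberish -> low score
# print(score("the enemy attacks", logp))   # English   -> high score
```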
Markov chains can also be used to generate text:
Starting from the prompt “I think therefore I”, the chain keeps sampling “I”:
“I think therefore I I I I I I”
Not good but promising (🤷)
Going further
An example using a Markov chain:
He ha ‘s kill’d me Mother , Run away I pray you Oh this is Counter you false Danish Dogges .
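A word-level chain of exactly this kind can produce lines like the one above. A minimal sketch (the corpus file name is a hypothetical placeholder):

```python
import random
from collections import defaultdict

def build_chain(corpus):
    """Map each word to the list of words that follow it in the corpus."""
    words = corpus.split()
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def generate(chain, prompt, length=10):
    """Sample a continuation word by word, conditioning only on the last word."""
    out = prompt.split()
    for _ in range(length):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

# Hypothetical usage with a Shakespeare-like corpus:
# chain = build_chain(open("hamlet.txt").read())
# print(generate(chain, "I think therefore I"))
```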
Can we augment memory?
With 26 letters, after 50 letters you need to take into account \(26^{50} \approx 5.6 \times 10^{70}\) combinations!
wjai dfni
Despite the constant negative press covfefe
🤔 Idea: approximate the conditional distribution with a parametric function
\[\forall X, \quad P(x_n=X \mid x_{n-1}, \ldots, x_{n-k}) = \varphi^{NL}( x_{n-1}, \ldots, x_{n-k}; \theta )\]
with a vector of parameters \(\theta\) much smaller than the full table of transition probabilities (see the sketch below)
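A minimal PyTorch sketch of such a parametric model \(\varphi\); the architecture, context length \(k\), and character vocabulary are illustrative choices, not the ones used by real language models:

```python
import torch
import torch.nn as nn

VOCAB = "abcdefghijklmnopqrstuvwxyz "
V, K = len(VOCAB), 5  # vocabulary size, context length k

class NextCharModel(nn.Module):
    """phi(x_{n-1}, ..., x_{n-k}; theta): logits over the next character."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),              # (batch, K, V) -> (batch, K*V)
            nn.Linear(K * V, hidden),
            nn.ReLU(),
            nn.Linear(hidden, V),      # one logit per possible next character
        )

    def forward(self, context_onehot):
        return self.net(context_onehot)

def encode(s):
    """One-hot encode the last K characters of a string."""
    idx = [VOCAB.index(c) for c in s[-K:]]
    return torch.nn.functional.one_hot(torch.tensor(idx), V).float().unsqueeze(0)

model = NextCharModel()
logits = model(encode("i think"))        # context = the last K characters
probs = torch.softmax(logits, dim=-1)    # P(x_n = X | previous K characters)
print(probs.shape)                       # torch.Size([1, 27])
```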
In 2015, recurrent networks (LSTMs) were the state of the art for speech recognition.
LSTMs were the models behind “Google Translate”, “Alexa”, …
A special kind of encoder/decoder architecture.
Most successful models since 2017
Take some data \((x_n)\in \mathbb{R}^x\).
Consider two functions: an encoder \(\varphi^E(\cdot\,; \theta^E)\) and a decoder \(\varphi^D(\cdot\,; \theta^D)\).
Train the coefficients with:
\[\min_{\theta^E, \theta^D} \left( \varphi^D( \varphi^E(x_n; \theta^E), \theta^D) - x_n\right)^2\]
i.e. train the nets \(\varphi^E\) and \(\varphi^D\) to predict the “data from the data” (this is called autoencoding)
The relation \(\varphi^D( \varphi^E(x_n; \theta^E), \theta^D) \approx x_n\) can be rewritten as
\[x_n \xrightarrow{\varphi^E(; \theta^E)} h \xrightarrow{\varphi^D(; \theta^D)} x_n \]
When that relation is (mostly) satisfied and the dimension of \(h\) is much smaller than that of \(x\) (\(h \ll x\)), \(h\) can be viewed as a lower-dimensional representation of \(x\): it encodes the information in a lower-dimensional vector and is called a learned embedding.
In particular, words have a vector representation in this space!
This very powerful approach can be used to combine encoders/decoders from different contexts (e.g., DALL·E).
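A minimal PyTorch sketch of the autoencoding objective above; the dimensions, linear architecture, and training details are arbitrary illustrations, not what is used in practice:

```python
import torch
import torch.nn as nn

X_DIM, H_DIM = 100, 8   # dimension of the data x and of the embedding h (h << x)

encoder = nn.Linear(X_DIM, H_DIM)   # phi^E(.; theta^E)
decoder = nn.Linear(H_DIM, X_DIM)   # phi^D(.; theta^D)
optim = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.randn(256, X_DIM)         # placeholder dataset (x_n)

for step in range(1000):
    h = encoder(x)                    # x_n -> h   (learned embedding)
    x_hat = decoder(h)                # h -> approximately x_n
    loss = ((x_hat - x) ** 2).mean()  # "predict the data from the data"
    optim.zero_grad()
    loss.backward()
    optim.step()
```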
Main flaw of the recursive approach: all past information must be compressed into a single hidden state, so long-range dependencies are easily lost.
With the attention mechanism, each predicted word/embedding is determined by all preceding words/embeddings, with different weights that are computed endogenously (learned from the data rather than fixed in advance).
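A NumPy sketch of a single-head, causal attention step behind this idea; the projection matrices and dimensions are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def causal_attention(X, Wq, Wk, Wv):
    """Each position's output is a weighted sum of the preceding positions' values,
    with weights ("attention") computed from the data itself."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # how much each word attends to each other word
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores = np.where(mask, -np.inf, scores)     # a word cannot look at future words
    weights = softmax(scores)                    # each row sums to 1
    return weights @ V

# Hypothetical usage: 4 words, embeddings of dimension 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(causal_attention(X, Wq, Wk, Wv).shape)     # (4, 8)
```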
Most famous engine developed by OpenAI: the Generative Pre-trained Transformer (aka GPT)
GPT-3 was trained on:
⇒ 45 TB of data
⇒ model size: 175 billion parameters
Dataset (mostly) ends in 2021.
Several concepts are relevant here:
unsupervised learning
fine-tuning
reinforcement learning
A machine can perform a task \(f(x; \theta)\) for some input \(x\) drawn from a data-generating process \(\mathcal{X}\) and some parameters \(\theta\).
A typical learning task consists in optimizing a loss function (aka the theoretical risk): \[\min_{\theta} \mathcal{L}(\theta) = \mathbb{E}_{x \sim \mathcal{X}}\left[ f(x; \theta) \right]\]
The central learning method to minimize the objective is called stochastic gradient descent.
In practice one has access to a dataset \((x_n) \subset \mathcal{X}\) and minimizes the “empirical” risk function
\[L\left( (x_n)_{n=1:N}, \theta \right) = \frac{1}{N} \sum_{n=1}^N f(x_n; \theta)\]
Regular case: we assume that the dataset is generated by the true model (the data-generating process).
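A sketch of stochastic gradient descent on this empirical risk, for a toy per-sample loss \(f(x; \theta) = (\theta - x)^2\); the loss, learning rate, and data are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=1000)   # dataset (x_n) drawn from the data-generating process

def grad_f(x, theta):
    """Gradient in theta of the per-sample loss f(x; theta) = (theta - x)**2."""
    return 2 * (theta - x)

theta, lr = 0.0, 0.01
for epoch in range(20):
    rng.shuffle(data)
    for x in data:                       # one sample at a time:
        theta -= lr * grad_f(x, theta)   # noisy step that lowers the empirical risk on average

print(theta)   # close to 3.0, the minimizer of E[(theta - x)^2]
```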
Two important variants:
A reinforcement learning algorithm can take actions which have two effects: they yield a reward, and they change the state of the environment (and hence future observations).
Example:
The GPT-4 model has been fine-tuned with reinforcement learning. The language model was rewarded for providing the right kind of answer (reinforcement learning from human feedback, RLHF).
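This is not how GPT-4 was fine-tuned; a toy tabular Q-learning loop merely illustrates the two effects of an action (it changes the state and it yields a reward). The environment, rewards, and hyperparameters below are made-up illustrations:

```python
import random

GOAL = 5   # toy environment: walk along positions 0..10, reward when reaching GOAL

def step(state, action):
    """Taking an action has two effects: it changes the state and it yields a reward."""
    next_state = min(max(state + action, 0), 10)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

# Tabular Q-learning: estimate the value of each (state, action) pair from observed rewards.
Q = {(s, a): 0.0 for s in range(11) for a in (-1, 1)}
alpha, gamma, eps = 0.1, 0.9, 0.2

for episode in range(300):
    state = 0
    for t in range(100):                                         # cap episode length
        if random.random() < eps or Q[(state, -1)] == Q[(state, 1)]:
            action = random.choice((-1, 1))                      # explore
        else:
            action = max((-1, 1), key=lambda a: Q[(state, a)])   # exploit current estimate
        next_state, reward = step(state, action)
        best_next = max(Q[(next_state, -1)], Q[(next_state, 1)])
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
        if reward > 0:
            break                                                # episode ends at the goal
```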
Two main variants on top of the foundation model (GPT Base):
InstructGPT
ChatGPT
There is public information about how GPT-3 was trained (check the technical paper or a summary).
Which of the following models should you use?
Lots of GPT variants:
o3-mini
gpt-4o
gpt-4-turbo-preview
text-ada-001
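With the OpenAI Python SDK you can list the engines currently available to your account; this is a sketch, it requires an API key, and the exact list depends on your account:

```python
from openai import OpenAI

client = OpenAI()                    # needs OPENAI_API_KEY in the environment
for model in client.models.list():   # models currently exposed by the API
    print(model.id)                  # e.g. "gpt-4o", "o3-mini", ...
```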
And a lot of competitors:
What are the differences between the various engines?
Check out the awesome list!
What are the trends?
code models