Numeric Python

Numpy/scipy/matplotlib

Most python scientists, use the following libraries:
- numpy: performant array library (vectors, matrices, tensors)
- matplotlib: plotting library
- scipy: all kinds of mathematical routines
In the rest of the course, we’ll make some use of numpy and matplotlib
They are included in all python distributions like Anaconda Python
All additional libraries use numpy and matplotlib: pandas, statsmodels, sklearn

Importing the libraries

It is standard to import the libraries as np, and plt. We’ll follow this convention here.

# these lines need to be run only once per program
import numpy as np
import matplotlib as plt

print(f"Numpy version {np.__version__}")
print(f"Matplotlib version {plt.__version__}")

Numpy

What is Numpy

Numpy is an array type (python object) meant to store efficiently homogenous, square, arrays (like \((a_{i})_{i\in [1,N]}\) or \((b_{i,j,k})_{i\in [1,N],j\in[1,J],k \in [1,K]}\))

By default its stores data in contiguous C-order (last index varies faster), but also supports Fortran order and strided arrays (non-contiguous).

Numpy has introduced well thought conventions, that have been reused by many other libraries (tensorflow, pytorch, jax), or even programming languages (julia)

Vector Creation

Vectors and matrices are created with the np.array(...) function.
Special vectors can be created with np.zeros, np.ones, np.linspace

# an array can be created from a list of numbers
np.array( [1.0, 2.0, 3.0] )

# or initialized by specifying the length of the array
np.zeros(5)

# 10 regularly spaced points between 0 and 1
np.linspace(0, 1, 11)

Matrix Creation

A matrix is a 2-dimensional array and is created with np.array
Function np.matrix() has been deprecated: do not use it.
There are functions to create specific matrices: np.eye, np.diag, …

# an array can be created from a list of (equal size) lists
np.array([
    [1.0, 2.0, 3.0],
    [4  ,   5,   6] 
])

# initialize an empty matrix with the dimensions as a tuple
A = np.zeros( (2, 3) )
A

# matrix dimensions are contained in the shape attribute
A.shape

Tensors

The construction generalizes to higher dimension arrays (a.k.a. tensors)

# an array can be created from a list of list of lists
np.array([
    [
        [1.0, 2.0, 3.0],
        [4  ,   5,   6] 
    ],
        [
        [7.0, 8.0, 9.0],
        [10 ,  11,   12] 
    ]
])

# initialize an empty matrix with the dimensions as a tuple
A = np.zeros( (2, 3) )
A

# matrix dimensions are contained in the shape attribute
A.shape

Linear Algebra

Vector multiplications and Matrix multiplications can be performed using special sign @

A = np.array([[1.0, 2.0], [2,4]])
A

B = np.array([1.0, 2.0])
B

A@B

A@A

Note how multiplication reduces total number of dimensions by 2. It is a tensor reduction.

print(A.shape, A.shape, (A@A).shape)

Scalar types

Numpy arrays can contain data of several scalar types.

[True, False, True]

# vector of boolean
boolean_vector = np.array( [True, False, True] )
print(f"type of scalar '{boolean_vector.dtype}'")
boolean_vector

# vector of integers
int_vector = np.array([1, 2, 0])
print(f"type of scalar '{int_vector.dtype}'")
int_vector

By default, numerical arrays contain float64 numbers (like matlab). But GPUs typically process 16 bits or 32 bits numbers.

Can you create a 32 bits array?

# your code here

Subscripting Vectors

Elements and subarrays, can be retrieved using the same syntax as lists and strings.
- Remember that indexing starts at 0.

V = np.array([0., 1., 2., 3., 4.])
display(V[1])  # second element

V = np.array([0., 1., 2., 3., 4.])
display(V[1:3])  # second, third and fourth element

Modifying Vector Content

Elements and suvectors, can be assigned to new values, as long as they have the right dimensions.

V = np.array([1., 1., 2., 4., 5., 8., 13.])
V[3] = 3.0
V

V = np.array([1., 1., 2., 4., 5., 8., 13.])
# V[1:4] = [1,2,3,4] # this doesn't work
V[1:4] = [2,3,4] # this works

Subscripting Matrices

Indexing generalizes to matrices: there are two indices istead of one: M[i,j]
One can extract a row, or a column (a slice) with M[i,:] or M[:,i]
A submatrix is defining with two intervals: M[i:j, k:l] or M[i:j, :], …

M = np.array([[1,2,3],[4,5,6],[7,8,9]])
M

M[0,1] # access element (1,2)

M[2,:] # third row

M[:,1] # second column     # M[i,1] for any i

M[1:3, :] # lines from 1 (included) to 3 (excluded) ; all columns

Modifying matrix content

M = np.array([[1,2,3],[4,5,6],[7,8,9]])
M

M[0,0] = 0
M

M[1:3, 1:3] = np.array([[0,1],[1,0]]) # dimensions must match
M

Element-wise algebraic operations

The following algebraic operations are defined on arrays: +, -, *, /, **.
Comparisons operators (<,<=, >, >=, ==) are defined are return boolean arrays.
They operate element by element.

A = np.array([1,2,3,4])
B = np.array([4,3,2,1])
A+B

A*B    # note the difference with A@B

A>B

At first, one might be surprised that the default multiplication operator is element-wise multiplication rather than matrix multiplication.

There are at least two good reasons:

consistency: all operators can be broadcasted with the exact same rules (like *, +, >)
for many workflows, elementwise operations are more common than matrix multiplication

Element-wise logical operations

The following logical operations are defined element-wise on arrays: & (and), | (or), ~ (not)

A = np.array([False, False, True, True])
B = np.array([False, True, False, True])

~A

A | B

A & B

Vector indexing

Arrays can be indexed by boolean arrays instead of ranges.
Only elements corresponding to true are retrieved

x = np.linspace(0,1,6)
x

# indexes such that (x^2) > (x/2)
x**2 > (x/2)

cond = x**2 > (x/2)
x[ cond ]

Going further: broadcasting rules

Numpy library has defined very consistent conventions, to match inconsistent dimensions.
Ignore them for now…

M = np.eye(4)
M

M[2:4, 2:4] = 0.5 # float
M

M[:,:2] = np.array([[0.1, 0.2]])  # 1x2 array
M

Going Further

Other useful functions (easy to google):
- np.arange() regularly spaced integers
- np.where() find elements in
- …

Matplotlib

matplotlib is …
object oriented api optional Matlab-like syntax
main function is plt.plot(x,y) where x and y are vectors (or iterables like lists)
- lots of optional arguments

from matplotlib import pyplot as plt

Example

x = np.linspace(-1,1,6)

y = np.sin(x)/x # sinus cardinal

plt.plot(x,y,'o')
plt.plot(x,y)

Example (2)

x = np.linspace(-5,5,100)

fig = plt.figure() # keep a figure open to draw on it
for k in range(1,5):
    y = np.sin(x*k)/(x*k)
    plt.plot(x, y, label=f"$sinc({k} x)$") # label each line
plt.plot(x, x*0, color='black', linestyle='--')
plt.grid(True) # add a grid
plt.title("Looking for the right hat.")
plt.legend(loc="upper right")

Example (3)

x = np.linspace(-5,5,100)

plt.figure()
plt.subplot(2,2,1) # create a 2x2 subplot and draw in first quadrant
plt.plot(x,x)
plt.subplot(2,2,2) # create a 2x2 subplot and draw in second quadrant
plt.plot(x,-x)
plt.subplot(2,2,3) # create a 2x2 subplot and draw in third quadrant
plt.plot(x,-x)
plt.subplot(2,2,4) # create a 2x2 subplot and draw in fourth quadrant
plt.plot(x,x)

plt.tight_layout() # save some space

Alternatives to matplotlib

plotly (nice javascript graphs)
bqplot (native integration with jupyter)
altair
- excellent for dataviz/interactivity
- python wrapper to Vega-lite
- very efficient to visualize pandas data (i.e. a dataframe)