Chapter 4 Numpy

Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays

4.1 Arrays

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array. The shape of an array is a tuple of integers giving the size (the total number of elements) of the array along each dimension.

import numpy as np
# 1-dimensonal array, also referred to as a vector
a1 = np.array([1, 2, 3])
a1.shape, a1.ndim, a1.dtype, a1.size, type(a1)
#> ((3,), 1, dtype('int64'), 3, <class 'numpy.ndarray'>)
# 2-dimensional array, also referred to as matrix
a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])
a2.shape, a2.ndim, a2.dtype, a2.size, type(a2)
#> ((2, 3), 2, dtype('float64'), 6, <class 'numpy.ndarray'>)
# 3-dimensional array, also referred to as a matrix
a3 = np.array([[[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]],
                [[10, 11, 12],
                 [13, 14, 15],
                 [16, 17, 18]]])
a3.shape, a3.ndim, a3.dtype, a3.size, type(a3)
#> ((2, 3, 3), 3, dtype('int64'), 18, <class 'numpy.ndarray'>)

4.1.1 Creating Arrays

  • np.array()
  • np.ones()
  • np.full()
  • np.zeros()
  • np.random.rand(5, 3)
  • np.random.randint(10, size=5)
  • np.random.seed() - pseudo random numbers

You can change the data type with .astype() or calling the argument dtype= when creating.

# Create an array of ones, 10 rows of 2 cols type integer
ones = np.ones((10, 2))
# Create an array of zeros
zeros = np.zeros((5, 3, 3))
# One-dim array from 1 to 10
a = np.arange(1,11)
# One-dim array of 100 numbers from 1 to 10
a = np.linspace(1,11,50) 
# Random array of integers
a = np.random.randint(10, size=(5, 3))
# Random array of floats (between 0 & 1)
a = np.random.random((5, 3))

For consistency, you might want to keep the random numbers you generate similar throughout experiments. To do this, you can use np.random.seed().

4.1.2 Indexing and slicing arrays

Array shapes are always listed in the format (row, column, n...) where n is optional extra dimension(s).

a = np.arange(100)
a = a.reshape(50,2)
a[0,1]  # same form of an access to a nested list or a[0][1] 
#> 1
a[1:3]  # row slice
#> array([[2, 3],
#>        [4, 5]])
a[:,0]  # project on first column
#> array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
#>        34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66,
#>        68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])
a[0]    # first row or a[0, :]
#> array([0, 1])
a[:3,0] # Get the first value of the first 3 row
#> array([0, 2, 4])
b = np.arange(48).reshape(4, 12)
b, b[(0,2), 2:5],  b[:, (-1, 2, -1)]

# returns a 1D array with b[-1, 5], b[2, 9], b[-1, 1] and b[2, 9]
#> (array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
#>        [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
#>        [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
#>        [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]]), array([[ 2,  3,  4],
#>        [26, 27, 28]]), array([[11,  2, 11],
#>        [23, 14, 23],
#>        [35, 26, 35],
#>        [47, 38, 47]]))
b[(-1, 2, -1, 2), (5, 9, 1, 9)]
#> array([41, 33, 37, 33])

4.1.3 Reshape

a = np.arange(0,11) 
## Reshape into n rows of n col
# Row vector
a = a.reshape((1,11)) #a[np.newaxis, :]
a
#> array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10]])
# Column vector
a = a.reshape((11,1)) #a[:, np.newaxis]
a
#> array([[ 0],
#>        [ 1],
#>        [ 2],
#>        [ 3],
#>        [ 4],
#>        [ 5],
#>        [ 6],
#>        [ 7],
#>        [ 8],
#>        [ 9],
#>        [10]])
# Transpose the array
a.T
#> array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10]])

4.1.3.1 Aggregate

#Concatenate (The arrays must have equal numbers of dimensions)
a=np.array([1,2,3])
b=np.array([4,5,6])
c=np.array([7,8,9])
np.concatenate([a, b, c])
#> array([1, 2, 3, 4, 5, 6, 7, 8, 9])

For arrays with different numbers of dimensions, use vstack and hstack to perform concatenation. The inverse operation, splitting, is implemented by split,vsplit,hsplit

a = np.array([2,3,5])
b = np.array([[7,11,13], [17,19,23]])
np.vstack([a, b]) #np.concatenate((a, b), axis=0)
#> array([[ 2,  3,  5],
#>        [ 7, 11, 13],
#>        [17, 19, 23]])
b = np.array([[29], [31]])
b2 = np.array([[7,11,13], [17,19,23]])
np.hstack([b, b2]) ##np.concatenate((a, b), axis=1)
#> array([[29,  7, 11, 13],
#>        [31, 17, 19, 23]])
a = np.array([2,3,5,7,11,13,17,19,23])
a1, a2, a3 = np.split(a, [3,6])
a1, a2, a3
#> (array([2, 3, 5]), array([ 7, 11, 13]), array([17, 19, 23]))

There is also a split function which splits an array along any given axis. Calling vsplit is equivalent to calling split with axis=0. There is also an hsplit function, equivalent to calling split with axis=1.

4.1.3.2 Sort

  • np.sort()
  • np.argsort()
  • np.argmax()
  • np.argmin()

4.1.3.3 Transpose

The transpose method creates a new view on an ndarray’s data, with axes permuted in the given order. By default, transpose reverses the order of the dimensions:

a = np.arange(24).reshape(4,2,3)
a
#> array([[[ 0,  1,  2],
#>         [ 3,  4,  5]],
#> 
#>        [[ 6,  7,  8],
#>         [ 9, 10, 11]],
#> 
#>        [[12, 13, 14],
#>         [15, 16, 17]],
#> 
#>        [[18, 19, 20],
#>         [21, 22, 23]]])
a2= a.transpose() # equivalent to t.transpose((2, 1, 0))

Now let’s create an ndarray such that the axes 0, 1, 2 (depth, height, width) are re-ordered to 1, 2, 0 (depth→width, height→depth, width→height):

a1 = a.transpose((1,2,0))
a1, a1.shape
#> (array([[[ 0,  6, 12, 18],
#>         [ 1,  7, 13, 19],
#>         [ 2,  8, 14, 20]],
#> 
#>        [[ 3,  9, 15, 21],
#>         [ 4, 10, 16, 22],
#>         [ 5, 11, 17, 23]]]), (2, 3, 4))

NumPy provides a convenience function swapaxes() to swap two axes. For example, let’s create a new view of t with depth and height swapped:

a3 = a.swapaxes(0,1) # equivalent to a.transpose((1, 0, 2)) 
a3
#> array([[[ 0,  1,  2],
#>         [ 6,  7,  8],
#>         [12, 13, 14],
#>         [18, 19, 20]],
#> 
#>        [[ 3,  4,  5],
#>         [ 9, 10, 11],
#>         [15, 16, 17],
#>         [21, 22, 23]]])

The T attribute is equivalent to calling transpose() when the rank is 2. The T attribute has no effect on rank 0 (empty) or rank 1 arrays.

a.T
#> array([[[ 0,  6, 12, 18],
#>         [ 3,  9, 15, 21]],
#> 
#>        [[ 1,  7, 13, 19],
#>         [ 4, 10, 16, 22]],
#> 
#>        [[ 2,  8, 14, 20],
#>         [ 5, 11, 17, 23]]])

4.2 Operators

All the usual arithmetic operators can be used with ndarrays. They apply element wise: * Arithmetic * +, -, *, /, //, **, % * np.exp() * np.log() * np.dot() - Dot product

The arrays must have the same shape. If they do not, NumPy will apply the broadcasting rules.

4.2.1 Broadcasting Rules

  • If the arrays do not have the same rank, then a 1 will be prepended to the smaller ranking arrays until their ranks match.
h = np.arange(5).reshape(1, 1, 5)
h
#> array([[[0, 1, 2, 3, 4]]])

Now let’s try to add a 1D array of shape (5,) to this 3D array of shape (1,1,5). Applying the first rule of broadcasting!

h + [10, 20, 30, 40, 50] #same as: h + [[[10, 20, 30, 40, 50]]]
#> array([[[10, 21, 32, 43, 54]]])
  • Arrays with a 1, along a particular dimension, act as if they had the size of the array with the largest shape along that dimension. The value of the array element is repeated along that dimension.
k = np.arange(6).reshape(2,3)
k
#> array([[0, 1, 2],
#>        [3, 4, 5]])

Let’s try to add a 2D array of shape (2,1) to this 2D ndarray of shape (2, 3). NumPy will apply the second rule of broadcasting:

k + [[100], [200]] # same as: k + [[100, 100, 100], [200, 200, 200]]
#> array([[100, 101, 102],
#>        [203, 204, 205]])

Combining rules 1 & 2, we can do this:

k + [100, 200, 300] # after rule 1: [[100, 200, 300]], and after rule 2:␣ 􏰰→[[100, 200, 300], [100, 200, 300]]
#> array([[100, 201, 302],
#>        [103, 204, 305]])

After rules 1 & 2, the sizes of all arrays must match.

try:
 k + [33, 44]
except ValueError as e:
 print(e)
#> operands could not be broadcast together with shapes (2,3) (2,)

4.2.2 Upcasting

When trying to combine arrays with different dtypes, NumPy will upcast to a type capable of handling all possible values (regardless of what the actual values are).

k1 = np.arange(0, 5, dtype=np.uint8)
print(k1.dtype, k1)
#> uint8 [0 1 2 3 4]
k2 = k1 + np.array([5, 6, 7, 8, 9], dtype=np.int8)
# Note that int16 is required to represent all possible int8 and uint8 values (from -128 to 255)
print(k2.dtype, k2)
#> int16 [ 5  7  9 11 13]
k3 = k1 + 1.5
print(k3.dtype, k3)
#> float64 [1.5 2.5 3.5 4.5 5.5]

4.2.3 Conditional operators

The conditional operators also apply elementwise.

  • Comparison operators
    • >
    • <
    • <=
    • >=
    • x != 3
    • x == 3
    • np.sum(x > 3)
k2 < np.array([2, 8, 11, 9, 6]) 
#> array([False,  True,  True, False, False])
k2 < 8
#> array([ True,  True, False, False, False])

4.2.4 Statistics

Many mathematical and statistical functions are available for ndarrays.

  • Aggregation
    • np.sum() - faster than .sum()
    • np.prod()
    • np.mean()
    • np.std()
    • np.var()
    • np.min()
    • np.max()
    • np.argmin() - find index of minimum value
    • np.argmax() - find index of maximum value

These functions accept an optional argument axis which lets you ask for the operation to be performed on elements along the given axis. For example:

c=np.arange(24).reshape(2,3,4)
c
#> array([[[ 0,  1,  2,  3],
#>         [ 4,  5,  6,  7],
#>         [ 8,  9, 10, 11]],
#> 
#>        [[12, 13, 14, 15],
#>         [16, 17, 18, 19],
#>         [20, 21, 22, 23]]])
c.sum(axis=0) # sum across matrices
#> array([[12, 14, 16, 18],
#>        [20, 22, 24, 26],
#>        [28, 30, 32, 34]])
c.sum(axis=1) # sum across rows
#> array([[12, 15, 18, 21],
#>        [48, 51, 54, 57]])
c.sum(axis=(0,2)) # sum across matrices and columns (first row, second row , 8+9+10+11 + 20+21+22+23)
#> array([ 60,  92, 124])

4.2.5 Linear Algebra

Linear algebra functions are available in the numpy.linalg module. The inv function computes a square matrix’s inverse, whereas the pinv() computes the pseudoinverse.

import numpy.linalg as linalg
a = np.array([[1,2,3],[5,7,11],[21,29,31]])
linalg.inv(a), linalg.pinv(a)
#> (array([[-2.31818182,  0.56818182,  0.02272727],
#>        [ 1.72727273, -0.72727273,  0.09090909],
#>        [-0.04545455,  0.29545455, -0.06818182]]), array([[-2.31818182,  0.56818182,  0.02272727],
#>        [ 1.72727273, -0.72727273,  0.09090909],
#>        [-0.04545455,  0.29545455, -0.06818182]]))

The qr function computes the QR decomposition of a matrix, also known as a QR factorization or QU factorization. It is a decomposition of a matrix A into a product A = QR of an orthogonal matrix Q and an upper triangular matrix R.

q, r = linalg.qr(a)  #q.dot(r) -> q**r equals a
  • The det() function computes the matrix determinant.
linalg.det(a)
#> 43.99999999999997
  • The eig() function from linalg computes the eigenvalues and eigenvectors of a square matrix:
eigenvalues, eigenvectors = linalg.eig(a)
eigenvalues, eigenvectors
#> (array([42.26600592, -0.35798416, -2.90802176]), array([[-0.08381182, -0.76283526, -0.18913107],
#>        [-0.3075286 ,  0.64133975, -0.6853186 ],
#>        [-0.94784057, -0.08225377,  0.70325518]]))
  • The diag() function returns the elements on the diagonal, whereas trace() returns their sum:
np.diag(a),np.trace(a)
#> (array([ 1,  7, 31]), 39)

The solve function solves a system of linear scalar equations, such as: \[2x+6y=6\] \[5x+3y=-9\]

coeffs  = np.array([[2, 6], [5, 3]])
depvars = np.array([6, -9])
solution = linalg.solve(coeffs, depvars)
solution
#> array([-3.,  2.])