Solving systems of equations in Python
In high school algebra, you probably learned to solve systems of equations such as:
$$4x + 3y = 32$$ $$4x - 2y = 12$$Example 1: Two equations of two variables
One (pencil and paper) way to solve this sort of system of equations is to pick one of the two equations and solve for one variable. For example, the first resolves to:
$$-2y = 12 - 4x$$ $$y = -6 + 2x$$Example 2: Solving for y
This can be inserted into the second to find a solution for x:
$$4x + 3(-6 + 2x) = 32$$ $$4x - 18 + 6x = 32$$ $$10x - 18 = 32$$ $$10x = 50$$ $$x = 5$$Example 3: Solving for x
and then inserted back into the other equation to solve for y:
$$20 - 2y = 12$$ $$-2y = -8$$ $$y = 4$$Example 4: back substitution
This pair (5,4) is the single solution for this system of equations. Geometrically, this is the point at which the two lines intersect.
You can extend this approach to more than two equations of more than two unknowns: given three equations and three unknowns, pick one, solve for a single variable and insert that back into the other two. Now you have a system of two equations that you can solve as above. This same approach can be extended out to any number of equations, but gets to be pretty tedious and error prone: even three equations with three unknowns is a hassle to solve this way.
Another way to approach these problems is to add or subtract one equation to or from the other. This is always legitimate: you can add the same value to each side of any equation and still have a valid equation. Since the other equation specifies the same thing on both sides, you can generate another valid equation by solving:
$$4x + 3y = 32$$ $$4x - 2y = 12$$ $$(4x + 3y) - (4x - 2y) = 32 - 12$$ $$4x + 3y - 4x + 2y = 20$$ $$5y = 20$$ $$y = 4$$Example 5: Subtract one equation from the other
This turned out to be easy because the 4x value subtracted out and I was left with a single variable that I could quickly solve for. This might lead you to think this is a "niche" technique that can only be applied when the same term appears in both equations, but it's actually more general than that. Just as you can always add or subtract the same value to each side of an equation, you can also always multiply or divide the same value to both sides. So, given
$$2x + 4y = 22$$ $$4x - 2y = 4$$Example 6: Scale one equation
You can eliminate the x value from the second equation by solving:
$$2 * (2x + 4y) - (4x - 2y) = 22 * 2 - 4$$ $$4x + 8y - 4x + 2y = 40$$ $$10y = 40$$ $$y = 4$$Example 7: Subtract the scaled equation
So you can always multiply or divide one equation by a factor that causes adding or subtracting it from the other to eliminate one of the variables. This can be extended out to any number of equations by eliminating one variable at a time. This process is called Gaussian elimination.
It's customary to save some bookkeeping by representing the equations as matrices of coefficients, so that the three-equation, three unknown system of equations:
$$-3x + 2y - 6x = 6$$ $$5x + 7y - 5z = 6$$ $$x + 4y - 2x = 8$$Example 8: Three equations of three variables
Is written more compactly as:
$$\begin{bmatrix} -3 & 2 & -6 & 6 \\ 5 & 7 & -5 & 6 \\ 1 & 4 & -2 & 8 \end{bmatrix}$$Example 9: Matrix form
(This representation is called a matrix). The goal now is to convert this into what's called an upper-triangular matrix where the lower left entries are all zeros. Once that's done, the bottom equation is a single equation of a single variable that can be solved, substituted up into the second from last and solved, and then worked upwards.
Converting a matrix into upper triangular form is pretty formulaic - in fact, it can be programmed. Start by multiplying the first row by the quotient of the first coefficient of the second row and the first coefficient of the first row (in other words, find a factor that makes the first coefficient of the first row the same as the first coefficient of the second row) and subtract the first row from the second. In the current case, that factor is (5/-3), so the first row becomes:
$$\begin{matrix}-3 * \frac{5}{-3} & 2 * \frac{5}{-3} & -6 * \frac{5}{-3} & 6 * \frac{5}{-3}\end{matrix}$$ $$\begin{matrix}5 & -\frac{10}{3} & 10 & -10\end{matrix}$$Example 10: Scale the first row
Subtracting this from the second row of example 9 yields:
$$\begin{matrix} & 5 & 7 & -5 & 6 \\ - & 5 & -\frac{10}{3} & 10 & -10 \\ & 0 & \frac{31}{3} & -15 & 16 \end{matrix}$$Example 11: Subtract from the second
And the matrix becomes
$$\begin{bmatrix} -3 & 2 & -6 & 6 \\ 0 & \frac{31}{3} & -15 & 16 \\ 1 & 4 & -2 & 8 \end{bmatrix}$$Example 12: Resulting matrix
Now do the same thing with the first and third rows - here, the scale factor of the first row is -1/3 and the first row becomes:
$$\begin{matrix} -3 * \frac{-1}{3} & 2 * \frac{-1}{3} & -6 * \frac{-1}{3} & 6 \frac{-1}{3} \end{matrix}$$ $$\begin{matrix} 1 & -\frac{2}{3} & 2 & -2 \end{matrix}$$Example 13: Scale and subtract
And subtracting this from the third row results in the matrix
$$\begin{bmatrix} -3 & 2 & -6 & 6 \\ 0 & \frac{31}{3} & -15 & 16 \\ 0 & \frac{14}{3} & -4 & 10 \end{bmatrix}$$Example 14: Transformed matrix
Now, the running second row can be used to eliminate the second coefficient from the third row by scaling it by 14/31:
$$\begin{matrix} 0 * \frac{14}{31} & \frac{31}{3} * \frac{14}{31} & -15 * \frac{14}{31} & 16 * \frac{14}{31} \end{matrix}$$ $$\begin{matrix} 0 & \frac{14}{3} & -\frac{210}{31} & \frac{224}{31} \end{matrix}$$Example 15: Scale and subtract
And obtaining the final upper triangular matrix:
$$\begin{bmatrix} -3 & 2 & -6 & 6 \\ 0 & \frac{31}{3} & -15 & 16 \\ 0 & 0 & \frac{86}{31} & \frac{86}{31} \end{bmatrix}$$Example 16: Upper triangular matrix
So the equations themselves can be rewritten (consistently) as:
$$-3x + 2y -6z = 6$$ $$\frac{31}{3}y - 15x = 16$$ $$\frac{86}{31}z = \frac{86}{31}$$Example 17: Upper triangular matrix with named coefficients
Which can then be solved:
$$z = 1$$ $$\frac{31}{3}y - 15 = 16$$ $$\frac{31}{3}y = 31$$ $$y = 3$$ $$-3x + 6 - 6 = 6$$ $$-3x = 6$$ $$x = -2$$Example 18: Solution
Computationally, though, it makes sense to take this (at least) one step further and convert this upper-triangular matrix into row echelon form: a row-echelon matrix is not only upper triangular, but also the first non-zero element of each row is a 1. This is easy enough to do; just divide the entire row by the value of the first coefficient. Returning to the sample matrix (example 16), this now becomes:
$$\begin{bmatrix} 1 & -\frac{2}{3} & 2 & -2 \\ 0 & 1 & -\frac{45}{31} & \frac{48}{31} \\ 0 & 0 & 1 & 1 \end{bmatrix}$$Example 19: Row-echelon form
You can start to see a programmatic solution shaping up, but this is still sort of difficult to code for: the z coefficient falls out of this form easily, but the "back substitution" part is less automatic. What you can do, though, is repeat the upper-triangulation process backwards, from the bottom up, yielding what is called a reduced row-echelon matrix.
In this case, you want to start by multiplying the bottom row by -45/31 and then subtracting the result from the second row:
$$\begin{bmatrix} 1 & -\frac{2}{3} & 2 & -2 \\ 0 & 1 & 0 & \frac{93}{31} = 3 \\ 0 & 0 & 1 & 1 \end{bmatrix}$$Example 20: Scale and subtract, row 2
And then multiply the bottom row by 2 and subtract it from the top:
$$\begin{bmatrix} 1 & -\frac{2}{3} & 0 & -4 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 1 \end{bmatrix}$$Example 21: Scale and subtract, row 1, coefficient 3
And finally multiply the second row by -2/3 and subtract it from the first:
$$\begin{bmatrix} 1 & 0 & 0 & -2 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 1 \end{bmatrix}$$Example 22: Scale and subtract, row 1, coefficient 2
Which is now completely automatable: the values of each coefficient are the right-most entries in the matrix. This example solved for three unknowns, but the algorithm extends out to any number of unknowns in any number of equations.
In Python, then, you can represent a matrix as a list of lists (assumed to each be of equal length; Python doesn't have the concept of multidimensional arrays like C does, so it's up to the caller in this case to make sure that the lists aren't "ragged"). Then this matrix can be converted to upper triangular form automatically as shown in listing 1:
def upper_triang(m):
matrix_print(m)
for i in range(len(m)):
top_row = m[i]
for j in range(i + 1, len(m)):
reduce_row = m[j]
reduce_factor = reduce_row[i] / top_row[i] if top_row[i] != 0 else 0
for k in range(i, len(reduce_row)):
reduce_row[k] -= top_row[k] * reduce_factor
return m
Listing 1: Upper triangularization of a matrix
Here, i
represents the index of the row that's being scaled and subtracted from each subsequent row, and j
is the index of the row that it's being subtracted from: k
iterates over the coefficients of row j
. Feeding the
matrix from example 9 above into this function with a matrix_print
helper function as show in listing 2:
def matrix_print(m):
for row in m:
print row
matrix_print(upper_triang([[-3.0, 2.0, -6.0, 6.0],
[5.0, 7.0, -5.0, 6.0],
[1.0, 4.0, -2.0, 8.0]]))
Listing 2: Upper triangularization example
Results in the matrix shown in example 22:
[-3.0, 2.0, -6.0, 6.0]
[0.0, 10.333333333333334, -15.0, 16.0]
[0.0, 0.0, 2.774193548387097, 2.774193548387097]
Example 22: Upper triangular result
(Compare this with example 16). You can then convert this to row-echelon form as shown in listing 3:
def row_echelon(m):
for row in m:
scale_factor = 0
for k in range(len(row)):
if row[k] != 0 and scale_factor == 0:
scale_factor = row[k]
if scale_factor != 0:
row[k] /= scale_factor
return m
Listing 3: Row Echelon code
This will result in the matrix shown in example 23:
[1.0, -0.6666666666666666, 2.0, -2.0]
[0.0, 1.0, -1.4516129032258063, 1.5483870967741935]
[0.0, 0.0, 1.0, 1.0]
Example 23: Row echelon result
(Again, compare with example 19). Finally, eliminating backwards as shown in listing 4:
def reduced_re(m):
for i in reversed(range(len(m))):
bottom_row = m[i]
for j in reversed(range(i)):
reduce_row = m[j]
reduce_factor = reduce_row[i] / bottom_row[i] if bottom_row[i] != 0 else 0
for k in range(i, len(reduce_row)):
reduce_row[k] -= bottom_row[k] * reduce_factor
return m
Listing 4: Reduced Row-Echelon format
This is more or less the same as listing 1, with reversed ranges instead of "forward" ranges. Notice that I still have to iterate
forward with index k
. Of course, you could combine these three functions together into one linear equation solver function,
but I think this breakdown is a little easier to follow.
You have to watch out for some edge cases here, though. For example, the matrix:
[[1.0,1.0,1.0,3.0],[1.0,-1.0,2.0,2.0],[2.0,0.0,3.0,1.0]]
Example 24: No solutions
yields:
[1.0, 1.0, 1.0, 3.0]
[0.0, 1.0, -0.5, 0.5]
[0.0, 0.0, 0.0, 1.0]
Example 25: contradiction
The bottom row indicates there's a problem - in fact, the matrix itself is unsolvable, and the contradiction here where 0.0 = 1.0 is the tip-off. On the other hand, the matrix:
[[1.0,1.0,1.0,3.0],[1.0,-1.0,2.0,2.0],[2.0,0.0,3.0,5.0]]
Example 26: underdetermined
is not linearly independent (the third row is the combination of the first two), so it has multiple solutions. The solver presented here yields:
[1.0, 0.0, 1.5, 2.5]
[0.0, 1.0, -0.5, 0.5]
[0.0, 0.0, 0.0, 0.0]
Example 27: All zeros
Here, the bottom row of all zeros is a tip-off that the matrix represents an underdetermined system of equations; that is, one with multiple solutions.
Also, order matters. Consider what happens with the input in example 28:
matrix_print(reduced_re(row_echelon(upper_triang([[0.0, 0.0, 2.774193548387097, 2.774193548387097],
[0.0, 10.333333333333334, -15.0, 16.0],
[-3.0, 2.0, -6.0, 6.0]]))))
[0.0, 0.0, 0.0, 1.9375]
[0.0, 1.0, 0.0, 0.18750000000000022]
[1.0, -0.0, 1.032258064516129, -0.967741935483871]
Example 28: upside-down matrix
This is definitely wrong - this is the upper-triangular output from example 22, in reverse order. It should yield the same result as
example 22 did. The problem is the zeros in the first entry: you can't scale 0 by anything to do a reduction here. In general, this
algorithm will fail if m[i][i] = 0
for any i
in len(m)
— at any point in the algorithm
(remember, a prior transformation can "invalidate" a row this way). The fix is to dynamically sort the rows as the algorithm proceeds;
convert upper_triang
as shown in listing 5.
def upper_triang(m):
matrix_print(m)
for i in range(len(m)):
candidate = i + 1
while m[i][i] == 0 and candidate < len(m):
(m[i],m[candidate]) = (m[candidate],m[i])
candidate += 1
top_row = m[i]
Listing 5: sort the array while converting it
Finally, the gaussian elimination algorithm itself is vulnerable to numerical instability because of the use of floating-point
operations. You can mitigate this by keeping everything in rational form until the very end, or by applying the softmax function
to "smooth out" the coefficients. Rather than coding all of this by hand, the standard library for this type of problems
is numpy
:
>>> import numpy as np
>>> a = np.array([[-3,2,-6],[5,7,-5],[1,4,-2]])
>>> b = np.array([6,6,8])
>>> np.linalg.solve(a,b)
array([-2., 3., 1.])
Listing 6: numpy and scipy solution
I'll close out with a neat trick outlined in Mathematics for Machine Learning — using gaussian elimination to invert a matrix. Given the input matrix:
$$\begin{bmatrix} 1 & 0 & 2 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$Example 29: Matrix to invert
If you extend it with an identity matrix on the right:
$$\begin{bmatrix} 1 & 0 & 2 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 2 & 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}$$Example 30: Append an identity matrix
And feed it into the gaussian elimination routine:
matrix_print(reduced_re(row_echelon(upper_triang([[1.0,0.0,2.0,0.0,1.0,0.0,0.0,0.0],
[1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0],
[1.0,2.0,0.0,1.0,0.0,0.0,1.0,0.0],
[1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0]]))))
[1.0, 0.0, 0.0, 0.0, -1.0, 2.0, -2.0, 2.0]
[0.0, 1.0, 0.0, 0.0, 1.0, -1.0, 2.0, -2.0]
[0.0, 0.0, 1.0, 0.0, 1.0, -1.0, 1.0, -1.0]
[0.0, 0.0, 0.0, 1.0, -1.0, 0.0, -1.0, 2.0]
Example 31: Inverted matrix solution
The right-hand "side" of the result is the matrix inverse.
Add a comment:
Completely off-topic or spam comments will be removed at the discretion of the moderator.
You may preserve formatting (e.g. a code sample) by indenting with four spaces preceding the formatted line(s)