Solving systems of equations in Python
In high school algebra, you probably learned to solve systems of equations such as:
One (pencil and paper) way to solve this sort of system of equations is to pick one of the two equations and solve for one variable. For example, the first resolves to:
This can be inserted into the second to find a solution for x:
and then inserted back into the other equation to solve for y:
This pair (5,4) is the single solution for this system of equations. Geometrically, this is the point at which the two lines intersect.
You can extend this approach to more than two equations with more than two unknowns: given three equations and three unknowns, pick one, solve it for a single variable, and substitute that back into the other two. Now you have a system of two equations that you can solve as above. The same approach extends to any number of equations, but it gets tedious and error-prone: even three equations with three unknowns is a hassle to solve this way.
Another way to approach these problems is to add one equation to, or subtract it from, the other. This is always legitimate: you can add the same value to each side of any equation and still have a valid equation, and since the two sides of the other equation are equal by definition, adding or subtracting it side-by-side produces another valid equation:
This turned out to be easy because the 4x terms subtracted out and I was left with a single variable that I could quickly solve for. This might lead you to think this is a "niche" technique that can only be applied when the same term appears in both equations, but it's actually more general than that. Just as you can always add or subtract the same value on each side of an equation, you can also always multiply or divide both sides by the same (non-zero) value. So, given
You can eliminate the x term from the second equation like this:
In general, you can always multiply or divide one equation by a factor chosen so that adding or subtracting it from the other eliminates one of the variables. This extends to any number of equations by eliminating one variable at a time. This process is called Gaussian elimination.
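In code terms (jumping ahead a little to the list-of-coefficients representation used later in this article), a single elimination step might look like the sketch below; the helper name eliminate is my own and isn't something the later listings depend on:
def eliminate(source, target, col):
    # Scale `source` so that its `col` coefficient matches `target`'s,
    # then subtract it from `target`, zeroing out that variable.
    factor = target[col] / source[col]
    return [t - s * factor for s, t in zip(source, target)]
# Eliminate x (column 0) from the second equation of an illustrative
# 2x2 system: 2x + y = 7 and 4x - 3y = 9.
print(eliminate([2.0, 1.0, 7.0], [4.0, -3.0, 9.0], 0))  # [0.0, -5.0, -5.0]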
It's customary to save some bookkeeping by representing the equations as a matrix of coefficients, so that the three-equation, three-unknown system of equations:
is written more compactly as:
(This representation is called an augmented matrix.) The goal now is to convert this into what's called an upper-triangular matrix, where the entries below the diagonal are all zeros. Once that's done, the bottom row is a single equation in a single variable that can be solved directly; its value can then be substituted into the second-to-last row and solved, and so on, working upwards.
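As a sketch of what that back substitution looks like in code (using the list-of-lists representation introduced later in the article; the function name back_substitute is my own and it assumes all the diagonal entries are non-zero):
def back_substitute(m):
    # Solve an upper-triangular augmented matrix from the bottom row up.
    n = len(m)
    solution = [0.0] * n
    for i in reversed(range(n)):
        # Move the already-solved variables over to the constant side...
        total = m[i][-1] - sum(m[i][j] * solution[j] for j in range(i + 1, n))
        # ...and divide by the coefficient of the remaining variable.
        solution[i] = total / m[i][i]
    return solution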
Converting a matrix into upper-triangular form is pretty formulaic - in fact, it can be programmed. Start by multiplying the first row by the quotient of the first coefficient of the second row and the first coefficient of the first row (in other words, find a factor that makes the first coefficient of the first row match the first coefficient of the second row), and subtract the scaled first row from the second. In the current case, that factor is (5/-3), so the scaled first row becomes:
Subtracting this from the second row of example 9 yields:
And the matrix becomes
Now do the same thing with the first and third rows - here, the scale factor of the first row is -1/3 and the first row becomes:
And subtracting this from the third row results in the matrix
Now the new second row can be used to eliminate the second coefficient from the third row; scaling the second row by 14/31 gives:
And subtracting that from the third row yields the final upper-triangular matrix:
So the equations themselves can be rewritten, without changing their solution, as:
Which can then be solved:
Computationally, though, it makes sense to take this (at least) one step further and convert this upper-triangular matrix into row-echelon form: a row-echelon matrix is not only upper triangular, but the first non-zero element of each row is also a 1. This is easy enough to do; just divide each row by its first non-zero coefficient. Returning to the sample matrix (example 16), this now becomes:
You can start to see a programmatic solution shaping up, but this is still somewhat difficult to code for: the value of z falls out of this form easily, but the "back substitution" part is less automatic. What you can do, though, is repeat the upper-triangulation process backwards, from the bottom up, yielding what is called a reduced row-echelon matrix.
In this case, you want to start by multiplying the bottom row by -45/31 and then subtracting the result from the second row:
And then multiply the bottom row by 2 and subtract it from the top:
And finally multiply the second row by -2/3 and subtract it from the first:
Which is now completely automatable: the value of each unknown is simply the right-most entry of its row. This example solved for three unknowns, but the algorithm extends to any number of unknowns in any number of equations.
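Once the matrix is in this form, pulling the answer out in code is a one-liner (using the list-of-lists representation described next; the helper name is my own):
def solution_from_rre(m):
    # The last column of a reduced row-echelon augmented matrix holds the solution.
    return [row[-1] for row in m]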
In Python, then, you can represent a matrix as a list of lists (each assumed to be of equal length; Python doesn't have built-in multidimensional arrays the way C does, so it's up to the caller to make sure the lists aren't "ragged"). This matrix can then be converted to upper-triangular form automatically, as shown in listing 1:
def upper_triang(m):
    for i in range(len(m)):
        top_row = m[i]
        for j in range(i + 1, len(m)):
            reduce_row = m[j]
            # find the factor that makes the top row's leading coefficient match this row's
            reduce_factor = reduce_row[i] / top_row[i] if top_row[i] != 0 else 0
            for k in range(i, len(reduce_row)):
                reduce_row[k] -= top_row[k] * reduce_factor
    return m
Here, i represents the index of the row that's being scaled and subtracted from each subsequent row, and j is the index of the row that it's being subtracted from; k iterates over the coefficients of row j. Feeding the matrix from example 9 above into this function, with the matrix_print helper function shown in listing 2:
def matrix_print(m):
    for row in m:
        print(row)
matrix_print(upper_triang([[-3.0, 2.0, -6.0, 6.0],
                           [5.0, 7.0, -5.0, 6.0],
                           [1.0, 4.0, -2.0, 8.0]]))
Results in the matrix shown in example 22:
[-3.0, 2.0, -6.0, 6.0]
[0.0, 10.333333333333334, -15.0, 16.0]
[0.0, 0.0, 2.774193548387097, 2.774193548387097]
(Compare this with example 16). You can then convert this to row-echelon form as shown in listing 3:
def row_echelon(m):
    for row in m:
        scale_factor = 0
        for k in range(len(row)):
            if row[k] != 0 and scale_factor == 0:
                scale_factor = row[k]
            if scale_factor != 0:
                row[k] /= scale_factor
    return m
This will result in the matrix shown in example 23:
[1.0, -0.6666666666666666, 2.0, -2.0]
[0.0, 1.0, -1.4516129032258063, 1.5483870967741935]
[0.0, 0.0, 1.0, 1.0]
(Again, compare with example 19). Finally, eliminating backwards as shown in listing 4:
def reduced_re(m):
    for i in reversed(range(len(m))):
        bottom_row = m[i]
        for j in reversed(range(i)):
            reduce_row = m[j]
            reduce_factor = reduce_row[i] / bottom_row[i] if bottom_row[i] != 0 else 0
            for k in range(i, len(reduce_row)):
                reduce_row[k] -= bottom_row[k] * reduce_factor
    return m
This is more or less the same as listing 1, with reversed ranges instead of "forward" ranges. Notice that I still have to iterate forward with index k. Of course, you could combine these three functions into one linear-equation-solver function, but I think this breakdown is a little easier to follow.
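If you did want a single entry point, a thin wrapper over the three listings might look like this (the name solve_linear_system is my own):
def solve_linear_system(m):
    # Run the full elimination pipeline and read the solution out of
    # the last column of the reduced matrix.
    reduced = reduced_re(row_echelon(upper_triang(m)))
    return [row[-1] for row in reduced]
print(solve_linear_system([[-3.0, 2.0, -6.0, 6.0],
                           [5.0, 7.0, -5.0, 6.0],
                           [1.0, 4.0, -2.0, 8.0]]))
# approximately [-2.0, 3.0, 1.0], give or take floating-point rounding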
You have to watch out for some edge cases here, though. For example, the matrix:
[[1.0,1.0,1.0,3.0],[1.0,-1.0,2.0,2.0],[2.0,0.0,3.0,1.0]]
yields:
[1.0, 1.0, 1.0, 3.0]
[0.0, 1.0, -0.5, 0.5]
[0.0, 0.0, 0.0, 1.0]
The bottom row indicates there's a problem - in fact, the system has no solution at all, and the contradiction in that row, which reads 0.0 = 1.0, is the tip-off. On the other hand, the matrix:
[[1.0,1.0,1.0,3.0],[1.0,-1.0,2.0,2.0],[2.0,0.0,3.0,5.0]]
is not linearly independent (the third row is the sum of the first two), so it has multiple solutions. The solver presented here yields:
[1.0, 0.0, 1.5, 2.5]
[0.0, 1.0, -0.5, 0.5]
[0.0, 0.0, 0.0, 0.0]
Here, the bottom row of all zeros is a tip-off that the matrix represents an underdetermined system of equations; that is, one with infinitely many solutions.
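If you'd rather have the code flag these cases than return a matrix you have to eyeball, a small check over the reduced result can distinguish them. The helper below is my own sketch (the name and return values are mine, and the exact-zero comparisons assume the clean zeros these small examples produce; real code would use a tolerance):
def classify(m):
    # Classify a reduced, square augmented matrix by looking for rows whose
    # coefficients are all zero.
    zero_rows = 0
    for row in m:
        if all(c == 0 for c in row[:-1]):
            if row[-1] != 0:
                return "inconsistent"    # 0 = non-zero: a contradiction
            zero_rows += 1               # 0 = 0: a redundant equation
    return "underdetermined" if zero_rows else "unique"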
Also, order matters. Consider what happens with the input in example 28:
matrix_print(reduced_re(row_echelon(upper_triang([[0.0, 0.0, 2.774193548387097, 2.774193548387097],
                                                  [0.0, 10.333333333333334, -15.0, 16.0],
                                                  [-3.0, 2.0, -6.0, 6.0]]))))
[0.0, 0.0, 0.0, 1.9375]
[0.0, 1.0, 0.0, 0.18750000000000022]
[1.0, -0.0, 1.032258064516129, -0.967741935483871]
This is definitely wrong - the input is just the upper-triangular output from example 22 with its rows in reverse order, so the pipeline should yield the same solution that the original ordering did. The problem is the zeros in the first column: you can't scale a 0 by anything to do a reduction here. In general, this algorithm will fail if m[i][i] == 0 for any i in range(len(m)) at any point in the algorithm (remember, a prior transformation can "invalidate" a row this way). The fix is to dynamically reorder the rows as the algorithm proceeds; convert upper_triang as shown in listing 5.
def upper_triang(m):
    for i in range(len(m)):
        # if the pivot is zero, swap in a lower row that has a usable value
        candidate = i + 1
        while m[i][i] == 0 and candidate < len(m):
            (m[i], m[candidate]) = (m[candidate], m[i])
            candidate += 1
        top_row = m[i]
        # ... the rest of the function is unchanged from listing 1
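With that change in place, feeding the reordered matrix from example 28 through the same pipeline produces the expected reduced matrix; the values shown in the comments below are approximate, since floating-point rounding may nudge the last few decimal places:
matrix_print(reduced_re(row_echelon(upper_triang([[0.0, 0.0, 2.774193548387097, 2.774193548387097],
                                                  [0.0, 10.333333333333334, -15.0, 16.0],
                                                  [-3.0, 2.0, -6.0, 6.0]]))))
# [1.0, 0.0, 0.0, -2.0]
# [0.0, 1.0, 0.0, 3.0]
# [0.0, 0.0, 1.0, 1.0]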
Finally, the Gaussian elimination algorithm itself is vulnerable to numerical error because of the use of floating-point operations. You can mitigate this by keeping everything in exact rational form until the very end (Python's fractions module works well for this), or by always choosing, at each elimination step, the row whose pivot-column entry is largest in absolute value - the standard "partial pivoting" refinement - so that rounding errors aren't amplified. Rather than coding all of this by hand, though, the go-to library for this type of problem is numpy:
>>> import numpy as np
>>> a = np.array([[-3,2,-6],[5,7,-5],[1,4,-2]])
>>> b = np.array([6,6,8])
>>> np.linalg.solve(a,b)
array([-2., 3., 1.])
I'll close out with a neat trick outlined in Mathematics for Machine Learning - using Gaussian elimination to invert a matrix. Given the input matrix:
If you extend it with an identity matrix on the right:
And feed it into the gaussian elimination routine:
matrix_print(reduced_re(row_echelon(upper_triang([[1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0],
                                                  [1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
                                                  [1.0, 2.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
                                                  [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0]]))))
[1.0, 0.0, 0.0, 0.0, -1.0, 2.0, -2.0, 2.0]
[0.0, 1.0, 0.0, 0.0, 1.0, -1.0, 2.0, -2.0]
[0.0, 0.0, 1.0, 0.0, 1.0, -1.0, 1.0, -1.0]
[0.0, 0.0, 0.0, 1.0, -1.0, 0.0, -1.0, 2.0]
The right-hand half of the result is the inverse of the original matrix.
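As a quick sanity check (this snippet is mine, not from the book), numpy confirms that multiplying the original matrix by that right-hand block gives back the identity:
>>> import numpy as np
>>> a = np.array([[1.0, 0.0, 2.0, 0.0],
...               [1.0, 1.0, 0.0, 0.0],
...               [1.0, 2.0, 0.0, 1.0],
...               [1.0, 1.0, 1.0, 1.0]])
>>> a_inv = np.array([[-1.0, 2.0, -2.0, 2.0],
...                   [1.0, -1.0, 2.0, -2.0],
...                   [1.0, -1.0, 1.0, -1.0],
...                   [-1.0, 0.0, -1.0, 2.0]])
>>> np.allclose(a @ a_inv, np.eye(4))
True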