Solving systems of equations in Python

In high school algebra, you probably learned to solve systems of equations such as:

4x+3y=32 4x2y=12

Example 1: Two equations of two variables

One (pencil and paper) way to solve this sort of system of equations is to pick one of the two equations and solve for one variable. For example, the first resolves to:

2y=124x y=6+2x

Example 2: Solving for y

This can be inserted into the second to find a solution for x:

4x+3(6+2x)=32 4x18+6x=32 10x18=32 10x=50 x=5

Example 3: Solving for x

and then inserted back into the other equation to solve for y:

202y=12 2y=8 y=4

Example 4: back substitution

This pair (5,4) is the single solution for this system of equations. Geometrically, this is the point at which the two lines intersect.

You can extend this approach to more than two equations of more than two unknowns: given three equations and three unknowns, pick one, solve for a single variable and insert that back into the other two. Now you have a system of two equations that you can solve as above. This same approach can be extended out to any number of equations, but gets to be pretty tedious and error prone: even three equations with three unknowns is a hassle to solve this way.

Another way to approach these problems is to add or subtract one equation to or from the other. This is always legitimate: you can add the same value to each side of any equation and still have a valid equation. Since the other equation specifies the same thing on both sides, you can generate another valid equation by solving:

4x+3y=32 4x2y=12 (4x+3y)(4x2y)=3212 4x+3y4x+2y=20 5y=20 y=4

Example 5: Subtract one equation from the other

This turned out to be easy because the 4x value subtracted out and I was left with a single variable that I could quickly solve for. This might lead you to think this is a "niche" technique that can only be applied when the same term appears in both equations, but it's actually more general than that. Just as you can always add or subtract the same value to each side of an equation, you can also always multiply or divide the same value to both sides. So, given

2x+4y=22 4x2y=4

Example 6: Scale one equation

You can eliminate the x value from the second equation by solving:

2(2x+4y)(4x2y)=2224 4x+8y4x+2y=40 10y=40 y=4

Example 7: Subtract the scaled equation

So you can always multiply or divide one equation by a factor that causes adding or subtracting it from the other to eliminate one of the variables. This can be extended out to any number of equations by eliminating one variable at a time. This process is called Gaussian elimination.

It's customary to save some bookkeeping by representing the equations as matrices of coefficients, so that the three-equation, three unknown system of equations:

3x+2y6x=6 5x+7y5z=6 x+4y2x=8

Example 8: Three equations of three variables

Is written more compactly as:

[326657561428]

Example 9: Matrix form

(This representation is called a matrix). The goal now is to convert this into what's called an upper-triangular matrix where the lower left entries are all zeros. Once that's done, the bottom equation is a single equation of a single variable that can be solved, substituted up into the second from last and solved, and then worked upwards.

Converting a matrix into upper triangular form is pretty formulaic - in fact, it can be programmed. Start by multiplying the first row by the quotient of the first coefficient of the second row and the first coefficient of the first row (in other words, find a factor that makes the first coefficient of the first row the same as the first coefficient of the second row) and subtract the first row from the second. In the current case, that factor is (5/-3), so the first row becomes:

353253653653 51031010

Example 10: Scale the first row

Subtracting this from the second row of example 9 yields:

57565103101003131516

Example 11: Subtract from the second

And the matrix becomes

[3266031315161428]

Example 12: Resulting matrix

Now do the same thing with the first and third rows - here, the scale factor of the first row is -1/3 and the first row becomes:

313213613613 12322

Example 13: Scale and subtract

And subtracting this from the third row results in the matrix

[3266031315160143410]

Example 14: Transformed matrix

Now, the running second row can be used to eliminate the second coefficient from the third row by scaling it by 14/31:

014313131431151431161431 01432103122431

Example 15: Scale and subtract

And obtaining the final upper triangular matrix:

[3266031315160086318631]

Example 16: Upper triangular matrix

So the equations themselves can be rewritten (consistently) as:

3x+2y6z=6 313y15x=16 8631z=8631

Example 17: Upper triangular matrix with named coefficients

Which can then be solved:

z=1 313y15=16 313y=31 y=3 3x+66=6 3x=6 x=2

Example 18: Solution

Computationally, though, it makes sense to take this (at least) one step further and convert this upper-triangular matrix into row echelon form: a row-echelon matrix is not only upper triangular, but also the first non-zero element of each row is a 1. This is easy enough to do; just divide the entire row by the value of the first coefficient. Returning to the sample matrix (example 16), this now becomes:

[1232201453148310011]

Example 19: Row-echelon form

You can start to see a programmatic solution shaping up, but this is still sort of difficult to code for: the z coefficient falls out of this form easily, but the "back substitution" part is less automatic. What you can do, though, is repeat the upper-triangulation process backwards, from the bottom up, yielding what is called a reduced row-echelon matrix.

In this case, you want to start by multiplying the bottom row by -45/31 and then subtracting the result from the second row:

[123220109331=30011]

Example 20: Scale and subtract, row 2

And then multiply the bottom row by 2 and subtract it from the top:

[1230401030011]

Example 21: Scale and subtract, row 1, coefficient 3

And finally multiply the second row by -2/3 and subtract it from the first:

[100201030011]

Example 22: Scale and subtract, row 1, coefficient 2

Which is now completely automatable: the values of each coefficient are the right-most entries in the matrix. This example solved for three unknowns, but the algorithm extends out to any number of unknowns in any number of equations.

In Python, then, you can represent a matrix as a list of lists (assumed to each be of equal length; Python doesn't have the concept of multidimensional arrays like C does, so it's up to the caller in this case to make sure that the lists aren't "ragged"). Then this matrix can be converted to upper triangular form automatically as shown in listing 1:

def upper_triang(m):
  matrix_print(m)
  for i in range(len(m)):
    top_row = m[i]
    for j in range(i + 1, len(m)):
      reduce_row = m[j]
      reduce_factor = reduce_row[i] / top_row[i] if top_row[i] != 0 else 0
      for k in range(i, len(reduce_row)):
        reduce_row[k] -= top_row[k] * reduce_factor

  return m

Listing 1: Upper triangularization of a matrix

Here, i represents the index of the row that's being scaled and subtracted from each subsequent row, and j is the index of the row that it's being subtracted from: k iterates over the coefficients of row j. Feeding the matrix from example 9 above into this function with a matrix_print helper function as show in listing 2:

def matrix_print(m):
  for row in m:
    print row

matrix_print(upper_triang([[-3.0, 2.0, -6.0, 6.0], 
  [5.0, 7.0, -5.0, 6.0], 
  [1.0, 4.0, -2.0, 8.0]]))

Listing 2: Upper triangularization example

Results in the matrix shown in example 22:

[-3.0, 2.0, -6.0, 6.0]
[0.0, 10.333333333333334, -15.0, 16.0]
[0.0, 0.0, 2.774193548387097, 2.774193548387097]

Example 22: Upper triangular result

(Compare this with example 16). You can then convert this to row-echelon form as shown in listing 3:

def row_echelon(m):
  for row in m:
    scale_factor = 0
    for k in range(len(row)):
      if row[k] != 0 and scale_factor == 0:
        scale_factor = row[k]
      if scale_factor != 0:
        row[k] /= scale_factor

  return m

Listing 3: Row Echelon code

This will result in the matrix shown in example 23:

[1.0, -0.6666666666666666, 2.0, -2.0]
[0.0, 1.0, -1.4516129032258063, 1.5483870967741935]
[0.0, 0.0, 1.0, 1.0]

Example 23: Row echelon result

(Again, compare with example 19). Finally, eliminating backwards as shown in listing 4:

def reduced_re(m):
  for i in reversed(range(len(m))):
    bottom_row = m[i]
    for j in reversed(range(i)):
      reduce_row = m[j]
      reduce_factor = reduce_row[i] / bottom_row[i] if bottom_row[i] != 0 else 0
      for k in range(i, len(reduce_row)):
        reduce_row[k] -= bottom_row[k] * reduce_factor

  return m

Listing 4: Reduced Row-Echelon format

This is more or less the same as listing 1, with reversed ranges instead of "forward" ranges. Notice that I still have to iterate forward with index k. Of course, you could combine these three functions together into one linear equation solver function, but I think this breakdown is a little easier to follow.

You have to watch out for some edge cases here, though. For example, the matrix:

[[1.0,1.0,1.0,3.0],[1.0,-1.0,2.0,2.0],[2.0,0.0,3.0,1.0]]

Example 24: No solutions

yields:

[1.0, 1.0, 1.0, 3.0]
[0.0, 1.0, -0.5, 0.5]
[0.0, 0.0, 0.0, 1.0]

Example 25: contradiction

The bottom row indicates there's a problem - in fact, the matrix itself is unsolvable, and the contradiction here where 0.0 = 1.0 is the tip-off. On the other hand, the matrix:

[[1.0,1.0,1.0,3.0],[1.0,-1.0,2.0,2.0],[2.0,0.0,3.0,5.0]] 

Example 26: underdetermined

is not linearly independent (the third row is the combination of the first two), so it has multiple solutions. The solver presented here yields:

[1.0, 0.0, 1.5, 2.5]
[0.0, 1.0, -0.5, 0.5]
[0.0, 0.0, 0.0, 0.0]

Example 27: All zeros

Here, the bottom row of all zeros is a tip-off that the matrix represents an underdetermined system of equations; that is, one with multiple solutions.

Also, order matters. Consider what happens with the input in example 28:

matrix_print(reduced_re(row_echelon(upper_triang([[0.0, 0.0, 2.774193548387097, 2.774193548387097],
  [0.0, 10.333333333333334, -15.0, 16.0],
  [-3.0, 2.0, -6.0, 6.0]]))))
[0.0, 0.0, 0.0, 1.9375]
[0.0, 1.0, 0.0, 0.18750000000000022]
[1.0, -0.0, 1.032258064516129, -0.967741935483871]

Example 28: upside-down matrix

This is definitely wrong - this is the upper-triangular output from example 22, in reverse order. It should yield the same result as example 22 did. The problem is the zeros in the first entry: you can't scale 0 by anything to do a reduction here. In general, this algorithm will fail if m[i][i] = 0 for any i in len(m) — at any point in the algorithm (remember, a prior transformation can "invalidate" a row this way). The fix is to dynamically sort the rows as the algorithm proceeds; convert upper_triang as shown in listing 5.

def upper_triang(m):
  matrix_print(m)
  for i in range(len(m)):
    candidate = i + 1
    while m[i][i] == 0 and candidate < len(m):
      (m[i],m[candidate]) = (m[candidate],m[i])
      candidate += 1
    top_row = m[i]

Listing 5: sort the array while converting it

Finally, the gaussian elimination algorithm itself is vulnerable to numerical instability because of the use of floating-point operations. You can mitigate this by keeping everything in rational form until the very end, or by applying the softmax function to "smooth out" the coefficients. Rather than coding all of this by hand, the standard library for this type of problems is numpy:

>>> import numpy as np
>>> a = np.array([[-3,2,-6],[5,7,-5],[1,4,-2]])
>>> b = np.array([6,6,8])
>>> np.linalg.solve(a,b)
array([-2.,  3.,  1.])

Listing 6: numpy and scipy solution

I'll close out with a neat trick outlined in Mathematics for Machine Learning — using gaussian elimination to invert a matrix. Given the input matrix:

[1020110012011111]

Example 29: Matrix to invert

If you extend it with an identity matrix on the right:

[10201000110001001201001011110001]

Example 30: Append an identity matrix

And feed it into the gaussian elimination routine:

matrix_print(reduced_re(row_echelon(upper_triang([[1.0,0.0,2.0,0.0,1.0,0.0,0.0,0.0],
  [1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0],
  [1.0,2.0,0.0,1.0,0.0,0.0,1.0,0.0],
  [1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0]]))))

[1.0, 0.0, 0.0, 0.0, -1.0, 2.0, -2.0, 2.0]
[0.0, 1.0, 0.0, 0.0, 1.0, -1.0, 2.0, -2.0]
[0.0, 0.0, 1.0, 0.0, 1.0, -1.0, 1.0, -1.0]
[0.0, 0.0, 0.0, 1.0, -1.0, 0.0, -1.0, 2.0]

Example 31: Inverted matrix solution

The right-hand "side" of the result is the matrix inverse.

Powered by MathJax

Add a comment:

Completely off-topic or spam comments will be removed at the discretion of the moderator.

You may preserve formatting (e.g. a code sample) by indenting with four spaces preceding the formatted line(s)

Name: Name is required
Email (will not be displayed publicly):
Comment:
Comment is required
DigitalHermit, 2020-01-23
I enjoyed this article. There is a Python to Octave library (I haven't used though) which I'm assuming would allow something like this:

>> A=[-3, 2, -6; 5, 7, -5; 1, 4, -2]
A =

  -3   2  -6
   5   7  -5
   1   4  -2

>> B=[6; 6; 8]
B =

   6
   6
   8

>> A\\B
ans =

  -2.0000
   3.0000
   1.0000


The Octave '\\' operator is a shorthand for an inverse and division (inv(A)*B)).

Thanks for writing this article. It's been years since college, but I wish these approaches had been part of my coursework.
Josh, 2020-02-14
I'll definitely have to check that out, thanks! I did spend some time using Octave after grad school since I didn't have a licensed copy of Matlab any more. I moved on to Python like most, but I never thought of combining the two.
My Book

I'm the author of the book "Implementing SSL/TLS Using Cryptography and PKI". Like the title says, this is a from-the-ground-up examination of the SSL protocol that provides security, integrity and privacy to most application-level internet protocols, most notably HTTP. I include the source code to a complete working SSL implementation, including the most popular cryptographic algorithms (DES, 3DES, RC4, AES, RSA, DSA, Diffie-Hellman, HMAC, MD5, SHA-1, SHA-256, and ECC), and show how they all fit together to provide transport-layer security.

My Picture

Joshua Davies

Past Posts