Towards an Initial Understanding of Linear Programming


Links and notes regarding explanations of linear programming in general, the simplex algorithm and the dual simplex algorithm.
 
Robin Whittle  rw@firstpr.com.au  2016-09-25

Brief description of Linear Programming, why it is useful for many problems, and how it compares to other optimisation techniques which can handle more complex non-linear relationships

Linear Programming is a mathematical optimisation technique with many important practical applications.  An LP solver finds a set of values for a large number of variables in a way which maximises (or minimises) the value of a single outcome which depends upon the values of all these variables.  The dependencies must be linear, such as the profitability of a manufacturing operation going up by $4 a kilogram of one ingredient and down by $2 a kilogram of some other ingredient (two variables).  The most complex part of LP is the constraints which limit the values of these variables in potentially very complex and inter-dependent ways.  For instance, the value of variable A multiplied by 2.7 plus the value of variable C multiplied by -3.457 must equal 64.  Constraints can involve equality, greater than or less than relations.  Each variable can have bounds, such as minimum and maximum values, which are really just simple constraints.

Linear programming only works with some kinds of problem.  Other kinds, which involve non-linear relationships between the values of the variables and the outcome are of great interest since they are directly applicable to a greater range of real-world problems.  However, the computational techniques for solving these other kinds of problems (such as mixed integer programming, in which variables can only have integer values) are much less able to solve really large problems in any reasonable time.  A primary attraction of linear programming is that extraordinarily large and complex LP problems can be solved by computers in reasonable times - seconds to hours or perhaps days, rather than weeks or millenia.

Introduction to this page

Many people without a strong mathematical background - myself included - need to know what Linear Program can and can't do, and have some insights into how an LP solver works.  However, the detailed theoretical basis of LP in general, and the inner workings of LP solvers are fully understood only by those who devote a large part of their intensely mathematical minds to this arcane field.  I think these people are generally not inclined or able to write and illustrate an introduction to LP for people (such as me) who initially know nothing of the field, and who are not trained in the relevant branches of mathematics.

For example, the Wikipedia page uses two diagrams and launches quickly into algebra and terms such as convex polytope, affine function, vector, matrix etc.  Interested folks in business, engineering or science who are wishing to maximise profits or minimise average transit time in a freight delivery network don't necessarily know where to start with these mathematical descriptions.


As part of my C++ programming work for the mining consulting company Whittle Consulting, building on the work of my father Jeff http://www.jeff.whittlefamily.id.au I need to understand linear programming from the point of view of a programmer who uses a linear programming solver - specifically Cplex and CLP.  This does not mean I need to know how to write an LP solver - but I need to know quite a lot about how they work, since our software makes very intensive use of linear programming.

I need to know the fundamentals of linear programming, and have some idea of how the simplex algorithm works - for instance with CLP using the dual simplex algorithm.

There are plenty of books and websites about linear programming, and the most detailed material is necessarily written by and for people with extensive mathematical training. 

With rare exception, I think the most advanced specialists in any field lack the motivation and/or the understanding of what other people don't know to give an introduction to their field which is elegant and suitable for newcomers.  Generally, I think the best explanations come from a suitably motivated outsider to the field, with good writing - and probably illustration - skills, who fully appreciates what the reader probably does not know, and crafts an elegant explanatory path with examples.

The purpose of this page is to highlight web sites and books which I think are unusually helpful regarding understanding at least the basics of linear programming, for people who have only basic mathematical training - and not necessarily any computer programming experience.  I was inspired to do this by James Jones' explanation of LP and the simplex method - after I discovered that no websites apparently link to this helpful material.

Below I also have some notes on CLP.

Contents

#01jj 01 James Jones' Linear Programming explanation
#02psds 02 Primal simplex and dual simplex
#03clp 03 CLP

#01jj

01 - James Jones' Linear Programming explanation

https://people.richland.edu/james/ictcm/2006/

I found this page and the pages it links to - and some of the videos linked to from those pages - to be a really clear and well-illustrated explanation of the fundamental concepts of linear programming.   Give me well-written sentences and diagrams any day in preference to equations with Greek and other characters, if these can be avoided! 

Professor Jones leads us through a graphic representation of a maximisation LP problem, and then to a tabular format for understanding and solving a two-variable LP problem with pen and paper, without the need to create or think in terms of diagrams. 

Diagrams are fine for LP problems involving two variables.  They are challenging for problems involving three variables, and generally impossible for depicting or helping us think about four or more variables.  LP solvers regularly deal with problems involving half a million variables and a similar number of constraints, but two variable examples are good enough for understanding LP principles.

He then extends the tabular concepts into an explanation of the simplex method - an algorithm which works with any number of variables and which is one of the algorithms used by LP solver software.

Starting with the above page, I printed, read and annotated these pages as well, which are linked to from the above page:

https://people.richland.edu/james/ictcm/2006/geometric.html
https://people.richland.edu/james/ictcm/2006/cornerpoints.html
https://people.richland.edu/james/ictcm/2006/slopeobjective.html
https://people.richland.edu/james/ictcm/2006/table.html
https://people.richland.edu/james/ictcm/2006/simplex.html

I had some difficulty understanding the pivot mechanism and Prof. Jones pointed me to another page which helps, but which is not linked to directly from the above:

https://people.richland.edu/james/misc/pivot.pdf

In the simplex.html page, I could not understand the transformation of all the numbers from the tableau before the heading "Pivot!" to the numbers in the tableau following this heading.  Reading pivot.pdf, I hit the same problem with the numbers in the top right tableau.

The key to understanding this is at the bottom left of pivot.pdf, "How to perform the actual pivot".  The tableau in this section illustrates the procedure for setting the new value of the element which is in column z and row 2.  This is initially -1 (this tableau has the same numbers as the starting tableau at the top left).  The procedure is to take the product of the top left element in the rectangle formed between the pivot element and the element to be changed (row 1, column x = 1) and the value of the element to be changed (-1) and to subtract from this the product of the values at the other two corners of the rectangle (-2 and 3).  The red text under "New values for Row 2" explains this.

After this element has been changed, and the one for row 2 column x, then the same game is played for the RHS column, which was initially 5.  The same game is also played on the sum element for row 2 (the far right element, equal initially to 5), though I think it could also have its new value determined simply by summing the updated values to the left of it.

I understand that this is the use of "row operations" in Gauss-Jordan reduction, and I am well out of my comfort zone and into the matrix algebra wilds beyond.  My broad and tentative understanding of these row operations and the pivot operation as a whole, when used as part of the simplex method of solving an LP problem, is that it is a way of maintaining the integrity of all the information in the LP when we require that all the elements but one in the pivot column be converted into zeros, with the remaining element converted to 1.  I understand the pivot operation as a way of transforming the representation of the LP problem from one tableau (one set of numbers in this matrix) into another, which corresponds exactly to moving our potential solution point from one intersection of two lines (for a two-variable LP problem, as illustrated here) to another intersection, and that the choice of pivot column and row, when made correctly, corresponds to moving the potential solution point to the most immediately profitable such point (in a maximisation LP).

If I really wanted to understand exactly how these "row operations" worked, I could probably search some more pages and figure it out, but it seems that depicting an understanding of why and how it works is non-trivial.  This is one such potential explanation, which avoids algebra and concentrates on verbal/textual explanations and graphics:

http://www.maa.org/press/periodicals/loci/joma/hinges-an-illustration-of-gauss-jordan-reduction-introduction

In this case, it involves 3D animations.  With more effort and an understanding of reduced row-echelon form (https://en.wikipedia.org/wiki/Row_echelon_form) I imagine I could understand it.

So for now, the exact principles the pivot operation are beyond my understanding, but I have a general idea of what they do and I understand they are not contentious.  (This means I have no reason to suspect they are invalid, which is not something I always think about scientific theories I don't clearly understand.)

In simplex.html, in the section before the heading "Repeating the Process", we have navigated our potential solution point from (x1 = 0, x2 = 0) to a point on the right edge of the feasible region, and the algorithm for choosing which point to move to was explained.  While the diagrams are vital to understanding the process, they are not needed for implementing the choice of pivot and the pivot itself.  In the fully transformed tableau before this heading, we changed slack variable s3 from being basic (important) to being non-basic: in this case to being set to 0.  The statement to this effect ". . . not cleared out, so they are non-basic" only made sense to me in the context of the diagram in that the new potential solution point (this is my term - we are moving it around until we can't find a point in the x1 and x2 dimensions with the highest possible objective function value) is at the intersection of the line at which variable x2 = 0 and the line at which s3 = 0.

By setting these two variables to 0, they cannot contribute to the three constraints (rows 1, 2 and 3).  Therefore, by setting them to zero and observing that:
  1. The other three variables x1, s1 and s2 all have a single '1' in their column, each for a different row, and that:

  2. Each of these constraint rows is an equation.
we can see that at the new location of the potential solution point, each of these three variables must be equal to a particular item on the RHS of those rows:

x1 (row 3) = 8
s1 (row 1) = 1
s2 (row 2) = 8

What was not clear to me was that although we had transformed the original objective function (-40 for x1 and -30 for x2, plus zeros for the three slack variables, as per the first tableau in simplex.html) into something else (0, -10/3, 0, 0 and 43/3), in order to find the value of the objective function at our new potential solution point, we need to plug the above three values (x1 = 8, s1 = 1 and s2 = 8) into the original objective function, for which only x1 = 8 is relevant, since it is the only one of these three variables with a non-zero coefficient.  Its coefficient is 40, so the value at our new potential solution point is 8 * 40 = 320.

I glanced at the example with three decision variables: https://people.richland.edu/james/ictcm/2006/3dsimplex.html but was happy with the understanding in principle which I gained from the previous material, which works with two decision variables, and so is very amenable to graphics in a two-dimensional plane.


This explanation does not go into the distinction between primal simplex and dual simplex algorithms, or mention any of the other algorithms for solving LP problems (such as "barrier" AKA "interior point").

However, it gave me a good enough understanding of the simplex algorithm for my current needs, and I think it would be difficult for any explanation to achieve so much with fewer words and diagrams.

#02psds

02 - Primal simplex and dual simplex

The material from James Jones satisfied my needs regarding the simplex algorithm AKA "primal simplex".

"Dual" in this context means an alternative LP problem which is mathematically derived from the original (primal) LP problem (maximisation in this explanation) in such a way that, solving the dual problem produces the same results as solving the primal problem, by way of this dual problem solution providing an upper bound to the value of the objective function in the feasible range of the primal problem.  If this is done properly, then the bound is a precise match, so it is not necessary to solve the primal problem.   I don't yet understand the motivations for preferring to generate and solve the dual version, but I understand they are powerful enough to make this dual simplex algorithm a widely favored approach to solving LP problems.

However, it seems that "dual simplex" does not mean generating a dual version of the problem and then solving it with the simplex method.  More below . . .

The transformation from a simple maximisation LP problem into its dual form is explained in the first two pages of a chapter from book, which is available here (Preview Chapter 5: Duality Theory:

Linear Programming Foundations and Extensions 4th Ed.
Robert J Vanderbei (Wikipedia page) Springer 2014
http://www.springer.com/gp/book/9781461476290

The choice of multiplying the two constraints by 2 and 3, in the initial part of this explanation, is partially arbitrary.  The 3 is a minimum value and looks to me like a good choice, due to the need to multiply up the coefficient of 1 for x3 to at least equal the coefficient of 3 for this variable in the objective function.  The coefficient for this variable in the first constraint is 0.  The choice of 2 seems to be higher than is really required, so the upper bound produced by these choices is sure to exceed the true optimised value of the objective function for the primal (original) LP problem.

To make a dual version of the problem, which is a minimisation problem, rather than using arbitrary choices such as 2 and 3, we create a set of variables, one for each constraint in the primal problem, and then a set constraints, one for each decision variable in the primal problem. 

I found a 2001 edition of this book at Archive.org:

https://archive.org/details/springer_10.1007-978-1-4757-5662-3

and had a quick look at the rest of Chapter 5.  The values (of the objective function) for various solutions to the primal problem (one or more of which are optimal) forms a set of possible values which are less than or equal to the set of upper bounds for these, which result from the set of solutions to the dual problem (one or more of which may be optimal).  The optimal solutions for each problem produce the same value (where the two sets touch).  So if we can find a feasible solution for one problem (we don't yet know it to be optimal) which is the same (has the same values for all decision variables, and so the same value for the objective function) a feasible solution to the other problem, then we know that both of them are optimal solutions.  In other words, the set of primal feasible solutions are in the primal feasible area, and this area touches at one point (a single vertex) or perhaps along a line between two vertexes in these diagrams, the feasible area of the dual problem, which is otherwise outside the feasible region of the primal problem.  That point (or line) of touching is where the one or more optimal solutions are found for both problems.

According to section 6 of this Chapter 5, the "Dual Simplex Method" means "simply as a new way of picking the entering and leaving variables in a sequence of primal dictionaries".  Using the terminology of James Jones' explanation, I understand this means "simply a new way of choosing how to pivot using a series of tableaux derived from the original (primal) LP problem".  So I understand the dual simplex algorithm does a series of operations on the tableaux of the primal problem, to achieve much the same as would be achieved by a conventional simplex solution of the actual, and therefore specifically formulated, dual version of the problem. 

This chapter describes a two phase approach to implementing a dual simplex LP solver. 

It is interesting to note how we detect infeasibility with this new Phase I algorithm.  The modified [dual] problem is guaranteed always to be dual feasible.  It is easy to see that the primal problem is infeasible if and only if the modified problem is dual unbounded (which the dual simplex method will detect just as the primal simplex method detects primal unboundedness).

CLP's dual simplex algorithm is apparently single phase.  From the help-text of CLP, the source code of which is, at least in part, here:

https://projects.coin-or.org/Clp/browser/trunk/Clp/src/CbcOrClpParam.cpp

The dual algorithm in Clp is a single phase algorithm as opposed to a two phase algorithm where you first get feasible then optimal.  If a problem has both upper and lower bounds then it is trivial to get dual feasible by setting non basic variables to correct bound.  If the gap between the upper and lower bounds of a variable is more than the value of dualBound Clp introduces fake bounds so that it can make the problem dual feasible.  This has the same effect as a composite objective function in the primal algorithm. 

The first 200 lines of this CLP source file gives more details of CLP's dual simplex algorithm:

https://projects.coin-or.org/Clp/browser/branches/devel-1/ClpSimplexDual.cpp

I only partially understand this at present.  I wouldn't have been able to be understand any of it without the clear understanding of simplex which I gained from James Jones' explanation.  My impression from reading LP material in general, in text books and websites, is that it would have taken me very much longer to understand the simplex algorithm without his explanation.


#03clp

03 - CLP

CLP is an open-source LP solver, written in C++, which began as an IBM internal project.  Of the various open-source LP solvers, it has a good reputation for performance.  A valuable review of open source LP solvers is:

Comparison of Open-Source Linear Programming Solvers
Jared L. Gearhart et al.
Sandia Report SAND2103-8847 October 2013
http://prod.sandia.gov/techlib/access-control.cgi/2013/138847.pdf
 
There are two home pages, the second which does not link to the first:

http://www.coin-or.org/projects/Clp.xml
https://projects.coin-or.org/Clp

The main documentation is at:

http://www.coin-or.org/Clp/userguide/index.html

If odd characters appear, this is due to the HTML file using Western character encoding without specifying this in the HTML file, and the web server now - differently from when the document was written - being configured to send the document in an HTTP envelope (the protocol which delivers the HTML file to the browser) specifying the default character coding of the server's operating system, which is now UTF-8.  To fix this in the browser, use something like View > Text encoding > Western. 

The CLP mailing list is at:

http://list.coin-or.org/mailman/listinfo/clp

In December 2015 I wrote to the list pointing to a single file Mbox format copy of the mailing list messages from 2002 to 2015-12-01.  Since many questions have been answered on the list, it is handy to be able to search the whole set of messages as a text file, or via an email client such as Thunderbird, which interprets the file as individual emails. 

Here is another file with messages from 2002 to 2016-09-25:  My 2015-12-01 message explains how I filled my Thunderbird mailbox with messages from the gz archives.   With Thunderbird on Windows, if you copy the uncompressed file, renamed to "CLP" to somewhere such as: C:\Users\XYZ\AppData\Roaming\Thunderbird\Profiles\wwxxyyzz.default\Mail\Local Folders\ then you should be able to read the messages individually.  Alternatively you can rename it as a CLP.txt file and open it with an editor, or web browser.  (Google Chrome's text search and scroll-bar highlighting of matches is superior to that of Firefox, but Chrome has no File > Open command, so use file:///D:/CLP.txt in the address bar.)

CLP-mailing-list-2002-to-2016-09-25.7z

CLP's source code and classes are searchable at:

http://www.coin-or.org/Doxygen/Clp/index.html

One body of text which I found helpful is the help text which can be displayed on the console of the standalone CLP executable program.  This text is not available anywhere I know of as a document.  I collected all the help text I think the program can produce and then edited it into a formatted Word file.  This is for CLP version 1.16.9, which is not the current version, but you may still find it useful.  It helped me understand some settings of the solver (in the library).  Note that the standalone program has perturb(ation) on by default while for the library solver, it is off by default.

CLP-1-16-9-standalone-help-text.docx
CLP-1-16-9-standalone-help-text.pdf


I think that even the most experienced programmers who are already fully knowledgeable about linear programming would find it a challenge to find out what they need to know from the CLP documentation and source code.

One thing which confused me was the term "ABC", as if it were a project, or a variant of CLP which could be enabled with appropriate pre-processor macros.  This term is all through the CLP project and I could not understand what it meant.  Eventually, I decided I could safely ignore anything concerning "ABC" after reading this message from John Forrest (the originator of CLP).  I understand that ABC is a variant of CLP with some operations done in parallel, on multiple cores of the CPU(s).

http://list.coin-or.org/pipermail/clp/2014-November/001516.html

If you have any questions about CLP, don't ask me!  Please join and write to the mailing list instead.  You will most likely have your question answered within a day or two by experts - and this way many other people who read the list now and in the future will benefit from your query.

As John Forrest wrote in the above-mentioned message:

If you need more information, do not hesitate to ask awkward questions.


Update history


../ to the main First Principles site.

.