Quadratic Programming Algorithms

Quadratic Programming Definition

Quadratic programming is the problem of finding a vector x that minimizes a quadratic function, possibly subject to linear constraints:

\min_{x} \frac{1}{2} x^{T} H x + c^{T} x

(1)

such that A·x ≤ b, Aeq·x = beq, l ≤ x ≤ u.
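As an illustration of Equation 1, the following sketch evaluates the objective and checks the constraints for a candidate point. The function names `qp_objective` and `is_feasible`, the tolerance, and the sample data are assumptions made for this example; they are not part of `quadprog`.

```python
import math

def qp_objective(H, c, x):
    """Return 0.5*x'Hx + c'x, with H given as a list of rows."""
    n = len(x)
    quad = sum(x[i] * H[i][j] * x[j] for i in range(n) for j in range(n))
    lin = sum(ci * xi for ci, xi in zip(c, x))
    return 0.5 * quad + lin

def is_feasible(x, A, b, Aeq, beq, lb, ub, tol=1e-8):
    """Check A x <= b, Aeq x = beq, and lb <= x <= ub to within tol."""
    dot = lambda row: sum(r * xi for r, xi in zip(row, x))
    ineq_ok = all(dot(row) <= bi + tol for row, bi in zip(A, b))
    eq_ok = all(abs(dot(row) - bi) <= tol for row, bi in zip(Aeq, beq))
    bound_ok = all(l - tol <= xi <= u + tol for l, xi, u in zip(lb, x, ub))
    return ineq_ok and eq_ok and bound_ok

# Sample data (illustrative only)
H = [[2.0, 0.0], [0.0, 2.0]]
c = [-2.0, -5.0]
x = [1.0, 2.0]
value = qp_objective(H, c, x)
feasible = is_feasible(x, [[1.0, 1.0]], [3.0], [], [], [0.0, 0.0], [math.inf, math.inf])
```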

`interior-point-convex` `quadprog` Algorithm

The interior-point-convex algorithm performs the following steps:

Note

The algorithm has two code paths. It takes one when the Hessian matrix H is an ordinary (full) matrix of doubles, and it takes the other when H is a sparse matrix. For details of the sparse data type, see Sparse Matrices. Generally, the algorithm is faster for large problems that have relatively few nonzero terms when you specify H as sparse. Similarly, the algorithm is faster for small or relatively dense problems when you specify H as full.

Presolve/Postsolve

The algorithm first tries to simplify the problem by removing redundancies and simplifying constraints. The tasks performed during the presolve step can include the following:

Check if any variables have equal upper and lower bounds. If so, check for feasibility, and then fix and remove the variables.
Check if any linear inequality constraint involves only one variable. If so, check for feasibility, and then change the linear constraint to a bound.
Check if any linear equality constraint involves only one variable. If so, check for feasibility, and then fix and remove the variable.
Check if any linear constraint matrix has zero rows. If so, check for feasibility, and then delete the rows.
Determine if the bounds and linear constraints are consistent.
Check if any variables appear only as linear terms in the objective function and do not appear in any linear constraint. If so, check for feasibility and boundedness, and then fix the variables at their appropriate bounds.
Change any linear inequality constraints to linear equality constraints by adding slack variables.
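Two of the checks in this list can be sketched as follows, assuming dense rows stored as Python lists. The function name `presolve_rows` is hypothetical, and the sketch covers only zero rows and single-variable inequality rows; it is not the solver's presolve.

```python
import math

def presolve_rows(A, b, lb, ub):
    """Drop zero rows and turn single-variable inequality rows into bounds."""
    lb, ub = list(lb), list(ub)
    kept_A, kept_b = [], []
    for row, bi in zip(A, b):
        nz = [j for j, a in enumerate(row) if a != 0.0]
        if not nz:
            # A zero row reads 0 <= b_i: infeasible if b_i < 0, otherwise redundant.
            if bi < 0:
                raise ValueError("infeasible: zero row with negative right-hand side")
        elif len(nz) == 1:
            j, a = nz[0], row[nz[0]]
            if a > 0:
                ub[j] = min(ub[j], bi / a)   # a*x_j <= b_i  ->  x_j <= b_i/a
            else:
                lb[j] = max(lb[j], bi / a)   # dividing by a < 0 flips the inequality
        else:
            kept_A.append(row)
            kept_b.append(bi)
    if any(l > u for l, u in zip(lb, ub)):
        raise ValueError("infeasible: a lower bound exceeds an upper bound")
    return kept_A, kept_b, lb, ub

# Sample data (illustrative only)
A = [[2.0, 0.0], [1.0, 1.0], [0.0, -1.0], [0.0, 0.0]]
b = [4.0, 5.0, -1.0, 3.0]
A2, b2, lb2, ub2 = presolve_rows(A, b, [0.0, 0.0], [math.inf, math.inf])
```

In this example only the row `[1.0, 1.0]` survives as a linear constraint; the other rows become the bounds 1 ≤ x₂ and x₁ ≤ 2 or are dropped as redundant.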

If the algorithm detects an infeasible or unbounded problem, it halts and issues an appropriate exit message.

The algorithm might arrive at a single feasible point, which represents the solution.

If the algorithm does not detect an infeasible or unbounded problem in the presolve step, and if the presolve has not produced the solution, the algorithm continues to its next steps. After reaching a stopping criterion, the algorithm reconstructs the original problem, undoing any presolve transformations. This final step is the postsolve step.

For details, see Gould and Toint [63].

Generate Initial Point

The initial point x0 for the algorithm is:

Initialize x0 to ones(n,1), where n is the number of rows in H.
For components that have both an upper bound ub and a lower bound lb, if a component of x0 is not strictly inside the bounds, the component is set to (ub + lb)/2.
For components that have only one bound, modify the component if necessary to lie strictly inside the bound.
Take a predictor step (see Predictor-Corrector), with minor corrections for feasibility, not a full predictor-corrector step. This places the initial point closer to the central path without entailing the overhead of a full predictor-corrector step. For details of the central path, see Nocedal and Wright [7], page 397.
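The first three items can be sketched as below. The function name `initial_point` is hypothetical, and the offset of 1.0 used to move a one-sided component strictly inside its bound is an assumption, because the text does not say how far the component is moved. The predictor step in the last item is not included.

```python
import math

def initial_point(n, lb, ub):
    """Start from ones(n,1) and push components strictly inside their bounds."""
    x0 = [1.0] * n
    for i in range(n):
        lo, hi = lb[i], ub[i]
        has_lo, has_hi = math.isfinite(lo), math.isfinite(hi)
        if has_lo and has_hi:
            if not (lo < x0[i] < hi):
                x0[i] = (lo + hi) / 2.0      # midpoint of the two bounds
        elif has_lo and x0[i] <= lo:
            x0[i] = lo + 1.0                 # assumed offset, not documented
        elif has_hi and x0[i] >= hi:
            x0[i] = hi - 1.0                 # assumed offset, not documented
    return x0

# Sample bounds (illustrative only)
lb = [0.0, 2.0, 5.0, -math.inf]
ub = [4.0, 3.0, math.inf, 0.0]
x0 = initial_point(4, lb, ub)
```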

Predictor-Corrector

The sparse and full interior-point-convex algorithms differ mainly in the predictor-corrector phase. The algorithms are similar, but differ in some details. For the basic algorithm description, see Mehrotra [47].

The algorithm begins by turning the linear inequalities Ax <= b into inequalities of the form Ax >= b by multiplying A and b by -1. This has no bearing on the solution, but puts the problem in the same form found in some literature.

Sparse Predictor-Corrector. Similar to the fmincon interior-point algorithm, the sparse interior-point-convex algorithm tries to find a point where the Karush-Kuhn-Tucker (KKT) conditions hold. For the quadratic programming problem described in Quadratic Programming Definition, these conditions are:

$\begin{matrix} H x + c - A_{e q}^{T} y - {\bar{A}}^{T} z = 0 \\ \bar{A} x - \bar{b} - s = 0 \\ A_{e q} x - b_{e q} = 0 \\ s_{i} z_{i} = 0, i = 1, 2, ..., m \\ s \geq 0 \\ z \geq 0. \end{matrix}$

Here

$\bar{A}$ is the extended linear inequality matrix that includes bounds written as linear inequalities. $\bar{b}$ is the corresponding linear inequality vector, including bounds.
s is the vector of slacks that convert inequality constraints to equalities. s has length m, the number of linear inequalities and bounds.
z is the vector of Lagrange multipliers corresponding to s.
y is the vector of Lagrange multipliers associated with the equality constraints.

The algorithm first predicts a step from the Newton-Raphson formula, then computes a corrector step. The corrector attempts to better enforce the nonlinear constraint s_i z_i = 0.

Definitions for the predictor step:

r_d, the dual residual:

$r_{d} = H x + c - A_{e q}^{T} y - {\bar{A}}^{T} z .$
r_eq, the primal equality constraint residual:

$r_{e q} = A_{e q} x - b_{e q} .$
r_ineq, the primal inequality constraint residual, which includes bounds and slacks:

$r_{i n e q} = \bar{A} x - \bar{b} - s .$
r_sz, the complementarity residual:

r_sz = Sz.

S is the diagonal matrix of slack terms, and z is the column matrix of Lagrange multipliers.
r_c, the average complementarity:

$r_{c} = \frac{s^{T} z}{m} .$
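The residual definitions above translate directly into code. The following sketch uses dense lists and hypothetical helper names (`matvec`, `rmatvec`, `residuals`); the sample problem is an assumption chosen so that every residual vanishes at the given point.

```python
def matvec(M, v):
    """Return M v for M stored as a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def rmatvec(M, v, n):
    """Return M' v for a matrix M with n columns."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(n)]

def residuals(H, c, Aeq, beq, Abar, bbar, x, s, y, z):
    n = len(x)
    Hx, AeqTy, AbarTz = matvec(H, x), rmatvec(Aeq, y, n), rmatvec(Abar, z, n)
    r_d = [Hx[j] + c[j] - AeqTy[j] - AbarTz[j] for j in range(n)]          # dual
    r_eq = [a - b for a, b in zip(matvec(Aeq, x), beq)]                    # equality
    r_ineq = [a - b - si for a, b, si in zip(matvec(Abar, x), bbar, s)]    # inequality
    r_sz = [si * zi for si, zi in zip(s, z)]                               # complementarity
    r_c = sum(r_sz) / len(s)                                               # average
    return r_d, r_eq, r_ineq, r_sz, r_c

# Illustrative problem: min 0.5*||x||^2 - x1 - x2  s.t.  x1 + x2 = 1, x >= 0
H = [[1.0, 0.0], [0.0, 1.0]]
c = [-1.0, -1.0]
Aeq, beq = [[1.0, 1.0]], [1.0]
Abar, bbar = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]   # bounds written as Abar x >= bbar
r_d, r_eq, r_ineq, r_sz, r_c = residuals(
    H, c, Aeq, beq, Abar, bbar, x=[0.5, 0.5], s=[0.5, 0.5], y=[-0.5], z=[0.0, 0.0])
```

All residuals are zero here because the point x = (0.5, 0.5) with these multipliers satisfies the KKT conditions of the sample problem.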

In a Newton step, the changes in x, s, y, and z are given by:

(\begin{matrix} H & 0 & - A_{e q}^{T} & - {\bar{A}}^{T} \\ A_{e q} & 0 & 0 & 0 \\ \bar{A} & - I & 0 & 0 \\ 0 & Z & 0 & S \end{matrix}) (\begin{matrix} Δ x \\ Δ s \\ Δ y \\ Δ z \end{matrix}) = - (\begin{matrix} r_{d} \\ r_{e q} \\ r_{i n e q} \\ r_{s z} \end{matrix}) .

(2)

However, a full Newton step might be infeasible because of the positivity constraints on s and z. Therefore, quadprog shortens the step, if necessary, to maintain positivity.
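One common way to shorten a step so that s and z stay positive is a ratio test with a small back-off factor. The sketch below shows that idea; the function name `max_positive_step` and the factor `tau = 0.995` are assumptions, and the text does not state which rule quadprog uses.

```python
def max_positive_step(v, dv, tau=0.995):
    """Largest alpha in (0, 1] such that v + alpha*dv stays strictly positive."""
    alpha = 1.0
    for vi, dvi in zip(v, dv):
        if dvi < 0:                      # only decreasing components can hit zero
            alpha = min(alpha, tau * (-vi / dvi))
    return alpha

# Illustrative data
s, ds = [1.0, 2.0], [-2.0, 1.0]
z, dz = [0.5, 0.5], [0.1, -0.25]
alpha = min(max_positive_step(s, ds), max_positive_step(z, dz))
s_new = [si + alpha * dsi for si, dsi in zip(s, ds)]
z_new = [zi + alpha * dzi for zi, dzi in zip(z, dz)]
```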

Additionally, to maintain a “centered” position in the interior, instead of trying to solve s_i z_i = 0, the algorithm takes a positive parameter σ, and tries to solve

s_i z_i = σ r_c.

quadprog replaces r_sz in the Newton step equation with r_sz + ΔsΔz – σ r_c 1, where 1 is the vector of ones. Also, quadprog reorders the Newton equations to obtain a symmetric, more numerically stable system for the predictor step calculation.

After calculating the corrected Newton step, the algorithm performs more calculations, both to get a longer current step and to prepare for better subsequent steps. These multiple correction calculations can improve both performance and robustness. For details, see Gondzio [4].

Full Predictor-Corrector. The full predictor-corrector algorithm does not combine bounds into linear constraints, so it has another set of slack variables corresponding to the bounds. The algorithm shifts lower bounds to zero. If there is only one bound on a variable, the algorithm turns it into a lower bound of zero by negating the inequality of an upper bound.

$\bar{A}$ is the extended linear matrix that includes both linear inequalities and linear equalities. $\bar{b}$ is the corresponding linear equality vector. $\bar{A}$ also includes terms for extending the vector x with slack variables s that turn inequality constraints to equality constraints:

$\bar{A} x = (\begin{matrix} A_{e q} & 0 \\ A & I \end{matrix}) (\begin{matrix} x_{0} \\ s \end{matrix}),$

where x₀ means the original x vector.

The KKT conditions are

\begin{matrix} H x + c - {\bar{A}}^{T} y - v + w = 0 \\ \bar{A} x = \bar{b} \\ x + t = u \\ v_{i} x_{i} = 0, i = 1, 2, ..., m \\ w_{i} t_{i} = 0, i = 1, 2, ..., n \\ x, v, w, t \geq 0. \end{matrix}

(3)

To find the solution x, slack variables, and dual variables to Equation 3, the algorithm basically considers a Newton-Raphson step:

(\begin{matrix} H & - {\bar{A}}^{T} & 0 & - I & I \\ \bar{A} & 0 & 0 & 0 & 0 \\ - I & 0 & - I & 0 & 0 \\ V & 0 & 0 & X & 0 \\ 0 & 0 & W & 0 & T \end{matrix}) (\begin{matrix} Δ x \\ Δ y \\ Δ t \\ \begin{array}{l} Δ v \\ Δ w \end{array} \end{matrix}) = - (\begin{matrix} H x + c - {\bar{A}}^{T} y - v + w \\ \bar{A} x - \bar{b} \\ u - x - t \\ \begin{array}{l} V X \\ W T \end{array} \end{matrix}) = - (\begin{matrix} r_{d} \\ r_{p} \\ r_{u b} \\ \begin{array}{l} r_{v x} \\ r_{w t} \end{array} \end{matrix}),

(4)

where X, V, W, and T are diagonal matrices corresponding to the vectors x, v, w, and t respectively. The residual vectors on the far right side of the equation are:

r_d, the dual residual
r_p, the primal residual
r_ub, the upper bound residual
r_vx, the lower bound complementarity residual
r_wt, the upper bound complementarity residual

The algorithm solves Equation 4 by first converting it to the symmetric matrix form

(\begin{matrix} - D & {\bar{A}}^{T} \\ \bar{A} & 0 \end{matrix}) (\begin{matrix} Δ x \\ Δ y \end{matrix}) = - (\begin{matrix} R \\ r_{p} \end{matrix}),

(5)

where

$\begin{array}{l} D = H + X^{- 1} V + T^{- 1} W \\ R = - r_{d} - X^{- 1} r_{v x} + T^{- 1} r_{w t} + T^{- 1} W r_{u b} . \end{array}$

All the matrix inverses in the definitions of D and R are simple to compute because the matrices are diagonal.

To derive Equation 5 from Equation 4, notice that the second row of Equation 5 is the same as the second matrix row of Equation 4. The first row of Equation 5 comes from solving the last two rows of Equation 4 for Δv and Δw, and then solving for Δt.
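The elimination can be written out term by term. The following derivation is a sketch that uses only the symbols of Equation 4 and the definitions of D and R; the intermediate steps are reconstructed here and are not quoted from a reference.

```latex
\begin{aligned}
% Rows 3, 4, and 5 of Equation 4, solved for the eliminated steps
\Delta t &= r_{ub} - \Delta x \\
\Delta v &= -X^{-1} \left( r_{vx} + V \Delta x \right) \\
\Delta w &= -T^{-1} \left( r_{wt} + W \Delta t \right)
          = -T^{-1} r_{wt} - T^{-1} W r_{ub} + T^{-1} W \Delta x \\
% Substitute into row 1: H \Delta x - \bar{A}^{T} \Delta y - \Delta v + \Delta w = -r_{d}
\left( H + X^{-1} V + T^{-1} W \right) \Delta x - \bar{A}^{T} \Delta y
         &= -r_{d} - X^{-1} r_{vx} + T^{-1} r_{wt} + T^{-1} W r_{ub} \\
% With D and R as defined above, this is D \Delta x - \bar{A}^{T} \Delta y = R, that is,
-D \Delta x + \bar{A}^{T} \Delta y &= -R
\end{aligned}
```

The last line is the first row of Equation 5, and the right side of the fourth line is exactly R.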

To solve Equation 5, the algorithm follows the essential elements of Altman and Gondzio [1]. The algorithm solves the symmetric system by an LDL decomposition. As pointed out by authors such as Vanderbei and Carpenter [2], this decomposition is numerically stable without any pivoting, so it can be fast.

The full quadprog predictor-corrector algorithm is largely the same as that in the linprog 'interior-point' algorithm, but includes quadratic terms as well. See Predictor-Corrector.

References

[1] Altman, Anna and J. Gondzio. Regularized symmetric indefinite systems in interior point methods for linear and quadratic optimization. Optimization Methods and Software, 1999.

[2] Vanderbei, R. J. and T. J. Carpenter. Symmetric indefinite systems for interior point methods. Mathematical Programming 58, 1993. pp. 1–32.

Stopping Conditions

The predictor-corrector algorithm iterates until it reaches a point that is feasible (satisfies the constraints to within tolerances) and where the relative step sizes are small. Specifically, define

$ρ = \max (1, ‖ H ‖, ‖ \bar{A} ‖, ‖ A_{e q} ‖, ‖ c ‖, ‖ \bar{b} ‖, ‖ b_{e q} ‖)$

The algorithm stops when all of these conditions are satisfied:

$\begin{matrix} {‖ r_{p} ‖}_{1} + {‖ r_{u b} ‖}_{1} \leq ρ TolCon \\ {‖ r_{d} ‖}_{\infty} \leq ρ TolFun \\ r_{c} \leq TolFun, \end{matrix}$

where

$r_{c} = \max_{i} (\min (| x_{i} v_{i} |, | x_{i} |, | v_{i} |), \min (| t_{i} w_{i} |, | t_{i} |, | w_{i} |)) .$

r_c essentially measures the size of the complementarity residuals xv and tw, which are each vectors of zeros at a solution.
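The three stopping tests can be sketched as follows, taking the residual vectors and ρ as given. The function names are hypothetical, and the arguments `tol_con` and `tol_fun` stand in for TolCon and TolFun.

```python
def complementarity_measure(x, v, t, w):
    """r_c = max_i( min(|x_i v_i|, |x_i|, |v_i|), min(|t_i w_i|, |t_i|, |w_i|) )."""
    pairs = list(zip(x, v)) + list(zip(t, w))
    return max(min(abs(a * b), abs(a), abs(b)) for a, b in pairs)

def should_stop(r_p, r_ub, r_d, r_c, rho, tol_con, tol_fun):
    primal = sum(abs(r) for r in r_p) + sum(abs(r) for r in r_ub)   # 1-norms
    dual = max(abs(r) for r in r_d)                                  # infinity norm
    return primal <= rho * tol_con and dual <= rho * tol_fun and r_c <= tol_fun

# Illustrative data
rc = complementarity_measure(x=[1e-9, 2.0], v=[3.0, 1e-10], t=[5.0, 0.5], w=[0.0, 1e-7])
stop = should_stop(r_p=[1e-10], r_ub=[0.0, 2e-10], r_d=[1e-9, -3e-9],
                   r_c=rc, rho=1.0, tol_con=1e-8, tol_fun=1e-8)
```

In this example the primal and dual residuals pass, but r_c is about 5e-8, which exceeds the tolerance of 1e-8, so the iteration would continue.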

Infeasibility Detection

quadprog calculates a merit function φ at every iteration. The merit function is a measure of feasibility. quadprog stops if the merit function grows too large. In this case, quadprog declares the problem to be infeasible.

The merit function is related to the KKT conditions for the problem; see Predictor-Corrector. Use the following definitions:

$\begin{matrix} ρ = \max (1, ‖ H ‖, ‖ \bar{A} ‖, ‖ A_{e q} ‖, ‖ c ‖, ‖ \bar{b} ‖, ‖ b_{e q} ‖) \\ r_{eq} = A_{eq} x - b_{eq} \\ r_{ineq} = \bar{A} x - \bar{b} - s \\ r_{d} = H x + c + A_{eq}^{T} λ_{eq} + {\bar{A}}^{T} {\bar{λ}}_{ineq} \\ g = \frac{1}{2} x^{T} H x + c^{T} x - {\bar{b}}^{T} {\bar{λ}}_{ineq} - b_{eq}^{T} λ_{eq} . \end{matrix}$

The notation $\bar{A}$ and $\bar{b}$ means the linear inequality coefficients, augmented with terms to represent bounds for the sparse algorithm. The notation ${\bar{λ}}_{ineq}$ similarly represents Lagrange multipliers for the linear inequality constraints, including bound constraints. This was called z in Predictor-Corrector, and $λ_{eq}$ was called y.

The merit function φ is

$\frac{1}{ρ} (\max ({‖ r_{eq} ‖}_{\infty}, {‖ r_{ineq} ‖}_{\infty}, {‖ r_{d} ‖}_{\infty}) + g) .$

If this merit function becomes too large, quadprog declares the problem to be infeasible and halts with exit flag -2.
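Given the residual vectors, g, and ρ, the merit function is a one-line computation. The sketch below uses hypothetical names and made-up numbers. It does not include a threshold for "too large", because the text does not state one.

```python
def inf_norm(v):
    """Infinity norm of a vector; an empty vector has norm 0."""
    return max((abs(vi) for vi in v), default=0.0)

def merit(r_eq, r_ineq, r_d, g, rho):
    """phi = (max(||r_eq||_inf, ||r_ineq||_inf, ||r_d||_inf) + g) / rho."""
    return (max(inf_norm(r_eq), inf_norm(r_ineq), inf_norm(r_d)) + g) / rho

# Illustrative data
phi = merit(r_eq=[0.1, -0.3], r_ineq=[0.2], r_d=[-0.05, 0.0], g=1.5, rho=2.0)
```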

`trust-region-reflective` `quadprog` Algorithm

Many of the methods used in Optimization Toolbox™ solvers are based on trust regions, a simple yet powerful concept in optimization.

To understand the trust-region approach to optimization, consider the unconstrained minimization problem, minimize f(x), where the function takes vector arguments and returns scalars. Suppose you are at a point x in n-space and you want to improve, i.e., move to a point with a lower function value. The basic idea is to approximate f with a simpler function q, which reasonably reflects the behavior of function f in a neighborhood N around the point x. This neighborhood is the trust region. A trial step s is computed by minimizing (or approximately minimizing) over N. This is the trust-region subproblem,

\min_{s} \{ q (s), s \in N \} .

(6)

The current point is updated to be x + s if f(x + s) < f(x); otherwise, the current point remains unchanged and N, the region of trust, is shrunk and the trial step computation is repeated.

The key questions in defining a specific trust-region approach to minimizing f(x) are how to choose and compute the approximation q (defined at the current point x), how to choose and modify the trust region N, and how accurately to solve the trust-region subproblem. This section focuses on the unconstrained problem. Later sections discuss additional complications due to the presence of constraints on the variables.

In the standard trust-region method ([48]), the quadratic approximation q is defined by the first two terms of the Taylor approximation to f at x; the neighborhood N is usually spherical or ellipsoidal in shape. Mathematically, the trust-region subproblem is typically stated

\min \{ \frac{1}{2} s^{T} H s + s^{T} g \text{ such that } ‖ D s ‖ \leq Δ \},

(7)

where g is the gradient of f at the current point x, H is the Hessian matrix (the symmetric matrix of second derivatives), D is a diagonal scaling matrix, Δ is a positive scalar, and ‖ . ‖ is the 2-norm. Good algorithms exist for solving Equation 7 (see [48]); such algorithms typically involve the computation of all eigenvalues of H and a Newton process applied to the secular equation

$\frac{1}{Δ} - \frac{1}{‖ s ‖} = 0.$

Such algorithms provide an accurate solution to Equation 7. However, they require time proportional to several factorizations of H. Therefore, for large-scale problems a different approach is needed. Several approximation and heuristic strategies, based on Equation 7, have been proposed in the literature ([42] and [50]). The approximation approach followed in Optimization Toolbox solvers is to restrict the trust-region subproblem to a two-dimensional subspace S ([39] and [42]). Once the subspace S has been computed, the work to solve Equation 7 is trivial even if full eigenvalue/eigenvector information is needed (since in the subspace, the problem is only two-dimensional). The dominant work has now shifted to the determination of the subspace.

The two-dimensional subspace S is determined with the aid of a preconditioned conjugate gradient process described below. The solver defines S as the linear space spanned by s₁ and s₂, where s₁ is in the direction of the gradient g, and s₂ is either an approximate Newton direction, i.e., a solution to

H \cdot s_{2} = - g,

(8)

or a direction of negative curvature,

s_{2}^{T} \cdot H \cdot s_{2} < 0.

(9)

The philosophy behind this choice of S is to force global convergence (via the steepest descent direction or negative curvature direction) and achieve fast local convergence (via the Newton step, when it exists).

A sketch of unconstrained minimization using trust-region ideas is now easy to give:

Formulate the two-dimensional trust-region subproblem.
Solve Equation 7 to determine the trial step s.
If f(x + s) < f(x), then x = x + s.
Adjust Δ.

These four steps are repeated until convergence. The trust-region dimension Δ is adjusted according to standard rules. In particular, it is decreased if the trial step is not accepted, i.e., f(x + s) ≥ f(x). See [46] and [49] for a discussion of this aspect.
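The accept, reject, and resize loop can be sketched for a two-variable problem as follows. This is a deliberate simplification: instead of solving the two-dimensional subproblem exactly, it tries the Newton step and falls back to a steepest-descent step of length Δ. The function name and the doubling and quartering rules for Δ are assumptions, not the toolbox rules.

```python
import math

def trust_region_minimize(f, grad, hess, x, delta=1.0, tol=1e-8, max_iter=100):
    for _ in range(max_iter):
        g = grad(x)
        gnorm = math.hypot(g[0], g[1])
        if gnorm < tol:
            break
        (a, b), (c, d) = hess(x)
        det = a * d - b * c
        step = None
        if det > 0 and a > 0:            # H positive definite: try the Newton step H s = -g
            s = [(-g[0] * d + g[1] * b) / det, (-g[1] * a + g[0] * c) / det]
            if math.hypot(s[0], s[1]) <= delta:
                step = s
        if step is None:                 # otherwise move along -g to the region boundary
            step = [-delta * g[0] / gnorm, -delta * g[1] / gnorm]
        trial = [x[0] + step[0], x[1] + step[1]]
        if f(trial) < f(x):              # accept the trial step and enlarge the region
            x = trial
            delta *= 2.0
        else:                            # reject the trial step and shrink the region
            delta *= 0.25
    return x

# Illustrative objective: f(x) = (x1 - 1)^2 + 10*(x2 + 2)^2
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]
hess = lambda x: [[2.0, 0.0], [0.0, 20.0]]
x_min = trust_region_minimize(f, grad, hess, [5.0, 5.0])
```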

Optimization Toolbox solvers treat a few important special cases of f with specialized functions: nonlinear least-squares, quadratic functions, and linear least-squares. However, the underlying algorithmic ideas are the same as for the general case. These special cases are discussed in later sections.

The subspace trust-region method is used to determine a search direction. However, instead of restricting the step to (possibly) one reflection step, as in the nonlinear minimization case, a piecewise reflective line search is conducted at each iteration. See [45] for details of the line search.

Preconditioned Conjugate Gradient Method

A popular way to solve large, symmetric, positive definite systems of linear equations Hp = –g is the method of Preconditioned Conjugate Gradients (PCG). This iterative approach requires the ability to calculate matrix-vector products of the form H·v where v is an arbitrary vector. The symmetric positive definite matrix M is a preconditioner for H. That is, M = C², where C^–1 H C^–1 is a well-conditioned matrix or a matrix with clustered eigenvalues.

In a minimization context, you can assume that the Hessian matrix H is symmetric. However, H is guaranteed to be positive definite only in the neighborhood of a strong minimizer. Algorithm PCG exits when it encounters a direction of negative (or zero) curvature, that is, d^T H d ≤ 0. The PCG output direction p is either a direction of negative curvature or an approximate solution to the Newton system Hp = –g. In either case, p helps to define the two-dimensional subspace used in the trust-region approach discussed in Trust-Region Methods for Nonlinear Minimization.
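A textbook PCG loop with the early exit on nonpositive curvature looks roughly like this. The sketch assumes a diagonal (Jacobi) preconditioner passed as `m_diag`, which is only one possible choice of M; the function name, the status strings, and the tolerances are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(H, v):
    return [dot(row, v) for row in H]

def pcg(H, g, m_diag, tol=1e-10, max_iter=50):
    """Approximately solve H p = -g; stop early on nonpositive curvature."""
    p = [0.0] * len(g)
    r = [-gi for gi in g]                        # residual of H p = -g at p = 0
    if dot(r, r) ** 0.5 < tol:
        return p, "newton"
    z = [ri / mi for ri, mi in zip(r, m_diag)]   # apply the preconditioner
    d = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        Hd = matvec(H, d)
        curvature = dot(d, Hd)
        if curvature <= 0.0:                     # d'Hd <= 0: return this direction
            return d, "negative_curvature"
        alpha = rz / curvature
        p = [pi + alpha * di for pi, di in zip(p, d)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hd)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [ri / mi for ri, mi in zip(r, m_diag)]
        rz_new = dot(r, z)
        d = [zi + (rz_new / rz) * di for zi, di in zip(z, d)]
        rz = rz_new
    return p, "newton"

# Positive definite H: the output approximates the Newton step
p_newton, kind_newton = pcg([[4.0, 1.0], [1.0, 3.0]], [-1.0, -2.0], [4.0, 3.0])
# Indefinite H: the output is a direction of negative curvature
H_indef = [[1.0, 0.0], [0.0, -1.0]]
p_curv, kind_curv = pcg(H_indef, [0.0, 1.0], [1.0, 1.0])
```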

Linear Equality Constraints

Linear constraints complicate the situation described for unconstrained minimization. However, the underlying ideas described previously can be carried through in a clean and efficient way. The trust-region methods in Optimization Toolbox solvers generate strictly feasible iterates.

The general linear equality constrained minimization problem can be written

\min \{ f (x) \text{ such that } A x = b \},

(10)

where A is an m-by-n matrix (m ≤ n). Some Optimization Toolbox solvers preprocess A to remove strict linear dependencies using a technique based on the LU factorization of A^T [46]. Here A is assumed to be of rank m.

The method used to solve Equation 10 differs from the unconstrained approach in two significant ways. First, an initial feasible point x₀ is computed, using a sparse least-squares step, so that Ax₀ = b. Second, Algorithm PCG is replaced with Reduced Preconditioned Conjugate Gradients (RPCG), see [46], in order to compute an approximate reduced Newton step (or a direction of negative curvature in the null space of A). The key linear algebra step involves solving systems of the form

[\begin{matrix} C & {\tilde{A}}^{T} \\ \tilde{A} & 0 \end{matrix}] [\begin{matrix} s \\ t \end{matrix}] = [\begin{matrix} r \\ 0 \end{matrix}],

(11)

where $\tilde{A}$ approximates A (small nonzeros of A are set to zero provided rank is not lost) and C is a sparse symmetric positive-definite approximation to H, i.e., C ≈ H. See [46] for more details.

Box Constraints

The box constrained problem is of the form

\min \{ f (x) \text{ such that } l \leq x \leq u \},

(12)

where l is a vector of lower bounds, and u is a vector of upper bounds. Some (or all) of the components of l can be equal to –∞ and some (or all) of the components of u can be equal to ∞. The method generates a sequence of strictly feasible points. Two techniques are used to maintain feasibility while achieving robust convergence behavior. First, a scaled modified Newton step replaces the unconstrained Newton step (to define the two-dimensional subspace S). Second, reflections are used to increase the step size.

The scaled modified Newton step arises from examining the Kuhn-Tucker necessary conditions for Equation 12,

{(D (x))}^{- 2} g = 0,

(13)

where

$D (x) = diag ({| v_{k} |}^{- 1 / 2}),$

and the vector v(x) is defined below, for each 1 ≤ i ≤ n:

If g_i < 0 and u_i < ∞ then v_i = x_i – u_i
If g_i ≥ 0 and l_i > –∞ then v_i = x_i – l_i
If g_i < 0 and u_i = ∞ then v_i = –1
If g_i ≥ 0 and l_i = –∞ then v_i = 1
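The four cases map directly onto a small function. The sketch below builds v(x) and the diagonal of D(x) = diag(|v_k|^(-1/2)); the function names and the sample data are assumptions made for illustration.

```python
import math

def scaling_vector(x, g, l, u):
    """Build v(x) from the four cases on the sign of g_i and finiteness of the bounds."""
    v = []
    for xi, gi, li, ui in zip(x, g, l, u):
        if gi < 0:
            v.append(xi - ui if ui < math.inf else -1.0)
        else:
            v.append(xi - li if li > -math.inf else 1.0)
    return v

def scaling_diagonal(v):
    """Diagonal entries of D(x) = diag(|v_k|^(-1/2))."""
    return [abs(vi) ** -0.5 for vi in v]

# Illustrative data, one component per case
x = [1.0, 2.0, 0.0, 3.0]
g = [-1.0, 2.0, -0.5, 0.0]
l = [0.0, -1.0, -math.inf, -math.inf]
u = [5.0, math.inf, math.inf, 4.0]
v = scaling_vector(x, g, l, u)
D = scaling_diagonal(v)
```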

The nonlinear system Equation 13 is not differentiable everywhere. Nondifferentiability occurs when v_i = 0. You can avoid such points by maintaining strict feasibility, i.e., restricting l < x < u.

The scaled modified Newton step s_k for the nonlinear system of equations given by Equation 13 is defined as the solution to the linear system

\hat{M} D s^{N} = - \hat{g}

(14)

at the kth iteration, where

\hat{g} = D^{- 1} g = diag ({| v |}^{1 / 2}) g,

(15)

and

\hat{M} = D^{- 1} H D^{- 1} + diag (g) J^{v} .

(16)

Here J^v plays the role of the Jacobian of |v|. Each diagonal component of the diagonal matrix J^v equals 0, –1, or 1. If all the components of l and u are finite, J^v = diag(sign(g)). At a point where g_i = 0, v_i might not be differentiable. $J_{i i}^{v} = 0$ is defined at such a point. Nondifferentiability of this type is not a cause for concern because, for such a component, it is not significant which value v_i takes. Further, |v_i| will still be discontinuous at this point, but the function |v_i|·g_i is continuous.

Second, reflections are used to increase the step size. A (single) reflection step is defined as follows. Given a step p that intersects a bound constraint, consider the first bound constraint crossed by p; assume it is the ith bound constraint (either the ith upper or ith lower bound). Then the reflection step p^R = p except in the ith component, where p^R_i = –p_i.
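A single reflection can be sketched as follows. The code finds the first bound that the segment from x to x + p reaches and flips the sign of that component of p. The function name is hypothetical, and the sketch returns only p^R and the index i, not the full piecewise path.

```python
import math

def reflect_step(x, p, l, u):
    """Return (p_reflected, i) for the first bound crossed along x + t*p, 0 < t <= 1."""
    first_t, first_i = math.inf, None
    for i, (xi, pi, li, ui) in enumerate(zip(x, p, l, u)):
        if pi > 0 and ui < math.inf:
            t = (ui - xi) / pi           # fraction of the step that reaches the upper bound
        elif pi < 0 and li > -math.inf:
            t = (li - xi) / pi           # fraction of the step that reaches the lower bound
        else:
            continue
        if t < first_t:
            first_t, first_i = t, i
    if first_i is None or first_t > 1.0:
        return list(p), None             # the step does not intersect any bound
    p_ref = list(p)
    p_ref[first_i] = -p_ref[first_i]     # flip only the ith component
    return p_ref, first_i

# Illustrative data: the second upper bound is reached first (t = 0.25)
p_ref, i_hit = reflect_step([0.5, 0.5], [1.0, 2.0], [0.0, 0.0], [1.0, 1.0])
```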

`active-set` `quadprog` Algorithm

After completing a presolve step, the active-set algorithm proceeds in two phases.

Phase 1: Obtain a feasible point with respect to all constraints.
Phase 2: Iteratively lower the objective function while maintaining a list of the active constraints and maintaining feasibility in each iteration.

The active-set strategy (also known as a projection method) is similar to the strategy of Gill et al., described in [18] and [17].

Presolve Step

The algorithm first tries to simplify the problem by removing redundancies and simplifying constraints. The tasks performed during the presolve step can include the following:

Check if any variables have equal upper and lower bounds. If so, check for feasibility, and then fix and remove the variables.
Check if any linear inequality constraint involves only one variable. If so, check for feasibility, and then change the linear constraint to a bound.
Check if any linear equality constraint involves only one variable. If so, check for feasibility, and then fix and remove the variable.
Check if any linear constraint matrix has zero rows. If so, check for feasibility, and then delete the rows.
Determine if the bounds and linear constraints are consistent.
Check if any variables appear only as linear terms in the objective function and do not appear in any linear constraint. If so, check for feasibility and boundedness, and then fix the variables at their appropriate bounds.
Change any linear inequality constraints to linear equality constraints by adding slack variables.

If the algorithm detects an infeasible or unbounded problem, it halts and issues an appropriate exit message.

The algorithm might arrive at a single feasible point, which represents the solution.

For details, see Gould and Toint [63].

Phase 1 Algorithm

In Phase 1, the algorithm attempts to find a point x that satisfies all constraints, with no consideration of the objective function. quadprog runs the Phase 1 algorithm only if the supplied initial point x0 is infeasible.

To begin, the algorithm tries to find a point that is feasible with respect to all equality constraints, such as X = Aeq\beq. If there is no solution x to the equations Aeq*x = beq, then the algorithm halts. If there is a solution X, the next step is to satisfy the bounds and linear inequalities. If there are no equality constraints, set X = x0, the initial point.

Starting from X, the algorithm calculates M = max(A*X - b, X - ub, lb - X). If M <= options.ConstraintTolerance, then the point X is feasible and the Phase 1 algorithm halts.

If M > options.ConstraintTolerance, the algorithm introduces a nonnegative slack variable γ for the auxiliary linear programming problem
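The violation measure M is the largest entry over all inequality residuals and bound violations. A minimal sketch, assuming dense rows and a hypothetical function name:

```python
import math

def max_violation(A, b, lb, ub, X):
    """M = max(A*X - b, X - ub, lb - X), taken over all entries."""
    ineq = [sum(a * x for a, x in zip(row, X)) - bi for row, bi in zip(A, b)]
    upper = [x - u for x, u in zip(X, ub)]   # -inf when the upper bound is infinite
    lower = [l - x for l, x in zip(lb, X)]   # -inf when the lower bound is infinite
    return max(ineq + upper + lower, default=-math.inf)

# Illustrative data
M = max_violation(A=[[1.0, 1.0]], b=[1.0], lb=[0.0, 0.0], ub=[1.0, math.inf], X=[2.0, -0.5])
constraint_tolerance = 1e-8                  # stands in for options.ConstraintTolerance
is_feasible_point = M <= constraint_tolerance
```

Here the largest violation is 1.0, which comes from the first upper bound, so Phase 1 would go on to the auxiliary linear program.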

$\min_{x, γ} γ$

such that

$\begin{matrix} A x - γ \leq b \\ Aeq x = beq \\ lb - γ \leq x \leq ub + γ \\ γ \geq - ρ . \end{matrix}$

Here, ρ is the ConstraintTolerance option multiplied by the absolute value of the largest element in A and Aeq. If the algorithm finds γ = 0 and a point x that satisfies the equations and inequalities, then x is a feasible Phase 1 point. If the auxiliary linear programming problem has no solution x with γ = 0, then the Phase 1 problem is infeasible.

To solve the auxiliary linear programming problem, the algorithm sets γ₀ = M + 1, sets x₀ = X, and initializes the active set as the fixed variables (if any) and all the equality constraints. The algorithm reformulates the linear programming variables p to be the offset of x from the current point x₀, namely x = x₀ + p. The algorithm solves the linear programming problem by the same iterations as it takes in Phase 2 to solve the quadratic programming problem, with an appropriately modified Hessian.

Phase 2 Algorithm

In terms of a variable d, the problem is

\begin{matrix} \min_{d \in ℜ^{n}} q (d) = \frac{1}{2} d^{T} H d + c^{T} d, \\ A_{i} d = b_{i}, i = 1, ..., m_{e} \\ A_{i} d \leq b_{i}, i = m_{e} + 1, ..., m . \end{matrix}

(17)

Here, A_i refers to the ith row of the m-by-n matrix A.

During Phase 2, the algorithm maintains an active set ${\bar{A}}_{k}$ , which is an estimate of the active constraints (those on the constraint boundaries) at the solution point.

The algorithm updates ${\bar{A}}_{k}$ at each iteration k, forming the basis for a search direction d_k. Equality constraints always remain in the active set ${\bar{A}}_{k}$ . The search direction d_k is calculated and minimizes the objective function while remaining on any active constraint boundaries. The algorithm forms the feasible subspace for d_k from a basis Z_k whose columns are orthogonal to the estimate of the active set ${\bar{A}}_{k}$ (that is, ${\bar{A}}_{k} Z_{k} = 0$ ). Therefore, a search direction, which is formed from a linear summation of any combination of the columns of Z_k, is guaranteed to remain on the boundaries of the active constraints.

The algorithm forms the matrix Z_k from the last n – l columns of the QR decomposition of the matrix ${\bar{A}}_{k}^{T}$ , where l is the number of active constraints and l < n. That is, Z_k is given by

Z_{k} = Q [:, l + 1 : n],

(18)

where

$Q^{T} {\bar{A}}_{k}^{T} = [\begin{matrix} R \\ 0 \end{matrix}] .$

After finding Z_k, the algorithm looks for a new search direction d_k that minimizes q(d), where d_k is in the null space of the active constraints. That is, d_k is a linear combination of the columns of Z_k: ${\hat{d}}_{k} = Z_{k} p$ for some vector p.

Viewing the quadratic as a function of p by substituting for d_k gives

q (p) = \frac{1}{2} p^{T} Z_{k}^{T} H Z_{k} p + c^{T} Z_{k} p .

(19)

Differentiating this equation with respect to p yields

\nabla q (p) = Z_{k}^{T} H Z_{k} p + Z_{k}^{T} c .

(20)

∇q(p) is referred to as the projected gradient of the quadratic function because it is the gradient projected in the subspace defined by Z_k. The term $Z_{k}^{T} H Z_{k}$ is called the projected Hessian. Assuming the projected Hessian matrix $Z_{k}^{T} H Z_{k}$ is positive semidefinite, the minimum of the function q(p) in the subspace defined by Z_k occurs when ∇q(p) = 0, which is the solution of the system of linear equations

Z_{k}^{T} H Z_{k} p = - Z_{k}^{T} c .

(21)

The algorithm then takes a step of the form

$x_{k + 1} = x_{k} + α d_{k},$

where

$d_{k} = Z_{k} p .$

Due to the quadratic nature of the objective function, only two choices of step length α exist at each iteration. A step of unity along d_k is the exact step to the minimum of the function restricted to the null space of ${\bar{A}}_{k}$ . If the algorithm can take such a step without violating the constraints, then this step is the solution to the quadratic program (Equation 17). Otherwise, the step along d_k to the nearest constraint is less than unity, and the algorithm includes a new constraint in the active set at the next iteration. The distance to the constraint boundaries in any direction d_k is given by

$α = \min_{i \in \{ 1, ..., m \}} \left\{ \frac{- (A_{i} x_{k} - b_{i})}{A_{i} d_{k}} \right\},$

which is defined for constraints not in the active set, and where the direction d_k is towards the constraint boundary, that is, $A_{i} d_{k} > 0, i = 1, ..., m$ .

When the active set includes n independent constraints, without location of the minimum, the algorithm calculates the Lagrange multipliers λ_k, which satisfy the nonsingular set of linear equations
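The ratio test can be sketched as below. It returns both the distance α and the index of the blocking constraint, and the step actually taken is min(1, α). The function name and the use of a Python set for the active constraints are assumptions made for this example.

```python
import math

def step_to_nearest_constraint(A, b, x, d, active):
    """Smallest alpha with A_i (x + alpha d) = b_i over inactive rows with A_i d > 0."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    alpha, blocking = math.inf, None
    for i, (row, bi) in enumerate(zip(A, b)):
        if i in active:
            continue                     # constraints in the active set are skipped
        Ad = dot(row, d)
        if Ad > 0:                       # d moves toward this constraint boundary
            ratio = -(dot(row, x) - bi) / Ad
            if ratio < alpha:
                alpha, blocking = ratio, i
    return alpha, blocking

# Illustrative data: constraints x1 <= 2, x2 <= 3, -x1 - x2 <= 0
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [2.0, 3.0, 0.0]
alpha, blocking = step_to_nearest_constraint(A, b, x=[0.0, 0.0], d=[4.0, 3.0], active=set())
step_length = min(1.0, alpha)
```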

{\bar{A}}_{k}^{T} λ_{k} = c + H x_{k} .

(22)

If all elements of λ_k are nonnegative, x_k is the optimal solution of the quadratic programming problem Equation 1. However, if any component of λ_k is negative, and the component does not correspond to an equality constraint, then the minimization is not complete. The algorithm deletes the element corresponding to the most negative multiplier and starts a new iteration.
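The multiplier test amounts to scanning λ_k for the most negative entry among the inequality constraints. A minimal sketch, with a hypothetical function name and a boolean list marking which active constraints are equalities:

```python
def constraint_to_drop(lam, is_equality):
    """Index of the most negative inequality multiplier, or None if none is negative."""
    worst, idx = 0.0, None
    for i, (li, eq) in enumerate(zip(lam, is_equality)):
        if not eq and li < worst:        # equality multipliers may have either sign
            worst, idx = li, i
    return idx

# Illustrative data: the first active constraint is an equality
drop = constraint_to_drop([-3.0, 0.5, -1.0, -2.0], [True, False, False, False])
done = constraint_to_drop([-3.0, 0.5, 0.0], [True, False, False])
```

A return value of `None` corresponds to the case where the current point is optimal.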

Sometimes, when the solver finishes with all nonnegative Lagrange multipliers, the first-order optimality measure is above the tolerance, or the constraint tolerance is not met. In these cases, the solver attempts to reach a better solution by following the restart procedure described in[1]. In this procedure, the solver discards the current set of active constraints, relaxes the constraints a bit, and constructs a new set of active constraints while attempting to solve the problem in a manner that avoids cycling (repeatedly returning to the same state). If necessary, the solver can perform the restart procedure several times.

Note

Do not try to stop the algorithm early by setting large values of the ConstraintTolerance and OptimalityTolerance options. Generally, the solver iterates without checking these values until it reaches a potential stopping point, and only then checks to see whether the tolerances are satisfied.

Occasionally, the active-set algorithm can have difficulty detecting when a problem is unbounded. This issue can occur if the direction of unboundedness v is a direction where the quadratic term v'Hv = 0. For numerical stability and to enable a Cholesky factorization, the active-set algorithm adds a small, strictly convex term to the quadratic objective. This small term causes the objective function to be bounded away from –Inf. In this case, the active-set algorithm reaches an iteration limit instead of reporting that the solution is unbounded. In other words, the algorithm halts with exit flag 0 instead of –3.

References

[1] Gill, P. E., W. Murray, M. A. Saunders, and M. H. Wright. A practical anti-cycling procedure for linearly constrained optimization. Math. Programming 45 (1), August 1989, pp. 437–474.

Warm Start

When you run the quadprog or lsqlin 'active-set' algorithm with a warm start object as the start point, the solver attempts to skip many of the Phase 1 and Phase 2 steps. The warm start object contains the active set of constraints, and this set can be correct or close to correct for the new problem. Therefore, the solver can avoid iterations to add constraints to the active set. Also, the initial point might be close to the solution for the new problem. For more information, see optimwarmstart.