Course 15: Nonlinear Programming

**Author: OptimizationCity Group**

**Introduction**

Optimization means maximizing or minimizing the value of one or more objective functions, with (explicit or implicit) constraints or without constraints. Optimization also refers to finding the best solution to a problem, or achieving the best possible result under the existing conditions and assumptions. Nonlinear programming is the class of optimization problems in which the variables appear in nonlinear relationships.

The history of nonlinear programming and optimization methods can be traced back to mathematicians of the past, such as Biruni, Khayyam, Newton, Lagrange, and Cauchy. Indeed, the development of optimization methods owes much to the work of Newton and Leibniz. Lagrange proposed an optimization method for objective functions with constraints, and Cauchy first used the steepest descent method to solve unconstrained nonlinear problems. However, the fundamental development of nonlinear optimization took place in the 20th century, and the advent of the computer greatly accelerated it.

In 1951, Kuhn and Tucker presented necessary and sufficient conditions for the optimal solution of mathematical programming problems. This work became the basis for subsequent research in the field of nonlinear programming. In the early 1960s, Zoutendijk and Rosen contributed valuable work in this field. Carroll, Fiacco, and McCormick presented a solution approach for unconstrained optimization problems. Duffin, Zener, and Peterson presented geometric programming in 1964. The development of integer programming, which is considered one of the fundamental approaches to optimizing constrained nonlinear problems, belongs to Gomory's research. Gilmore and Gomory in 1963 provided a suitable method for solving linear fractional problems. Dantzig, Charnes, and Cooper presented stochastic programming methods, which form a class of nonlinear programming problems.

The main assumption of linear programming is that all functions (both the objective function and the constraints) are linear, an assumption that does not hold in many practical cases.

The goal of a nonlinear programming problem, in its general form, is to find the values of x=(x_{1},x_{2},…,x_{n}) so as to

maximize f(x), subject to g_{i}(x) ≤ b_{i} for i=1,2,…,m, and x ≥ 0,

where f(x) and the g_{i}(x) are known functions of the n decision variables.

**Note: **In this lesson, for convenience, it is assumed that all functions are **differentiable**.

There is no single algorithm that can solve every problem of this general form. However, if certain assumptions are made about the functions, special cases of nonlinear programming are obtained, for which considerable progress has been made in developing solution algorithms.

**Special cases of nonlinear programming**

There are different types of nonlinear programming. We discuss some of the main ones below.

**Linear programming**

If the objective function and all constraints are linear functions and the variables are non-negative, we obtain the familiar linear programming model: maximize z = c_{1}x_{1}+c_{2}x_{2}+…+c_{n}x_{n}, subject to linear constraints and x ≥ 0.

**Quadratic programming**

The objective function in this case is quadratic, while all constraints are linear. The model is then as follows.

*H* is the matrix of second-order partial derivatives of the objective function, known as the Hessian matrix.

**Fractional programming**

The objective function of a mathematical program may be the ratio of two linear or nonlinear functions, f(x)=N(x)/Q(x). In this case, the model is as follows.

If the functions N(x) and Q(x) are linear, this program is known as a linear fractional program, and its optimal point is located on one of the boundary points in the feasible space.

**Separable programming**

The objective function and the constraint functions in this type of programming can each be written as a sum of univariate functions, f(x)=f_{1}(x_{1})+f_{2}(x_{2})+…+f_{n}(x_{n}); in this case the model is as follows:

This type of programming is applied, for example, in procurement, water-supply systems, electricity networks, and economic forecasting.

**Geometric programming**

Many engineering design problems and economic-industrial problems are formulated as follows:

In the above model, l_{0} represents the number of terms in the objective function and l_{i} represents the number of terms in the i-th constraint.

**Stochastic programming**

Some or all of the parameters of a program (such as c_{i}, a_{ij}, and b_{i}) may be random variables. Chance constraints are among the important topics in stochastic programming and can be converted into nonlinear programs. The mathematical representation of a chance constraint is P{Σ_{j} a_{ij}x_{j} ≤ b_{i}} ≥ p,

where p is the required probability that the constraint holds.

In this course note, some examples of the application of nonlinear programming are presented first. Then the basic principles and concepts for solving some of its important problem types are presented.

**Nonlinear programming applications**

**Variable cost transportation problem**

As described in course note 5 of the **optimization city** group, with the amounts of supply and demand known, the goal of the transportation problem is to determine the optimal plan for transporting goods from different sources to different destinations so that the total cost is minimized. There it was assumed that the cost of transporting each unit of goods between an origin and a destination is a fixed amount, regardless of the quantity transported. In practice, this cost may not be fixed, and discounts are often given for large shipments, so that the marginal cost of transporting a unit of goods behaves like the figure below. In this case, the cost of transporting x units of the product is a **non-linear function** C(x), which may be a piecewise linear function.

Therefore, if the cost of transporting x_{ij} units from origin i (i=1,2,…,m) to destination j (j=1,2,…,n) is C_{ij}(x_{ij}), the objective function becomes the minimization of the sum of C_{ij}(x_{ij}) over all origins and destinations.

However, in such problems, even though the objective function is nonlinear, the constraints generally retain the special structure of the linear transportation model and are **still** formulated within that framework.

**The problem of optimal investment selection**

Mathematical models based on nonlinear programming have become a common tool of professional portfolio managers for selecting optimal investments. Investors care about two factors: **the return** on their investments and **the risk** involved; the investment should therefore be chosen so that a sensible trade-off between *return* and *risk* is achieved. For such a problem, the nonlinear programming model is built as follows:

Suppose the selection among n different types of stocks is under consideration, and the decision variable x_{j} represents the number of shares of stock type j to be purchased (for j=1,2,…,n). Let **µ_{j}** and **σ_{j}** represent the **mean** and **standard deviation**, respectively, of the return on one share of stock type j. Then the average R(x) and variance V(x) of the total investment return are obtained as follows.

The optimal trade-off between these two factors is expressed by the objective function f(x) = R(x) − βV(x), which is to be maximized.

The non-negative number β represents the trade-off between the investor's return and risk. Setting β=0 means that risk is ignored in the decision, while choosing a large value of β means giving great importance to risk in the investment.

The nonlinear programming model of this problem is as follows.

where P_{i} represents the price of one share of stock type i, and B is the total budget available for investment. The objective function can be interpreted as the expected utility of the investment, which we seek to maximize in the above model.
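As a quick prototype of this model, the quadratic objective R(x) − βV(x) can be maximized by exhaustive search over small integer share counts. All numbers below (mean returns, covariances, prices, budget, and β) are hypothetical illustration data, not taken from the text.

```python
import itertools

# Hypothetical data for two stock types (NOT from the text).
mu = [0.20, 0.10]                      # mu_j: mean return per share
cov = [[0.08, 0.01], [0.01, 0.02]]     # sigma_ij: covariances of returns
price = [50.0, 30.0]                   # P_j: price per share
budget = 1200.0                        # B: total budget
beta = 1.0                             # trade-off weight between return and risk

def objective(x):
    # R(x) - beta * V(x): expected return minus weighted variance
    R = sum(m * xi for m, xi in zip(mu, x))
    V = sum(cov[i][j] * x[i] * x[j] for i in range(2) for j in range(2))
    return R - beta * V

best, best_x = float("-inf"), None
for x in itertools.product(range(25), range(41)):      # integer share counts
    if price[0] * x[0] + price[1] * x[1] <= budget:    # budget constraint
        v = objective(x)
        if v > best:
            best, best_x = v, x
```

With these illustrative numbers the search returns the portfolio x = (1, 2); raising β shifts the optimum toward the lower-variance stock.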

**Sales planning**

Suppose the demand function of two products manufactured by a factory is as follows:

where x_{1} and x_{2} represent the sales volumes of the first and second products, respectively, and p_{1} and p_{2} their sales prices. The total sales revenue is therefore:

The following constraint is defined according to functional restrictions.

Therefore, the final model is as follows.

**Layout planning problem**

Layout planning is one of the common problems raised in operations research; for example, finding the optimal location of machinery, intermediate warehouses, or emergency equipment. In this problem, we use the following notation.

(x_{i}, y_{i}): coordinates of the i-th intermediary warehouse (i=1,2,…,m)

(a_{j},b_{j}): coordinates of the j-th market

h_{i}: capacity of intermediate warehouse i

d_{ij}: distance from the i-th intermediate warehouse to the j-th market

w_{ij}: number of product units sent from the i-th warehouse to the j-th market

To determine the location of intermediate warehouses in such a way that the sum of the weighted distances is minimized, we will have:

Different methods may be used to measure d_{ij}, including:

Both w_{ij} and d_{ij} are variables of the problem, which makes it a nonlinear program. If the locations of the intermediate warehouses, and hence the distances d_{ij}, are known, the problem reduces to a special case of linear programming known as the **transportation problem**.
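For a single intermediate warehouse with Euclidean distances, the weighted-distance objective Σ_{j} w_{j}d_{j} can be minimized with the classical Weiszfeld iteration. The sketch below uses hypothetical market coordinates and weights, not data from the text.

```python
# Hypothetical markets (a_j, b_j) and shipment weights w_j (NOT from the text).
markets = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
w = [1.0, 1.0, 1.0]

def weiszfeld(points, weights, iters=200):
    # Start from the weighted centroid of the markets.
    W = sum(weights)
    x = sum(wi * p[0] for wi, p in zip(weights, points)) / W
    y = sum(wi * p[1] for wi, p in zip(weights, points)) / W
    for _ in range(iters):
        nx = ny = den = 0.0
        for (a, b), wi in zip(points, weights):
            d = ((x - a) ** 2 + (y - b) ** 2) ** 0.5 or 1e-12  # avoid /0
            nx += wi * a / d
            ny += wi * b / d
            den += wi / d
        # Each step is a distance-weighted average of the market coordinates.
        x, y = nx / den, ny / den
    return x, y

x, y = weiszfeld(markets, w)
```

For equal weights this converges to the Fermat point of the triangle, here (5, 5/√3).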

**Container design (model 1)**

Suppose 400 cubic meters of a commodity such as wheat must be transported across the river by boat. Regardless of the amount carried, the cost per trip is 0.1 USD.

The sides of the boat are built so that the cargo container must be 0.5 m high and 1 m wide, while its length is variable (denoted by x_{1} in the figure below).

Figure (1) dimensions of the cargo container

We know that there are two cost components in this problem: the **construction cost** and the **shipping cost**; the total cost is the sum of the two. To find the total cost, we proceed as follows.

The construction cost of container includes the following components:

- The cost of making two ends:

- The cost of floor construction:

- The cost of walls construction:

Therefore, the cost of making a container is the sum of the above costs:

The shipping cost of this container is calculated as follows: each trip carries 0.5×1×x_{1}=0.5x_{1} cubic meters, so moving 400 cubic meters requires 400/(0.5x_{1})=800/x_{1} trips at 0.1 USD per trip.

The total cost, which includes the cost of construction of the container and the cost of transportation, is as follows.

The goal of the problem is to find the length x_{1} that minimizes f(x_{1}), subject to x_{1} being greater than or equal to zero. According to the given information, the problem can be stated as follows:

You can see that the mathematical model is an unconstrained single-variable nonlinear programming model, and numerical methods can be used to solve it. In the following table, the optimal solution is obtained.

Table (1) numbering method to find the optimal solution of the container design model

Figure (2) graphical representation of the total cost of model 1

From the table and figure above, the optimal value of x_{1} equals 2 and the corresponding value of the objective function equals 100.
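The numerical search in the table can be sketched in code. The unit construction costs are not legible in the extracted text; the coefficients below (a fixed 20 for the two ends plus 20 per meter of length for floor and walls) are an assumption chosen so that the model reproduces the optimum reported above (x_{1}=2, total cost 100). The shipping term 0.1×400/(0.5x_{1})=80/x_{1} follows directly from the stated data. A ternary search then minimizes the unimodal cost function.

```python
def total_cost(x1):
    construction = 20 + 20 * x1        # ASSUMED coefficients (see note above)
    trips = 400 / (0.5 * x1)           # each trip carries 0.5 * 1 * x1 cubic meters
    return construction + 0.1 * trips  # shipping cost = 80 / x1

# Ternary search for the minimum of a unimodal function on [0.1, 10].
lo, hi = 0.1, 10.0
while hi - lo > 1e-6:
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if total_cost(m1) < total_cost(m2):
        hi = m2                        # minimum lies left of m2
    else:
        lo = m1                        # minimum lies right of m1
x_star = (lo + hi) / 2
```

Under these assumed coefficients the search converges to x_{1}=2 with total cost 100, matching the table.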

**Container design (model 2)**

In the previous model, assume that the sides of the boat are built so that the length of the cargo container cannot exceed 1.5 meters; the rest of the problem is the same as before. This problem can be expressed as follows:

In this model, which is a one-variable nonlinear problem with one linear constraint, only the bounds change: x_{1} must be at most 1.5 meters. In the figure below, the feasible region for x_{1} is limited to the left of the vertical line x_{1}=1.5. The minimum value of f(x_{1}) is 103.33, attained on this boundary.

Figure (3) graphical representation of the total cost of model 2

**Container design (model 3)**

In this model, assume that there is no limit on the length of the container, but the 400 cubic meters must be shipped across the river within 30 days, and the boat can make at most 10 trips per day, i.e. at most 300 trips in total. If we let g(x_{1}) denote the number of trips, the constraint is g(x_{1}) = 800/x_{1} ≤ 300.

By adding the above constraint to the model, we have:

By adding this restriction, the graphical solution of this model is as follows.

Figure (4) graphical representation of the total cost of model 3

**Container design (model 4)**

Now assume that x_{1} is the length and x_{2} the width of the container. The construction and transportation costs will then be functions of two variables. The cost of constructing the container is as follows:

The construction cost of two ends:

The construction cost of floor:

The construction cost of walls:

Therefore, the cost of making a container is the sum of the above costs:

The transportation cost is calculated as before: each trip now carries 0.5x_{1}x_{2} cubic meters, so 400/(0.5x_{1}x_{2})=800/(x_{1}x_{2}) trips are required at 0.1 USD per trip.

Therefore, the total cost, which is the objective function, is as follows:

In order to minimize the total cost, the final model can be written as follows:

The above model is an unconstrained nonlinear program with two variables. A graphical solution is difficult because of its three-dimensional nature; in this course we will discuss methods for solving this category of problems. By setting the partial derivatives equal to zero, the optimal solution of the above model is obtained as follows.

**Container design (model 5)**

In this model, suppose there are no restrictions on length, width, or height. Let x_{1} denote the length, x_{2} the width, and x_{3} the height of the container. The cost of constructing the container is as follows:

The construction cost of two ends:

The construction cost of floor:

The construction cost of walls:

Therefore, the cost of making a container is the sum of the above costs:

The transportation cost is calculated as before: each trip now carries x_{1}x_{2}x_{3} cubic meters, so 400/(x_{1}x_{2}x_{3}) trips are required at 0.1 USD per trip.

Therefore, the total cost, which is the objective function, is as follows:

In order to minimize the total cost, the final model can be written as follows:

The above model is an unconstrained nonlinear program with three variables. Its optimal solution is as follows.

**Container design (model 6)**

In the previous model, there were no restrictions on the values of x_{1}, x_{2}, and x_{3}. But if we know that the maximum number of trips is 300, the problem can be stated as follows.

The optimal solution of the above model is as follows.

**Graphic representation of nonlinear programming problems**

If the nonlinear programming problem has only one or two variables, it can be solved graphically. Consider the following example.

Example: A door and window manufacturing company has three workshops. Aluminum frames and metal parts are produced in workshop 1, wooden frames are produced in workshop 2, and in workshop 3 the glass is cut and installed in the frames. Management wants to launch two new products and has asked the operations research group to determine how many units of each product should be produced given the workshop capacities. Product 1 is a door with an aluminum frame and product 2 is a glass window with a wooden frame. Since both products require workshop 3, the capacity of this workshop creates competition between the two products. The table below shows the profit of each product, the amount of each resource used, and the capacity of each workshop.

Solution:

Each unit of product 1 uses 1 hour of workshop 1, which has only 4 hours of capacity available for the new products; this constraint is written algebraically as x_{1}≤4. Similarly, workshop 2 imposes the constraint 2x_{2}≤12 for product 2. Workshop 3's capacity is used by both products, giving the constraint 3x_{1}+2x_{2}≤18. Because production cannot be negative, the decision variables are non-negative: x_{1}≥0, x_{2}≥0. The linear programming model is summarized as follows.

Because this linear programming model has only two variables, x_{1} and x_{2}, a graphical method can be used to solve it. This method requires drawing a figure with axes x_{1} and x_{2}. Each constraint cuts off part of the plane, and the intersection of these regions forms the feasible region. We seek the point of the feasible region that maximizes the objective function z. To draw the feasible region, we add the constraints one by one as follows.

The figure below shows the geometric representation of the model of this problem, with the difference that constraints 2 and 3 have been replaced by the single nonlinear constraint 9x_{1}^{2}+5x_{2}^{2}≤216. From the figure, it is clear that the optimal solution is (x_{1},x_{2})=(2,6).
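This optimum can be checked numerically. The profit coefficients are not shown in the extracted text; the grid search below assumes the classical objective Z = 3x_{1} + 5x_{2} for this example, which is consistent with the reported optimum (2,6), over the region 9x_{1}^{2}+5x_{2}^{2}≤216, x_{1}≤4, x_{1},x_{2}≥0.

```python
# Brute-force search on a 0.01 grid (objective Z = 3*x1 + 5*x2 is ASSUMED).
best = (float("-inf"), 0.0, 0.0)
for i in range(401):            # x1 from 0 to 4
    x1 = i / 100
    for j in range(658):        # x2 from 0 to 6.57, the extent of the ellipse
        x2 = j / 100
        if 9 * x1 ** 2 + 5 * x2 ** 2 <= 216:   # nonlinear constraint
            z = 3 * x1 + 5 * x2
            if z > best[0]:
                best = (z, x1, x2)
z, x1, x2 = best
```

The maximum Z = 36 is attained at (x_{1},x_{2}) = (2,6), on the boundary of the ellipse but not at a corner of the region.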

Moreover, the solution still lies on the boundary of the feasible region, but it is no longer a corner-point feasible solution. If the objective function were changed (for example, to Z=3x_{1}+x_{2}), the optimal solution could again be at a corner point. However, the fact that the optimal solution need not lie at a corner means that a very important result used in linear programming, namely that **it suffices to search only the corner-point solutions, no longer applies to nonlinear programming.**

Now suppose that the linear constraints of the example remain in the problem, but the objective function becomes nonlinear. For example, if Z=126x_{1}-9x_{1}^{2}+182x_{2}-13x_{2}^{2}, then the graphical representation of the problem is as shown below, and its optimal solution is x_{1}=8/3≈2.667 and x_{2}=5. This solution also lies on the boundary of the feasible region.

Another complication that appears in nonlinear programming is that a relative (local) maximum is **not necessarily** an absolute (global) maximum. For example, consider the one-variable function plotted in the figure below.

This function has three relative maxima on the interval 0≤x≤5, namely x=0, x=2, and x=4, while only one of them, x=4, is the absolute maximum. Similarly, the function has three relative minima, x=1, x=3, and x=5, of which only x=5 is the absolute minimum.

Since nonlinear programming algorithms can generally only find relative maximum (or relative minimum) solutions, it is very important to know under what conditions a relative maximum is also an absolute maximum. Recall that for a one-variable unconstrained function f(x) (assuming the first and second derivatives exist), to guarantee that a relative maximum is also the absolute maximum it is sufficient that d^{2}f/dx^{2} ≤ 0 for all x.

A function whose curvature is always downward in this way is called a **concave function**. Likewise, if the sign ≤ is replaced with ≥, the resulting function, whose curvature is always upward, is called a **convex function**.

When a nonlinear programming problem is unconstrained, the **concavity** of the objective function guarantees that any relative maximum is an **absolute maximum**. Similarly, the **convexity** of the objective function guarantees that any relative minimum is an **absolute minimum**.

If the problem has constraints, then to guarantee that a relative optimum is absolute, a further condition must hold: the feasible region must be **a convex set**.

Note: Consider the set S={x|g(x)≤b}. If the function g(x) is a convex function, then the set S is a **convex set**.

**Types of non-linear programming problems**

Nonlinear programming problems appear in many forms. Unlike the simplex method in linear programming, there is no specific algorithm that can solve all types of nonlinear programming problems. However, various algorithms have been developed to solve many of these problems. In the next section, the most important types of nonlinear programming problems are introduced, including unconstrained optimization, constrained optimization, convex programming, and non-convex programming, and solution methods are provided for each one.

**Unconstrained optimization**

Unconstrained optimization problems, as their name implies, have no constraints, so the goal is to maximize f(x) over all x=(x_{1},x_{2},…,x_{n}). The condition for a specific solution x=x* to be a maximum of f(x) (provided f is concave) is that the following relationship holds at x=x*: ∂f/∂x_{j} = 0 for j=1,2,…,n.

To find the vector x, it suffices to solve the system of n equations obtained by setting the n partial derivatives equal to zero. Unfortunately, for nonlinear f(x) these equations are often nonlinear themselves, and the resulting system may be impossible to solve analytically. The following sections describe how to find solutions to such systems numerically. These procedures also play an important role in solving many constrained problems, because many algorithms for nonlinear constrained problems are designed to solve an unconstrained optimization subproblem at each iteration.

If a variable x_{j} has the non-negativity restriction x_{j}≥0, the optimality condition is modified as follows: ∂f/∂x_{j} = 0 at x=x* if x_{j}*>0, and ∂f/∂x_{j} ≤ 0 at x=x* if x_{j}*=0.

The graphical representation of the above relationship is given in the figure below.

The method of solving this type of optimization is presented in two parts: one-variable and multi-variable optimization without constraints.

**Optimization of one-variable functions without constraints**

This optimization process generates a sequence of solutions leading to the optimal one: each iteration starts from the current solution and finds a better one through a systematic search. In the discussion that follows, if we seek a maximum the objective function is assumed to be concave, and if we seek a minimum it is assumed to be convex.

In general, the following search methods for optimizing one-variable unconstrained functions are covered here:

1- The midpoint rule

2- Newton's method

3- The modified Newton method

4- The three-point method

5- Dichotomous search

6- The Fibonacci method

7- The golden-section method

**1- The midpoint rule**

The one-dimensional search process is based on a simple idea: determine whether the derivative of the function at the current solution is positive or negative. A positive derivative at the current solution indicates that the optimal solution x* is greater than the current solution, so the current solution becomes a lower bound for the search. Likewise, if the derivative is negative, the current solution becomes an upper bound for x*.

To express this process briefly, we use the following definitions.

x′: current solution

x_{L}: lower bound for x*

x_{U}: upper bound for x*

ε: error tolerance for x*

There are various logical rules for choosing the next solution, but in the following simple method, called the midpoint rule, the midpoint of the two current bounds is chosen each time. A summary of this algorithm is given below.

**Summary of the midpoint rule**

**First step: **Determine the value of ε. Find a lower bound x_{L} and an upper bound x_{U} (two arbitrary values at which the derivative is positive and negative, respectively). The initial solution is x′ = (x_{L} + x_{U})/2.

**Iterative step:**

1- At the point x=x′, calculate the derivative df(x′)/dx.

2- If df(x′)/dx ≥ 0, set the value of the lower bound equal to x′, that is, x_{L} = x′.

3- If df(x′)/dx ≤ 0, set the value of the upper bound equal to x′, that is, x_{U} = x′.

4- The new solution is x′ = (x_{L} + x_{U})/2.

**Stopping step: **If x_{U} − x_{L} ≤ 2ε, the new solution x′ is within ε of x*; stop. Otherwise, return to the iterative step.

Example: Consider the following function.

Suppose we want to obtain the maximum value of the above function shown in the figure below.

The first and second derivatives of this function are:

Because the second derivative always has a negative value, f(x) is a concave function, and the midpoint rule can be used to find its maximum. Consider the following initial bounds.

We choose their midpoint, x′=1, as the initial solution, and take the error tolerance ε=0.01. Therefore, at the final iteration we must have x_{U} − x_{L} ≤ 0.02,

and we take the optimal solution to be the midpoint of this final interval. The sequence of solutions obtained by this algorithm is given in the following table.

According to the above table, the optimal solution is as follows.
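The midpoint rule amounts to bisection on the derivative. Since the example function is not reproduced in the extracted text, the sketch below assumes the concave function f(x) = 12x − 3x⁴ − 2x⁶ on the bounds [0, 2] with ε = 0.01, purely for illustration.

```python
def midpoint_rule(df, lo, hi, eps=0.01):
    # Bisection on the derivative of a concave function: move the bound
    # on the side of the maximum indicated by the sign of f'(x').
    while hi - lo > 2 * eps:
        x = (lo + hi) / 2
        if df(x) >= 0:
            lo = x          # maximum lies to the right of x'
        else:
            hi = x          # maximum lies to the left of x'
    return (lo + hi) / 2

# Assumed illustrative function: f(x) = 12x - 3x**4 - 2x**6 (concave for x >= 0).
df = lambda x: 12 - 12 * x ** 3 - 12 * x ** 5
x_star = midpoint_rule(df, 0.0, 2.0, eps=0.01)
```

For this assumed function the search closes in on the maximizer near x ≈ 0.84.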

**2- Newton’s method**

The basis of this method, also known as the Newton-Raphson method, is to approximate the function f(x) by a quadratic function (its second-order Taylor approximation) around the current point x_{j}:

f(x) ≈ f(x_{j}) + f′(x_{j})(x − x_{j}) + (1/2) f″(x_{j})(x − x_{j})^{2}

For the point x_{j+1} to be an optimal point of this approximation, we must have f′(x_{j}) + f″(x_{j})(x_{j+1} − x_{j}) = 0.

This means that x_{j+1} = x_{j} − f′(x_{j})/f″(x_{j}).

Example: Minimize the following objective function using Newton's method, starting from the point x_{1}=2.5.

The first and second derivatives of the above function are as follows.

Iteration 1:

Iteration 2:

Iteration 3:

Iteration 4:

The algorithm converges to the point x=2.
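The Newton update x_{j+1} = x_{j} − f′(x_{j})/f″(x_{j}) is easy to sketch. The example's objective function is not reproduced in the extracted text, so the code uses the hypothetical convex function f(x) = (x−2)⁴ + (x−2)², which is also minimized at x = 2, starting from x = 2.5.

```python
def newton(df, d2f, x, tol=1e-6, max_iter=50):
    # Iterate x <- x - f'(x)/f''(x) until the step is below tol.
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) <= tol:
            break
    return x

# Hypothetical convex function f(x) = (x - 2)**4 + (x - 2)**2 (minimum at x = 2).
df = lambda x: 4 * (x - 2) ** 3 + 2 * (x - 2)
d2f = lambda x: 12 * (x - 2) ** 2 + 2
x_star = newton(df, d2f, 2.5)
```

The iterates converge rapidly to the minimizer x = 2.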

Note: The stopping condition of this algorithm can be defined in either of the following two ways: |x_{j+1} − x_{j}| ≤ ε or |f′(x_{j+1})| ≤ ε,

where ε is an arbitrary number close to zero.

**3- Modified Newton method**

To increase the convergence speed of Newton's method, the following algorithm, based on it, has been designed.

**Step 1) **Compute f”(x_{j}) at the point x_{j} (assuming it exists and is available). If f”(x_{j})>0, go to step 2. Otherwise, go to step 4.

**Step 2) **Calculate the new point x_{j+1} by Newton’s method, that is, use the following equation.

where

If so, go to step 3. Otherwise, continue from the point x_{j+1}, and stop if the stopping criterion is met. A common stopping criterion is that the following two conditions hold simultaneously:

**Step 3) **Find the smallest integer i (i=1,2,…) such that the following two conditions hold simultaneously.

After determining i, get the value of x_{j+1} from the following equation.

**Step 4)**

a) If f'(x_{j})>0, set d=-1, and if f'(x_{j})<0, set d=+1. Then define:

b) Find the smallest integer i (i=0,1,2,…) such that:

c) Move to point x^{j+1}(i) and go to step 1.

Example: Solve the following problem using the modified Newton method.

Solution:

Consider the starting point equal to x^{1}=0.25.

Iteration 1:

Step 1: Because f”(x^{1}=0.25)=-27.75<0, we go to step 4.

Step 4: We have:

a) Therefore, we set d=+1 and then

b) We have for i=0

Therefore

According to the above calculations, i=0 does not satisfy the conditions. Therefore, we repeat the calculations for i=1.

The conditions are not satisfied for i=1 either. Therefore, we repeat the calculations for i=2.

c) Therefore, we move to the point x^{2}(i=2)=2.39 and return to step 1.

Iteration 2:

Step 1: Because

we go to step 2.

Step 2:

also:

The point x^{3}=2.08 is selected as the new search point, and we return to step 1. If this process is continued, it converges to the optimal point x*=2.

**4- The three-point method**

In this method, the one-variable function f(x) is evaluated at the quartile points Q_{1}, Q_{2}, and Q_{3} of the current interval. The quartile point with the best function value is selected, and the new interval for the next iteration consists of the quartile points on either side of it. In this way, the search interval tightens around the optimal point until its width falls within the predetermined convergence criterion ε. If the range of the variable x is [a,b], the quartile points are calculated as follows: Q_{1}=a+(b-a)/4, Q_{2}=(a+b)/2, Q_{3}=a+3(b-a)/4.

Example: Use the three-point method to obtain the optimal solution of the following problem:

The convergence criterion, ε, is considered equal to 0.7.

Iteration 1: We divide the domain [0,20] at the quartile points Q_{1}, Q_{2}, and Q_{3}, and calculate the value of f(x) at each of them, as follows.

It can be seen that f(Q_{2})=57 exceeds the other two values, so the new interval for the next iteration is [5,15], bounded by the quartile points on either side of Q_{2}.

Iteration 2: Dividing the new interval [5,15] at its quartile points, we have:

It can be seen that f(Q_{1})=61.6 exceeds the other two values, so the new interval for the next iteration is [5,10], bounded by the quartile points on either side of Q_{1}.

Iteration 3: Dividing the new interval [5,10] at its quartile points, we have:

It can be seen that f(Q_{2})=61.55 exceeds the other two values, so the new interval for the next iteration is [6.25,8.75], bounded by the quartile points on either side of Q_{2}.

Iteration 4: Dividing the new interval [6.25,8.75] at its quartile points, we have:

It can be seen that f(Q_{3})=61.60 exceeds the other two values, so the new interval for the next iteration is [7.5,8.75].

Iteration 5: Dividing the new interval [7.5,8.75] at its quartile points, we have:

It can be seen that f(Q_{1})=61.68 exceeds the other two values, so the new interval is [7.5,8.125]. Because the width of this interval (0.625) is within the convergence criterion ε=0.7, the algorithm stops, and the optimal value of x is reported as the average of the upper and lower limits of the last interval, x* ≈ 7.8125.
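The example problem is not reproduced in the extracted text, but the reported values (f(10)=57, optimum near 7.85) are consistent with maximizing f(x) = 15.7x − x² on [0, 20]; the sketch below assumes that function and reproduces the iterations above.

```python
def three_point(f, a, b, eps):
    # Quartile search for the maximum of a unimodal function on [a, b].
    while b - a > eps:
        q1, q2, q3 = a + (b - a) / 4, (a + b) / 2, a + 3 * (b - a) / 4
        vals = {q1: f(q1), q2: f(q2), q3: f(q3)}
        best = max(vals, key=vals.get)
        if best == q1:
            b = q2              # keep [a, Q2], the points around Q1
        elif best == q2:
            a, b = q1, q3       # keep [Q1, Q3], the points around Q2
        else:
            a = q2              # keep [Q2, b], the points around Q3
    return (a + b) / 2

f = lambda x: 15.7 * x - x ** 2     # assumed example function
x_star = three_point(f, 0.0, 20.0, eps=0.7)
```

The intervals generated are [5,15], [5,10], [6.25,8.75], [7.5,8.75], and [7.5,8.125], matching the iterations above, and x* = 7.8125 is reported.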

**5- Dichotomous search method**

In this method, the midpoint of the current interval [a,b] is computed, and two points k_{1} and k_{2} are placed symmetrically about it at a distance of 2ε from each other: k_{1}=(a+b)/2-ε and k_{2}=(a+b)/2+ε.

For a minimization problem, if f(k_{1})<f(k_{2}), the new interval for the next iteration is [a,k_{2}]; if f(k_{1})>f(k_{2}), it is [k_{1},b]; and if f(k_{1})=f(k_{2}), it is [k_{1},k_{2}]. (For a maximization problem, such as the example below, the first two cases are reversed.) To clarify the method, we give an example.

Example: Obtain the optimal solution of the following problem by Dichotomous search method.

Solution:

Iteration 1:

Because f(k_{1})>f(k_{2}), the new interval becomes [0,10.4].

Iteration 2:

Because f(k_{1})<f(k_{2}), the new interval becomes [4.8,10.4].

Iteration 3:

Because f(k_{1})<f(k_{2}), the new interval becomes [7.2,10.4].

Iteration 4:

Because f(k_{1})>f(k_{2}), the new interval becomes [7.2,9.2].

Iteration 5:

Since f(k_{1})>f(k_{2}), the new interval becomes [7.2,8.6]. Because 8.6-7.2=1.4, the process continues.

Iteration 6:

Since f(k_{1})>f(k_{2}), the new interval becomes [7.2,8.3]. Because 8.3-7.2=1.1 is less than the stopping criterion ε=1.5, the algorithm stops, and the optimal solution is reported as x*=7.85.
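Assuming the same reconstructed function f(x) = 15.7x − x² (a maximization) and placing k_{1} and k_{2} at distance 0.4 on either side of the midpoint, which is consistent with the intervals reported above, dichotomous search can be sketched as follows.

```python
def dichotomous(f, a, b, delta, eps):
    # Dichotomous search for the MAXIMUM of a unimodal function on [a, b].
    while b - a > eps:
        mid = (a + b) / 2
        k1, k2 = mid - delta, mid + delta
        if f(k1) > f(k2):
            b = k2              # maximum lies in [a, k2]
        elif f(k1) < f(k2):
            a = k1              # maximum lies in [k1, b]
        else:
            a, b = k1, k2       # maximum lies between the two points
    return (a + b) / 2

f = lambda x: 15.7 * x - x ** 2     # assumed example function
x_star = dichotomous(f, 0.0, 20.0, delta=0.4, eps=1.5)
```

The intervals generated, [0,10.4], [4.8,10.4], [7.2,10.4], [7.2,9.2], [7.2,8.6], match the iterations above; with the stopping width 1.5 applied strictly, the sketch stops at [7.2,8.6] and reports x* = 7.9, close to the true maximizer 7.85.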

**6- Fibonacci method**

This algorithm is used to optimize a unimodal function on a finite interval such as [a,b]. It uses the Fibonacci sequence, defined by F_{0}=F_{1}=1 and F_{k}=F_{k-1}+F_{k-2}, that is: 1, 1, 2, 3, 5, 8, 13, 21, 34, …

The steps of the Fibonacci algorithm are as follows:

Step 1: Choose a criterion ε=I_{n}>0 for the allowed distance from the optimum. The starting interval length is I_{1}=b-a.

Step 2: Find the smallest index k that satisfies the relation F_{k}.ε>(b-a), and calculate the value of ε' as follows.

Meanwhile, k specifies the number of iterations.

Step 3: Calculate I_{2} using the relation F_{k-1}.ε’=I_{2} and calculate the first two points to evaluate f(x).

Step 4: The new interval is updated as follows.

Step 5: Determine the new point for evaluating the f(x) function proportionally from the remaining distance and compare the function value at this point with the function value of the optimal point in the previous iteration. As a result, the new interval will be calculated sequentially.

Step 6: Calculate the new distances using F_{k-j+1}.ε’=I_{j} where j=4,5,…,n-1 and specify the new point for evaluation proportionally in this interval and repeat step 5 and go to step 7.

Step 7: Stop if the last interval is within the tolerance ε; the optimal point lies in this interval. Otherwise, go to step 6.
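The steps above can be sketched in code. This is a minimal illustration, not the course's own implementation: it assumes a unimodal function on [a,b] and takes the number of interval reductions n directly, rather than deriving it from ε as in step 2; the test function is a placeholder.

```python
def fibonacci_search(f, a, b, n):
    """Minimize a unimodal f on [a, b] using n-term Fibonacci interval reduction."""
    F = [1, 1]                              # Fibonacci numbers F[0], F[1], ...
    for _ in range(2, n + 1):
        F.append(F[-1] + F[-2])
    # First two interior test points, placed by Fibonacci ratios
    x1 = a + F[n - 2] / F[n] * (b - a)
    x2 = a + F[n - 1] / F[n] * (b - a)
    f1, f2 = f(x1), f(x2)
    for k in range(1, n - 1):
        if f1 > f2:                         # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + F[n - k - 1] / F[n - k] * (b - a)
            f2 = f(x2)
        else:                               # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + F[n - k - 2] / F[n - k] * (b - a)
            f1 = f(x1)
    return (a + b) / 2                      # midpoint of the final interval
```

As in the worked example, the reported optimum is the midpoint of the final interval.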

Example: Solve the following problem using the Fibonacci method.

Solution:

The search interval in this problem is I_{1}=b-a=20-0=20, and the smallest index k satisfying F_{k}·ε>(b-a) is k=7, so in this case we have:

Iteration 1: For I_{2} we have

The first two points to evaluate the function f(x) are:

Iteration 2: For I_{3} we have:

The next point for evaluation is calculated as follows.

Iteration 3: For I_{4} we have:

Iteration 4: For I_{5} we have:

Iteration 5: For I_{6} we have:

Iteration 6: For I_{7} we have:

The interval between 7.61 and 8.56 has length 0.95. Because this is less than the convergence tolerance, the optimal point lies in this interval and is reported as the average of the upper and lower limits.

**7- Golden section method**

Unlike the Fibonacci method, this method reduces the search interval at a constant rate from one iteration to the next. If the reduction ratio r is to remain constant, we have:

Because the reduction ratio must be non-negative, r=0.618, which specifies the constant factor by which the interval shrinks in the golden section method.
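The constant reduction can be sketched as follows; this is a minimal illustration with a placeholder objective (any unimodal function works), not the course's own implementation. The interval shrinks by the factor r=0.618 at every iteration until its length falls below the tolerance.

```python
def golden_section(f, a, b, eps):
    """Minimize a unimodal f on [a, b]; the interval shrinks by r each iteration."""
    r = 0.618                               # constant reduction ratio: r*r = 1 - r
    x1, x2 = b - r * (b - a), a + r * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > eps:
        if f1 > f2:                         # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = f(x2)
        else:                               # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = f(x1)
    return (a + b) / 2                      # report the midpoint, as in the examples
```

One of the two test points is reused at each iteration, so only one new function evaluation is needed per interval reduction.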

Example: Solve the following problem by the golden section method.

Therefore,

Iteration 1:

Because f(x^{2})>f(x^{1}), the new interval becomes [0,12.36].

Iteration 2:

Since f(x^{2})>f(x^{3}), the new interval is [4.72,12.36].

Iteration 3:

Since f(x^{2})>f(x^{4}), the new interval is [4.72,9.44].

Iteration 4:

Since f(x^{2})>f(x^{5}), the new interval is [6.52,9.44].

Iteration 5:

Since f(x^{2})>f(x^{6}), the new interval is [6.52,8.32].

Iteration 6:

Since f(x^{2})>f(x^{7}), the new interval is [7.21,8.32].

Iteration 7:

Since f(x^{2})<f(x^{8}), the new interval is [7.64,8.32].

Because the length of the interval between the upper and lower limits is less than the tolerance, we stop and report the optimal solution as below.

**Optimization of multi-variable functions without constraints**

Now consider maximizing a concave function f(x) of several variables x=(x_{1},x_{2},…,x_{n}) with no restrictions on the values of the variables. Setting the partial derivatives of this function equal to zero yields a system of equations whose solution is a necessary and sufficient condition for optimality. If this system cannot be solved analytically, a numerical search procedure must be used. The question that arises is: how can the one-variable search method of the previous section be extended to this problem?

In the previous section, we moved from the current trial solution to the next by increasing or decreasing the value of x; the direction of movement was determined by the sign of the derivative, and the ultimate goal was to reach a point where the derivative is zero. In the multivariable problem, the movement of the point is not limited to two directions: it can move in any direction, each component of which corresponds to the partial derivative with respect to the relevant variable. The goal is to reach a solution where all partial derivatives are equal to zero. To choose the direction of movement, we use the concept of the gradient.

Since the function f(x) is assumed to be differentiable, every point x’ has a gradient vector, denoted ∇f(x’). Specifically, at the point x=x’ the gradient is the vector whose elements are the partial derivatives of the function at this point, so that for x=x’

∇f(x’)=(∂f/∂x_{1}, ∂f/∂x_{2}, …, ∂f/∂x_{n})

The importance of the gradient is that the maximum rate of change of the function at the point x’ is obtained by moving in the direction of the gradient. To express this concept geometrically, take the direction of the gradient as the direction of the line that connects the origin of the coordinates to the point

and the value of

is calculated for x_{j}=x_{j}‘.

Since the goal of the problem is to find a feasible solution that maximizes the objective function, it seems that by moving in the direction of the gradient as far as possible, we reach the destination sooner. To guarantee that a maximization search finds the global optimal solution, it is necessary to first prove that the objective function is concave; if we seek to minimize the objective function, we must show that it is convex. To show that the function f(x) is concave, we proceed as follows.

1- Hessian matrix: First, the Hessian matrix is calculated as follows. It is a square matrix whose number of rows (and columns) equals the number of variables of the function f(x).

2- Negative definite or negative semi-definite: If x^{T}Hx<0 for all x≠0, the matrix H is negative definite; if x^{T}Hx≤0 for all x, it is negative semi-definite.

Note: To prove that the function f(x) is convex, it is enough to show that the matrix H is positive semi-definite or positive definite.
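For two variables, the definiteness test reduces to checking signs of the leading principal minors (Sylvester's criterion): H is negative definite when h_{11}<0 and det(H)>0, and positive definite when h_{11}>0 and det(H)>0. A minimal sketch with a hypothetical Hessian, chosen only to give a concave example:

```python
def classify_2x2(H):
    """Sylvester check for a symmetric 2x2 matrix H = [[h11, h12], [h12, h22]]."""
    (h11, h12), (_, h22) = H
    det = h11 * h22 - h12 * h12
    if h11 < 0 and det > 0:
        return "negative definite"      # the function is strictly concave
    if h11 > 0 and det > 0:
        return "positive definite"      # the function is strictly convex
    return "indefinite or semi-definite"

# Hypothetical Hessian of f(x) = -x1^2 + 2*x1*x2 - 2*x2^2 (a concave quadratic)
print(classify_2x2([[-2, 2], [2, -4]]))   # negative definite
```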

**1- The cyclic coordinate method**

To minimize a multivariable function starting from the point x^{1}, a suitable direction d must first be determined; the minimum of the function along this direction is then found using a one-variable optimization method, as follows.

The range for λ can be defined as follows.

The coordinate axes specify the direction (d_{j}) for minimization, that is:

For example, moving in the direction d_{j}=e_{j} means that only the component x_{j} changes while the other components are kept constant. In this case, each minimization is performed along only one of the components. The algorithm of this method is as follows:

Step 1: Choose a starting point x^{0} for movement and the termination criterion ε>0, if we have:

We stop.

Step 2: Starting from j=1, perform the k-th minimization as follows.

This minimization is done using one of the single-variable techniques without the derivative (for example, the three-point technique).

a) If λ̄_{j} is the optimal result of the above problem, then we have

b) If j<n (n is equal to the number of variables in the n-variable problem) continue, otherwise (i.e. j=n) go to step 3.

Step 3: If we have,

We stop, and the point x^{k+1} represents a minimum of the problem. Otherwise, go back to step 2 and continue with the substitution x^{0}←x^{k+1}.
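The three steps can be sketched as follows. This is a minimal illustration: the line minimization uses a golden-section helper rather than the three-point technique used in the worked examples, the λ search range is bounded arbitrarily, and the two-variable objective in the test is hypothetical.

```python
def line_min(phi, lo=-10.0, hi=10.0, eps=1e-6):
    """Golden-section minimization of a unimodal one-variable function on [lo, hi]."""
    r = 0.6180339887
    x1, x2 = hi - r * (hi - lo), lo + r * (hi - lo)
    while hi - lo > eps:
        if phi(x1) > phi(x2):
            lo, x1 = x1, x2
            x2 = lo + r * (hi - lo)
        else:
            hi, x2 = x2, x1
            x1 = hi - r * (hi - lo)
    return (lo + hi) / 2

def cyclic_coordinate(f, x0, eps=1e-5, max_cycles=100):
    x = list(x0)
    for _ in range(max_cycles):
        x_old = list(x)
        for j in range(len(x)):
            # Step 2: minimize f along the j-th coordinate direction e_j
            lam = line_min(lambda t: f(x[:j] + [x[j] + t] + x[j + 1:]))
            x[j] += lam
        # Step 3: stop when a full cycle barely moves the point
        if sum((a - b) ** 2 for a, b in zip(x, x_old)) ** 0.5 < eps:
            break
    return x
```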

Example: Minimize the following problem using the cyclic coordinate method.

Solution:

Iteration 1:

Step 1: Stop criterion and starting point are given above.

Step 2: We start the optimization in x_{1} axis (first component) for j=1. It means:

So

In order to minimize the above one-variable problem, we use the three-point method below for the ε=0.5 criterion.

Therefore, the new range for λ includes [0,2] and we have:

Therefore, the new range for λ includes [0.5, 1.5] and for that we have:

The new range contains [0.75,1.25] for which the stopping criterion is met:

a) So for the new point we have:

b) Since j=1<n=2, we continue to move along the vertical axis, that is, the movement will be along d_{2}=(0,1). So:

It means:

In order to minimize the above one-variable problem, we use the three-point method below for the ε=0.5 criterion.

Therefore, the new range for λ includes [0,2] and for it we have:

Therefore, the new range for λ includes [0.5, 1.5] and for that we have:

The new range contains [0.75,1.25]. The optimality condition holds and we have:

Since j=2, we go to the third step to test the optimality.

Iteration 2:

Step 1: Stopping criterion and starting point are given.

Step 2: We start the optimization along the x_{1} axis (first component) for j=1. It means:

So

In order to minimize the above one-variable problem, we use the three-point method below for the criterion ε=0.5.

Therefore, the new range for λ includes [-1,1] and for it we have:

Therefore, the new range for λ includes [-0.5, 0.5] and for that we have:

The new range contains [-0.25,0.25] for which the stopping criterion is met:

a) So for the new point we have:

b) Since j=1<n=2, we continue to move along the vertical axis, that is, the movement will be along d_{2}=(0,1). So:

So

In order to minimize the above one-variable problem, we use the three-point method below for the ε=0.5 criterion.

Therefore, the new range for λ includes [-1,1] and for it we have:

Therefore, the new range for λ includes [-0.5, 0.5] and for that we have:

The new range contains [-0.25,0.25]. The optimality condition holds and therefore we have:

Since j=2, we go to the step 3 to test the optimality.

Therefore, the optimality condition is satisfied and therefore the point (0,0) is a minimum point for the above problem.

**2- Hooke and Jeeves Method**

Cyclic movement along the components may stall at a non-optimal point where moving along any single component no longer improves the objective function; non-differentiability of the function can be the cause. To overcome this, the movement in iteration k can be made in the direction x^{k+1}-x^{k} instead of along the components.

The steps of this method are as follows:

Step 1: Choose an optimality criterion (ε>0) and a starting point (x^{1}).

Step 2: In order to move cyclically, the following one-variable problem is solved using one of the one-variable techniques and the optimal value is obtained:

a) Put the optimal value in the following expression.

b) If j<n, continue the process and return to step 2.

c) Otherwise, i.e. if j=n, put x^{k+1}=t^{n+1} and we stop if the following condition is met.

d) Otherwise, go to the third step.

Step 3: To increase the speed of the algorithm, set d̄=x^{k+1}-x^{k} and find the optimal λ from the following problem by a one-variable method:

After solving the above model, do the following.
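A compact sketch of the method, under the same caveats as before (a golden-section line search stands in for the three-point method, and the test objective is hypothetical). Note how the acceleration step along d=x^{k+1}-x^{k} can reach the minimum in a single move:

```python
def line_min(phi, lo=-10.0, hi=10.0, eps=1e-6):
    """Golden-section minimization of a unimodal one-variable function on [lo, hi]."""
    r = 0.6180339887
    x1, x2 = hi - r * (hi - lo), lo + r * (hi - lo)
    while hi - lo > eps:
        if phi(x1) > phi(x2):
            lo, x1 = x1, x2
            x2 = lo + r * (hi - lo)
        else:
            hi, x2 = x2, x1
            x1 = hi - r * (hi - lo)
    return (lo + hi) / 2

def hooke_jeeves(f, x0, eps=1e-5, max_iter=100):
    x = list(x0)
    for _ in range(max_iter):
        x_prev = list(x)
        # Step 2: exploratory (cyclic) moves along each coordinate axis
        for j in range(len(x)):
            lam = line_min(lambda t: f(x[:j] + [x[j] + t] + x[j + 1:]))
            x[j] += lam
        # Step 3: acceleration (pattern) move along d = x_new - x_prev
        d = [a - b for a, b in zip(x, x_prev)]
        if sum(di * di for di in d) ** 0.5 < eps:
            break
        lam = line_min(lambda t: f([a + t * di for a, di in zip(x, d)]))
        x = [a + lam * di for a, di in zip(x, d)]
    return x
```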

Example: Solve the minimization model using Hooke and Jeeves Method. Use the three-point method to solve the optimization model with one variable.

Solution:

Iteration 1:

Step 1: Do the following.

Step 2: Cyclic movement is performed along the components.

To find the optimal value of the above model, we use the three-point method and ε=0.5.

The new search interval is [2,4], so:

The new interval contains [2.5,3.5], so:

Therefore, the final range includes [2.75, 3.25], which satisfies the stop condition, and therefore the optimal value is obtained as follows:

a)

b) Because j=1<n=2, hence we continue on x_{2} axis with d_{2}^{t}=(0,1). Therefore, we have:

Using the three-point method and stopping criteria, we will have:

The new interval is [-2,0], so:

The new interval contains [-2,-1], so:

The new interval contains [-1.75,-1], so:

Therefore, the final interval includes [-1.75,-1.37], which meets the stop criteria (ε=0.5) and therefore the optimal value is obtained as follows:

c) j=n=2 and we put:

For this new point (t^{3}=x^{2}), the optimality criterion is not satisfied, based on the following calculations.

Step 3: In order to determine the direction of acceleration, we will have:

For the interval of the decision variable in the above model, we have:

Therefore, the model is as follows:

Using the three-point method, we will have:

The new search interval is [-0.33,0.33], so:

The new search interval is [-0.33,0]. Because its length is less than 0.5, we have reached the optimal solution and therefore we have

Thus

We put j=1 and x^{k}=x^{2} and go to iteration 2 and step 2 to repeat the cyclic movement.

We perform iteration 2 similarly to iteration 1 (two cyclic movements along the components and one movement in the acceleration direction) and finally reach the solution x^{k}=(2.04,1.02), which satisfies the stopping condition.

**3- The steepest descent algorithm**

In this method, the direction of movement d is selected with the following assumption.

With this assumption, the algorithm is expressed as follows.

Step 1: Select point x^{1} to start moving. Set t=1 and go to the step 2.

Step 2: If

Stop. Otherwise put:

and obtain the optimal λ from solving the following problem.

put

and replace t with t+1 and repeat the step 2.
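The steps above can be sketched as follows. The central-difference gradient estimate, the bounded λ range, and the golden-section line search are implementation choices of this sketch (not prescribed by the text), and the test objective is hypothetical.

```python
def steepest_descent(f, x0, eps=1e-4, max_iter=500):
    """Minimize f by moving along d = -grad f(x) with an exact-ish line search."""
    def grad(x, h=1e-6):
        g = []
        for j in range(len(x)):
            xp, xm = list(x), list(x)
            xp[j] += h
            xm[j] -= h
            g.append((f(xp) - f(xm)) / (2 * h))   # central difference
        return g
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if sum(gj * gj for gj in g) ** 0.5 < eps:  # stop when ||grad f|| < eps
            break
        phi = lambda lam: f([xj - lam * gj for xj, gj in zip(x, g)])
        lo, hi, r = 0.0, 10.0, 0.6180339887        # search lambda in [0, 10]
        a, b = hi - r * (hi - lo), lo + r * (hi - lo)
        while hi - lo > 1e-7:                      # golden section for min of phi
            if phi(a) > phi(b):
                lo, a = a, b
                b = lo + r * (hi - lo)
            else:
                hi, b = b, a
                a = hi - r * (hi - lo)
        lam = (lo + hi) / 2
        x = [xj - lam * gj for xj, gj in zip(x, g)]
    return x
```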

Example: Obtain the optimal solution of the following model using the steepest descent method.

Solution:

Iteration 1:

Step 1: We select the point x^{1}=(1,1) to start the movement and ε=0.5.

Step 2: For point x^{1} we have:

In order to determine the optimal amount of movement from point x^{1}, we will have:

To obtain the optimal value, we differentiate the function f(x).

Therefore, the obtained point is minimum.

Iteration 2:

Step 2:

The optimal value of the above model is as follows.

Iteration 3:

Step 2:

For x^{3} we have:

The optimal value of the above model is as follows.

Iteration 4:

Step 2:

For x^{4} we have:

The optimal value of the above model is as follows.

Iteration 5:

Step 2:

For x^{5} we have:

The optimal value of the above model is as follows.

The optimal solution is obtained in this iteration and equals (-0.04, 4.91).

**4- Newton’s method**

If we approximate the function f(x) by its second-order Taylor expansion at the point x^{t}, we have

The necessary condition for the function f to be minimum at the point x^{t+1} is:

Hence we have:

Therefore, for Newton’s method we will have:

The direction of movement in Newton’s method is adjusted by the factor H(x)^{-1}; this reduces zigzagging in the movement, since the step is based on a quadratic rather than a linear approximation.
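The Newton update x^{t+1}=x^{t}-H(x^{t})^{-1}∇f(x^{t}) can be sketched for two variables by solving the 2×2 system with Cramer's rule. The gradient, Hessian, and test function below are hypothetical, chosen only so that the minimum is at (6,2):

```python
def newton_2d(grad, hess, x0, eps=1e-10, max_iter=50):
    """Newton's method for two variables: x <- x - H(x)^(-1) * grad f(x)."""
    x1, x2 = x0
    for _ in range(max_iter):
        g1, g2 = grad(x1, x2)
        if (g1 * g1 + g2 * g2) ** 0.5 < eps:       # stop when the gradient vanishes
            break
        (h11, h12), (h21, h22) = hess(x1, x2)
        det = h11 * h22 - h12 * h21
        # Solve H d = g by Cramer's rule, then take the step x <- x - d
        d1 = (g1 * h22 - h12 * g2) / det
        d2 = (h11 * g2 - h21 * g1) / det
        x1, x2 = x1 - d1, x2 - d2
    return x1, x2

# Hypothetical test function: f(x) = (x1-6)^2 + 0.1*(x1-6)^4 + (x2-2)^2,
# whose minimum is at (6, 2).
g = lambda x1, x2: (2 * (x1 - 6) + 0.4 * (x1 - 6) ** 3, 2 * (x2 - 2))
H = lambda x1, x2: ((2 + 1.2 * (x1 - 6) ** 2, 0.0), (0.0, 2.0))
```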

Example: Solve the following problem using Newton’s method.

Solution:

Iteration 1: To obtain x^{2}, we do the following.

Iteration 2:

For x^{3} we have:

Iteration 3:

Iteration 4:

Iteration 5:

Iteration 6:

For x^{7} we have:

Therefore, x^{7} has the optimality criterion and is the minimum solution for the problem. Of course, by continuing the iterations, the algorithm will converge to the minimum x*=(6,2).

**5- Gradient search method**

The gradient search method is one of the solution methods for unconstrained optimization with more than one variable. Since the problem has no constraints, moving in the direction of the gradient is an efficient strategy. This movement continues until the optimal solution is reached, at which point the gradient equals zero. In the gradient method, we start from the current solution, move in the direction given by the gradient at that solution, and do not stop until the value of f(x) stops increasing. At that point, the gradient is recalculated to determine the next direction of movement. With this approach, each iteration changes the current solution x’ as follows.

where the value of t* is the positive value of t that maximizes the following equation:

Note: the expression f(x’+t∇ f(x’)) is the same as the function f(x) for the following values:

The above expression contains only the constants x’_{j} and the variable t; as a result, f(x) becomes a function of the single variable t. The iterations of this method continue until the gradient equals zero within an acceptable tolerance. If this tolerance is ε, we stop when:

In the gradient search method, the most difficult part of each iteration is determining t*, that is, the value that maximizes the function f in the direction of the gradient vector. Since the values of x’ and ∇f(x’) are known and the function is also concave (the second derivative is negative), solving the problem becomes the maximization of a one-variable concave function. Hence, to solve it, the one variable search method presented in the previous section is used.

**Summary of the gradient method**

**Initial step: **

Determine the value of ε and an initial solution x’. Go to stopping step.

**Iterative step:**

1- Define f(x’+t∇ f(x’)) as a function of t. This is done by substituting x’ as follows.

This expression is then inserted into f(x).

2- Determine the value of t* that maximizes the function f(x’+t∇ f(x’)) for all non-negative values of t using the one-variable search method or derivation.

3- Set x’ equal to x’+t*∇ f(x’). Go to stopping step.

**Stopping step:**

At the point x=x’, calculate the value of the gradient. Stop if the following relationship exists.

The current solution of x’ is the optimal solution with acceptable distance ε. Otherwise, return to the iterative step.
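The summary above can be sketched as follows (maximization). The central-difference gradient estimate, the bound on t, and the golden-section inner search are choices of this sketch rather than of the text, and the concave test objective is hypothetical.

```python
def gradient_search(f, x0, eps=1e-4, max_iter=200):
    """Maximize f by repeatedly moving along the gradient direction."""
    def grad(x, h=1e-6):
        g = []
        for j in range(len(x)):
            xp, xm = list(x), list(x)
            xp[j] += h
            xm[j] -= h
            g.append((f(xp) - f(xm)) / (2 * h))   # central difference
        return g
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if max(abs(gj) for gj in g) <= eps:        # stopping rule: |df/dxj| <= eps
            break
        phi = lambda t: f([xj + t * gj for xj, gj in zip(x, g)])
        lo, hi, r = 0.0, 5.0, 0.6180339887         # search t* in [0, 5]
        a, b = hi - r * (hi - lo), lo + r * (hi - lo)
        while hi - lo > 1e-7:                      # golden section for max of phi
            if phi(a) < phi(b):
                lo, a = a, b
                b = lo + r * (hi - lo)
            else:
                hi, b = b, a
                a = hi - r * (hi - lo)
        t_star = (lo + hi) / 2
        x = [xj + t_star * gj for xj, gj in zip(x, g)]
    return x
```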

Example: Consider the following two-variable problem

Solution:

The partial derivatives with respect to x_{1} and x_{2} are as follows.

It is easy to show that the function f(x) is a concave function. The Hessian matrix of the function f(x) becomes as follows.

To show that the function f(x) is concave, it is sufficient to show that the matrix H is negative semi-definite or negative definite. For this purpose, it should be shown that x^{T}Hx≤0. Therefore, we have:

Considering that the matrix H has satisfied the negative semi-definite condition, therefore the function f(x) is concave.

We start the gradient search method from x=(0,0). Because at this point the partial derivatives are equal to 0 and 2 respectively, so the gradient is equal to

Now, we start the first iteration as follows.

Then we substitute the above values into f(x).

Given that

and also

Therefore, we get the new solution.

In this new solution, the gradient for this solution is as follows.

So in the second iteration:

So we have:

Given that:

Therefore, we have the new solution as follows.

If we continue this process, the next solutions will be (0.5, 0.75), (0.75, 0.75), (0.75, 0.875), and (0.875, 0.875). These solutions approach (1,1); the figure below shows this process.

Note: If f(x) is not concave, using this method leads to a local maximum, which is not necessarily a global maximum.

Note: If the goal of the problem is to minimize the function f(x), then in each iteration you must move in the direction of the negative gradient. In other words, the next solution is obtained as follows.

**Constrained optimization**

In this section, we discuss how to determine the optimal solution of a non-linear programming problem (with differentiable functions) with constraints.

**Kuhn-Tucker optimality conditions**

The general optimality conditions for constrained optimization problems are expressed as the Kuhn-Tucker conditions, as follows.

Theorem: Assume that the functions f(x),g_{1}(x),g_{2}(x),…,g_{m}(x) are differentiable. In this case, the following solution can be optimal:

Provided that it is possible to find u_{1},u_{2},…,u_{m} that satisfy the following conditions.

Conditions 3 and 5 only examine the feasibility of the solution.

Note: The above conditions are a *necessary condition* for optimality. They are also *sufficient* if the feasible region is a convex set and the objective function is concave (for a maximization problem).

Remark: The feasible region S is defined as S={x|g_{i}(x)≤b_{i},∀i}. The set S is convex if the functions g_{i}(x) are convex.

Example: Consider the following two-variable nonlinear programming problem.

In this problem, m=1 and g_{1}(x)=2x_{1}+x_{2}; consequently g_{1}(x) is convex (a linear function is both convex and concave). To show that f(x) is concave, we do the following.

Therefore, f(x) is concave. According to the Kuhn-Tucker conditions we have:

To solve these conditions, we proceed as follows.

It follows from condition 1(b) that u_{1}≥1. From x_{1}≥0 it follows that 1/(x_{1}+1)≤1. Since u_{1}≥1, we have 1/(x_{1}+1)-2u_{1}<0, and condition 2(a) then gives x_{1}=0. In the same way, from condition 4 and u_{1}≥1 it follows that 2x_{1}+x_{2}-3=0 and therefore x_{2}=3.
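The conditions can be verified numerically at the candidate x*=(0,3) with u_{1}=1. The derivatives 1/(x_{1}+1) and 1 below are the ones used in the argument above; the grouping of the conditions follows the numbering in the text and should be read as an illustration, not a general-purpose checker.

```python
# Candidate from the text: x* = (0, 3) with multiplier u1 = 1, for a problem
# with constraint 2*x1 + x2 <= 3 and x1, x2 >= 0, where (per the derivatives
# used above) df/dx1 = 1/(x1+1) and df/dx2 = 1.
x1, x2, u1 = 0.0, 3.0, 1.0

df_dx1 = 1.0 / (x1 + 1.0)
df_dx2 = 1.0

cond1a = df_dx1 - 2 * u1 <= 0            # condition 1 for x1
cond1b = df_dx2 - u1 <= 0                # condition 1 for x2
cond2 = x1 * (df_dx1 - 2 * u1) == 0 and x2 * (df_dx2 - u1) == 0  # condition 2
cond3 = 2 * x1 + x2 <= 3                 # condition 3: feasibility
cond4 = u1 * (2 * x1 + x2 - 3) == 0      # condition 4: complementary slackness
cond5 = x1 >= 0 and x2 >= 0 and u1 >= 0  # condition 5: non-negativity

print(all([cond1a, cond1b, cond2, cond3, cond4, cond5]))  # True
```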

Note: For more complex problems, it is very difficult to obtain an optimal solution directly from the Kuhn-Tucker conditions.

Note: These conditions are a guide to determine the optimality of any proposed solution.

**Optimization with linear constraints**

The characteristic of optimization problems with linear constraints is that the constraints are linear, but the objective function is nonlinear. In order to solve these models, special algorithms based on the generalization of linear programming have been developed.

The quadratic programming discussed below is an important special case of this type.

**Quadratic programming**

In quadratic programming all constraints are linear, but the objective function f(x) is quadratic. Therefore, the only difference from a linear programming problem is that some terms of the objective function are either the square of one variable or the product of two variables.

Quadratic programming is very important because it appears in the formulation of many scientific problems. For example, the investment problem with a certain return is included in the framework of this model.

In the quadratic programming problem, the goal is to find the value of the vector x in such a way that:

where c is a row vector, x and b are column vectors, Q and A are matrices, and the superscript T denotes the transpose. The coefficients q_{ij} (the elements of the matrix Q) are constants with q_{ij}=q_{ji}.

Several algorithms have been developed for the special case where the objective function is concave; here we describe one of them. Because this algorithm uses the simplex method, it is widely used.

In the **modified simplex method**, the first step is to formulate the Kuhn-Tucker conditions. A convenient way to express these conditions is:

Except for the last constraint, the above constraints are the same as in linear programming, and the problem reduces to finding a basic feasible solution of a linear program in which the last constraint must somehow be enforced. The variables x and y, and the variables u and v, are called complementary variables.

If c^{T}≤0 and b≥0, the basic variables are y and v, which is as follows:

If the above condition is not met, the problem must be modified by adding an artificial variable to every equation with c_{j}>0 or b_{i}<0. In the case c_{j}>0, the artificial variable z_{1} is added to the left side; in the case b_{i}<0, the artificial variable z_{2} is subtracted from the left side, which is then multiplied by -1. The artificial variables z_{1} and z_{2} can then be introduced as basic variables. With these artificial variables, the u and x variables are non-basic and hence zero, so the complementary constraint is satisfied. To drive out the artificial variables, phase one of the **two-phase simplex method** can be used to find a basic feasible solution of the problem. The objective function of phase one is as follows:

In relation to linear programming constraints that also include artificial variables, the only change that should be considered is how to choose the basic input variable, which is done as follows.

**Restricted Entry Rule:** Do not select as the entering basic variable any non-basic variable whose complementary variable is already basic. Among the remaining non-basic variables, choose the entering basic variable according to the usual simplex rule.

The above rule ensures that the complementary constraint holds throughout the execution of the algorithm. When an optimal solution of the following form is found for this problem, x* is the optimal solution of the original quadratic problem.

Example: Obtain the optimal solution of the following model.

Solution:

To solve this model, it must first be converted to the standard format. The expanded form of the general format is as follows.

Matching the objective function with the general form above, we will have:

Considering that the f function is a concave function, the modified simplex method can be used to find the global optimum.

An additional constraint that is not explicitly considered is:

The restricted entry rule ensures that the above constraint always holds. Specifically, for the three complementary pairs (x_{1},y_{1}), (x_{2},y_{2}) and (u_{1},v_{1}), whenever one variable of a pair is basic, the other cannot be basic. Note that only basic variables can be non-zero. The following table shows the steps of the modified simplex algorithm.

The first part of the table shows the equations after changing the objective function from minimizing Z to maximizing -Z. In the three iterations of this table, the rules of the simplex method have been applied, except that some non-basic variables that could have become basic were excluded by the restricted entry rule. In the first iteration, u_{1} is not considered as the entering basic variable, because its complementary variable v_{1} is already basic; instead x_{1}, which has the smaller coefficient (-4<-3), is chosen. In the second iteration, u_{1} and x_{1} are the non-basic variables with a negative coefficient in row zero, but u_{1} cannot be selected under the restricted entry rule because its complementary variable v_{1} is basic. In the third iteration, v_{1} is no longer basic, so u_{1} is selected as the entering basic variable.

The optimal solution of the problem is u_{1}=3, x_{2}=9 and x_{1}=12 and the rest of the variables are equal to zero. Therefore, the optimal solution of the quadratic programming problem will be (x_{1},x_{2})=(12,9).

**Frank-Wolfe algorithm**

In the previous sections, methods related to several special cases of convex programming were discussed. In this section, the Frank-Wolfe algorithm is presented to solve convex programming problems with linear constraints.

The linear approximation of the objective function f(x) at the test point x’ using Taylor expansion is:

Since f(x’) and ∇f(x’)x’ are constant, the objective function becomes ∇f(x’)x plus a constant. The resulting linear programming problem can be solved by the simplex method, and its optimal solution x_{LP} determined. Moving from x’ towards x_{LP} continually increases the value of the approximate objective function; but as we move away from x’, the approximate objective value is not necessarily close to the value of the original, nonlinear objective. Therefore x_{LP} is not taken as the next solution; instead, we look for the solution that maximizes the objective function on the line segment connecting x’ and x_{LP}. With the help of a one-variable optimization method, e.g. the mid-point method, this solution is obtained in terms of t as a fraction of the distance between x’ and x_{LP}, and it becomes the starting point of the next iteration. The successive solutions obtained in successive iterations tend towards the optimal solution of the original problem, and when no further improvement occurs, the optimal solution has been reached.

**Summary of the Frank-Wolfe algorithm**

**Initial step:**

Determine an initial basic feasible solution x^{(0)} using linear programming methods. Set k equal to one.

**Iterative step:**

**Part 1** – For j=1,2,…,n, calculate the partial derivatives of the function f(x) at the point x=x^{(k-1)} and set c_{j} equal to the following expression.

**Part 2** – Obtain the optimal solution of the following linear programming, i.e. x_{LP}^{(k)}.

**Part 3** – Calculate the value of the parameter t. For this purpose, substitute x=x^{(k-1)}+t(x_{LP}^{(k)}-x^{(k-1)}) into the function f(x) and express it in terms of t, that is, obtain f(x)=h(t). Using a one-variable optimization method, determine the maximum of h(t) in the interval 0≤t≤1 and compute x^{(k)} for the optimal t. Go to the stopping step.

**Stopping step:**

Stop if x^{(k-1)} and x^{(k)} are close enough, and consider x^{(k)} as the optimal solution. Otherwise, return to the iterative step by setting k=k+1.
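The summary can be sketched as follows, assuming the feasible region's vertices are listed explicitly (so the LP subproblem is solved by vertex enumeration rather than the simplex method) and using a hypothetical concave objective in the test:

```python
def frank_wolfe(f, grad, vertices, x0, max_iter=50, tol=1e-6):
    """Maximize a concave f over the convex hull of `vertices`.
    Each iteration linearizes f at x, picks the best vertex (the LP optimum),
    then line-searches on the segment between x and that vertex."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        # Part 2: the LP "maximize grad . y" attains its optimum at a vertex
        x_lp = max(vertices, key=lambda v: sum(gi * vi for gi, vi in zip(g, v)))
        d = [vi - xi for vi, xi in zip(x_lp, x)]
        # Part 3: maximize h(t) = f(x + t*d) on [0, 1] by golden section
        phi = lambda t: f([xi + t * di for xi, di in zip(x, d)])
        lo, hi, r = 0.0, 1.0, 0.6180339887
        a, b = hi - r * (hi - lo), lo + r * (hi - lo)
        while hi - lo > 1e-8:
            if phi(a) < phi(b):
                lo, a = a, b
                b = lo + r * (hi - lo)
            else:
                hi, b = b, a
                a = hi - r * (hi - lo)
        t = (lo + hi) / 2
        x_new = [xi + t * di for xi, di in zip(x, d)]
        # Stopping step: successive solutions close enough
        if sum((p - q) ** 2 for p, q in zip(x_new, x)) ** 0.5 < tol:
            return x_new
        x = x_new
    return x
```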

Example: Consider the following convex programming problem with linear constraints.

Find the optimal solution of the above model.

Solution:

We have:

Therefore, the maximum of the objective function is obtained at x=(2.5,2), which clearly **violates** the functional constraints. Therefore, finding an optimal solution that definitely satisfies the constraints requires further investigation.

Since x=(0,0) is a feasible solution (and also a corner point of the feasible region), we take it as the initial basic feasible solution of the Frank-Wolfe algorithm. Substituting this solution into the partial-derivative expressions gives c_{1}=5 and c_{2}=8, and the function 5x_{1}+8x_{2}, a linear approximation of the objective function, is obtained; the linear programming problem therefore becomes as follows.

The graphical solution of the above model is as follows.

According to part 3 of the iterative step, the points on the line segment between (0,0) and (0,3) are parameterized as follows in order to find the value of the parameter t that maximizes the objective function.

The value t* that maximizes the function h(t) in the interval 0≤t≤1 is t*=0.66 and is obtained from the following equation.

In this way, the first iteration ends with the determination of the next solution, as follows.

In the second iteration, with the solution x^{(1)}=(0,2), the coefficients of the objective function are as follows.

As a result, the objective function will be 5x_{1} and therefore the linear programming model in this iteration will be as follows.

The graphic solution of the above optimization model is as follows.

The optimal solution of the linear optimization model is x_{LP}^{(2)}=(2,0). Hence, the line segment between x^{(1)} and x_{LP}^{(2)} becomes

Differentiating the above function, we have:

As a result, we have:

The solutions to these two iterations are shown in the figure below.

We repeat the same process as above for the next iterations until the stopping condition is satisfied.

**Note:** In some algorithms, quadratic approximation is used in each iteration instead of linear approximation because quadratic approximation is significantly more accurate than linear approximation.

**Non-convex programming**

The assumptions of convex programming make the problem easier to solve: every local optimal solution obtained under these assumptions is a global optimal solution. Although real nonlinear programming problems are often close to satisfying these assumptions, small departures from them lead to non-convex programming.

One of the widely used methods for solving non-convex programming problems is the **S**equential **U**nconstrained **M**inimization **T**echnique, or **SUMT**. This method comes in two types. The first, the Exterior Point Algorithm, works with infeasible solutions, which are pushed towards the feasible region by a penalty function. The other, the Interior Point Algorithm, works with feasible solutions; in this method, a barrier function prevents leaving the feasible region.

**Sequential Unconstrained Minimization Technique or SUMT**

In this method, the original problem is replaced by a sequence of unconstrained optimization problems whose solutions converge to the optimal solution. Unconstrained optimization problems (solved, for example, by the gradient method) are much easier than constrained ones, so this algorithm has a particular advantage. For each problem in the sequence, a positive number r is chosen and made smaller each time; the value of x is then obtained by solving the following problem.

B(x) is a barrier function that has the following properties for every feasible solution x (for the original problem):

1) If x is far from the boundary of the feasible region, the value of B(x) is small.

2) If x is **close** to the boundary of the feasible region, the value of B(x) is high.

3) As the distance from the point to one of the boundaries of the feasible region tends to zero, the value of B(x) tends to infinity.

In this way, we start from a trial solution and maximize the function P(x;r) by a search process. The function B(x) prevents the search from crossing the boundary of the feasible region of the main problem and exploring outside it.

The most common choice of B(x) is:

B(x) = Σ_{i=1}^{m} 1/(b_{i}-g_{i}(x)) + Σ_{j=1}^{n} 1/x_{j}

It should be noted that for a feasible solution x, the denominator of each term is the distance of the solution x from the corresponding functional constraint g_{i}(x)≤b_{i} or non-negativity constraint x_{j}≥0.

**Note:** The interesting property of B(x) is that if all assumptions of convex programming hold, P(x;r) will also be a concave function.

If B(x) prevents the search from reaching boundary points of the feasible region, what should be done when the optimal solution lies on the boundary? For this reason, the SUMT method solves a series of unconstrained optimization problems in which the value of r becomes smaller each time and tends to zero. Moreover, the solution of each iteration is the initial solution of the next iteration. Below is a summary of the SUMT method.

**Initial step:**

Choose a feasible solution x^{(0)} that is not on the boundary of the feasible region. Set k equal to one and choose positive values of r and θ, for example θ=0.01, r=1.

**Iterative step:**

Starting from x^{(k-1)} and using the gradient search method, determine x^{(k)}, the local maximum of the function P(x;r)=f(x)-rB(x).

**Stopping step:**

If the difference between x^{(k-1)} and x^{(k)} is insignificant, stop and consider x^{(k)} as the local maximum solution of the problem.

Otherwise, set k equal to k+1 and r=θr and return to the **iterative step**.

Note: If the assumptions of convex programming do not hold, it is necessary to start from different feasible points in order to reach several local optimal solutions. The best local optimal solution obtained is taken as the approximate global optimal solution of the main problem.

Example: To describe the SUMT method, consider the following two-variable problem:

Maximize f(x)=x_{1}x_{2}

subject to:

x_{1}^{2}+x_{2}≤3, x_{1}≥0, x_{2}≥0

Although g_{1}(x)=x_{1}^{2}+x_{2} is convex (each of its terms is convex), f(x)=x_{1}x_{2} is not concave, so this is not a convex programming problem.

In the initial step, the solution (x_{1},x_{2})=(1,1), which is obviously feasible and not on the boundary of the feasible region, is chosen as the initial solution. Therefore x^{(0)}=(1,1), and we choose the values r=1, θ=0.01. In the iterative step we have:

To maximize the above function, starting from the point (1,1) and with the parameter r=1, we reach the solution x^{(1)}=(0.9,1.36) by applying the gradient search process. By choosing the value of r=1×0.01=0.01 and starting from the point (0.9,1.36), the gradient search process reaches the solution x^{(2)}=(0.983,1.933). Another iteration with r=0.01×0.01=0.0001 and starting from x^{(2)} leads to x^{(3)}=(0.998,1.994). The value of the solution for the next iterations is as follows.
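These iterations can be reproduced with a short script. The sketch below assumes the model maximize x_{1}x_{2} subject to x_{1}^{2}+x_{2}≤3, x_{1},x_{2}≥0 (consistent with g_{1}(x)=x_{1}^{2}+x_{2} and the limiting solution (1,2) above), uses the barrier B(x)=1/(3-x_{1}^{2}-x_{2})+1/x_{1}+1/x_{2}, and stands in for the gradient search with plain gradient ascent plus backtracking; the exact iterates may differ slightly from the hand computations.

```python
import math

# Assumed model (consistent with g1(x) = x1^2 + x2 and the limit point (1, 2)):
#   maximize f(x) = x1*x2  subject to  x1^2 + x2 <= 3,  x1 >= 0,  x2 >= 0

def P(x, r):
    """Barrier-augmented objective P(x; r) = f(x) - r*B(x)."""
    x1, x2 = x
    slack = 3 - x1**2 - x2
    if slack <= 0 or x1 <= 0 or x2 <= 0:
        return -math.inf               # outside the interior: reject the step
    return x1*x2 - r*(1/slack + 1/x1 + 1/x2)

def grad_P(x, r):
    x1, x2 = x
    slack = 3 - x1**2 - x2
    return (x2 - r*(2*x1/slack**2 - 1/x1**2),
            x1 - r*(1/slack**2 - 1/x2**2))

def inner_max(x, r, tol=1e-7, max_iter=20000):
    """Gradient ascent with backtracking line search."""
    for _ in range(max_iter):
        g1, g2 = grad_P(x, r)
        if max(abs(g1), abs(g2)) < tol:
            break
        t = 1.0
        while P((x[0] + t*g1, x[1] + t*g2), r) <= P(x, r):
            t *= 0.5
            if t < 1e-14:
                return x
        x = (x[0] + t*g1, x[1] + t*g2)
    return x

def sumt(x, r=1.0, theta=0.01, outer=3):
    for _ in range(outer):             # r = 1, 0.01, 0.0001
        x = inner_max(x, r)            # warm start from the previous solution
        r *= theta
    return x

x_opt = sumt((1.0, 1.0))
print(x_opt)                           # tends to (1, 2)
```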

**Penalty method and barrier method**

In this section, optimization methods for nonlinear programming problems with equality and inequality constraints are reviewed. These methods transform constrained problems into corresponding unconstrained problems. In general, two methods are studied in this section: the **penalty function** and the **barrier function**. The pattern of both methods is the transformation of constrained problems into new unconstrained problems via a parameter such as γ.

The first method is the **penalty function method**. The term penalty refers to linking the mathematical relation of the constraint to the objective function. This method produces a sequence of infeasible points whose limit is the optimal solution of the original problem. In the penalty method, the parameter γ links the constraint to the objective function and represents the high cost of violating the constraint in the optimization process.

The second method is the **barrier function method**, in which the barrier parameter γ links the constraint to the objective function and is used to **prevent** the generation of solutions outside the feasible region. In this method, as the parameter γ is reduced, the interior point tends toward the boundary of the feasible region, so a sequence of feasible solutions is created whose limit is the optimal solution of the initial problem. This method can only be used for problems with inequality constraints.

Figure (5): graphical representation of the penalty and barrier methods

**Penalty function method**

The penalty function method is used to transform a constrained problem into an unconstrained problem or a series of unconstrained problems. Constraints are added to the objective function with the help of the penalty parameter, and the penalty functions prevent violation of the constraints. To become familiar with the concept of penalty functions in problems with equality constraints, consider model (1):

Minimize f(X) subject to h(X)=0           (1)

In the above model, X is the vector of decision variables. Using γ, which is a large number, this model becomes the following unconstrained problem:

Minimize F(X)=f(X)+γh^{2}(X)           (2)

To reach the optimal solution through model (2), h^{2}(X) must tend to zero; otherwise the penalty factor has not played its role. Now consider model (3), which has an inequality constraint:

Minimize f(X) subject to g(X)≤0           (3)

In this situation, it is no longer possible to use the penalty factor γ as in model (2). In fact, the objective function f(X)+γg^{2}(X) is not appropriate, because it penalizes both situations g(X)<0 and g(X)>0, while only constraint violations should be penalized. Therefore, the penalty formulation of model (3) becomes the unconstrained problem of model (4):

Minimize F(X)=f(X)+γ[max{0,g(X)}]^{2}           (4)

This method is also called the exterior penalty function method.

If g(X)<0, the maximum value inside the bracket becomes zero, i.e. max{0,g(X)}=0.

In this case, there is no need for a penalty. Now, if g(X)>0, the maximum value inside the bracket becomes greater than zero, i.e. max{0,g(X)}=g(X)>0.

In this case, it is necessary to use the penalty factor γ. In general, in the penalty function method, a positive penalty should be applied when a point is infeasible; if it is feasible, there is no need for a penalty. Therefore, the appropriate penalty function P(X) is defined as equation (5).

In the above penalty function,

P(X)=Σ_{i=1}^{m}ϕ(g_{i}(X))+Σ_{j=1}^{l}θ(h_{j}(X))           (5)

and the two penalty functions ϕ and θ are defined as follows:

ϕ(y)=[max{0,y}]^{p},  θ(y)=|y|^{p}

where p is a positive integer. Therefore, the penalty function is written as:

P(X)=Σ_{i=1}^{m}[max{0,g_{i}(X)}]^{p}+Σ_{j=1}^{l}|h_{j}(X)|^{p}

With the help of the above function, the objective function is obtained as follows:

F(X)=f(X)+γP(X)

Example: For the model below, draw the auxiliary function for different values of γ.

Solution: First, we write the objective function:

The representation of the function F(X) will be as follows.

Figure (6) graphical representation of function F(X) in terms of different values of γ.

**Algorithm of penalty function method**

In this part, an algorithm is presented in which penalty functions are used as a tool to solve the model with constraints. Consider the following problem:

In the above model, X is a nonempty set, and f, g, and h are assumed to be continuous functions. The steps of this algorithm are as follows:

**A) Initial step**

1. Select an initial starting point X_{1}.

2. Select a penalty parameter value γ_{1}>0 and a numerical value β>1.

3. Choose a small numerical value ε>0 to identify the optimal solution and stop the problem-solving process.

4. Set k=1 for the first iteration and go to the main step.

**B) Main step**

1. Starting from the solution X_{k}, the following problem is solved:

Minimize F(X)=f(X)+γ_{k}P(X)

Assuming the optimal solution is X_{k+1}, go forward.

2. If the following condition, which is the optimality criterion, is met:

γ_{k}P(X_{k+1})<ε

stop. Otherwise, set γ_{k+1}=βγ_{k}, replace k+1→k, and return to part 1 of the main step. This process continues until the optimality criterion is met.

Example: Minimize the following problem using the penalty function method:

Solution:

A) Initial step

1. The initial solution is X_{1}=(2,1), at which the objective function is zero.

2. The initial values of the parameters are as follows:

3. ε=0.003 is chosen to detect when the optimal solution is reached and stop the problem-solving process.

4. Set k=1 for the first iteration and go to the main stage.

B) Main stage

1. Starting from the initial solution X_{1}=(2,1), the following problem is solved at each iteration k to obtain X_{k+1}:

2. Setting γ_{k+1}=βγ_{k}, we return to part 1 of the main stage and continue the process. A summary of the calculations is given in the table below.

At iteration 5, the stopping condition is met.
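The example's model is not reproduced above, so the sketch below applies the algorithm to a classic equality-constrained test problem, chosen because it matches the stated starting point X_{1}=(2,1), where the objective is zero: minimize (x_{1}-2)^{4}+(x_{1}-2x_{2})^{2} subject to x_{1}^{2}-x_{2}=0. The values γ_{1}=0.1 and β=10 are assumptions; with them and the stated ε=0.003, the stopping criterion is met at iteration 5, as noted above. The inner unconstrained minimizations use a damped Newton step with a gradient fallback.

```python
# Penalty (exterior point) method sketch for the assumed test problem:
#   minimize f(x) = (x1-2)^4 + (x1-2*x2)^2   subject to  h(x) = x1^2 - x2 = 0

def f(x):
    x1, x2 = x
    return (x1 - 2)**4 + (x1 - 2*x2)**2

def h(x):                              # equality constraint h(x) = 0
    return x[0]**2 - x[1]

def F(x, gamma):                       # auxiliary objective f + gamma*h^2
    return f(x) + gamma * h(x)**2

def grad(x, gamma):
    x1, x2 = x
    hv = h(x)
    return (4*(x1-2)**3 + 2*(x1-2*x2) + 4*gamma*x1*hv,
            -4*(x1-2*x2) - 2*gamma*hv)

def hess(x, gamma):                    # symmetric 2x2 Hessian: [[a, b], [b, d]]
    x1, x2 = x
    hv = h(x)
    a = 12*(x1-2)**2 + 2 + gamma*(8*x1**2 + 4*hv)
    b = -4 - 4*gamma*x1
    d = 8 + 2*gamma
    return a, b, d

def newton_min(x, gamma, tol=1e-10, iters=100):
    """Damped Newton on F(.; gamma), with a steepest-descent fallback."""
    for _ in range(iters):
        g1, g2 = grad(x, gamma)
        if g1*g1 + g2*g2 < tol:
            break
        a, b, d = hess(x, gamma)
        det = a*d - b*b
        if det > 0 and a > 0:          # Hessian positive definite: Newton step
            d1 = -( d*g1 - b*g2) / det
            d2 = -(-b*g1 + a*g2) / det
        else:                          # otherwise fall back to -gradient
            d1, d2 = -g1, -g2
        t = 1.0
        while F((x[0]+t*d1, x[1]+t*d2), gamma) > F(x, gamma) and t > 1e-12:
            t *= 0.5
        x = (x[0] + t*d1, x[1] + t*d2)
    return x

def penalty_method(x, gamma=0.1, beta=10.0, eps=0.003, max_outer=10):
    for _ in range(max_outer):
        x = newton_min(x, gamma)
        if gamma * h(x)**2 < eps:      # stopping criterion gamma_k * P(X) < eps
            break
        gamma *= beta
    return x

x_opt = penalty_method((2.0, 1.0))
print(x_opt)                           # approaches roughly (0.946, 0.894)
```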

**Barrier function method**

Barrier functions, like penalty functions, are used to transform a constrained objective function into an unconstrained one. These functions create a barrier that keeps the search from leaving the feasible region. The barrier method is also referred to as the interior penalty function method. To present this method, some definitions are necessary:

A) Initial problem: consider the minimization of a multivariate objective function f subject to an inequality constraint g, as follows:

B) Barrier problem: To minimize the objective function of the initial problem, we form the barrier function with the help of the following relation:

As the barrier function moves from the inside to the boundary of the region, B tends to infinity. In general, the barrier function B is defined as the following relation:

In this case, the barrier function model is as follows:

**Algorithm of barrier function method**

Now, the algorithm for minimizing a nonlinear programming problem with inequality constraints, using the barrier function for the initial problem, is presented as follows:

**A) Initial step**

1. Select an initial solution X_{1}∈E satisfying the constraint g(X_{1})<0.

2. Determine the value of γ_{1} and of 0<β<1, the factor by which γ_{k} is reduced toward zero.

3. Determine a small numerical value ε>0 to stop the problem-solving process and identify the optimal solution.

4. Set k=1 for the first iteration and go to the main step.

**B) Main step**

1. Starting from the initial solution X_{k}, the following problem is solved:

Near the boundary of the feasible region, the barrier term B becomes very large in magnitude, and taking a full step may place the point outside the feasible region. Therefore, the unconstrained optimization must be carried out with extra caution so that the point remains inside the feasible region.

Assuming the optimal solution is X_{k+1}, go forward.

2. If the following optimality condition is established:

The problem-solving process is stopped; otherwise, set γ_{k+1}=βγ_{k}, substitute k+1→k, and return to part 1 of the main step.

Example: Minimize the two-variable problem using the barrier function method:

Solution:

A) Initial step

1. The initial starting point is X_{1}=(0,1).

2. The initial value for γ in the first iteration is γ = 10 and β = 0.1

3. Stop criterion ε=0.02

B) Main step:

1. Forming the barrier objective function:

The summary of the calculations of the barrier function method is as follows:

In iteration 6, the stopping condition is satisfied as follows, so we have:

And the algorithm ends.
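Again the example's model is not shown above, so as an illustration the sketch below applies the barrier algorithm to an assumed stand-in problem: minimize (x_{1}-2)^{4}+(x_{1}-2x_{2})^{2} subject to g(x)=x_{1}^{2}-x_{2}≤0, using the stated values X_{1}=(0,1) (feasible, since g(X_{1})=-1<0), γ=10, β=0.1, ε=0.02, and the barrier B(X)=-1/g(X). With these values the stopping criterion is met at the sixth γ value, matching the iteration count above.

```python
import math

# Barrier (interior point) method sketch for the assumed problem:
#   minimize f(x) = (x1-2)^4 + (x1-2*x2)^2   subject to  g(x) = x1^2 - x2 <= 0

def f(x):
    x1, x2 = x
    return (x1 - 2)**4 + (x1 - 2*x2)**2

def g(x):                              # inequality constraint, g(x) <= 0
    return x[0]**2 - x[1]

def F(x, gamma):                       # f + gamma*B with barrier B = -1/g > 0
    gv = g(x)
    if gv >= 0:
        return math.inf                # reject any step leaving the interior
    return f(x) - gamma/gv

def grad(x, gamma):
    x1, x2 = x
    c = gamma / g(x)**2                # grad of -gamma/g is (gamma/g^2)*grad g
    return (4*(x1-2)**3 + 2*(x1-2*x2) + 2*c*x1,
            -4*(x1-2*x2) - c)

def hess(x, gamma):                    # symmetric 2x2 Hessian: [[a, b], [b, d]]
    x1, x2 = x
    gv = g(x)
    c2 = -2.0*gamma/gv**3              # positive, since gv < 0
    c1 = gamma/gv**2
    a = 12*(x1-2)**2 + 2 + 4*c2*x1*x1 + 2*c1
    b = -4 - 2*c2*x1
    d = 8 + c2
    return a, b, d

def newton_min(x, gamma, tol=1e-10, iters=100):
    """Damped Newton on F(.; gamma); infeasible trial points get F = inf."""
    for _ in range(iters):
        g1, g2 = grad(x, gamma)
        if g1*g1 + g2*g2 < tol:
            break
        a, b, d = hess(x, gamma)
        det = a*d - b*b
        if det > 0 and a > 0:
            d1 = -( d*g1 - b*g2) / det
            d2 = -(-b*g1 + a*g2) / det
        else:
            d1, d2 = -g1, -g2
        t = 1.0
        while F((x[0]+t*d1, x[1]+t*d2), gamma) > F(x, gamma) and t > 1e-12:
            t *= 0.5
        x = (x[0] + t*d1, x[1] + t*d2)
    return x

def barrier_method(x, gamma=10.0, beta=0.1, eps=0.02, max_outer=10):
    for _ in range(max_outer):
        x = newton_min(x, gamma)
        if -gamma/g(x) < eps:          # stopping criterion gamma_k * B(X) < eps
            break
        gamma *= beta
    return x

x_opt = barrier_method((0.0, 1.0))
print(x_opt)                           # interior points approaching the boundary
```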

**Fractional programming**

The objective function of a fractional program is a ratio of two linear or nonlinear functions. Accordingly, fractional programming can be classified into linear fractional programming and nonlinear fractional programming.

Among the important applications of this type of programming are economic ones, such as maximizing profitability, maximizing return on investment, maximizing return relative to risk, and minimizing cost relative to time.

Consider the following fractional programming (linear or nonlinear):

where S is the feasible space, P and Q are continuous functions and Q(x)>0. To solve the above model, we use the following algorithm.

Step 1:

For a point X^{1} inside the feasible region, take *a*_{2}=0 or *a*_{2}=P(X^{1})/Q(X^{1}) and go to step 2. Set k=2 and choose a positive value ε as the optimality criterion.

Step 2:

a) Solve the following program with an appropriate method:

Maximize P(X)-*a*_{k}Q(X) subject to X∈S

Consider the optimal solution of the above model to be X^{k}.

b) Calculate the following value:

Step 3:

a) If

stop.

b) Otherwise, set *a*_{k+1}=P(X^{k})/Q(X^{k}) and return to step 2.

Example: Solve the following model by the fractional programming method.

Solution: To solve this problem, we first make the following definitions.

Iteration 1:

Step 1:

Step 2: We solve the following convex model:

The solution to the above model is as follows.

Step 3: Because the stopping condition is not met, we calculate *a*_{3} as follows.

And we return to step 2.

Iteration 2:

We solve the following quadratic convex model.

By solving the above model (using the Frank-Wolfe method), the optimal solution is as follows.

Step 3: Because the stop condition is not met, the value of *a*_{4} is calculated.

And we go to iteration 3.

After two more iterations, we reach the following optimal solution.

which is very close to the global optimum of this problem, X*=(1,1).
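The scheme above, in which one repeatedly solves maximize P(X)-*a*_{k}Q(X) and updates *a* by the current ratio, is known as Dinkelbach's parametric algorithm. The sketch below applies it to a small hypothetical one-variable problem (an assumption, not the example above): maximize x/(x^{2}+1) on 0≤x≤2, where each subproblem has a closed-form solution.

```python
# Dinkelbach's algorithm for the assumed problem: maximize P(x)/Q(x) on [0, 2]
def P(x): return x
def Q(x): return x*x + 1.0             # Q(x) > 0 on the feasible set

def solve_subproblem(a):
    """Closed-form maximizer of P(x) - a*Q(x) = x - a*(x^2 + 1) on [0, 2]."""
    if a <= 0:
        return 2.0                     # objective is increasing in x
    return min(max(1.0/(2.0*a), 0.0), 2.0)

def dinkelbach(a=0.0, eps=1e-9, max_iter=50):
    for _ in range(max_iter):
        x = solve_subproblem(a)
        if P(x) - a*Q(x) < eps:        # optimality criterion
            return x, a
        a = P(x)/Q(x)                  # update the ratio parameter
    return x, a

x_opt, a_opt = dinkelbach()
print(x_opt, a_opt)                    # x tends to 1, ratio tends to 0.5
```

The true maximizer of x/(x^{2}+1) on [0,2] is x=1 with ratio 1/2, so the sequence of parameters *a*_{k} climbs to the optimal ratio.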

**Separable programming**

The general model of separable programming is as follows:

The decision variables in this model are separable: each function f_{j} in the objective function involves only the single decision variable x_{j}, and each function g_{ij} in the i-th constraint likewise involves only the decision variable x_{j}.

Solving separable models is not done directly; first a linear approximation of the model is obtained, and then the problem is solved using the simplex method. The linear approximation is constructed as follows. Consider the following example.

It seems that the Z function is not separable because of the term (x_{1}+x_{2})^{2}; however, by setting x_{1}+x_{2}=x_{3}, the above model can be converted into a separable model as follows.

The objective function can be separated in the following way:

Also, the constraint functions are clearly separable. In order to estimate a linear approximation of the functions in Z, we first consider f_{1}(x_{1}).

Figure (7) function f_{1}(x_{1}) and linearized function

It can be seen in the figure above that the linear approximation of the function f_{1}(x_{1}) consists of three line segments. Each function value can be represented as a convex combination of the values of the function at the breakpoints of these segments; that is, we will have:

Also, the variable x_{1} itself can be represented as a convex combination of the breakpoints; that is:

Therefore, to approximate the function f1(x1) we will have:

For the other two functions, the linear approximation can be expressed in the same way. For the function f2(x2) it will be as follows.

The representation of the function f_{2}(x_{2}) and the linear approximation is as follows.

Figure (8) function f_{2}(x_{2}) and linearized function

For the function f_{3}(x_{3}) it will be as follows.

According to the approximate functions above, the final model is as follows.

The above model is a linear programming model. Solving it by the simplex method, we obtain the following solution.

It should be noted that the variable x_{1} (for example) can only lie in one of the three intervals corresponding to the three line segments. Hence, at most two consecutive λ_{j} values may be positive. This is known as the adjacency principle, and the approximation of a program can be used to solve it only if this principle is satisfied.

The **adjacency principle** is automatically satisfied when maximizing concave functions, but not when maximizing non-concave functions. In the latter case, the following mechanism must be added to the simplex method to enforce the principle while solving the problem.

If the entry criterion of the simplex method selects one of the convex-combination coefficients, say λ_{i}, to enter the basis, this entry is allowed only if neither of the adjacent coefficients λ_{i-1} and λ_{i+1} is in the current basis (in other words, both are zero), or exactly one of them is already in the basis.

If this mechanism makes it impossible to enter a coefficient such as λ_{i} into the current basis of the simplex table, the simplex method continues by selecting the next candidate variable to enter the basis.

In this example, the functions f_{1}, f_{2}, and f_{3} are concave. Therefore, the adjacency principle is automatically satisfied for the above linear program, and there is no need to use the adjacency rule in the simplex method.

Many problems seem inseparable but can be handled using separable functions. Some of these transformations are shown in the following table:

Hence, most problems can, at least in theory, be transformed into separable programs; however, solving the resulting separable program may be impractical because of the increase in the number of λ variables and constraints.
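The convex-combination (λ) representation described above can be sketched numerically. Below, a hypothetical one-variable term f(x)=x^{2} (an assumption, not one of the f_{j} above) is approximated over the breakpoints 0, 1, 2, 3: a point x is written as a convex combination of its two adjacent breakpoints, so at most two consecutive λ_{j} are positive (the adjacency principle), and the same weights applied to the function values give the piecewise-linear (chord) approximation.

```python
BREAKPOINTS = [0.0, 1.0, 2.0, 3.0]

def f(x):
    return x*x                         # hypothetical separable term f_j(x_j)

def lambda_representation(x):
    """Weights lam[j] on breakpoints: at most two adjacent ones are positive."""
    lam = [0.0]*len(BREAKPOINTS)
    for j in range(len(BREAKPOINTS) - 1):
        a, b = BREAKPOINTS[j], BREAKPOINTS[j+1]
        if a <= x <= b:
            t = (x - a)/(b - a)
            lam[j], lam[j+1] = 1.0 - t, t
            return lam
    raise ValueError("x outside the breakpoint range")

def approx_f(x):
    """Piecewise-linear approximation sum(lam_j * f(breakpoint_j))."""
    lam = lambda_representation(x)
    return sum(l*fb for l, fb in zip(lam, [f(b) for b in BREAKPOINTS]))

# x itself is recovered exactly; f is approximated by the chord value:
lam = lambda_representation(1.5)
x_back = sum(l*b for l, b in zip(lam, BREAKPOINTS))
print(x_back, approx_f(1.5), f(1.5))   # 1.5, 2.5 (chord), 2.25 (exact)
```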

**Lagrangian Relaxation Method**

This method is used to solve nonlinear programming problems with equality constraints. Consider the following optimization model.

The French mathematician Lagrange first combined the objective function and the constraints into a new function, called the Lagrange function, and then showed that maximizing this new function is equivalent to solving the original problem. The Lagrange function is:

Here the λ_{i} are known as Lagrange multipliers, and X=(x_{1},x_{2},…,x_{n}). The necessary conditions for maximizing the function L(X,λ) are as follows:

The relation (*) shows that any feasible solution of the problem must satisfy the constraints (that is, g_{i}(X)=b_{i}). The relation (**) indicates that the following relation must hold for every solution of the problem.

The above necessary conditions are also sufficient for a convex programming problem, and the global optimal point can then be determined. We consider the following example of convex programming.

In order to solve this problem using the Lagrange function, we will have:

To maximize this function, we do the following.

From the solution of the above equations we will have:
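When the objective is quadratic and the constraint is linear, the stationarity system ∇L=0 is linear and can be solved mechanically. As a hypothetical illustration (not the example above), take maximize x_{1}x_{2} subject to x_{1}+x_{2}=4; the conditions ∂L/∂x_{1}=∂L/∂x_{2}=∂L/∂λ=0 form a linear system, solved below by Gaussian elimination.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small n x n system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            m = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= m * M[col][c]
    x = [0.0]*n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c]*x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Stationarity of L(x, lam) = x1*x2 - lam*(x1 + x2 - 4):
#   dL/dx1  =  x2 - lam        = 0
#   dL/dx2  =  x1 - lam        = 0
#   dL/dlam = -(x1 + x2 - 4)   = 0
A = [[0.0, 1.0, -1.0],
     [1.0, 0.0, -1.0],
     [1.0, 1.0,  0.0]]
b = [0.0, 0.0, 4.0]
x1, x2, lam = solve_linear(A, b)
print(x1, x2, lam)                     # 2.0 2.0 2.0
```

The solution x_{1}=x_{2}=2 with λ=2 satisfies both stationarity and the constraint, as the Lagrange conditions require.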

**Exercises**

Exercise 1: Consider the following function:

a) Find the local maximum and minimum points using the first and second derivatives.

b) Using the first and second derivatives, show that f(x) has neither a global maximum nor a global minimum.

Solution:

a)

b) For x>19.592, we have,

and

Therefore, f has no upper bound.

For x<0.408, we have,

and

Therefore, f has no lower bound.

Exercise 2: For each of the following functions, investigate whether it is convex, concave, or neither concave nor convex.

a) f(x)=10x-x^{2}

b) f(x)=x^{4}+6x^{2}+12x

c) f(x)=2x^{3}-3x^{2}

d) f(x)=x^{4}+x^{2}

e) f(x)=x^{3}+x^{4}

Solution:

a) Because we have for all values of x

Therefore, the function f is concave.

b) Because we have for all values of x

Therefore, the function f is convex.

c) because

takes both positive and negative values, therefore the function f is neither convex nor concave.

d) Because we have for all values of x

Therefore, the function f is convex.

e) because

takes both positive and negative values, therefore the function f is neither convex nor concave.
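The second-derivative tests above can be double-checked numerically. The sketch below estimates f''(x) by a central finite difference on a sample grid and classifies each function accordingly (a numerical check on a bounded grid, not a proof).

```python
def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0*f(x) + f(x - h)) / (h*h)

def classify(f, lo=-3.0, hi=3.0, n=121, tol=1e-3):
    """Classify f on a sample grid as 'convex', 'concave', or 'neither'."""
    xs = [lo + i*(hi - lo)/(n - 1) for i in range(n)]
    d2 = [second_derivative(f, x) for x in xs]
    if all(v >= -tol for v in d2):
        return "convex"                # f'' >= 0 everywhere on the grid
    if all(v <= tol for v in d2):
        return "concave"               # f'' <= 0 everywhere on the grid
    return "neither"

print(classify(lambda x: 10*x - x**2))           # concave  (part a)
print(classify(lambda x: x**4 + 6*x**2 + 12*x))  # convex   (part b)
print(classify(lambda x: 2*x**3 - 3*x**2))       # neither  (part c)
```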

Exercise 3: Investigate whether the following functions are convex, concave, or neither convex nor concave using the Hessian matrix.

a) f(x)=10x-x^{2}

b)f(x)=3x_{1}+2x_{1}^{2}+4x_{2}+x_{2}^{2}-2x_{1}x_{2}

c) f(x)=x_{1}x_{2}

Solution:

a) The Hessian matrix is calculated as follows:

Therefore, the Hessian is negative semidefinite, which means that the function f is concave. There is also another way to determine the convexity of the function f, as follows:

Note: If

And

then the Hessian is negative definite and therefore the function f is concave.

Note: If

And

then the Hessian is positive definite and therefore the function f is convex.

Note: If any of the above inequalities holds as an equality, then the Hessian becomes positive semidefinite or negative semidefinite accordingly.

In this example because:

Therefore, the function f is negative definite and concave.

b) Hessian matrix is calculated as follows:

Because

and

Therefore, the function f is positive definite and the function f is convex.

c) The Hessian matrix is calculated as follows:

Because

Therefore, it satisfies none of the above conditions, and this function is neither convex nor concave.
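For a 2×2 Hessian, the principal-minor tests in the notes above reduce to checking the sign of the (1,1) entry and of the determinant. A small sketch, with the Hessians of parts (b) and (c) computed from the given functions and entered as constants:

```python
def classify_2x2(a, b, d, tol=1e-12):
    """Definiteness of the symmetric matrix [[a, b], [b, d]] via principal minors."""
    det = a*d - b*b
    if a > tol and det > tol:
        return "positive definite"     # the function is strictly convex
    if a < -tol and det > tol:
        return "negative definite"     # the function is strictly concave
    if det < -tol:
        return "indefinite"            # neither convex nor concave
    return "semidefinite"

# b) f = 3x1 + 2x1^2 + 4x2 + x2^2 - 2x1x2  ->  H = [[4, -2], [-2, 2]]
print(classify_2x2(4.0, -2.0, 2.0))    # positive definite
# c) f = x1*x2                            ->  H = [[0, 1], [1, 0]]
print(classify_2x2(0.0, 1.0, 0.0))     # indefinite
```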

Exercise 4: Consider the following function:

Show that this function is convex. To do so, express the function as a sum of functions of one or two variables, each of which is convex.

Solution:

Consider

Because

For all values of x, the function f_{1} is both convex and concave.

because

For all values of x, the function f_{2} is convex.

For all values of x we have:

also

And

For all values of x we have

Therefore, the function f_{34} is convex.

For all values of x we have:

also

and

For all values of x we have:

Therefore, the function f_{56} is convex.

Since f_{67}(x_{6},x_{7})=f_{56}(x_{7},x_{6}), therefore the function f_{67} is also convex.

Considering that all functions are convex, therefore the function f is convex.

Exercise 5: Solve the following problems using the one-variable search method, with acceptable error ε=0.04 and the specified bounds.

a)

b)

Solution:

a) The results are shown in the following table.

b) The results are shown in the following table.
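The one-variable search can be sketched as interval halving on the sign of the derivative. Since the exercise's own functions are not reproduced above, the sketch uses an assumed concave illustration, f(x)=12x-3x^{4}-2x^{6} on [0,2], with the same acceptable error ε=0.04.

```python
def one_variable_search(dfdx, lo, hi, eps):
    """Interval halving on the sign of f'(x) for maximizing a concave f."""
    while hi - lo > 2*eps:
        x = (lo + hi) / 2.0
        if dfdx(x) > 0:
            lo = x                     # the maximum lies to the right of x
        else:
            hi = x                     # the maximum lies to the left of x
    return (lo + hi) / 2.0

# Assumed illustrative function f(x) = 12x - 3x^4 - 2x^6 (not from the text):
dfdx = lambda x: 12 - 12*x**3 - 12*x**5
x_star = one_variable_search(dfdx, 0.0, 2.0, eps=0.04)
print(x_star)                          # within eps of the maximizer x ~ 0.836
```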

Exercise 6: Consider the following unconstrained optimization problem.

a) Starting from the point (1,1) and ε=0.3, determine the optimal solution of this problem using the gradient search process method.

b) Obtain the exact solution of the problem by solving the equation ∇f(x)=0.

c) Show the trend of the solutions obtained in part (a) in a figure. Then estimate the next three solutions. In addition, show that the sequence of solutions tends to the solution of part (b).

Solution:

a)

Because |-0.25|<0.3, the stopping condition is met at iteration 4.

b)

c)

The optimal solution tends to the solution (0.5, 0.5).

Exercise 7: Consider the following optimization problem with linear constraints.

a) Show that this problem is convex programming.

b) Obtain an optimal solution using KKT conditions.

c) Show with a simple argument that the solution obtained from part (b) is optimal.

Solution:

a) The Hessian matrix of the function f(x) is obtained as follows.

For all x values except x_{1}=-1 we have:

also

Since x_{1}≥0, the Hessian is negative definite, and therefore the function f(x) is concave. Because the constraints are linear, this is a convex programming problem.

b)

The KKT conditions for this problem are as follows.

Suppose u≠0. From (4) we have x_{1}+2x_{2}=3. Taking x_{2}=0 gives x_{1}=3, and from (2a) we get u=0.25. These values satisfy all the conditions, and therefore (x_{1},x_{2})=(3,0) is the solution.

c) Because -x_{2}^{2} is strictly decreasing for x_{2}≥0 and ln(x_{1}+1) is strictly increasing for x_{1}≥0, to maximize f the value of x_{1} should increase and the value of x_{2} should decrease. Given x_{1}+2x_{2}≤3, the solution (x_{1},x_{2})=(3,0) is therefore optimal.
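The KKT reasoning above can also be verified numerically. The sketch below checks stationarity, feasibility, and complementary slackness for maximize f(x)=ln(x_{1}+1)-x_{2}^{2} subject to x_{1}+2x_{2}≤3, x_{1},x_{2}≥0, at the claimed point (x_{1},x_{2})=(3,0) with u=0.25.

```python
def check_kkt(x1, x2, u, tol=1e-9):
    """KKT check for: maximize ln(x1+1) - x2^2  s.t.  x1 + 2*x2 <= 3, x >= 0."""
    df_dx1 = 1.0/(x1 + 1.0)            # partial derivative of f w.r.t. x1
    df_dx2 = -2.0*x2                   # partial derivative of f w.r.t. x2
    conditions = [
        df_dx1 - u <= tol,                      # dL/dx1 <= 0
        abs(x1*(df_dx1 - u)) <= tol,            # x1 * dL/dx1 = 0
        df_dx2 - 2.0*u <= tol,                  # dL/dx2 <= 0
        abs(x2*(df_dx2 - 2.0*u)) <= tol,        # x2 * dL/dx2 = 0
        x1 + 2.0*x2 <= 3.0 + tol,               # primal feasibility
        x1 >= -tol, x2 >= -tol,                 # non-negativity
        u >= -tol,                              # dual feasibility
        abs(u*(x1 + 2.0*x2 - 3.0)) <= tol,      # complementary slackness
    ]
    return all(conditions)

print(check_kkt(3.0, 0.0, 0.25))       # True: the claimed solution passes
print(check_kkt(2.0, 0.5, 0.25))       # False: stationarity in x1 fails
```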

Exercise 8: Using KKT, find the optimal solution to each of the following problems:

a)

b)

Solution:

a)

KKT conditions are as follows.

The solution to the above equations is equal to

Because this is a convex programming problem, therefore

is the global optimal solution.

b)

The solution to the above equations is equal to

Because this is a convex programming problem, therefore

is the global optimal solution.

Exercise 9: Consider the following convex programming:

Obtain the optimal solution of the above model by using the simplex algorithm.

Solution: The vector form of the model is as follows.

In addition to the above model, the following complementarity constraint must be considered.

To solve using the simplex method, the objective function is subtracted from the sum of the first and second constraints. Since the standard format of the simplex table is in the form of maximization (Max), therefore, by multiplying the objective function by -1, minimization can be converted into maximization. As a result of the above changes, the basic model is as follows.

According to the above model, the simplex method can be used to solve the problem as follows.

The optimal solution is as follows.

Exercise 10: Consider the following convex programming.

a) Starting from the solution (x_{1},x_{2})=(0.5,0.5), apply four iterations of the Frank-Wolfe method.

b) Show graphically the sequence of solutions obtained over the iterations.

c) Obtain the optimal solution using the KKT conditions and compare it with the solutions of part (b).

Solution:

a)

For four iterations of the Frank-Wolfe method, the results are as follows.

b)

c)

From the solution of the above equations, the optimal solution is equal to (x_{1},x_{2})=(0.8934,0.7131),u=0.5737.

Exercise 11: Apply the SUMT method to the following convex programming problem. Consider r=1,0.01,0.0001

Solution:

First, we get P(x;r):

Exercise 12: Apply the SUMT method to the following convex programming problem. Consider r=1,10^{-2},10^{-4},10^{-6} and start from the initial solution (x_{1},x_{2})=(0.5,0.5).

Solution:

Exercise 13: Consider the following non-convex programming.

a) Determine the feasible region of x. Obtain the first three derivatives of f(x). Using this information, sketch the general form of f(x) over the feasible region. Specify the relative and absolute maximum points of the function f(x).

b) Use a one-variable search with ε=0.05 to determine each of the relative maximum solutions. To determine the initial bounds, use the general form of the function drawn in part (a). Which of the relative maximum solutions is the absolute maximum?

c) To determine the relative maximum solutions, use the SUMT method with r=1,10^{-2},10^{-4}. Use the solutions x=3 and x=15 as the initial solutions for these searches. Maximize P(x;r) with the help of the one-variable search described in part (b). Which of the relative maximum solutions is the absolute maximum?

Solution:

a) To find the feasible region, we solve the equation x^{2}+x-500=0; hence the feasible region is x∈[0,21.866].

Based on the above functions, the general display of the function is as follows.

b)

The local maximum occurs at x=1.6016 and the global maximum at x=21.051.

c)

With the initial solution x=3, the SUMT method yields the optimal solution x=1.625 with objective value f(x)=733.4. With the initial solution x=15, the optimal solution is x=21.07 with objective value f(x)=20562; hence the global maximum is at x=21.07.