Course 9: Deterministic dynamic programming

Author: OptimizationCity Group

Table of Contents

Introduction

Dynamic programming is a type of mathematical technique that is able to divide large and complex problems into smaller and simpler problems and by solving each of them, obtain the optimal solution to the main problem. This method, in fact, is used for a series of consecutive and dependent decisions. Because these decisions are made in several stages, they are called multi-stage decision problems. A type of multi-stage decision making problem is dynamic planning.

Dynamic programming attempts to achieve maximum efficiency by determining a combination of successive decisions. Unlike methods such as linear programming and integer programming which are standard and limited, the dynamic programming method does not have a specific and standard framework for formulating problems, and in fact, what dynamic programming does is to provide a general method to solve problems. By using dynamic programming, a wide range of programming problems can be solved, including linear programming, nonlinear programming, network, integer programming, and so on. However, using this method in solving linear programming problems, especially in cases where the dimensions of the problem are large, is not very economical.

In spite of such a wide range and capability in solving different types of problems, in order to be able to solve the problem by dynamic programming, we must know the general structure of the dynamic programming problem and specify the stages and states of the problem. Dynamic planning is used in various fields such as production and sales planning, workforce planning, production and inventory control, and maintenance system.

The characteristics of dynamic programming problems are as follows:

1- If we divide every dynamic program into several smaller problems, each of those smaller problems is called a stage. At each stage, a decision must be made that affects the next stages.

2- At each stage, it should be determined in what situation we are. Therefore, each stage includes one or more states that connect it to the previous and subsequent stages. The state at each step is determined by two things: (1) the previous state and (2) is the decision made at the previous step. Decisions are made at each stage according to the state at the current stage. The number of states can be finite or infinite.

3- At each stage, by making a decision, the state of the current stage is transferred to the state of the next stage.

4- Assuming that the state in one stage is known, the optimal policy regarding the remaining stages is independent of the policy adopted in the previous stages.

5- Having the current state of the system contains all the information needed to determine the optimal policy related to the remaining steps.

6- The method of solving these problems starts with finding the optimal solution for all the states of the last stage.

7- The optimal policy of all states of stage n can be determined by a return relationship and assuming that the optimal policy of all states of stage (n+1) is known. The return relationship is as follows.

In the above relationship, the system is in stage n and state s, and the goal is to find x_n such that the term f_n(s,x_n) or p(s,x_n) + f^*_n+1(x_n) is minimized.

8- The solution method is backward movement. This is how the solution is done from the last step to the first step and it is obtained by using the recursive relationship from one step to the previous step.

9- In dynamic programming, knowing the current state of the system contains all the information needed to determine the optimal policy related to the remaining steps. This property is called the Principle of Optimality.

10- The method of solving these problems starts with finding the optimal solution for all the states of the last stage.

Since there is no single structure to solve all problems in dynamic programing, here we try to solve different examples from different areas to understand how to solve such problems.

Examples

Example:

The World Health Organization plans to send 5 medical groups to three countries for health and medical education in poor countries. This organization should determine how many groups should be assigned to each country in order to maximize the efficiency of all 3 countries. The criterion for measuring efficiency is the increase in life expectancy, which is based on the number of medical groups in the following table:

Solution:

x_n: representing the number of medical groups assigned to the nth country.

s (state): The number of groups that have not yet been allocated in previous steps.

p_i(x_i): The measure of efficiency for assigning medical group x_i to the i-th country.

The mathematical programming model of the above problem is as follows:

Using the concepts mentioned about the dynamic programming problem, fn(s,xn) becomes as follows:

Using the above, the solution process can be started from n=3 (the last step).

In step 3 we are looking to find the number of medical groups for the country 3. S can take a value from 0 to 5 (integer numbers). If s=0, it means that there is no group left for the country 3 to be allocated and the allocation has been done in the previous steps, so in the country 3, according to the table of basic information, it is not possible to increase the life expectancy and therefore fn* becomes zero.

If s=1, it means a group is not assigned and can be assigned to the country 3. According to the table, fn* is equal to 50. This process is repeated for s from 2 to 5.

The table for the second stage (n=2) is as follows:

To explain how to calculate the above table (n=2), we describe some cells in detail.

Column x₂=0

In the case where s=0 (that is, all the medical groups in the country 1 have been allocated), there are no medical groups left for the countries 2 and 3 to be allocated, and therefore the value of f₂^*(0) becomes zero.

In the case that s=1, it means that one medical group can be assigned to countries 2 and 3. In this column, x₂=0, which means that the group should not be assigned to the country 2, and one group remains, which is assigned to the country 3. Now we return to the table n=3 and get the optimal value for s=1 being equal to 50. Therefore, f₂(x₂,s) is equal to 0 (Life expectancy for the country 2) + 50 (Life expectancy for the country 3).

Column x₂=1

As you can see, for s=0, a dashed line has been drawn. Because if s=0, it means that there are no group left for allocation for countries 2 and 3, and therefore the allocation of a group for the country 2 (x₂=1) does not make sense. Therefore, this point is expressed with a dash, -.

If s=1, it means that a medical group can be allocated, which can be allocated in countries 2 and 3. Because x₂=1, this group is assigned to the country 1 and according to the given table, the increase in life expectancy will be equal to 20 years. Because the allocation of 1 medical group in the country 2does not leave any group for the country 3, therefore Life expectancy is zero for the country 3, so f₂(x₂,s) becomes 20+0.

If s=2, that means 2 medical groups can be allocated in countries 2 and 3, and since x₂=1, one group in the country 2 and one group in the country 3 should be allocated. In this case f₂(x₂,s) is equal to 20 (life expectancy by assigning one group in the country 2) + 50 (life expectancy by assigning one group in the country 3).

The f₂^*(s) column is equal to the maximum f₂(x₂,s) for all x₂, and x₂ corresponding to higher values is placed in x₂^*.

In the table of the last stage (n=1), since no allocation has been made, there are 5 medical groups left that can be allocated to countries 1, 2 and 3. If x_n=0, it means that no medical group is assigned in the country 1 and the remaining 5 groups are used for the countries 2 and 3. Therefore, the optimal value is equal to 160. The way to calculate other values is similar to table n=2. The optimal allocation of medical groups in the form of x₁=1, x₂=3, x₃=1, will cause 170 years of life expectancy.

Example:

The activity of a workshop fluctuates a lot in different seasons. Due to the high cost of training and hiring new workers, the factory manager does not want to fire additional workers. On the other hand, he does not want to have a large number of workers working during the off-season. Due to the nature of the products manufactored in this workshop, it is not possible to manufacture and store the products. With these conditions, the factory manager wants to know the policy of hiring workers in different seasons. The number of workers required for each season is shown in the table below.

The number of workers in each season cannot be less than the above values. Each additional worker costs about $2,000 per season. In addition, the costs of hiring/firing workers are equal to the product of $200 multiplied by the square of the difference in the employment level of two consecutive seasons. The number of workers can be a fraction because it is possible to hire part-time. What should be the level of employment in each season to minimize the cost:

Solution:

n: number of chapters

x_n: employment level in the nth season

S: Recruitment level

r_n: minimum manpower required in the nth stage

We start at step n=4 (spring).

In n=4 (spring), because the number of workers cannot exceed 255, therefore, section 2000 (x_n-255) is equal to zero.

For n=3 (winter), f₃(x_n,s) becomes as follows:

The number of workers in winter is equal to x₃, and because at the beginning of this season, s workers come from the previous season, so (x₃-s) workers must be hired or fired. The cost of hiring and firing (x₃-s) will be equal to 200(x₃-s)². Considering that the required number of workers is equal to 200, if the number of workers in this season (x₃) exceeds 200, additional worker fee should be paid. The cost of stage 4 (spring) is obtained as the number of workers in this season (x₃). Since, at the beginning of the spring, the the number of workers is equal to x₃, the total hiring cost is equal to 200(255-x₃)².

In order to obtain the minimum of the function f₃(x_n,s), we must take the partial derivative of this function in terms of x₃, and therefore we have:

Then

Because the second derivative is positive, therefore x₃^* is minimum. Because (s+250)/2 is in the range of (245,252.5), therefore (s+250)/2 is the optimal value of x₃. After replacing x₃ with (s+250)/2 in f₃(x₃^*,s) we have:

For stage n=2 (autumn), the calculations are done as follows:

To find x₂^*, the following model is solved:

By taking the derivative and setting the derivative equal to zero, we will have:

If 240≤s≤255, then x₂* =(2s+240)/3. But if 220≤s≤240, x₂* will be less than 240 and the minimum f₂(s,x₂) will be within 220≤ s≤240 occurs at x₂*=240. The summary of the results for n=2 is given in the following table:

For n=1 (summer) we have:

The value of f₂^*(x₁) has two different values in the interval 220≤x₁≤240 and 240≤x₁≤250, so f₁(s,x₁) can be written as follows:

To find the optimal value of f₁(s,x₁), first assume that x₁≤240, the derivative of f₁(s,x₁) in terms of x₁ is as follows:

Considering that s=255, then x₁^*=240.9, so for all values of x₁≤240 we have:

Therefore, the minimum occurs in the interval x₁≤240 for x₁=240.

If 240≤x₁≤255 then:

And

Because at this stage, s=255, so in the interval 240≤x₁≤255, minimum f₁(s,x₁) is created for x₁=247.5, which is less than the minimum value of the objective function in the interval 220≤x₁≤240, so we have:

The summary of the n=1 results is given in the following table:

Therefore, the optimal policy is as follows and the total cost is 185,000.

Example

Solve the following linear programming using the dynamic programming method.

Solution:

n (the number of decision variables) is equal to 2.

(R₁,R₂,R₃) is the state vector where R_i is the residual value of source i.

x_i is the value of the i-th variable.

For n=2 , the value of function f₂(R₁,R₂,R₃,x₂) is equal to 5x₂ where the values of x₂ must satisfy the constraints of 2x₂≤R₂ and 2x₂≤R₃. Therefore, to calculate x₂^*, the following model must be solved:

The optimal solution of the above model is as follows:

For n=1

Because resources are not used in the first stage, it is R₁=4, R₂=12, R₃=18. Therefore, f₁^*becomes as follows.

notice that:

As (A) and (B), we have:

Considering that in x₁=2, both terms x₁-4.5, 3x₁+30 are maximized, so x₁^*=2 becomes. Then we have (R₁,R₂,R₃)=(4-2,12-0,18-3*2)=(2,12,12) and x₂^*=6 and the total profit will be equal to 36.

Example

The owner of a chain store bought 5 boxes of strawberries to sell in three of his branches. The amount of strawberry sales in these three branches is unknown. Therefore, the store owner wants to allocate 5 boxes to three branches in such a way that the mathematical expectation of the total profit is maximized. It is possible that a store does not have strawberries. The following table shows the mathematical expectation of profit according to the number of strawberry boxes allocated.

Using dynamic programming, determine how to allocate these 5 boxes to three branches so that the mathematical expectation of total profit is maximized.

Solution:

x_n: number of strawberry boxes assigned to branch n.

p_n(x_n): Expected sales profit from allocating x_n boxes to branch n

s_n: Number of remaining boxes to allocate to branches

The number of stage is equal to 3.

s₃=0: In this case, there is no box left for allocation to branch 3, and therefore no profit can be expected.

s₃=1: In this case, one box is left and a profit of 4 units is earned from the allocation of this box.

x₂=0,s₂=0: In this case, there is no box left to be allocated to branch 2, and therefore, no profit is earned in branch 2 and branch 3.

x₂>0,s₂=0: Since there is no box left, it is not possible to assign the box to branch 2 and branch 3, and therefore it is indicated by -.

x₂=1, s₂=1: In this case, one box is left for allocation, and this box is allocated to branch 2, and there is no box left for branch 3, so the resulting profit is equal to 0+6.

x₂=2, s₂=3: In this case, there are 3 boxes left for allocation, which are two boxes for branch 2 and one box for branch 3, so the profit is equal to 4+11=15.

x₂=3, s₂=5: In this case, five boxes are left for allocation, three boxes are allocated to branch 2 and the remaining two boxes are allocated to branch 3. Therefore, the profit is equal to 9+15=24.

This problem has two optimal solutions as follows:

Example:

A student has 7 days a week to prepare for 4 exams. It takes at least one day to prepare each lesson. This student has decided not to study more than one lesson in a day to focus more. Therefore, he can spend one, two, three, or four days for each lesson. How can he prepare himself for the exam of 4 subjects by using dynamic programming. The score of this student for each subject is predicted as follows according to the number of days he studies:

Solution:

x_n: equal to the number of days studied for the n-th course

p_n(x_n): the grade obtained after x_n days of studying the n-th course

s_n: number of days left for study

The number of stage is 4.

s₃=2 and x₃=1: In this case, there are two days left for studying, one day of which is allocated to lesson 3, and the remaining day can be allocated to lesson 4. His expected grade is 4+4=8.

s₃=3 and x₃=1: In this case, there are three days left for studying, one day is used to study the lesson 3 and the remaining two days are allocated to the lesson 4. His expected grade is 4+4=8.

s₃=4 and x₃=2: In this case, there are four days left for studying, two days are used for studying the lesson 3 and the remaining two days are used for the lesson 4. His expected grade is 4+6=10.

s₃=5 and x₃=3: In this case, there are five days left for studying, three days are used for the lesson 3 and the remaining 2 days are for lesson 4. Therefore, his expected grade is equal to 7+4=11

s₂=3 and x₂=1: In this case, there are three days left for studying, one day is considered for the lesson 2. In this case, his expected grade is 5+8=13.

s₂=4 and x₂=2: In this case, there are four days left for studying, and two days are left for the lesson 2, in which case his expected grade is 8+6=14.

Note: Since, the student must allocate at least one day for each lesson, the remaining days for lesson 2 should not be less than three days.

Considering that the lesson 1 is at the beginning of the week, there are seven days left to study.

s₁=7 and x₁=3: In this case, there are seven days left for studying, three days are used for the lesson 1, and the rest are used for the next lessons. then his expected grade is 6+15=21. Based on the above table, the optimal solution is as follows:

Example:

The official of a political party plans the election campaign. For this purpose, he can use the services of five people to work in four areas. If an assistant works in more than one area, his efficiency will decrease, so each assistant is assigned to one area. Also, the areas can remain without assistants. It is estimated that the increase in the number of votes in each area according to the assigned assistants is described in the following table:

Solution:

x_n: the number of active assistants in the n-th field

p_n(x_n): number of votes when the assistant is assigned.

s_n: the number of unassigned assistants up to the nth stage.

The number of stages is 4.

If x₃=0, s₃=1, then there is only one assistant left for areas 3 and 4, which does not belong to area 3 and is assigned only to area 4, and the number of votes is equal to 0+3=3.

If x₃=1, s₃=2, then there are two assistants left for areas 3 and 4, one assistant is assigned to area 3 and one assistant is assigned to area 4. Then the number of votes is equal to 5+3=8.

If x₃=3, s₃=3, then three assistants are left to be assigned to areas 3 and 4, three assistant are assigned to area 3 and no one is assigned to area 4. Then the number of votes is equal to 11+0=11.

If and x₃=2, s₃=5, then five assistants are left to be allocated to areas 3 and 4, two assistants are allocated to area 3 and three assistants are allocated to area 4. Then the number of votes is equal to 12+9=21.

Based on the above tables, the optimal solution is as follows:

Example:

Solve the following linear programming problem using dynamic programming.

Solution:

Consider S=(R₁,R₂) where R_i is the lack of i-th constraint.

The maximum value of f₁(.) occurs at x₁=2 and then f₁^*=50. Thus, the optimal solution is as follows.

Example:

Consider the following problem with initial cost. Solve this problem using dynamic programming.

Solution:

Solution: Consider s=(R₁,R₂) where Ri is the lack of the i-th constraint.

Therefore, the optimal solution is as follows.

Exercise:

Solve the following nonlinear and integer programming problem using dynamic programming.

Solution:

State is the remaining value of the constraint. The number of stage is equal to 2.

The optimal solution is as follows:

Example:

Consider the following integer non-linear programming model. Calculate the optimal value using the dynamic programming method.

Solution:

The state is the remaining value of the constraint and the number of stage is equal to 3. The goal is to find the value of f₁^*(20):

Considering that for all the numbers between 0 and 4, the optimal value and f₃^*(s₃) and x₃^* are the same, so s₃ is shown as an interval of 0 – 4 in the table. If s₃ goes from 5 to 9, then x₃ can be one and the value of f₃^*(s₃) is equal to 20.