Stochastic optimal control to a nonlinear differential game

Basimanebotlhe, Othusitse; Xue, Xiaoping

doi:10.1186/1687-1847-2014-266

Research
Open access
Published: 14 October 2014

Othusitse Basimanebotlhe^{1,2} and Xiaoping Xue^{1}

Advances in Difference Equations volume 2014, Article number: 266 (2014)

Abstract

The paper studies the optimal control of a nonlinear stochastic differential game between two players subject to noisy measurements. A logarithmic transformation of the value function is used to seek a solution of the problem. The conversion of the resulting quasilinear partial differential equation to an ordinary differential equation is considered. Lastly, iterative estimates of the optimal control paths for the minimax differential game are obtained.

1 Introduction

Control theory is a branch of mathematics and engineering with applications in a wide range of fields, such as architecture, communications, queueing theory, robotics and economics; see [1–3], to mention just a few. It remains a subject of considerable current interest. As stated in [4], the general topic of control theory is optimizing a sequence of actions to attain some future goal. The objective of optimal control theory is therefore to attain an optimal regulation of the system evolution [5]. In the absence of noise, continuous-time control problems can be solved in two ways: using the Pontryagin minimum principle (PMP), which leads to a pair of ordinary differential equations, or the Hamilton-Jacobi-Bellman (HJB) equation, which is a partial differential equation, as in [4]. Adding differential equations as constraints to the optimization problem means that, in optimal control theory, the minimum is no longer a single point $x^{*}$ in the state space but a path or trajectory $x^{*} = {(x_{i}^{*})}_{i = 1, \dots, N}$ , known as the optimal trajectory.

In the presence of noise, the PMP formalism has no obvious generalization, as mentioned in [6]. Including noise in the HJB framework, however, is mathematically quite straightforward, although the numerical solution of either the deterministic or the stochastic HJB equation is difficult because of the curse of dimensionality. A control problem is said to be stochastic when the system is subject to disturbances or noise terms, so that its future state is uncertain. Within control theory lies the topic of interest of this paper, namely game theory.

Game theory deals with strategic interactions among several decision makers, known as players. The players' objectives may or may not conflict. When the players' aims do not conflict, the players are said to cooperate, and it suffices to treat their aims as those of a single player, as seen in [7] and [8]. When the players' aims conflict, the game is non-cooperative; this case is more interesting, since each player may have to be analyzed individually or in distinct groups based on the similarity of their objectives. Non-cooperative game theory has been discussed by several authors [9–12].

Nonlinear stochastic optimal control theory is one of the fundamentals of optimization, with a plethora of applications in several domains; see [13] and [14]. The main difficulties associated with the stochastic minimax dynamic games studied here are the presence of noise in the dynamical constraints and the solution of a nonlinear second-order Hamilton-Jacobi-Bellman (HJB) equation, as mentioned in [4].

In this paper a nonlinear stochastic problem is considered as a two-person zero-sum game. One player tries to attain the reward at minimal cost, while the other player tries to maximize the first player's cost, knowing that this is to his own advantage. The problem is modeled as a dynamical system with nonlinear stochastic constraints, and our aim is to find the saddle point of the game after a certain time of play. The two controls act as opponents: the control u is the stabilizer or minimizer, while v is the destabilizing or maximizing control variable. The approach to the solution adopted here, under the conditions assumed in this paper, is similar to that of [15].

2 Problem formulation

Let $(Ω, ℱ, ℙ, ℱ_{t})$ be a complete probability space, and let a time interval $[0, T]$ with $0 < T < \infty$ be given. Assume that on this space an n-dimensional Brownian motion ${\{W (t), ℱ_{t}\}}_{t \in [0, T]}$ is defined, with ${\{ℱ_{t}\}}_{t \in [0, T]}$ the Brownian filtration, as in Guo et al. [16]. The expectation under the probability measure ℙ is denoted by $E^{t}$ , and ℱ is a σ-algebra of subsets of Ω.

Definition 2.1 (Brownian motion)

Brownian motion $\{W (t)\}$ is the stochastic process with the following properties [17].

(i)
At any time $s < t$ , the increment $W (t) - W (s)$ is Gaussian with mean zero and variance $E^{t} [{(W (t) - W (s))}^{2}] = t - s$ ; moreover, the increments associated with disjoint intervals are independent.
(ii)
Its sample path is continuous, that is, the function $t \to W (t)$ is almost surely continuous.
(iii)
Its initial state and time is zero, that is, $W (0) = 0$ .
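
The three properties in Definition 2.1 can be illustrated numerically. The following is a minimal sketch for the one-dimensional case, assuming a uniform time grid; the function names, the grid and the seed are illustrative choices that do not appear in the paper.

```python
import math
import random


def simulate_brownian_path(T, n_steps, rng):
    """Return [W(0), W(dt), ..., W(T)] for a standard 1-D Brownian motion."""
    dt = T / n_steps
    path = [0.0]  # property (iii): W(0) = 0
    for _step in range(n_steps):
        # property (i): independent Gaussian increments, mean 0, variance dt
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def sample_variance_of_increment(T, s_index, t_index, n_steps, n_paths, seed=0):
    """Estimate E[(W(t) - W(s))^2] by averaging over n_paths simulated paths."""
    rng = random.Random(seed)
    total = 0.0
    for _path in range(n_paths):
        w = simulate_brownian_path(T, n_steps, rng)
        total += (w[t_index] - w[s_index]) ** 2
    return total / n_paths
```

With a grid of 20 steps on $[0, 1]$ , the increment between grid indices 5 and 20 spans $t - s = 0.75$ , so the estimate should be close to 0.75 for a large number of paths.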

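Let ${\{x (t, ω)\}}_{t \in [0, T]}$ denote a stochastic process. Then ${\{x (t, ω)\}}_{t \in [0, T]}$ is said to be $ℱ_{t}$ -adapted if, for every t, $x (t, \cdot)$ is $ℱ_{t}$ -measurable, that is, its value is determined by the information available at time t.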

Consider the evolution of the system given by a nonlinear stochastic differential game of two persons,

\begin{aligned} d x (t, ω) &= g (x (t, ω), u (t, ω), v (t, ω), t) d t + σ (x (t, ω)) d W (t) \quad \text{for } t \in [0, T], \\ x (0, ω) &= x_{0} (ω) . \end{aligned}

(1)

Here, the volatility coefficient σ of the noise term depends only on the state trajectory x, as in [18]. The function g depends on the controls (policies), the state trajectory and the time t. The value $x_{0} (ω)$ denotes the initial random state. The problem is formulated as a differential game with two opponents, as in Theodorou [15], with

\begin{aligned} & \min_{u} \max_{v} J (t, x (t, ω), u (x (t, ω)), v (x (t, ω))) \\ & \quad = \min_{u} \max_{v} E^{t} \Bigl\{ ϕ (x (T, ω)) + \int_{0}^{T} e^{- β t} L (t, x (t, ω), u (x (t, ω)), v (x (t, ω))) d t \Bigr\} \end{aligned}

(2)

with

\begin{aligned} & L (t, x (t, ω), u (x (t, ω)), v (x (t, ω))) \\ & \quad = q (x (t, ω)) + \frac{1}{2} u^{T} (x (t, ω)) R (t, ω) u (x (t, ω)) \\ & \qquad - \frac{1}{2} v^{T} (x (t, ω)) S (t, ω) v (x (t, ω)), \end{aligned}

(3)

under stochastic nonlinear dynamical constraints,

\begin{aligned} d x (t, ω) &= \bigl( f (x (t, ω)) + G (ω) u (x (t, ω)) + H (ω) v (x (t, ω)) \bigr) d t \\ & \quad + σ (t, x (t, ω)) d W (t), \end{aligned}

(4)

where

(i)
the function $L : [0, T] \times R^{n} \times E^{p} \times E^{q} \to R$ is the running (immediate) cost, while $ϕ : R^{n} \to R$ is the terminal cost;
(ii)
$x (t, ω) \in L_{2} (Ω, ℱ, ℙ)$ is the n-dimensional random state vector for $t \geq 0$ , where $ω \in Ω$ and Ω is the sample space of the complete probability space;
(iii)
$u (x (t, ω)) \in E^{p}$ and $v (x (t, ω)) \in E^{q}$ are the p-dimensional and q-dimensional random control variables of player one and player two defined respectively in the two metric spaces;
(iv)
the symmetric and positive definite random time varying matrices $R (t, ω)$ and $S (t, ω)$ are matrices associated with the controls of $u (x (t, ω))$ and $v (x (t, ω))$ , respectively;
(v)
$β > 0$ is the discounting factor of the value function.
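
To make the dynamical constraint (4) concrete, the following is a minimal sketch of an Euler-Maruyama discretization in the scalar case ( $n = p = q = 1$ ). The scheme itself, the function name and the argument layout are assumptions made for illustration; the paper does not prescribe a numerical method, and the controls are passed as hypothetical feedback functions of the state.

```python
import math
import random


def euler_maruyama(x0, f, G, H, u, v, sigma, T, n_steps, rng):
    """Simulate one scalar path of dx = (f(x) + G*u(x) + H*v(x)) dt + sigma(x) dW."""
    dt = T / n_steps
    x = x0
    path = [x]
    for _step in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))      # Brownian increment over one step
        drift = f(x) + G * u(x) + H * v(x)      # controlled drift of equation (4)
        x = x + drift * dt + sigma(x) * dW
        path.append(x)
    return path
```

As a sanity check, switching off the noise and both controls with $f (x) = - x$ reduces the scheme to the explicit Euler method for $\dot{x} = - x$ , whose solution at $T = 1$ is $e^{- 1}$ .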

Definition 2.2 (Admissible control)

From [18], a control ${\{u (t), ℱ_{t}\}}_{t \in [0, T]}$ is said to be admissible if

(i)
for every $(t, x)$ the system of SDEs in (1) with initial condition $x (0, ω) = x_{0} (ω)$ admits a pathwise unique strong solution;
(ii)
there exists some function $ϕ : R^{n} \to U$ of class $C^{1, 2}$ such that u is a feedback control with respect to ϕ, that is, $u (t) = ϕ (x (t))$ for every $t \in [0, T]$ .

The two controls $u (x (t, ω))$ and $v (x (t, ω))$ are referred to as the stabilizing and the destabilizing controllers, respectively. The stabilizing controller minimizes the cost function, while the destabilizing controller tries to maximize the cost function.
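
For a single simulated path, the quantity inside the expectation in (2) can be approximated by a Riemann sum. The sketch below assumes a uniform time step and takes the running cost L and the terminal cost ϕ as user-supplied callables, so it does not commit to any particular form of (3); the name `discounted_cost` is hypothetical. Averaging the result over many simulated paths would give a Monte Carlo estimate of J.

```python
import math


def discounted_cost(path, u, v, running_cost, terminal_cost, beta, dt):
    """Left Riemann sum for phi(x_T) + int_0^T exp(-beta t) L(t, x, u(x), v(x)) dt."""
    total = 0.0
    for j, x in enumerate(path[:-1]):
        t = j * dt
        # discounted running cost accumulated over [t_j, t_j + dt]
        total += math.exp(-beta * t) * running_cost(t, x, u(x), v(x)) * dt
    return terminal_cost(path[-1]) + total
```

The minimizer u seeks to make this quantity small on average, while the maximizer v seeks to make it large.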

Considering the nonlinear stochastic dynamical constraints, let

(i)
$f : [0, T] \times R^{n} \to R^{n}$ and $σ : [0, T] \times R^{n} \to R^{n \times m}$ be bounded and continuous functions;
(ii)
$G (ω)$ and $H (ω)$ be $n \times p$ and $n \times q$ random matrices.

3 Conditions

The following standard regularity conditions are assumed throughout the paper, as in [15] and [19].

(i)
The functions $f (x)$ , $L (t, x, u, v)$ are continuously differentiable in $(t, x, u, v) \in [0, T] \times R^{n} \times E^{p} \times E^{q}$ and ϕ is twice differentiable in $x \in R^{n}$ .
(ii)
J and ℒ are nonnegative functions.
(iii)
$V (x)$ , $\nabla_{x} V$ , ℒ, and ϕ are bounded, where V is the value of the game that each control tries to optimize.
(iv)
The controls are bounded in $R^{m}$ and belong to the spaces
$\begin{matrix} L_{F}^{2} : = \{ u : {∥ u (x (t, ω)) ∥}_{2} < \infty \}, \\ L_{F}^{\infty} : = \{ u : {∥ u (x (t, ω)) ∥}_{\infty} < \infty \}, \end{matrix}$

where

\begin{matrix} {∥ u (x (t, ω)) ∥}_{2} = E^{t} \Bigl( \int_{0}^{T} | u (x (t, ω)) | d t \Bigr), \\ {∥ v (x (t, ω)) ∥}_{\infty} = E^{t} \bigl\{ \sup | v (x (t, ω)) | : t \in [0, T] \bigr\} \end{matrix}

for v continuous. The conditions imposed on u under (iv) apply to v as well.

4 Approach to the solution of stochastic optimal controls

Our approach in finding the optimal control is based on the definition of the saddle point given below as in [20] with slight changes to suit our problem.

Definition 4.1

(i)
The pair $(u^{*} (x (t, ω)), v^{*} (x (t, ω))) \in U_{1} \times U_{2}$ is optimal, that is, it is a saddle point of the game over the interval $[0, T]$ with respect to $x (t, ω) \in R^{n}$ , if
$\begin{matrix} J (t, x (t, ω), u^{*} (x (t, ω)), v (x (t, ω))) \\ \leq J (t, x (t, ω), u^{*} (x (t, ω)), v^{*} (x (t, ω))) \\ \leq J (t, x (t, ω), u (x (t, ω)), v^{*} (x (t, ω))) \end{matrix}$

for all $u (x (t, ω)) \in U_{1}$ and $v (x (t, ω)) \in U_{2}$ , where $U_{1}$ and $U_{2}$ are nonempty sets of admissible controls.

(ii)
The upper value of the game at any path $x (t, ω)$ and time $t \in [0, T]$ is defined by
$V^{*} (x (t, ω)) = \inf_{u \in U_{1}} \sup_{v \in U_{2}} J (t, x (t, ω), u (x (t, ω)), v (x (t, ω))),$

and the lower value of the game is

V_{*} (x (t, ω)) = \sup_{v \in U_{2}} \inf_{u \in U_{1}} J (t, x (t, ω), u (x (t, ω)), v (x (t, ω))) .

If the upper and lower values coincide, the game is said to have a value, denoted by

V^{*} (x (t, ω)) = V_{*} (x (t, ω)) \equiv V (x (t, ω)) .

The objective is to find the optimal admissible controls, $u^{*} (x (t, ω)) \in U_{1}$ and $v^{*} (x (t, ω)) \in U_{2}$ , such that $V (x (t, ω), t)$ satisfies Definition 4.1 for $U_{1} \subset E^{p}$ and $U_{2} \subset E^{q}$ .
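
Definition 4.1 has a simple finite analogue that may help fix ideas. In the sketch below the continuous control sets are replaced by finitely many choices, so the cost becomes a hypothetical table `J[i][k]` in which the row index i plays the role of the minimizer u and the column index k that of the maximizer v. This is only an illustration of the inequalities and of the upper and lower values, not the game studied in the paper.

```python
def upper_value(J):
    """min over rows (minimizer u) of the max over columns (maximizer v)."""
    return min(max(row) for row in J)


def lower_value(J):
    """max over columns (maximizer v) of the min over rows (minimizer u)."""
    n_cols = len(J[0])
    return max(min(row[k] for row in J) for k in range(n_cols))


def saddle_points(J):
    """All (i, k) with J[i][v] <= J[i][k] <= J[u][k] for every u and v."""
    points = []
    for i, row in enumerate(J):
        for k, value in enumerate(row):
            is_row_max = all(value >= other for other in row)   # v cannot raise the cost
            is_col_min = all(value <= r[k] for r in J)          # u cannot lower the cost
            if is_row_max and is_col_min:
                points.append((i, k))
    return points
```

For the table `[[1, 3], [0, 2]]` the upper and lower values are both 2 and the pair (1, 1) is a saddle point, whereas for `[[0, 1], [1, 0]]` the upper value 1 exceeds the lower value 0 and no saddle point exists in these pure choices.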

Theorem 4.1 (Bellman principle of optimality)

If $u^{*} (x (t, ω))$ is optimal over the interval $[0, T]$ starting at an initial state $x_{0} (ω)$ , then $u^{*} (x (t, ω))$ is necessarily optimal over the subinterval $[t, t + d t]$ for any dt such that $T - t \geq d t > 0$ .

For the proof of the above theorem, refer to [15].

Applying Theorem 4.1 and Definition 4.1 to the value of the game $V (x (t, ω))$ , we have that

\begin{aligned} V (x (t, ω)) &= \min_{u} \max_{v} E^{t} \Bigl\{ ϕ (x (T, ω)) + \int_{t}^{t + d t} e^{- β τ} L (τ, x (τ, ω), u (x (τ, ω)), v (x (τ, ω))) d τ \Bigr\} \\ &= E^{t} \bigl\{ L (t, x (t, ω), u^{*} (x (t, ω)), v^{*} (x (t, ω))) d t + e^{- β d t} V (x (t + d t, ω)) \bigr\} \\ &= L (t, x (t, ω), u^{*} (x (t, ω)), v^{*} (x (t, ω))) d t + (1 - β d t) E^{t} [V (x (t + d t, ω))] . \end{aligned}

(5)

We need to calculate the expectation of the function $V (x (t + d t, ω))$ . Approximating the function $V (x (\cdot, ω))$ using Taylor’s formula, we have

\begin{aligned} V (x (t + d t, ω)) &= V (x (t, ω)) + V^{'} (x (t, ω)) [x (t + d t, ω) - x (t, ω)] \\ & \quad + \frac{1}{2} V^{″} (x (t, ω)) {[x (t + d t, ω) - x (t, ω)]}^{2} + \cdots . \end{aligned}

Ignoring the terms of higher powers and letting $d x (t, ω) = x (t + d t, ω) - x (t, ω)$ , we get

V (x (t + d t, ω)) = V (x (t, ω)) + V^{'} (x (t, ω)) [d x (t, ω)] + \frac{1}{2} V^{″} (x (t, ω)) {[d x (t, ω)]}^{2} .

(6)

Substituting the stochastic equation (4) into equation (6) and applying Itô's lemma, we obtain the function $V (x (t + d t, ω))$ as

\begin{aligned} V (x (t + d t, ω)) &= V (x (t, ω)) + \Bigl[ \bigl( f (x (t, ω)) + G (ω) u (x (t, ω)) + H (ω) v (x (t, ω)) \bigr) V^{'} (x (t, ω)) \\ & \qquad + \frac{1}{2} σ^{2} (x (t, ω), t) V^{″} (x (t, ω)) \Bigr] d t \\ & \quad + σ (x (t, ω), t) V^{'} (x (t, ω)) d W (t) . \end{aligned}

(7)

Taking the expectation of equation (7), we have

\begin{aligned} E^{t} [V (x (t + d t, ω))] &= V (x (t, ω)) + \Bigl[ \bigl( f (x (t, ω)) + G (ω) u (x (t, ω)) + H (ω) v (x (t, ω)) \bigr) V^{'} (x (t, ω)) \\ & \qquad + \frac{1}{2} σ^{2} (x (t, ω), t) V^{″} (x (t, ω)) \Bigr] d t . \end{aligned}

(8)

Substituting equation (8) into equation (5) yields

\begin{aligned} β V (x (t, ω)) &= L (t, x (t, ω), u^{*} (x (t, ω)), v^{*} (x (t, ω))) \\ & \quad + \bigl[ f (x (t, ω)) + G (ω) u (x (t, ω)) + H (ω) v (x (t, ω)) \bigr] V^{'} (x (t, ω)) \\ & \quad + \frac{1}{2} \operatorname{Tr} [σ σ^{T} (x (t, ω), t) V^{″} (x (t, ω))] . \end{aligned}

(9)
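
The step from (5) and (8) to (9) can be spelled out as follows. This is a sketch that assumes the first-order approximation $e^{- β d t} \approx 1 - β d t$ , suppresses all arguments, and abbreviates the bracket in (8) by $\mathcal{A} V$ , a shorthand introduced here for illustration only.

```latex
\begin{aligned}
\mathcal{A}V &:= (f + G u + H v)\,V' + \tfrac{1}{2}\,\sigma^{2} V'', \\
V &= L\,dt + (1 - \beta\,dt)\bigl(V + \mathcal{A}V\,dt\bigr) \\
  &= L\,dt + V + \mathcal{A}V\,dt - \beta V\,dt - \beta\,\mathcal{A}V\,dt^{2}, \\
\beta V\,dt &= L\,dt + \mathcal{A}V\,dt + O(dt^{2}), \\
\beta V &= L + \mathcal{A}V \qquad (dt \to 0).
\end{aligned}
```

The last line is equation (9), with the scalar term $σ^{2} V^{″}$ written as a trace in the multidimensional case.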

The above equation is a Bellman equation similar to the one in [21]; it is a parabolic differential equation that admits simple solutions only for some simple processes and utility functions. Because solving the Bellman equation is not always easy, in this paper we adopt the idea of [22] rather than solving it directly. From the Bellman equation we can solve for the optimal values $u (x (t, ω)) \in U_{1}$ and $v (x (t, ω)) \in U_{2}$ by taking the derivative with respect to $u (x (t, ω))$ and $v (x (t, ω))$ ,

\begin{matrix} 0 = L_{u} + G^{T} (ω) V_{x} (x (t, ω), t), \\ R (t, ω) u (x (t, ω)) = - G^{T} (ω) V_{x} (x (t, ω), t), \\ u^{*} (x (t, ω)) = - R^{- 1} (t, ω) G^{T} (ω) V_{x} (x (t, ω), t) . \end{matrix}

(10)

As for the maximizer $v (x (t, ω))$ , we have

\begin{matrix} 0 = L_{v} + H^{T} (ω) V_{x} (x (t, ω), t), \\ S (t, ω) v (x (t, ω)) = H^{T} (ω) V_{x} (x (t, ω), t), \\ v^{*} (x (t, ω)) = S^{- 1} (t, ω) H^{T} (ω) V_{x} (x (t, ω), t) . \end{matrix}

(11)
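
The first-order conditions (10) and (11) can be checked numerically in the scalar case. The sketch below assumes quadratic control weights $\frac{1}{2} R u^{2}$ and $\frac{1}{2} S v^{2}$ , which is what (10) and (11) imply for $L_{u}$ and $L_{v}$ ; the function names are hypothetical. It evaluates only the control-dependent part of the right-hand side of (9), which u should minimize and v should maximize.

```python
def optimal_controls(V_x, G, H, R, S):
    """Scalar version of (10) and (11): u* = -G V_x / R and v* = H V_x / S."""
    u_star = -G * V_x / R
    v_star = H * V_x / S
    return u_star, v_star


def control_terms(u, v, V_x, G, H, R, S):
    """Control-dependent part of the right-hand side of (9), scalar case."""
    # quadratic weights (assumed form) plus the controlled drift times V_x
    return 0.5 * R * u * u - 0.5 * S * v * v + (G * u + H * v) * V_x
```

Perturbing u away from `u_star` should not decrease `control_terms`, and perturbing v away from `v_star` should not increase it, which is the saddle-point property in miniature.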

Substituting the values of $u^{*} (x (t, ω))$ and $v^{*} (x (t, ω))$ into ℒ in equation (9) and collecting like terms yields the expression

\begin{aligned} β V (x (t, ω)) &= q (x (t, ω)) \\ & \quad - \frac{1}{2} V_{x}^{T} (x (t, ω), t) G (ω) R^{- 1} (t, ω) G^{T} (ω) V_{x} (x (t, ω)) \\ & \quad + \frac{1}{2} V_{x}^{T} (x (t, ω), t) H (ω) S^{- 1} (t, ω) H^{T} (ω) V_{x} (x (t, ω)) \\ & \quad + V_{x}^{T} (x (t, ω), t) f (x (t, ω)) \\ & \quad + \frac{1}{2} \operatorname{Tr} [V_{x x} (x (t, ω)) σ (x (t, ω)) σ^{T} (x (t, ω))] . \end{aligned}

(12)

Equation (12) is a nonlinear second-order partial differential equation (PDE); solving it is challenging because it is nonlinear and high-dimensional. As assumed in [13], there is a connection between the controls and the variance of the Brownian noise. Considering the difference in our control weights, we have the following cases:

(i)
$H S^{- 1} H^{T} - G R^{- 1} G^{T} < 0$ implies that more weight is on the minimizing control than on the maximizing control variable.
(ii)
$H S^{- 1} H^{T} - G R^{- 1} G^{T} > 0$ implies more weight on the maximizing control than on the minimizing control variable.
(iii)
$H S^{- 1} H^{T} - G R^{- 1} G^{T} = 0$ , the weights of the controls are equivalent, hence it is an ideal situation for a minimax optimal control.

The intuition we get from [13] is that the higher the variance, the lower the weight of the controls, hence 'cheap' controls, and vice versa. In our case we want to strike a balance such that both players attain their optima. The variance of the Brownian noise here is given by $σ σ^{T} > 0$ ; we therefore require $λ (t) [G R^{- 1} G^{T} - H S^{- 1} H^{T}] = σ σ^{T}$ for all $x \in R^{n}$ and $t \in [0, T]$ , so that the difference of the control coefficients, scaled by $λ (t)$ , equals the variance of the noise. Our assumption on the balancing parameter differs from the one used by other authors, as in [13] and [15], where the balancing term is a constant. In our case, the balancing variable $λ (t)$ depends on t, so that the equality holds at every time instant even though the variance terms vary with time.
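
In the scalar case the balancing condition determines $λ (t)$ explicitly once the other quantities are known at time t. The following is a minimal sketch under that assumption; the function name is hypothetical, and the check on the sign of the weight gap reflects the requirement $σ σ^{T} > 0$ .

```python
def balancing_lambda(sigma, G, R, H, S):
    """Scalar lambda solving lambda * (G^2/R - H^2/S) = sigma^2."""
    weight_gap = G * G / R - H * H / S
    if weight_gap <= 0.0:
        # a positive noise variance cannot be matched by a positive lambda
        raise ValueError("need G R^-1 G^T - H S^-1 H^T > 0 for a positive lambda")
    return sigma * sigma / weight_gap
```

For example, with $σ = 2$ , $G = 2$ and $R = H = S = 1$ the weight gap is 3 and the sketch returns $λ = 4 / 3$ . Calling it at each time instant with the current values gives a time-dependent $λ (t)$ .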

Suppose that

V (x (t, ω)) = - λ (t) log Φ (x (t, ω)) .

(13)

We determine all the partial derivatives of the new value function given in equation (13),

V_{x} (x (t, ω)) = - λ (t) \frac{1}{Φ (x (t, ω))} Φ_{x} (x (t, ω))

(14)

and

V_{x x} (x (t, ω)) = - λ (t) \frac{Φ_{x x} (x (t, ω)) Φ (x (t, ω)) - Φ_{x} (x (t, ω)) Φ_{x}^{T} (x (t, ω))}{Φ^{2} (x (t, ω))} .

(15)

Therefore, substituting (13), (14) and (15) into the nonlinear PDE given in (12), and using the assumption that $λ (t) [G R^{- 1} G^{T} - H S^{- 1} H^{T}] = σ σ^{T}$ for all $t \in [0, T]$ , we have

\begin{aligned} β Φ (x (t, ω)) \log Φ (x (t, ω)) &= - \frac{1}{λ (t)} Φ (x (t, ω)) q (x (t, ω)) \\ & \quad + Φ_{x}^{T} (x (t, ω)) f (x (t, ω)) \\ & \quad + \frac{1}{2} \operatorname{Tr} [Φ_{x x} (x (t, ω)) σ (x (t, ω)) σ^{T} (x (t, ω))], \end{aligned}

(16)

which yields a second order quasilinear PDE with the boundary condition given as

Φ (x (T, ω)) = exp (- \frac{1}{λ (T)} ϕ (x (T, ω))) .

(17)
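
The cancellation behind (16) can be written out term by term. The sketch below suppresses arguments, treats $λ$ as independent of x, and uses the balancing assumption $λ [G R^{- 1} G^{T} - H S^{- 1} H^{T}] = σ σ^{T}$ ; it is an expanded reading of the substitution described above rather than an additional result.

```latex
\begin{aligned}
V_x &= -\lambda\,\frac{\Phi_x}{\Phi}, \qquad
V_{xx} = -\lambda\left(\frac{\Phi_{xx}}{\Phi} - \frac{\Phi_x \Phi_x^{T}}{\Phi^{2}}\right), \\
-\tfrac{1}{2}\,V_x^{T}\bigl(G R^{-1} G^{T} - H S^{-1} H^{T}\bigr)V_x
  &= -\frac{\lambda}{2\Phi^{2}}\,\Phi_x^{T}\sigma\sigma^{T}\Phi_x, \\
\tfrac{1}{2}\operatorname{Tr}\bigl[V_{xx}\,\sigma\sigma^{T}\bigr]
  &= -\frac{\lambda}{2\Phi}\operatorname{Tr}\bigl[\Phi_{xx}\,\sigma\sigma^{T}\bigr]
     + \frac{\lambda}{2\Phi^{2}}\,\Phi_x^{T}\sigma\sigma^{T}\Phi_x, \\
-\beta\lambda\log\Phi
  &= q - \frac{\lambda}{\Phi}\,\Phi_x^{T} f
     - \frac{\lambda}{2\Phi}\operatorname{Tr}\bigl[\Phi_{xx}\,\sigma\sigma^{T}\bigr].
\end{aligned}
```

The two quadratic terms in $Φ_{x}$ cancel, and multiplying the last line by $- Φ / λ$ gives (16). Evaluating $V = - λ \log Φ$ at $t = T$ , where $V = ϕ$ , gives the boundary condition (17).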

If the solution $Φ (x (t, ω))$ is found to exist for equation (16), then we have the results given below.

Theorem 4.2 If $Φ (x (t, ω))$ satisfies equation (16), then the transformed control optimums are given as

u^{*} (x (t, ω)) = λ (t) R^{- 1} (t, ω) G^{T} (ω) \frac{Φ_{x} (x (t, ω))}{Φ (x (t, ω))}

and

v^{*} (x (t, ω)) = - λ (t) S^{- 1} (t, ω) H^{T} (ω) \frac{Φ_{x} (x (t, ω))}{Φ (x (t, ω))}

for the value

V (x (t, ω)) = - λ (t) \log Φ (x (t, ω)),

where $λ (t)$ satisfies

λ (t) [G R^{- 1} G^{T} - H S^{- 1} H^{T}] = σ σ^{T}, \forall x \in R^{n}, t \in [0, T] .
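
The controls in Theorem 4.2 can be cross-checked against (10) and (11) in the scalar case. The sketch below assumes $V = - λ \log Φ$ as in (13) and compares the transformed controls with those computed from a finite-difference approximation of $V_{x}$ ; the function names and the test function $Φ (x) = e^{- x^{2}}$ are illustrative choices, not taken from the paper.

```python
import math


def transformed_controls(Phi, Phi_x, lam, G, H, R, S):
    """Scalar form of Theorem 4.2: controls expressed through Phi and Phi_x."""
    ratio = Phi_x / Phi
    u_star = lam * G * ratio / R
    v_star = -lam * H * ratio / S
    return u_star, v_star


def value_gradient_fd(Phi_func, x, lam, h=1e-6):
    """Central finite difference of V(x) = -lam * log(Phi(x))."""
    V = lambda y: -lam * math.log(Phi_func(y))
    return (V(x + h) - V(x - h)) / (2.0 * h)
```

With $Φ (x) = e^{- x^{2}}$ and $λ = 2$ one has $V (x) = 2 x^{2}$ , so the transformed controls should agree with $- R^{- 1} G V_{x}$ and $S^{- 1} H V_{x}$ up to the finite-difference error.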

Observe that $u^{*} (x (t, ω))$ now carries a positive sign while $v^{*} (x (t, ω))$ carries a negative sign; this is because the transformation has turned the minimax problem into a maxmin problem. The PDE in (16) is difficult to solve in the two independent variables x and t, so in this paper we transform it into an ODE, for which a solution can be obtained in most cases. Consider a one-dimensional problem, that is, $n = 1$ , and fix t, so that the equation depends on x only. This leads to a nonlinear ODE. Before solving it, we make the following assumptions.

(A:1)

(i)
$Φ (x (t, ω), t)$ , $f (x (t, ω))$ and $q (x (t, ω))$ are nonnegative functions.
(ii)
$Φ (x (t, ω), t)$ is Lipschitz continuous for all $(t, x) \in ([0, T] \times R)$ and $ω \in Ω$ .
(iii)
$f (x (t, ω))$ and $q (x (t, ω))$ are also continuous and bounded functions for all $x \in R$ .

Let

σ (x (t, ω)) σ^{T} (x (t, ω)) = θ (x (t, ω)) > 0 .

Multiplying throughout by $θ^{- 1} (x (t, ω))$ , we have

\begin{aligned} \frac{d^{2} Φ (x (t, ω), t)}{d x^{2}} + \tilde{f} (x) \frac{d Φ (x (t, ω), t)}{d x} &= \frac{2}{λ} \tilde{q} (x) Φ (x (t, ω), t) \\ & \quad + r (x) Φ (x (t, ω), t) \log Φ (x (t, ω), t), \end{aligned}

(18)

where

\begin{matrix} \tilde{f} (x (t, ω)) = 2 f (x (t, ω)) θ^{- 1} (x (t, ω)), \\ \tilde{q} (x (t, ω)) = q (x (t, ω)) θ^{- 1} (x (t, ω)) \end{matrix}

and

r (x (t, ω)) = β θ^{- 1} (x (t, ω)) .

To simplify the notation, we write $U = Φ (x (t, ω), t)$ and $V = \frac{d Φ (x (t, ω), t)}{d x}$ .

This yields the following first-order system of ODEs, in which the dot denotes differentiation with respect to x:

\begin{cases} \dot{U} = V, \\ \dot{V} = - \tilde{f} (x) V + \frac{2}{λ} \tilde{q} (x) U + r (x) U \log U, \end{cases}

(19)

which defines the vector field

F (x, U, V) = \begin{pmatrix} V \\ - \tilde{f} (x) V + \frac{2}{λ} \tilde{q} (x) U + r (x) U \log U \end{pmatrix} .

(20)
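
Once the coefficient functions are specified, the first-order system (19) can be integrated numerically from given initial data. The sketch below uses the classical fourth-order Runge-Kutta scheme, which is a choice made here for illustration and not a method from the paper; `f_tilde`, `q_tilde` and `r` are hypothetical callables standing for $\tilde{f}$ , $\tilde{q}$ and r, and U must stay positive because of the term $U \log U$ .

```python
import math


def rhs(x, U, V, f_tilde, q_tilde, r, lam):
    """Right-hand side F(x, U, V) of system (19); requires U > 0."""
    dU = V
    dV = -f_tilde(x) * V + (2.0 / lam) * q_tilde(x) * U + r(x) * U * math.log(U)
    return dU, dV


def integrate_rk4(x0, x1, U0, V0, f_tilde, q_tilde, r, lam, n_steps):
    """Integrate (19) from x0 to x1 with the classical Runge-Kutta scheme."""
    h = (x1 - x0) / n_steps
    x, U, V = x0, U0, V0
    args = (f_tilde, q_tilde, r, lam)
    for _step in range(n_steps):
        k1 = rhs(x, U, V, *args)
        k2 = rhs(x + h / 2, U + h / 2 * k1[0], V + h / 2 * k1[1], *args)
        k3 = rhs(x + h / 2, U + h / 2 * k2[0], V + h / 2 * k2[1], *args)
        k4 = rhs(x + h, U + h * k3[0], V + h * k3[1], *args)
        U += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        V += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        x += h
    return U, V
```

In the special case $\tilde{f} = 0$ , $r = 0$ and $\frac{2}{λ} \tilde{q} = 1$ the system reduces to $\ddot{U} = U$ , whose solution with $U (0) = 1$ and $\dot{U} (0) = 0$ is $\cosh x$ , which gives a simple way to validate the integrator.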

Given the following conditions:

(A:2)

(i)
$U = U_{1} \times U_{2} \subset R^{m}$ is a compact (hence bounded) set.
(ii)
$I \subset R$ is bounded.
(iii)
$H = [a, b]$ with $a > 0$ and $b > 0$ , and
$F : U \times I \times H \to R^{2}$ , where $R^{2}$ carries the ${∥ \cdot ∥}_{ℓ_{1}}$ norm.

By the Lipschitz condition in (A:1), we have

\begin{aligned} | F (x, U, V) - F (x, \tilde{U}, \tilde{V}) | & \leq | U - \tilde{U} | + | \tilde{f} (x) | | V - \tilde{V} | \\ & \quad + \frac{2}{| λ |} | \tilde{q} (x) | | U - \tilde{U} | + | r (x) | | U \log U - \tilde{U} \log \tilde{U} |, \end{aligned}

(21)

and, by the mean value theorem applied to $U \mapsto U \log U$ , we know that

\begin{aligned} | U \log U - \tilde{U} \log \tilde{U} | &= (\log (ξ) + 1) | U - \tilde{U} | \quad \text{for some } ξ \in [a, b] \\ & \leq \max_{ξ \in [a, b]} (1 + \log ξ) | U - \tilde{U} | . \end{aligned}

(22)
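
The bound (22) can be checked numerically on a grid. The sketch below takes the absolute value of $1 + \log ξ$ , an assumption added here so that the check also covers intervals with $a < 1 / e$ , where $1 + \log ξ$ is negative; since $1 + \log ξ$ is monotone, the maximum of its absolute value over $[a, b]$ is attained at an endpoint. The function names are hypothetical.

```python
import math


def lipschitz_constant(a, b):
    """max over [a, b] of |1 + log(xi)|, attained at one of the endpoints."""
    return max(abs(1.0 + math.log(a)), abs(1.0 + math.log(b)))


def bound_holds(a, b, n_grid=50):
    """Check |U log U - W log W| <= C |U - W| for all grid pairs in [a, b]."""
    C = lipschitz_constant(a, b)
    pts = [a + (b - a) * i / (n_grid - 1) for i in range(n_grid)]
    g = lambda s: s * math.log(s)
    # small tolerance absorbs floating-point rounding
    return all(abs(g(U) - g(W)) <= C * abs(U - W) + 1e-12 for U in pts for W in pts)
```

On $[1, e]$ the constant equals 2, and the inequality holds for every pair of grid points.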

Writing

X = \begin{pmatrix} U \\ V \end{pmatrix},

the system (19) becomes the initial value problem

\begin{cases} \dot{X} = F (x, X), \\ X (x_{0} (ω)) = X_{0} (ω) \in I_{0} \times H_{0} . \end{cases}

(23)

Since F is Lipschitz continuous in $(U, V)$ by (21) and (22), a solution of (23) exists, with the terminal condition given by

Φ (x_{0} (ω), t) = exp (- \frac{1}{λ (t)} ϕ (x_{0} (ω))) .

(24)

In summary we have the following results.

Theorem 4.3 Consider a special case for the equation

\begin{aligned} β Φ (x (t, ω)) \log Φ (x (t, ω)) &= - \frac{1}{λ (t)} Φ (x (t, ω)) q (x (t, ω)) \\ & \quad + Φ_{x}^{T} (x (t, ω)) f (x (t, ω)) \\ & \quad + \frac{1}{2} \operatorname{Tr} [Φ_{x x} (x (t, ω)) σ (x (t, ω)) σ^{T} (x (t, ω))] \end{aligned}

for a one-dimensional problem and for

Φ (x (t, ω), t) = Φ (x (t, ω)) .

Then, assuming that (A:1) and (A:2) hold, at least one solution exists.

The solution of (23) is not necessarily unique; to attain uniqueness, further boundary conditions must be imposed on the ODE. For a one-dimensional problem at least one solution exists, whereas for $n \geq 2$ the equation is a PDE, which is difficult to solve.

4.1 Iterative optimal control estimates

From Theorem 4.3, consider the estimated value function to be given as

\begin{aligned} Φ (x (t_{j + 1}, ω)) &= \int_{\partial ϒ} ρ (ϒ | x_{j}) \exp \Bigl( - \frac{1}{λ (T)} ϕ (x (T, ω)) \Bigr) \\ & \quad \times \exp \Bigl( - \int_{t_{j}}^{t_{N - 1}} (| {\tilde{f}}_{j} | + | {\tilde{Q}}_{j} |) d t \Bigr) d ϒ, \end{aligned}

(25)

where

ϒ = (x (t_{j}, ω), x (t_{j + 1}, ω), \dots, x (t_{N - 1}, ω))

and

d ϒ = (d x (t_{j}, ω) d x (t_{j + 1}, ω) \dots d x (t_{N - 1}, ω)) .

The expectation of the value function is driven by stochastic differential equation (19). The function $ρ (ϒ | x_{j})$ in equation (25) is the probability density function of the transitions, and the function ${\tilde{Q}}_{j}$ will be defined later.

Because of the noise in the problem, future paths and future control values cannot be known with certainty. We can nevertheless estimate the future paths, and hence the future control values, in order to attain the optima, since the controls depend on the path.

The continuous time interval is divided into small equal subintervals, giving a discrete path, on the assumption that this does not distort the trajectory in any way; that is, let

x_{j + 1} (ω) - x_{j} (ω) = x (t_{j + 1}, ω) - x (t_{j}, ω) \quad \text{for all } t_{j} \in [ϵ, T - ϵ], \ ϵ \to 0 .

Suppose the transition between the paths is given by

\begin{aligned} ρ (ϒ | x_{j}) &= ρ (x_{N - 1}, \dots, x_{j + 1} | x_{j}) \\ &= \prod_{j = 0}^{N - 1} ρ (x_{j + 1} | x_{j}), \end{aligned}

(26)

where the $x_{j}$ are independent and identically distributed and $j = 0$ is the initial state. The above equation is the joint probability density function of the sample path from $x_{j}$ to $x_{N - 1}$ . The transitions of the sample paths are Markovian, as they depend solely on the current state $x_{j}$ at time $t_{j}$ . Following [15] closely, we take the noise term to be Gaussian with mean zero and variance $θ (x) = σ σ^{T}$ , as given earlier. Therefore,

ρ (x_{j + 1} | x_{j}) = \frac{exp (- \frac{1}{2} \sum_{j = 0}^{N - 1} {∥ \frac{x_{j + 1} (ω) - x_{j} (ω)}{δ_{j}} - g_{j} (ω) ∥}^{2} δ_{j} θ_{j}^{- 1} (ω))}{\prod_{j = 0}^{N - 1} {((2 π) | θ_{j} (ω) |)}^{\frac{1}{2}}}

(27)

for $δ_{j} = t_{j + 1} - t_{j}$ , which is the change in time t.
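
A single factor of the product in (26) can be sketched in the scalar case as the Gaussian density of one Euler step. The exponent below coincides with the summand in the exponent of (27); the normalizing constant is written as $\sqrt{2 π θ δ}$ , which includes the step length so that the density integrates to one, whereas (27) as printed shows only $| θ_{j} |$ . The function names are hypothetical.

```python
import math


def transition_density(x_next, x, g, theta, delta):
    """Density of x_next given x for one Euler step: Normal(x + g*delta, theta*delta)."""
    velocity_gap = (x_next - x) / delta - g
    exponent = -0.5 * velocity_gap ** 2 * delta / theta   # summand in the exponent of (27)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * theta * delta)


def path_log_density(path, g, theta, delta):
    """Log of the product of one-step densities along a path (Markov property)."""
    return sum(
        math.log(transition_density(path[j + 1], path[j], g(path[j]), theta(path[j]), delta))
        for j in range(len(path) - 1)
    )
```

Working with the logarithm of the product avoids numerical underflow when the path has many steps.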

Hence we have the following results given as a lemma.

Lemma 4.1 From both Theorem 4.2 and Theorem 4.3, and assuming that the transitions are given by equation (27), we give the iterative optimal controls as

u^{*} (ω) = - R_{j}^{- 1} (ω) G^{T} (ω) \frac{exp (A_{j} (ω) + B_{j} (ω))}{\int_{\partial ϒ} exp (A_{j} (ω) + B_{j} (ω)) d ϒ}

and

v^{*} (ω) = - S_{j}^{- 1} (ω) H^{T} (ω) \frac{exp (A_{j} (ω) + B_{j} (ω))}{\int_{\partial ϒ} exp (A_{j} (ω) + B_{j} (ω)) d ϒ}

for the estimated value function

\begin{aligned} Φ (x (t_{j + 1}, ω)) &= \int_{\partial ϒ} ρ (ϒ | x_{j}) \exp \Bigl( - \frac{1}{λ (T)} ϕ (x (T, ω)) \Bigr) \\ & \quad \times \exp \Bigl( - \sum_{j = 0}^{N - 1} (| {\tilde{f}}_{j} | + | {\tilde{Q}}_{j} |) d t \Bigr) d ϒ, \end{aligned}

where

A_{j} (ω) = - \frac{1}{2} \sum_{j = 0}^{N - 1} {∥ \frac{x_{j + 1} (ω) - x_{j} (ω)}{δ_{j}} - g_{j} (ω) ∥}^{2} δ_{j} θ_{j}^{- 1} (ω)

and

B_{j} (ω) = \sum_{j = 0}^{N - 1} (| {\tilde{f}}_{j} | + | {\tilde{Q}}_{j} |) θ_{j}^{- 1} (ω) δ_{j} .

Proof From Theorem 4.2, suppose that the solution is given as an estimated iterative value function in equation (25). Consider the discrete paths of the optimal trajectory given as

\begin{aligned} Φ (x (t_{j + 1}, ω)) &= \int_{\partial ϒ} ρ (ϒ | x_{j}) \exp \Bigl( - \frac{1}{λ (T)} ϕ (x (T, ω)) \Bigr) \\ & \quad \times \exp \Bigl( - \int_{t_{j}}^{t_{N - 1}} (| {\tilde{f}}_{j} | + | {\tilde{Q}}_{j} |) d t \Bigr) d ϒ . \end{aligned}

(28)

Now, substituting equation (27) into equation (25), we have

\begin{aligned} Φ (x (t_{j}, ω), t) &= \int_{\partial ϒ} \frac{\exp (- \frac{1}{λ (T)} ϕ (x (T, ω)))}{\prod_{j = 0}^{N - 1} {((2 π) | θ_{j} |)}^{\frac{1}{2}}} \\ & \quad \times \exp \Bigl( - \frac{1}{2} \sum_{j = 0}^{N - 1} \frac{1}{λ_{j}} {∥ \frac{x_{j + 1} - x_{j}}{δ_{j}} - g_{j} ∥}^{2} δ_{j} θ_{j}^{- 1} \Bigr) \\ & \quad \times \exp \Bigl( \sum_{j = 0}^{N - 1} (| {\tilde{f}}_{j} | + | {\tilde{Q}}_{j} |) δ_{j} \Bigr) d ϒ \end{aligned}

(29)

for

| \tilde{Q} (x) | = C | r (x) | + (1 + \frac{2}{λ} | \tilde{q} (x) |),

where

C = \max_{ξ \in [a, b]} (1 + \log ξ) .

We know that

\tilde{f} (x) = 2 f (x) θ^{- 1} (x), \tilde{q} (x) = q (x) θ^{- 1} (x)

and

r (x) = β θ^{- 1} (x) .

Therefore, we have

\begin{aligned} Φ (x (t_{j}, ω)) &= \int_{\partial ϒ} \frac{\exp (- \frac{1}{λ (T)} ϕ (x (T, ω)))}{\prod_{j = 0}^{N - 1} {((2 π) | θ_{j} |)}^{\frac{1}{2}}} \\ & \quad \times \Bigl[ \exp \Bigl( - \frac{1}{2} \sum_{j = 0}^{N - 1} \frac{1}{λ_{j}} {∥ \frac{x_{j + 1} - x_{j}}{δ_{j}} - g_{j} ∥}^{2} δ_{j} θ_{j}^{- 1} \Bigr) \Bigr] \\ & \quad \times \Bigl[ \exp \Bigl( \sum_{j = 0}^{N - 1} (| {\tilde{f}}_{j} | + | {\tilde{Q}}_{j} |) δ_{j} \Bigr) \Bigr] d ϒ . \end{aligned}

(30)

Let

\begin{matrix} A_{j} (ω) = - \frac{1}{2} \sum_{j = 0}^{N - 1} {∥ \frac{x_{j + 1} (ω) - x_{j} (ω)}{δ_{j}} - g_{j} (ω) ∥}^{2} δ_{j} θ_{j}^{- 1} (ω), \\ B_{j} (ω) = \sum_{j = 0}^{N - 1} (| {\tilde{f}}_{j} | + | {\tilde{Q}}_{j} |) θ_{j}^{- 1} (ω) δ_{j} \end{matrix}

and

f_{j} (ω) = f (x (t_{j}, ω)) .

Similarly, extending that representation to other functions, we may express the iterative value of the game as

Φ_{j} (ω) = \int_{\partial ϒ} \frac{exp (- \frac{1}{λ (T)} ϕ (x (T, ω)))}{\prod_{j = 0}^{N - 1} {((2 π) | θ_{j} |)}^{\frac{1}{2}}} [exp (A_{j} (ω) + B_{j} (ω))] d ϒ .

(31)

Applying Theorem 4.2, we obtain the optimal iterative control estimates, which completes the proof. □
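
An integral of the form (25), in which a path functional is weighted by the path density $ρ (ϒ | x_{j})$ , can be estimated by plain Monte Carlo: sample paths from the transition law and average the functional. The sketch below does this in the scalar case with Gaussian Euler steps. It is an illustration under stated assumptions and not the authors' algorithm: `path_cost` is a hypothetical callable standing for $| {\tilde{f}}_{j} | + | {\tilde{Q}}_{j} |$ , the sign of the exponent follows (25), and the step length is uniform.

```python
import math
import random


def estimate_phi(x_j, n_remaining, delta, g, theta, path_cost, phi, lam_T, n_paths, seed=0):
    """Average exp(-phi(x_T)/lam_T) * exp(-sum_k path_cost(x_k) * delta) over
    paths started at x_j and propagated with Gaussian Euler steps."""
    rng = random.Random(seed)
    total = 0.0
    for _path in range(n_paths):
        x = x_j
        accumulated = 0.0
        for _step in range(n_remaining):
            accumulated += path_cost(x) * delta                 # Riemann sum of the path cost
            noise = rng.gauss(0.0, math.sqrt(theta(x) * delta))
            x = x + g(x) * delta + noise                        # one transition x_k -> x_{k+1}
        total += math.exp(-phi(x) / lam_T) * math.exp(-accumulated)
    return total / n_paths
```

With zero terminal cost and a constant path cost k, every sampled path carries the same weight $e^{- k N δ}$ , which gives an exact value against which the estimator can be checked.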

5 Conclusion and future work

The paper studied the nonlinear stochastic control problem of a zero-sum differential game. The problem studied here is similar to the one in [15]; the difference is that ours is a minimax problem and the state function depends on an additional random variable ω, which was suggested as future work. Equating the difference of the control weights to the variance of the Brownian noise played an important role in the solution. The logarithmic transformation of the value function led to a quasilinear PDE, which was difficult to solve. However, by fixing the time variable t, the quasilinear PDE was converted to an ODE, which showed that at least one solution (saddle point) exists. Uniqueness of the solution can be attained if more boundary conditions are imposed on the problem. The problem is path-dependent, with Gaussian noise of mean zero and variance θ; the Gaussian distribution therefore played a vital role in obtaining the control path estimates given in Lemma 4.1. We are of the view that the nonlinear Feynman-Kac formula can be of great help in solving the nonlinear PDE [23]. Lastly, note that for $n \geq 2$ the problem is a nonlinear PDE, which, to our understanding, remains an open problem.

References

[1] Hellerstein JL, Morrison V, Eilebrecht E: Applying control theory in the real world: experience with building a controller for the .NET thread pool. ACM SIGMETRICS Perform. Eval. Rev. 2010, 37: 38-42. doi:10.1145/1710115.1710123
[2] Kendrick D: Applications of control theory to macroeconomics. Ann. Econ. Soc. Meas. 1976, 5: 171-190.
[3] Kilian C: Modern Control Technology. Thomson Delmar Learning, Clifton Park; 2005.
[4] Kappen HJ: Stochastic optimal control theory. ICML, Helsinki, Radboud University, Nijmegen, Netherlands (2008)
[5] Mazliak L: An introduction to probabilistic methods in stochastic control. Laboratory Probabilities, University of Paris, France (1996)
[6] Yong J, Zhou X: Stochastic Controls. Hamiltonian Systems and HJB Equations. Springer, Berlin; 1999.
[7] Vorob’ev NH: Game Theory. Springer, Berlin; 1977.
[8] Tirole J, Fudenberg D: Game Theory. MIT Press, Cambridge; 1991.
[9] Basar T: Lecture notes on non-cooperative game theory. University of Illinois, Urbana (2010)
[10] Ho YC, Bryson AE, Baron S: Differential games and optimal pursuit-evasion strategies. IEEE Trans. Autom. Control 1965, 10: 385-389. doi:10.1109/TAC.1965.1098197
[11] Yeung DWK: A feedback Nash equilibrium solution for noncooperative innovations in a stochastic differential game framework. Stoch. Anal. Appl. 1991, 9: 195-213. doi:10.1080/07362999108809234
[12] Browne S: Stochastic differential portfolio games. J. Appl. Probab. 2000, 37: 126-147.
[13] Theodorou EA, Todorov E: Stochastic optimal control for nonlinear Markov jump diffusion processes. PhD thesis, University of Southern California (2011)
[14] Fleming WH, Soner HM: Controlled Markov Processes and Viscosity Solutions: Applications of Mathematics. 2nd edition. Springer, New York; 2006.
[15] Theodorou EA: Iterative path integral stochastic optimal theory: theory and applications to motor control. Thesis, University of Southern California (2011)
[16] Guo X, Liu J, Zhou XY: A constrained non-linear regular-singular stochastic control problem, with applications. Stoch. Process. Appl. 2004, 109: 167-187. doi:10.1016/j.spa.2003.09.008
[17] Oksendal BK: Stochastic Differential Equations: An Introduction with Applications. 6th edition. Springer, Berlin; 2003.
[18] Josa-Fombellida R, Rincon-Zapatero JP: New approach to stochastic optimal control. J. Optim. Theory Appl. 2007, 135: 163-177. doi:10.1007/s10957-007-9262-5
[19] Silva F: Interior penalty approximation for optimal control problems. Optimality conditions in stochastic optimal control theory. CMAP, Ecole Polytechnique, and INRIA Saclay (2010)
[20] Mou L, Yong J: Two person zero sum linear quadratic stochastic differential games by Hilbert space method. J. Ind. Manag. Optim. 2006, 2(1): 93-115.
[21] Rigobon R: Brownian Motion and Stochastic Calculus Introductory Notes. MIT Press, Cambridge; 2009.
[22] Chow GC: Optimal control without solving the Bellman equation. Econometric Research Program. Research Memo No. 364, Princeton University (1992)
[23] Peng S, Wang F: BSDE, path-dependent PDE and nonlinear Feynman-Kac formula. School of Mathematics, Shandong University, Jinan, China (2011)

Acknowledgements

We would like to take this opportunity to thank the unknown referees and the Chinese Scholarship Council for giving us an opportunity to further our studies in this research area. Our sincere acknowledgements are also extended to our supervisor, families and colleagues for their moral support and guidance.

Author information

Authors and Affiliations

Department of Mathematics, Harbin Institute of Science and Technology, Nangang District, Harbin, 150001, P.R. China
Othusitse Basimanebotlhe & Xiaoping Xue
Department of Mathematics, University of Botswana, 4775 Notwane Rd., Gaborone, Botswana
Othusitse Basimanebotlhe

Corresponding author

Correspondence to Othusitse Basimanebotlhe.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

The mathematical derivations, the development of the theorems and the writing of the article were done by the first author. The other author helped with the mathematical derivations and with checking the paper for errors. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Basimanebotlhe, O., Xue, X. Stochastic optimal control to a nonlinear differential game. Adv Differ Equ 2014, 266 (2014). https://doi.org/10.1186/1687-1847-2014-266

Received: 09 April 2014
Accepted: 24 September 2014
Published: 14 October 2014
DOI: https://doi.org/10.1186/1687-1847-2014-266
