import torch
from .lqr import LQR
from torch import nn
from ..utils.stepper import ReduceToBason
[docs]class MPC(nn.Module):
r'''
Model Predictive Control (MPC) based on iterative LQR.
Args:
system (:obj:`instance`): The system to be soved by MPC.
Q (:obj:`Tensor`): The weight matrix of the quadratic term.
p (:obj:`Tensor`): The weight vector of the first-order term.
T (:obj:`int`): Time steps of system.
stepper (``Planner``, optional): the stepper to stop iterations. If ``None``,
the ``pypose.utils.ReduceToBason`` with a maximum of 10 steps are used.
Default: ``None``.
**Model Predictive Control**, also known as Receding Horizon Control (RHC), uses the
mathematical model of the system to solve a finite, moving horizon, and
closed loop optimal control problem. Thus, the MPC scheme is able to utilize the
information about the current state of the system to predict future states
and control inputs for the system. At each time stamp, MPC solves the
optimization problem on a time horizon :math:`T`:
.. math::
\begin{align*}
\mathop{\arg\min}\limits_{\mathbf{x}_{1:T}, \mathbf{u}_{1:T}} \sum\limits_t
&\mathbf{c}_t (\mathbf{x}_t, \mathbf{u}_t) \\
\mathrm{s.t.}\quad \mathbf{x}_{t+1} &= \mathbf{f}(\mathbf{x}_t, \mathbf{u}_t),\\
\mathbf{x}_1 &= \mathbf{x}_{\text{init}} \\
\end{align*}
where :math:`\mathbf{c}` is the cost function; :math:`\mathbf{x}`, :math:`\mathbf{u}`
are the state and input of the linear system; :math:`\mathbf{f}` is the dynamics
model; :math:`\mathbf{x}_{\text{init}}` is the initial state of the system.
For the linear system with quadratic cost, MPC is equivalent to solving an LQR problem
on each horizon as well as considering the dynamics as constraints. For nonlinear
system, one way to solve the MPC problem is to use iterative LQR, which uses linear
approximations of the dynamics and quadratic approximations of the cost function to
iteratively compute a local optimal solution based on the current states and control
sequences. Then, analytical derivative for the backward can be computed using one
additional forward pass of iLQR for the learning problem, such as learning the
parameters of the dynamic system.
Specifically, the discrete-time non-linear system can be described as:
.. math::
\begin{aligned}
\mathbf{x}_{t+1} &= \mathbf{f}(\mathbf{x}_t, \mathbf{u}_t, t_t) \\
\mathbf{y}_{t} &= \mathbf{g}(\mathbf{x}_t, \mathbf{u}_t, t_t) \\
\end{aligned}
We can do a linear approximation at current point :math:`\chi^*=(\mathbf{x}^*,
\mathbf{u}^*, t^*)` along a trajectory with small perturbation
:math:`\chi=(\mathbf{x}^*+\delta\mathbf{x}, \mathbf{u}^* +\delta\mathbf{u}, t^*)`
near :math:`\chi^*` for both dynamics and cost:
.. math::
\begin{aligned}
\mathbf{f}(\mathbf{x}, \mathbf{u}, t^*) &\approx \mathbf{f}(\mathbf{x}^*,
\mathbf{u}^*, t^*) + \left.\frac{\partial \mathbf{f}}{\partial \mathbf{x}}
\right|_{\chi^*} \delta \mathbf{x} + \left. \frac{\partial \mathbf{f}}
{\partial \mathbf{u}} \right|_{\chi^*} \delta \mathbf{u} \\
&= \mathbf{f}(\mathbf{x}^*, \mathbf{u}^*, t^*) + \mathbf{A} \delta \mathbf{x}
+ \mathbf{B} \delta \mathbf{u} \\
\delta \mathbf{x}_{t+1} &= \mathbf{A}_t \delta \mathbf{x}_t + \mathbf{B}_t
\delta \mathbf{u}_t \\
&= \mathbf{F}_t \delta \mathbf{\tau}_t \\
\mathbf{c} \left( \mathbf{\tau}, t^* \right) &\approx
\mathbf{c} \left(\mathbf{\tau}^*, t^* \right) + \frac{1}{2} \delta
\mathbf{\tau}^\top\nabla^2_{\mathbf{\tau}}\mathbf{c}\left(\mathbf{\tau}^*,
t^* \right) \delta \mathbf{\tau} + \nabla_{\mathbf{\tau}}
\mathbf{c} \left( \mathbf{\tau}^*, t^* \right)^\top \delta \mathbf{\tau}\\
\bar{\mathbf{c}} \left( \delta \mathbf{\tau} \right) &= \frac{1}{2} \delta
\mathbf{\tau}_t^\top \bar{\mathbf{Q}}_t \delta \mathbf{\tau}_t +
\bar{\mathbf{p}}_t^\top \delta \mathbf{\tau}_t \\
\end{aligned}
where :math:`\mathbf{F}_t` = :math:`\begin{bmatrix}
\mathbf{A}_t & \mathbf{B}_t \end{bmatrix}`.
Then, LQR can be performed on a linear quadractic problem with
:math:`\delta \mathbf{\tau}_t`, :math:`\mathbf{F}_t`, :math:`\bar{\mathbf{Q}}_t` and
:math:`\bar{\mathbf{p}}_t`.
- The backward recursion.
For :math:`t` = :math:`T` to 1:
.. math::
\begin{align*}
\mathbf{Q}_t &= \bar{\mathbf{Q}}_t + \mathbf{F}_t^\top\mathbf{V}_{t+1}
\mathbf{F}_t \\
\mathbf{q}_t &= \bar{\mathbf{p}}_t + \mathbf{F}_t^\top\mathbf{v}_{t+1} \\
\mathbf{K}_t &= -\mathbf{Q}_{\delta \mathbf{u}_t,\delta \mathbf{u}_t}^{-1}
\mathbf{Q}_{\delta \mathbf{u}_t,\delta \mathbf{x}_t} \\
\mathbf{k}_t &= -\mathbf{Q}_{\delta \mathbf{u}_t,\delta \mathbf{u}_t}^{-1}
\mathbf{q}_{\delta \mathbf{u}_t} \\
\mathbf{V}_t &= \mathbf{Q}_{\delta \mathbf{x}_t, \delta \mathbf{x}_t}
+ \mathbf{Q}_{\delta \mathbf{x}_t, \delta \mathbf{u}_t}\mathbf{K}_t
+ \mathbf{K}_t^\top\mathbf{Q}_{\delta\mathbf{u}_t,\delta \mathbf{x}_t}
+ \mathbf{K}_t^\top\mathbf{Q}_{\delta\mathbf{u}_t,\delta \mathbf{u}_t}
\mathbf{K}_t \\
\mathbf{v}_t &= \mathbf{q}_{\delta \mathbf{x}_t}
+ \mathbf{Q}_{\delta \mathbf{x}_t, \delta \mathbf{u}_t}\mathbf{k}_t
+ \mathbf{K}_t^\top\mathbf{q}_{\delta \mathbf{u}_t}
+ \mathbf{K}_t^\top\mathbf{Q}_{\delta \mathbf{u}_t,
\delta \mathbf{u}_t}\mathbf{k}_t \\
\end{align*}
- The forward recursion.
For :math:`t` = 1 to :math:`T`:
.. math::
\begin{align*}
\delta \mathbf{u}_t &= \mathbf{K}_t \delta \mathbf{x}_t + \mathbf{k}_t \\
\mathbf{u}_t &= \delta \mathbf{u}_t + \mathbf{u}_t^* \\
\mathbf{x}_{t+1} &= \mathbf{f}(\mathbf{x}_t, \mathbf{u}_t) \\
\end{align*}
Note:
For the details of lqr solver and the iterative LQR, please refer to :meth:`LQR`.
Please note that the linear system with quadratic cost only requires one single
iteration.
Note:
The definition of MPC is cited from this book:
* George Nikolakopoulos, Sina Sharif Mansouri, Christoforos Kanellakis, `Aerial
Robotic Workers: Design, Modeling, Control, Vision and Their Applications
<https://doi.org/10.1016/C2017-0-02260-7>`_, Butterworth-Heinemann, 2023.
The implementation of MPC is based on Eq. (10)~(13) of this paper:
* Amos, Brandon, et al, `Differentiable mpc for end-to-end planning and control
<https://tinyurl.com/3h9tsuyb>`_, Advances in neural information processing
systems, 31, 2018.
Example:
>>> import torch, pypose as pp
>>> class CartPole(pp.module.NLS):
... def __init__(self, dt, length, cartmass, polemass, gravity):
... super().__init__()
... self.tau = dt
... self.length = length
... self.cartmass = cartmass
... self.polemass = polemass
... self.gravity = gravity
... self.poleml = self.polemass * self.length
... self.totalMass = self.cartmass + self.polemass
...
... def state_transition(self, state, input, t=None):
... x, xDot, theta, thetaDot = state.squeeze()
... force = input.squeeze()
... costheta = torch.cos(theta)
... sintheta = torch.sin(theta)
... temp = (force + self.poleml * thetaDot**2 * sintheta) / self.totalMass
... thetaAcc = (self.gravity * sintheta - costheta * temp) /
... (self.length * (4.0 / 3.0
... - self.polemass * costheta**2 / self.totalMass))
... xAcc = temp - self.poleml * thetaAcc * costheta / self.totalMass
... _dstate = torch.stack((xDot, xAcc, thetaDot, thetaAcc))
... return (state.squeeze() + torch.mul(_dstate, self.tau)).unsqueeze(0)
...
... def observation(self, state, input, t=None):
... return state
...
>>> torch.manual_seed(0)
>>> dt, len, m_cart, m_pole, g = 0.01, 1.5, 20, 10, 9.81
>>> n_batch, T = 1, 5
>>> n_state, n_ctrl = 4, 1
>>> n_sc = n_state + n_ctrl
>>> Q = torch.tile(torch.eye(n_state + n_ctrl, device=device), (n_batch, T, 1, 1))
>>> p = torch.randn(n_batch, T, n_sc)
>>> time = torch.arange(0, T, device=device) * dt
>>> current_u = torch.sin(time).unsqueeze(1).unsqueeze(0)
>>> x_init = torch.tensor([[0, 0, torch.pi, 0]])
>>> stepper = pp.utils.ReduceToBason(steps=15, verbose=True)
>>> cartPoleSolver = CartPole(dt, len, m_cart, m_pole, g).to(device)
>>> MPC = pp.module.MPC(cartPoleSolver, Q, p, T, stepper=stepper).to(device)
>>> x, u, cost = MPC(dt, x_init, u_init=current_u)
>>> print("x = ", x)
>>> print("u = ", u)
x = tensor([[[ 0.0000e+00, 0.0000e+00, 3.1416e+00, 0.0000e+00],
[ 0.0000e+00, -3.7711e-04, 3.1416e+00, -1.8856e-04],
[-3.7711e-06, -3.0693e-04, 3.1416e+00, -1.5347e-04],
[-6.8404e-06, -2.4288e-04, 3.1416e+00, -1.2136e-04],
[-9.2692e-06, -2.6634e-04, 3.1416e+00, -1.3293e-04],
[-1.1933e-05, -2.2073e-04, 3.1416e+00, -1.0991e-04]]])
u = tensor([[[-0.8485],
[ 0.1579],
[ 0.1440],
[-0.0530],
[ 0.1023]]])
'''
def __init__(self, system, Q, p, T, stepper=None):
super().__init__()
self.stepper = ReduceToBason(steps=10) if stepper is None else stepper
self.stepper.max_steps -= 1 # n-1 loops, 1 loop with gradient
self.lqr = LQR(system, Q, p, T)
[docs] def forward(self, dt, x_init, u_init=None, u_lower=None, u_upper=None, du=None):
r'''
Performs MPC for the discrete system.
Args:
dt (:obj:`int`): The interval (:math:`\delta t`) between two time steps.
x_init (:obj:`Tensor`): The initial state of the system.
u_init (:obj:`Tensor`, optinal): The current inputs of the system along a
trajectory. Default: ``None``.
u_lower (:obj:`Tensor`, optinal): The lower bounds on the controls.
Default: ``None``.
u_upper (:obj:`Tensor`, optinal): The upper bounds on the controls.
Default: ``None``.
du (:obj:`int`, optinal): The amount each component of the controls
is allowed to change in each LQR iteration. Default: ``None``.
Returns:
List of :obj:`Tensor`: A list of tensors including the solved state sequence
:math:`\mathbf{x}`, the solved input sequence :math:`\mathbf{u}`, and the
associated quadratic costs :math:`\mathbf{c}` over the time horizon.
'''
x, u = None, u_init
best = {'x': x, 'u': u, 'cost': None}
self.stepper.reset()
with torch.no_grad():
while self.stepper.continual():
x, u, cost = self.lqr(x_init, dt, u)
self.stepper.step(cost)
if best['cost'] == None or cost < best['cost']:
best = {'x': x, 'u': u, 'cost': cost}
return self.lqr(x_init, dt, u_traj=best['u'])