pypose.module.LQR¶

class pypose.module.LQR(system, Q, p, T)[source]¶

Linear Quadratic Regulator (LQR) with Dynamic Programming.

Parameters:

system (instance) – The system to be soved by LQR.
Q (Tensor) – The weight matrix of the quadratic term.
p (Tensor) – The weight vector of the first-order term.
T (int) – Time steps of system.

A discrete-time linear system can be described as:

\[\begin{align*} \mathbf{x}_{t+1} &= \mathbf{A}_t\mathbf{x}_t + \mathbf{B}_t\mathbf{u}_t + \mathbf{c}_{1t} \\ \mathbf{y}_t &= \mathbf{C}_t\mathbf{x}_t + \mathbf{D}_t\mathbf{u}_t + \mathbf{c}_{2t} \\ \end{align*} \]

where \(\mathbf{x}\), \(\mathbf{u}\) are the state and input of the linear system; \(\mathbf{y}\) is the observation of the linear system; \(\mathbf{A}\) and \(\mathbf{B}\) are the state matrix and input matrix of the linear system; \(\mathbf{C}\), \(\mathbf{D}\) are the output matrix and observation matrix of the linear system; \(\mathbf{c}_{1}\), \(\mathbf{c}_{2}\) are the constant input and constant output of the linear system. The subscript \(\cdot_{t}\) denotes the time step.

LQR finds the optimal nominal trajectory \(\mathbf{\tau}_{1:T}^*\) = \(\begin{Bmatrix} \mathbf{x}_t, \mathbf{u}_t \end{Bmatrix}_{1:T}\) for the linear system of the optimization problem:

\[\begin{align*} \mathbf{\tau}_{1:T}^* = \mathop{\arg\min}\limits_{\tau_{1:T}} &\sum\limits_t\frac{1}{2} \mathbf{\tau}_t^\top\mathbf{Q}_t\mathbf{\tau}_t + \mathbf{p}_t^\top\mathbf{\tau}_t \\ \mathrm{s.t.} \quad \mathbf{x}_1 &= \mathbf{x}_{\text{init}}, \\ \mathbf{x}_{t+1} &= \mathbf{F}_t\mathbf{\tau}_t + \mathbf{c}_{1t} \\ \end{align*} \]

where \(\mathbf{\tau}_t\) = \(\begin{bmatrix} \mathbf{x}_t \\ \mathbf{u}_t \end{bmatrix}\), \(\mathbf{F}_t\) = \(\begin{bmatrix} \mathbf{A}_t & \mathbf{B}_t \end{bmatrix}\).

One way to solve the LQR problem is to use the dynamic programming, where the process can be summarised as a backward and a forward recursion.

The backward recursion.

For \(t\) = \(T\) to 1:

\[\begin{align*} \mathbf{Q}_t &= \mathbf{Q}_t + \mathbf{F}_t^\top\mathbf{V}_{t+1} \mathbf{F}_t \\ \mathbf{q}_t &= \mathbf{p}_t + \mathbf{F}_t^\top\mathbf{V}_{t+1} \mathbf{c}_{1t} + \mathbf{F}_t^\top\mathbf{v}_{t+1} \\ \mathbf{K}_t &= -\mathbf{Q}_{\mathbf{u}_t, \mathbf{u}_t}^{-1} \mathbf{Q}_{\mathbf{u}_t, \mathbf{x}_t} \\ \mathbf{k}_t &= -\mathbf{Q}_{\mathbf{u}_t, \mathbf{u}_t}^{-1} \mathbf{q}_{\mathbf{u}_t} \\ \mathbf{V}_t &= \mathbf{Q}_{\mathbf{x}_t, \mathbf{x}_t} + \mathbf{Q}_{\mathbf{x}_t, \mathbf{u}_t}\mathbf{K}_t + \mathbf{K}_t^\top\mathbf{Q}_{\mathbf{u}_t, \mathbf{x}_t} + \mathbf{K}_t^\top\mathbf{Q}_{\mathbf{u}_t, \mathbf{u}_t} \mathbf{K}_t \\ \mathbf{v}_t &= \mathbf{q}_{\mathbf{x}_t} + \mathbf{Q}_{\mathbf{x}_t, \mathbf{u}_t}\mathbf{k}_t + \mathbf{K}_t^\top\mathbf{q}_{\mathbf{u}_t} + \mathbf{K}_t^\top\mathbf{Q}_{\mathbf{u}_t, \mathbf{u}_t}\mathbf{k}_t \\ \end{align*} \]

The forward recursion.

For \(t\) = 1 to \(T\):

\[\begin{align*} \mathbf{u}_t &= \mathbf{K}_t\mathbf{x}_t + \mathbf{k}_t \\ \mathbf{x}_{t+1} &= \mathbf{A}_t\mathbf{x}_t + \mathbf{B}_t\mathbf{u}_t + \mathbf{c}_{1t} \\ \end{align*} \]

Then quadratic costs of the system over the time horizon:

\[\mathbf{c} \left( \mathbf{\tau}_t \right) = \frac{1}{2} \mathbf{\tau}_t^\top\mathbf{Q}_t\mathbf{\tau}_t + \mathbf{p}_t^\top\mathbf{\tau}_t \]

For the non-linear system, sometimes people want to solve MPC problem with iterative LQR. A discrete-time non-linear system can be described as:

\[\begin{aligned} \mathbf{x}_{t+1} &= \mathbf{f}(\mathbf{x}_t, \mathbf{u}_t, t_t) \\ \mathbf{y}_{t} &= \mathbf{g}(\mathbf{x}_t, \mathbf{u}_t, t_t) \\ \end{aligned} \]

We can do a linear approximation at current point \(\chi^*=(\mathbf{x}^*, \mathbf{u}^*, t^*)\) along a trajectory with small perturbation \(\chi=(\mathbf{x}^*+\delta\mathbf{x}, \mathbf{u}^* +\delta\mathbf{u}, t^*)\) near \(\chi^*\) for both dynamics and cost:

\[\begin{aligned} \mathbf{f}(\mathbf{x}, \mathbf{u}, t^*) &\approx \mathbf{f}(\mathbf{x}^*, \mathbf{u}^*, t^*) + \left.\frac{\partial \mathbf{f}}{\partial\mathbf{x}} \right|_{\chi^*} \delta \mathbf{x} + \left. \frac{\partial \mathbf{f}} {\partial \mathbf{u}} \right|_{\chi^*} \delta \mathbf{u} \\ &= \mathbf{f}(\mathbf{x}^*, \mathbf{u}^*, t^*) + \mathbf{A} \delta \mathbf{x} + \mathbf{B} \delta \mathbf{u} \\ \delta \mathbf{x}_{t+1} &= \mathbf{A}_t \delta \mathbf{x}_t + \mathbf{B}_t \delta \mathbf{u}_t \\ &= \mathbf{F}_t \delta \mathbf{\tau}_t \\ \mathbf{c} \left( \mathbf{\tau}, t^* \right) &\approx \mathbf{c} \left( \mathbf{\tau}^*, t^* \right) + \frac{1}{2} \delta \mathbf{\tau}^\top\nabla^2_{\mathbf{\tau}}\mathbf{c}\left(\mathbf{\tau}^*, t^* \right) \delta \mathbf{\tau} + \nabla_{\mathbf{\tau}} \mathbf{c} \left( \mathbf{\tau}^*, t^* \right)^\top \delta \mathbf{\tau}\\ \bar{\mathbf{c}} \left( \delta \mathbf{\tau} \right) &= \frac{1}{2} \delta \mathbf{\tau}_t^\top \bar{\mathbf{Q}}_t \delta \mathbf{\tau}_t + \bar{\mathbf{p}}_t^\top \delta \mathbf{\tau}_t \\ \end{aligned} \]

where \(\delta \mathbf{\tau}_t\) = \(\begin{bmatrix} \delta \mathbf{x}_t \\ \delta \mathbf{u}_t \end{bmatrix}\), \(\mathbf{F}_t\) = \(\begin{bmatrix} \mathbf{A}_t & \mathbf{B}_t \end{bmatrix}\), \(\bar{\mathbf{Q}}_t = \mathbf{Q}_t\), \(\bar{\mathbf{p}}_t\) = \(\mathbf{Q}_t \mathbf{\tau}^*_t + \mathbf{p}_t\).

Then, LQR can be performed on a linear quadractic problem with \(\delta \mathbf{\tau}_t\), \(\mathbf{F}_t\), \(\bar{\mathbf{Q}}_t\) and \(\bar{\mathbf{p}}_t\).

The backward recursion.

For \(t\) = \(T\) to 1:

\[\begin{align*} \mathbf{Q}_t &= \bar{\mathbf{Q}}_t + \mathbf{F}_t^\top\mathbf{V}_{t+1} \mathbf{F}_t \\ \mathbf{q}_t &= \bar{\mathbf{p}}_t + \mathbf{F}_t^\top\mathbf{v}_{t+1} \\ \mathbf{K}_t &= -\mathbf{Q}_{\delta \mathbf{u}_t, \delta \mathbf{u}_t}^{-1} \mathbf{Q}_{\delta \mathbf{u}_t, \delta \mathbf{x}_t} \\ \mathbf{k}_t &= -\mathbf{Q}_{\delta \mathbf{u}_t, \delta \mathbf{u}_t}^{-1} \mathbf{q}_{\delta \mathbf{u}_t} \\ \mathbf{V}_t &= \mathbf{Q}_{\delta \mathbf{x}_t, \delta \mathbf{x}_t} + \mathbf{Q}_{\delta \mathbf{x}_t, \delta \mathbf{u}_t}\mathbf{K}_t + \mathbf{K}_t^\top\mathbf{Q}_{\delta \mathbf{u}_t, \delta \mathbf{x}_t} + \mathbf{K}_t^\top\mathbf{Q}_{\delta \mathbf{u}_t, \delta \mathbf{u}_t} \mathbf{K}_t \\ \mathbf{v}_t &= \mathbf{q}_{\delta \mathbf{x}_t} + \mathbf{Q}_{\delta \mathbf{x}_t, \delta \mathbf{u}_t}\mathbf{k}_t + \mathbf{K}_t^\top\mathbf{q}_{\delta \mathbf{u}_t} + \mathbf{K}_t^\top\mathbf{Q}_{\delta \mathbf{u}_t, \delta \mathbf{u}_t}\mathbf{k}_t \\ \end{align*} \]

Note

Because we made a linear approximation, here \(\bar{\mathbf{p}}_t\) leads to a difference with \({\mathbf{q}}_t\) and \({\mathbf{k}}_t\) relative to the linear backward recursion, and this change will be compensated back by \(\mathbf{u}_t^*\) in the forward recursion.

The forward recursion.

For \(t\) = 1 to \(T\):

\[\begin{align*} \delta \mathbf{u}_t &= \mathbf{K}_t \delta \mathbf{x}_t + \mathbf{k}_t \\ \mathbf{u}_t &= \delta \mathbf{u}_t + \mathbf{u}_t^* \\ \mathbf{x}_{t+1} &= \mathbf{f}(\mathbf{x}_t, \mathbf{u}_t) \\ \end{align*} \]

Then quadratic costs of the system over the time horizon:

\[\mathbf{c} \left( \mathbf{\tau}_t \right) = \frac{1}{2} \mathbf{\tau}_t^\top\mathbf{Q}_t\mathbf{\tau}_t + \mathbf{p}_t^\top\mathbf{\tau}_t \]

Note

The discrete-time system to be solved by LQR can be either linear time-invariant (LTI()) system, or linear time-varying (LTV()) system. For non-linear system, one can approximate it as a linear system via Taylor expansion, using iterative LQR algorithm for MPC. Here we provide a unified general format for the implementation.

From the learning perspective, this can be interpreted as a module with unknown parameters \(\begin{Bmatrix} \mathbf{Q}, \mathbf{p}, \mathbf{F}, \mathbf{f} \end{Bmatrix}\), which can be integrated into an end-to-end learning system.

Note

The implementation of LQR is based on page 24-32 of the slides:

Optimal Control and Planning.

The implementation of iterative LQR is based on Eq. (1)~(19) of this paper:

Li Weiwei, and Emanuel Todorov, Iterative linear quadratic regulator design for nonlinear biological movement systems, ICINCO (1), 2004.

Example

>>> torch.manual_seed(0)
>>> n_batch, T = 2, 5
>>> n_state, n_ctrl = 4, 3
>>> n_sc = n_state + n_ctrl
>>> Q = torch.randn(n_batch, T, n_sc, n_sc)
>>> Q = torch.matmul(Q.mT, Q)
>>> p = torch.randn(n_batch, T, n_sc)
>>> r = 0.2 * torch.randn(n_state, n_state)
>>> A = torch.tile(torch.eye(n_state) + r, (n_batch, 1, 1))
>>> B = torch.randn(n_batch, n_state, n_ctrl)
>>> C = torch.tile(torch.eye(n_state), (n_batch, 1, 1))
>>> D = torch.tile(torch.zeros(n_state, n_ctrl), (n_batch, 1, 1))
>>> c1 = torch.tile(torch.randn(n_state), (n_batch, 1))
>>> c2 = torch.tile(torch.zeros(n_state), (n_batch, 1))
>>> x_init = torch.randn(n_batch, n_state)
>>> u_traj = torch.zeros(n_batch, T, n_ctrl, device=device)
>>> lti = pp.module.LTI(A, B, C, D, c1, c2)
>>> dt = 1
>>> LQR = pp.module.LQR(lti, Q, p, T)
>>> x, u, cost = LQR(x_init, dt)
>>> print("x = ", x)
>>> print("u = ", u)
x = tensor([[[-0.2633, -0.3466,  2.3803, -0.0423],
             [ 0.1849, -1.3884,  1.0898, -1.6229],
             [ 1.2138, -0.7161,  0.2954, -0.6819],
             [ 1.4840, -1.1249, -1.0302,  0.9805],
             [-0.3477, -1.7063,  4.6494,  2.6780],
             [ 7.2346,  4.9958, 17.9926, -7.7881]],
            [[-0.9744,  0.4976,  0.0603, -0.5258],
             [-0.6356,  0.0539,  0.7264, -0.5048],
             [-0.2275, -0.1649,  0.3872, -0.4614],
             [ 0.2697, -0.3577,  0.0999, -0.4594],
             [ 0.3916, -2.0832,  0.0701, -0.5407],
             [ 1.0404, -1.3799, -2.0913, -0.1459]]])
u = tensor([[[ 1.0405,  0.1586, -0.1282],
             [-1.4845, -0.5745,  0.2523],
             [-0.6322, -0.3281, -0.3620],
             [-1.6768,  2.4054, -0.1047],
             [-1.7948,  3.5269,  9.0703]],
            [[-0.1795,  0.9153,  1.7066],
             [ 0.0814,  0.4004,  0.7114],
             [ 0.0436,  0.5782,  1.0127],
             [-0.3017, -0.2897,  0.7251],
             [-0.0728,  0.7290, -0.3117]]])

forward(x_init, dt=1, u_traj=None, u_lower=None, u_upper=None, du=None)[source]¶

Performs LQR for the discrete system.

Parameters:

x_init (Tensor) – The initial state of the system.
dt (int) – The interval (\(\delta t\)) between two time steps. Default: 1.
u_traj (Tensor, optinal) – The current inputs of the system along a trajectory. Default: None.
u_lower (Tensor, optinal) – The lower bounds on the controls. Default: None.
u_upper (Tensor, optinal) – The upper bounds on the controls. Default: None.
du (int, optinal) – The amount each component of the controls is allowed to change in each LQR iteration. Default: None.

Returns:

A list of tensors including the solved state sequence \(\mathbf{x}\), the solved input sequence \(\mathbf{u}\), and the associated quadratic costs \(\mathbf{c}\) over the time horizon.

Return type:

List of Tensor

pypose.module.LQR¶

Docs

Tutorials

Get Started