pypose.module.EPnP¶
- class pypose.module.EPnP(intrinsics=None, refine=True)[source]¶
Batched EPnP Solver - a non-iterative \(\mathcal{O}(n)\) solution to the Perspective-\(n\)-Point (PnP) problem for \(n \geq 4\).
- Parameters:
  - intrinsics (torch.Tensor, optional) – The camera intrinsics. The shape is (…, 3, 3). Default: None
  - refine (bool, optional) – refine the solution with the Gauss-Newton optimizer. Default: True
Assume the \(i\)-th of the \(n\) points is \(p^w_i\) in the world coordinate and \(p^c_i\) in the camera coordinate. Each is represented by a weighted sum of four virtual control points, \(c^w_j\) and \(c^c_j\), in the world and camera coordinates, respectively.
\[\begin{aligned} & p^w_i = \sum^4_{j=1}{\alpha_{ij}c^w_j} \\ & p^c_i = \sum^4_{j=1}{\alpha_{ij}c^c_j} \\ & \sum^4_{j=1}{\alpha_{ij}} = 1 \end{aligned} \]
Let the projection matrix be \(P = K[R|T]\), where \(K\) is the camera intrinsics, \(R\) is the rotation matrix, and \(T\) is the translation vector. Then we have
\[\begin{aligned} s_i p^{\text{img}}_i &= K\sum^4_{j=1}{\alpha_{ij}c^c_j}, \end{aligned} \]
where \(p^{\text{img}}_i = (u_i, v_i, 1)\) is the pixel in homogeneous form and \(s_i\) is the scale factor. Let the control point in the camera coordinate be \(c^c_j = (x^c_j, y^c_j, z^c_j)\). Rearranging the projection equation yields two linear equations for each of the \(n\) points:
\[\begin{aligned} \sum^4_{j=1}{\alpha_{ij}f_xx^c_j + \alpha_{ij}(u_0 - u_i)z^c_j} &= 0 \\ \sum^4_{j=1}{\alpha_{ij}f_yy^c_j + \alpha_{ij}(v_0 - v_i)z^c_j} &= 0 \end{aligned} \]
Let \(\mathbf{x} = \begin{bmatrix} c^{c^T}_1 & c^{c^T}_2 & c^{c^T}_3 & c^{c^T}_4 \end{bmatrix}^T\); then, considering all of the \(n\) points, the two equations per point form a linear system \(M\mathbf{x} = 0\). Its solution can be expressed as
\[\begin{aligned} \mathbf{x} &= \sum^4_{i=1}{\beta_iv_i}, \end{aligned} \]
where \(v_i\) are the eigenvectors of \(M^T M\) corresponding to its 4 smallest eigenvalues.
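As a concrete illustration, the construction above (control points, barycentric weights \(\alpha_{ij}\), the matrix \(M\), and its near-null space) can be sketched in plain PyTorch. This is a simplified sketch under assumed values, not the PyPose implementation; the control-point choice (centroid plus principal directions) follows the EPnP paper.

```python
import torch

torch.manual_seed(0)
points = torch.rand(6, 3) * 4          # n = 6 world points (illustrative)
pixels = torch.rand(6, 2) * 9          # their observed pixels (illustrative)
fx, fy, u0, v0 = 2., 2., 4.5, 4.5      # rectified intrinsics entries

# Control points: the centroid plus the principal directions of the data.
center = points.mean(dim=0)
_, _, Vt = torch.linalg.svd(points - center, full_matrices=False)
ctrl = torch.cat([center[None], center[None] + Vt], dim=0)   # (4, 3)

# Barycentric weights: solve sum_j alpha_ij [c_j; 1] = [p_i; 1], which
# enforces both the reconstruction and the unit-sum constraint at once.
C = torch.cat([ctrl, torch.ones(4, 1)], dim=1).T             # (4, 4)
P = torch.cat([points, torch.ones(6, 1)], dim=1)             # (6, 4)
alpha = torch.linalg.solve(C, P.T).T                         # (6, 4)

# Two rows of M per point, over x = [c^c_1; ...; c^c_4] (12 unknowns):
#   sum_j alpha_ij (fx x_j + (u0 - u_i) z_j) = 0
#   sum_j alpha_ij (fy y_j + (v0 - v_i) z_j) = 0
M = torch.zeros(12, 12)
for i in range(6):
    for j in range(4):
        M[2 * i,     3 * j]     = alpha[i, j] * fx
        M[2 * i,     3 * j + 2] = alpha[i, j] * (u0 - pixels[i, 0])
        M[2 * i + 1, 3 * j + 1] = alpha[i, j] * fy
        M[2 * i + 1, 3 * j + 2] = alpha[i, j] * (v0 - pixels[i, 1])

# Eigenvectors of M^T M for its 4 smallest eigenvalues span the solution.
_, eigvecs = torch.linalg.eigh(M.T @ M)   # eigenvalues in ascending order
V = eigvecs[:, :4]                        # basis vectors v_1, ..., v_4
```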
The final step involves calculating the coefficients \(\beta_i\). Optionally, the Gauss-Newton algorithm can be used to refine the solution of \(\beta_i\).
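As a minimal sketch of the simplest case (a single null-space vector, an assumption for illustration and not the PyPose implementation), \(\beta\) is the least-squares scale that makes the pairwise distances between the camera-frame control points match the known world-frame distances:

```python
import torch

torch.manual_seed(0)
cw = torch.rand(4, 3)          # world control points (known)
v = torch.rand(12)             # one null-space vector of M^T M
cc = v.view(4, 3)              # camera control points, up to scale beta

idx = torch.combinations(torch.arange(4))          # all 6 point pairs
dw = (cw[idx[:, 0]] - cw[idx[:, 1]]).norm(dim=1)   # world-frame distances
dc = (cc[idx[:, 0]] - cc[idx[:, 1]]).norm(dim=1)   # camera-frame distances

# Least-squares solution of beta * dc ≈ dw (distances are scale-covariant).
beta = (dc * dw).sum() / (dc * dc).sum()
```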
Example
>>> import torch, pypose as pp
>>> f, (H, W) = 2, (9, 9)  # focal length and image height, width
>>> intrinsics = torch.tensor([[f, 0, H / 2],
...                            [0, f, W / 2],
...                            [0, 0,   1  ]])
>>> object = torch.tensor([[2., 0., 2.],
...                        [1., 0., 2.],
...                        [0., 1., 1.],
...                        [0., 0., 1.],
...                        [1., 0., 1.],
...                        [5., 5., 3.]])
>>> pixels = pp.point2pixel(object, intrinsics)
>>> pose = pp.SE3([0., -8, 0., 0., -0.3827, 0., 0.9239])
>>> points = pose.Inv() @ object
>>> epnp = pp.module.EPnP(intrinsics)
>>> epnp(points, pixels)
SE3Type LieTensor:
LieTensor([ 3.9816e-05, -8.0000e+00,  5.8174e-05, -3.3186e-06, -3.8271e-01,
             3.6321e-06,  9.2387e-01])
Warning
Currently this module only supports batched rectified camera intrinsics, which can be defined in the form:
\[K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \]
The full form of camera intrinsics will be supported in a future release.
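For instance, a rectified intrinsics tensor of this form can be built and batched as follows (the focal and principal-point values here are illustrative), matching the (…, 3, 3) shape the module expects:

```python
import torch

# Rectified camera intrinsics: focal lengths on the diagonal,
# principal point in the last column, zero skew.
fx, fy, cx, cy = 2.0, 2.0, 4.5, 4.5
K = torch.tensor([[fx, 0., cx],
                  [0., fy, cy],
                  [0., 0., 1.]])

# Batch by adding a leading dimension, e.g. 8 identical cameras.
batched = K.expand(8, 3, 3)
```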
Note
The implementation is based on the paper
Francesc Moreno-Noguer, Vincent Lepetit, and Pascal Fua, EPnP: An Accurate O(n) Solution to the PnP Problem, International Journal of Computer Vision (IJCV), 2009.
- forward(points, pixels, intrinsics=None)[source]¶
- Parameters:
  - points (torch.Tensor) – 3D object points in the world coordinates. Shape (…, N, 3)
  - pixels (torch.Tensor) – 2D image points, which are the projection of the object points. Shape (…, N, 2)
  - intrinsics (torch.Tensor, optional) – camera intrinsics. Shape (…, 3, 3). Setting it to any non-None value will override the default intrinsics kept in the module.
- Returns:
  estimated pose (SE3Type) for the camera.
- Return type:
  LieTensor