1. Introduction
Inertial measurement units (IMUs) are important sensors in the context of state estimation. A popular approach in robotics is to treat IMU measurements as inputs to a motion model and then to numerically integrate the motion model to form relative motion factors between pairs of estimation times in a process known as preintegration [Reference Lupton and Sukkarieh1–Reference Brossard, Barrau, Chauchat and Bonnabel3]. In this paper, we investigate the treatment of IMUs as a measurement of the state using continuous-time state estimation with a Gaussian process motion prior. We compare treating an IMU as an input versus a measurement on a simple 1D simulation problem. We then test our approach to lidar-inertial odometry using a simulated environment and compare to a baseline that represents the IMU-as-input approach. Finally, we provide experimental results for our lidar-inertial odometry on the Newer College Dataset.
Preintegration was initially devised as a method to avoid having to estimate the state at each measurement time in (sliding-window) batch trajectory estimation. We will show that by employing our continuous-time estimation techniques, we can achieve the same big-O complexity as classic preintegration, which is linear in the number of measurements. The contributions of this work are as follows:
-
• We provide a detailed comparison of treating an IMU as an input to a motion model versus as a measurement of the state on a 1D simulation problem. Such a comparison has not been previously presented in the literature.
-
• We show how to perform preintegration using heterogeneous factors (a combination of binary and unary factors) using continuous-time state estimation. To our knowledge, this has not been shown before in the literature.
-
• We present our novel approach to lidar-inertial odometry using a Singer prior which includes body-centric acceleration in the state. We provide experimental results on the Newer College Dataset and on a custom simulated dataset. On the Newer College Dataset, we demonstrate state-of-the art performance.
2. Related work
Lupton and Sukkarieh [Reference Lupton and Sukkarieh1] were the first to show that a temporal window of inertial measurements could be summarized using a single relative motion factor in a method known as preintegration. Forster et al. [Reference Forster, Carlone, Dellaert and Scaramuzza2] then showed how to perform preintegration on the manifold $SO(3)$ . Subsequently, Brossard et al. [Reference Brossard, Barrau, Chauchat and Bonnabel3] demonstrated how to perform preintegration on the manifold of extended poses $SE_2(3)$ , showing that this approach captures the uncertainty resulting from IMU measurements more consistently than $SO(3) \times \mathbb{R}^3$ . All of these approaches treat the IMU as an input to a motion model. This approach has a few shortcomings. First, it conflates the IMU measurement noise with the underlying process noise. Second, it is unclear how the state and covariance should be propagated in the absence of IMU measurements. IMU measurement dropout is rare. However, it is worth considering the possibility in safety-critical applications. The third issue is that classic preintegration does not lend itself well to dealing with multiple high-rate sensors such as a lidar and an IMU or multiple asyncrhonous IMUs.
Previous work in continuous-time lidar-only odometry and lidar-inertial odometry include [Reference Droeschel and Behnke4–Reference Lang, Chen, Tang, Ma, Lv, Liu and Zuo9] all of which employed B-splines. In B-spline approaches, exact derivatives of the continuous-time trajectory can be computed allowing for unary factors to be created for each accelerometer and gyroscope measurement, removing the need for preintegration. However, the spacing of control points has a large impact on the smoothness of B-spline trajectories. Determining this spacing is an important engineering challenge in B-spline approaches. This can be avoided by working with Gaussian processes instead. For a detailed comparison between B-splines and Gaussian processes in continuous-time state estimation, we refer the reader to Johnson et al. [Reference Johnson, Mangelson, Barfoot and Beard10].
Recently, Zheng and Zhu [Reference Zheng and Zhu11] demonstrated continuous-time lidar-inertial odometry using Gaussian processes where rotation is decoupled from translation. They use a white-noise-on-jerk (WNOJ) prior for position in a global frame, a white-noise-on-acceleration (WNOA) prior for rotation using a sequence of local Gaussian processes, and a white-noise-on-velocity prior for the IMU biases. Using this approach, they demonstrate competitive performance on the HILTI SLAM benchmark [Reference Helmberger, Morin, Berner, Kumar, Cioffi and Scaramuzza12]. One consequence of estimating position in a global frame is that the power spectral density matrix $\mathbf{Q}$ of the prior must typically be isotropic, whereas a body-centric approach allows for lateral-longitudinal-vertical dimensions to be weighted differently. In addition, as was shown by Brossard et al. [Reference Brossard, Barrau, Chauchat and Bonnabel3], decoupling rotation from translation does not capture the uncertainty resulting from IMU measurements as accurately as keeping them coupled. However, one clear advantage of their approach is that all parts of the state are directly observable by the measurements, whereas in our approach angular acceleration is not directly observable.
In our prior work, we demonstrated continuous-time lidar-only odometry [Reference Burnett, Wu, Yoon, Schoellig and Barfoot13] using a WNOA motion prior. Lidar-only odometry using a WNOJ prior [Reference Tang, Yoon and Barfoot14] and a Singer prior [Reference Wong, Yoon, Schoellig and Barfoot15] have also been previously demonstrated. In this work, we choose to work with the Singer prior, which includes body-centric acceleration in the state. By including acceleration in the state, we can treat gyroscope and accelerometer measurements as measurements of the state rather than as inputs to a motion model.
Another approach based on Gaussian processes is that of Le Gentil and Vidal-Calleja [Reference Gentil, Vidal-Calleja and Huang16, Reference Gentil and Vidal-Calleja17]. They employ six independent Gaussian processes, three for acceleration in a global frame, and three for angular velocity. They optimize for the state at several inducing points given a set of IMU measurements over a preintegration window. They then analytically integrate these Gaussian processes to obtain preintegrated measurements that can be queried at any time of interest.
Prior lidar-inertial odometry methods include [Reference Hu, Zhou, Miao, Yuan and Liu18–Reference Chen, Nemiroff and Lopez21]. For a recent survey and comparison of open-source lidar-only and lidar-inertial odometry approaches, we refer the reader to [Reference Fasiolo, Scalera and Maset22]. For a more detailed literature review of lidar odometry, lidar-inertial odometry, and continuous-time state estimation, we refer the reader to our prior work [Reference Burnett, Schoellig and Barfoot23]. For a recent survey on continuous-time state estimation, we refer the reader to Talbot et al. [Reference Talbot, Nubert, Tuna, Cadena, Dümbgen, Tordesillas, Barfoot and Hutter24]. Extrinsic calibration of an IMU and calibration of the temporal offset between the IMU timestamps and the other sensors are both important areas of research. Recent works in this area include [Reference Li, Nie, Guo, Yang and Mei25, Reference Ferguson, Ertop, Herrell and Webster26]. By working with an existing dataset where extrinsic calibration and temporal synchronization has been taken care of, we can focus on the task of localization.
3. IMU as an input versus a measurement
In this section, we investigate an approach where IMU measurements are treated as direct measurements of the state using a continuous-time motion prior. We compare the performance of these two approaches on a simulated toy problem where we estimate the position and velocity of a 1D robot given noisy measurements of position and acceleration. Position measurements are acquired at a lower rate (10 Hz), while the acceleration measurements are acquired at a higher rate (100 Hz). For both approaches, our goal will be to reduce the number of states being estimated through preintegration.
3.1. IMU as an input
As a baseline, we consider treating IMU measurements as inputs to a discrete motion model:
where $\Delta t_k = t_k - t_{k-1}$ , ${\boldsymbol{\Phi }}(t_k, t_{k-1})$ is the transition function, and $\mathbf{u}_k$ are acceleration measurements. A preintegration window is then defined between two endpoints, $(t_{k-1}, t_k)$ , which includes times $\tau _0, \ldots, \tau _J$ . Following the approach of [Reference Lupton and Sukkarieh1, Reference Forster, Carlone, Dellaert and Scaramuzza2], the preintegrated measurements $\Delta \mathbf{x}_{k, k-1}$ are computed as:
These preintegrated measurements can be used to replace the acceleration measurements with a single relative motion factor between two endpoints:
where
and $\mathbf{Q}_n$ is the covariance of the acceleration input $\mathbf{u}_n \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}_n)$ . Note that in this approach, uncertainty is propagated using the covariance on the acceleration input, $\mathbf{Q}_n$ , which conflates IMU measurement noise and the underlying process noise. If the acceleration measurements were to drop out suddenly, it is unclear how the state and covariance should be propagated using this approach. It can be shown that preintegration is mathematically equivalent to marginalizing out the states between $\mathbf{x}_{k-1}$ and $\mathbf{x}_k$ . The overall objective function that we seek to minimize is then
where
are the measurement factors, and $\mathbf{R}_k$ is the associated measurement covariance.
3.2. Continuous-time state estimation
In order to incorporate potentially asynchronous position and acceleration measurements, we propose to carry out continuous-time trajectory estimation as exactly sparse Gaussian process regression [Reference Anderson, Barfoot, Tong and Särkkä27–Reference Barfoot29]. We consider systems with a Gaussian process (GP) prior and a linear measurement model:
where $\mathbf{x}(t)$ is the state, $\check{\mathbf{x}}(t)$ is the mean function, $\check{\mathbf{P}}(t, t^\prime )$ is the covariance function, and $\mathbf{y}_k$ are measurements corrupted by zero-mean Gaussian noise $\mathbf{n}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_k)$ . In this section, we restrict our attention to a class of GP priors resulting from linear time-invariant (LTI) stochastic differential equations (SDEs) of the form:
where $\mathbf{w}(t)$ is a white-noise Gaussian process, $\boldsymbol{Q}$ is a power spectral density matrix, and $\mathbf{u}(t)$ is a known exogenous input. The general solution to this differential equation is
where ${\boldsymbol{\Phi }}(t, s) = \exp\! (\mathbf{A}(t - s))$ is the transition function. The mean function is
Over a sequence of estimation times, $t_0 \lt t_1 \lt \cdots \lt t_K$ , the mean function can be written as:
assuming piecewise-constant input $\mathbf{u}_n$ . This can be rewritten in a lifted form as:
where $\mathbf{A}$ is the lifted lower-triangular transition matrix, the inverse of which is
$\mathbf{B} = \textrm{diag}(\mathbf{1}, \boldsymbol{B}_1, \cdots, \boldsymbol{B}_K)$ , and $\mathbf{u} = [\check{\mathbf{x}}_0^T\,\mathbf{u}_1^T\,\cdots \,\mathbf{u}_K^T]^T$ . See [Reference Barfoot29] for further details on the formulations above. The covariance function is then
The covariance can also be rewritten in a lifted form using the same set of estimation times as above:
where $\mathbf{Q} = \textrm{diag}(\check{\mathbf{P}}_0, \mathbf{Q}_1, \cdots, \mathbf{Q}_K)$ , and
Our prior over the entire trajectory can then be written as:
where $\check{\mathbf{P}}$ is the kernel matrix. Note that the inverse kernel matrix $\check{\mathbf{P}}^{-1}$ is block-tridiagonal thanks to the Markovian nature of the state. This sparsity property also holds for linear time-varying (LTV) SDEs, provided that they are also Markovian [Reference Anderson and Barfoot28]. The exact sparsity of $\check{\mathbf{P}}^{-1}$ is what allows us to perform efficient Gaussian process regression. This fact can be observed more easily by inspecting the following linear system of equations:
where the Hessian is on the left-hand side, $\hat{\mathbf{P}}^{-1}$ is block-tridiagonal since $\check{\mathbf{P}}^{-1}$ is block-tridiagonal, and $\mathbf{C}^T \mathbf{R}^{-1} \mathbf{C}$ is block-diagonal. Thus, this linear system of equations can be solved in $O(K)$ time using a sparse Cholesky solver. The exact sparsity of $\check{\mathbf{P}}^{-1}$ also enables us to perform efficient Gaussian process interpolation. The standard GP interpolation formulas are given by:
where
The key to performing efficient interpolation relies on the sparsity of
where
are the only nonzero block columns at indices $k+1$ and $k$ , respectively. Thus, each interpolation query of the posterior trajectory is an $O(1)$ operation.
3.3. A generalization to preintegration
In Section 3.1, we showed how to perform preintegration when considering acceleration measurements as inputs to a motion model following closely from [Reference Lupton and Sukkarieh1, Reference Forster, Carlone, Dellaert and Scaramuzza2]. In this section, we generalize the concept of preintegration to support heterogeneous factors (a combination of binary factors and unary factors). An example factor graph is shown in Figure 1. As a motivating example, consider having measurements of position such as from a GPS and measurements of acceleration coming from an accelerometer. These are unary measurement factors, called such because they only involve a single state. The binary factors here are motion prior factors derived from our Gaussian process motion prior, called binary because they are between two states. In classic preintegration, the only factors that are preintegrated are binary factors. Here, we show that we can simply use the formulas for querying a Gaussian process posterior at the endpoints of the preintegration window to form a preintegration factor that summarizes all the measurements contained therein. First, we consider the joint density of the state at a set of query times $(\tau _0 \lt \tau _1 \lt \cdots \lt \tau _J)$ and the measurements:
We then perform the usual factoring using a Schur complement to obtain the posterior:
where we obtain expressions for $\hat{\mathbf{x}}_\tau$ and $\hat{\mathbf{P}}_{\tau, \tau }$ . We rearrange this further by inserting $\check{\mathbf{P}}^{-1}\check{\mathbf{P}}$ after the first instance of $\check{\mathbf{P}}_\tau$ and by applying the Sherman–Morrison–Woodbury identities to obtain
where we note that $(\check{\mathbf{P}}^{-1} + \mathbf{C}^T\mathbf{R}^{-1}\mathbf{C})$ is block-tridiagonal, and so the product $(\check{\mathbf{P}}^{-1} + \mathbf{C}^T\mathbf{R}^{-1}\mathbf{C})^{-1} \mathbf{C}^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{C} \check{\mathbf{x}})$ can be evaluated in $O(K)$ time using a sparse Cholesky solver. When we encounter products resembling $\mathbf{A}^{-1}\mathbf{b}$ where $\mathbf{A}$ is block-tridiagonal, we can instead solve $\mathbf{A}\mathbf{z} = \mathbf{b}$ for $\mathbf{z}$ using an efficient solver that takes advantage of the sparsity of $\mathbf{A}$ . Finally, the product $\check{\mathbf{P}}_\tau \check{\mathbf{P}}^{-1}$ is quite sparse, having only two nonzero block columns per block row as shown earlier in (21). It follows that $\hat{\mathbf{x}}_\tau$ and $\hat{\mathbf{P}}_{\tau, \tau }$ can be computed in time that scales linearly with the number of measurements. Now, we consider the case where the query times consist of the beginning and end of a preintegration window, $(t_k, t_{k+1})$ . The queried mean and covariance can be treated as pseudomeasurements summarizing the measurements contained in the preintegration window. We adjust our notation to make it clear that these are now being treated as measurements by using $\tilde{\mathbf{x}}$ instead of $\hat{\mathbf{x}}$ . What we obtain is a joint Gaussian factor for the states at times $t_k$ and $t_{k+1}$ :
A diagram depicting how this affects the resulting factor graph is shown in Figure 2. Using this approach, we can ‘preintegrate’ heterogeneous factors between pairs of states. One clear advantage of this approach is that it offers a tidy method for bookkeeping measurement costs and uncertainties. In the supplementary materials, we provide an alternative formulation using a Schur complement with the same linear time complexity. Indeed, marginalization with a Schur complement is equivalent to the presented marginalization approach using a Gaussian process. However, the Gaussian process still serves a useful purpose in creating motion prior factors. Furthermore, the Gaussian process provides methods for interpolating the posterior.
It is unclear how to extend this marginalization approach to $SE(3)$ due to our choice to approximate $SE(3)$ trajectories using sequences of local Gaussian processes [Reference Anderson and Barfoot28]. It is possible that this marginalization approach could be applied using a global GP formulation such as the one presented by Le Gentil and Vidal-Calleja [Reference Gentil and Vidal-Calleja17]. However, their approach must contend with rotational singularities. We leave the extension to $SE(3)$ as an area of future work. In our implementation of lidar-inertial odometry, we instead use the posterior Gaussian process interpolation formula as in (19) to form continuous-time measurement factors. This is actually an approximation, as it is not exactly the same as marginalization. However, we have found the interpolation approach to work well in practice. Furthermore, the interpolation approach lends itself quite easily to parallelization enabling a highly efficient implementation.
3.4. IMU as a measurement
In our proposed approach, we treat IMU measurements as direct measurements of the state within a continuous-time estimation framework. For the 1D toy problem, we chose to use a Singer prior, which is defined by the following LTI SDE:
where $\mathbf{w}(t)$ is a white-noise Gaussian process and $\mathbf{Q}_c\,=\,2 \boldsymbol{\alpha } \boldsymbol{\sigma }^2$ is the power spectral density matrix [Reference Wong, Yoon, Schoellig and Barfoot30]. By varying $\boldsymbol{\alpha }$ and $\boldsymbol{\sigma }^2$ , the Singer prior can model motion priors ranging from WNOA ( $\boldsymbol{\alpha } \rightarrow \boldsymbol{\infty }, \tilde{\boldsymbol{\sigma }}^2 = \boldsymbol{\alpha }^2 \boldsymbol{\sigma }^2$ ) to WNOJ ( $\boldsymbol{\alpha }{\rightarrow } \mathbf{0}$ ). The discrete-time motion model is given by:
where $\mathbf{w}_k$ is the process noise, $\mathbf{x}_k = [\mathbf{p}_{k}^T\,\dot{\mathbf{p}}_{k}^T\,\ddot{\mathbf{p}}_{k}^T]^T$ , ${\boldsymbol{\Phi }}(t_{k},t_{k-1})$ is the state transition function, and $\mathbf{Q}_k$ is the discrete-time covariance. Expressions for ${\boldsymbol{\Phi }}(t_{k},t_{k-1})$ and $\mathbf{Q}_k$ for the Singer prior are provided by Wong et al. [Reference Wong, Yoon, Schoellig and Barfoot30] and are repeated in the supplementary materials. The binary motion prior factors are given by:
The unary measurement factors have the same form as in (6) except that now accelerations are treated as direct measurements of the state. In addition, the overall objective is the same as in (5). We arrive at the linear system of equations from (18). By default, this approach would require that we estimate the state at each measurement time. In order to reduce the size of the state space, as in Section 3.1, we build preintegrated factors using the approach presented in Section 3.3 in order to bundle together both the unary measurement factors as well as the binary motion prior factors into a single factor. In the exact same fashion as the IMU-as-input approach, we preintegrate between pairs of states associated with the low-rate measurement times.
3.5. Simulation results
In this section, we compare two different approaches to combining high-rate acceleration measurements with low-rate position measurements. We refer to these two different approaches as IMU-as-input and IMU-as-measurement. The comparison is conducted on a toy 1D simulation problem. We sample trajectories from a GP motion prior as defined in (17) by using standard methods for sampling from a multidimensional Gaussian [Reference Barfoot29].
In our first simulation, we sample trajectories from a WNOJ motion prior with $Q_c = 1.0$ . Figure 3 provides a qualitative comparison of the IMU-as-input and IMU-as-measurement approaches for a single sampled simulation trajectory. Note that the performance of the two estimators including their 3- $\sigma$ confidence bounds appears to be nearly identical. Figure 4 depicts 1000 trajectories sampled from this motion prior. In order to simulate noisy sensors, we also corrupt both position and acceleration measurements using zero-mean Gaussian noise where $y_{\textrm{pos},k} = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \mathbf{x}_k + n_{\textrm{pos},k}$ , $n_{\textrm{pos},k} \sim \mathcal{N}(0, \sigma ^2_{\textrm{pos}})$ , $y_{\textrm{acc},k} = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} \mathbf{x}_k + n_{\textrm{acc},k}$ , and $n_{\textrm{acc},k} \sim \mathcal{N}(0, \sigma ^2_{\textrm{acc}})$ . In the simulation, we set $\sigma _{\textrm{pos}} = 0.01m$ and $\sigma _{\textrm{acc}} = 0.01m/s^2$ .
Both IMU-as-input and IMU-as-measurement have parameters that need to be tuned on the dataset. For this purpose, we use a separate training set of 100 sampled trajectories. For the IMU-as-input approach, we learn the covariance of the acceleration input $Q_k$ using maximum likelihood over the training set where we have the benefit of using noiseless ground-truth states in simulation. The objective that we seek to minimize is
where $\mathbf{x}_t$ are validation set trajectories, $\check{\mathbf{x}} = \mathbf{A} \mathbf{v}$ is the full trajectory prior for the IMU-as-input method, $\check{\mathbf{P}} = \mathbf{A}\mathbf{Q}\mathbf{A}^T$ is the prior covariance over the whole trajectory, $\mathbf{A}$ is the lifted transition matrix as in (13), $\mathbf{v}\,=\,\begin{bmatrix} \check{\mathbf{x}}_0^T & \Delta \mathbf{x}_{1, 0}^T & \cdots & \Delta \mathbf{x}_{K, K-1}^T \end{bmatrix}^T$ , and $\mathbf{Q}\,=\,\textrm{diag}(\check{\mathbf{P}}_0, \boldsymbol{\Sigma }_1, \cdots, \boldsymbol{\Sigma }_K)$ .
Using a numerical optimizer, we found that $Q_k \approx 0.00338$ given a WNOJ motion prior with $Qc = 1.0$ , and acceleration measurement noise of $\sigma ^2_{\textrm{acc}} = 0.0001m^2/s^4$ . Note that the input covariance $Q_k$ is clearly much greater than the simulated noise on the acceleration measurements $\sigma ^2_{\textrm{acc}}$ . This is because, for the IMU-as-input approach, the covariance on the acceleration input $Q_k$ is conflating two sources of noise, the IMU measurement noise and the underlying process noise. It also means that $Q_k$ has to be trained and adapted to new datasets in order to maintain a consistent estimator.
For the IMU-as-measurement approach, we need to train the parameters of our Gaussian process (GP) motion prior. In order to highlight the versatility of our proposed approach, we employ a Singer motion prior [Reference Wong, Yoon, Schoellig and Barfoot30], which has the capacity to model priors ranging from WNOA to WNOJ. The Singer prior is parametrized by an inverse length scale matrix $\boldsymbol{\alpha }$ and a variance $\boldsymbol{\sigma }^2$ , both of which are diagonal. For values of $\boldsymbol{\alpha }$ close to zero, numerical optimizers encounter difficulties due to numerical instabilities of $\mathbf{Q}_k$ and its Jacobians. Instead, we derive the analytical gradients to learn $\{{\boldsymbol{\alpha }}, \boldsymbol{\sigma }^2\}$ using gradient descent following the approach presented by Wong et al. [Reference Wong, Yoon, Schoellig and Barfoot30]. The objective that we seek to minimize is
where both $\mathbf{e}_{k,t}$ and $\mathbf{Q}_{k,t}$ are functions of the Singer prior parameters $\{{\boldsymbol{\alpha }}, \boldsymbol{\sigma }^2\}$ . Further details on the analytical gradients are provided in the supplementary materials. Note that the approach of Wong et al. [Reference Wong, Yoon, Schoellig and Barfoot30] supports learning the parameters of the Singer prior even with noisy ground truth; however, it requires that we first estimate the measurement covariances and then keep them fixed during the optimization. In order to learn both the GP parameters and the measurement covariances simultaneously, Wong et al. leverage exactly sparse Gaussian variational inference [Reference Wong, Yoon, Schoellig and Barfoot15].
Figure 5 shows the results of our first simulation experiment with the WNOJ prior. Each row in Figure 5 is a box plot of a metric computed independently for each of the 1000 simulated trajectories. The blue boxes represent the interquartile range, the red lines are the medians, the whiskers correspond to the 2.5 and 97.5 percentiles, and the red dots denote outliers (data points beyond the whiskers). The black dashed lines in the first, third, and fourth rows corresponds to the target value that is 0 for the mean error and 1 for the normalized estimation error squared (NEES). Underneath each box plot, we also compute the mean value of the metrics across all data points.
In the first row of Figure 5, we can see that the mean error in both position and velocity is close to zero for both estimators. The gray lines in the first row denote a 95% two-sided confidence interval, a statistical test to check that the estimators are unbiased. We expect to see the whiskers of the box plots lie within the 95% confidence interval in order to confirm that the estimator is unbiased, which is the case.
The third row displays a commonly used method for computing the NEES. This method uses the marginals of the posterior covariance and relies on the ergodic hypothesis in order to treat the error from each timestep as being independent. In this case, we compute the marginal covariance at each timestep for position and velocity only so that the results of the two estimators can be compared directly. The gray lines denote a 95% chi-squared bound, a statistical test for checking that the estimators are consistent. Interestingly, we observe that neither estimator passes the statistical test in this case, even though the mean and median NEES are close to 1. It appears that, in this case, the ergodic hypothesis is not valid.
In the fourth row, we present an alternative formulation of the NEES that uses the full posterior covariance over the entire trajectory. In this case, we are satisfied to find that both estimators pass the statistical test for confirming that they are consistent. The main difference between this version of the NEES and the previous one is that we have retained the cross-covariance terms between timesteps.
In summary, we observe that the two approaches achieve nearly identical performance. Both estimators are unbiased and consistent so long as their parameters are trained on a training set.
Figures 6, 7 depict the results of our second experiment where the ground-truth trajectories are sampled from a Singer prior with $\alpha = 10.0$ , $\sigma ^2 = 1.0$ . This large value of $\alpha$ is intended to approximate a WNOA prior. Our results show that both estimators are capable of adapting to a dataset with a different underlying motion prior while remaining unbiased and consistent.
3.6. Discussion
As mentioned previously, the big-O time complexity of classic preintegration and our approach are the same. In practice, our approach is slightly slower but not by much. Using a modern CPU, either approach can be considered real-time capable. Note that the number of preintegration windows could be adjusted and each preintegration window could be computed in parallel to make the approach more efficient. In this way, we could parallelize the solving of some estimation problems. We are motivated by sensor configurations that cannot be easily handled by classic preintegration such as multiple asynchronous high-rate sensors. This could include a lidar and an IMU or multiple asynchronous IMUs.
4. Lidar-inertial odometry
Our lidar-inertial odometry is implemented as sliding-window batch trajectory estimation. The factor graph corresponding to our approach is depicted in Figure 8. The state $\mathbf{x}(t) = \{\mathbf{T}(t), \boldsymbol{\varpi }(t), \dot{\boldsymbol{\varpi }}(t), \mathbf{b}(t)\}$ consists of the $SE(3)$ pose $\mathbf{T}_{vi}(t)$ , the body-centric velocity $\boldsymbol{\varpi }_v^{vi}(t)$ , the body-centric acceleration $\dot{\boldsymbol{\varpi }}_v^{vi}(t)$ , and the IMU biases $\mathbf{b}(t)$ . $\boldsymbol{\varpi }_v^{vi}$ is a $6\times 1$ vector containing the body-centric linear velocity $\boldsymbol{\nu }_v^{vi}$ and angular velocity $\boldsymbol{\omega }_v^{vi}$ . We approximate the $SE(3)$ trajectory using a sequence of local Gaussian processes as in [Reference Anderson and Barfoot28]. Between pairs of estimation times, the local variable $\boldsymbol{\xi }_k(t)$ is defined as:
We use (32) and the following to convert between global and local variables:
where the approximation for $\ddot{\boldsymbol{\xi }}_k(t)$ was originally derived by Tang et al. [Reference Tang, Yoon and Barfoot14]. We use a Singer prior, introduced in [Reference Wong, Yoon, Schoellig and Barfoot30], which is defined by the following Gaussian process:
and which can equivalently be represented using the following LTI SDE:
where
$\boldsymbol{\sigma }^2$ is a variance, $\boldsymbol{\ell }$ is a length scale, $\boldsymbol{\alpha } = \boldsymbol{\ell }^{-1}$ , and $\mathbf{w}(t)$ is a white-noise Gaussian process where $\mathbf{Q}_c = 2\boldsymbol{\alpha }\boldsymbol{\sigma }^2$ is the associated power spectral density matrix. (36) can be stochastically integrated to arrive at a local Gaussian process:
where the formulation for the transition function $\boldsymbol{\Phi }(t, t_k)$ and the covariance $\mathbf{Q}_t$ can be found in [Reference Wong, Yoon, Schoellig and Barfoot30]. In order to convert our continuous-time formulation into a factor graph, we build a sequence of motion prior factors between pairs of estimation times using
Our IMU measurement model is
where $\mathbf{b}_a$ and $\mathbf{b}_\omega$ are the accelerometer and gyroscope biases, $\mathbf{w}_a \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_{a})$ and $\mathbf{w}_\omega \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_{\omega })$ are zero-mean Gaussian noise. Due to angular velocity and acceleration being a part of the state, the associated IMU error function is straightforward:
where we rely on forming measurement factors using the posterior Gaussian process interpolation formula. For each of these continuous-time measurement factors, we compute Jacobians of the perturbation to the state at the interpolated times with respect to the bracketing estimation times. This is an approximation, as it is not exactly the same as marginalization. However, we have found it to work well in practice. We also include motion prior factors for the IMU biases:
where $\mathbf{Q}_{b,k} = \boldsymbol{Q}_b \Delta t_k$ is the covariance resulting from a white-noise-on-velocity motion prior and $\boldsymbol{Q}_b$ is the associated power spectral density matrix. We use point-to-plane factors. The associated error function is
where $\mathbf{q}_j$ is the query point, $\mathbf{p}_j$ is the matched point in the local map, $\mathbf{n}_j$ is an estimate of the surface normal at $\mathbf{p}_j$ given neighboring points in the map, $\mathbf{D}$ is a constant matrix removing the homogeneous component, $\mathbf{T}_{vs}$ is a known extrinsic calibration between the lidar frame and the vehicle frame, and ${\alpha _j} = (\sigma _2 - \sigma _3)/\sigma _1$ [Reference Dellenbach, Deschaud, Jacquet and Goulette31] is a heuristic weight to favor planar neighborhoods. The objective function that we seek to minimize is
We solve this nonlinear least squares problem for the optimal perturbation to the state using Gauss-Newton. Once the solver has converged, we update the pointcloud correspondences and iterate this two-step process to convergence. In practice, we typically limit the maximum number of inner-loop Gauss-Newton iterations to 10, and the number of outer-loop iterations to 10 in order to enable real-time operation.
In our approach, we estimate the orientation of the gravity vector relative to the initial map frame at startup. We perform sliding-window batch trajectory estimation where the length of the sliding window is $200$ ms. We output the pose at the middle of the newest lidar scan.
5. IMU-as-input lidar-inertial baseline
As a baseline where IMU measurements are treated as an input, we consider a lidar-inertial odometry approach where IMU measurements are used to de-skew the lidar pointcloud and classic preintegration is used as a prior. The baseline is implemented as sliding-window batch trajectory estimation and the factor graph corresponding to the baseline approach is depicted in Figure 9. The state $\mathbf{x}(t_k) = \{\mathbf{C}_{iv}(t_k), \mathbf{r}_i^{vi}(t_k), \mathbf{v}^{vi}_i(t_k), \mathbf{b}(t_k) \}$ consists of the orientation $\mathbf{C}_{iv}(t_k)$ , position $\mathbf{r}_i^{vi}(t_k)$ , velocity $\mathbf{v}^{vi}_i(t_k)$ , and IMU biases. All variables are expressed in a global frame. We use classic preintegration to form binary factors between pairs of estimated states in the sliding window [Reference Forster, Carlone, Dellaert and Scaramuzza2]. At each iteration of the optimization, we integrate the IMU measurements to extrapolate for the state at each IMU measurement time using
The position at a given lidar point time can then be obtained by linearly interpolating between the positions at the IMU measurement times. The orientation at a given lidar point time can be obtained using the following formula:
where $\alpha = (\tau _j - t_\ell ) / (t_{\ell +1} - t_\ell )$ . Using these interpolated states, we can write the point-to-plane error function as:
The Jacobians of this error function with respect to perturbations to the state variables are provided in the appendix.
6. Lidar-inertial simulation
In this section, we compare the performance of our lidar-inertial odometry to the baseline imu-as-input approach in a simulated environment. The simulated environment is a rectangular room, and we simulate trajectories using sinusoidal body-centric velocities:
where $A_j$ and $f_j$ are configurable amplitude and frequency parameters. The resulting body-centric acceleration can be obtained via differentiation:
We then step through the simulation so as to replicate the lidar firing sequence of a Velodyne Alpha-Prime 128-beam lidar, obtaining the pose of the sensor for each firing sequence. Starting with $\mathbf{T}_0 = \mathbf{T}_{vi}(t_0) = \mathbf{1}$ ,
where $\Delta t_k$ is very small (53.3 $\mu$ s). By generating the trajectories in this way, it is straightforward to extract the body-centric angular velocity and linear acceleration to simulate IMU measurements. We use the measurement model in (39) to simulate biases and gravity components. We simulate a constant nonzero bias on each gyroscope and accelerometer axis. We include zero-mean Gaussian noise on IMU measurements as well as Gaussian noise on the range measurement of each lidar point. We chose measurement noises to be close to what we experience on our experimental platform. Figure 10 depicts an example pointcloud produced in our simulation environment where the points are colored based off which wall they are reflected. Figure 11 compares the trajectory estimated by our lidar-inertial odometry with the ground truth.
For the simulation parameters, we use an IMU rate of 200 Hz, a simulation length of 20s, and three motion regimes denoted slow, medium, and fast. Where for each of these motion regimes, we randomly sample for the amplitudes and frequencies of the body-centric velocities used in the simulation of a sequence. The ranges for these parameters is given in Table I. One set of amplitudes and frequencies is sampled for each of the 20 sequences simulated for the three motion regimes. We set the standard deviation of the accelerometer measurement noise to 0.02 $\textrm{m/s}^2$ , the standard deviation of the gyroscope measurement noise to 0.01 rad/s, and the standard deviation of the lidar range measurements to 0.02 m. The accelerometers were given a constant bias of 0.05 $\textrm{m/s}^2$ , and the gyroscopes were given a constant bias of 0.05 rad/s.
We compare the performance of our lidar-inertial odometry against the baseline in Table II where we also show the performance of our approach using only the lidar and the gyroscope, and lidar only. We obtained the results by computing the absolute trajectory error between our estimated trajectories and the ground truth using the evo evaluation tool.1 The results in the table are the overall root mean squared error obtained by concatenating the error from each individual sequence. The results show that in the low speed regime, the imu-as-input baseline approach and our imu-as-measurement approach based on the Singer prior achieve nearly identical results. This is unsurprising as it appears to replicate the results from Section 3.5. Interestingly, our lidar-only approach performs the best on the slow regime. However, in the medium and fast regime, the advantage of our lidar-inertial approach becomes apparent. In the medium regime, the baseline imu-as-input approach begins to break down. This is possibly due to the fact that the motion is no longer approximately constant acceleration and constant angular rate. On the other hand, our lidar-inertial approach performs roughly the same in the medium regime. Our lidar-only approach also breaks down in the medium regime. In the fast regime, both our lidar-inertial and lidar with gyro approaches achieve respectable results, while the lidar-only approach and the imu-as-input baseline fail completely.
7. Experimental results
In this section, we provide experimental results on the Newer College Dataset [Reference Ramezani, Wang, Camurri, Wisth, Mattamala and Fallon32]. This dataset features a 64-beam Ouster lidar and provides the internal IMU measurements of the Ouster lidar. Ground-truth poses were obtained by matching live lidar poses to a map of the environment created using a survey-grade lidar at several stationary poses. This dataset is somewhat unique in that it features several sequences with highly dynamic motions. In Table III, we compare the performance of using our continuous-time Singer prior using lidar only (Singer-LO), lidar and a gyroscope only (Singer-LO + Gyro), and a full lidar-inertial setup including an accelerometer (Singer-LIO). We also compare the performance of our approach to some comparable works in the literature such as CT-ICP [Reference Dellenbach, Deschaud, Jacquet and Goulette31], a lidar-only approach, FAST-LIO2 [Reference Xu, Cai, He, Lin and Zhang20] and DLIO [Reference Chen, Nemiroff and Lopez21], which can be considered the prior state of the art for this dataset, and SLICT [Reference Nguyen, Duberg, Jensfelt, Yuan and Xie33] and CLIO [Reference Lv, Lang, Xu, Wang, Liu and Zuo8], which are two continuous-time approaches that use linear interpolation and B-splines, respectively.
Our approach, Singer-LIO, demonstrates the best performance on the 01-Short and 02-Long sequences and also demonstrates the best overall performance. Interestingly, the sequences in which we expected the IMU to make the most difference were 05-Quad w/ Dynamics and 06-Dynamic Spinning due to their dynamic motions. However, we observe that in these sequences, our lidar-only approach performs similarly or even better, replicating the results of our lidar-inertial simulation. It appears that, in this dataset, the addition of an IMU mainly helps in areas where there are geometric degeneracies rather than the areas with dynamic motions. Sequences 05-Quad w/ Dynamics and 06-Dynamic Spinning are very similar to our lidar-inertial simulation in the slow regime, as they are conducted in a rectangular quad at New College, Oxford. FAST-LIO2 and DLIO can be considered state-of-the-art IMU-as-input approaches, and our approach demonstrates a clear advantage over these methods.
8. Conclusions
In this work, we compared treating an IMU as an input to a motion model versus treating it as a measurement of the state. On a 1D simulation problem, we showed that these two approaches performed identically when the data are sampled from either a constant velocity or constant acceleration prior and both methods are trained on a hold-out set. We demonstrated our approach to continuous-time lidar-inertial odometry using the Singer prior where body-centric acceleration is included in the state. In our simulated environment, we showed that our lidar-inertial odometry outperformed lidar-only odometry and an IMU-as-input baseline approach. On the Newer College Dataset, we demonstrated state-of-the-art resuts. There is still plenty of work to be done in treating IMU measurements as measurements of the state. Similar to nonuniform B-splines, it would be interesting to investigate a setup where the parameters of the Singer prior are adjusted on the fly so as to adjust between periods of smooth versus highly dynamic motion. When the IMU is treated as a measurement of the state, this allows us to now incorporate exogenous control inputs into our Gaussian process motion prior. This could be a promising area of research for estimating the state of drones where the torque commanded to the motors is often known. Our approach to combine multiple asynchronous high-rate sensors may prove beneficial in other sensor configurations such as multiple asynchronous IMUs.
Author contributions
Keenan Burnett carried out the research and wrote the article, Angela Schoellig provided feedback on research, and Tim Barfoot conceived of the study and provided feedback on research.
Financial support
This work was supported in part by the National Sciences and Engineering Research Council of Canada (NSERC) and by an Ontario Research Fund: Research Excellence (ORF-RE) grant.
Competing interests
The authors declare none.
Data availability
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0263574724002121.