1. Introduction
Ice thickness and bed elevation are fundamental for modelling glacier flow dynamics and ice-sheet–climate interactions (Durand and others, Reference Durand, Gagliardini, Favier, Zwinger and Le Meur2011; Bamber and others, Reference Bamber2013). Measuring these parameters requires remote-sensing techniques that penetrate ice and can cover extensive areas. Radio-echo sounding (RES) has been used since the 1950s for subglacial topography detection in Antarctica and Greenland and remains the primary method for mapping ice-sheet basal topography from aircraft (Robin and others, Reference GdQ, Evans and Bailey1969; Dowdeswell and Evans, Reference Dowdeswell and Evans2004; Bingham and Siegert, Reference Bingham and Siegert2007; Schroeder and others, Reference Schroeder2020). RES survey accuracy is influenced by factors such as rough topography, subglacial and englacial water, and crevassed ice, which can scatter or attenuate radar signals (Chu and others, Reference Chu, Schroeder, Seroussi, Creyts, Palmer and Bell2016; Jordan and others, Reference Jordan2017; Kendrick and others, Reference Kendrick2018). Additional factors, like radar instrument specifications and survey platform stability, further affect accuracy (Lapazaran and others, Reference Lapazaran, Otero, Martín-Español and Navarro2016).
Due to the vast size of the ice sheets, RES data are generally collected along flight lines with gaps of several kilometres, which requires interpolation to generate continuous bed topography maps essential for modelling ice dynamics and subglacial hydrology (Studinger and others, Reference Studinger, Koenig, Martin and Sonntag2010). Bed topography significantly influences ice and subglacial water flow, the estimation of total ice volume and the rate at which ice is discharged to the surrounding ocean (Bamber and others, Reference Bamber2013). As stated in the most recent IPCC assessment report, Greenland’s ice mass, which accounted for 24.5% of observed sea-level rise from 1901–2018, will continue to impact sea level due to ongoing ice mass loss (Fox-Kemper and others, Reference Fox-Kemper2021). Accurate bed topography data are needed to model the ice sheet’s response to changes in meltwater input, which is increasing with atmospheric warming. Subglacial topography is crucial for controlling ice flow and discharge, grounding line position and ice front dynamics, particularly for marine-terminating sectors (Durand and others, Reference Durand, Gagliardini, Favier, Zwinger and Le Meur2011; Van den Broeke and others, Reference van den Broeke2016; Cooper and others, Reference Cooper, Jordan, Schroeder, Siegert, Williams and Bamber2019). Attributes of the ice-bed interface, such as roughness and the distribution of free water, determine basal traction and consequently control the rate of movement of ice mass from the accumulation zone to the ablation zone. (Gudmundsson, Reference Gudmundsson1997; Bingham and Siegert, Reference Bingham and Siegert2009; Hoffman and others, Reference Hoffman2016). Diurnal, seasonal and longer-term changes in subglacial meltwater storage and distribution further impact basal friction and ice dynamics (Palmer and others, Reference Palmer, Shepherd, Nienow and Joughin2011, Reference Palmer2013, Reference Palmer, McMillan and Morlighem2015; Chu and others, Reference Chu, Schroeder, Seroussi, Creyts, Palmer and Bell2016). Accurate bed topography is therefore critical to understanding ice sheet responses to meltwater variability and other climate-driven factors.
Since its release in 2017, the BedMachine Greenland v3 data have become the most widely used estimates of subglacial topography in Greenland (Morlighem and others, Reference Morlighem2017). Quantifying uncertainty in these data is important to assign confidence to the outputs of modelling studies that use these data, as well as to inform future RES data acquisition strategies, although this task has proven challenging (Morlighem and others, Reference Morlighem, Seroussi, Larour and Rignot2013, Reference Morlighem2017). The BedMachine data contains uncertainty estimates, which are derived via different methods in different regions (Morlighem et al., 2017 supplementary materials). Near the ice margin, where mass conservation is used, these estimates are obtained through propagation of errors and assume that the ice flow is in ‘steady-state’. Further inland, where kriging is used, uncertainties are estimated through proximity to radar observations, reaching several hundred metres in many places.
Previous studies have shown that machine learning approaches such as random forests (Breiman, Reference Breiman2001) have the potential to improve estimates of geophysical parameters. Quantile regression forests (QRFs) (Meinshausen and Ridgeway, Reference Meinshausen and Ridgeway2006) are an extension of random forests that provide information about the conditional distribution of the target variable (e.g. ice depth) for a given input (e.g. position in space), rather than just a prediction of the conditional mean value of the target variable. These decision tree-based learning algorithms have become widely used in spatial machine learning applications (Prasad and others, Reference Prasad, Iverson and Liaw2006; Hengl and others, Reference Hengl2015; Kirkwood and others, Reference Kirkwood, Cave, Beamish, Grebby and Ferreira2016), and various investigations have been carried out into their properties as spatial learning algorithms (Hengl and others, Reference Hengl, Nussbaum, Wright, Heuvelink and Gräler2018; Møller and others, Reference Moller, Beucher, Pouladi and Greve2020; Sekulić and others, Reference Sekulić, Kilibarda, Heuvelink, Nikolić and Bajat2020). Unlike more traditional methods, QRF does not assume linearity or specific error distributions, allowing it to provide more flexible and spatially nuanced uncertainty estimates. This data-driven technique can yield prediction intervals that better reflect the true variability in bed elevation, especially in regions with sparse observations or complex subglacial terrain, thereby improving confidence in glaciological modelling.
In this study, we investigate the potential for a QRF approach to (i) improve ice sheet bed elevation datasets and (ii) to provide a more rigorous quantification of uncertainty. We present a QRF to map the basal topography of Greenland’s ice sheet, with estimated uncertainty. Our new method uses the approach of Mø ller and others Reference Moller, Beucher, Pouladi and Greve(2020) to provide ‘oblique geographic coordinates’, i.e. a range of different compass orientations as input features to the QRF, so that spatial thresholds are not restricted to align north-south and east-west, leading to more realistic estimates of subglacial topography.
2. Data
We used data assets published by the BedMachine v3 study (Morlighem and others, Reference Morlighem2017), as well as ice velocity rasters from the MEaSUREs project Joughin and others (Reference Joughin, Smith, Howat and Scambos2015, Reference Joughin, Smith and Howat2018), and ice surface elevation data from the Greenland Ice Mapping Project (Howat and others, Reference Howat, Negrete and Smith2014). To train our QRF, we use the same airborne radar observations of ice thickness as were used to develop BedMachine v3; these primarily come from NASA’s Operation IceBridge, with data processed by the Center for Remote Sensing of Ice Sheets (CReSIS; Paden and others, Reference Paden, Li, Leuschen, Rodriguez-Morales and Hale2010 updated 2019) as well as other smaller scale survey projects (see section 2 of Morlighem and others, Reference Morlighem2017). To test the accuracy and calibration of our QRF approach, and to provide a fair comparison with BedMachine v3, we use circum-Greenland ice thickness measurements collected during the PROMICE airborne surveys Sø rensen and others Reference Sorensen(2018) as a test dataset for both BedMachine and our QRF(Fig. 1).

Figure 1. Ice sheet surface elevation (a), RES observations of ice thickness (b), BedMachine mask (c), BedMachine sources (d). The PROMICE observations used to test both the QRF and BedMachine predictions are shown in blue on panel (b).
3. Methodology
BedMachine inferred ice thickness using a combination of a mass conservation approach along the periphery of the ice sheet and ordinary kriging within the interior (Fig. 1; Morlighem and others, Reference Morlighem2017). Mass conservation relies on modelling the transport of ice, and the directions and magnitude of the ice speed are critical inputs, as they control how the ice is distributed spatially. In regions of slow speed, flow directions become poorly defined, and the accuracy of mass-conservation inferred ice thickness drops. We find that mass conservation works well for regions where the ice velocity is greater than 30-40 m yr−1, which is 30% of the ice sheet area. In this new approach, we use a QRF (Meinshausen and Ridgeway, Reference Meinshausen and Ridgeway2006) across the entire ice sheet. We model the subglacial bed elevation directly as our target variable, rather than ice thickness. This has the effect of making predictions from the terminal nodes of each tree in our QRF physically horizontal, rather than physically perpendicular to the surface of the ice, as would be the case when predicting ice thickness. We found this resulted in better generalisation, accounting for ∼10 of the overall ∼40 metre reduction in root-mean-squared-prediction-error of our regression forest approach compared to BedMachine.
The task of mapping Greenland’s subglacial bed topography from point-sampled airborne radar observations of ice thickness (or equivalently of bed elevation, by subtracting ice thickness from surface elevation) is inherently a spatial interpolation problem. However, more information is available than just the airborne radar observations and their spatial locations; we also have spatially-continuous knowledge of the ice sheet’s surface in the form of gridded satellite data of surface elevation (Howat and others, Reference Howat, Negrete and Smith2014) and surface ice velocity (Joughin and others, Reference Joughin, Smith, Howat and Scambos2015). Previous studies have shown that these covariate datasets contain information about subglacial topography (Gudmundsson, Reference Gudmundsson2003; Smith and others, Reference Smith, Raymond and Scambos2006), so we include them as input features to our QRF model (Fig. 2). We deduce that the degree to which surface features are informative of the subsurface will vary across the ice-sheet, and that there may be different interactions between different surface features at different locations. The desire to accommodate this flexibility, as well as to handle the scale of data involved (over 30 million airborne radar observations are available), motivates our investigation of machine learning for subglacial bed elevation prediction in this study.

Figure 2. Covariates used in the QRF: ice flow speed (a), ice flow angle (b), surface roughness (c) and roughness bandpass (d).
To implement our QRF, we used the Ranger package Wright and Ziegler Reference Wright and Ziegler(2017) for the R language for statistical computing R Core Team (2023). The QRFs algorithm Meinshausen and Ridgeway Reference Meinshausen and Ridgeway(2006) extends the popular ‘random forest’ machine learning algorithm to enable provision of conditional quantile predictions in addition to the random forest’s typical prediction of the conditional mean. In a QRF, conditional quantiles are calculated from the empirical distribution of target variable observations that, for each tree, fall within the same leaf as the new prediction candidate. This enables the construction of prediction intervals to communicate prediction uncertainty; the calibration of which we assess as part of this study.
3.1. Data preparation
The collection of ice depth observations over Greenland by airborne radar survey has produced a spatially biased sample; all points on the map have not had equal probability of being observed, but rather observations are concentrated in flight lines, the densities of which vary significantly across the map, with some areas being ‘hot spots’ for observations while others are relatively unobserved. Any attempt to minimise prediction error equally across all observations is therefore not an attempt to minimise prediction error equally across the map, which should really be our aim.
To mitigate the effect of this sampling bias in our modelling, we first resampled the radar observations according to the following scheme: For n desired number of observations to use in modelling (we chose
$n = $ 3.7 million in this study; equivalent to 1/10th of all the radar observations, and a good balance between computational cost and model quality), randomly generate easting and northing coordinates (NSIDC North pole stereographic, EPSG: 3413) of a location over Greenland, then take the nearest neighbouring radar observation to be observation
$i\in[1,\;n]$. Allow the same observation to be used whenever it is nearest to subsequent randomly generated coordinates (i.e. allow replacement). Following this scheme produces a resampled set of radar observations in which the probability of each observation being included is inversely proportional to the spatial sampling intensity of the original survey, as advocated for by De Bruin and others Reference De Bruin, Brus, Heuvelink, van Ebbenhorst Tengbergen and Wadoux(2022). This effectively re-weights the observations, by resampling, to have equal representation across the map. For small n (up to about 10,000), the resultant resampled observation sets appear indistinguishable from a spatially random sample. For large n (over about 100,000), the influence of flight lines is unavoidable, but at least observations in sparsely sampled locations are up-weighted, meaning that metrics calculated against the resampled observation dataset are more representative of the map as a whole, rather than being biased to over-represent observation hotspots.
Since BedMachine utilised all radar observations that were available to it (i.e. no observations were held out as a test set for the final BedMachine product), we have a challenge in making a fair comparison between BedMachine and other approaches. Fortunately, an additional radar observations dataset from the PROMICE project (Sørensen and others, Reference Sorensen2018; shown in blue in Fig. 1b) was not used to produce BedMachine, and so we utilise it here as our test set for comparison to BedMachine. This is also the reason we compare against BedMachine, since more recent versions of BedMachine have assimilated the PROMICE dataset. The PROMICE radar observation dataset is spatially biased in that it was acquired using flight missions oriented parallel to the ice sheet margin around the periphery of Greenland, with no observations in the interior. As such, the evaluations we make of our QRF, and of BedMachine, using this PROMICE dataset are biased towards the performance of the models around the ice periphery (although as we shall see, this tends to be a difficult area to predict). To provide some reassurance of the performance of our QRF in the ice interior, we also report metrics from a cross-validation scheme, but it is not possible to compare this with BedMachine; to do so would require running the BedMachine procedure on the same folds.
3.2. Feature engineering
The target variable, y, for our QRF is subglacial bed elevation. We wish to make predictions of y as a function of inputs x. These inputs consist of the spatial coordinates of each observed location (i.e. easting and northing in EPSG:3413) along with their corresponding values of satellite-observed covariates, which in this study are surface elevation, ice flow speed and ice flow angle. Thus, in raw form (without feature engineering), our input xraw would be five-dimensional: easting, northing, elevation, flow speed and flow angle would be our features.
However, in this study, we make three key feature engineering decisions to produce xeng upon which our QRF is constructed:
(1) We rotate our spatial axes to add new compass orientations in addition to easting and northing. In total, we provide 16 equal-angled rotations about the compass as our spatial input features. This allows our QRF to perform spatial interpolation that is smoother and less ‘blocky’ than if the decision trees were forced to make partitions on easting and northing only.
(2) We take the sine and cosine of the ice flow angle raster, so that the ‘circular’ nature of this variable is better represented (otherwise, 5 degrees and 355 degrees appear to be 350 degrees apart, when in reality they are only 10 degrees apart).
(3) We filter the surface elevation raster to generate two derived features (although many more are possible); A) a simple surface roughness measure as the standard deviation of 5 × 5 window passed across the raster, and B) a simple bandpass of surface elevation for which we first subtract the mean of a 5 × 5 window (leaving only residual high frequency information) and then take the mean of this residual in a 5 × 5 window.
The benefits of feature engineering to tailor the inputs, x, towards the machine learning task at hand (to learn f(x) such that
$y = f(x)$) are well documented (e.g. Kuhn and Johnson (Reference Kuhn and Johnson2019)). Feature engineering helps guide a machine learning model by shaping its assumptions about how input variables relate to the target outcome. In Bayesian terms, it can be seen as helping to represent our prior belief about the possible relationships between the input variables xraw and the predicted outcome y. The features we have engineered to include in this study are by no means exhaustive, and there are many more that could be tried. We suggest that the engineering, or learning, of optimal features for this task would be a worthwhile focus for a future research project.
3.3. Hyperparameter tuning
With our observations resampled to compensate for the spatial sampling bias of flight lines (Bartlett and others, Reference Bartlett, Palmer, Schroeder, MacKie, Barrows and Graham2020), and with our features engineered to include additional compass orientations and derivatives of our satellite observed covariates, we proceeded to find suitable hyperparameters for the construction of our QRF. Hyperparameters are a priori configuration values that influence how the QRF model learns from the training data. Our aim is to balance the trade-off between overfitting the data (i.e. spuriously fitting to noise) and underfitting the data (i.e. failing to fit to all of the signal). To assess our performance in this task, we use a k-fold cross-validation procedure (where k = 6), such that we split the training data (our radar observations excluding the PROMICE dataset) into six folds, and then train 6 QRFs, with the same hyperparameter values, each on a different set of 5 of the folds, so that their performance can be tested on the fold they did not see during training (see Appendix: Cross validation by field season). In this way, as we repeat the procedure using different hyperparameter values, we can assess the ability of the QRF algorithm with different hyperparameter values to make good predictions of subglacial bed elevation at locations it has not observed during training.
Typically, a k-fold cross-validation procedure would split the data into k folds at random (Kohavi and others, Reference Kohavi1995). However, to do so in our case would result in having observations from different folds side-by-side together in the same flight lines. Tuning hyperparameters on such folds would result in optimising for interpolation along flight lines, rather than between flight lines. Since large swathes of the map lie between flight lines (Fig. 1b), it makes sense to prioritise between-flight-line predictive performance, as this is more representative of the challenge of mapping Greenland’s subglacial bed topography as a whole. In order to achieve this, we create our six folds for cross-validation by splitting the observations by the years in which they were collected; the aircraft in different years flew different routes, and so this results in six folds of distinct flight lines. The folds are assembled by years as follows: Fold 1 - years 1993 to 2001, Fold 2 - years 2002 to 2010, Fold 3 - years 2011 to 2012, Fold 4 - years 2013 to 2014, Fold 5 - years 2015 to 2017, Fold 6 - years 2018 to 2019. Splitting the data in this way produces 6 folds with close-to-equal number of observations.
By manually tuning hyperparameters to minimise the root-mean-squared error of predictions in cross-validation, we arrived at the following hyperparameters for our QRF: max.depth = 0 (unlimited), min.node.size = 9, splitrule = “variance”, replace = TRUE, sample.fraction = 0.95, mtry = 5 (default for our number of inputs). We do not limit the maximum depth of the trees, but instead specify a minimum node size of 9 observations. Each tree is grown on a 95% random subsample of the data, sampling with replacement. Trees are grown using the typical ‘variance’ split rule, such that the intra-node variance of the resultant partitions is minimised. We do not change the ‘mtry’ hyperparameter, which controls how many variables are considered for each decision split in a tree. By default, it is set to the square root of the total number of input features, rounded to the nearest integer. Since our model has 22 input features, this results in an ‘mtry’ value of 5. For the cross-validated hyperparameter tuning, we grew 20 trees per forest, this achieved a cross-validated R2 of 0.97 and root-mean-squared-error of 172 metres—these cross-validation metrics are of interest mainly in the context of comparison with results for the PROMICE test set, which we report in the results section.
For the final forest trained on all folds of radar observations (but excluding the PROMICE dataset, which we reserved for testing only), we grew 100 trees using the hyperparameters described above.
4. Results and discussion
4.1. Test metrics
Our QRF model outperforms BedMachine in both deterministic and probabilistic predictions when tested on the circum-Greenland PROMICE dataset, unseen by either method during training (Fig. 3). In terms of ice thickness (surface elevation minus predicted bed elevation), the QRF achieves a root-mean-squared error (RMSE) of 190 m, representing an 18% reduction compared to BedMachine’s 232 m. This improvement, though modest, is significant in terms of increasing prediction accuracy, particularly when applied across large-scale ice-sheet models where even small gains can lead to more accurate estimates of ice-sheet volume and sea-level rise projections.

Figure 3. QRF predictions versus observations (a), BedMachine predictions versus observations (b), QRF calibration (c), BedMachine calibration (d).
Additionally, the QRF model exhibits a higher R2 value of 0.83 compared to BedMachine’s 0.79 when compared with the PROMICE data, representing an increased correlation between observed and predicted ice thickness values. This modest improvement in deterministic performance translates into better fidelity in capturing subglacial features, which has implications for understanding Greenland’s ice dynamics using numerical modelling (Fig. 4).

Figure 4. QRF predicted bed elevation (a), BedMachine minus QRF (difference) (b), Bed Machine uncertainty (c), QRF uncertainty (d).
4.2. Predictive performance insights
Examining scatter plots of predicted versus observed ice thickness (Fig. 3), the QRF model reveals a more consistent performance across all observed depths compared to BedMachine, which demonstrates a tendency to under-predict ice thickness in shallow-ice regions. Notably, BedMachine often predicts zero or near-zero ice thickness in regions where the radar observations suggest depths of up to 500 m (Fig. 4). This systematic underprediction, particularly along the shallow periphery, could have profound effects on estimates of ice-sheet dynamics and mass loss, which rely on accurate thickness measurements in these crucial areas. In contrast, the QRF model provides a more balanced prediction distribution, with residuals scattered symmetrically around the diagonal for all observed ice depths.
These improvements in prediction accuracy are particularly relevant in regions of Greenland where the bedrock topography is complex, and ice dynamics are heavily influenced by subglacial features. Since BedMachine uses methods that have a tendency to smooth out important subglacial features, the QRF model’s approach of using high-resolution satellite-derived surface and velocity data allows it to capture individual topographic features in finer detail, as can be seen in Fig. 5. This advantage is crucial for predicting the movement of subglacial water and ice flow, as accurate topographic representation can impact our understanding of ice discharge and basal hydrology, both of which are important for predicting sea-level rise.

Figure 5. Comparison of bed elevations predicted by BedMachine v3 (a) and our new QRF model (b) for the region near Kangerlussuaq Glacier in East Greenland.
4.3. Uncertainty quantification and practical relevance
A key strength of the QRF model lies in its ability to produce well-calibrated prediction intervals. BedMachine, on the other hand, exhibits underdispersed prediction intervals, with its 90% prediction interval capturing only 66.8% of test observations, indicating an apparent miscalibration in its predictions when compared with the PROMICE data (Fig 3d). This is problematic for practical applications, as models used to predict the response of ice sheets to climate change require not only accurate boundary conditions but also a robust representation of uncertainty. In contrast, the QRF model captures 89.8% of test observations within its 90% prediction interval, making it a more reliable tool for assessing uncertainties in ice-sheet behaviour.
The continuous ranked probability score (CRPS) measures the difference between the predicted and observed cumulative distributions (Hersbach, Reference Hersbach2000). The CRPS is widely used in probabilistic model evaluation owing to it being a proper scoring rule; one that rewards not only accuracy but also well-calibrated and sharp quantification of uncertainty (Gneiting and Raftery, Reference Gneiting and Raftery2007), with the CRPS’ ultimate minimal value of zero only being achieved for a perfect deterministic prediction. The lower CRPS of our QRF model (92 metres compared to 130 metres for BedMachine) further highlights its advantage in balancing accuracy and uncertainty. This improved performance is especially valuable for quantifying the uncertainty in projections of sea level rise arising from the subglacial topography.
4.4. Mapping subglacial topography
The map generated by the QRF model (Fig. 4a) displays a more detailed and nuanced subglacial landscape compared to those produced by BedMachine (Fig. 5). The QRF model reveals a rich diversity of subglacial features, such as river channels, highlands and lowland plains, particularly in Greenland’s interior, where BedMachine uses kriging to interpolate ice thickness observations - sometimes over distances in excess of many kilometres. The use of satellite-derived ice surface and velocity data potentially enables the QRF to predict bed elevation with greater precision in regions where BedMachine relies solely on spatial interpolation of sparsely-sampled ice thickness measurements.
The differences between the two models’ predictions are most pronounced at the ice-sheet periphery, where BedMachine relies on mass conservation for grounded ice and gravity inversion for fjord bathymetry. These methods, while useful, impose strict assumptions that may smooth out ravines and other important subglacial structures. In contrast, the QRF model offers more flexible predictions, which reveal deeper ice and lower bed elevations in subglacial ravines around Greenland’s margins (Fig. 5). These features are important because they play a critical role in controlling ice flow, basal water distribution and ice discharge into the ocean, factors that ultimately influence the rate of ice loss from Greenland.
In some regions, such as the southwest of the GrIS, BedMachine predicts deeper ice in subglacial ravines than the QRF model, but this appears to be the exception rather than the rule. Notably, large differences in estimated bed elevation exist between the QRF and BedMachine at floating ice shelves in northern Greenland due to the fact that the QRF is not able to represent the correct ice/sea/bed configuration. Such discrepancies highlight a limitation in the approach used here. Future work is needed to refine our approach in regions where the ice is floating or otherwise decoupled from the bedrock.
4.5. Ice volume estimates
One of the most significant applications of improved bed elevation estimates is in calculating total ice volume and, by extension, potential future sea-level rise contributions. While our QRF model does not explicitly model spatial dependence between locations, its structure inherently captures some spatial relationships through the arrangement of regression tree nodes. This allows for realistic estimates of total ice volume and uncertainty, a feature that is vital for large-scale models predicting Greenland’s contribution to future sea-level rise. Our estimated ice volume is (3.011 ± 0.004) × 106 km3, while the bedmachine estimate is (2.99 ± 0.02) × 106 km3. This makes our QRF estimate of the total Greenland ice sheet volume to be 0.7% greater than that estimated by BedMachine, though we place more importance on the differences in the shape of the bed features, especially near the ice margin.
As we have seen, our QRF appears to be well calibrated when assessed against held-out test observations, i.e. a 90% prediction interval provided by the QRF will contain the true bed elevation in (close to) 90% of cases, and likewise for other prediction intervals. However, being able to reasonably quantify ice-thickness prediction uncertainty at individual locations does not necessarily mean that one can also reasonably quantify the uncertainty in an estimate of the total ice volume. If prediction errors are modelled entirely independently at each location, then when summed together to obtain a total, the errors will cancel out, resulting in a spuriously overconfident estimate of the total (Wadoux and Heuvelink, 2023). Conversely, if prediction errors have an unrealistically strong spatial dependence, they may not cancel out enough when summed together, potentially resulting in an unrealistically under-confident estimate of the total. While our QRF does not explicitly model spatial dependence, spatial dependence is implicitly captured in the structure of the quantile regression trees: Each tree has a number of terminal nodes, each of which contains a set of training observations. The within-node spatial dependence is not modelled; we can ‘simulate’ observations from a node simply by sampling at random from the observed values it contains, which equates to an uncorrelated error distribution, like nugget variance in traditional geostatistics. The spatial dependence between nodes is, however, captured by the arrangement of their boundaries; regression trees will produce smaller terminal nodes where the ‘length scale’ of observations is shorter, i.e. where there is more high-frequency information to capture. Conversely, where the ‘length scale’ of observations is large, terminal nodes can also be large.
While a single decision tree can only crudely approximate the spatial dependence between observations, when combined together in the QRF, the quality of this approximation improves. We can think of the distribution of observations within each node of each tree as a nugget-effect-like independent noise (the data distribution) that each tree estimates. This is our aleatoric uncertainty (data uncertainty representing the inherent randomness in the observations, e.g. including measurement error). Meanwhile, the ensemble of mean values predicted by each tree throughout the feature space approximates a distribution over functions representing our epistemic uncertainty (reducible uncertainty representing our lack of knowledge). By treating our QRF as an approximate Bayesian model (with the ensemble of regression trees representing the ‘poor man’s posterior’ as described on p.272 of Hastie and others, Reference Hastie, Tibshirani and Friedman2009), we can ‘simulate’ different maps of bed elevation from the ensemble, by first sampling one tree from our ensemble of trees (i.e. sampling a configuration of parameters from the parameter distribution which our forest approximates) and then, at each location of the map, sampling one value from the relevant within-node empirical distribution of training observations (i.e. sampling from the data distribution).
Comparing the spatial autocorrelation properties of these ‘simulated’ maps to the spatial autocorrelation properties of held out test data (at the same locations) using variograms reveals them to be largely similar (Fig. 6), although there is a (somewhat rare) tendency for the regression trees to exhibit too much spatial variability at ranges of about 5 km. Despite this elevated 5 km spatial variability for some of the trees, the average of the forest as a whole (and therefore the overall spatial autocorrelation properties of our output bed map) matches quite well the spatial autocorrelation properties of the PROMICE test set (shown as the red line in Fig. 6h). The exaggerated 5 km spatial variability of some of the trees is unlikely to affect estimates of the total volume of ice, because for each tree, this small-scale variability cancels out when summing over the full 250 km extent of Greenland. We therefore propose that it is reasonable to use the QRF to estimate total ice volume over Greenland.

Figure 6. Three example QRF ‘simulated’ ice thickness maps (a–c), Semivariograms of QRF ‘simulations’ (blue) vs held-out test observations (red)(d) and a histogram of QRF ‘simulated’ ice volumes (e). The vertical dashed red line shows the mean ice volume (
$3.011 \pm 0.004) \times 10^6\,\mathrm{km}^3$. BedMachine’s estimate is (
$2.99 \pm 0.02) \times 10^6\,\mathrm{km}^3$.
4.6. Future work
There are several aspects of the work that warrant further investigation. Geophysical covariate observations, such as gravitational field strength, could be used in the model. This may lead to improved predictions of bed elevations in areas of floating ice. Additionally, it would be worth attempting to optimise the input features, including exploring the sensitivity of the predicted bed elevation to the hyperparameters used, e.g. the number of compass orientations. Future work is needed to explore the suitability of this approach for estimating bed elevations for the Antarctic ice sheets and Arctic ice caps. We suggest the next steps are to focus on quantifying the differences between BedMachine and the QRF at the outlet glaciers that drain the ice sheet interior, as estimates of ice sheet mass balance are sensitive to the geometry of these outlet glaciers.
5. Conclusions
The QRF model offers significant advancements over the widely used BedMachine data in predicting Greenland’s subglacial topography and ice thickness. The QRF model provides improvements to both deterministic accuracy and probabilistic predictions, achieving an 18% reduction in RMSE and a better representation of uncertainty, as demonstrated by its superior R2 and continuous-ranked-probability score (CRPS). These improvements, while modest, hold substantial practical relevance for understanding ice dynamics in a warming climate and predicting future sea-level rise.
Our results indicate that the QRF model provides a more balanced and consistent prediction of ice thickness than BedMachine, which has a tendency to underpredict the width and depth of subglacial troughs at the ice sheet margin. This is crucial for studies of ice sheet mass balance because the cross-sectional area of these outlet glacier troughs determines the estimated rate of mass loss to the surrounding ocean.
One of the key contributions of the QRF model is its ability to produce well-calibrated uncertainty estimates, which are more reliable than those from BedMachine. The QRF model’s prediction intervals captured 89.8% of test observations within a 90% confidence interval, a stark contrast to BedMachine’s underdispersed intervals, which only captured 66.8%. This is crucial for decision-makers who rely on accurate risk assessments in scenarios where ice-sheet behaviour is difficult to predict. For example, policymakers focused on sea-level rise adaptation would benefit from models that can account for the inherent uncertainty in ice-thickness predictions while minimising errors in those predictions (Siegert and Pearson, Reference Siegert and Pearson2021). These more robust uncertainty estimates are also useful for identifying priority areas for future RES surveys to target.
In terms of mapping subglacial topography, the QRF model unveils a richer and more detailed landscape than BedMachine, especially in regions where traditional methods like mass conservation and gravity inversion tend to over-smooth high-relief subglacial features. The model’s ability to preserve the linearity of crucial features of the subglacial landscape, such as ravines and channels, enhances our understanding of how these features influence ice flow and the routing of basal water.
Despite its strengths, the QRF model does have limitations, particularly in accounting for spatial dependence across locations. However, its structured ensemble approach partially captures these spatial relationships, leading to realistic estimates of total ice volume. This feature holds promise for improving future ice-sheet models that require robust volume predictions and well-calibrated uncertainty estimates. We suggest that the optimisation of input features would be a worthwhile focus for a future research project.
Overall, we have demonstrated that a QRF approach represents a computationally low-cost option for estimating ice sheet basal topography, offering both improved accuracy and more robust uncertainty quantification when compared with BedMachine in grounded-ice areas.
Data availability statement
Our new subglacial elevation dataset for the Greenland Ice Sheet, and the code used to generate it, are available for download at: https://github.com/charliekirkwood/greenlandice
Acknowledgements
This work was supported by a seedcorn research grant from University of Exeter’s Institute of Data Science and AI, which was awarded to S. Palmer. We thank the anonymous reviewers whose critical and insightful comments greatly improved the paper.
Appendix. Cross-validation by field season
To evaluate the ability of our QRF to make reasonable predictions in the (sometimes large) spaces between radar observation flight lines, and in order to tune its hyperparameters towards this purpose, we split our training data into six different folds according to their years of collection: 1993–2001 fold one, 2002–2010 fold two, 2011:2012 fold three, 2013:2014 fold four, 2015:2017 fold five, 2018:2019 fold six. These groupings produce folds of approximately equal size, and with little overlap between their respective flight lines (Fig. A1).

Figure A1. Our six cross-validation folds (a–f). The folds were created by grouping observations by years of collection in order to produce six folds of approximately equal size with minimal overlap in flight lines, so that cross-validation reveals our QRF’s ability to make reasonable predictions in the spaces between flight lines.








