What is it about?

Very large spatial data sets present problems for statistical modeling, such as regression, and for prediction, such as making maps. The problem arises because fitting these models requires inverting large matrices, which is very time consuming even on very large, fast computers. In this article, we show a fast and simple way to work with these large data sets while still obtaining valid regression models. We can also make maps in which the uncertainty about the predictions is valid. These methods are available as open source software in an R package called spmodel.
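To give a rough sense of why matrix inversion becomes the bottleneck, the sketch below compares operation counts for one large matrix against many small ones. It assumes the common rule of thumb that inverting a dense n-by-n matrix takes on the order of n^3 operations, and it only loosely mirrors the partitioning idea named in the article's title. The function names and the group size of 100 are illustrative assumptions, not values from the paper or from spmodel.

```python
def dense_cost(n):
    """Approximate operation count for inverting one dense n-by-n matrix."""
    return n ** 3


def partitioned_cost(n, group_size):
    """Approximate operation count when n observations are split into
    groups of at most group_size and each group's matrix is inverted
    separately (hypothetical helper for illustration only)."""
    full_groups, remainder = divmod(n, group_size)
    return full_groups * group_size ** 3 + remainder ** 3


n = 100_000        # number of spatial observations (illustrative)
group_size = 100   # observations per partition (an assumption)

ratio = dense_cost(n) / partitioned_cost(n, group_size)
print(f"full matrix:  {dense_cost(n):.1e} operations")
print(f"partitioned:  {partitioned_cost(n, group_size):.1e} operations")
print(f"ratio:        {ratio:.1e}")
```

Under these assumptions the partitioned count is smaller by a factor of about one million. This is only a count of arithmetic operations; it says nothing about how the article keeps the resulting estimates and prediction uncertainties valid.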


Why is it important?

From satellite images to automated instruments, data sets keep getting larger and larger. Simple methods can be used to summarize these data. However, understanding relationships among variables and making predictions both involve uncertainty. Handling that uncertainty requires a statistical model that quantifies it in estimates and predictions, so that we can make decisions in the face of risk. It is therefore important to advance statistical methods for big spatial data sets, as this paper does.


This article has an interesting example. I wanted a method for big spatial data on stream networks, and that motivated the development of this method. It works with any type of spatial data, and the ideas extend easily to spatio-temporal models too.

Dr. Jay M. Ver Hoef
Alaska Fisheries Science Center, NOAA Fisheries

Spatial indexing provides a complete set of tools for model fitting and prediction with large spatial data sets. It is readily available in the spmodel R package via the local argument to splm() (model fitting) and predict() (prediction).

Michael Dumelle
United States Environmental Protection Agency

Read the Original

This page is a summary of: Indexing and partitioning the spatial linear model for large data sets, PLoS ONE, November 2023, PLOS, DOI: 10.1371/journal.pone.0291906.


