What is it about?
Climate datasets are usually enormous, but researchers are mostly interested in small chunks of a dataset at a time. Currently, researchers rely on hand-written Python or Julia scripts to filter and analyze large datasets. The direct alternative, a dedicated spatiotemporal data management system, is difficult to set up and use. Northlight is an extension for the popular SparkSQL data analysis framework that allows researchers to efficiently access and process large distributed climate datasets using standard SQL queries.
Why is it important?
Northlight includes a novel algorithm to convert a (possibly non-convex) query predicate into a set of convex, non-overlapping regions, which can then be loaded from the dataset. The algorithm does not rely on a pre-built index structure. Performing this conversion manually for each query would take considerable effort.
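To make the idea of splitting a non-convex predicate into convex, non-overlapping regions more concrete, here is a minimal Python sketch. It is not Northlight's algorithm, which the summary does not spell out. It is a naive grid-based stand-in that assumes the predicate is an OR of axis-aligned ranges (for example, latitude and longitude bounds). The names `Box`, `contains`, and `disjoint_boxes` are hypothetical and chosen only for illustration.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical representation: a box holds one half-open interval [lo, hi)
# per dimension, e.g. ((lat_lo, lat_hi), (lon_lo, lon_hi)).
Box = Tuple[Tuple[float, float], ...]


def contains(box: Box, point: Tuple[float, ...]) -> bool:
    """Return True if the point lies inside the half-open box."""
    return all(lo <= x < hi for (lo, hi), x in zip(box, point))


def disjoint_boxes(boxes: List[Box]) -> List[Box]:
    """Split a union of (possibly overlapping) boxes into disjoint boxes.

    Naive approach, assumed here for illustration only: cut space along
    every box boundary, then keep each grid cell covered by some input box.
    """
    if not boxes:
        return []
    ndim = len(boxes[0])
    # Sorted distinct boundary coordinates in each dimension.
    cuts = [
        sorted({b[d][0] for b in boxes} | {b[d][1] for b in boxes})
        for d in range(ndim)
    ]
    # Consecutive boundaries form the elementary intervals of the grid.
    intervals = [list(zip(c[:-1], c[1:])) for c in cuts]
    result = []
    for cell in product(*intervals):
        # A cell is either fully inside or fully outside each input box,
        # so testing its center point is sufficient.
        center = tuple((lo + hi) / 2 for lo, hi in cell)
        if any(contains(b, center) for b in boxes):
            result.append(cell)
    return result


# An L-shaped (non-convex) region written as the OR of two overlapping boxes.
l_shape = [((0, 2), (0, 1)), ((0, 1), (0, 2))]
regions = disjoint_boxes(l_shape)
```

Each returned cell is a rectangle, so it is convex, and no two cells overlap, so each part of the dataset would be loaded at most once. The sketch makes no attempt to merge neighboring cells or to minimize the number of regions, which a practical implementation would likely want to do.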
Read the Original
This page is a summary of: Northlight: Declarative and Optimized Analysis of Atmospheric Datasets in SparkSQL, July 2022, ACM (Association for Computing Machinery),
DOI: 10.1145/3538712.3538715.