What is it about?

Earlier work of Bárány et al. proposed a language based on Datalog (a recursive database query language) that allows drawing random samples during the querying process. This connects the core Datalog language with principles of probabilistic programming. In our paper, their language is extended to allow sampling from continuous distributions, such as the Gaussian distribution. The main effort is to provide the resulting language with a well-defined semantics by modelling its behavior in rigorous mathematical terms.

Featured Image

Why is it important?

The original design of Bárány et al. was limited to discrete distributions. This is a severe limitation for the practical flexibility of the language. Extending it to continuous distribution is the natural next step but was left as an open question in their work. In general, techniques for dealing with, arguing about and introducing uncertainty becomes increasingly important for the management and usage of large-scale data. A language such as the one discussed is not only be used to extend Datalog querying with statistical facilities, but also offers a way of representing rich probabilistic database models. Such probabilistic toolkits should also support continuous distributions: Gaussian distributions, for example, provide a standard way of modelling error in discrete data.

Perspectives

For the future, more research is needed for exploring some theoretical properties. The first that come to mind are conditioning program outputs, and arguing about the termination of a program. Moreover, for putting the language on the desk of practitioners, efficient implementations are needed that go beyond the naive sampling procedure underlying the semantics. Theoretical research, especially with respect to termination and equivalence, could assist in achieving this goal.

Peter Lindner
Ecole Polytechnique Federale de Lausanne

Read the Original

This page is a summary of: Generative Datalog with Continuous Distributions, Journal of the ACM, November 2022, ACM (Association for Computing Machinery),
DOI: 10.1145/3559102.
You can read the full text:

Read

Contributors

The following have contributed to this page