What is it about?

A common way to identify specific types of values in a text corpus is to use regular expressions (regexes), but this normally requires writing a regex by hand for each type. Instead, we take a large corpus of regular expressions collected from a regex playground and use them as features for a machine learning algorithm, which automatically identifies the regexes that are useful for classifying types.
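
To make the idea concrete, here is a minimal sketch in Python, not the method from the paper. It assumes a tiny hand-made regex list (`REGEX_CORPUS`) and a few labelled values (`SAMPLES`), both hypothetical. Each regex becomes one binary feature (does it fully match the value?), and a simple per-regex F1 score stands in for the learning algorithm, whose details this summary does not give.

```python
import re

# Hypothetical stand-in for an uncurated regex corpus; a real one would be
# far larger and noisier.
REGEX_CORPUS = [
    r"\d{4}-\d{2}-\d{2}",
    r"[^@\s]+@[^@\s]+\.[a-z]+",
    r"\d+",
    r"[A-Za-z]+",
    r"(unclosed",  # uncurated data can contain invalid patterns
]

# Hypothetical labelled values: (value, semantic type).
SAMPLES = [
    ("2023-06-18", "date"),
    ("1999-12-31", "date"),
    ("alice@example.org", "email"),
    ("bob@mail.net", "email"),
    ("42", "integer"),
    ("1234567", "integer"),
]


def compile_corpus(patterns):
    """Compile every pattern, silently skipping those that are invalid."""
    compiled = []
    for pattern in patterns:
        try:
            compiled.append(re.compile(pattern))
        except re.error:
            continue
    return compiled


def featurize(value, regexes):
    """One binary feature per regex: 1 if it matches the whole value."""
    return [1 if rx.fullmatch(value) else 0 for rx in regexes]


def best_regex_per_type(samples, regexes):
    """For each label, return (F1, pattern) of the best-scoring regex."""
    rows = [(featurize(value, regexes), label) for value, label in samples]
    best = {}
    for label in sorted({lab for _, lab in samples}):
        scored = []
        for i, rx in enumerate(regexes):
            tp = sum(1 for f, lab in rows if f[i] and lab == label)
            fp = sum(1 for f, lab in rows if f[i] and lab != label)
            fn = sum(1 for f, lab in rows if not f[i] and lab == label)
            f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
            scored.append((f1, rx.pattern))
        best[label] = max(scored, key=lambda s: s[0])
    return best


if __name__ == "__main__":
    regexes = compile_corpus(REGEX_CORPUS)
    for label, (f1, pattern) in best_regex_per_type(SAMPLES, regexes).items():
        print(f"{label}: {pattern} (F1={f1:.2f})")
```

In practice the feature vectors produced by `featurize` could be passed to any standard classifier instead of the F1 ranking used here; the ranking is chosen only to keep the sketch dependency-free.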

Read the Original

This page is a summary of: Learning from Uncurated Regular Expressions for Semantic Type Classification, June 2023, ACM (Association for Computing Machinery),
DOI: 10.1145/3596225.3596226.
