What is it about?
GouDa is a test data generator. It generates data sets that contain specific errors, together with a clean version of the same data. These pairs of data sets can then be used to analyze and evaluate data preprocessing and data cleaning tools comprehensively.
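The following is not GouDa's API or algorithm; the full paper describes the actual generator. It is a minimal sketch, under assumptions, of why generating a clean version alongside the erroneous one is useful. The schema (`id`, `name`, `age`), the two error types (a missing value and a character-swap typo), and every function name are hypothetical. Because the clean version is kept, the set of erroneous cells is known exactly, so the output of a cleaning or error-detection tool can be scored against it.

```python
import random


def make_clean_rows(n):
    """Build a small error-free table with a hypothetical schema (id, name, age)."""
    names = ["Alice", "Bob", "Carol", "Dave", "Erin"]
    return [{"id": i, "name": names[i % len(names)], "age": 20 + i} for i in range(n)]


def inject_errors(clean, rate, seed=0):
    """Copy `clean` and corrupt roughly `rate` of its rows.

    Returns the dirty rows plus the set of (row index, column) cells that were
    changed. Only two illustrative error types are used: a missing value and a
    typo made by swapping the first two characters of a name.
    """
    rng = random.Random(seed)  # seeded so the same dirty data can be regenerated
    dirty = [dict(row) for row in clean]  # copy rows; the clean version stays untouched
    corrupted = set()
    for r, row in enumerate(dirty):
        if rng.random() >= rate:
            continue  # leave this row error-free
        if rng.random() < 0.5:
            row["age"] = None  # missing value
            corrupted.add((r, "age"))
        else:
            name = row["name"]
            row["name"] = name[1] + name[0] + name[2:]  # typo
            corrupted.add((r, "name"))
    return dirty, corrupted


def diff_cells(clean, dirty):
    """Recover the ground truth by comparing clean and dirty versions cell by cell."""
    return {
        (r, col)
        for r, (c_row, d_row) in enumerate(zip(clean, dirty))
        for col in c_row
        if c_row[col] != d_row[col]
    }


def precision_recall(detected, truth):
    """Score a tool's flagged cells against the known erroneous cells."""
    hits = len(detected & truth)
    precision = hits / len(detected) if detected else 1.0
    recall = hits / len(truth) if truth else 1.0
    return precision, recall


# Usage: evaluate a toy "cleaning tool" that only flags missing values.
clean = make_clean_rows(20)
dirty, truth = inject_errors(clean, rate=0.3, seed=42)
detected = {(r, col) for r, row in enumerate(dirty) for col, v in row.items() if v is None}
precision, recall = precision_recall(detected, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The toy detector never misses a missing value but cannot see typos, so its recall drops as the share of typo errors grows. A generator meant for real benchmarking would need a much richer set of error types and data distributions than this sketch.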
Why is it important?
Preprocessing and cleaning data is an important part of ensuring data quality, and a variety of methods and tools exist for this purpose. However, appropriate data sets for data preparation and cleaning are scarce. There is often no data with which such tools can be analyzed comprehensively, and little data for developing and testing new approaches. For this reason, we implemented GouDa, which can be used to generate suitable data sets.
Read the Original
This page is a summary of: GouDa - generation of universal data sets, June 2022, ACM (Association for Computing Machinery), DOI: 10.1145/3533028.3533311.