What is it about?
Functional dependencies (FDs) and candidate keys are essential for table decomposition, database normalization, and data cleansing. In this paper, we present FDTool, a command-line Python application that discovers minimal FDs in tabular datasets and infers equivalent attribute sets and candidate keys from them. The runtime and memory costs of seven published FD discovery algorithms are given, with an overview of their theoretical foundations. We conclude that FD_Mine is the most efficient FD discovery algorithm for datasets with many rows (more than 100,000) and few columns (fewer than 14). This makes it well suited to mining rules from clinical and demographic datasets, which often consist of long and narrow sets of participant records. The structure of FD_Mine is described and supplemented with a formal proof of the equivalence pruning method it uses. FDTool is a re-implementation of FD_Mine with additional features that improve performance and automate typical processes in database architecture. The experimental results of applying FDTool to 12 datasets of different dimensions are summarized in terms of the number of FDs checked, the number of FDs found, and the time the code takes to terminate. We find that the number of attributes in a dataset has a much greater effect on the runtime and memory costs of FDTool than the row count does.
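To make the idea of a minimal FD concrete, here is a small sketch in Python. It is a naive brute-force search, not FD_Mine and not FDTool's implementation: it has none of the pruning that makes those practical, and it ignores FDs with an empty left-hand side. The function names `fd_holds` and `minimal_fds` and the sample rows are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on all lhs attributes also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = row[rhs]
        # setdefault stores the first rhs value seen for this lhs value;
        # any later disagreement means the dependency is violated.
        if seen.setdefault(key, val) != val:
            return False
    return True


def minimal_fds(rows, attributes):
    """Brute-force, level-wise search for minimal FDs with one right-hand attribute."""
    found = []
    for rhs in attributes:
        others = [a for a in attributes if a != rhs]
        minimal_lhs = []
        for size in range(1, len(others) + 1):
            for lhs in combinations(others, size):
                # A superset of a left-hand side that already works is not minimal.
                if any(set(m) <= set(lhs) for m in minimal_lhs):
                    continue
                if fd_holds(rows, lhs, rhs):
                    minimal_lhs.append(lhs)
                    found.append((lhs, rhs))
    return found


# Hypothetical sample data, for illustration only.
rows = [
    {"id": 1, "zip": "10001", "city": "New York", "age": 30},
    {"id": 2, "zip": "10001", "city": "New York", "age": 41},
    {"id": 3, "zip": "60601", "city": "Chicago", "age": 30},
]
fds = minimal_fds(rows, ["id", "zip", "city", "age"])
for lhs, rhs in fds:
    print(", ".join(lhs), "->", rhs)
```

The number of candidate left-hand sides grows exponentially with the number of attributes, while each check is a single pass over the rows. That is consistent with the finding above that attribute count affects cost far more than row count.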
Why is it important?
FDTool is an all-in-one executable that finds functional dependencies, equivalent attribute sets, and candidate keys directly from data. The application reads in a data file and writes a text file listing these rules. Such rules are essential for database normalization, data cleaning, and schema design.
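As a rough illustration of how candidate keys and equivalent attribute sets can be inferred once FDs are known, the sketch below uses the textbook attribute-closure method. It is an assumption that this resembles what any given tool does internally; the function names `closure` and `candidate_keys` and the list of FDs are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return the minimal attribute sets whose closure covers all attributes."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Supersets of a known key are superkeys, not candidate keys.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys


# Hypothetical FDs, each written as (left-hand side, right-hand attribute).
attributes = ["id", "zip", "city", "age"]
fds = [
    (("id",), "zip"),
    (("id",), "city"),
    (("id",), "age"),
    (("zip",), "city"),
    (("city",), "zip"),
    (("zip", "age"), "id"),
    (("city", "age"), "id"),
]
keys = candidate_keys(attributes, fds)
print(keys)

# Two attribute sets are equivalent when they have the same closure.
print(closure(("zip",), fds) == closure(("city",), fds))
```

Rules mined from a data sample hold only for that sample, so keys inferred this way are best treated as suggestions to review against domain knowledge before they drive schema design.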
Read the Original
This page is a summary of: FDTool: a Python application to mine for functional dependencies and candidate keys in tabular data, F1000Research, October 2018, Faculty of 1000, Ltd.,
DOI: 10.12688/f1000research.16483.1.