What is it about?

Metabonomics has been applied to predictive modelling in diverse fields of research, ranging from toxicology and nutrition to parasitology and molecular epidemiology, and including disease diagnosis and therapy monitoring. Although metabonomics has developed considerably over the last decade, particularly in its modelling methods, further contributions are still needed in these areas, including enhanced mathematical analysis. Currently, partial least squares regression (PLSR) and its variants are the preferred modelling approaches in metabonomics because they are flexible and accurate on these complex data and handle multicollinearity well. However, PLSR typically requires a large training sample size and a large number of indicators for each latent variable, which may be a disadvantage when few samples are available, as with metabonomic datasets for rare diseases. In addition, it would be of interest to reduce PLSR's training complexity, and hence the processing time, when dealing with metabonomics data such as gas chromatography/mass spectrometry (GC/MS) total ion chromatograms (TICs), which tend to be very large.
A common way to reduce the computational complexity of classification in general is to apply dimensionality reduction before classification. Dimensionality reduction techniques can be broadly divided into variable selection and transformation. Variable selection approaches can identify the significant variables but may not perform well when the data are highly correlated. Transformation-based approaches tend to combine variables without selecting a subset of significant variables. There are many different dimensionality reduction approaches, which makes it harder to find the optimum one for PLSR and its variants for each metabonomics dataset. Hence it would be useful to develop a simpler modelling approach to address these problems.
A study has shown that variable ranking via correlation-based feature selection, which uses the magnitude of Pearson's correlation coefficient between the class values and the variable values for each feature, is promising. In this study, we extended correlation-based feature selection to create a new automated Pearson's correlation change classification (APC3) technique, which has high computational efficiency. The aim of this study is to evaluate the performance of APC3 by comparing it with other classification algorithms, with classification algorithms combined with transformation techniques, and with classification algorithms combined with variable selection approaches, using TICs of binominal GC/MS data.
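As a rough illustration of the correlation-based ranking step described above, the following Python sketch scores each variable by the magnitude of its Pearson's correlation with the class labels. It is not the APC3 algorithm itself, whose details are given in the original paper. The function names, the toy intensity values and the choice to score constant variables as zero are all assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption for this sketch: a constant variable carries no
        # class information, so it is scored as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_variables(samples, labels):
    """Return (variable index, |r|) pairs, most class-correlated first."""
    n_vars = len(samples[0])
    scores = []
    for j in range(n_vars):
        column = [row[j] for row in samples]
        scores.append((j, abs(pearson(column, labels))))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy data: 4 samples x 3 TIC intensity variables, two classes.
tic = [
    [1.0, 5.0, 2.0],
    [1.2, 5.1, 2.0],
    [3.0, 4.9, 2.0],
    [3.1, 5.0, 2.0],
]
classes = [0, 0, 1, 1]

ranking = rank_variables(tic, classes)
for index, score in ranking:
    print(f"variable {index}: |r| = {score:.3f}")
```

In this toy set the first variable separates the two classes almost perfectly, so it is ranked first, while the constant third variable is ranked last.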

Why is it important?

A fully automated and computationally efficient Pearson's correlation change classification (APC3) approach is proposed. Using only the total ion chromatogram (TIC) intensities of GC/MS data, it is shown to have overall performance comparable to that of other dimensionality reduction and classification combinations, with an average accuracy and an average AUC both of 0.89 ± 0.08, while being 3.9 to 7 times faster and easier to use, with low susceptibility to outliers. Because only the TIC is used, APC3 could potentially be applied to other metabonomic data such as LC/MS TICs or NMR spectra. A RapidMiner implementation is available for download at http://padel.nus.edu.sg/software/padelapc3.
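For readers unfamiliar with the two metrics quoted above, the following Python sketch shows one common way to compute accuracy and AUC for a two-class problem. The labels, scores and 0.5 decision threshold are invented for illustration and are not data from the study; the AUC here uses the pairwise (Mann-Whitney) formulation, which may differ in detail from the evaluation procedure used in the paper.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def auc(true_labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample,
    counting ties as one half.
    """
    positives = [s for t, s in zip(true_labels, scores) if t == 1]
    negatives = [s for t, s in zip(true_labels, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier output for four samples.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.6, 0.4, 0.9]
# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

print("accuracy:", accuracy(y_true, y_pred))
print("AUC:", auc(y_true, y_score))
```

Accuracy depends on the chosen threshold, whereas AUC summarises how well the scores rank positives above negatives across all thresholds, which is why the two are often reported together.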

Perspectives

In this study, we developed APC3, a fully automated, computationally efficient method based on correlation-based feature selection for developing models in metabonomics. We compared APC3 with various common dimensionality reduction and classification combinations, and the results show that APC3 performs similarly to the top few combinations. Its advantage over them is that it is fully automated and 3.9 to 7 times faster. This would minimize user interaction and allow efficient, high-throughput processing of extremely large datasets. The successful application of APC3 to GC/MS data suggests that it could also be used to analyse other forms of biological chromatographic data, such as LC/MS TICs.

Dr Eric Chun Yong Chan
National University of Singapore

Read the Original

This page is a summary of: An automated Pearson's correlation change classification (APC3) approach for GC/MS metabonomic data using total ion chromatograms (TICs), The Analyst, January 2013, Royal Society of Chemistry,
DOI: 10.1039/c3an00048f.