What is it about?
In this work, we analysed the visual codewords (the codebook) of protein distance matrices and studied how the size of the vocabulary relates to classification accuracy. We found that codewords with higher relative frequency generally lie closer to the main diagonal of the distance matrix. We also showed that solenoid domains have a much lower proportion of unique codewords than globular proteins, and that the codeword histogram, used as a feature vector with a support vector machine classifier, can discriminate between globular and solenoid proteins very efficiently.
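To make the idea of a codeword histogram concrete, here is a minimal sketch in Python. It is not the authors' implementation: the patch size, the straight-line toy coordinates, the two hand-written codewords, and all function names are assumptions chosen for illustration. In practice a codebook would more likely be learned from many patches (for example by clustering) than written by hand.

```python
import math
from collections import Counter


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def extract_patches(matrix, size):
    """Slide a size x size window over the matrix and yield flattened patches."""
    n = len(matrix)
    for i in range(n - size + 1):
        for j in range(n - size + 1):
            yield [matrix[i + di][j + dj] for di in range(size) for dj in range(size)]


def nearest_codeword(patch, codebook):
    """Index of the codeword closest to the patch (squared Euclidean distance)."""
    def sq_dist(word):
        return sum((p - w) ** 2 for p, w in zip(patch, word))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def codeword_histogram(matrix, codebook, size):
    """Relative frequency of each codeword over all patches of the matrix."""
    counts = Counter(nearest_codeword(p, codebook) for p in extract_patches(matrix, size))
    total = sum(counts.values())
    return [counts[k] / total for k in range(len(codebook))]


# Toy input (assumption): six residues on a straight line, 3.8 units apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(6)]
matrix = distance_matrix(coords)

# Hypothetical hand-written codebook of two flattened 2 x 2 codewords:
# one resembling patches on the main diagonal, one resembling patches far from it.
codebook = [
    [0.0, 3.8, 3.8, 0.0],
    [11.4, 15.2, 7.6, 11.4],
]

hist = codeword_histogram(matrix, codebook, size=2)
print(hist)  # one relative frequency per codeword; this is the feature vector
```

A vector like `hist`, computed for each protein in a labelled set, is the kind of input that could then be passed to a support vector machine classifier.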
Why is it important?
We showed that solenoid domains have a much lower proportion of unique codewords than globular proteins, and that the codeword histogram, used as a feature vector with a support vector machine classifier, can discriminate between globular and solenoid proteins very efficiently.
Read the Original
This page is a summary of: Quantitative analysis of visual codewords of a protein distance matrix, PLoS ONE, February 2022, PLOS, DOI: 10.1371/journal.pone.0263566.