In this study, we introduce a new set of features, called k-mer distance, that is extracted only from the precursor sequence and is based on the distances between k-mers. These features capture the distance between each k-mer and all other k-mers in the sequence. The distance is computed as an average, and the final value is normalized by the length of the sequence. Surprisingly, this new feature set performs as well as the hundreds of previously published features.
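The text does not give the exact formula, so the following is only a minimal sketch of one plausible reading. It assumes that, for each possible k-mer, we average the absolute positional distance between each of its occurrences and every occurrence of a different k-mer, and then divide by the sequence length. The function names, the `ACGU` alphabet and the choice of 0.0 for absent k-mers are hypothetical illustration choices, not the authors' implementation.

```python
from itertools import product


def kmer_positions(seq, k):
    """Map each k-mer to the list of 0-based start positions where it occurs."""
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    return positions


def kmer_distance_features(seq, k=2, alphabet="ACGU"):
    """Hypothetical k-mer distance features, one value per possible k-mer.

    Assumption: for each k-mer, average |p_own - p_other| over every pair of
    (occurrence of this k-mer, occurrence of a different k-mer), then divide
    by len(seq). k-mers that never occur (or have no partners) get 0.0.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA
    pos = kmer_positions(seq, k)
    features = {}
    for kmer in ("".join(p) for p in product(alphabet, repeat=k)):
        own = pos.get(kmer, [])
        # Positions of all occurrences of every *other* k-mer.
        others = [p for other, plist in pos.items() if other != kmer for p in plist]
        if not own or not others:
            features[kmer] = 0.0
            continue
        total = sum(abs(a - b) for a in own for b in others)
        mean_distance = total / (len(own) * len(others))
        features[kmer] = mean_distance / len(seq)  # normalize by sequence length
    return features


features = kmer_distance_features("AACG", k=2)
print(features["AA"], features["AC"], features["CG"])
```

For `"AACG"` with k = 2, the 2-mers AA, AC and CG start at positions 0, 1 and 2, so AA's mean distance to the others is (1 + 2) / 2 = 1.5, which becomes 0.375 after dividing by the length 4. A real precursor would produce a fixed-length vector of 4^k such values, suitable as classifier input.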
The proposed feature set will help explore the importance of a k-mer's location and its relationship to other k-mers. In other studies, we have shown that k-mer frequencies are important for miRNA categorization and classification. However, frequency provides no information about where the k-mers are located in the sequence.
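To make the limitation of frequency features concrete, here is a small sketch using two toy sequences chosen only for illustration (they are not from the study). Both contain exactly the same overlapping 2-mers (AG, GA, AC, CA), so their frequency profiles are identical, yet the k-mers sit at different positions. The helper names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (a plain frequency profile)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_first_positions(seq, k):
    """Record the 0-based start position of the first occurrence of each k-mer."""
    first = {}
    for i in range(len(seq) - k + 1):
        first.setdefault(seq[i:i + k], i)
    return first


s1, s2 = "AGACA", "ACAGA"
# Identical frequency profiles: a frequency-based classifier cannot tell them apart.
print(kmer_counts(s1, 2) == kmer_counts(s2, 2))  # True
# Different locations: AG starts at position 0 in s1 but at position 2 in s2.
print(kmer_first_positions(s1, 2)["AG"], kmer_first_positions(s2, 2)["AG"])  # 0 2
```

Any location-aware feature, such as a k-mer distance, can separate these two sequences, whereas their k-mer frequency vectors are equal.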
This study supports our hypothesis that the miRNA sequence contains a hidden message that allows a specific enzyme to recognize it for further processing.
Malik Y. Yousef is a data scientist focusing on bioinformatics and its applications to various biomedical and biological problems. He has published more than 55 peer-reviewed articles in top journals and proceedings, with over 2,400 citations, an h-index of 18 and an i10-index of 20 (according to Google Scholar).
His international experience includes three years as a postdoctoral researcher at the Wistar Institute Cancer Center, USA (Prof. Louise Showe's cancer biology lab), and one year at the University of Pennsylvania (UPENN-Bioinformatics Center). He is currently an Assistant Professor at Zefat Academic College in Israel.