

Online Journal of Veterinary Research

Volume 21(9):600-615, 2017



Clustering dairy cattle genes by Kullback-Leibler divergence


Houshang Dehghanzadeh1, Seyed Ziaeddin Mirhoseini*2, Mostafa Ghaderi-Zefrehei3, Hassan Tavakoli4, Saeid Esmaeilkhaniyan5


1,2 Department of Animal Science, Faculty of Agricultural Sciences, University of Guilan, Rasht, Iran, 3 Department of Animal Science, Faculty of Agricultural Sciences, University of Yasouj, Yasouj, Iran, 4 Department of Electrical Engineering, Faculty of Electrical Engineering, University of Guilan, Rasht, Iran, 5 Department of Biotechnology, Animal Science Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran.

*Corresponding Author: Seyed Ziaeddin Mirhoseini, Email:




Dehghanzadeh H, Mirhoseini SZ, Ghaderi-Zefrehei M, Tavakoli H, Esmaeilkhaniyan S. Clustering dairy cattle genes by Kullback-Leibler divergence. Onl J Vet Res., 21(9):600-615, 2017.

Bio-computational grouping of genes facilitates genetic analysis, sequencing and structure-based analyses. DNA sequences of 30 genes involved in milk protein production were extracted ad hoc from the NCBI genome database and stored in FASTA format. An algorithm written in C was used to calculate the Shannon entropy (base 2) of the gene DNA sequences, and genes governing milk production in dairy cows were clustered by Kullback-Leibler (KL) divergence. KL divergence was based on nucleotide similarity (KLA), nucleotide difference (KLB) and different orders of relative entropy (KLH). The AdaBoost algorithm was used to interpret the clustering results. Examples of results: STX3 (n_nucleotide = 79347) and CD14 (n_nucleotide = 1417) were the longest and shortest genes, respectively. In total, 258 exons were identified, wherein exon 1 of HSPA1A (n_nucleotide = 2101) and HSPA5 (n_nucleotide = 20) were the longest and shortest, respectively. LCP1 and ABCG2 had the highest number of exons (n_exon = 16), and HSPA1A and YWHAG had the fewest (n_exon = 1) in this set of genes. Findings suggested that exons with maximum entropy values are likely to be suitable for genotype analysis using molecular markers, and that both coding and non-coding sequences showed low or high complexity. KL divergence can be used, alongside other methods, to cluster large sets of dairy cattle genes into biologically relevant groups.


Key words: Information theory, Dairy cattle, Kullback-Leibler divergence, Gene clustering.
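
The two quantities named in the abstract, base-2 Shannon entropy and Kullback-Leibler divergence, can be illustrated with a minimal sketch. It is written in Python rather than the C used in the study, and it assumes single-nucleotide (A, C, G, T) frequencies as the underlying distribution. The abstract does not define the KLA, KLB and KLH variants, so they are not reproduced here. The function names, the `pseudocount` parameter and the two toy sequences are hypothetical and are not taken from the article.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: only the four unambiguous bases are counted


def nucleotide_distribution(seq, pseudocount=0.0):
    """Relative frequency of A, C, G and T in seq; other symbols (e.g. N) are ignored."""
    counts = Counter(base for base in seq.upper() if base in ALPHABET)
    total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T symbols")
    return {b: (counts[b] + pseudocount) / total for b in ALPHABET}


def shannon_entropy(p):
    """H(P) = -sum_i p_i * log2(p_i), in bits; zero-probability terms contribute 0."""
    return -sum(x * math.log2(x) for x in p.values() if x > 0)


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log2(p_i / q_i), in bits.

    Returns infinity when Q assigns zero probability to a base that P uses.
    """
    total = 0.0
    for b in ALPHABET:
        if p[b] == 0:
            continue  # 0 * log(0 / q) is taken as 0
        if q[b] == 0:
            return math.inf
        total += p[b] * math.log2(p[b] / q[b])
    return total


# Toy sequences for illustration only (not data from the study).
seq_a = "ACGTACGTACGT"
seq_b = "AAAAAAAACCGT"

p_a = nucleotide_distribution(seq_a)
p_b = nucleotide_distribution(seq_b)

print("H(a) =", shannon_entropy(p_a))
print("H(b) =", shannon_entropy(p_b))
print("KL(a || b) =", kl_divergence(p_a, p_b))
print("KL(b || a) =", kl_divergence(p_b, p_a))
```

KL divergence is not symmetric, so a clustering procedure built on it has to fix a direction or combine both directions into a symmetric measure. Which choice the study made is not stated in the abstract.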