CCK (Clustering-Classification-Kappa): a new validation index for assessing clustering results of gene expression data

Shakeri, MT; Sabaghian, E; Esmaeili, H

doi:10.29252/jnkums.3.5.S5.67

Volume 3, Issue 5 and S5 (monograph 2011-2012), 2012; 3(5 and S5): 67-78

Shakeri M, Sabaghian E, Esmaeili H. CCK (Clustering-Classification-Kappa) a new validation index to assessing clustering results of gene expression data. Journal of North Khorasan University of Medical Sciences 2012; 3(5): 67-78
URL: http://journal.nkums.ac.ir/article-1-251-en.html

MT Shakeri, E Sabaghian, H Esmaeili*

Abstract

Background & Objective: The use of clustering methods to discover cancer subtypes has drawn considerable attention in the scientific community. While bioinformaticians have proposed new clustering methods that exploit characteristics of gene expression data, the medical community prefers "classic" clustering methods. No study thus far has performed a large-scale evaluation of different clustering methods in this context.

Method & Material: We present the CCK index for assessing the clustering results of gene expression data. The index is constructed by combining the results of two arbitrary algorithms, one for clustering and one for classification. We then present the first large-scale analysis of nine clustering methods (hierarchical clustering with single, average, complete and Ward linkage, UPGMA, DIANA, K-means, PAM and CLARA) applied to five cancer gene expression data sets. Next, we use the Margin Trees method to assess the quality of the clustering results. Finally, we calculate the quality of each clustering method as the Kappa coefficient between that method's clustering result and the result of the Margin Trees method.

Results: Our results show that PAM, followed closely by CLARA, performed best at recovering the true structure of the data sets. We also found that the partitioning clustering methods (PAM, CLARA and K-means) performed better than the hierarchical clustering methods (hierarchical clustering with single, average, complete and Ward linkage, UPGMA and DIANA).

Conclusion: The validation technique used in this paper (Margin Trees) can aid in selecting an optimal algorithm for a given data set from a collection of available clustering algorithms.
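The three steps behind the index (cluster, classify, compare with Kappa) can be sketched in Python. This is a minimal, hypothetical sketch, not the authors' implementation. K-means stands in for the clustering step, and a simple nearest-centroid classifier is swapped in for Margin Trees, which is not reimplemented here. We also assume that the classifier is trained on the cluster labels and scored by leave-one-out prediction. That is our reading of the abstract, not a detail it states. Agreement is measured with Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the two label frequencies. The helper names (`kmeans`, `loo_nearest_centroid`, `cohen_kappa`, `cck_index`) are illustrative only.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(members):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(members) for col in zip(*members)]


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's K-means; a stand-in for any of the nine clustering methods."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = centroid(members)
    return labels


def loo_nearest_centroid(points, labels):
    """Leave-one-out predictions of a nearest-centroid classifier trained on `labels`.

    Hypothetical stand-in for Margin Trees: each sample is predicted from
    class centroids computed without that sample.
    """
    predictions = []
    for i, p in enumerate(points):
        groups = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if j != i:
                groups.setdefault(lab, []).append(q)
        centroids = {lab: centroid(qs) for lab, qs in groups.items()}
        predictions.append(min(centroids, key=lambda lab: squared_distance(p, centroids[lab])))
    return predictions


def cohen_kappa(a, b):
    """Cohen's kappa between two label sequences of equal length."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)  # chance agreement
    if p_e == 1:
        return 1.0  # both use one identical label; kappa is undefined, treat as perfect
    return (p_o - p_e) / (1 - p_e)


def cck_index(points, k, seed=0):
    """Cluster, train the classifier on the cluster labels, then score agreement with kappa."""
    cluster_labels = kmeans(points, k, seed=seed)
    predicted = loo_nearest_centroid(points, cluster_labels)
    return cohen_kappa(cluster_labels, predicted)


if __name__ == "__main__":
    # Synthetic data: rows play the role of samples, coordinates the role of genes.
    rng = random.Random(1)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
    data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(20)]
    print(round(cck_index(data, k=2), 3))
```

Training the classifier on the cluster labels keeps both label sequences in the same label space. Applying kappa directly to two independently produced partitions would penalize a mere renaming of clusters: for example, [0, 0, 1, 1] against [1, 1, 0, 0] gives κ = -1. To compare clustering methods as the abstract describes, `kmeans` in `cck_index` would be replaced by each method in turn.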

Keywords: Clustering, Microarray, Bootstrap, Indicator to assess the quality of clustering methods

Type of Study: Original Research | Subject: Basic Sciences
Received: 2015/02/05 | Accepted: 2015/02/05 | Published: 2015/02/05

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Journal of North Khorasan University of Medical Sciences