A Feature Selection Based on One-Way-Anova for Microarray Data Classification
The high dimensionality of microarray data, in which thousands of features are expressed across a much smaller number of samples, is a challenge that limits the applicability of analytical results. Although the Support Vector Machine (SVM) is commonly used to classify microarray datasets, the high dimensionality of the feature space remains a problem. This study uses feature selection to reduce gene expression data to a minimal subset of genes that can still support machine-learning classification of cancer from microarray data, greatly reducing the computational burden and the noise arising from irrelevant genes. Various statistical methods and Machine Learning (ML) algorithms have been proposed to select important features and remove redundant and irrelevant ones, but it is unclear how these algorithms respond to conditions such as small sample sizes. This paper presents a combination of one-way Analysis of Variance (ANOVA) for feature selection, to reduce the dimensionality of the feature space, and an SVM for classification, to reduce computational complexity and improve effectiveness. The computational burden and noise arising from redundant and irrelevant features are eliminated. The approach reduces gene expression data from thousands of genes to a much smaller number, which can significantly lower the cost of cancer testing. The proposed approach selects the most informative subset of features for classification in order to achieve high accuracy, sensitivity, specificity and precision.
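As an illustration of the feature selection step, the following is a minimal sketch of ranking genes by their one-way ANOVA F statistic and keeping the top-scoring ones. It is not the authors' implementation: the function names `anova_f_score` and `select_top_k`, the toy expression matrix, and the choice of k are assumptions made only for this example.

```python
from typing import Dict, List, Sequence, Tuple


def anova_f_score(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F statistic for one feature, grouped by class label."""
    groups: Dict[int, List[float]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n = len(values)   # total number of samples
    k = len(groups)   # number of classes
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")

    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0.0:
        # Degenerate case: no within-class variation at all.
        return float("inf") if ms_between > 0.0 else 0.0
    return ms_between / ms_within


def select_top_k(X: Sequence[Sequence[float]], y: Sequence[int],
                 k: int) -> Tuple[List[int], List[float]]:
    """Return the column indices of the k highest-scoring features, plus all scores."""
    n_features = len(X[0])
    scores = [anova_f_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Hypothetical toy data: 6 samples (rows) x 3 genes (columns), two classes.
X = [
    [1.0, 5.0, 2.0],
    [1.2, 4.8, 7.0],
    [0.9, 5.1, 4.0],
    [3.0, 5.0, 3.0],
    [3.1, 4.9, 6.0],
    [2.9, 5.2, 5.0],
]
y = [0, 0, 0, 1, 1, 1]

selected, scores = select_top_k(X, y, k=1)
print("selected gene indices:", selected)
```

In this toy matrix only the first gene separates the two classes, so it receives by far the largest F score. In a full pipeline, the selected columns would then be passed to an SVM classifier (for example scikit-learn's `SVC`), which is omitted here to keep the sketch free of external dependencies.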