Classification and gene selection of triple-negative breast cancer subtype embedding gene connectivity matrix in deep neural network

Abstract

Triple-negative breast cancer (TNBC) has been a challenging breast cancer subtype for oncological therapy. It can be classified into six molecular subtypes, and accurate, stable classification of these subtypes is essential for personalized treatment of TNBC. In this study, we proposed a new framework to distinguish the six subtypes of TNBC; it is one of the few studies to perform this classification using both mRNA and long non-coding RNA (lncRNA) expression data. In particular, we developed a gene selection method named DGGA, which takes correlation information between genes into account when measuring gene importance and then effectively removes redundant genes. To improve gene selection, we devised a gene scoring approach that combines GeneRank scores with gene importance generated by a deep neural network (DNN), so that both inter-subtype discrimination and correlations among genes are considered. More importantly, we embedded a gene connectivity matrix in the DNN for sparse learning, which additionally takes weight changes during training into account when measuring the relative importance of each gene. Finally, a genetic algorithm (GA) was used to simulate the natural evolutionary process and search for the optimal gene subset for TNBC subtype classification. We validated the proposed method through cross-validation, and the results demonstrate that it obtains more accurate classification results with fewer genes. The implementation of the proposed method is available at https://github.com/RanSuLab/TNBC.
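
As a rough illustration of what embedding a gene connectivity matrix in a DNN layer for sparse learning could look like, the following minimal NumPy sketch gates a dense layer's weights with a 0/1 mask. It is not the authors' implementation (see the repository above for that); the function name `masked_dense_forward`, the toy connectivity pattern, the layer size, and the ReLU activation are all assumptions made for illustration only.

```python
import numpy as np

def masked_dense_forward(x, weights, connectivity, bias):
    """Forward pass of a dense layer whose weights are gated by a 0/1 mask."""
    # The element-wise product zeroes every weight whose gene pair is not connected.
    sparse_weights = weights * connectivity
    # ReLU activation (an assumption; any activation could be used here).
    return np.maximum(x @ sparse_weights + bias, 0.0)

rng = np.random.default_rng(0)
n_genes = 4

# Hypothetical connectivity: gene i feeds unit i and unit i + 1 (where it exists).
connectivity = np.eye(n_genes) + np.eye(n_genes, k=1)

weights = rng.normal(size=(n_genes, n_genes))
bias = np.zeros(n_genes)
x = rng.normal(size=(3, n_genes))  # 3 samples, 4 gene expression values

hidden = masked_dense_forward(x, weights, connectivity, bias)
effective = weights * connectivity  # weights that actually take part in the layer
```

In this toy setup, only connected gene pairs contribute to the layer output, so the mask constrains which weights can carry signal. How the connectivity matrix is built and how weight changes feed into gene importance are described in the paper itself and are not reproduced here.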

Publication
In Briefings in Bioinformatics