Identification of expression signatures for non-small-cell lung carcinoma subtype classification

Abstract

Non-small-cell lung carcinoma (NSCLC) consists mainly of two subtypes, lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD). The genetic and epigenetic profiles of LUAD and LUSC have been reported to differ strikingly during tumorigenesis and tumor development, so identifying the subtype correctly enables more efficient and precise treatment. Discriminative expression signatures have recently been explored as an aid to NSCLC subtype classification. In this study, we designed a classification model that integrates both mRNA and long noncoding RNA (lncRNA) expression data to classify NSCLC subtypes effectively. A gene selection algorithm, named WGRFE, was proposed to identify the most discriminative gene signatures within the recursive feature elimination (RFE) framework. To improve selection performance, WGRFE takes into account both GeneRank scores, which consider expression level and correlation, and the feature importance generated by classifiers. Moreover, a module-based initial filtering of the genes was performed to reduce the computational cost of RFE. We validated the proposed algorithm on The Cancer Genome Atlas (TCGA) dataset. The results demonstrate that the approach identified a small number of expression signatures for accurate subtype classification; in particular, we show for the first time the potential role of lncRNA in building computational NSCLC subtype classification models. The R implementation of the proposed approach is available at https://github.com/RanSuLab/NSCLC-subtype-classification.
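The abstract does not give WGRFE's exact scoring or weighting, so the following is only a minimal Python sketch of the general idea of combining GeneRank scores with an importance score inside an RFE loop. It does not reproduce the authors' R implementation. Assumptions, all hypothetical: GeneRank runs over a binary co-expression network thresholded at |Pearson r| >= `corr_threshold`; the expression signal is the absolute difference of class means; the classifier-derived importance is replaced by a standardized mean difference so the sketch needs only the standard library; the two scores are mixed with a weight `alpha`; one gene is dropped per round; and the module-based prefiltering is omitted. The names `wgrfe_sketch`, `class_scores`, `alpha`, and `corr_threshold` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def class_scores(col: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    """Return (|mean difference|, standardized mean difference) between classes 1 and 0."""
    a = [v for v, lab in zip(col, y) if lab == 1]
    b = [v for v, lab in zip(col, y) if lab == 0]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / max(len(a) - 1, 1)
    vb = sum((v - mb) ** 2 for v in b) / max(len(b) - 1, 1)
    diff = abs(ma - mb)
    return diff, diff / (math.sqrt((va + vb) / 2) + 1e-12)


def generank(ex: List[float], adj: List[List[int]], d: float = 0.5,
             iters: int = 100) -> List[float]:
    """GeneRank-style propagation r = (1 - d) * ex + d * W^T D^-1 r, by fixed-point iteration."""
    n = len(ex)
    deg = [sum(row) for row in adj]
    r = list(ex)
    for _ in range(iters):
        # Each gene keeps part of its own signal and receives a degree-normalized
        # share of its correlated neighbours' scores (isolated genes are skipped).
        r = [(1 - d) * ex[j]
             + d * sum(adj[i][j] * r[i] / deg[i] for i in range(n) if deg[i] > 0)
             for j in range(n)]
    return r


def wgrfe_sketch(X, y, n_keep, alpha=0.5, corr_threshold=0.5, d=0.5):
    """Hypothetical weighted RFE: drop the lowest-scoring gene per round until n_keep remain.

    X is a samples x genes list of rows; y holds 0/1 subtype labels (e.g. LUAD vs LUSC).
    """
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        cols = [[row[g] for row in X] for g in remaining]
        scores = [class_scores(c, y) for c in cols]
        ex = [s[0] for s in scores]          # differential expression level
        importance = [s[1] for s in scores]  # stand-in for classifier importance
        m = len(remaining)
        # Binary co-expression network among the genes still in play.
        adj = [[1 if i != j and abs(pearson(cols[i], cols[j])) >= corr_threshold else 0
                for j in range(m)] for i in range(m)]
        gr = generank(ex, adj, d)
        max_gr = max(gr) or 1.0
        max_imp = max(importance) or 1.0
        combined = [alpha * gr[k] / max_gr + (1 - alpha) * importance[k] / max_imp
                    for k in range(m)]
        # Eliminate the single weakest gene, then recompute all scores next round.
        worst = min(range(m), key=lambda k: combined[k])
        remaining.pop(worst)
    return remaining


if __name__ == "__main__":
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    genes = [
        [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1],  # strongly subtype-dependent
        [2.0, 2.1, 1.9, 2.2, 4.0, 4.1, 3.9, 4.2],  # moderately subtype-dependent
        [3.0, 3.1, 2.9, 3.2, 3.1, 3.0, 3.2, 2.9],  # noise
        [0.5, 0.7, 0.4, 0.6, 0.6, 0.4, 0.7, 0.5],  # noise
    ]
    X = [list(row) for row in zip(*genes)]
    print(wgrfe_sketch(X, y, n_keep=2))  # expected: [0, 1]
```

Recomputing both scores after every elimination mirrors the RFE idea that the ranking of the surviving features can change once others are removed; with a real classifier (the abstract does not say which ones WGRFE uses) the importance term would come from retraining on the remaining genes rather than from a univariate statistic.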

Publication
In Bioinformatics