Using pseudo amino acid composition to predict transmembrane regions in protein: cellular automata and Lempel-Ziv complexity

Chinese Chemical Letters, Volume 34, Number 1, Jan, 2008

Guangxuan Min, Yanzhi Guo, Menglong Li*, Tuanfei Zhu


Knowledge of subnuclear localization in eukaryotic cells is indispensable for understanding the biological function of the nucleus, genome regulation and drug discovery. In this study, a new feature representation is proposed that combines the position-specific scoring matrix (PSSM) with auto covariance (AC). The AC variables describe the neighboring effect between two amino acids and thereby incorporate sequence-order information, while the PSSM captures the evolutionary information of proteins. Based on this new descriptor, a support vector machine (SVM) classifier was built to predict subnuclear localization. To evaluate the power of the predictor, a benchmark dataset containing 714 proteins localized in nine subnuclear compartments was used. The overall jackknife cross-validation accuracy of our method is 76.5%, which is higher than those of Nuc-PLoc (67.4%), OETKNN (55.6%), AAC-based SVM (48.9%) and ProtLoc (36.6%). The prediction software used in this article and the details of the SVM parameters are freely available at predict_SubNL/index.htm, and the dataset used in our study is from Shen and Chou’s work and can be downloaded at bioinf/Nuc-PLoc/Data.htm.
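The abstract does not give the AC formula, so the following is only a minimal sketch of how AC variables could be computed from a PSSM, assuming the commonly used definition: for each PSSM column j and each lag g, average the product of mean-centered scores at positions i and i+g. The function name `auto_covariance`, the parameter `max_lag`, and the normalization by (L - g) are assumptions for illustration and may differ from the authors' implementation.

```python
from typing import List


def auto_covariance(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Return AC variables for lags 1..max_lag over every PSSM column.

    pssm is a list of rows, one row per residue (typically 20 scores per row).
    The output is flattened lag by lag: all columns for lag 1, then lag 2, ...
    NOTE: this normalization is an assumed, commonly used form of AC.
    """
    length = len(pssm)
    if length == 0:
        raise ValueError("PSSM must have at least one row")
    if not 0 < max_lag < length:
        raise ValueError("max_lag must satisfy 0 < max_lag < sequence length")

    n_cols = len(pssm[0])
    # Mean score of each column over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            # Sum of products of centered scores separated by `lag` residues.
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features
```

With this layout every protein yields a vector of fixed length `max_lag * n_cols` regardless of sequence length, which is what allows variable-length sequences to be fed to a classifier such as an SVM.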
