Yaping Fang (方亚平)

Ph.D.

[Yaping Fang] Curriculum Vitae
Date of birth:
Hometown:
Alma mater:
E-Mail:
Mailing address:
Interests:






Recent Publications
  • Identification of RNA-binding sites in proteins by integrating various sequence information

  • Amino Acids, s00726-010-0639-7

    CuiCui Wang, Yaping Fang, JiaMin Xiao, Menglong Li

    Abstract:  

    RNA–protein interactions play a pivotal role in various biological processes, such as mRNA processing, protein synthesis, and the assembly and function of the ribosome. In this work, we introduce a computational method for predicting RNA-binding sites in proteins based on support vector machines, using a variety of features derived from amino acid sequence information, including position-specific scoring matrix (PSSM) profiles, physicochemical properties, and predicted solvent accessibility. To account for the influence of the residues surrounding an amino acid and the dependency effect of neighboring amino acids, a sliding window and a smoothing window are used to encode the PSSM profiles. The method is evaluated by outer fivefold cross-validation on a data set of 77 RNA-binding proteins (RBP77), where it achieves an overall accuracy of 88.66% with a Matthews correlation coefficient (MCC) of 0.69. An independent data set of 39 RNA-binding proteins (RBP39) is used to further evaluate performance, giving an overall accuracy of 82.36% with an MCC of 0.44. These results show that the method generalizes well when predicting RNA-binding sites in novel proteins. Compared with previous methods, it performs well on the same data set. The prediction results suggest that the features used are effective for predicting RNA-binding sites in proteins. The code and all data sets used in this article are freely available at http://cic.scu.edu.cn/bioinformatics/Predict_RBP.rar.
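    The sliding-window encoding and the MCC mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the default window width, and the zero padding at the sequence ends are assumptions made for illustration, and the smoothing window and the SVM itself are left out.

```python
import math

def window_features(pssm, window=3):
    """Concatenate the PSSM rows in a window centered on each residue.

    Assumes an odd window width. Positions that fall off either end of
    the sequence are zero-padded (an illustrative choice).
    """
    half = window // 2
    pad = [0.0] * len(pssm[0])
    features = []
    for i in range(len(pssm)):
        row = []
        for j in range(i - half, i + half + 1):
            row.extend(pssm[j] if 0 <= j < len(pssm) else pad)
        features.append(row)
    return features

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when any marginal count is zero, a common convention.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

    Each residue is then described by `window * len(pssm[0])` values, which could be passed to any classifier; the MCC is often reported alongside accuracy because binding residues are usually a minority class.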



  • Investigation of the proteins folding rates and their properties of amino acid networks

  • Chemometrics and Intelligent Laboratory Systems, (2010) 123–129

    Yaping Fang, DaiChuan Ma, Menglong Li*, Zhining Wen, YuanBo Diao

    Abstract:  

    The mechanism of protein folding is an important problem in molecular biology. Protein folding is usually thought to be a complex process involving the entire molecule. In this article, we investigate 78 native-state structures of folding proteins from a complex-network perspective to understand the role of topological parameters in protein folding kinetics. Thirty-one parameters were calculated from the amino acid networks of the folding proteins, and the relationship between those parameters and protein folding rates was systematically analyzed. Our results show that the significant parameters between two-state and multi-state folding proteins correlate well with the folding rates of proteins. It is also found that classifying the proteins into different classes can improve the correlation coefficient from 0.926 to 0.983 between the parameters and folding rates of two- and multi-state proteins, respectively. Genetic Algorithms–Multiple Linear Regression (GA–MLR) was adopted to select the best subset of the 31 parameters for constructing the MLR model while avoiding overfitting. Our methods show a correlation coefficient of 0.921 for all folding proteins based on the classification of the folding proteins. The results indicate that the general topological parameters of the amino acid networks of folding proteins can effectively represent structural and functional properties, such as folding rates.
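    To make the idea of an amino acid network concrete, the sketch below builds a residue contact graph and computes two common topological parameters. It is only an illustration: the 7.0 Å cutoff, the use of one coordinate per residue, and the function names are assumptions, and the abstract does not say which 31 parameters the paper uses.

```python
import math
from itertools import combinations

def contact_network(coords, cutoff=7.0):
    """Link residues i and j when their distance is at most `cutoff`.

    `coords` holds one (x, y, z) point per residue; the cutoff value is
    an illustrative assumption.
    """
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def average_degree(adj):
    """Mean number of contacts per residue."""
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))
```

    Parameters of this kind, computed per protein, are the sort of inputs that could then be regressed against measured folding rates.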



  • Optimal QSAR Analysis of the Carcinogenic Activity of Aromatic and Heteroaromatic Amines

  • QSAR & Combinatorial Science, 200710077

    Yaping Fang, Yi Feng, Menglong Li

    Abstract:  

    Aromatic and heteroaromatic amines are widely used in industrial chemicals and can be found in cooked foods and in tobacco smoke. In this study, Quantitative Structure–Activity Relationships (QSARs) are developed that correlate the observed carcinogenic activities of 80 aromatic and heteroaromatic amines. Principal Component Regression and stepwise linear regression techniques have been applied to construct the QSAR models. The performance of these two models is slightly superior to that previously reported for the same dataset using multiple linear regression techniques. To improve the performance, Support Vector Regression (SVR) has been used to construct the QSARs, and a Genetic Algorithm (GA) has been used to select the most informative descriptors. Additionally, by introducing the concept of weighting into the model, a new SVR, the optimized sample-weighted SVR, is proposed. The optimal weighting coefficient is 0.2. The results suggest that using a GA to select descriptors and weighting the descriptors can effectively improve the performance of the SVR models. The optimal Root Mean Square Error in Prediction is 0.799, which is smaller than that of the other models. A jackknife testing procedure has been used to validate the models. The results indicate that the descriptors selected by the GA and the weighting technique are important and necessary for improving the performance of QSAR models built with SVR.
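    The jackknife (leave-one-out) validation and the Root Mean Square Error in Prediction named in this abstract can be sketched as follows. A one-descriptor least-squares line stands in for the SVR model purely to keep the example self-contained; the function names and the stand-in model are assumptions, not the paper's method.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor.

    Stand-in for the SVR model; requires at least two distinct x values.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def jackknife_rmsep(xs, ys):
    """Leave each sample out in turn, refit, and predict the held-out one."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        squared_errors.append((slope * xs[i] + intercept - ys[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

    Because every prediction is made by a model that never saw the held-out compound, the resulting error estimates predictive rather than fitted performance.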



  • Predicting DNA-binding proteins: approached from Chou’s pseudo amino acid composition and other specific sequence features

  • Amino Acids, (2008) 34: 103–109

    Yaping Fang, Yanzhi Guo, Yi Feng, Menglong Li

    Abstract:  

    DNA-binding proteins play a pivotal role in gene regulation. It is vitally important to develop an automated and efficient method for the timely identification of novel DNA-binding proteins. In this study, we propose a method for predicting DNA-binding proteins based on the primary sequences of proteins alone. DNA-binding proteins were encoded by the auto cross-covariance transform, pseudo-amino acid composition, and dipeptide composition, individually and in different combinations of the three encoding methods; these feature matrices were then fed to support vector machine classifiers to predict DNA-binding proteins. All modules were trained and validated by the jackknife cross-validation test. Comparing the performance of these modules, the best result was obtained with pseudo-amino acid composition, with an overall accuracy of 96.6% and a sensitivity of 90.7%. The results suggest that novel DNA-binding proteins can be predicted efficiently using only the primary sequences.
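    Of the three encodings named in this abstract, dipeptide composition is the simplest to show. The sketch below maps a sequence to a 400-element frequency vector. The function name, the alphabetical residue ordering, and the handling of non-standard residues are assumptions for illustration; pseudo-amino acid composition and the auto cross-covariance transform are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(sequence):
    """Return a 400-element vector of dipeptide frequencies."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]
```

    Every protein, whatever its length, is mapped to a vector of the same size, which is what allows sequences to be fed directly to a classifier such as a support vector machine.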