The rapid generation of multi-omics data has advanced our understanding of life from the level of a single molecule or gene to that of the entire biological system. Multi-omics data bring opportunities for a deeper understanding of complex diseases and offer a powerful toolkit for functional genomics. However, integrating multi-omics data poses enormous challenges for method development, and the gene editing tools used in functional genomics have problems of their own, including potential off-target effects and low on-target efficiency. We therefore apply computational biology to develop new methods for multi-omics data integration, identify new biomarkers from the integrated data, and develop new gene editing tools through bioinformatic analysis.

Biomarker discovery of complex diseases based on multi-omics data integration  

Early detection and prognosis prediction are major interests in clinical research. We developed an edge-based data integration method that uses the correlation between molecules to construct edge strength. Edge strength can then serve as a new molecular feature in subsequent analysis to distinguish between phenotypes. Compared with methods based on expression differences, edge strength integrates multi-omics data better, reflects the regulatory relationships within biological networks, and captures small network disturbances in disease states more sensitively. We applied the edge-strength method to the prediction of breast cancer prognosis and of type II diabetes outcome by building machine learning models, and both achieved high prediction accuracy. In addition, a kinase-phospho-substrate network built on edge-strength features, combined with machine learning models, could accurately predict the drug response of mCRC to kinase inhibitors. These studies demonstrate the value of edge strength in multi-omics data integration, and the approach could also be applied to the prognosis, prediction, and treatment of other complex diseases such as neurodegenerative disorders.
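The idea of turning correlations between molecules into per-sample features can be illustrated with a minimal sketch. It is not the published method: it assumes, purely for illustration, that the edge strength of a molecule pair in one sample is the product of the two molecules' z-scores, so that averaging an edge over all samples gives the Pearson correlation of the pair. The function names, molecule names, and toy values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def zscore_columns(matrix):
    """Standardize each column (molecule) to mean 0 and population std 1."""
    n_samples = len(matrix)
    n_features = len(matrix[0])
    columns = []
    for j in range(n_features):
        values = [row[j] for row in matrix]
        mean = sum(values) / n_samples
        std = sqrt(sum((v - mean) ** 2 for v in values) / n_samples)
        if std == 0:
            raise ValueError(f"column {j} is constant; correlation is undefined")
        columns.append([(v - mean) / std for v in values])
    # Transpose back to samples x molecules.
    return [[columns[j][i] for j in range(n_features)] for i in range(n_samples)]


def edge_strength_features(matrix, names):
    """Return (edge_names, per-sample edge-strength rows).

    Assumption: the edge strength of pair (a, b) in one sample is z_a * z_b,
    so its mean over all samples is the Pearson correlation of a and b.
    """
    z = zscore_columns(matrix)
    pairs = list(combinations(range(len(names)), 2))
    edge_names = [f"{names[a]}~{names[b]}" for a, b in pairs]
    features = [[row[a] * row[b] for a, b in pairs] for row in z]
    return edge_names, features


# Toy data: 4 samples x 3 molecules from different omics layers (hypothetical).
data = [
    [1.0, 2.0, 9.0],
    [2.0, 4.1, 7.0],
    [3.0, 5.9, 4.0],
    [4.0, 8.0, 1.0],
]
edge_names, edges = edge_strength_features(data, ["geneA", "protB", "metC"])

# Averaging each edge over samples recovers the pairwise Pearson correlation.
mean_edge = [sum(col) / len(edges) for col in zip(*edges)]
```

Each row of `edges` is one sample described by its edges rather than by its expression values, and such rows could be passed to any classifier in place of the raw expression matrix.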

Development of gene editing tools 

We also focused on the development and improvement of gene editing tools. Together with our collaborators, we established an off-target detection technology called GOTI that identifies potential off-target edits of gene editing tools with high accuracy and sensitivity. We applied GOTI to evaluate three commonly used gene editing tools, CRISPR-Cas9, cytosine base editor (BE3), and adenine base editor (ABE7.10), and found that BE3 could induce substantial off-target edits. The corresponding bioinformatic pipeline, GOTI-seq, is expected to help improve current gene editors and generate new genome editing tools with higher specificity. By exploring the characteristics of BE3-induced off-target edits, we found that the off-target effects were probably induced by the deaminase APOBEC1. Considering that APOBEC1 is also an RNA deaminase, we next explored whether base editors could induce off-target effects at the RNA level. As expected, RNA-seq analysis showed that both BE3 and ABE7.10 induced substantial RNA off-target effects. To improve the fidelity of base editors, we then predicted the key amino acids responsible for the ssDNA binding ability of APOBEC1 and introduced mutations at these sites. In total, we screened 23 mutants and found that 4 of them could significantly reduce DNA and RNA off-target edits to baseline levels. By combining such a mutant with BE3-FNLS, which has higher editing efficiency, we obtained a new base editor, “YE1-BE3-FNLS”, that retains high on-target editing efficiency while causing extremely low off-target and bystander edits. This work was selected as one of the “Top 10 Bioscience Achievements of 2019 in China”.

    

    

SUN Yidi, Ph.D.

Young Investigator