Machine Learning for Cryo-EM

Single-particle analysis by cryo-electron microscopy (cryo-EM) is evolving into a powerful, general approach to structure determination. However, the noise inherent in cryo-electron micrographs limits how much of the structural information recorded in them can be extracted. Conventional data analysis for single-particle cryo-EM has benefited largely from multivariate data analysis approaches such as principal component analysis (PCA), K-means clustering and linear regression. Statistical approaches such as maximum likelihood estimation and Bayesian inference have further improved single-particle reconstructions and reduced subjectivity in structure refinement. Nevertheless, many of the advanced algorithms and statistical approaches developed in machine learning and artificial intelligence over the last several decades have not yet been adapted to cryo-EM data analysis. Adapting, evolving and innovating on cutting-edge machine learning approaches is likely to release much of the untapped potential of single-molecule cryo-EM and to expand the applicability of this technology to future challenges in the life sciences and medicine. Our current research aims to develop such machine learning and artificial intelligence approaches for cryo-EM data analysis to address future challenges in structural biology. We are particularly interested in applying them to studies of biomolecular complex dynamics and structural systems biology.
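As a rough illustration of the conventional multivariate pipeline mentioned above (PCA followed by K-means clustering), the following minimal sketch groups noisy synthetic images into two classes. It is a toy example under stated assumptions, not the method used in ROME, ACK-means or any other package: the helper names (`pca_reduce`, `kmeans`, `make_toy_particles`), the two bar-shaped templates and the noise level are all hypothetical, and real particle images would additionally require alignment and CTF correction.

```python
import numpy as np


def pca_reduce(images, n_components):
    """Project flattened images onto their top principal components."""
    flat = images.reshape(len(images), -1).astype(float)
    centered = flat - flat.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (keep it if empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def make_toy_particles(n_per_class=100, size=16, noise=0.5, seed=0):
    """Hypothetical stand-in for particle images: two templates plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    vertical = np.zeros((size, size))
    vertical[4:12, 6:10] = 1.0
    horizontal = np.zeros((size, size))
    horizontal[6:10, 4:12] = 1.0
    clean = np.concatenate([
        np.repeat(vertical[None], n_per_class, axis=0),
        np.repeat(horizontal[None], n_per_class, axis=0),
    ])
    truth = np.repeat([0, 1], n_per_class)
    return clean + noise * rng.standard_normal(clean.shape), truth


images, truth = make_toy_particles()
features = pca_reduce(images, n_components=2)
labels = kmeans(features, k=2)
# Cluster ids are arbitrary, so score against both possible label assignments.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
```

Reducing the images to a few principal components before clustering keeps the distance computations cheap and discards much of the pixel-level noise, which is one reason this combination has been a common baseline for 2D classification.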

Further Reading

J. Wu, Y. Ma, C. Congdon, B. Brett, S. Chen, Q. Ouyang, Y. Mao. Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning. PLoS ONE 12, e0182130 (2017). doi: 10.1371/journal.pone.0182130. arXiv: 1604.04539.

Y. Zhu, Q. Ouyang, Y. Mao. A deep convolutional neural network approach to single-particle recognition in cryo-electron microscopy. BMC Bioinformatics 18, 348 (2017).

Y. Xu, J. Wu, C.C. Yin, Y. Mao. Unsupervised cryo-EM data clustering through adaptively constrained K-means algorithm. PLoS ONE 11, e0167765 (2016). doi: 10.1371/journal.pone.0167765. arXiv: 1609.02213 [q-bio.QM].


ROME: A machine-learning-based HPC software package for cryo-EM data processing.

DeepEM: A deep-learning-based particle recognition program.

ACK-means: Adaptively constrained K-means algorithm for data clustering.

Complex Dynamics of Soft Matter

Proteins are molecular machines, and no molecular machine works alone in the cell. How do the subunits of each machine work together? How are they dynamically coupled? Historically, biophysical studies by nuclear magnetic resonance and molecular dynamics simulation have been able to answer these questions only for small proteins or truncated, simplified molecular constructs. Single-molecule cryo-EM opens the possibility of observing biomolecules in action under physiological conditions. The single-molecule images, although noisy, contain information that reflects the spatial organization of atoms as well as their positions along the paths of conformational transitions. The challenge is to extract this dynamic information, along with the atomic organization, from noisy projection images in bulk. The theory underlying the methods and procedures remains incomplete. Current methods for studying complex dynamics rely on classifying datasets by conformational differences, on time-dependent sample preparation, or on a combination of both. However, current computational tools limit the processing of big data at the terabyte scale, or of millions of molecular images. Large-scale classification of massive image data into thousands of conformations is practically prohibitive with existing tools. Scientists working at LCMMB are exploring advanced knowledge and tools in mathematics, physics and computer science to address these challenges in resolving the atomic-level dynamics of biomolecular complexes.