We work on a range of projects, with the collective goal of pushing the limits of computational and theoretical methods and applying them to important problems in biophysics and biophysical chemistry.
Applications of physical simulation and Bayesian statistics/Machine Learning to biologically and biomedically important questions
Drug discovery is currently a very expensive and time-consuming process: on average it takes about 12 years and over a billion dollars to bring a drug to market, and only a tiny fraction of drug candidates ever receive approval for human use, since many fail because of efficacy or toxicity problems. Hit identification in drug discovery often involves expensive high-throughput screens of large compound collections. However, if we know the structure of the target or a set of active compounds, we can use computational shortcuts (virtual screening tools such as docking and molecular similarity) to prioritize experiments and reduce screening costs. Our lab develops new methods for virtual screening and molecular similarity calculations, and we apply chemical similarity methods to identify therapeutics for disease. We are especially interested in repurposing existing approved drugs for other diseases, both to circumvent toxicity issues and to speed the movement of these therapies into the clinic. We have ongoing projects involving a variety of diseases, including neglected diseases such as malaria, Chagas disease, and dengue fever.
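To illustrate what similarity-based virtual screening involves, here is a minimal sketch that ranks a small compound library by Tanimoto similarity to a known active. It assumes each molecule has already been encoded as a binary fingerprint, represented here as a Python set of "on" bit indices; the compound names and fingerprints are hypothetical, and a real pipeline would generate fingerprints with a cheminformatics toolkit. Tanimoto ranking is a generic, widely used technique, shown here as an example rather than as the lab's own method.

```python
from typing import Dict, List, Set, Tuple

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) coefficient of two binary fingerprints given as
    sets of 'on' bit indices: |A & B| / |A | B|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / union

def rank_by_similarity(query: Set[int],
                       library: Dict[str, Set[int]]) -> List[Tuple[str, float]]:
    """Rank library compounds by similarity to a query active, highest first."""
    scores = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical fingerprints (bit indices), for illustration only.
known_active = {1, 4, 7, 9, 12}
library = {
    "cmpd_A": {1, 4, 7, 9, 13},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 7, 9, 12, 20},
}

# Prints cmpd_C: 0.83, then cmpd_A: 0.67, then cmpd_B: 0.00
for name, score in rank_by_similarity(known_active, library):
    print(f"{name}: {score:.2f}")
```

The highest-scoring compounds would be the ones prioritized for experimental testing; docking-based screening replaces the similarity score with a structure-based one but follows the same rank-then-test logic.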
Many diseases result from protein misfolding, i.e., the failure of proteins to self-assemble (or “fold”) into the correct structure. For example, Alzheimer’s disease (AD) results from the undesired aggregation of Aβ peptides. Surprisingly, the toxic species are the small oligomers (4-16 monomers) of Aβ. Since Aβ itself is small (42 residues), simulation approaches using our advanced methodology for kinetics and thermodynamics should be able to shed light on the structure and stability of these oligomers. We are also applying these methods to protein misfolding diseases more broadly, starting with Huntington’s disease.
The future of much of biology and chemistry lies at the connection between genes, small molecules (drugs), and biochemical pathways. To unravel these connections, we apply machine learning, Bayesian statistics, atomistic simulation, bioinformatics, and cheminformatics methods to problems that link drug efficacy and side effects to genomics and systems biology. My group also has expertise in related, synergistic areas such as theoretical physical chemistry, structural biology, computer science, and large-scale distributed computing. By combining our methods with the Folding@home distributed computing project (currently the most powerful supercomputer in the world, with almost 10 petaflops of performance), we have a unique opportunity to push the state of the art in these and related areas. Finally, through collaborations with biotechs, pharmaceutical companies, and experimental groups interested in drug design, we can directly test our predictions, which both strengthens our methods and increases the direct impact of our results.
While understanding the nature of folding in vitro is a challenging biophysical question, understanding folding in vivo is the dominant biological question. In collaboration with several experimental groups, we are now performing simulations of folding under biologically relevant conditions (i.e., models of cellular conditions) and in the presence of the relevant biochemical machinery. While these simulations will be extremely demanding, they should provide insight in ways that were previously impossible.
Moreover, we believe that the future of biophysics lies in connecting directly to cellular environments, and we are working to pioneer new methods that can tackle this challenging set of problems directly.
Feynman famously stated in his 1963 Lectures, “Everything that living things do can be understood in terms of the jiggling and wiggling of atoms.” Modern biophysics, however, has only begun to connect the microscopic motions and forces of atoms to biological function. For instance, protein folding, the last step in biology’s central dogma, remains poorly understood despite over half a century of study. While certain principles of protein folding are well understood, no satisfactory description exists of how robust, fast folding emerges from chemical physics.
Our lab aims to bridge this gap by using atomistic molecular simulation to study protein dynamics, both folding and conformational change. These simulations serve as a counterpart to experiment in two roles: generating hypotheses to test in the lab, and aiding the interpretation of complex experimental results. For example, our simulations predicted a low-amplitude slow process in lambda repressor that was later verified by experiment. Intriguingly, this process was due to non-native, beta-sheet-rich structures; ongoing research in the lab asks whether similar mechanisms can explain the beta-driven aggregation common to many protein systems.
We study protein folding, conformational change in GPCRs and kinases, aggregation in amyloidogenic systems, and the physical properties of engineered proteins, hoping to answer questions such as: What are the physical foundations of protein dynamics? How has biology engineered proteins to implement both folding and function? Can we leverage this knowledge to design new pharmaceuticals and impact human health?
For several decades, understanding how proteins self-assemble (or “fold”) has been a challenging problem in physical chemistry with important ramifications for structural biology and nanotechnology. Moreover, understanding protein folding is an important paradigm for many other difficult problems in structural biology and physical chemistry.
Our goal has been to develop novel computational methods that greatly push the envelope in folding simulation, with the aim of directly and quantitatively predicting all possible experimental observables. Using novel algorithms and the power of Folding@home, we have been able, for the first time, to simulate folding dynamics directly from sequence.
We have recently been applying our Markov state model (MSM) methods, which allow all-atom simulations on the millisecond timescale, to key questions of how proteins undergo the conformational changes that underlie their function. Our first targets have been key proteins in biology and disease. For example, roughly half of all known drugs act on a G-protein-coupled receptor (GPCR). Likewise, kinases are central to cell signaling and are key targets of cancer therapies. Finally, we have been applying these methods to study and redesign enzymes more broadly.
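For readers unfamiliar with Markov state models, the sketch below shows the core estimation step at toy scale: count transitions between discrete conformational states at a lag time τ, row-normalize the counts into a transition matrix, and read off the slowest relaxation as the implied timescale t = -τ / ln λ₂, where λ₂ is the second-largest eigenvalue. It is restricted to two states so that λ₂ = 1 - p₀₁ - p₁₀ has a closed form. The state trajectory and function names are hypothetical; production MSM software (clustering into many states, reversible estimators, full eigendecomposition, lag-time validation) is far more involved.

```python
import math
from typing import List, Sequence, Tuple

def transition_matrix(traj: Sequence[int], n_states: int, lag: int) -> List[List[float]]:
    """Row-normalized transition counts between discrete states at a lag (in frames)."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for t in range(len(traj) - lag):
        counts[traj[t]][traj[t + lag]] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        # A state never observed as a starting point gets an all-zero row.
        matrix.append([c / total if total > 0 else 0.0 for c in row])
    return matrix

def two_state_summary(T: List[List[float]], lag: int) -> Tuple[Tuple[float, float], float]:
    """Stationary populations and implied timescale (in frames) of a 2-state MSM."""
    p01, p10 = T[0][1], T[1][0]
    # For a 2x2 row-stochastic matrix the eigenvalues are 1 and 1 - p01 - p10.
    lam2 = 1.0 - p01 - p10
    if not 0.0 < lam2 < 1.0:
        raise ValueError("implied timescale undefined for this lag/matrix")
    # Stationarity for two states: pi0 * p01 == pi1 * p10.
    pi0 = p10 / (p01 + p10)
    return (pi0, 1.0 - pi0), -lag / math.log(lam2)

# Hypothetical discretized trajectory (0 = unfolded, 1 = folded), one frame per step.
traj = ([0] * 5 + [1] * 5) * 4
T = transition_matrix(traj, n_states=2, lag=1)
populations, timescale = two_state_summary(T, lag=1)
print(populations, timescale)
```

The key design point is that the transition matrix only needs short transitions of length τ, so it can be estimated from many short, independent trajectories, which is what makes MSMs a natural fit for distributed computing.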
Water and other cosolvents (such as urea) play an active role in biomolecular self-assembly. Indeed, the hydrophobic effect is a dominant driving force. How does water influence biomolecular structure formation, and does it play a specific structural role rather than acting as a general continuum? Using fully atomistic simulation with quantitative comparison to experiment, we can now begin to answer these questions in detail.
New paradigms for supercomputing: worldwide distributed computing and Graphics Processing Units (GPUs)
Current atomistic simulations are severely limited by the available computational power. Even to attempt a direct comparison with experiment, many simulations would need to run for thousands of years.
Distributed computing opens the door to new possibilities. Using 100,000 CPUs distributed throughout the world (“Folding@home”) and well-designed algorithms, one can turn 100,000 CPU-days (about 274 years!) into one day of simulation.
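A quick check of this arithmetic, as a minimal Python sketch. It assumes ideal, perfectly parallel scaling with no communication or scheduling overhead; for folding trajectories, which are inherently sequential in time, that assumption only holds with algorithmic help. The function names are illustrative.

```python
DAYS_PER_YEAR = 365.25

def cpu_days_to_years(cpu_days: float) -> float:
    """Serial wall-clock time, in years, for a given number of CPU-days."""
    return cpu_days / DAYS_PER_YEAR

def wall_clock_days(cpu_days: float, n_cpus: int) -> float:
    """Wall-clock days if the work divides evenly across n_cpus (ideal scaling)."""
    return cpu_days / n_cpus

work = 100_000  # CPU-days of simulation
print(f"serial: {cpu_days_to_years(work):.0f} years")  # prints "serial: 274 years"
print(f"on 100,000 CPUs: {wall_clock_days(work, 100_000):.0f} day(s)")  # 1 day
```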
Starting in 2005, we have pushed toward a new computing paradigm: Graphics Processing Units (GPUs). Our efforts have led to extremely powerful molecular dynamics software, which is also part of Folding@home. Moreover, this work led to Folding@home on game consoles such as the PS3. The combined power of the GPUs and PS3s has made Folding@home the most powerful supercomputer cluster in the world, approaching the 10 petaflop scale.
While the fastest proteins fold in tens of microseconds to milliseconds, atomistic simulations are limited to the nanosecond regime. How can we break this fundamental impasse? While 100,000 CPUs connected through distributed computing can provide the raw horsepower, well-designed algorithms are clearly needed to use that power efficiently. Indeed, just as 100,000 grad students cannot pool their effort to finish nearly three centuries of work in one day, folding simulations must be specifically designed to parallelize at this scale. By taking advantage of the nature of folding kinetics (the single-exponential behavior of single-domain proteins), one can devise natural ways to speed up folding simulation 100,000-fold using distributed computing. With single-exponential kinetics at rate k, the probability that a trajectory folds within a short time t ≪ 1/k is approximately kt, so M independent trajectories yield about Mkt folding events, and the rate can be measured even though no single trajectory runs as long as the mean folding time. This allows us to simulate folding on the millisecond timescale.
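The sketch below illustrates why single-exponential kinetics make this parallelization work. It is a toy statistical model with no molecular dynamics: each hypothetical "trajectory" gets a folding time drawn from an exponential distribution with rate k, every trajectory is run for a window far shorter than the mean folding time 1/k, and k is recovered from the fraction that folded. The parameter values are illustrative only.

```python
import math
import random

def estimate_rate(true_rate: float, n_trajectories: int, sim_time: float,
                  seed: int = 0) -> float:
    """Estimate a folding rate from many short, independent 'trajectories'.

    Each trajectory's folding time is drawn from an exponential distribution
    (single-exponential kinetics); it counts as folded if that time falls
    within the simulated window sim_time.
    """
    rng = random.Random(seed)
    n_folded = sum(1 for _ in range(n_trajectories)
                   if rng.expovariate(true_rate) <= sim_time)
    if n_folded == n_trajectories:
        raise ValueError("every trajectory folded; shorten sim_time")
    fraction = n_folded / n_trajectories
    # Exponential kinetics: P(folded by t) = 1 - exp(-k t), so
    # k = -ln(1 - f) / t, which reduces to f / t when k t << 1.
    return -math.log(1.0 - fraction) / sim_time

# Illustrative numbers: mean folding time 10 microseconds (k = 0.1 per us),
# but each trajectory is only 0.1 us long, 100x shorter than that.
k_true = 0.1
k_est = estimate_rate(k_true, n_trajectories=100_000, sim_time=0.1)
print(f"true k = {k_true}, estimated k = {k_est:.4f}")
```

Although no trajectory here exceeds 0.1 µs, the ensemble accumulates 10,000 µs of aggregate sampling and roughly a thousand folding events, enough to estimate k to within a few percent. The same argument breaks down if folding is not single-exponential (for example, if there is a lag phase), which is why the kinetics of single-domain proteins matter.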
Another great challenge in physical chemistry simulation is calculating free energies to chemical accuracy and precision (e.g., to within 1 kcal/mol). With such capabilities, one could use simulation in drug lead discovery and refinement. We are developing new ways to use distributed computing to make a fundamental advance here, with 1 kcal/mol accuracy in absolute ΔG calculations as our goal.
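To make the 1 kcal/mol target concrete: an error ΔΔG in a binding free energy changes the predicted binding constant by a factor of exp(ΔΔG/RT), so at room temperature 1 kcal/mol corresponds to roughly a five-fold error in affinity. The sketch below computes that factor and also shows the Zwanzig exponential-averaging (free energy perturbation) estimator, one standard textbook route from simulation to ΔG. It is a generic example, not the lab's method, and the energy samples are hypothetical.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def affinity_fold_change(ddg_kcal: float, temperature: float = 298.15) -> float:
    """Factor by which a binding constant changes for a free energy error ddg."""
    return math.exp(ddg_kcal / (R_KCAL * temperature))

def zwanzig_delta_g(delta_u: Sequence[float], temperature: float = 298.15) -> float:
    """Free energy perturbation (Zwanzig) estimate of G_B - G_A in kcal/mol.

    delta_u holds U_B(x) - U_A(x) evaluated on configurations x sampled
    from state A:  dG = -kT ln < exp(-dU / kT) >_A
    """
    kt = R_KCAL * temperature
    # Shift by the minimum before exponentiating, for numerical stability.
    shift = min(delta_u)
    mean_boltz = sum(math.exp(-(du - shift) / kt) for du in delta_u) / len(delta_u)
    return shift - kt * math.log(mean_boltz)

print(f"1 kcal/mol -> {affinity_fold_change(1.0):.1f}x")  # prints "1 kcal/mol -> 5.4x"
# Hypothetical energy differences (kcal/mol) from state-A samples:
print(f"FEP estimate: {zwanzig_delta_g([1.2, 0.8, 1.5, 0.9, 1.1]):.2f} kcal/mol")
```

Exponential averaging is dominated by rare low-energy samples and converges poorly when the two states overlap weakly, which is one reason reliably reaching 1 kcal/mol demands large amounts of sampling.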