The more, the better

Newswise - TomoTwin automates the detection and localization of proteins in their cellular environment and so broadens what cryo-ET can do, says Gavin Rice, co-first author of the publication. Cryo-ET has the potential to reveal how biomolecules work inside cells, and with that the foundations of life and the origins of disease.

In a cryo-ET experiment, researchers use a transmission electron microscope to record 3D images, called tomograms, of the cellular volume that contains the biomolecules of interest. To see a protein in more detail, they average as many copies of it as possible, much as photographers take the same picture at different exposures and later combine the shots into one well-exposed image. Before averaging, however, the individual proteins must be correctly identified and localized in the image. Rice remarks, "While scientists can obtain numerous tomograms daily, we lacked the means to comprehensively identify the molecules present within them."
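The benefit of averaging many copies can be illustrated with a toy calculation. The sketch below is not taken from the publication: the 1D "particle" profile, the noise level, and the function names are all assumptions chosen for illustration. It shows only that averaging many noisy copies of the same signal brings the result closer to the true signal than any single copy.

```python
import random

def noisy_copy(signal, noise_sd, rng):
    """Return the signal with independent Gaussian noise added to every point."""
    return [s + rng.gauss(0.0, noise_sd) for s in signal]

def average_copies(copies):
    """Point-by-point mean over a list of equally long copies."""
    n = len(copies)
    return [sum(values) / n for values in zip(*copies)]

def rms_error(estimate, signal):
    """Root-mean-square difference between an estimate and the true signal."""
    squared = [(e - s) ** 2 for e, s in zip(estimate, signal)]
    return (sum(squared) / len(signal)) ** 0.5

rng = random.Random(0)                      # fixed seed for reproducibility
signal = [0.0, 1.0, 3.0, 1.0, 0.0] * 20     # hypothetical 1D particle profile
copies = [noisy_copy(signal, 2.0, rng) for _ in range(400)]

single_error = rms_error(copies[0], signal)
averaged_error = rms_error(average_copies(copies), signal)
print(f"one copy: {single_error:.2f}, average of 400: {averaged_error:.2f}")
```

In this toy setting the error of the average shrinks roughly with the square root of the number of copies, which is why locating more copies of a protein matters for the final image.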

Hand-picking

Until now, scientists have relied on algorithms that search the tomograms for matches to templates of already known molecular structures, but these methods are error-prone. The alternative is manual identification, which yields accurate picks but takes days to weeks per dataset.

Supervised machine learning is a further option. These tools can be very accurate, but their usability is currently limited: the software must be trained for each new protein on thousands of manually labeled examples, an arduous task for small biological molecules in a crowded cellular environment.

TomoTwin

TomoTwin, the newly developed software, overcomes many of these hurdles. It learns to recognize proteins of similar shape within a tomogram and maps them to a geometric space. The system is rewarded for placing similar proteins close to each other and penalized otherwise. In this new map, researchers can separate and accurately identify the different proteins, and so localize them precisely within the cell. "One advantage of TomoTwin is that we offer a pre-trained picking model," Rice explains, so users do not need to train the software themselves. TomoTwin can even run on local computers, where processing a tomogram typically takes 60-90 minutes; on the MPI supercomputer Raven, the runtime drops to 15 minutes per tomogram.
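The idea of rewarding similar proteins for landing close together in a geometric space, and penalizing other placements, can be sketched with a triplet loss, one common way to write such an objective. This article does not say which loss TomoTwin actually uses, so the function names, the margin value, and the 2D coordinates below are assumptions made for illustration only.

```python
import math

def distance(a, b):
    """Euclidean distance between two points in the embedding space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero when the positive is closer to the anchor than the negative
    by at least `margin`; grows as that ordering is violated."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)

# Hypothetical 2D embeddings of three subvolumes from a tomogram.
anchor = [0.0, 0.0]          # reference protein
same_protein = [0.1, 0.0]    # another copy of the same protein
other_protein = [3.0, 4.0]   # a differently shaped protein

# Good placement: the same protein is near, the other one is far away.
good = triplet_loss(anchor, same_protein, other_protein)
# Bad placement: the roles are swapped, so the loss is large.
bad = triplet_loss(anchor, other_protein, same_protein)
print(good, bad)
```

A model trained to minimize a loss of this kind is pushed toward embeddings in which copies of the same protein form tight clusters, and those clusters are what make it possible to separate proteins in the map.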

With TomoTwin, researchers can pick dozens of tomograms in the time it would take to pick a single one by hand. The higher data throughput supplies more particles for averaging and therefore yields higher-quality images. At present, the software can locate globular proteins or protein complexes larger than 150 kilodaltons in cells. In the future, the Raunser group plans to extend TomoTwin to membrane proteins, filamentous proteins, and smaller proteins.

Journal Link: Nature Methods