Uncharted genetic territory offers insight into human-specific proteins

When researchers working on the Human Genome Project completely mapped the genetic blueprint of humans in 2001, they were surprised to find only around 20,000 genes that produce proteins. Could it be that humans have only about twice as many genes as a common fly? Scientists had expected considerably more.

Now, researchers from 20 institutions worldwide bring together more than 7,200 unrecognized gene segments that potentially code for new proteins. For the first time, the study makes use of a new technology to find possible proteins in humans — looking in detail at the protein-producing machinery in cells. The new study suggests the gene discovery efforts of the Human Genome Project were just the beginning, and the research consortium aims to encourage the scientific community to integrate the data into the major human genome databases.

The study, recently published in Nature Biotechnology, was co-led by Dr. Jorge Ruiz-Orera from the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC) in Germany, Dr. Sebastiaan van Heesch from the Princess Máxima Center for pediatric oncology in the Netherlands, Dr. Jonathan Mudge from the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in the United Kingdom, and Dr. John Prensner from the Broad Institute of MIT and Harvard in the United States.

New gene sequences remained out of reach

In the past few years, thousands of open reading frames (ORFs), many of them very small, have been discovered in the human genome. These are spans of DNA sequence that may contain instructions for building proteins. Several authors of the current study have previously found ORFs and described them in scientific journals: Van Heesch, together with MDC professors Norbert Hübner and Uwe Ohler, described new mini-proteins in the human heart and reported on them in “Cell” in 2019; Prensner also published on ORFs in “Nature Biotechnology” in 2021. Yet none of these previously virtually unexplored segments were subsequently included in reference databases. Other sequences were reported in journals such as “Science” or “Nature Chemical Biology,” but remained largely out of reach for most members of the scientific community, despite evidence that they produce RNA molecules that subsequently bind to ribosomes, the cell’s protein factories.

Traditionally, protein-coding regions in genes have been identified by comparing DNA sequences from multiple species: the most important coding regions have been preserved during animal evolution. But this method has a drawback: coding regions that are relatively young, i.e., that arose during the evolution of primates, fall through the cracks and are therefore missing from the databases.
