Illuminating the Dark Matter of the Genome


Human biology starts with DNA. The letters A, C, T and G are the bases contained in this stunning little constellation of chemicals we call DNA. Tightly packed inside the cells of your body is your genome: these bases, strung together to form one of the longest, most complex molecules in the world. In fact, if you took the DNA from every cell of your body and laid it end to end, it would be long enough to stretch to the sun and back 300 times. That's 93 million miles, times two for the round trip, times 300: nearly 56 billion miles, or about three and a half days of travel time in a spaceship traveling at light speed. Roughly 3 billion base pairs of A, C, T, and G. How does it all fit in your body? Scientists all over the world have marveled over its length and complexity for decades. And hidden beneath this seemingly haphazard sequence of DNA bases lie the secrets to life.
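For anyone who wants to check the sun-and-back arithmetic, here is a rough sketch in Python. It assumes the round figure of 93 million miles from the Earth to the sun and a light speed of about 186,282 miles per second; the function names are made up purely for illustration.

```python
# Back-of-the-envelope check of the "to the sun and back, 300 times" figure.
# All constants are round-number assumptions, not precise measurements.
MILES_SUN_ONE_WAY = 93_000_000        # average Earth-sun distance in miles
ROUND_TRIPS = 300                     # "to the sun and back, 300 times"
LIGHT_SPEED_MILES_PER_SEC = 186_282   # speed of light in miles per second
SECONDS_PER_DAY = 86_400


def total_miles(one_way=MILES_SUN_ONE_WAY, round_trips=ROUND_TRIPS):
    """Total length in miles: each round trip covers the distance twice."""
    return one_way * 2 * round_trips


def light_travel_days(miles):
    """Days needed to cover `miles` at the speed of light."""
    return miles / LIGHT_SPEED_MILES_PER_SEC / SECONDS_PER_DAY


if __name__ == "__main__":
    miles = total_miles()
    print(f"{miles:,} miles")
    print(f"{light_travel_days(miles):.1f} days at light speed")
```

With these assumptions the total comes to a little under 56 billion miles, which light would cover in a few days.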

The Human Genome Project. Sometime in the late 1980s the NIH decided to fund a 15-year, $200M/year project aimed at sequencing the human genome in its entirety. This was a significant investment in a project as ambitious as sending a man to the moon.

Critics argued that the project was sold on hype and glitter and that it would drain talent and money from smaller, scientifically meritorious biomedical research efforts. They argued that this huge new foray into big-ticket science would be divisive, and that $200 million a year would be better disbursed in the form of 1,000 NIH grants focused on a diversity of collectively worthier projects. Others wondered philosophically whether we might be venturing where no human should go, since successfully deciphering the veritable blueprint of life might amputate the intrinsic mysteries of humanity. They argued that some things should remain a mystery, like your partner’s sexual history or the contents of a sausage.

Its proponents argued that deciphering the blueprint would provide a reference for all sorts of future scientific and medical inroads. At the time they reasoned that defining the predicted 100,000 human genes was the first step in understanding fundamental biology and human disease. Unlike sending a man to the moon, perhaps an occasional event, they argued that visiting the human genome would be something that every student and every doctor would use every day in the next century and the century after that. 

The initial draft of the human genome was available in 2000, with the final polished blueprint published in 2003. What did we learn? A lot. First, nearly half of the genome is composed of repetitive elements and the fossil-like remnants of evolutionarily ancient viruses. Second, the genome contains only about 20,000 genes, not 50,000 or 100,000 or more as was previously thought. That shocked a lot of people, since flies and worms contain about the same number. And I heard that fifty percent of the genes in a banana are in us. A lot of scientists stood in disbelief at that fact. We are such complex and magnificent creatures and fruit flies...well, they're fruit flies. Our DNA sequence also tells us that we are more closely related to worms and to yeast than most of us would ever have imagined. But the real truth is we don’t really know exactly how many genes there are. Third, only ~2.5% of the genome encodes sequences that make proteins, the building blocks of life. Proteins are fundamental molecules like keratin (fingernails) and collagen (skin and other connective tissues). The remaining 97% or so of your genome, whose functions are largely unknown, is what we call "Genomic Dark Matter".

What is Genomic Dark Matter? Well, if we knew what it was, we wouldn’t call it ‘Dark Matter’. It’s the stuff ‘between the genes’. It’s the stuff that scientists used to call ‘junk DNA’. Maybe a lot of it, or even most of it, is junk; we just don’t know for sure. But we do know that among these 3 billion or so base pairs of Dark Matter lies a vast treasure.

We have discovered hidden genes that lie between genes and among genes. Indeed, one of the more remarkable findings to emerge after sequencing the genome is the discovery of some of the world’s tiniest genes. These genes encode tiny RNAs called ‘microRNAs’, each only ~20 bases long. There are hundreds of these tiny genes hidden across all of our genome, on every chromosome. And these genes are some of the most powerful genes, to be sure.

And what do microRNAs do? They act like tiny switches. Each microRNA has the ability to turn off the expression of tens, hundreds, perhaps thousands of other genes. Some have said that nearly the entire genome is controlled by these RNAs, these tiny switches. If life is a rock concert, these tiny RNAs would be akin to the soundboard that controls the music, with each microRNA acting like a tiny volume dial for a particular instrument.

These tiny genes help tell each gene in your genome which cell type to be expressed in. It would do no good if all of your genes were expressed all at once; it would be a veritable cacophony. Imagine going to a concert and hearing 20,000 instruments all playing at once. You need the heart genes to be expressed in the heart, brain genes to be expressed in the brain, and so on. MicroRNAs help each cell to assume its unique identity. This is obviously a pretty important job for such a tiny RNA, and microRNAs are conserved across multicellular organisms, through hundreds of millions of years of evolution.

And microRNAs aren’t the only hidden genes in the dark matter. In fact there are many more, and more are being discovered every year. In general, the concept of how each gene works is the same: genes interact with other genes to do a particular job. No gene really does all the work alone; it needs friends. Some genes may be connected to only one or two genes, while others are connected to thousands. It’s like Twitter or Facebook. Some of us have only a few friends, others a roomful.

In this way, complex pathways act like circuits in a radio, each doing its little part to make us human. And genes help each other too. In fact, you have two copies of almost every gene, sometimes more. If one of your copies is lost or mutated, you have another healthy one to take its place. Many recent studies that have sequenced tens of thousands of humans have found that each of us is walking around with lots of gene mutations and lots of complete gene knockouts. And most of us are pretty much OK, save for a bad sense of humor or a penchant for alternative facts and fake news. But that’s biology, and biology is robust against breakdown. And it’s actually important that each of us is different. We’re not machines with broken parts; each of us is part of a process of evolution. Some mutations make us shorter or taller, slower or faster, hairier or less hairy. And that’s part of the grander scheme. Each of us is part of the grand tinkering machine, helping the human race to adapt and evolve.

And here is the cool part: most of these mutations lie in the dark matter of the genome.

You see, the sequence of your DNA is really your parts list. Like a blueprint, really. Take a car: a single car can have something like 30,000 parts, counting every part down to the smallest screws. If I gave you a list of all the parts, I’ll bet most of you wouldn’t know how to put the car together. Sure, most of you would have some general ideas, like what a steering wheel and tires do, but there are thousands of little parts that for all intents and purposes are just plain unknown to most of us. That’s what the DNA sequence is to us scientists. We know a bit about some of the obvious parts, but it’s the mysterious parts that I think are the most exciting: those parts lurking about in the dark matter of the genome. The parts that make each of us an individual. The parts that make us human. The parts that make us a family.