What is ENCODE? 

ENCODE is the Encyclopedia of DNA Elements, a project that is funded by the National Human Genome Research Institute “to identify all regions of transcription, transcription factor association, chromatin structure and histone modification in the human genome sequence.” 

As Ed Yong notes, “Over the last 10 years, an international team of 442 scientists have assailed 147 different types of cells with 24 types of experiments. Their goal: catalogue every letter (nucleotide) within the genome that does something. The results are published today in 30 papers across three different journals, and more.”

What makes this so fascinating to me? 

Two words: virtual machine.

From Ed Yong once more:

“With these really intensive science projects, there has to be a huge amount of trust that data analysts have done things correctly,” says Birney. But you don’t have to trust. At least half the ENCODE figures are interactive, and the data behind them can be downloaded. The team have also built a “Virtual Machine” – a downloadable package of the almost-raw data and all the code in the ENCODE analyses. Think of it as the most complete Methods section ever. With the virtual machine, “you can absolutely replay step by step what we did to get to the figure,” says Birney. “I think it should be the standard for the future.”

Genetics, genomics, bioinformatics, etc. are all about big data and, ideally, “transparency”. I’m not as interested in the media’s focus on the “junk” that isn’t really “junk” [though that is still debatable], since every biology professor I’ve had since I started my undergraduate education has acknowledged that “junk” was a bad way to describe it. I’m far more interested in how the work was put together and how it’s being presented.

As Daniel MacArthur notes:

Today’s announcements serve as a model for future large-scale science: a model that transcends the traditional publication approach where a paper is the endpoint, and that emphasizes reproducibility, transparency and accessibility over impact factors alone as metrics for success.

I like these things.

Additional Links

Also check out this report on the ENCODE project by the Intelligent Design crowd; it’s hilarious and sad at the same time.