The DNA60IFX genome sequencing contest draws 1000 competitors worldwide

by Michael Schatz**

This past April 25th was the 60th anniversary of the publication of Watson and Crick’s famous paper describing the double helix structure of DNA. Many events were held around the world to celebrate the historic day, including a contest that James Taylor of Emory University and I organized, sponsored by the journal Genome Biology.

DNA60 IFX color puzzle

The contest consisted of a series of 5 DNA-related analysis problems released over 4 days, organized so that the solution to one stage was needed to unlock the clues to the next.

The first and easiest problem was to identify the most overrepresented sequence motif from a set of 1000 DNA sequences. This type of analysis is commonly used to identify transcription factor binding sites and other regions where proteins bind to DNA. Unlike English or other natural languages, the “language” of DNA has no spacing or punctuation, so participants broke the sequences down into artificial short words called k-mers and looked to see which k-mers occurred most often.
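For readers who want to try the idea, here is a minimal sketch of k-mer counting in Python. It is not the contest's reference solution: the function names, the toy sequences, and the choice of k = 6 are all assumptions made up for illustration, and it ranks k-mers by raw count only.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every length-k substring (k-mer) across all sequences.

    Hypothetical helper for illustration; overlapping k-mers are counted.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence, one base at a time.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def most_common_kmer(sequences, k):
    """Return (kmer, count) for the most frequent k-mer, or None if there are none."""
    counts = count_kmers(sequences, k)
    return counts.most_common(1)[0] if counts else None


# Toy input invented for this example; it is not the contest data.
toy_sequences = ["ACGTTGACA", "GGTTGACATC", "TTGACAGGA"]
print(most_common_kmer(toy_sequences, 6))  # ('TTGACA', 3)
```

Raw counts are enough for a toy case like this one. A fuller analysis of "overrepresentation" would typically compare each k-mer's count against what is expected from the background base composition, and might also count reverse complements; both are left out here to keep the sketch short.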

The final and most complex problem required participants to locate and decode a secret message that we had embedded in an unnamed bacterial genome. The secret message was a quote by Professor Ray Gosling on the profound significance of Watson and Crick’s discovery, calling it “amazing.” Gosling’s work 60 years ago photographing the X-ray diffraction patterns of crystallized DNA had been instrumental for deducing the double helix structure.

The first person to decode the secret message won an iPad, and the second- and third-place participants won their choice of a free subscription to the journal or free registration to the conference “Beyond the Genome” to be held in the fall.

By all accounts, the contest was an astounding success. Nearly one thousand participants from all over the world completed the first stage of the contest, hundreds completed the first 4 stages, and dozens completed the last before we announced the winners. Taking first place was Sven-Eric Schelhorn of the Max-Planck-Institut für Informatik, Germany, who completed the contest in just 19 minutes! Just seconds behind was undergraduate student Kevin Wang at the University of Chicago, while Gustavo Lacerda at the Campinas State University, Brazil finished third, arriving at the solution in a mere 24 minutes.

Our goals for holding the contest were twofold: to celebrate the historic day, and to encourage students and postdocs to learn a few new techniques. All 5 of the stages required analyzing nucleic acid sequences to find the patterns and messages hidden within them, and all 5 required applying different computer algorithms to figure out the solutions.

In this sense, the contest reflected a broader trend in biology to use more and more quantitative techniques, so much so that today’s biology students need deep mathematical and computer skills to interpret their projects and advance the state of the art.

Even the very organization of the contest reflected this trend: the contest was announced, executed, and awarded entirely online. The organizers and judges never met a single participant face to face. Instead, almost all of the contest discussion was held over Twitter (using hashtag #DNA60IFX), allowing rapid questions and answers and even a little boasting on the side by the winners. Whatever sense of head-to-head competition was lost in this forum was far outweighed by the increased participation that we could achieve.

Given the extraordinary success of the event, we are planning similar contests for the future, and are considering making this an annual event. The next will be held on Oct 1-3, 2013 to coincide with this year’s “Beyond the Genome” conference in San Francisco. Then we’ll hold another on Oct 30 – Nov 2 to coordinate with CSHL’s “Genome Informatics” meeting.

Look for me on Twitter at @mike_schatz to learn more as we announce the contests.

The full event record for the contest was published online today at: http://genomebiology.com

**Michael Schatz is an Assistant Professor in the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory.
