Sunday, December 26, 2010

The Strawberry Genome: The Story Behind the Story

Today we have witnessed something that many of us thought we'd never see- the completed publication of the strawberry genome.  The story appeared today in Nature Genetics.

But what is the story behind the story?  As someone who was there from the beginning, I think it is helpful to recap the highlights and lowlights that did not reach the journal article. It adds much more texture to the news release and gives a much better understanding of the process of getting from crazy idea to final publication.

When next-generation sequencing came into vogue, there was immediate buzz about sequencing strawberry.  It was late 2005.  Arabidopsis and rice were fully sequenced, others were in progress and other plants were in line for genome sequencing.  At the time we solicited various government agencies for the funds to use the new 454 sequencer at the University of Florida.  We were one of the first places with the platform, so we dreamed of using it in a revolutionary way.  The tiny strawberry genome was an obvious target.  We asked for around $100,000 to start the process, soliciting mostly through earmarks and initiatives that UF put forward to the State of Florida.  Heck, it costs $200K to improve an intersection, so half of that to get an accounting of the nuts and bolts of an economically important crop plant should be of some priority.

Funds never materialized.  However, the opportunity to apply for funds from DOE-JGI came to the table.  The funding solicitation was broken down to two levels, genomes over and under 200 Mb.  Since strawberry was estimated to be just around 200 Mb, it seemed to be a no-brainer for the under 200 Mb solicitation.  Dr. Tom Davis from University of New Hampshire submitted the letter of intent for the January 13, 2006 deadline, days before the Plant and Animal Genome (PAG) meeting in San Diego.

At the Rosaceae Executive Committee annual meeting on Jan. 15 Tom announced that he had submitted the letter. The news was not well received.  The broader Rosaceae community had discussed sequencing a plant species, but interest swirled around peach and apple, mostly peach.  Those organisms, while possessing larger genomes, had good physical maps and substantial Sanger sequencing support. After a rather pointed discussion, Tom was convinced to withdraw his strawberry sequencing proposal, deferring to the eventual DOE-JGI support of peach.

It was a blow to those of us who hoped that a strawberry/peach combo platter would be possible, but the broader feeling was that if we asked for both, we'd get neither.  Who knows?  I'm from the camp that if you have a compelling scientific argument, you have to ask if you are going to get.  There's also some merit in thinking big- and an opportunity to get two for the price of one seemed legit to me, and certainly to Tom. Oh well.

Time went on.  It was clear that there was going to be little/no support for strawberry sequencing.

At PAG in San Diego, 2008, Vladimir Shulaev and Richard Veilleux from Virginia Tech attended the Rosaceae Executive Committee meeting.   Vladimir announced that Virginia Tech had thrown support behind the idea of genome sequencing- both some financial support as well as technical and facility support.  This was the seed that was needed.  The discussion had a pure "pass-the-hat" flavor to it.  Nobody had big funds, but Vlad and Richard had a big idea.  That proved to eliminate the first major barrier to a complete sequence.

After that meeting Vladimir, Janet Slovin, Tom Davis, Todd Michael and I met in a corner of the conference hall and talked about where to get the rest of the funds needed.  We wrote a quick grant proposal to the North American Strawberry Growers Association, asking for $8000.  We brainstormed on other options.

Months later, the NASGA grant was declined, but other funds were coming in.  Most of all, Roche 454 was giving an excellent break on reagents. It was important for them that we succeeded.

Many other pools of funds materialized, including money from IASMA (Italy), the USDA (J. Slovin), Virginia Tech, Driscoll's Strawberry Associates, Plant and Food Research (New Zealand), and the Dean for Research at UF.  Our strawberry breeding program pitched in as well.  Of course, many labs donated time and expertise.  The value of this contribution cannot be overstated, as literally thousands of human hours were committed to this project with no guarantee of reward.

Sequencing proceeded almost exclusively at Virginia Tech, with some paired ends being done at Roche 454. The runs were being performed as funds would come in and substantial coverage was brewing.  Weekly conference calls would tell us of increasing coverage.

There were skeptics.  Many in genomics, including some friends, predicted failure.  They told us that there was no way that a draft sequence could be obtained without a physical map, and especially with a purely short-read based approach.  Time would prove them incorrect.

Soon after, new people joined the consortium, including many experts in genome annotation.  Mark Borodovsky, Paul Burns, Todd Mockler, Keithanne Mockaitis and others all came aboard, sequencing mRNA libraries that my lab put together from various tissues, then annotating the genome accordingly.

Assembly was facilitated by Steven Salzberg's lab at U Maryland.  Art Delcher really moved this project forward, as Newbler itself was not providing ample collapse of contigs.  Finally, advanced assemblies were reported during our conference calls.  Scaffolds were getting larger.

The genome browser came online with expert input from Ross Crowhurst at Plant and Food Research.  PFR had a huge role, with experts like Andy Allan and Roger Hellens contributing.  The final version is at

The scaffolds were anchored to the genetic linkage map by Dan Sargent at East Malling Research in the UK. This exercise was a massive undertaking and involved input from Jasper Rees' lab.  This was really important because it provided organization of the scaffolds that placed them into pseudochromosomes.

Functional annotation was performed with the expert care of Pankaj Jaiswal and his lab.  Aaron Liston assembled a chloroplast and ran some excellent phylogenetic analyses with Allan Dickerman.  These data suggest a rethinking of how poplar clusters with other taxa.

The last year was a grind.  Every 4pm we'd have a conference call with fewer and fewer members.  Even pivotal people, there from the beginning, were losing interest or were consumed with other priorities.  Things were uninteresting and heading towards collapse.  All of us were burned out and bored.

I tried to stir some momentum and interest with daily email updates, but even this futile effort ended in about a week.  It was sad, because we had all of the data in one place.  No more sequencing was being done, annotation and predictions were complete and the genome browser was up and running for those with credentials.

Talk about frustration.  We had the genome sequence in a pile, but the report was a disorganized mess of fragmented ideas looking for codification.

Luckily some enthusiasm was found at PAG 2010.  Vladimir presented the genome work at a major symposium.  It looked awfully sweet on the big screen and many of us felt a sense of prime time.  It was energizing.

Todd Mockler, Todd Michael and Tom Davis met.  Later that night we had a strategy-n-pizza meeting with all of those present in the consortium.  Many of us met in person for the first time.  Our decision was that we'd assemble a "writing team", a small group to put the massive outline and verbose rantings of thirty eggheads into a publishable format.  Tom, Tia-Lynn, Richard, Aaron, Todd, Andy, Mark, Herman, Lee, Dan, and I attacked this charge. We worked fast and with purpose, submitting a pre-submission inquiry to Nature in February of 2010.

Our solicitation was declined about two weeks later.  We reformatted for Nature Genetics. The slow re-submission was due to additional data coming in, and we finally submitted in June.  Reviews were back in August. High quality, appropriately critical reviews led us to reshape the work into a stronger version and then resubmit in September, 2010.

Final acceptance didn't come until late November.  Most people, even those close to the project, don't know that we were down to the wire for addressing editorial concerns toward publication.  I have no fingernails left.  Even this last week there were serious concerns about whether it would be published because some of our data were not accessible online.

The work got its needed final touches from an outstanding team of re-writers and proofers.  Some of those in the consortium somehow mustered up another awesome critical read. We had no warning- and a 24 hour deadline. A good paper got a final makeover.  Richard, Keithanne, Tom D., Janet, Aaron, Todd Mockler, Lee, Herman and others gave volumes of excellent suggestions. It was a relief to reach the point where every error was found by at least two independent parties, suggesting that all was in order.  The "final" galleys were returned to Nature Genetics and were a turbulent sea of yellow-highlighted adjustments.

I'll also pat myself on the back for at least a few sleepless nights near the end where my desire to see this complete gave me intense focus and drive. I wanted it done, flawlessly.

While it is beautiful to see it in print, it is more a testament to the thousands of person-hours on conference calls, Vlad and Richard's vision and persuasion in the beginning, a mega-talented team, generous funding from non-traditional sources and an expert, supportive editorial job at Nature Genetics, among many things of course.

Most of all I gained a new respect for people I already admired.  It was a joy to work on a common project, but also to endure the ups and downs together. And I apologize to those who were not acknowledged here. Add a note in the comments if you would please.

It was quite a journey, and a journey only to the beginning.  Now the real fun begins...