Wednesday, February 24, 2021

Ten Years Ago- A Strawberry Genome

One of the joys of publishing a scientific manuscript is the correspondence from the journal that the paper has finally been accepted. Peer review and high journal standards form a slow and deliberate maze that stands between you and sharing your prized work.

There is one monumental publication among the hundred-plus I’ve authored where the research, writing and review processes became a delicate managerial dance between negotiation, combat, finesse, psychology, and arm twisting. This week we celebrate its 10-year birthday, with two sturdy gin and tonics for every piece of birthday cake.




The publication of the woodland strawberry genome in February of 2011 was the culmination of efforts from at least 77 scientists. It was a battle from the beginning, and a story that few people know and the rest tried to forget. Somehow I became the manager of the project, so the successes and frustrations are still a little fresh even after a decade.

The genome we sequenced was not that of the big red commercial strawberry. It was its relative, a tiny yellow-fruited cousin that shares a similar genetic makeup. It was a great choice to sequence. In 2007, at the Plant-Animal Genome Meeting in San Diego, CA, only a handful of key species had been sequenced, things like rice and the model plant Arabidopsis thaliana. Strawberry was a good choice to add to that rarefied group. It was the red-fruited weirdo of the rose family, a group of plants containing apples, pears, peaches, blackberries and, well, roses. We knew the woodland strawberry’s simple genome was tiny and likely didn’t contain much repetitive DNA, a problem that still confounds genome assembly efforts.

But as usual, politics wrecks everything. While there were many merits in obtaining strawberry sequence, there were vocal supporters of sequencing peaches and apples, tree crops with larger genomes that didn’t have the same lab value as the readily transformable and diminutive diploid strawberry. Other crops obtained funding and support from federal agencies and international bodies. We had a dumb little plant.

Six strawberry scientists huddled in the best privacy we could find at a conference, sitting on folding chairs behind a faux wall room divider in the lobby. How would we do it? How would we pay for it? The best we could do was pass the hat, get the ball rolling, and see if we could recruit additional experts to make it happen.

The effort took off like cold molasses. A few bucks here and there, some support from institutions like Virginia Tech and the University of Florida. National strawberry organizations wanted nothing to do with it, despite a genome’s immense value to breeding. Nor did the companies that would one day mine the data for every last nugget of value. It was frustrating. The deepest pockets, the ones that could have made this a drop-in-the-bucket effort, saw no value. Eventually they would contribute.

The beginning-beginning was gorgeous. I purified genomic DNA using an old-school technique, a cesium chloride gradient. The snotty threads of life were as white as unviolated snow, and those few micrograms of perfect starting material would seed the effort.

To make a long story less long, that virgin DNA blob would be squeezed, interrogated and processed for information, trickling in a little at a time, all being assembled into longer threads as best could be done at the time. Eventually Roche/454 would join the effort, providing significant sequence at low cost, simply to prove they could do more than bacterial genomes. Additional experts joined the party, each lending their skills to unraveling part of the mystery. Soon, as little stretches of information piled up, it became obvious that we were a few obligatory Venn diagrams away from submitting a draft genome sequence for publication.

The activities in that paragraph spanned 2008 and 2009, with bi-weekly phone calls that grew less and less enthusiastic with time. I can only thank my lucky stars that Zoom calls were still lost somewhere in the future.

As time went on the calls grew shorter and had fewer participants. Other genomes were being sequenced, had funding support, and were executed by teams of scientists whose full-time job was working on a genome. The diploid strawberry effort had no central funding source, so everything done was on donated time and materials.

It was really the efforts of Dr. Daniel J. Sargent that pushed this effort over the top. He undertook a massive campaign to understand the spatial relationships between DNA ‘markers’, little signatures that were present on the different stretches of DNA that were sequenced. That information allowed the pieces to be put together in the right order and orientation. That was the key, as Dan’s data allowed the piñata to be built so that other scientists could beat it and pick up some candy.
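For the curious, the anchoring idea can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the pipeline used for the strawberry genome: the function name `anchor_scaffolds`, the marker tuple layout (scaffold, base-pair position, linkage group, map position in centimorgans) and the example values are all made up. Each scaffold is assigned to the linkage group most of its markers vote for, ordered by its median map position, and oriented by whether map position rises or falls along the scaffold.

```python
from collections import Counter, defaultdict
from statistics import median


def anchor_scaffolds(markers):
    """Order and orient scaffolds from (scaffold, bp, linkage_group, cm) markers."""
    by_scaffold = defaultdict(list)
    for scaffold, bp, group, cm in markers:
        by_scaffold[scaffold].append((bp, group, cm))

    placed = defaultdict(list)
    for scaffold, hits in by_scaffold.items():
        # Assign the scaffold to the linkage group most of its markers agree on.
        group = Counter(g for _, g, _ in hits).most_common(1)[0][0]
        # Keep only markers from that group, sorted by position on the scaffold.
        ordered = sorted((bp, cm) for bp, g, cm in hits if g == group)
        cms = [cm for _, cm in ordered]
        # Orientation: compare map positions at the two ends of the scaffold.
        if len(cms) < 2 or cms[0] == cms[-1]:
            orientation = "?"  # a single marker cannot orient a scaffold
        elif cms[-1] > cms[0]:
            orientation = "+"
        else:
            orientation = "-"
        placed[group].append((median(cms), scaffold, orientation))

    # Within each linkage group, order scaffolds by median map position.
    return {
        group: [(scaffold, orientation) for _, scaffold, orientation in sorted(items)]
        for group, items in placed.items()
    }


# Hypothetical markers, invented purely for illustration.
example_markers = [
    ("scf_B", 1000, "LG1", 12.0),
    ("scf_B", 9000, "LG1", 10.5),
    ("scf_A", 500, "LG1", 2.0),
    ("scf_A", 7000, "LG1", 4.0),
    ("scf_C", 300, "LG2", 1.0),
]
layout = anchor_scaffolds(example_markers)
print(layout)
```

In this made-up example, `scf_A` lands ahead of `scf_B` on LG1, `scf_B` gets flipped, and `scf_C` is placed but cannot be oriented because it carries only one marker. That last case is why marker density matters so much.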

Other prominent figures on the author team vanished. No contact, no participation. Gone. Others played major roles and, I felt, were not appropriately credited. Authorship order can be a delicate issue. Dr. Daniel J. Sargent should have been first author, as his efforts and ingenuity provided the data to elevate a skeletal work to near-publication form.

The original manuscript was written by a team, and it read like a string of personal spins on the data each felt was most important. The manuscript was probably 400% too long, and the few standing as an author team were divided on where to send it. While I wanted it anywhere and done fast, others demanded it be shopped to one of the prominent weekly science journals.

We sent it to Science, we sent it to Nature. Reject, reject. Another few months burned from revision and submission. At the time there were probably six or seven genomes published, including apple, so strawberry was looking like the really cool guy that got to the party right when everyone else was leaving.

Rejection, burnout, and being sick of a project that was becoming less and less significant scientifically led most of the team to disconnect.  The bi-weekly conference calls consisted of me and maybe another person talking about a chili recipe, if they were not cancelled altogether.

It needed one last push.  I started with an almost blank sheet and smashed the author team’s clunky manuscript into the tight template for Nature Genetics. It was the middle of 2010, three years after a tiny team of strawberry scientists decided to start the ball rolling.

The next months were a cycle of review and revise, review and revise. Tweak, crunch, edit, chop. I remember those nights thinking that I should also punt this project, as so many others clearly had. But there was maybe a light at the end of the tunnel, and after round upon round of revision we were close.

I remember fielding at least a dozen calls with the Associate Editor, as she kept finding problems and generating requests from reviewers and other editors. I dreaded the conversations, as each request for more data, reformatting, or additional experiments threatened to sink the project.

Somehow I navigated that maze with skillful persuasion and dumb luck. The work would eventually find acceptance at Nature Genetics, a decent journal where it fit quite nicely. The Editor relayed the good news that the work would be published in February of 2011. It was November of 2010, so it seemed a million years away.

There were a few things that made this accomplishment unique, aspects that were largely unappreciated.

It was published in the same issue as the cacao genome; they were the 12th and 13th plant genomes sequenced. Here in 2021 there are literally tens of thousands of plant genome sequences known. What took $350,000 and three years then can now almost be accomplished in a few days for a few thousand dollars.

It was assembled without a physical map. Knowing where genes or DNA sequences are located relative to one another helps put the little smudges of DNA sequence data in the right order and orientation. The strawberry genome did not have this guiding luxury as other crops did, and Dan Sargent’s efforts made it possible to assemble strictly from short reads. Later the panda genome would also be assembled from short reads with quite a bit more fanfare. Pandas are cute.  

There were no relevant reference maps.  Today genome assembly is less taxing because of the wealth of information that already exists.  It is easy to draw a map of the USA because everyone from settlers to satellites has already defined where the parts belong. The strawberry genome sequence was a pioneer.

It was done without a centralized funding source.  The work was done on a shoestring, digging in science’s couch cushions to capture enough scratch to push out more data.

Overall, it was a great experience to work with experts and learn a lot about how the tools of genome sequence assembly and analysis work. Soon after, I would move into university administration and forget everything I knew.

But I didn’t forget that phone call, the news that the work was finally accepted.

And I won’t forget the efforts of the scientists who really made it possible, as there were a few key players that carried the vast majority of the weight. You know who you are. Pat yourself on the back and celebrate, as your efforts allowed this seminal discovery to be translated to the commercial crop, and eventually influence genetic improvement efforts. And that was our ultimate mission all along.