Ten Years Ago: A Strawberry Genome
One of the joys of publishing a scientific manuscript is the letter from the journal saying the paper has finally been accepted. Peer review and high journal standards form a slow, deliberate maze that stands between you and sharing your prized work.
Among the hundred-plus publications I’ve authored, there is one monumental paper where the research, writing and review processes became a delicate managerial dance of negotiation, combat, finesse, psychology, and arm twisting. This week we celebrate its 10th birthday, with two sturdy gin and tonics for every piece of birthday cake.
The publication of the woodland strawberry genome in February of 2011 was the culmination of efforts from at least 77 scientists. It was a battle from the beginning, and a story that few people know and the rest have tried to forget. Somehow I became the manager of the project, so the successes and frustrations are still a little fresh even after a decade.
The genome we sequenced was not that of the big red commercial strawberry. It was a relative, a tiny yellow-fruited cousin with a similar genetic makeup. It was a great choice to sequence. In 2007, at the Plant-Animal Genome Meeting in San Diego, CA, only a few key species had been sequenced: things like rice and the model plant Arabidopsis thaliana.
Strawberry was a good choice to add to that rarefied group. It was the red-fruited weirdo of the rose family, a group of plants containing apples, pears, peaches, blackberries and, well, roses. We knew the woodland strawberry’s simple genome was tiny and likely didn’t contain much repetitive DNA, which still confounds genome assembly efforts.
But as usual, politics wrecks everything. While obtaining strawberry sequence had many merits, there were vocal supporters of sequencing peaches and apples, tree crops with larger genomes that lacked the laboratory value of the diminutive, readily transformable diploid strawberry. Other crops obtained funding and support from federal agencies and international bodies. We had a dumb little plant.
Six strawberry scientists huddled in the best privacy we could find at a conference, sitting on folding chairs behind a faux-wall room divider in the lobby. How would we do it? How would we pay for it? The best we could do was pass the hat, get the ball rolling, and see if we could recruit additional experts to make it happen.
The effort took off like cold molasses. A few bucks here and there, some support from institutions like Virginia Tech and the University of Florida. National strawberry organizations wanted nothing to do with it, despite a genome’s immense value to breeding. Nor did the companies that would one day mine the data for every last nugget of value. It was frustrating. The deepest pockets, for whom this effort would have been a drop in the bucket, saw no value. Eventually they would contribute.
The beginning-beginning was gorgeous. I purified genomic DNA using an old-school technique, a cesium chloride gradient. The snotty threads of life were as white as unviolated snow, and those few micrograms of perfect starting material would seed the effort.
To make a long story less long, that virgin DNA blob would be squeezed, interrogated and processed for information, with data trickling in a little at a time and being assembled into longer threads as well as the tools of the day allowed. Eventually Roche/454 would join the effort, providing significant sequence at low cost, simply to prove they could do more than bacterial genomes. Additional experts joined the party, each lending their skills to unraveling part of the mystery. As the little stretches of information piled up, it became obvious that we were a few obligatory Venn diagrams away from submitting a draft genome sequence for publication.
The work described in that paragraph spanned 2008 and 2009, with biweekly phone calls that grew less and less enthusiastic with time. I can only thank my lucky stars that Zoom calls were still lost somewhere in the future.
As time went on, the calls grew shorter and had fewer participants. Other genomes were being sequenced with funding support, by teams of scientists whose full-time job was working on a genome. The diploid strawberry effort had no central funding source, so everything was done on donated time and materials.
It was really the work of Dr. Daniel J. Sargent that pushed this effort over the top. He undertook a massive campaign to understand the spatial relationships between DNA ‘markers’, little signatures present on the different stretches of DNA that were sequenced. That information allowed the pieces to be put together in the right order and orientation. That was the key, as Dan’s data allowed the piñata to be built so that other scientists could beat it and pick up some candy.
Other prominent figures on the author team vanished. No contact, no participation. Gone. Others played major roles and, I felt, were not appropriately credited. Authorship order can be a delicate issue. Dr. Daniel J. Sargent should have been first author, as his efforts and ingenuity provided the data to elevate a skeletal work to near-publication form.
The original manuscript was written by a team, and it read like a string of personal spins, each author emphasizing the data they felt was most important. The manuscript was probably 400% too long, and the few still standing on the author team were divided on where to send it. While I wanted it published anywhere, and fast, others demanded it be shopped to one of the prominent weekly science journals.
We sent it to Science, we sent it to Nature. Reject, reject. Another few months burned away in revision and resubmission. At the time there were probably six or seven genomes published, including apple, so strawberry was looking like the really cool guy who got to the party right when everyone else was leaving.
Rejection, burnout, and being sick of a project that was becoming less and less significant scientifically led most of the team to disconnect. The biweekly conference calls consisted of me and maybe one other person talking about a chili recipe, if they were not cancelled altogether.
It needed one last push.
I started with an almost blank sheet and smashed the author team’s
clunky manuscript into the tight template for Nature Genetics. It was the
middle of 2010, three years after a tiny team of strawberry scientists
decided to start the ball rolling.
The next months were a cycle of review and revise, review and revise. Tweak, crunch, edit, chop. I remember nights thinking that I should punt this project too, as so many others clearly had. But there was maybe a light at the end of the tunnel, and after round upon round of revision we were close.
I remember fielding at least a dozen calls from the Associate Editor, as she kept finding problems and relaying requests from reviewers and other editors. I dreaded the conversations, fearing that each request for more data, reformatting, or additional experiments would sink the project.
Somehow I navigated that maze with some skillful persuasion and dumb luck. The work would eventually find acceptance at Nature Genetics, a decent journal where it fit quite nicely. The Editor relayed the good news that the work would be published in February of 2011. It was November of 2010, so it seemed a million years away.
There were a few things that made this accomplishment
unique, aspects that were largely unappreciated.
It was published in the same issue as the cacao genome; they were the 12th and 13th plant genomes sequenced. Here in 2021 there are literally tens of thousands of plant genome sequences known. What took $350,000 and three years then can now nearly be done in a few days for a few thousand dollars.
It was assembled without a physical map. Knowing where genes or DNA sequences are located relative to one another helps put the little smudges of DNA sequence data in the right order and orientation. The strawberry genome lacked this guiding luxury, which other crops enjoyed, and Dan Sargent’s efforts made it possible to assemble it strictly from short reads. The panda genome was also assembled from short reads, with quite a bit more fanfare. Pandas are cute.
There were no relevant
reference maps. Today genome
assembly is less taxing because of the wealth of information that already
exists. It is easy to draw a map of the
USA because everyone from settlers to satellites has already defined where the
parts belong. The strawberry genome sequence was a pioneer.
It was done without a centralized funding source. The work ran on a shoestring, digging in science’s couch cushions to capture enough scratch to push out more data.
Overall, it was a great experience to work with experts and learn a lot about how the tools of genome sequence assembly and analysis work. Soon after, I moved into university administration and forgot everything I knew.
But I didn’t forget that phone call, the news that the work
was finally accepted.
And I won’t forget the efforts of the scientists who really made it possible, as a few key players carried the vast majority of the weight. You know who you are. Pat yourself on the back and celebrate, as your efforts allowed this seminal discovery to be translated to the commercial crop and eventually to influence genetic improvement efforts. And that was our ultimate mission all along.