Next-Gen Sequencing: Lost in Translation?
“Genomics has certainly been overhyped – and so far failed to deliver on its promises. Many intelligent people have relegated the idea to the dusty corner shared by hopes for cold fusion, world peace and World Series rings for the Chicago Cubs,” writes the author of a recent Forbes article, “Gene Machine.” But while some people pooh-pooh the potential payoffs of genomic sequencing, others are working hard to help make them happen – but in so doing, have identified some formidable obstacles.
Data, Data Everywhere
The first draft sequences of the human genome were published a decade ago at a cost of about $3 billion. With the advent of next-generation (“next-gen”) DNA sequencing platforms over the last five years, costs have plummeted; today, an entire human genome can be sequenced in a matter of days or weeks for about $10,000, according to Forbes, while others suggest the cost could be as low as $4,000. New systems – the third generation – promise to drive costs down further, rapidly approaching the “hundred dollar genome” that the U.S. Department of Defense says “is nearly upon us.”
But low sequencing costs alone won’t result in the practical knowledge scientists in various disciplines are seeking. “The cost of sequencing has come down so tremendously fast that people are scaling up how much they sequence – but they’re not investing as much as they should in the computational power to take that information and do something with it,” says SLAS Laboratory Automation Section Executive Council member Chris Detter, group leader of the Genomics Group at Los Alamos National Laboratory (LANL) and center director of the LANL partner portion of the U.S. Department of Energy Joint Genome Institute. “It used to be very expensive to generate the data; now it’s very expensive to handle the data.” Indeed, an article that ran in Genome Medicine late last year is titled, “The $1,000 genome, the $100,000 analysis?”
“The caboose is ahead of the engine right now,” Detter emphasizes. “People are focusing on the thing that's cheapest and easiest to do, the low-hanging fruit, which is generating data. So now everybody’s just getting inundated.”
But not all sequencing data are created equal, according to Detter, who chairs the organizing committee of LANL’s Sequencing, Finishing, and Analysis in the Future meeting held annually in Santa Fe, NM. In an article published in Science a couple of years ago, Detter and colleagues underscored the need for quality.
“Having benchmarks and standards is very important. Every mom and pop shop now is generating data, but the data they generate may not be the same quality as that of larger institutional facilities, and that could affect how they ultimately get interpreted. Consistent standards will be key as we move forward from data generation to data analysis.”
Who and What Are Doing the Decoding?
There are also differences among the major sequencing platforms, including those from Illumina, Roche and Life Technologies. Each has a different error profile, and some are better for resequencing, while others are better for de novo sequencing, Detter says.
“Resequencing means the sequence for a particular genome is already out there – a strain of E. coli, for example – and you're sequencing another strain that's a very close relative to that reference genome. De novo sequencing means you’re sequencing something from scratch, without a reference genome. The type of sequencing you’re doing, as well as the platform, affect results.”
Teams doing resequencing projects against a reference genome generally have more flexibility in the type of sequencing platform they can use because the reference genome serves as a kind of “crutch” against which they can assemble data, Detter explains. While differences between the resequenced genome and the reference genome with respect to single-nucleotide polymorphisms, insertions, deletions and other variations “are fairly easy to grasp, regardless of the platform you’re using,” techniques such as long-range paired-end sequencing, which involves sequencing the ends of defined long segments, are more difficult to do on certain platforms versus others.
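The resequencing idea Detter describes can be sketched in a few lines: a sample sequence is compared against a reference genome and the single-nucleotide differences are reported. The sequences below are hypothetical toy data, and the comparison assumes the sample is already aligned to the reference; real pipelines use dedicated aligners to handle insertions, deletions, sequencing errors and coverage.

```python
# Toy illustration of resequencing against a reference genome:
# walk the two sequences in lockstep and report single-nucleotide
# differences. Sequences here are made-up examples, not real genomes.

def find_snps(reference: str, sample: str):
    """Return (position, reference base, sample base) for each mismatch.

    Assumes the sample is already aligned to the reference and the same
    length -- a deliberate simplification; real variant-calling pipelines
    use aligners and handle indels, errors and read depth.
    """
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]

reference = "ATGCGTACGTTAGC"
sample    = "ATGCGTACCTTAGC"  # one substitution relative to the reference

print(find_snps(reference, sample))  # [(8, 'G', 'C')]
```

De novo assembly, by contrast, has no such reference to walk against, which is why it demands longer reads, paired-end information and far more computation.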
“For the most part, genome assemblers don’t do the job very well,” Detter continues. “You have to understand the type of sequence you’re feeding into the assemblers and also what you get out of it. Smaller shops may not have the resources to generate complex assemblies from multiple platforms for de novo or metagenomic sequencing. Collaborations and teamwork will be key to generating high-quality, high-confidence data.”
Adding to the complexity is metadata – the so-called “background” information about a particular genome. Metadata includes where a particular isolate is found, the depth it’s found in, the temperature it grows at, the media it grows on, the protocols used to isolate the DNA – “basically, a thousand different things,” Detter says. “It’s extremely important to capture that metadata in a database and relate it to the data that's generated; otherwise, you're trying to draw conclusions on the sequence data without understanding the context.”
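The metadata capture Detter calls for amounts to storing a structured record alongside each sequencing run. The sketch below shows one way to do that; the field names and values are illustrative assumptions, not a published schema (community efforts such as the Genomic Standards Consortium's MIxS checklists define real metadata standards).

```python
# Sketch of capturing sequencing metadata alongside sequence data so
# downstream analysis retains its context. Field names and values are
# hypothetical examples, not a real standard.

from dataclasses import dataclass, asdict

@dataclass
class SampleMetadata:
    isolate_location: str         # where the isolate was collected
    depth_m: float                # collection depth, in meters
    growth_temp_c: float          # growth temperature, in Celsius
    growth_medium: str            # culture medium
    dna_extraction_protocol: str  # how the DNA was isolated

record = SampleMetadata(
    isolate_location="hydrothermal vent, Pacific",
    depth_m=2100.0,
    growth_temp_c=80.0,
    growth_medium="marine broth",
    dna_extraction_protocol="phenol-chloroform",
)

# Key the record to its run ID, as a stand-in for a real database table.
database = {"run_0001": asdict(record)}
print(database["run_0001"]["growth_temp_c"])  # 80.0
```

The point of the structure is exactly the one Detter makes: two labs sequencing "the same" organism can compare these records and discover their samples were grown and processed very differently.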
For example, if a lab in California generates a sequence on a particular genome and a lab in Germany sequences the same genome, each draws its own conclusions based on its own analysis. “If you compare the samples, you find out they're really very different. And while it’s often challenging to go back and get that ‘background’ information, it really provides important information.”
Ramping Up the Lab
The skill set of lab technicians running the sequencing platform also must be factored in, according to Stacey Gabriel, head of Cancer and Medical Sequencing at the Broad Institute. “Until about five years ago, there was basically one platform – capillary electrophoresis (CE) – for doing DNA sequencing. Things changed dramatically with the advent of the new sequencers that generate millions or tens of millions or now billions of reads per run. That affects what people do and how they do it.”
Tasks such as materials management and materials quality control have remained the same, in that the people who used to load the CE sequencers continue to load samples on the new sequencers, Gabriel says. “But now, with thousands and thousands-fold more data out there, the FTE (full-time equivalent, a per-person unit of work) has increased tremendously.”
The new platforms are more sophisticated than CE, and require a better-trained work force, even for the routine tasks of setting up and running the machines, Gabriel continues. “Individuals have to have a higher level of knowledge because on most of these machines, the runs take a number of days, and thousands of dollars are spent on reagents before we even see the data. So mistakes are a lot more costly.”
At the Broad, “we also have to be open to the fact that we have multiple vendors, platforms and applications. Although one company might be the big platform today, it might not be tomorrow. So we have a big culture of cross training – learning the different technologies out there so we’re not focused on just one.”
The Broad’s director of operations instituted a “sabbatical” system with laboratory staff, Gabriel explains. “People spend several months doing a routine operation such as loading a machine or setting up a cluster station. Then they cycle off in a systematic way into doing a rotation in a technology development experiment, for example. So they're learning some new skills, they're learning some new molecular biology, and then they go back to their original production environment.”
Companies also need to bring in many more computational biologists to make sense of all the data, says Detter – but that’s a daunting task. “The United States simply is not turning out many computational biologists. While we can hire foreign nationals at my lab, clearance paperwork often results in significant delays.” That means the market is wide open for people who want to enter the field of bioinformatics, he emphasizes.
“It's often easier for a computational scientist to learn biology than it is for a biologist to learn computational science,” Detter acknowledges. “When I talk with students, I tell them, ‘If you like computers and computer games, and if you like biology, you need to become a bioinformaticist. You can write your own ticket, meaning you’ll have your choice of a well-paying job anywhere across the U.S.’
“Computational biology may not be as sexy as sequencing, because with sequencing, you get a data set immediately,” he continues. “But computational biology is very specialized, and it’s where genomics is going. Sequencing is low-hanging fruit, and the low-hanging fruit are being picked right now. We need to start going up the ladder to the higher hanging fruit to start understanding the data that we're generating.”
There’s a “big push” to use next-gen sequencing for identifying and characterizing unknown pathogens, and for plant and microbial genomics to clean up environmental toxins and create new energy sources, according to Detter. (See JALA Online in late November for the original research report, “Automated Digital Microfluidic Sample Preparation for Next Generation DNA Sequencing,” by 2011 SLAS Innovation Award winner Kamlesh Patel, who is also a track and session chair for SLAS2012.)
But arguably the biggest hope is that the information obtained ultimately can be used to home in on candidate drugs and inform therapy. That’s the aim of translational or convergence medicine, and specifically, “personalized” medicine. But here, too, the bottleneck is interpretation.
“It’s not about the technology or the data,” says SLAS member Angelika Niemz, Arnold and Mabel Beckman professor and director of research at Keck Graduate Institute. “The fundamental challenge is correlating genotype with phenotype – taking the information and deriving meaningful conclusions with the appropriate positive predictive value. And that’s true whether the data are generated from genotyping or genome sequencing.”
In the personalized medicine arena, two markets are building for sequencing, Niemz says. One, whole genome sequencing, involves looking for mutations, inherited at birth, which are related to disease risk – for example, people with certain APOE4 variants may be at greater risk for developing heart disease or Alzheimer’s disease. The other market involves identifying somatic mutations, acquired during cell division in certain tissues, and often a causal factor in cancer – which may inform whether a particular tumor is likely to respond to a specific drug therapy.
Decision-making is complicated in both cases. Most diseases are complex; only rarely – in Huntington’s disease or sickle cell anemia, for instance – does a single germline mutation mean a person will get a particular disease. In most cases, researchers are identifying “susceptibility” genes, where environmental factors also enter into the picture, as do a host of genetic factors such as whether a gene is turned on or off, interacts with other genes or the environment, etc.
The significance of somatic mutations is also “very complicated,” Niemz emphasizes. An example is the drug warfarin. “Different patients require different dosing of the anticoagulant drug, and although part of this is related to a patient’s genetic profile, environmental factors also have a significant influence,” she explains. “There are a handful of mutations currently used in helping doctors adjust initial drug dosing. Some are in the enzyme that breaks down the drug, others are in the enzyme the drug actually inhibits, and everything taken together can explain 50 to 60 percent of the variability of an individual’s response. The remaining 40 to 50 percent involve general health status, other comorbidities, drug-drug interactions, allergies and things like how much broccoli you ate during the day, because the warfarin dose is also vitamin K-related.”
The New York Times recently reported on a cancer drug trial that went awry. Although therapy was purported to be based on a genetic analysis of tumor cells, “the research…turned out to be wrong. Its gene-based tests proved worthless, and the research behind them was discredited.” The patient died a few months after treatment.
“Part of the challenge is that companies are coming out with products based on genotyping or genome sequencing that only capture part of the biological complexity, and therefore may not be able to accurately predict whether you're going to respond to this drug or another. But the overall biology involved in disease progression and response to therapy is incredibly, incredibly complicated. There are epigenetics and other regulatory mechanisms, nuances in the proteome, such as post-translational modifications, and other secondary effects such as changes in the metabolome, which describes the fine balance that exists amongst different metabolic processes, and other things we may not understand yet.”
What can SLAS members expect with respect to sequencing going forward? On the research side, Niemz anticipates a continuing upsurge in biomarker discovery, which could help in understanding the sequencing data. “On the automation side, I think the technology will continue to move stunningly fast,” she says. “Second-generation sequencing is pretty much state of the art now, but the wave of third-generation sequencing technologies could rapidly make those technologies obsolete. It’s like the electronics industry; there’s always something new coming out.”
Also similar to the electronics industry, smaller, portable platforms for DNA decoding are emerging that promise to make sequencing more accessible and drive costs down even further. Jonathan Rothberg of Ion Torrent Systems in Guilford, CT, recently used his silicon-based “Personal Genome Machine” to sequence the genome of Gordon Moore, cofounder of chipmaker Intel. That feat represented the first time DNA had been decoded by a semiconductor.
Meanwhile, SLAS member Patrick Merel, founder and president of Portable Genomics and molecular core facility leader at the Biomedical Innovation Platform, University Hospital of Bordeaux, has developed a way of visualizing full genome data on mobile devices. His portable device uses an iTunes-like interface to enable consumers and their doctors to access specific information from the reams of data produced by the genomic analysis.
Merel’s team is in the process of moving to the United States because restrictive policies in France (by law, patients are not allowed to have access to their genomic data) preclude the launching of the project there. “My vision is that the molecular diagnostics industry will soon switch to next-generation sequencing diagnostics, because as soon as genome sequencing costs drop to between $100 and $500, you will get a full genome for the current cost of a single molecular diagnostic test,” Merel says. Registration is open for Merel’s short course on next-generation sequencing, to be held at SLAS2012.
Detter is optimistic that the “bottlenecks” in data analysis and interpretation can be overcome as vendors work together with scientists to hone products. As that happens, “we’ll be pushing the puck down the ice a little bit further, which will create a new bottleneck,” he says. “But as long as we stay proactive, and keep pushing it down there, I think things will naturally get better and easier. It’s happening now, but I would like it to happen a little faster.”