Genetics researchers around the globe have access to a comprehensive public record of sequenced DNA, thanks to an international effort to share massive amounts of information between databases in Japan, the United States, and Europe.
The International Nucleotide Sequence Database Collaboration (INSDC) relies on high-bandwidth network connections to copy genetic data between three partners: the DNA Data Bank of Japan (DDBJ), operated by the National Institute of Genetics (NIG); GenBank in the United States, provided by the National Center for Biotechnology Information (NCBI); and the European Nucleotide Archive (ENA), operated by the European Bioinformatics Institute (EMBL-EBI) in the United Kingdom.
The combined database is accessible to genetics researchers around the world, who use it across many disciplines, including biology, medicine and agricultural sciences. The ability to access and share sequencing information helps them make life- and world-changing findings: from identifying disease-causing genetic variants, to producing disease-resistant plants and animals, to using the data as forensic evidence in criminal cases.
Within the INSDC, each submitted DNA sequence is assigned a unique ID, which must be cited in any academic publication that uses it. Sequences are organised according to where the sample was taken and which part of the organism it came from. This allows researchers to easily access, re-use and re-test the data, improving the accuracy and efficiency of their research.
The National Institute of Genetics is an inter-university research institute that supports advanced genetics research and education across Japan. In addition to the DNA Data Bank, it offers supercomputing services for analysing genomic data produced by next-generation sequencers. The NIG supercomputer is a key part of the infrastructure supporting Japan's DNA research. It has almost 2,500 registered researchers, who can access the combined International Nucleotide Sequence Database and upload their own sequencing information.
Osamu Ogasawara, Project Associate Professor at the DDBJ Center, explains how the international database came to fruition.
“Soon after the Sanger method of DNA sequencing was developed [in 1977; it made sequencing much less labour-intensive than before], it was expected that a considerable amount of sequence data would be generated in the future. In response, momentum grew to create an international database, which included our DNA Data Bank.”
Since the introduction of the Sanger method, dramatic and ongoing improvements in the performance of DNA sequencers have resulted in substantial increases in the amount of data that the Data Bank stores, shares and analyses.
Yasukazu Nakamura, a Professor at the Genome Informatics Laboratory, explains how the National Institute of Genetics keeps pace.
“The performance of the newest sequencers, known as next-generation sequencers, is improving very rapidly. Sequence analysis speed increased 1,000-fold between 2005 and 2014, and the cost of sequencing an average human genome fell precipitously, from $10 million to $100, only several years after next-generation sequencers entered the market in 2005. As a result, the volume of data that the data banks handle has increased explosively.”
“We rely on ever more powerful network connections to copy increasing volumes of sequencing information to the other data banks that form the International Nucleotide Sequence Database Collaboration,” he says.
Japan's Data Bank is connected to SINET, the country's high-speed research and academic network. Through SINET's international links, the Data Bank has dedicated, high-bandwidth research connectivity to Europe and the United States, and onward to the European and American databases.
“There is greater demand for access to the Data Bank from outside our organisation than from inside, which means that the network that connects the Data Bank to the rest of the world needs to demonstrate a very high level of performance and reliability,” explains Professor Nakamura.
“Thanks to SINET, we provide a reliable connection that meets the needs of researchers, with average traffic flows of 3–4 gigabits per second.”
“As previously mentioned, there is no sign of a slow-down in the amount of data that is generated or in performance improvements in sequencers and computers. In future, we will need to make sure we are making the most of our network bandwidth and we have great expectations for the development of our SINET connection.”