Bioinformatics Cloud Refines Research

Nicole Hemsoth

A number of barriers on the compute and infrastructure side have been preventing bioinformatics researchers from fully exploiting new technologies, not the least of which are concerns about the storage of, and access to, patient samples and medical histories.
Beyond privacy and compliance worries, more general roadblocks in terms of access, collaboration across research sites, and variance in the type of data used by different analytics platforms have also been cited as slowing biomedical advancement.

While large federally funded organizations like the National Institutes of Health have made significant strides, other research groups are bringing new possibilities for increased security, collaboration and access to comprehensive analytics tools to the table. For instance, last year, bioinformatics researchers at The Ohio State University began working on a cloud resource that would allow researchers around the world to access and analyze vast amounts of biomedical data.

The creators of the Translational Research Informatics and Data Management Grid (TRIAD) describe their project as “the middleware that addresses informatics challenges by enabling the creation of a scalable, secure and knowledge-anchored data sharing environment” that extends the existing caGrid infrastructure that was part of the National Cancer Institute’s caBIG program.

caGrid is a domain-agnostic software system made up of grid middleware, services and tools that combine to create a service-oriented architecture that is interoperable and can operate in a distributed fashion. It is the underlying support structure for the National Cancer Institute's cancer Biomedical Informatics Grid (caBIG).

TRIAD could solve some of the challenges of bioinformatics by specifically addressing privacy, analytics and sharing issues. For instance, the platform allows researchers to anonymously match tissue samples with medical record data that has been stripped of identifying information, using what the TRIAD team calls an “honest broker protocol.” This allows privacy to be maintained while removing the time-consuming task of seeking repeated approval for studies that do not require a patient’s identifying information.
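
The general idea behind an honest broker can be illustrated with a short sketch. This is not TRIAD's implementation, which the team does not detail here; the class name `HonestBroker`, the field names (`name`, `mrn`, `date_of_birth`, `study_id`) and the use of a keyed hash are all assumptions made for illustration. The broker alone holds a secret key, replaces the patient identifier with a pseudonym, and drops identifying fields, so a researcher can join samples to histories without ever seeing who the patient is.

```python
import hashlib
import hmac
import secrets

# Hypothetical list of fields treated as identifying; a real system would
# follow its own privacy policy and applicable regulations.
IDENTIFYING_FIELDS = {"name", "mrn", "date_of_birth"}


class HonestBroker:
    """Illustrative broker: only this object knows the secret key."""

    def __init__(self):
        self._key = secrets.token_bytes(32)

    def pseudonym(self, patient_id: str) -> str:
        # Keyed hash: the same patient always maps to the same study ID,
        # but the ID cannot be reversed without the broker's key.
        digest = hmac.new(self._key, patient_id.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]

    def deidentify(self, record: dict, id_field: str = "mrn") -> dict:
        # Drop identifying fields and attach the pseudonymous study ID.
        clean = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
        clean["study_id"] = self.pseudonym(record[id_field])
        return clean


def link(samples: list, histories: list) -> list:
    """Researcher-side join on the pseudonymous study ID only."""
    by_id = {h["study_id"]: h for h in histories}
    return [(s, by_id[s["study_id"]]) for s in samples if s["study_id"] in by_id]
```

In this sketch, the researcher only ever calls `link` on records that have already passed through `deidentify`; the mapping back to a real patient stays with the broker.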

Aside from the critical privacy features, the heart of TRIAD's functionality is its ability to act as the middleware 'translator' for diverse data sets. In essence, the platform pulls different types of data together into a central ‘cloud’ where they are rendered into a form suitable for the user’s analytical platform.
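
A rough sketch of this kind of translation layer follows. It is a simplified illustration under stated assumptions, not TRIAD's or caGrid's actual data model: the site names, the source field names, the common schema in `COMMON_FIELDS` and the choice of CSV as the output format are all hypothetical. Each source's fields are mapped onto one shared schema, and the unified rows are then rendered in a format a downstream analysis tool can read.

```python
import csv
import io

# Hypothetical per-site mappings from local field names to a common schema.
FIELD_MAPS = {
    "site_a": {"pt_age": "age_years", "dx": "diagnosis", "tissue": "tissue_type"},
    "site_b": {
        "AgeAtCollection": "age_years",
        "Diagnosis": "diagnosis",
        "SpecimenType": "tissue_type",
    },
}
COMMON_FIELDS = ["age_years", "diagnosis", "tissue_type"]


def to_common(source: str, record: dict) -> dict:
    """Translate one site-specific record into the common schema."""
    mapping = FIELD_MAPS[source]
    common = {field: None for field in COMMON_FIELDS}
    for src_name, value in record.items():
        if src_name in mapping:  # fields with no mapping are ignored
            common[mapping[src_name]] = value
    return common


def render_csv(rows: list) -> str:
    """Render common-schema rows as CSV for an analysis tool."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COMMON_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping the mappings in plain data rather than in code means that, in this sketch, adding a new site would only require a new entry in `FIELD_MAPS`.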

According to Philip Payne, who heads the bioinformatics department at OSU Medical Center, “With the current technology, a researcher might dedicate more than 100 hours to connect the dots between a set of tissue samples, the individual medical histories for the patients who provided these tissues, and then analyzing the group as a whole. With the TRIAD platform, researchers can now execute this type of search and analysis in minutes.”

Payne continued, noting that “when it comes to biomedical research you have the digital equivalent of the Tower of Babel. One piece is written in French. And another is written in Russian. And maybe a third component is in Chinese…TRIAD acts like the ultimate interpreter between all the different ‘languages’ that biomedical data comes in so that researchers spend time figuring out how the information could improve the way we treat a disease rather than spend time finding and translating various data sets.”

As a report in GenomeWeb noted, “so far, 20 research institutes have adopted TRIAD and it’s expected that the number will increase due to the fact that it’s open source, is collaboratively designed, and boasts lots of technical documentation and software components.”

Call it a grid, call it a cloud, call it what you would like, but removing the “Tower of Babel” issue that many researchers face while offering privacy and access to open source analytics capabilities is a huge leap forward for potentially life-saving research.