Next-generation bioinformatics: an open-source education for biologists

2009 December 23

I entered my PhD program in cell biology with some knowledge about molecular biology, a dose of scientific ambition and a shiny new Powerbook G4.  For the first half of my PhD research, the laptop mostly idled on my desk, waiting patiently for me to take breaks from my bench-work to check email, search for technical literature or assemble figures for a presentation.

It was after a year or two of such “wet-lab” work–biochemistry and microscopy–that my colleagues and I encountered pressing biological questions that, we hoped, might be answered using a new technology: “deep” DNA sequencing.  This process, much-anticipated in the past few years, allows for the simultaneous, parallel sequencing of millions of short lengths of DNA.  A single run produces gigabytes of pure data; analyzing and displaying that data in useful ways required me to revisit my relationship with my computer.  Beyond being my robotic secretary, research assistant and conduit to YouTube-style procrastination, my computer suddenly became the fulcrum of my research.

In short, I was introduced to the world of bioinformatics–and by that I don’t simply mean the protocols and theory of digital data analysis applied to biology.  Computational biology is, significantly, a field composed of real computer geeks more aligned, culturally, with the modern silicon-based tech sector than with the traditions of biological science.  For me, as a young biologist, communicating with bioinformaticians finally opened my eyes to technological and economic trends in software and networking.

Leaders in the field have long been devotees of the open-source model of software distribution, and have developed a software ecosystem based on collaborative communities of developers.  Accessing the collective knowledge of these communities is as easy as joining an email list or online forum.  Even for advanced data analysis, basic Perl scripting, command-line Linux administration and the willingness to find information online are the only costs of entry.  Some of the most useful and active open-source bioinformatics environments include BioPerl and Bioconductor (for the R statistical software).  There are many smaller, single-purpose tools (such as the fast alignment programs that have grown out of the sequencing revolution) that function on the same premise: an online community of users contributing advice and improvements based on access to source code.

Academic biomedical scientists, which I use as a catch-all term to encompass researchers in fields like molecular and cell biology, immunology and infectious disease research, do not usually consider themselves technologically conservative.  However, the reality is that many experiments rely on decades-old technology–gel electrophoresis, microscopy, automated cell-sorting.  After all, time-tested techniques have been carefully honed through generations of scientists, and their results are easily judged during peer review.

My point is that the inherent conservatism of scientists (no different from that of their peers elsewhere in academia) predisposes them to ignorance of the newer, technology-enabled forms of collaborative research represented by bioinformatics–and by advanced information technology in general.  Obviously, developing code to analyze data is a much different process from experimenting on living systems in the lab–not to mention trying to convert biological knowledge into improvements in drugs and diagnostics.  However, as a younger generation of biologists responds to the computational demands of processing enormous piles of data, it will be interesting to see whether the FOSS ethos of academic bioinformatics pollinates the more traditional world of the wet-lab.
