Scientific software is used daily in the physical, environmental, earth, and life sciences to make important discoveries. Because it is highly specialized, scientific software is frequently developed by scientists with deep domain knowledge, but not necessarily deep knowledge of the technologies and tools used by the software engineers and developers who build more mainstream applications. As a result, scientific software tends to be highly customized, inflexible, complex, poorly tested, poorly documented, and even more poorly maintained in the long run.
Computational science: Error… why scientific programming does not compute (Zeeya Merali, Nature: 2010)
Many issues plaguing scientific software have been discussed in the literature, but the ability to reproduce computational discoveries has taken center stage in recent years. The term "reproducible computational research" has been coined as an umbrella concept for identifying issues that affect the reproducibility of computational scientific research and for proposing solutions to them.
Scientific Reproducibility through Computational Workflows and Shared Provenance Representations (Yolanda Gil, NSF Workshop: 2010)
Although the challenge of reproducible computational research is multi-dimensional, some of the proposed solutions are rooted in existing, well-established, and robust software engineering practices.
In addition, the organized and homogeneous tagging of scientific data with metadata (data about data) has long been a foundation for information retrieval and discovery. The development of consistent metadata and controlled vocabularies is another important part of searching, finding, and using scientific data in a manner consistent with reproducible research.
Finally, and to some degree obviously, reproducible computational research depends on the ability of other scientists or research experts to freely access the source code and scientific data used to generate new computational discoveries. These free and open access concepts have been championed by many in the software development community under the open-source umbrella. Open-source code is meant to be a collaborative effort, in which programmers improve the source code and share their changes with the community.
The BioUno open-source project seeks to improve the automation, performance, reproducibility, usability, and management of scientific applications by applying and extending software engineering (SE) best practices in the field of scientific research applications. Deliverables from the project have found a variety of applications in life science research (bioinformatics, genetics, drug discovery).
Check out our roadmap for a list of specific short- and long-term objectives.
BioUno has pioneered the use of continuous integration tools and techniques to create reproducible computational pipelines and to manage computer clusters in support of scientific research applications.
In addition, BioUno has adopted a variety of software engineering best practices to achieve its objectives.
Finally, BioUno strives to minimize the open-source proliferation problem. Although the project covers a broad range of technologies and tools, it actively contributes to existing open-source projects rather than starting or releasing new ones.