On June 18th, I attended my first ever Jenkins User Conference in Boston, USA.
Not only did I attend, but I also had the privilege of presenting my work at this meeting:
Using Jenkins as a scientific data and image management platform
The meeting attracted more than 400 people, quite an impressive number of attendees. Most of them were, of course, supporting the development operations of various companies, and they were clearly interested in what I call the standard Jenkins functionality and software build workflows. So I was pleasantly surprised when at least 150-200 people attended my talk, which was clearly outside the primary interest of most attendees. Furthermore, at least 10 people stayed after my talk to ask questions, discuss ideas, provide feedback, and ask how they could get hold of the presentation slides.
If you review some of my final slides, you’ll see that I make some recommendations on the types of improvements required to make Jenkins a better platform for scientists. The BioUno project is making significant contributions in this area. Finally, note that my slides on the JUC 2014 CloudBees site are an earlier version of my presentation and do not include some last-minute updates.
In the post entitled The Secret of Building a Scientific Community, Manuel Corpas describes his experiences coordinating the BioJS project. It is a great read, and if you are interested in Open Source communities, research-related or not, you will find it very interesting.
I learned about BioJS months ago, but never really used it. After reading Manuel’s post I followed some tutorials and was able to produce some neat HTML pages with sequences and trees. A few things that I learned:
- There are several components implemented by contributors
- Components have different dependencies
- Some components can use Java applets
- Some components use Ajax to retrieve remote content
Using Jenkins to serve BioJS artifacts
The jQuery Plug-in simply adds jQuery to Jenkins web pages, but the same approach won’t work with BioJS, since BioJS is framework agnostic (which is great) and each component may have different dependencies (YUI, jQuery UI, …).
The simplest way to produce artifacts using BioJS in Jenkins, and to serve the content from Jenkins, is to use CDNs for retrieving the JS files and the HTML Publisher plug-in to archive and serve the HTML files, as in this sample build.
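To illustrate, a page archived by the build might look like the sketch below, which renders a protein sequence with the BioJS 1.x Sequence component. The script URLs are placeholders (each component’s registry page lists its real dependencies and locations), and the sequence and options are illustrative only:

```html
<!-- Minimal page using the BioJS Sequence component.
     The script src values below are placeholders: check the
     component registry for the actual CDN locations and the
     dependencies each component needs (jQuery, YUI, ...). -->
<html>
  <head>
    <script src="https://cdn.example.org/jquery.min.js"></script>
    <script src="https://cdn.example.org/Biojs.js"></script>
    <script src="https://cdn.example.org/Biojs.Sequence.js"></script>
  </head>
  <body>
    <div id="sequence"></div>
    <script>
      // Render an example sequence into the div above.
      var mySequence = new Biojs.Sequence({
        sequence: "METLCQRLNVCQDKILTHYENDS",
        target: "sequence",
        format: "FASTA",
        id: "ExampleSeq"
      });
    </script>
  </body>
</html>
```

Because everything is static (the JS comes from a CDN and the HTML from the archived artifacts), the HTML Publisher plug-in can serve the page without any server-side code.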
Some components use Ajax requests to dynamically update the UI. For these components, a callback would have to be implemented in a Jenkins plug-in, so the simplest approach is to deploy these artifacts to a Web server that provides the callback.
Install necessary tools
You will have to install the following tools on your computer in order to run this example.
- R Plug-in for Jenkins
- Image Gallery Plug-in for Jenkins
We won’t go into detail here about how to install each tool. You will need Java to run Jenkins, and R with the taxize package to run the example. The R Plug-in allows Jenkins to run R scripts, and the Image Gallery Plug-in creates a simple gallery from the images produced by the build in Jenkins.
Setting up the Jenkins job
Once you have the plug-ins installed and Jenkins running, create a FreeStyle job and give it any name you want.
Now click “Add build step” and select “Execute R script”. That adds a textarea where you can write your R script. Let’s copy and paste the taxize example from rOpenSci, with one modification to create a PNG with the tree.
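As a reference, a minimal sketch of such a build-step script follows. The species list and the `db` argument are illustrative and may differ from the original rOpenSci example; the key point is writing the plot to a PNG file in the workspace so the Image Gallery plug-in can pick it up:

```r
# Sketch of an "Execute R script" build step: classify a few species
# with taxize and save the taxonomic tree as a PNG in the workspace.
# (Species list is illustrative, not the exact rOpenSci example.)
library(taxize)

species <- c("Homo sapiens", "Mus musculus", "Drosophila melanogaster")
out <- classification(species, db = "ncbi")  # query the NCBI taxonomy
tree <- class2tree(out)                      # build a tree from the ranks

png("tree.png")                              # file the gallery will show
plot(tree)
dev.off()
```

When the build runs, `tree.png` ends up in the job workspace, which is exactly what the Image Gallery plug-in scans for images.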
I have just returned from the Universidade de São Paulo, where I attended the [2º High Performance Computing Workshop](http://2whpc.lcca.usp.br/). Unfortunately, I couldn’t go in the morning, so I missed the first half of the event. From what I read in the schedule, in the morning USP and Rice University explained the current status of their cooperation agreement, which is mainly about Rice’s IBM Blue Gene/P being used by USP.
My main reason to attend this event was to watch Professor Dr. João Carlos Setubal’s talk on HPC and bioinformatics. His talk was great, but I’ll give a full report here on all the talks I watched (you can skip ahead to his talk if you prefer).
GPU computing talk
The first talk was by Dr. Denis Tanigushi. He gave a great talk about GPU computing, starting with a historical background that was coincidentally similar to a recent Hacker News thread about shaders. He then presented the problem, the GPU architecture, and its applications in HPC. What was very interesting was that he used several GPUs together with MPI; I didn’t know that was possible.
I learned about warps, and when and how to use a GPU. Here is some of the software that appeared in Tanigushi’s talk: Matlab, NVIDIA libraries, Ansys and Gromacs. Oh, he also mentioned the Thrust C++ library and functors.
The bioinformatics talk
The next part of the event was very interesting too. It was a series of 4 lightning talks, the first one being the HPC and bioinformatics talk that I wanted to see. Prof. Setubal gave an excellent talk about his work on genomics and transcriptomics. He also mentioned that his work is done in collaboration with other groups and is project driven. Ah, and that it is also Big Data driven.
He used BLAST, mpiBLAST, SOAPdenovo and ABySS for his analyses. He found that ABySS didn’t work well on the Blue Gene but, on the other hand, using mpiBLAST he was able to reduce the processing time (just the time to BLAST his dataset) from 2 months to only 3 days.
He concluded his talk by saying that most of the tools he uses are made by other groups, but are not always built to run in parallel, and that his group almost always ends up producing a pipeline to run everything. From what I gathered asking him afterwards, his group doesn’t use any kind of pipeline tool: no Galaxy, Taverna, Mobyle, nor Jenkins (cough cough)
The rest of the event
DNADigest held a free Hackday this past Saturday, April 5th, in London. Luckily, they also live-streamed the event over the Internet, and it was definitely worth waking up at 5 AM (UTC-3) to watch it!
DNADigest is a not-for-profit organization that is working on solutions for secure data sharing. This problem involves several fields (metadata, infrastructure, data encryption, genomics, …) and is obviously a very complicated one, so props to DNADigest for working on this.
They made sure the video for the Hackday was always working properly (big thanks, Suraj!), but the audio wasn’t as good. From what I could tell, the atmosphere was really nice and made the whole event very productive. I hope to be near London for a future DNADigest Hackday.
The activities were coordinated by the DNADigest team, and while I couldn’t hear the audio very well, the Twitter #dnahd hashtag was constantly updated, and everybody was hacking together using Hackpad.
My interest wasn’t exactly in data encryption, privacy or data sharing, but rather in metadata, since that is probably one of the items from the BioUno roadmap that we’ll tackle next. The result was amazing: even though I couldn’t attend in person, after the event I had a list of tools, standards and papers to read about metadata.