In Documenting Data with Metadata we discussed how Jenkins lacks a built-in framework for relating arbitrary Jenkins projects, builds, and artifacts. This creates a challenge for linking data and metadata generated in independent builds.
Jenkins job and build configuration, parameters, and artifacts are persisted as separate files on the server file system. When Jenkins starts, it builds an in-memory Jenkins object model from the XML configuration and build files of every project, as well as from the file structure of the ‘jobs’ folder. However, there is no dedicated RDBMS (relational database management system) backing this Jenkins model, and no attempt is made to formally relate builds to each other. Once the server is shut down, the object model is lost and must be rebuilt from scratch on the next restart.
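To make the on-disk layout concrete, the minimal Python sketch below reads build records back from the file system much as Jenkins does at startup. It assumes a default installation layout (numbered build directories under `jobs/<job>/builds`, each with a `build.xml`); the helper name is ours, not part of Jenkins.

```python
import os

def list_builds(jenkins_home, job_name):
    """Return the build numbers recorded on disk for a job.

    Mirrors how Jenkins reconstructs its in-memory model: each build
    is a numbered directory under jobs/<job>/builds containing a
    build.xml file. (Default JENKINS_HOME layout assumed.)
    """
    builds_dir = os.path.join(jenkins_home, "jobs", job_name, "builds")
    numbers = []
    for entry in os.listdir(builds_dir):
        # Skip symlinks such as lastSuccessfulBuild and other non-build files.
        if entry.isdigit() and os.path.isfile(
            os.path.join(builds_dir, entry, "build.xml")
        ):
            numbers.append(int(entry))
    return sorted(numbers)
```

Note that nothing in this structure records how one build relates to another; that is exactly the gap discussed above.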
In this post, we will examine strategies for overcoming these limitations and establishing build relationships, which are important for data reuse, comprehension, and provenance in research and data science applications.
The Active Choices plugin can be used to make your build parameters reactive. With it, you can control how a parameter’s choices are recomputed when a user changes the value of another parameter.
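For instance, a Reactive Parameter in Active Choices is backed by a Groovy script that returns the parameter’s choices. The sketch below is illustrative only: the parameter names `ENVIRONMENT` and `REGION` are hypothetical, and `REGION` must be declared as a referenced parameter so the plugin re-runs the script when it changes.

```groovy
// Reactive Parameter script for a hypothetical ENVIRONMENT parameter.
// REGION is another job parameter declared as a reference; Active Choices
// injects its current value and re-evaluates this script on every change.
if (REGION == 'us-east-1') {
    return ['dev', 'staging', 'prod']
}
return ['dev']
```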
However, managing hundreds of Groovy scripts and several different job parameters may be quite a challenge.
You could use the Scriptler parameter, externalise the configuration to a configuration management tool such as Puppet, Ansible, or SaltStack, or simply build your own automation in a language such as Python, Perl, or shell script, accessing the Jenkins API via its Groovy console or remotely via its REST services.
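As an example of the remote route, the Python sketch below queries a job’s metadata through the Jenkins JSON REST API using only the standard library. The server URL, job name, and helper names are assumptions for illustration, and a real server would typically also require authentication.

```python
import json
import urllib.request

def job_api_url(base_url, job_name):
    """Build the JSON API URL for a Jenkins job (hypothetical helper)."""
    return f"{base_url.rstrip('/')}/job/{job_name}/api/json"

def fetch_job_info(base_url, job_name):
    """Fetch a job's metadata from the Jenkins REST API as a dict."""
    with urllib.request.urlopen(job_api_url(base_url, job_name)) as resp:
        return json.load(resp)

# Usage (assumes a Jenkins server running locally without auth):
# info = fetch_job_info("http://localhost:8080", "my-job")
# print(info["displayName"])
```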
In today’s post I will show a way to achieve this with the Job DSL Plugin. With this plugin, you can use a domain-specific language (DSL) to programmatically create Jenkins projects.
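As a taste of the DSL, a seed job runs a Groovy script like the one below to generate other jobs. This is a minimal sketch: the job name and parameter are made up for illustration.

```groovy
// Job DSL seed script: generates a freestyle job with one string parameter.
job('generated-data-job') {
    description('Job generated programmatically via the Job DSL Plugin')
    parameters {
        stringParam('DATASET', '', 'Path to the input dataset')
    }
    steps {
        shell('echo "Processing $DATASET"')
    }
}
```

Because the job definition is now code, it can be versioned, reviewed, and regenerated on demand instead of being maintained by hand in the Jenkins UI.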
Participants: Bruno P. Kinoshita and Ioannis K. Moutsatsos. 2016-11-18 08:00 PM UTC using Google Hangouts.
- Regressions in Active Choices Plug-in
- Action: Bruno to try to write unit tests to prevent it from happening again
Data without associated annotation and metadata (documentation describing the data) is of little lasting value [1]. It is imperative that each dataset used for processing and analysis includes sufficient metadata so that its origin, content, and processing state are clearly understood. Only then does data become truly useful and trustworthy.
A friend of mine has called well-annotated data ‘civilized data’; others have called it ‘tidy data’ [2].
Here I establish some metadata vocabulary for Jenkins data science applications, so that future posts can build on a common terminology.
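To make the idea concrete before defining terms, a build artifact’s metadata might be captured as a small sidecar record stored next to the data itself. The field names below (origin, content, processing state, echoing the requirements above) are illustrative placeholders, not a fixed schema.

```python
import json

# Hypothetical metadata record for one build artifact; the job name and
# values are invented for illustration.
artifact_metadata = {
    "origin": {"job": "assay-analysis", "build": 42},
    "content": {"format": "csv", "rows": 960},
    "processing_state": "normalized",
}

# Serialized alongside the artifact so the description survives
# independently of the in-memory Jenkins object model.
print(json.dumps(artifact_metadata, indent=2))
```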