Jenkins as code, part 1: Setting up Jenkins in Docker

I hate doing things manually, I really do.

Logging into a UI and clicking here and there to get something created or configured is error prone (you can easily forget something or make a typo) and it is dull, especially if you need to do it on a routine basis. If you can change something in a UI, then someone else can change it as well, even without you knowing it was changed (or vice versa ;-)). So doing things manually is not the way forward and we should focus on automation. Automation is one of the pillars of DevOps, so we should always automate things, right?

What many people probably don't know is that Jenkins is a tool that can be fully automated, you only have to know how. (And based on some posts on, for example, Reddit, I don't think people know this is even possible.)

Jenkins as code

So let's dive into what I would call: Jenkins as code.

This will basically be a 2 part blog post where we will discuss the following:

  1. This part where we will create a Docker image, containing Jenkins with its configuration files and plugins;
  2. The next part, where we will create a shared library and use that in a Jenkinsfile, with jobs we load via a “specifications” repository;

Before we do anything, I just want to remind you that this is just one way to achieve a Jenkins as code setup. It does not mean that this is the only or the best way; just like there are a thousand roads to Rome, there are many ways to do this. These blog posts and the code in the Github repository will help you kickstart your own setup, but by no means should you just run it on a production environment and blame me if something does not work. I am not a Groovy expert and can only do some basic things, so don't expect a new world wonder. During the blog posts I will tell you how I did things, so you can redo it all yourself (and compare it with the code in the Github repository) and build it on your own terms.

In both blog posts, you will see a lot of code and commands popping up. No worries, all code is available in my Github repository https://github.com/dj-wasabi/blog-jenkins-as-code. So let's start with the first part: setting up Jenkins.

Docker(file)

We will create a Docker image based on the "jenkins/jenkins:lts" Docker image, install Docker in it, and configure it with the configuration-as-code plugin, together with several yaml files that configure Jenkins. As part of the Github repository, we also have a docker-compose.yaml file which we can use to boot our setup.

Let's start with the Dockerfile.

FROM jenkins/jenkins:lts

USER root
RUN groupadd docker && \
    curl -fsSL https://get.docker.com -o get-docker.sh && \
    sh get-docker.sh && \
    usermod -aG root jenkins && \
    usermod -aG docker jenkins
USER jenkins

Let's discuss this first: the "jenkins/jenkins:lts" Docker image does not contain the Docker application, so we need to install it and make sure that the "jenkins" user is part of the "root" and "docker" groups. We need Docker in this image, as each Jenkins job will run in its own Docker container.

ENV CASC_JENKINS_CONFIG=/var/jenkins_home/casc

We need to set an environment variable named CASC_JENKINS_CONFIG, which basically tells Jenkins where the configuration-as-code yaml files can be found.

COPY casc/ /var/jenkins_home/casc
COPY plugins.txt /usr/share/jenkins/plugins.txt
RUN /usr/local/bin/install-plugins.sh < /usr/share/jenkins/plugins.txt

Then we copy the contents of the casc directory to the earlier mentioned location and place the file with all of our plugins into a specific directory. Finally we run the install-plugins.sh script to download and install all the plugins that we need in our setup.

And that is our Dockerfile, easy right? This allows us to build a Jenkins Docker image with all of our files and configuration, resulting in the Jenkins environment we want to have. We can deploy this on some host running Docker or even make some additional changes to make it work in Kubernetes.

Plugins

Let's go to the plugins.txt file, as this one is a bit easier to explain than the casc files.

We need to create a plugins.txt that contains all of the plugins we want to use, so how do we do that? I manually (oh yes, sorry! :)) started a Jenkins container, followed the installation steps and picked several plugins to install along the way. When Jenkins was running and I had finished installing all plugins (don't forget the "configuration-as-code" plugin), I went to "Manage" and clicked on "Script Console". There you see a text field to execute Groovy scripts, and I used the following script:

def plugins = jenkins.model.Jenkins.instance.getPluginManager().getPlugins()
plugins.each {println "${it.getShortName()}:latest"}

This is a "script" that provides an overview of all plugins that are currently installed in Jenkins. I have used the "latest" version of each plugin, which is fine for demo purposes, but you could also change "latest" to ${it.getVersion()}, which prints the actual version of the installed plugin. I would suggest using pinned versions in the plugins.txt file. This helps you in the future: when someone creates a PR it shows you that the version of a plugin is updated, which you won't see when it is using "latest".

Then you hit the "Run" button and you will see some output appear. Select the output, place it in the plugins.txt file and you are done (I would also sort the contents of the file, so all plugins are ordered alphabetically).
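
To give you an idea of the format, a trimmed plugins.txt could look something like the snippet below. The plugin names are just a few that fit this setup; the exact list depends on what you selected during the manual run.

configuration-as-code:latest
docker-plugin:latest
docker-workflow:latest
git:latest
job-dsl:latest
workflow-aggregator:latest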

Configuration as code

Let us first explain how we can get a yaml file. Go to "Manage" and you will see "Configuration as Code" in the "System Configuration" lane (click on it please). There is a button called "Download Configuration" which downloads the yaml file, and with "View Configuration" you can see the yaml file in your browser. When you have downloaded the yaml configuration file, you can use it in your Docker image by placing it in the casc/ directory. I would suggest splitting it into separate files, so you won't have one large file but smaller ones, each with a specific set of configuration. For example, create a credentials.yaml file containing all Jenkins credentials.
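
One possible split, using the file names referenced further down in this post (jenkins.yaml is just an illustrative name for the remaining general configuration):

casc/
  credentials.yaml   <- the credentials: block
  docker.yaml        <- the Docker cloud and agent templates
  dsl-jobs.yaml      <- the "Seed all DSL jobs" seed job
  jenkins.yaml       <- illustrative; security realm, authorization matrix, etc.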

But before you commit all your changes, you can also update some values by using environment variables, see the following:

  securityRealm:
    local:
      allowsSignup: false
      enableCaptcha: false
      users:
      - id: "${JENKINS_ADMIN_USERNAME:-admin}"
        name: "${JENKINS_ADMIN_NAME:-Administrator}"
        password: "${JENKINS_ADMIN_PASSWORD}"

This piece of configuration is responsible for creating an admin user. I don't want to hardcode the username, and definitely not the password, in this file, so I use environment variables for that. The same goes for the credentials that Jenkins uses, see the following example of the Jenkins "credentials":

credentials:
  system:
    domainCredentials:
      - credentials:
          - basicSSHUserPrivateKey:
              scope: GLOBAL
              id: "SSH_GIT_KEY"
              username: "git"
              description: "SSH Credentials for jenkins"
              privateKeySource:
                directEntry:
                  privateKey: ${JENKINS_SSH_GIT_KEY}

With the above credential configuration I won’t have to hardcode the SSH Private key in the Docker image, but can use it as an environment variable. Nice right? 🙂

When everything is done via code, we can also configure the security matrix, defining what users can and, most importantly, can't do in Jenkins. As my Jenkins runs on premise and doesn't allow traffic from outside the environment, I will let people start jobs themselves (if they don't want to wait for the trigger). So I will give anonymous users read, build and cancel rights on the jobs. Why go through all the trouble of letting people authenticate against some source, just so we can see that this person started or cancelled a job? Most importantly, they can't change anything unless they know the admin password. (And any manual change will be undone when Jenkins is restarted! :))

  authorizationStrategy:
    globalMatrix:
      permissions:
      - "Job/Build:anonymous"
      - "Job/Cancel:anonymous"
      - "Job/Read:anonymous"
      - "Overall/Administer:admin"
      - "Overall/Read:anonymous"

Now we are able to fully do Jenkins as code, as we store the yaml files in the casc/ directory and they are loaded when Jenkins starts. But when Jenkins is running, we also need to make sure that jobs are loaded from somewhere. We do this with a "Seed Job", which you can see in the "dsl-jobs.yaml" file in the casc/ directory in the Github repository.

            git {
                remote { 
                    url "${JENKINS_JOB_DSL_URL}"
                    credentials 'SSH_GIT_KEY' 
                }
                branch '*/main'
              }
        }
        triggers {
            scm('H/15 * * * *')
        }
        steps {
          dsl {
            external('${JENKINS_JOB_DSL_PATH:-jobs}/*.groovy')
            removeAction('DELETE')
          }
        }
      }

When Jenkins starts, the "Seed all DSL jobs" Jenkins job is created automatically. What it does is basically the following (the snippet above is incomplete, see Github for the complete file):

  1. We use the credential ‘SSH_GIT_KEY‘ to checkout the repository mentioned in ${JENKINS_JOB_DSL_URL} (See the docker-compose.yaml file)
  2. We use the ‘main’ branch;
  3. The job is executed every 15 minutes;
  4. In the directory named ${JENKINS_JOB_DSL_PATH} we will find groovy files and if Jenkins has jobs which aren’t configured in these groovy files, we delete the jobs from Jenkins.

Before we finalise the configuration as code part, we need to discuss one last file (and action). When the Jenkins server is running, each job runs in its own Docker container. So the Jenkins server starts a Docker container and does all of its work inside that container; the configuration for this follows:

jenkins:
  clouds:
    - docker:
        name: "docker"
        dockerApi:
          dockerHost:
            uri: "${DOCKER_HOST:-unix:///var/run/docker.sock}"
        templates:
        - connector:
            attach:
              user: "jenkins"
          dockerTemplateBase:
            bindAllPorts: true
            image: "jenkins/agent:latest"
            privileged: true
            environment:
              - "TZ=Europe/Amsterdam"
          instanceCapStr: "99"
          labelString: "worker"
          name: "worker"
          remoteFs: "/home/jenkins/agent"

This can also be found in the file "docker.yaml". Here we have placed one template, named "worker", using the "jenkins/agent:latest" Docker image. This is just an example, so you can modify it to your needs and use a Docker image that suits your setup. The Docker image should contain all the tools needed to run your jobs, so "jenkins/agent:latest" might not fit your situation. And since the "templates" key is a list, you can add a lot more templates, each with a unique name, its own settings and Docker image. For dockerHost.uri you will see the usage of the environment variable "DOCKER_HOST". This is a variable we use in the docker-compose.yaml file; if we don't provide one, the default unix:///var/run/docker.sock is used.

You can go to "Manage", "System configuration" and scroll all the way down until you see "Cloud". It provides a link, and when clicking on it you'll get the page where you can configure the "Cloud" settings. When you make changes, don't forget to export the yaml file via the "Configuration as Code" page mentioned earlier.

Build and ship it

So far we have discussed the basics of how we get our configuration, so let's build a Docker image. During the rest of this blog post, I will assume you have the same layout as my Github repository. So let's go to the directory containing the "Dockerfile", "plugins.txt" and "docker-compose.yaml" files and the "casc/" directory. Here we run the docker build command to build the new Docker image.

cd server
docker build -t jenkins-as-code . --pull

I named it 'jenkins-as-code', which works fine locally; if you want to push it to a Docker registry, you should prefix it with the correct registry name. If you prefix it with a registry or named it differently, don't forget to update the docker-compose.yaml file with your new name. The --pull flag is there so that, even if you already have a "jenkins/jenkins:lts" Docker image, you will get the latest one.

I think it is built now, otherwise we will wait a minute before we continue.

sleep 60 🙂

Ok, the Docker image is built and we can start it. If you look at the docker-compose.yaml file, you will notice 2 'services':

  1. socat;
  2. jenkins (the one we just built and want to start).

Let's describe the 'socat' service first. The 'socat' service makes sure that the docker.sock file from our host can be used by Jenkins for starting the agents. If we do this from the Jenkins container itself, without the 'socat' service, we get permission denied errors and Jenkins cannot start any new Docker containers (I am doing this on a Mac; I don't think people running it on Linux hosts will have this issue).

The Jenkins service has several environment variables set. Before we start everything, we first need to create an environment variable that contains the contents of a private SSH key. I have used the following command for that:

export EXPORTED_PASSWORD=$(cat ~/.ssh/wd_id_rsa)

So EXPORTED_PASSWORD contains the private SSH key, which will be used in Jenkins as the SSH_GIT_KEY credential in multiple places. Also worth mentioning is the JENKINS_ADMIN_PASSWORD environment variable, which is exactly what it says: the password for the Admin user. If you want to use something else, this is the moment to change it.
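
To make that wiring concrete, here is a trimmed, hedged sketch of what such a docker-compose.yaml can look like. The service names, port and variable names follow the ones used in this post, the Git URL is a placeholder, and the real file in the repository is more complete:

version: "3"
services:
  socat:
    # Exposes the host's docker.sock over TCP so Jenkins can reach it
    image: alpine/socat
    command: tcp-listen:2375,fork,reuseaddr unix-connect:/var/run/docker.sock
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  jenkins:
    image: jenkins-as-code
    ports:
      - "8080:8080"
    environment:
      # Point the Docker cloud at the socat service instead of the local socket
      DOCKER_HOST: "tcp://socat:2375"
      JENKINS_ADMIN_PASSWORD: "${JENKINS_ADMIN_PASSWORD:-admin}"
      # Filled from the EXPORTED_PASSWORD variable we exported above
      JENKINS_SSH_GIT_KEY: "${EXPORTED_PASSWORD}"
      # Placeholder; point this at your own job specifications repository
      JENKINS_JOB_DSL_URL: "git@github.com:your-org/your-jobs-repo.git"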

We will start it with the following command:

docker compose up -d

I prefer starting it in the background, which is why I added the -d argument. Once it has booted, open your favourite browser and go to http://localhost:8080 and you will see something like the following:

So that is it for now. We started our newly built Docker image containing Jenkins, with the plugins we need in our environment and our own configuration!

We will go into the "Seed all DSL jobs" job in the next part of this blog post. So stay tuned! 🙂

Finalizing the installation of Zabbix Agent with Ansible

I wrote this blog post a while back for Zabbix itself, to be posted on their blog site, see: https://blog.zabbix.com/finalizing-the-installation-of-zabbix-agent-with-ansible/13321/ (there are a lot of interesting posts on there, so please check the blog often!)

In the previous blog posts, we created a Zabbix Server with a new user, a media type, and an action. In the 2nd blog post, we continued with creating and configuring a Zabbix Proxy. In the last part of this series of blog posts, we will install the Zabbix Agent on all of the 3 nodes we have running.

This blog post is the 3rd part of a 3 part series of blog posts where Werner Dijkerman gives us an example of how to set up your Zabbix infrastructure by using Ansible.
You can find part 1 of the blog post by clicking here.

To summarize, so far we have a Zabbix Server and a Zabbix Proxy. The Zabbix Server has a MySQL instance running on a separate node, the MySQL instance for the Zabbix Proxy runs on the same node. But we are missing one component right now, and that is something we will install with the help of this blog post. We will install the Zabbix Agent on the 3 nodes.

A git repository containing the code used in these blog posts is available on https://github.com/dj-wasabi/blog-installing-zabbix-with-ansible. Before we run Ansible, we need to make sure we have opened a shell on the “bastion” node. We can do that with the following command:

$ vagrant ssh bastion

Once we have opened the shell, go to the “/ansible” directory where we have all of our Ansible files present.

$ cd /ansible

In the previous blog post, we executed the “zabbix-proxy.yml” playbook. Now we are going to use the “zabbix-agent.yml” playbook. The playbook will install the Zabbix Agent on all nodes (“node-1”, “node-2” and “node-3”). Next up, on both the “node-1” and “node-3”, we will add a user parameters file specifically for MySQL. With this user parameters file, we are able to monitor the MySQL instances.

$ ansible-playbook -i hosts zabbix-agent.yml

This playbook will run for a few minutes installing the Zabbix Agent on the nodes. It will install the zabbix-agent package and add the configuration file, but it will also make a connection to the Zabbix Server API. We will automatically create a host with the correct IP information and the correct templates! When the Ansible playbook has finished running, the hosts can immediately be found in the Frontend. And better yet, it is automatically correctly configured, so the hosts will be monitored immediately!

We have several configurations spread over multiple files to make this work. We first start with the “all” file.

The file “/ansible/group_vars/all” contains the properties that will apply to all hosts. Here we have the majority of essential properties configured that are overriding the default properties of the Ansible Roles. Each role has some default configuration, which will work out of the box. But in our case, we need to override these, and we will discuss some of these properties next.

zabbix_url

This is the URL on which the Zabbix Frontend is available and thus also the API. This property is for example used when we create the hosts via the API as part of the Proxy and Agent installation.
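
In our setup this can be as simple as the following (the value is illustrative and matches the hosts file entry from part 1 of this series):

zabbix_url: http://zabbix.example.com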

zabbix_proxy

The Zabbix Agents will be monitored by the Zabbix Proxy unless the Agent runs on the Zabbix Server or the host running the database for the Zabbix Server. Like with the previous blog post, we will also use some Ansible notation to get the IP address of the host running the Zabbix Proxy to configure the Zabbix Agent.

zabbix_proxy: node-3
zabbix_agent_server: "{{ hostvars[zabbix_proxy]['ansible_host'] }}"
zabbix_agent_serveractive: "{{ hostvars[zabbix_proxy]['ansible_host'] }}"

With the above configuration, we configure both the Server and ServerActive in the Zabbix Agent’s configuration file to use the IP address of the host running the Zabbix Proxy. If you look at the files “/ansible/group_vars/zabbix_database” and “/ansible/group_vars/zabbix_server/generic” you would see that these contain the following:

zabbix_agent_server: "{{ hostvars['node-1']['ansible_host'] }}"
zabbix_agent_serveractive: "{{ hostvars['node-1']['ansible_host'] }}"

The Zabbix Agent on the Zabbix Server and on its database host uses the IP address of the Zabbix Server as the value for both the "Server" and "ServerActive" configuration settings of the Zabbix Agent.

zabbix_api_user & zabbix_api_pass

These are the defaults in the roles, but I have added them here so it is clear that they exist. When you change the Admin user password, don't forget to change them here as well.
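
For reference, they look roughly like this (these are the well-known Zabbix defaults, so treat this as an illustration):

zabbix_api_user: Admin
zabbix_api_pass: zabbix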

zabbix_api_create_hosts & zabbix_api_create_hostgroups 

Because we automatically want to create the Zabbix Frontend hosts via the API, we need to set both these properties to true. Firstly, we create the host groups that can be found with the property named “zabbix_host_groups”. After that, as part of the Zabbix Agent installation, the hosts will be created via the API because of the property zabbix_api_create_hosts.
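
Put together, this part of the "all" file looks roughly like this (the host group name is illustrative):

zabbix_api_create_hostgroups: true
zabbix_api_create_hosts: true
zabbix_host_groups:
  - Linux servers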

Now we need to know what kind of information we want these hosts created with. Let’s go through some of them.

zabbix_agent_interfaces

This property contains a list of all interfaces that are used to monitor the host. This is relatively simple in our case, as the hosts only have 1 interface available. You can find some more information about what to use when you have other interfaces like IPMI or SNMP here: https://github.com/ansible-collections/community.zabbix/blob/main/docs/ZABBIX_AGENT_ROLE.md#other-interfaces. We use the interface with the value from the property "ansible_host" on port 10050.
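
A hedged sketch of such an interface definition, using the field names from the community.zabbix agent role documentation linked above (verify against the role before using it):

zabbix_agent_interfaces:
  - type: 1                  # 1 = Zabbix agent interface
    main: 1
    useip: 1
    ip: "{{ ansible_host }}"
    dns: ""
    port: "10050"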

zabbix_host_groups

This property was also discussed before – we automatically assign our new host to these host groups. Again, we have a basic setup, and thus it is a simple property.

zabbix_link_templates

We provide a list of all Zabbix Templates we will want to assign to the hosts with this property. This property seems a bit complicated, but no worries – let’s dive in!

zabbix_link_templates:
  - "{{ zabbix_link_templates_append if zabbix_link_templates_append is defined else [] }}"
  - "{{ zabbix_link_templates_default }}"

With the first line, we add the value of the property "zabbix_link_templates_append", but only if that property exists. If Ansible cannot find that property, we basically add an empty list. So where can we find this property? We can check the files in the other directories in the group_vars directory. If we check, for example, "/ansible/group_vars/database/generic", we will find the property:

zabbix_link_templates_append:
  - 'MySQL by Zabbix agent'

So on all nodes that are part of the database group, we add this value to the property "zabbix_link_templates": all of the database servers will get this template attached to the host. If we check the file "/ansible/group_vars/zabbix_server/generic", we will find the following:

zabbix_link_templates_append:
  - 'Zabbix Server'

As you probably understand now, when we create the Zabbix Server host, we will add the “Zabbix Server” template to the host, because this file is only used for the hosts that are part of the zabbix_server group.

With this setup, we can configure specific templates for specific groups, but there is also at least 1 template that we always want to add. We don't want to add that template to each file, as that would be a lot of duplicate configuration, so we use a new property for this named "zabbix_link_templates_default". In our case, we only have Linux hosts, so we always want to add this template:

zabbix_link_templates_default:
  - "Linux by Zabbix agent active"

On the Zabbix Server, we assign both the "Zabbix Server" template and the "Linux by Zabbix agent active" template to the host.

But what if we have Macros?

zabbix_macros

As part of some extra tasks in this playbook execution, we also need to provide a macro for some hosts. This macro is needed to make the Zabbix Template we assign to the hosts work. For the hosts running a MySQL database, we need to add a macro, which can be found with the property zabbix_macros_append in the file “/ansible/group_vars/database/generic”.

zabbix_macros_append:
  - macro_key: "MYSQL.HOST"
    macro_value: "{{ ansible_host }}"

We will create 1 macro with the key name “MYSQL.HOST” and assign a value that will be equal to the contents of the property ansible_host (For the “node-2” host, the host running the database for the Zabbix Server), which is “10.10.1.12”.

User parameters

The “problem” with assigning the MySQL template is that it also requires some UserParameter entries set. The Zabbix Agent role can deploy files containing UserParameters to the given hosts. In “/ansible/group_vars/database/generic” we can find the following properties:

zabbix_agent_userparameters_templates_src: "{{ inventory_dir }}/files/zabbix/mysql"
zabbix_agent_userparameters:
  - name: template_db_mysql.conf

The first property, "zabbix_agent_userparameters_templates_src", lets Ansible know where to find the files. The "{{ inventory_dir }}" will be translated to "/ansible"; here you will find a directory named "files" (next to the group_vars directory), and drilling further down you will find the file "template_db_mysql.conf".

With the second property, "zabbix_agent_userparameters", we let Ansible know which file we want to deploy to the host. In this case, that is the only file in the directory, named "template_db_mysql.conf".

When the Zabbix agent role is fully executed, we have everything set to monitor all the hosts automatically. Open the dashboard, and you will see something like the following:

It provides an overview, and on the right side, you will notice we have a total of 3 nodes of which 3 are available. Maybe you will see a “Problem” like in the screenshot above, but it will go away.

If we go to “Configuration” and “Hosts,” we will see that we have the 3 nodes, and they have the status “Enabled” and the “ZBX” icon is green, so we have a proper connection.

We should verify that we have some data, so go to “Monitoring” and click on “Latest data.” We select in the Host form field the “Zabbix database,” and we select “MySQL” as Application and click on “Apply.” If everything is right, it should provide us with some information and values, just like the following screenshot. If not, please wait a few minutes and try again.

Summary

This is the end of a 3-part blog post series on creating a fully working Zabbix environment with a Zabbix Server, Proxy, and Agent. With these 3 blog posts you have seen how you can install and configure a complete Zabbix environment with Ansible. Keep in mind that the code shown was for demo purposes and is not something you can immediately use in a production environment. We also only used some of the available functionality of the Ansible collection for Zabbix; there are many more possibilities, like creating a maintenance period or a discovery rule. Not everything is possible though, so if you miss a task or functionality that a role should do or configure, please create an issue on Github so we can make it happen.

Don’t forget to execute the following command:

$ vagrant destroy -f

With this, we clean up our environment and delete our 4 nodes, thus finishing with the task at hand!

Installing and configuring the Zabbix Proxy

I wrote this blog post a while back for Zabbix itself, to be posted on their blog site, see: https://blog.zabbix.com/installing-and-configuring-the-zabbix-proxy/13319/ (there are a lot of interesting posts on there, so please check the blog often!)

In the previous blog post, we created a Zabbix Server setup, created several users, a media type, and an action. But today, we will install the Zabbix Proxy on a 3rd node. This Zabbix Proxy will have its database running on the same host, so the "node-3" host has both MySQL and the Zabbix Proxy running.

This blog post is the 2nd part of a 3 part series of blog posts where Werner Dijkerman gives us an example of how to set up your Zabbix infrastructure by using Ansible.
You can find part 1 of the blog post by clicking here.

A git repository containing the code used in these blog posts can be found on https://github.com/dj-wasabi/blog-installing-zabbix-with-ansible. Before we run Ansible, we need to open a shell on the "bastion" node. We do that with the following command:

$ vagrant ssh bastion

Once we have opened the shell, go to the “/ansible” directory where we have all of our Ansible files present.

$ cd /ansible

In the previous blog post, we executed the "zabbix-server.yml" playbook. Now we use the "zabbix-proxy.yml" playbook. This playbook will deploy a MySQL database on "node-3" and also install the Zabbix Proxy on the same host.

$ ansible-playbook -i hosts zabbix-proxy.yml

This playbook will run for a few minutes creating all services on the node. While it is running, we will explain some of the configuration options we have set.

The configuration which we will talk about can be found in the "/ansible/group_vars/zabbix_proxy" directory. This directory is only used when we deploy the Zabbix Proxy and contains 2 files: one called "secret" and one called "generic". It doesn't really matter what names the files in this directory have. I used a file called "secret" to let you know that this file contains secrets and should be encrypted with a tool like ansible-vault. As that is out of scope for this blog, I simply left the file in plain text. So how do we know that this directory is used for the Zabbix Proxy node?

In the previous blog post, we mentioned that with the "-i" argument we provide the location of the inventory file. This inventory file contains the hostnames and the groups that Ansible is using. If we open the inventory file "hosts", we can see a group called "zabbix_proxy", so Ansible uses the information in the "/ansible/group_vars/zabbix_proxy" directory as input for variables. But how does the "/ansible/zabbix-proxy.yml" file know which hosts or groups to use? At the beginning of this file, you will notice the following:

- hosts: zabbix_proxy
  become: true
  collections:
    - community.zabbix

Here you will see that the "hosts" key contains the value "zabbix_proxy". All tasks and roles that we have configured in this play will be applied to all hosts that are part of the zabbix_proxy group. In our case, only 1 host is part of the group. If you had, for example, 4 different datacenters and wanted a Zabbix Proxy running in each of them, executing this playbook would run on these 4 hosts and at the end of the run you would have 4 Zabbix Proxy servers running.

Within the "/ansible/group_vars/zabbix_proxy/generic" file, we have several options configured. Let's discuss the following options:

  • zabbix_server_host
  • zabbix_proxy_name
  • zabbix_api_create_proxy
  • zabbix_proxy_configfrequency

zabbix_server_host

The first one, the "zabbix_server_host" property, tells the Zabbix Proxy where it can find the Zabbix Server, allowing the Zabbix Proxy and the Zabbix Server to communicate with each other. Normally you would also have to configure the firewall (iptables or firewalld) to allow the traffic, but in this case there is no need for that: everything inside the environment we created with Vagrant has full access. When you deploy a production-like environment, don't forget to configure the firewall (currently firewall configuration is not yet available in the Ansible Zabbix Collection for either the Zabbix Server or the Zabbix Proxy, so for now you should create a playbook to configure the local firewall to allow/deny traffic).

As you will notice, we didn't configure the property with a literal IP address or FQDN. We use some Ansible notation instead, so we only keep the Zabbix Server information in one place. In this case, Ansible gets the information by reading the inventory file, looking for a host entry named "node-1" (which is the host running the Zabbix Server), and using the value of the property named "ansible_host" (which has the value "10.10.1.11").

zabbix_proxy_name

This is the name of the Zabbix Proxy host, which will be shown in the Zabbix frontend. We will see this later in this blog when we will create a new host to be monitored. When you create a new host, you can configure if that new host should be monitored by a proxy and if so, you will see this name.

zabbix_api_create_proxy

When we deploy the Zabbix Proxy role, we not only install the Zabbix Proxy package and its configuration file and start the service; we also perform an API call to the Zabbix Server to create a Zabbix Proxy entry. With this entry, we can configure hosts to be monitored via this new Zabbix Proxy.

zabbix_proxy_configfrequency

The last one is just for demonstration purposes. With a default installation/configuration of the Zabbix Proxy, this value is 3600, meaning the Zabbix Server sends the configuration to the Zabbix Proxy every 3600 seconds. Because we are running a small demo in this Vagrant setup, we have set this to 60 seconds.
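
Putting the four properties together, the "generic" file will look roughly like this (reconstructed from the explanation above, so check the repository for the real file):

zabbix_server_host: "{{ hostvars['node-1']['ansible_host'] }}"
zabbix_proxy_name: node-3
zabbix_api_create_proxy: true
zabbix_proxy_configfrequency: 60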

By now, the deployment of our Zabbix Proxy should be finished.

When we open the Zabbix Web interface again, we go to “Administration” and click on “Proxies”. Here we see the following:

We see an overview of all proxies available; in our case, we only have 1. We have "node-3" configured, which is in "Active" mode. When you want to configure a "Passive" mode proxy, you'll have to update one of the files in the "/ansible/group_vars/zabbix_proxy" directory and add the following entry somewhere: "zabbix_proxy_status: passive". Once you have updated and saved the file, rerun the "ansible-playbook -i hosts zabbix-proxy.yml" command. If you then recheck the page, you will notice that the proxy is now in "Passive" mode.

So let’s go to “Configuration” – “Hosts”. At the moment, you will only see 1 host, which is the “Zabbix server,” like in the following picture.

Let’s open the host creation page to demonstrate that you can now set the host to be monitored by a proxy. The actual creation of a host is something that we will do automatically when we deploy the Zabbix Agent with Ansible and not something we should do manually. 😉 As you will notice, you are able to click on the dropdown menu with the option “Monitored by proxy” and see the “node-3” appear. That is very good!

Summary

We have installed and configured both a Zabbix Server and a Zabbix Proxy, and we are all set now. For the Zabbix Proxy, we installed both the MySQL database and the Zabbix Proxy on the same node, whereas for the Zabbix Server we installed them on separate nodes. In the following blog post, we will install the Zabbix Agent on all nodes.

Installing the Zabbix Server with Ansible

I wrote this blog post a while back for Zabbix itself, to be posted on their blog site, see: https://blog.zabbix.com/installing-the-zabbix-server-with-ansible/13317/ (there are a lot of interesting posts on there, so please check the blog often!)

Today we are focusing more on the automation of installation and software configuration instead of using the manual approach. Installing and configuring software the manual way takes a lot more time, you can easily make more errors by forgetting steps or making typos, and it will probably be a bit boring when you need to do this for a large number of servers.

In this blog post, I will demonstrate how to install and configure a Zabbix environment with Ansible. Ansible has the potential to simplify many of your day-to-day tasks. As an alternative to Ansible, you may also opt to use Puppet, Chef, or SaltStack to install and configure your Zabbix environment.

Ansible does not have any specific infrastructure requirements for it to do its job. We just need to make sure that the user exists on the target host, preferably configured with SSH keys. With tools like Puppet or Chef, you need to have a server running somewhere, and you will need to deploy an agent on your nodes. You can learn more about Ansible here:  https://docs.ansible.com/ansible/latest/index.html.

This post is the first in a series of three articles. We will set up a (MySQL) database running on one node ("node-2") and a Zabbix Server, including the frontend, running on a separate node ("node-1"). Once we have built this, we will configure an action and a media type, and create some users. In the following image you can see the environment we will create.

The environment we will create.

In the 2nd blog post, we will set up a Zabbix Proxy and a MySQL database, both on a new node ("node-3"). In the 3rd blog post, we will install the Zabbix Agent on all 3 nodes we have been using so far and configure some user parameters. The Zabbix Agent on "node-3" will use the Zabbix Proxy, while the Zabbix Agents on "node-1" and "node-2" will be monitored by the Zabbix Server.

Preparations

A git repository containing the code used in these blog posts is available, which can be found on https://github.com/dj-wasabi/blog-installing-zabbix-with-ansible. Before we can do anything, we have to install Vagrant (https://www.vagrantup.com/downloads.html) and Virtualbox (https://www.virtualbox.org/wiki/Downloads). Once you have done that, please clone the earlier mentioned git repository somewhere on your host. For this demo, we will not run the Zabbix Frontend with TLS certificates.

We also have to update our local hosts file with the following line, so that we can access the Zabbix Frontend:

10.10.1.11 zabbix.example.com

In the root directory of the git repository you just cloned, you will find the Vagrantfile. This Vagrantfile contains the configuration of the virtual machines of our setup. We will create 4 virtual machines running Ubuntu 20.04, each with 1 CPU and 1 GB of RAM, which you can see in the first "config" block. In the 2nd config block, we configure our "bastion" host, which we discuss later. This node will get the IP 10.10.1.3, and we also mount the ansible directory into this virtual machine at "/ansible". For installing and configuring this node we use the playbook bastion.yml. With this playbook, we install some packages like Python, git and Ansible inside the bastion virtual machine.

The 3rd config block is part of a loop that creates and configures 3 virtual machines. Each virtual machine is also an Ubuntu node with its own IP (10.10.1.11 for the first node, 10.10.1.12 for the second and 10.10.1.13 for the third), and just like the "bastion" node, they each have 1 CPU and 1 GB of RAM.

You will have to execute the following command:

$ vagrant up

With this command, we start our virtual machines. This might take a while, as it will download a VirtualBox image containing Ubuntu. The "vagrant up" command starts the "bastion" node and all other nodes that are part of this demo. Once that is done, we need to open a shell on the "bastion" node:

$ vagrant ssh bastion

This "bastion" node is the node from which we will execute Ansible; we will not install any of the Zabbix components on this host. We have now opened a shell in the virtual machine we just created, comparable to making an "ssh" connection. We have to go to the following directory before we can download the dependencies:

$ cd /ansible

As mentioned before, we have to download the Ansible dependencies. The installation depends on several Ansible Roles and an Ansible Collection. With the Ansible Roles and the Ansible Collection, we can install MySQL, Apache, and the Zabbix components. We have to execute the following command to download the dependencies:

$ ansible-galaxy install -r requirements.yml
Starting galaxy role install process
- downloading role 'mysql', owned by geerlingguy
- downloading role from https://github.com/geerlingguy/ansible-role-mysql/archive/3.3.0.tar.gz
- extracting geerlingguy.mysql to /home/vagrant/.ansible/roles/geerlingguy.mysql
- geerlingguy.mysql (3.3.0) was installed successfully
- downloading role 'apache', owned by geerlingguy
- downloading role from https://github.com/geerlingguy/ansible-role-apache/archive/3.1.4.tar.gz
- extracting geerlingguy.apache to /home/vagrant/.ansible/roles/geerlingguy.apache
- geerlingguy.apache (3.1.4) was installed successfully
- extracting wdijkerman.php to /home/vagrant/.ansible/roles/wdijkerman.php
- wdijkerman.php was installed successfully
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Installing 'community.zabbix:1.2.0' to '/home/vagrant/.ansible/collections/ansible_collections/community/zabbix'
Created collection for community.zabbix at /home/vagrant/.ansible/collections/ansible_collections/community/zabbix
community.zabbix (1.2.0) was installed successfully

Your output may vary because versions might have been updated since this blog post was written. We have now downloaded the dependencies and are ready to install the rest of our environment. But why do we need to download a role for MySQL, Apache and PHP? A role contains all the necessary tasks and files to configure that specific service. So in the case of the MySQL Ansible role, it installs the MySQL server and all other packages that MySQL requires on the host, makes sure the mysqld service is created and running, but also creates the databases, creates and configures MySQL users and sets the root password. Using a role helps us install our environment without having to figure out how to install and configure a MySQL server manually.

So what about the collection, the Ansible community Zabbix collection? Ansible introduced this concept with Ansible 2.10; a collection is basically a bundle of plugins, modules and roles for a specific service. In our case, the Zabbix collection contains the roles for installing the Zabbix Server, Proxy, Agent, Java gateway and the frontend. It also contains a plugin to use a Zabbix environment as our inventory, and modules for creating resources in Zabbix. All of these modules work with the Zabbix API to configure resources like actions, triggers, groups, templates, proxies etc. Basically, everything we want to create and use can be done with a role or the collection.
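
Based on the roles and collection shown in the galaxy output above, the requirements.yml will look roughly like this (the versions may have moved on since this was written):

roles:
  - name: geerlingguy.mysql
    version: 3.3.0
  - name: geerlingguy.apache
    version: 3.1.4
  - name: wdijkerman.php

collections:
  - name: community.zabbix
    version: 1.2.0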

Installing Zabbix Server

Now we can execute the following command, which installs the MySQL database on "node-2" and the Zabbix Server on "node-1":

$ ansible-playbook -i hosts zabbix-server.yml

This might take a while, a minute or 10, depending on the performance of your host. We execute the "ansible-playbook" command, and with "-i" we provide the location of the inventory file. Here you see the contents of the inventory file:

[zabbix_server]
node-1 ansible_host=10.10.1.11

[zabbix_database]
node-2 ansible_host=10.10.1.12

[zabbix_proxy]
node-3 ansible_host=10.10.1.13

[database:children]
zabbix_database zabbix_proxy

This inventory file basically contains all of our nodes and the groups the hosts belong to. We can see in that file that there is a group called "zabbix_server" (the value between the [] square brackets is the name of the group) which contains the "node-1" host. Because we have a group called "zabbix_server", we also have a group_vars directory with that name containing some files. These hold all the properties (or variables) that will be used for all hosts (in our case, only "node-1") in the "zabbix_server" group.

Web Interface

Now you can open your favorite browser and open “zabbix.example.com”, and you will see the Zabbix login screen. Please enter the default credentials:

Username: Admin
Password: zabbix

On the dashboard, you will probably notice that it complains that it cannot connect to the Zabbix Agent running on the Zabbix Server, which is fine as we haven't installed it yet. We will do this in a later blog post.

Dashboard overview

When we go to “Administration” and click on “Media types,” we will see a media type called “A: Ops email.” That is the one we have created. We can open the “/ansible/zabbix-server.yml” file and go to line 33, where we have configured the creation of the Mediatype. In this case, we have configured multiple templates for sending emails via the “mail.example.com” SMTP server.

Now that we have seen the media type, we will look at the action we just created, which makes use of this media type. The action can be found in the "/ansible/zabbix-server.yml" file on line 69. When you go to "Configuration" and "Actions," you will see our created action "A: Send alerts to Admin". We don't want to run this in production, so for demonstration purposes we have set it to trigger when the severity is Information or higher.

And lastly, we will see that we have also created new internal users. Navigate to "Administration" – "Users," and you will see that we have created a user called "wdijkerman", which can be found in the "/ansible/zabbix-server.yml" file on line 95. This user is part of a group created earlier called "ops". The user type is Zabbix super admin, and we have configured the email media type to be used 24x7.

We have defined a default password for this user: "password". When you change the password in the Zabbix frontend UI, executing the playbook again will not change the password back to "password", so don't worry about it. But if you had removed, let's say, the "ops" group, then when you execute the playbook again, the group will be re-added to the user.

Summary

As you see, it is effortless to create and configure a Zabbix environment with Ansible. We didn’t have to do anything manually, and all installations and configurations were applied automatically when we executed the ansible-playbook command. You can find more information on either the Ansible page https://docs.ansible.com/ansible/latest/collections/community/zabbix/ or on the Github page https://github.com/ansible-collections/community.zabbix.

In the next post, we will install and configure the Zabbix Proxy.

The ethics of Pull Requests, improving the Pull Requests process

This is the 3rd and last part of a blog post series about Pull Request reviewing. I am writing this so I can give you my personal view on each side of the Pull Request saga, something I explain to every member joining my team. It consists of the following 3 parts:

We have discussed some aspects of being the author and the reviewer of Pull Requests. But are there any ways to improve the whole Pull Request process, so we can focus on what really matters: merging quality code that brings functionality someone wants to use? Whatever we do, we still have to create a PR as the author, and as a reviewer we still have to review it. But maybe we can do some things to make it easier for both parties, even before we create a Pull Request.

Small changes

Small changes in a code base are much easier to review. That does not mean you need to create, let's say, 5 Pull Requests with 1 line changed each, especially when those 5 lines need to work together for a specific functionality. But when you are working on something big, think about splitting the work into smaller pieces. Consider whether something like a toggle to enable/disable the new functionality can work, so you can spread the work over various Pull Requests while knowing that your new functionality is not used yet. Eventually you have to enable it, but you can work more easily across all the Pull Requests and incorporate feedback along the way. And 5 smaller Pull Requests are reviewed much more easily and quickly than 1 big Pull Request.

Pre-commit hooks

This is a bit technical, but it helps both you and the reviewer focus on what matters: the functionality you want to add. When the author does a "git commit" with pre-commit hook(s) configured, several scripts and/or commands are executed on the files that have changed. These can be, for example, linting commands that do some static analysis. If the linting fails, the "git commit" also fails, and you can fix the issue and try again.

For example, when you write Terraform code there is a tool called tfsec, which does static analysis for possible security related issues. If you create a Security Group in AWS and provide a cidr_range of "0.0.0.0/0", it will complain and fail the "git commit" execution, because the range is wide open. When this happens, you can either update it (because it is always easy to just use "0.0.0.0/0" and not think about a smaller subnet) or confirm that it is actually correct and add a line above it stating that the finding should be ignored (for example: "#tfsec:ignore:AWS009"). Once done, the next "git commit" will succeed.
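
As a hedged sketch, a minimal .pre-commit-config.yaml that runs tfsec before every commit could look like the following. It assumes tfsec is already installed on your machine and uses a "local" hook, so no hook repository details are needed:

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: tfsec
        name: tfsec static analysis
        entry: tfsec .
        language: system
        pass_filenames: false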

This was just one example of what is possible with pre-commit hooks; there are of course a lot more.

Pair programming

This is not for everyone, especially when everyone has to work from home during a pandemic and not everyone – especially in the beginning – likes someone looking "over their shoulder" while they code. People get nervous if they are "being watched" while coding, and it probably adds a bit of extra pressure because you don't want to make mistakes. But making mistakes is fine: you have a buddy as an extra set of eyes to help you code and spot possible issues, and you can easily discuss together which approach to take. As you are both working on it, you will both know what this new piece of functionality does, how it works and why it was created.

When you create a Pull Request you can either add a comment or update the description to mention that you both worked on it. This lets the reviewer know that there was already a 2nd pair of eyes on it and that most of the actual review work has been done, so approving the Pull Request is then a bit of a formality. But if it is just a formality, why not commit to master|main directly?

Well, I think you should never commit straight to master. Always create a branch, create a Pull Request and merge that. When committing to master|main directly, the functionality is only known to the two of you, and you can both still make errors. When you also have tests that are executed as part of the CI process on feature branches, you have extra confirmation that you won't break anything. And what I also think is very important about creating a Pull Request: it shows the rest of your team members what you both have worked on. Team member X knows that you are working on functionality Y and is simply informed (I like knowing at a high level what my team members are working on, we are a team right?), or maybe (s)he is working on similar functionality that could be affected.

So this is the end of my 3 posts on the ethics of Pull Requests. Are there any processes that I forgot on this page? Please let me know in the comments and I will gladly update this page.

The ethics of Pull Requests, being the “Reviewer”

This is the second of a 3 part blog post series about Pull Request reviewing. I am writing this so I can give you my personal view on each side of the Pull Request saga, something I explain to every member joining my team. It consists of the following 3 parts:

It is annoying, right? Someone has created some code and now you need to do something with it. Can't (s)he review it him/herself, now it costs my (precious) time that I cannot spend on my own work? We do this not to keep you from working, but to either keep the current quality of the code and/or to improve it when new functionality (and tests) is added. The author of the Pull Request would like to get feedback on his or her work and you were one of the chosen!

But let's not go into details on that; let us focus on what to actually do with a Pull Request and how we should act on it. With a Pull Request, you can either be the author (creating a Pull Request, because you have written code that you want to be merged) or a reviewer (the extra set of eyes taking a look at it). This blog post is about being the reviewer.

Don’t make it personal

We start with the most important one: don't make or take things personally. We are all people doing the best we can, (most) probably working for the same company and thus having the same goal: doing awesome work for an awesome software project for an awesome goal/service. So don't make comments like "You are doing this wrong.", "You are stupid, it is like …" or "Just approve it dumbsh*t" etc. If the author does not understand a comment you make, spend some extra time to help him/her by explaining it so (s)he will understand. If you work for the same company, have a (Zoom) call, drop by, or have a chat in person to explain it so everyone is on the same level. You should never start or take part in a flame war; it helps nobody and will only cause severe atmosphere issues in your team/community.

Understand what the Pull Request is about

So you have received an email that you were added as a reviewer, or someone has sent you a link to a Pull Request. The first thing you should do is understand why this Pull Request exists and what it solves. Check the user story/issue in the ticket system to understand what is needed, so you know how to proceed with the review. If the user story/issue only says something like "the documentation needs to be updated", then it won't make sense to comment about "the lack of tests"; but if it is about "implement functionality x", then you will probably know that you should expect something like documentation and (integration) tests next to the code.

Understand the change

Now we know why this Pull Request exists and everything is clear to us, so we can actually review it. For each file that is part of the Pull Request, try to understand what has been changed. If, for example, the documentation is updated, make sure that the documentation makes sense. Is it clear what the author is saying, do you think the targeted audience will understand what is documented, etc. The same goes for tests: are the newly added tests useful (testing the correct "thing"), or were they added just to "satisfy" someone's need for a lot of tests?

If things are unclear, just ask the author to clarify them.

Is it complete

Is the Pull Request complete? Does it include the proper documentation? Does it contain tests, and if tests are added, do they make sense? If there are no tests added where you think there should be, just ask for them: "Thank you for making the Pull Request. I see there is new functionality added, which I like very much, but I don't see any tests to validate it. Can you add them?" Does it contain proper logging? When logging has been added, is the amount enough, and does it make sense? Is it something you can use for monitoring purposes, or does it need to be monitored?

Commenting

Now you have found something that you think needs to be changed. What is the best way to do that? Well, firstly, do not make it personal! I do hope that was also your first thought ;-). And this is probably the most important and toughest part.

When you are a reviewer of a publicly available repository, it is a bit different than when you are commenting on Pull Requests from coworkers. First of all, thank the author for taking the time to create the Pull Request. (S)he did not have to create one ((s)he could also just have moved on), but did spend time to actually create one, so starting with a "thank you" is the very least you can do.

Be direct about what needs to be changed, but also provide some information on why it should be changed. It shows the author the reasoning behind the change and (s)he can learn from it.

Focus on what matters

I know it might sound silly, but focus on the things that matter. Do you really want to comment on a small typo in a comment in a script if it is totally clear what was meant? Do you really want to comment on a 2nd or 3rd empty line, or anything else related to the style or look of the code? Instead, focus on the actual code: does it work as you think it does? Are the tests sufficient, or do you think they can be improved? Is the documentation clear enough so that its audience understands it?

Decline is always an option

Most people don't like it when someone declines their Pull Request, probably because the author spent a lot of time implementing something and then someone just "decides" to decline it. But that is not how it works. As a reviewer you have, in a way, a lot of power over this Pull Request, so don't let it go to your head and "order" changes or decline it just to show your power. I tell everyone, including people that join my team, that I follow these rules when declining a Pull Request (and they should too):

  1. When the Pull Request does not make any sense at all in relation to the user story/issue. The user story says "a" and for some reason the author has implemented "z".
  2. When merging it would introduce a possible (security) issue once deployed to any environment. For example, when the build runs and is deployed to an environment like dev or test, certain functionality stops working, or an API endpoint is openly available when there should be some form of authentication in front of it.
  3. When there is no activity or progress on the Pull Request and it has already been open for a while. Sometimes, for example on Github, someone creates a Pull Request and then totally "forgets" it. So when changes are requested and there has been no activity for the last x amount of time, I will decline it (where "x" could be 4, 5 or more months).

But when you are a reviewer on a publicly available repository, it might just be possible that the author has created new functionality that you don’t want to have merged at all. As an example, I have several Ansible Roles that only work on the major Linux operating systems. But what if someone creates a Pull Request with changes so that a role also works on a Windows host? I don’t work with Windows, let alone have a Windows host available to test future changes against, so should I merge that Pull Request? If I merge that Pull Request, I am also responsible for maintaining it, and otherwise I can not keep the quality of my code to a specific standard.

But most importantly, when declining a Pull Request always provide a proper and good reason. A “this sucks” description is not acceptable, but I don’t have to tell you that. When you decline the Pull Request, have an open discussion with the author so that you are both on the same page.

Summary

So if you are doing reviews properly it might take a while to go through them, but it is better to spend time now, before it is merged, than after. As a reviewer you have a lot of control over whether the Pull Request will get merged or not, so don’t abuse that. Work together with the author in a constructive way to get the Pull Request merged if there are any improvements that need to be applied.

If you have any other ethics that you are missing on this page, please let me know and I can update it.

In the next post, we will dive into some processes that will hopefully make the whole Pull Request process a bit smoother.

The ethics of Pull Requests, being the “Author”

This is the first of a 3 part blogpost series about Pull Requests. I am writing this so I can give you my personal view on each side of the Pull Request saga, something that I explain to every member joining my team. It consists of the following 3 parts:

It is annoying, right: not committing on the master|main branch, having to work on a branch and create a Pull Request that people should review before it can be merged into the master|main branch. It all has a good reason, even though this process takes a little bit more time before your code is merged into the master|main branch. We do this not because it is cool or a fun thing to do, but to either keep the current quality of the code and/or to improve it when you add new functionality (tests).

But let’s not go into details on that; let us focus on what to actually do with a Pull Request and how we should act on it. With a Pull Request, you can either be the Author (creating a Pull Request, because you have written code that you want to be merged) or a Reviewer (you are the extra set of eyes taking a look at it). This blogpost is about being the Author.

Don’t make it personal

We start with the most important one: don’t make or take things personal towards others. We are all people doing the best we can, (most) probably working for the same company and thus having the same goal: doing awesome work for an awesome software project for an awesome goal/service. So don’t make comments to others like “This is way over your head”, “You are stupid, it is like …” or “Just approve it dumbsh*t” etc. I know, I exaggerate a bit, but you understand what I mean, right? If a reviewer does not understand the PR/change/etc., spend some extra time to help him/her by explaining it so (s)he will understand it. If you are working for the same company, have a (Zoom) call, drop by, or have a chat in person to explain it so everyone is on the “same level”.

You should never start or be part of a flame war; this helps nobody and will only cause a bad atmosphere in your team/community, with people either leaving or just stopping work on the project.

Don’t fall in love with your code

You probably did the best you could on this piece of code you have written. You might think it is your best work, which could be true, but keep in mind that a 1000 roads lead to Rome. This is also the case with code. You implemented 1 way of doing something, while there might be different ways to “get it done”. Just because you think it is your best code “evar”, it does not mean that it is the actual best way and can not be improved at all. You are most probably wrong. You wrote the code with the best intentions, based on the knowledge and experience you have on the given subject. A reviewer might have more or different experience and knowledge about the subject and is able to give new insights to improve your code. Be open to the suggestions that the reviewers are giving and, where needed, have direct contact/a chat with that person to explain/discuss it. When you are open to these comments, you are able to learn from them and it will help your (future) coding practices.

This also applies when someone declines your Pull Request. Don’t get mad or insult the reviewer; (s)he is just doing his/her job and, let’s be honest, you asked for their input! When the Pull Request is declined, have a chat/talk with the reviewer and discuss it properly. Try to understand the reasoning and try to resolve it correctly. If after the discussion the conclusion is that the reviewer was right, no worries: apply the reviewer’s comments and try again. If after the discussion the reviewer turns out to be wrong, no worries either: recreate the Pull Request. But when you have recreated the Pull Request, add a small comment with a (short) summary of the earlier decline so it is clear to everyone what the outcome was.

Reviewer is not always right

Just because a reviewer makes a comment about something, it doesn’t mean (s)he is automatically right. (s)he does this with the information (s)he has at that moment and makes a comment about it. (s)he does not know anything about the history of developing the functionality that has led to what you have right now. And maybe your Pull Request was the nth Pull Request that the reviewer was reviewing, so maybe (s)he was mixing things/knowledge up with other Pull Requests? In any case, talk/chat directly with the reviewer to sort it out, and when you have reached a conclusion together, update the Pull Request with a comment containing this information so everyone knows about it.

Do make the “why” clear

And this is something you should start with: create a good and understandable description of what the purpose of the Pull Request is. Provide for example a link to the userstory/issue with some information. When you have Jira and Bitbucket for example, you can create a branch in a git repository that is automatically linked to the userstory/issue, so even from the name of the branch it is immediately clear which issue you are working on.

When you are working on a publicly available repository, you only have access to the “issues” part of e.g. Github/Gitlab. Check if there is something like a contribution section in the readme, or maybe there is a specific document for it. Most of them explain what needs to be done to be a contributor to the repository, like creating an issue with a proper description and steps to reproduce (in case of a bug). Make sure that it is all clear for everyone, then create a Pull Request and make sure to link it to the issue you created earlier, or the issue that you picked up to provide a fix for.

So really make sure, before any reviewer starts reviewing, that it is clear to everyone why this Pull Request was created and what it solves.

Do make your own comments

There is nothing wrong with making comments on your own Pull Request. By this, I mean that you can make a comment on why you have taken a certain approach, or maybe add some background information that would help the reviewer to properly evaluate the Pull Request. While you were developing or working on the change, you might have made several different attempts to “get it done” before you arrived at the solution you have right now. By making a comment as the Author on your own Pull Request, you can describe these attempts, and it helps the reviewer understand why you took “this” approach so they won’t have to ask you.

Summary

There is maybe also a downside to doing Pull Requests: you can create 1 small Pull Request that took many hours or maybe days to complete, because you have tried different things which didn’t work. A reviewer will not see this, because (s)he only sees the end result and not the things that have led to this Pull Request.

And that is why I suggest focusing on the earlier mentioned ethics: make sure that the reviewer knows why the Pull Request was created, be open to suggestions in the comments from reviewers, and comment on your own Pull Request in the areas where you know it will either help the reviewer or where you know for sure someone will comment on it. This will make the whole Pull Request procedure a lot easier for everyone.

One of the most important aspects of working with Pull Requests is proper communication between the author and the reviewer(s). Be open to each other, respect each other’s opinions and keep in mind that the focus is on keeping the quality up, not on someone’s ego. Don’t use the comment section as a chat application; just contact the reviewer directly if there seems to be a discussion going on. And again, don’t make things personal. It helps nobody if you make things personal and it will only work the opposite way.

What do you think are the important ethics for the author that I missed? Please add them in the comments below and I will update this blogpost.

Signing Docker images with Notary server

This is the second of 2 blogs in which we do something with Docker and Security. In the first blogpost (Click), we started Clair and used a tool called clair-scanner to scan Docker images that are on your host. In this 2nd blogpost we will start a Registry and a Notary Server|Signer to sign Docker images. Notary allows us to sign images and we can configure the Docker daemon to only start containers from signed images.

For both blogposts, we will be using a sample configuration from the following Github repository: https://github.com/dj-wasabi/clair-and-docker-notary-example

This repository contains a docker-compose.yml file and some necessary files that are needed to run the applications. The docker-compose file contains the following applications:

  • Docker Registry v2
  • Notary (Server)
  • Notary (Signer)
  • Notary (DB)
  • Clair (DB)
  • Clair
  • Clair-scanner

The rest of the files are configuration files specific to these applications, and I provided some self-signed certificates. These SSL certificates should only be used for demo purposes.

Before we continue, let’s clone the Github repository and make sure you have Docker and docker-compose installed and running.

Why do we want to make use of a Notary server? Once you have Docker running, you are able to download all kinds of Docker images and run them. Some of them are official ones, like Debian, CentOS and/or Hashicorp’s Consul, but you can also download and run Docker images from someone else. You don’t know for sure what is installed and running in an image when you download one. In the previous blogpost we used Clair, which can help find vulnerabilities in a Docker image, but you still don’t know whether an image has been tampered with.

Notary will not fix that problem, as it doesn’t scan the image to see if it has been tampered with, but with Notary we are able to sign our own Docker images. When a specific environment variable is set, only these signed Docker images can be used to run on our host(s). If we then try to download a Docker image from, for example, the Docker hub, it will return an error message.

We need to prepare some things before we can start the containers. First we need to make sure we add some entries to the hosts file, so we can resolve the 2 FQDNs that are used for the Registry server and for the Notary server.

Add the following to the hosts file:

127.0.0.1     notary-server.example.local registry.example.local

Once we have done that, we have to copy a configuration file and the ca-root certificate to a directory in our home-dir.

mkdir -p ~/.notary && cp files/config/config.json files/certs/ca-root.crt ~/.notary

Now we are done preparing and we can start the containers. We start the Registry server and the Notary server. Starting the Notary server will also automatically start the Notary signer and the database, so don’t be confused when you see extra containers running.

docker-compose up -d notary-server registry-server

You can verify if Notary server is correctly started by executing the following command:

openssl s_client -connect notary-server.example.local:4443 -CAfile files/certs/ca-root.crt -no_ssl3 -no_ssl2

This will return some information about the SSL certificate that is configured for the Notary server. Example:

$ openssl s_client -connect notary-server.example.local:4443 -CAfile files/certs/ca-root.crt -no_ssl3 -no_ssl2
CONNECTED(00000005)
depth=1 C = EU, ST = Example, L = Example, O = Example, OU = Example, CN = ca.example.local, emailAddress = root@ca.example.local
verify return:1
depth=0 C = EU, ST = Example, O = Example, CN = notary-server.example.local
verify return:1
---
Certificate chain
 0 s:/C=EU/ST=Example/O=Example/CN=notary-server.example.local
   i:/C=EU/ST=Example/L=Example/O=Example/OU=Example/CN=ca.example.local/emailAddress=root@ca.example.local

So let’s pull an image and retag it so we can push it later on to our newly started Registry server. Let’s make sure the image has a tag like latest or 1.2.1.

docker pull wdijkerman/clair-scanner
docker tag wdijkerman/clair-scanner registry.example.local:5000/wdijkerman/clair-scanner:latest

But don’t push it yet, we first have to make sure that we have set some environment variables before doing so.

export DOCKER_CONTENT_TRUST_SERVER=https://notary-server.example.local:4443
export DOCKER_CONTENT_TRUST=1

With these 2 environment variables we enable Docker Content Trust, so when we want to do something with the image, it will be checked with the Notary server that is available on the provided URL.

The Registry server is configured with basic authentication, so we have to log in first:

docker login registry.example.local:5000

Username: admin
Password: password

Now we are ready and we can now push our newly tagged image to the Docker Registry:

$ docker push registry.example.local:5000/wdijkerman/clair-scanner:latest
The push refers to repository [registry.example.local:5000/wdijkerman/clair-scanner]
4737f34f33f3: Pushed
5ff3301a32f4: Pushed
7bff100f35cb: Pushed
latest: digest: sha256:2f876d115399b206181e8f185767f9d86a982780785f13eb62f982c958151a32 size: 946
Signing and pushing trust metadata
You are about to create a new root signing key passphrase. This passphrase
will be used to protect the most sensitive key in your signing system. Please
choose a long, complex passphrase and be careful to keep the password and the
key file itself secure and backed up. It is highly recommended that you use a
password manager to generate the passphrase and keep it safe. There will be no
way to recover this key. You can find the key in your config directory.
Enter passphrase for new root key with ID 98a3a53:
Repeat passphrase for new root key with ID 98a3a53:
Enter passphrase for new repository key with ID 3dd4fb6:
Repeat passphrase for new repository key with ID 3dd4fb6:
Finished initializing "registry.example.local:5000/wdijkerman/clair-scanner"
Successfully signed registry.example.local:5000/wdijkerman/clair-scanner:latest

Because this is the first time we push an image, it asks us to enter a passphrase for the root key and for the repository key. Generate passphrases and enter them when the push command prompts for them.

Ok, so now the Docker image is pushed to our Registry server and it is signed by the Notary server. We can verify this by executing the following command:

 $ notary -s ${DOCKER_CONTENT_TRUST_SERVER} -d ~/.docker/trust list registry.example.local:5000/wdijkerman/clair-scanner
NAME      DIGEST                                                              SIZE (BYTES)    ROLE
----      ------                                                              ------------    ----
latest    2f876d115399b206181e8f185767f9d86a982780785f13eb62f982c958151a32    946             targets

Here you see that we have a single image registered in the Notary server. The value in the DIGEST column is the same digest that was shown when we pushed the image.

Before we continue: in the home directory we have a hidden “.docker” directory. In one of the sub directories, the keys that were generated with the first push are stored. These are important, so make sure to back up these files. There is also a possibility to set some environment variables with the passphrases so you won’t have to back up these files, but I couldn’t find them yet.

$ ls -l ~/.docker/trust/private/
total 16
-rw-------  1 wdijkerman  staff  477 Feb 23 20:00 3dd4fb64fbd1524884b02fefde0771d0708082c70201511f15580b42244f37cf.key
-rw-------  1 wdijkerman  staff  416 Feb 23 20:00 98a3a53715f98652478ab6cf0c58f56a720956cc405292a72fb7a97fb0fb4618.key
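
In case it helps: Docker Content Trust also documents two environment variables with which the passphrases can be provided non-interactively, which would be useful in a CI pipeline. A minimal sketch, assuming the standard variable names and with hypothetical passphrases:

export DOCKER_CONTENT_TRUST_ROOT_PASSPHRASE="my-root-passphrase"
export DOCKER_CONTENT_TRUST_REPOSITORY_PASSPHRASE="my-repository-passphrase"
# With these set, the push/sign step should no longer prompt for the passphrases
docker push registry.example.local:5000/wdijkerman/clair-scanner:latest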

So let’s remove the 2 images from the host so we can pull them again later.

docker image rm registry.example.local:5000/wdijkerman/clair-scanner:latest
docker image rm wdijkerman/clair-scanner

And now we will do 2 pulls: 1 from our Registry server to verify that it just works, and after this we download an image from the Docker hub.

$ docker pull registry.example.local:5000/wdijkerman/clair-scanner:latest
Pull (1 of 1): registry.example.local:5000/wdijkerman/clair-scanner:latest@sha256:2f876d115399b206181e8f185767f9d86a982780785f13eb62f982c958151a32
sha256:2f876d115399b206181e8f185767f9d86a982780785f13eb62f982c958151a32: Pulling from wdijkerman/clair-scanner
cd784148e348: Pull complete
8297cc41e539: Pull complete
ef2f20c2497d: Pull complete
Digest: sha256:2f876d115399b206181e8f185767f9d86a982780785f13eb62f982c958151a32
Status: Downloaded newer image for registry.example.local:5000/wdijkerman/clair-scanner@sha256:2f876d115399b206181e8f185767f9d86a982780785f13eb62f982c958151a32
Tagging registry.example.local:5000/wdijkerman/clair-scanner@sha256:2f876d115399b206181e8f185767f9d86a982780785f13eb62f982c958151a32 as registry.example.local:5000/wdijkerman/clair-scanner:latest

And now we download an image from the Docker hub:

$ docker pull wdijkerman/clair-scanner:latest
Error: error contacting notary server: x509: certificate signed by unknown authority

Were you, just like me, happy that it failed? 🙂

As you can see, the pull from the Docker hub failed, as the image is not registered with the Notary server, so there is now no chance of running a container from any other source than our own Registry server. Mission accomplished.
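
If you ever do need to pull an unsigned image while keeping content trust enabled for everything else, the Docker CLI has a per-command escape hatch. A small sketch (use it consciously, as it bypasses exactly the protection we just set up):

# Bypass content trust for this single pull only
docker pull --disable-content-trust wdijkerman/clair-scanner:latest

# Or disable it for the current shell session
export DOCKER_CONTENT_TRUST=0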

Links

Docker Content Trust

Docker Notary Server

Docker Registry Server

Scanning Docker images with CoreOS Clair

This is the first of 2 blogs in which we do something with Docker and Security. In this first blogpost we will start Clair and use a tool called clair-scanner to scan Docker images that are on your host. In the 2nd blogpost we will start a Registry and a Notary Server|Signer to sign Docker images. Notary allows us to sign images and we can configure the Docker daemon to only start containers from signed images.

For both blogposts, we will be using a sample configuration from the following Github repository: https://github.com/dj-wasabi/clair-and-docker-notary-example

This repository contains a docker-compose.yml file and some necessary files that are needed to run the applications. The docker-compose file contains the following applications:

  • Docker Registry v2
  • Notary (Server)
  • Notary (Signer)
  • Notary (DB)
  • Clair (DB)
  • Clair
  • Clair-scanner

The rest of the files are configuration files specific to these applications, and I provided some self-signed certificates. These SSL certificates should only be used for demo purposes.

Before we continue, let’s clone the Github repository and make sure you have Docker and docker-compose installed and running.

Clair

Clair is an open source project for the static analysis of vulnerabilities in application containers (currently including appc and docker). Clair analyzes a layer to see if it finds any vulnerabilities, and if vulnerabilities are found, Clair will provide information about them. To let Clair scan these layers, we use a tool called “clair-scanner”. clair-scanner gets all layers of a Docker image on your host and provides these to Clair by uploading them 1-by-1. Once all layers have been scanned, clair-scanner will report the vulnerabilities (if there are any).

Lets start Clair by executing the following command:

docker-compose up -d clair

It will start a PostgreSQL container and the Clair container itself. Once Clair is started, it will fetch the vulnerabilities for the various operating systems that are configured in the file files/config/clair-config.yaml in the earlier mentioned repository. This might take a while (in my case it took 15 minutes).

The following is configured in the earlier mentioned configuration file:

  updater:
    interval: 1m
    enabledupdaters:
      - debian
      - ubuntu
      - rhel
      - oracle
      - alpine
      - suse

As you can see, Clair will download vulnerability information for the above mentioned operating systems.

Occasionally check the logfile of Clair (docker logs -f clair) and see if the following log messages appear:

{"Event":"could not get NVD data feed hash","Level":"warning","Location":"nvd.go:137","Time":"2019-01-26 20:19:59.682956","data feed name":"2018","error":"invalid .meta file format"}
{"Event":"could not get NVD data feed hash","Level":"warning","Location":"nvd.go:137","Time":"2019-01-26 20:19:59.682956","data feed name":"2019","error":"invalid .meta file format"}

You’ll see them for the years 2002 to 2019. Once these messages are logged, we are able to continue with scanning (a) Docker image(s).
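
Instead of watching the logs, you could also poll Clair's health endpoint. This is only a sketch that assumes the default Clair v2 health port (6061) is exposed by the docker-compose setup, so check the configuration first:

# Should print 200 once Clair is up and healthy (assuming health port 6061 is published)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:6061/health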

Some basic Clair information

Some information about Clair while we are waiting.

When you want to scan an image, Clair expects to analyze each layer of the Docker image. This kind of data needs to be POST’ed to the following endpoint: http://localhost:6060/v1/layers

Example of a POST data request (Can also be found on: https://coreos.com/clair/docs/latest/api_v1.html#layers)

{
  "Layer": {
    "Name": "523ef1d23f222195488575f52a39c729c76a8c5630c9a194139cb246fb212da6",
    "Path": "https://mystorage.com/layers/523ef1d23f222195488575f52a39c729c76a8c5630c9a194139cb246fb212da6/layer.tar",
    "Headers": {
      "Authorization": "Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiYWRtaW4iOnRydWV9.EkN-DOsnsuRjRO6BxXemmJDm3HbxrbRzXglbN2S4sOkopdU4IsDxTI8jO19W_A4K8ZPJijNLis4EZsHeY559a4DFOd50_OqgHGuERTqYZyuhtF39yxJPAjUESwxk2J5k_4zM3O-vtd1Ghyo4IbqKKSy6J9mTniYJPenn5-HIirE"
    },
    "ParentName": "140f9bdfeb9784cf8730e9dab5dd12fbd704151cf555ac8cae650451794e5ac2",
    "Format": "Docker"
  }
}

The most important keys are the Name, Path and the ParentName.

When an image has 3 layers (layer A is the base image, let’s say debian:latest, B installs a package and C adds a file), Clair first expects a POST request to the /v1/layers endpoint with the layer information of layer A. The Name should contain the SHA256 value of layer “A”, and the Path should contain a URL from which the layer can be downloaded. Clair will download the layer and run the actual analysis.

When layer A is analysed, layer B should be uploaded to the endpoint, but now the ParentName should contain the SHA256 value of layer A. When layer C is analysed, the ParentName should contain the SHA256 value of layer B, etc.

Yes, you read that correctly: Clair will download the layer from, for example, a Docker Registry. This means that an image should already have been pushed to a Docker Registry. But that is a bit too late, as you should only push an image to a Docker Registry if it doesn’t contain any vulnerabilities. This is where the clair-scanner tool comes into play. This tool basically starts a web server from which Clair can download the layers when analysing.
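
To make that flow a bit more concrete, a manual submission of layer B would look roughly like the sketch below. The SHA256 values are placeholders and in practice clair-scanner does this for you, serving the layer tarballs from its own web server (port 9279, as you will see in the scanner output later on):

# Submit layer B to Clair, referencing layer A as its parent (placeholder values)
curl -s -X POST http://localhost:6060/v1/layers \
  -H "Content-Type: application/json" \
  -d '{
        "Layer": {
          "Name": "<sha256-of-layer-B>",
          "Path": "http://<clair-scanner-host>:9279/<sha256-of-layer-B>/layer.tar",
          "ParentName": "<sha256-of-layer-A>",
          "Format": "Docker"
        }
      }'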

Once Clair has completely started we can continue. Let’s download a Docker image:

docker pull wdijkerman/consul

(You can of course also download another Docker image)

This is a very basic Alpine image running Consul and Python. So let’s check that Docker image.

 $ docker-compose run --rm clair-scanner wdijkerman/consul
2019/01/26 19:43:52 [INFO]  Start clair-scanner
2019/01/26 19:43:55 [INFO]  Server listening on port 9279
2019/01/26 19:43:55 [INFO]  Analyzing 5491dce778832e33c284cd8185100e76d6daa18f8cbc32458c706776894127fc
2019/01/26 19:43:55 [INFO]  Analyzing 28a9cc8dcad2060c54ae345db266ad00e4d84b1f7526e5186f93844eb3bb426e
2019/01/26 19:43:56 [INFO]  Analyzing 746b97d6fd172bacbe51699e383b5a47ceb3d779c3580b9dd35dfb7bd4a72a83
2019/01/26 19:43:56 [INFO]  Analyzing 1e04d30b4435c531eafe3d3b17155f3f3f4a9b4874ca1f1d3115ad273db43d1e
2019/01/26 19:43:56 [INFO]  Analyzing c3453aa5ff961a1d1710c2f110a788d796a5456241c664489e65fd269f0e1687
2019/01/26 19:43:56 [INFO]  Image [wdijkerman/consul] contains NO unapproved vulnerabilities

It shows us that there are 5 layers in this image and no vulnerabilities were detected! (This was at the time of writing this blogpost; new vulnerabilities can always be found later!)

This blogpost is not really successful if we only show things that are ok, so let’s check an image that contains 1 or more vulnerabilities.

Let’s check the postgres:latest image (it is part of the Clair installation, in case you were wondering why you already have that image). So let’s check that one.

 $ docker-compose run --rm clair-scanner postgres:latest
2019/01/26 19:25:46 [INFO]  Start clair-scanner
2019/01/26 19:25:54 [INFO]  Server listening on port 9279
2019/01/26 19:25:54 [INFO]  Analyzing 08bf86d6624450c487db18071224c88003d970848fb8c5b2b07df27e3f6869b2
2019/01/26 19:25:54 [INFO]  Analyzing f419c5f6b63090e31755da12d65829dfd90ac42b90c70a725fb5dc7856395fc7
2019/01/26 19:25:54 [INFO]  Analyzing 906fb3014e147615f2219607d99604bdc53d0a6cdb0f4886ebf99548df918073
2019/01/26 19:25:54 [INFO]  Analyzing 1439e9b10c58144ac2acb85fa9aab36127201d1b2550b45216a341fa32957d17
2019/01/26 19:25:54 [INFO]  Analyzing 75d637800b713ea9c0bcd3a19eed8c144598ef8477da147a50d10cd6e85d2919
2019/01/26 19:25:54 [INFO]  Analyzing d91867cee1db8a866d638ae1d66c8078abfd236cda83c7ba72a5d214c5c8c4a3
2019/01/26 19:25:54 [INFO]  Analyzing b7d5ec0a0cb0939be115288b61e074a17359f8eb283e0deab13d30b4c0a060e8
2019/01/26 19:25:54 [INFO]  Analyzing 130e7676deae310571cbc46260a81adfc9f0de8a8684bbc33077b12c388594b7
2019/01/26 19:25:54 [INFO]  Analyzing 2bc990d0b93546a555b6abd28c365a1383f58fb64fd36142c5a9a0cbd26131e2
2019/01/26 19:25:54 [INFO]  Analyzing f4dfa6837911fd604bedc0d96126c12b2209a87421a7a0f56b0781d507b0aca8
2019/01/26 19:25:54 [INFO]  Analyzing a9598ee0b475f1cafbe8f63d6c7243ca37da704b9496e2d08c164238e8d0be3c
2019/01/26 19:25:54 [INFO]  Analyzing 1622bb5b03dc3ce4a5af7f2f89c443ce749b54be0703b92d7822f8789cf79281
2019/01/26 19:25:54 [INFO]  Analyzing fb4daa3b039b8e9889bc9f3c675c4811c85d34746c8862c673fe3da1998ae08b
2019/01/26 19:25:54 [INFO]  Analyzing 0b6857f87b6965b43ed41cc7a54591b3697b6049603cbd1b760030045915e3de
2019/01/26 19:25:54 [WARN]  Image [postgres:latest] contains 86 total vulnerabilities
2019/01/26 19:25:54 [ERRO]  Image [postgres:latest] contains 86 unapproved vulnerabilities
+------------+-----------------------------+--------------+------------------------+--------------------------------------------------------------+
| STATUS     | CVE SEVERITY                | PACKAGE NAME | PACKAGE VERSION        | CVE DESCRIPTION                                              |
+------------+-----------------------------+--------------+------------------------+--------------------------------------------------------------+
| Unapproved | High CVE-2017-16997         | glibc        | 2.24-11+deb9u3         | elf/dl-load.c in the GNU C Library (aka glibc or libc6)      |
|            |                             |              |                        | 2.19 through 2.26 mishandles RPATH and RUNPATH containing    |
|            |                             |              |                        | $ORIGIN for a privileged (setuid or AT_SECURE) program,      |
|            |                             |              |                        | which allows local users to gain privileges via a Trojan     |
|            |                             |              |                        | horse library in the current working directory, related      |
|            |                             |              |                        | to the fillin_rpath and decompose_rpath functions.           |
|            |                             |              |                        | This is associated with misinterpretion of an empty          |
|            |                             |              |                        | RPATH/RUNPATH token as the "./" directory. NOTE: this        |
|            |                             |              |                        | configuration of RPATH/RUNPATH for a privileged program      |
|            |                             |              |                        | is apparently very uncommon; most likely, no such            |
|            |                             |              |                        | program is shipped with any common Linux distribution.       |
|            |                             |              |                        | https://security-tracker.debian.org/tracker/CVE-2017-16997   |
+------------+-----------------------------+--------------+------------------------+--------------------------------------------------------------+
| Unapproved | High CVE-2017-12424         | shadow       | 1:4.4-4.1              | In shadow before 4.5, the newusers tool could be             |
|            |                             |              |                        | made to manipulate internal data structures in ways          |
|            |                             |              |                        | unintended by the authors. Malformed input may lead          |
|            |                             |              |                        | to crashes (with a buffer overflow or other memory           |
|            |                             |              |                        | corruption) or other unspecified behaviors. This             |
|            |                             |              |                        | crosses a privilege boundary in, for example, certain        |
|            |                             |              |                        | web-hosting environments in which a Control Panel allows     |
|            |                             |              |                        | an unprivileged user account to create subaccounts.          |
|            |                             |              |                        | https://security-tracker.debian.org/tracker/CVE-2017-12424   |
+------------+-----------------------------+--------------+------------------------+--------------------------------------------------------------+

Oops, 86 vulnerabilities! Well, this Docker image might not be really safe to use, but in the end that is all up to you.

As you can see in the postgres example, there is a “STATUS” column in the output and all entries are “Unapproved”. Why is that? clair-scanner allows you to whitelist specific vulnerabilities when scanning images.

The clair-scanner tool exits with code 0 when no vulnerabilities are found and with a non-zero exit code when vulnerabilities are found. So if you run clair-scanner as part of your CI pipeline, the pipeline would fail. However, there can be good reasons to whitelist a vulnerability, and clair-scanner will not return a non-zero exit code when only whitelisted vulnerabilities are found.
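
Since docker-compose run passes the container's exit code back to the caller, a CI step could be as simple as this sketch:

# Fail the pipeline when clair-scanner reports unapproved vulnerabilities
if ! docker-compose run --rm clair-scanner wdijkerman/consul; then
    echo "Image contains unapproved vulnerabilities, not pushing it to the registry"
    exit 1
fi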

Example of a whitelist file.

generalwhitelist: #Approve CVE for any image
  CVE-2017-6055: XML
  CVE-2017-5586: OpenText
images:
  ubuntu: #Approve CVE only for ubuntu image, regardless of the version
    CVE-2017-5230: Java
    CVE-2017-5230: XSX
  alpine:
    CVE-2017-3261: SE

So we have 2 CVE vulnerabilities that we whitelist, no matter what base Docker OS image is used. For Ubuntu we whitelist 2 CVE’s and for Alpine 1. I’m not sure, but I would say that the XML and OpenText values are just a basic description of the package the CVE belongs to.
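
How the whitelist file is passed depends on how the clair-scanner service is wired up in the docker-compose file; the upstream clair-scanner CLI uses the -w flag for it. A sketch, assuming the whitelist file is available inside the clair-scanner container:

# Hypothetical invocation: scan postgres:latest against a whitelist file
docker-compose run --rm clair-scanner -w files/whitelist.yaml postgres:latest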

Summary

So with this blogpost we were able to start Clair and do some Docker image analysing with the clair-scanner tool. It showed us that the postgres image contains some vulnerabilities. You can now update your CI pipeline by adding a check that scans for vulnerabilities before pushing the image to a Docker Registry. In the next blogpost, we will start a secure Docker Registry and sign Docker images with the Notary Server and Signer.

Links

Clair
Clair-scanner

Continuous deployment of Ansible Roles

There are a lot of articles about Ansible and continuous deployment, but these are only about using Ansible as a tool to do continuous deployment. There is not much (well, I can’t really find any) about continuous deployment of code changes in Ansible Roles/playbooks themselves. But first: why would you want to do that?

Well, it is very easy to make changes in a role or a playbook and deploy that to (production) machine(s). I do hope that these changes are committed into the git repository (and pushed) so that changes on the host can be traced back to the code. Hopefully you didn’t make any errors in the playbook or role, so all will be fine during deployment and no unwanted downtime is caused, because nothing is tested.

When you are part of a team, this is a downside of using Ansible. It is very easy to make changes to a playbook or a role locally, not commit them to the repository, deploy them to a production server and continue like nothing happened. You can make agreements about these kinds of procedures on when and how to execute playbooks, but you always have that coworker that doesn’t (or only partly) want to follow procedures, or skips them because of a lack of time (“It has to be working this morning!” or any other lame excuse to not test your code before deployment).

When the team/server park grows bigger and/or the company you work for matures and even has an SLA, you can’t just deploy untested code anymore. You’ll have to make sure that the changes you made to the code are tested, like any other code. Application developers write unit tests for their code and the application is tested either by automated tests or by a test/QA team. Writing software for your infrastructure is not any different from application development: it all needs to be tested before you use it in production.

What I will describe in this blogpost is just a suggestion on how to do this. It might not be foolproof, or maybe there are other or better ways to do this, or … (fill in some other reason). As this is something that works for me, it might help you create your own pipeline. YMMV.

I haven’t looked at Ansible Tower or its open sourced version at all, so it might be that parts of, or maybe all of, what I am describing here can also be done with Tower.

Before we do anything, I’ll first describe what my Ansible setup looks like, so we have some background. All of my roles have their own git repository, including documentation and Jenkinsfiles. A Jenkinsfile is the Jenkins job configuration file that contains all steps that Jenkins will execute; it is Jenkins’ equivalent of the .travis.yml file (of Travis CI) and we will come back to this later. I also have 1 git repository that contains all Ansible data, like host_vars, group_vars and the inventory file.
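
To give you an idea, the layout I am describing looks roughly like this (a hypothetical sketch; the file names of the 3 Jenkinsfiles are made up and yours will differ):

ansible-access/                     # one git repository per role
├── molecule/default/tests/test_default.py
├── Jenkinsfile.molecule            # 1. Molecule tests
├── Jenkinsfile.staging             # 2. Staging deployment
└── Jenkinsfile.production          # 3. Production deployment

environment/                        # single repository with all Ansible data
├── hosts                           # inventory file
├── group_vars/
├── host_vars/
└── playbooks/
    └── ansible-access.yml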

I have a Jenkins instance running with the Docker plugin, and once a job is started, a Docker container will be started and the job will be executed inside this container. Once the job is done (whether it succeeded or failed doesn’t matter), the container and all data in this container are removed.

Jenkins Jobs

All my Ansible roles have 3 Jenkinsfiles stored in their git repository, for the following actions:

  1. Molecule Tests
  2. Staging deployment
  3. Production deployment

Molecule Tests

The first job tests the role with Molecule. With Molecule we create 1 or more Docker containers and the role is deployed to these containers. Once that is done, we do an idempotence check and with Testinfra we verify that the installation/configuration is done correctly. We can also execute some commands to verify that the deployed service is running correctly. Once these tests are completed, we can deploy the Ansible role without any problems. (On this page I have described some information on Molecule.)

This is what the Jenkinsfile looks like:

node() {
    try {
        stage ("Get Latest Code") {
            checkout scm
            sh 'git rev-parse HEAD > .git/commit-id'
        }
        stage ("Install Application Dependencies") {
            sh 'sudo pip install --upgrade ansible==${ANSIBLE_VERSION} molecule==${MOLECULE_VERSION} docker'
        }
        stage ("Executing Molecule lint") {
            sh 'molecule lint'
        }
        stage ("Executing Molecule create") {
            sh 'molecule create'
        }
        stage ("Executing Molecule converge") {
            sh 'molecule converge'
        }
        stage ("Executing Molecule idemotence") {
            sh 'molecule idempotence'
        }
        stage ("Executing Molecule verify") {
            sh 'molecule verify'
        }
        stage('Tag git'){
            def commit_id = readFile('.git/commit-id').trim()
            withEnv(["COMMIT_ID=${commit_id}"]){
                sh '''#!/bin/bash
                if [[ $(git tag | grep ${COMMIT_ID} | wc -l) -eq 1 ]]
                    then    echo "Tag already exists"
                    else    echo "Tag will be created"
                            git config user.name "jenkins"
                            git config user.email "jenkins@localhost"
                            git tag -a $COMMIT_ID -m "Added tagging"
                            git push --tags
                fi
                '''
            }
        }
        stage('Start Staging Job') {
            def commit_id = readFile('.git/commit-id').trim()
            withEnv(["COMMIT_ID=${commit_id}"]){
                build job: 'ansible-access-2-staging', wait: false, parameters: [string(name: 'COMMIT_ID', value: "${COMMIT_ID}") ]
            }
        }
    } catch(err) {
        currentBuild.result = "FAILURE"
        throw err
    }
}

The first stage of the job is the checkout of the source code from the git repository, so that we have the data in the container. We get the latest git commit id, because I use this id to create a tag in git once the Molecule tests succeed.

The first Molecule action is the lint: we do some linting on the role and test files to make sure they are compliant. If it finds some errors, the job fails quickly and we can fix them. Then it proceeds with the Molecule actions create, converge, idempotence and verify. Those who are familiar with Molecule will notice that I use a separate stage for each action and not 1 stage which executes molecule test.

Stages overview of Jenkins job.

I use separate stages with single commands so I can quickly see in which part the job fails and focus on that immediately, without going to the console output and scrolling down to see where it failed. After the Molecule verify stage, the Tag git stage is executed. This uses the latest commit id as the tag, so I know that this tag was created by Jenkins after a successful build.

The last stage in the job starts the 2nd job in Jenkins. This stage will start the job ansible-access-2-staging with the COMMIT_ID as a parameter and in the background (wait: false).

Currently, the Molecule configuration only has 1 “default” scenario. If I had more scenarios, the Jenkinsfile would probably have a lot more stages, or maybe there would be more Jenkinsfiles.
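
For reference, if there were more scenarios, each one could get its own stage by passing the scenario name to Molecule. A small sketch with a made-up scenario name:

# Run a specific (hypothetical) scenario instead of the default one
molecule test -s other-scenario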

Staging deployment

The first job was executed correctly and now this job is triggered. As mentioned before, the commit id of the previous job is passed into this job. The goal of this job is to deploy the role to a staging server and validate that everything is still working correctly. In this case we will execute the same tests on the staging server as we did with Molecule, but we could also create another test file and use that. In my case there is only one staging server, but it could also be a group of servers.

The Jenkinsfile for this job looks like this:

node() {
    try {
        stage ("Get the Code") {
            checkout scm: [$class: 'GitSCM', branches: [[name: "refs/tags/${params.COMMIT_ID}"]], extensions: [[$class: 'RelativeTargetDirectory', relativeTargetDir: 'ansible-access']], userRemoteConfigs: [[url: 'ssh://git@192.168.1.206:2222/ansible/access.git']]]
            checkout scm: [$class: 'GitSCM', branches: [[name: '*/master']], extensions: [[$class: 'RelativeTargetDirectory', relativeTargetDir: 'environment']], doGenerateSubmoduleConfigurations: false, userRemoteConfigs: [[url: 'ssh://git@192.168.1.206:2222/ansible/environment.git']]]
            sh 'pwd > workspace'
        }
        stage ("Install Application Dependencies") {
            sh 'sudo pip install --upgrade ansible==${ANSIBLE_VERSION} testinfra docker'
        }
        stage ("Execute role on host(s)") {
            dir("environment") {
                sh "ansible-playbook -i hosts -l staging playbooks/ansible-access.yml"
            }
        }
        stage ("Test Role execution") {
            workspace = readFile('workspace').trim()
            withEnv(["WORKSPACE_DIR=${workspace}", "MOLECULE_INVENTORY_FILE=${workspace}/environment/hosts"]){
                dir("environment/") {
                    sh "testinfra --connection=ansible --ansible-inventory=hosts --hosts=staging ${WORKSPACE_DIR}/ansible-access/molecule/default/tests/test_default.py --verbose"
                }
            }
        }
        stage('Tag git'){
            withEnv(["COMMIT_ID=${params.COMMIT_ID}_staging"]){
                dir("ansible-access/") {
                    sh '''#!/bin/bash
                    if [[ $(git tag | grep "${COMMIT_ID}" | wc -l) -eq 1 ]]
                        then    echo "Tag already exists"
                        else    echo "Tag will be created"
                                git config user.name "jenkins"
                                git config user.email "jenkins@localhost"
                                git tag -a $COMMIT_ID -m "Added tagging"
                                git push --tags
                    fi
                    '''
                }
            }
        }
        stage('Start Production Job') {
            build job: 'ansible-access-3-production', wait: false, parameters: [string(name: 'COMMIT_ID', value: "${params.COMMIT_ID}") ]
        }
    } catch(err) {
        currentBuild.result = "FAILURE"
        throw err
    }
}

The first stage checks out 2 git repositories: the Ansible role and my “environment” repository that contains all Ansible data; both are stored in their own sub directory. For the Ansible role we check out the provided tag refs/tags/${params.COMMIT_ID}. I also had to configure the url for the git repositories. The last step is to create a file that holds the output of the pwd command; we need this location in a later stage.

The 2nd stage installs the required applications, so that is not very interesting. The 3rd stage executes the playbook. In my “environment” repository (that holds all Ansible data) there is a playbooks directory, and that directory contains the playbooks for the roles. For deploying the ansible-access role, a playbook named ansible-access.yml is present and will be used to install the role on the host:

---
- hosts: all:!localhost
  become: True
  roles:
    - role: ansible-access

Very basic/simple. The 4th stage executes the Testinfra test script from the molecule directory against the staging server to verify the correct installation/configuration. In this case I used the same tests as Molecule, but I could also create a separate file with some extra or other tests to verify the correct behaviour of the host.

And when all tests are complete, we create a new tag ${params.COMMIT_ID}_staging and push it, so we know that the provided tag is deployed to our staging server.

With the last stage, we start the 3rd and last job, the job to deploy the role on the rest of the servers.

Production deployment

This is the job that deploys the Ansible role to the rest of the servers. This Jenkinsfile looks almost the same as the previous one, but with a few exceptions.

node() {
    try {
        stage ("Get the Code") {
            checkout scm: [$class: 'GitSCM', branches: [[name: "refs/tags/${params.COMMIT_ID}"]], extensions: [[$class: 'RelativeTargetDirectory', relativeTargetDir: 'ansible-access']], userRemoteConfigs: [[url: 'ssh://git@192.168.1.206:2222/ansible/access.git']]]
            checkout scm: [$class: 'GitSCM', branches: [[name: '*/master']], extensions: [[$class: 'RelativeTargetDirectory', relativeTargetDir: 'environment']], doGenerateSubmoduleConfigurations: false, userRemoteConfigs: [[url: 'ssh://git@192.168.1.206:2222/ansible/environment.git']]]
            sh 'pwd > workspace'
        }
        stage ("Install Application Dependencies") {
            sh 'sudo pip install --upgrade ansible==${ANSIBLE_VERSION} testinfra docker'
        }
        stage ("Execute role on host(s)") {
            dir("environment") {
                sh "ansible-playbook -i hosts -l 'all:!localhost:!staging' playbooks/ansible-access.yml"
            }
        }
        stage ("Test Role execution") {
            workspace = readFile('workspace').trim()
            withEnv(["WORKSPACE_DIR=${workspace}", "MOLECULE_INVENTORY_FILE=${workspace}/environment/hosts"]){
                dir("environment") {
                    sh "testinfra --connection=ansible --ansible-inventory=hosts --hosts='all:!localhost:!staging' ${WORKSPACE_DIR}/ansible-access/molecule/default/tests/test_default.py --verbose"
                }
            }
        }
        stage('Tag git'){
            withEnv(["COMMIT_ID=${params.COMMIT_ID}_production"]){
                dir("ansible-access") {
                    sh '''#!/bin/bash
                    if [[ $(git tag | grep "${COMMIT_ID}" | wc -l) -eq 1 ]]
                        then    echo "Tag already exists"
                        else    echo "Tag will be created"
                                git config user.name "jenkins"
                                git config user.email "jenkins@localhost"
                                git tag -a $COMMIT_ID -m "Added tagging"
                                git push --tags
                    fi
                    '''
                }
            }
        }
    } catch(err) {
        currentBuild.result = "FAILURE"
        throw err
    }
}

In the 3rd stage “Execute role on host(s)” we use a different limit. We now use all:!localhost:!staging to deploy to all hosts, except localhost and staging. The same goes for the 4th stage, for executing the tests. As the last stage in the job, we create a tag ${params.COMMIT_ID}_production and push it. Once we see this tag in our repository, we know that the change is installed correctly on all servers.

Keep in mind that this can only be successful if you use proper and correct tests. You’ll really need to be sure that your tests cover all of the components that are changed by your role. This deployment will fail or succeed with the quality of your tests.

Good luck and if you have suggestions please let me know.