Introduction to Puppet

I wrote this introduction to Puppet because I needed to explain how Puppet works, and I will try to update it with more information when I have time. I strongly suggest that you read the official Puppet documentation to learn the language syntax, which I don’t explain at all. For people who are used to writing scripts, it should not be a problem to understand what’s going on and pick up the syntax while reading.

Definitions:

  • node: a host/server on which you run Puppet
  • agent: the Puppet Agent process that runs on your nodes
  • master: the server on which you run your Puppet Master or the Puppet master process

From What is Puppet

“Puppet is IT automation software that helps system administrators manage infrastructure throughout its lifecycle, from provisioning and configuration to orchestration and reporting. Using Puppet, you can easily automate repetitive tasks, quickly deploy critical applications, and proactively manage change, scaling from 10s of servers to 1000s, on-premise or in the cloud.”

Puppet Configuration

puppet.conf

This file configures the Puppet executables (agent and master) and is normally not changed, once configured and running. The file is usually installed by the installation package from Puppet or provided by the Operating System.

auth.conf

TODO: Write about auth.conf

manifests/

This is where the Puppet manifests are stored and where the primary manifest file site.pp is found.

site.pp

The site.pp manifest is the first file loaded and usually includes other files, to split things up and make it easier to read and maintain. It is not unusual to include external files where you declare the nodes or your classes. In the example below we include all *.pp files in the nodes/ and classes/ folders. The naming is not important, and you could have used any other folder names, but I usually use this convention.

# /etc/puppet/manifests/site.pp

# Ignore this one for now...
# Automatic include classes defined by Hiera 
# (must be written before any imports below)
hiera_include('classes', '')

# Import extra manifests
import "nodes/*.pp"
import "classes/*.pp"

nodes/

In here you declare the nodes on which you run Puppet and define which classes (i.e. which configuration) each node should have. A node declaration is matched against the node's hostname, or part of it. That means that you can specify ‘server1’ or ‘server1.domain.com’ and so on. You can also use regular expressions to match hostnames, and use a common node definition for many similar hosts.

# /etc/puppet/manifests/nodes/test_nodes.pp

# If a node is not matched against any node declaration, 
# it will use this 'default'. It is not required to have 
# this 'default' node declaration, but if you don't then 
# the puppet agent on "unknown" nodes will fail to run.
node default inherits basenode {
}

# This is not a real node, but can be inherited by other nodes.
# You could use it to install a common set of classes for all 
# or many nodes.
node basenode {
  include basetools
  include sudo
  include ntp
}


# Test node that inherits basenode, and includes its own
# specific classes
node 'opensuse-122' inherits basenode {

   # include other modules specific to this node only
   include suse-cleanup
}

# Here we match the node hostname against a regular expression 
node /^apache.*/ inherits basenode {

    # Install apache for all nodes named 'apache*'
    include apache
}

classes/

From the Puppet docs …

Classes are named blocks of Puppet code, which are stored in modules for later use and are not applied until they are invoked by name. They can be added to a node’s catalog by either declaring them in your manifests or by assigning them from an ENC.

Classes generally configure large or medium-sized chunks of functionality, such as all of the packages, config files, and services needed to run an application.

You can create a class in any .pp file that is imported from site.pp and then include it for a single node or many nodes (by simple node inheritance or with an ENC, as we show with Hiera later). I like to keep my classes in their own folder. Best practice is to confine configuration that belongs together (e.g. Apache configuration) in its own module, but in theory there could be exceptions where you want to create a class outside a module. In this example we make a class that uninstalls a software package called “kdebase”.

# manifests/classes/suse-cleanup.pp
class suse-cleanup {
  package {
    "kdebase":
      ensure => absent;
  }
}

Or we could make a class to set up some standard files and install some basic packages depending on the node's operating system.

# manifests/classes/basetools.pp
class basetools {

  # Define package array based on Operating System
  case $operatingsystem {
    "FreeBSD": {
      $packages = ["wget", "rsync", "subversion" ]
    }
    /(Ubuntu|Debian)/: {
      $packages = ["rsync", "subversion" ]
    }
    "OpenSuSE": {
      $packages = ["rsync", "subversion", "vim" ]
    }
    default: { notice("unsupported os") }
  }

  # Make sure the packages defined above  are installed
  package {
    $packages:
      ensure => installed;
  }

  # Example: a plain file from files/ folder in puppet gets
  # copied to the node
  file { "/etc/motd":
    mode   => '0644',
    owner  => 'root',
    group  => 'root',
    source => 'puppet:///files/motd',
  }

  # Example: a file based on a template (parsed/generated) gets
  # copied to the node 
  file { "/etc/hosts":
    mode    => '0644',
    owner   => 'root',
    group   => 'root',
    content => template("hosts.erb"),
  }

  # Example: Make sure a cron job is running
  cron { "update_svn_repo":
    ensure  => present,
    command => 'cd /some/path && svn up >/dev/null 2>&1',
    user    => root,
    hour    => '*/1',
    minute  => '59',
    require => Package["subversion"],           
  }

}

files/

The files in this folder can be “copied” to the nodes with the file { } declaration in a class. Modules have their own files/ folder, but this one is global and is normally used for files that do not belong to a module, or to supply your own custom files to modules.

From the previous example we would place the motd file in this directory.

# files/motd
Welcome to our network.
Unauthorized access strictly prohibited.

templates/

Templates are like files, but will be parsed with Ruby’s ERB templating system. This means you can refer to variables from Puppet or do basic conditional statements within the template. All templates should end with the .erb file-extension.

From the previous example, the template used to generate /etc/hosts file would belong here.

# templates/hosts.erb
127.0.0.1   localhost
127.0.1.1   <%= @hostname %>.mydomain.com <%= @hostname %>
10.2.2.1    apache01.mydomain.com apache01

We use <%= %> to get the output of the variable (or function) inserted into the file produced by the template.
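The conditional statements mentioned above can be tried without creating a template file. The following is a minimal sketch, assuming Puppet 3.x and the standard Facter facts virtual and hostname; the resource title 'host_banner' is a made-up name for illustration. It uses the built-in inline_template() function, which runs a string through ERB the same way template() does with a file.

```puppet
# Sketch only: 'host_banner' is a hypothetical name.
# inline_template() evaluates ERB from a string instead of a file.
# <% %> runs Ruby code without output, <%= %> inserts the result.
$banner = inline_template("<% if @virtual == 'vmware' %>virtual<% else %>physical<% end %> host <%= @hostname %>")

# Show the generated text during the Puppet run.
notify { 'host_banner':
  message => $banner,
}
```

The same <% if %> ... <% end %> construct works inside a .erb file in the templates/ folder.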

Modules

From Module Fundamentals

Modules are self-contained bundles of code and data. You can write your own modules or you can download pre-built modules from Puppet Labs’ online collection, the Puppet Forge. Nearly all Puppet manifests belong in modules. The sole exception is the main site.pp manifest, which contains site-wide and node-specific code.

Puppet modules are libraries

Much like string.h provides everything you need to manipulate strings in C, your Puppet modules should provide everything needed to manage a service out of the box. By that I mean, I want to pull down your module to enable the functionality I need in Puppet without modifying your module at all.

Example: when writing a module to install and configure Apache, make the module generic and do not hard-code any ‘Company’ specifics. To support different kinds of vhost configurations, you could pass a $template=wordpress_vhost argument and provide your own template files (outside the module, e.g. in /etc/puppet/templates/apache/) that suit your needs.

A module has its own manifests, files and templates folders, which work exactly like the global ones that we just covered.
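To make the idea of a generic module with a pluggable template more concrete, here is a rough sketch. Everything in it is an assumption for illustration: the apache::vhost define, its parameters, the target path under /etc/apache2 and the template file names are hypothetical and not taken from any existing module.

```puppet
# modules/apache/manifests/vhost.pp
# Sketch only: define name, parameters and paths are hypothetical.
define apache::vhost (
  $docroot,
  $template = 'apache/vhost.erb'
) {
  # The default template lives inside the module
  # (modules/apache/templates/vhost.erb). Callers can point to
  # their own template without touching the module itself.
  file { "/etc/apache2/sites-enabled/${name}.conf":
    ensure  => file,
    owner   => 'root',
    group   => 'root',
    mode    => '0644',
    content => template($template),
  }
}

# In a node definition or a site-specific class (a separate file):
apache::vhost { 'blog.example.com':
  docroot  => '/var/www/blog',
  template => '/etc/puppet/templates/apache/wordpress_vhost.erb',
}
```

Passing an absolute path to template() keeps the company-specific template outside the module, so the module can be updated or replaced without losing local changes.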

TODO: more details here.

Running Puppet

You have two options for running Puppet: the traditional way with a dedicated Puppet master, or without one. There are pros and cons to both, but if you need to scale to a really large number of nodes, going masterless is often the better option, since no central master has to compile a catalog for every node.

With Master

You have a dedicated server where you run the Puppet master. The server listens on a configured TCP port (default port is 8140), which of course must be network accessible from the nodes. To speed things up, you can run the puppet-master behind Apache with mod_passenger or similar, but this is not required. Communication between the master and an agent node is SSL encrypted, and certificates are created the first time you connect an agent to the master. It is possible to configure the puppet-master to auto-sign new agent SSL certificates, or you can do so manually on the master (puppet cert sign ‘node-name’).

Masterless Puppet

On masterless nodes, you have to copy/sync the Puppet files (usually /etc/puppet) to each node (e.g. with rsync, svn or git) and run Puppet locally (puppet apply -v /etc/puppet/manifests/site.pp), either manually or from cron.
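The cron variant can itself be managed by Puppet. The following is a minimal sketch, assuming the puppet executable is installed at /usr/bin/puppet and that a run every 30 minutes is acceptable; the resource title and the schedule are arbitrary choices, not recommendations.

```puppet
# Sketch only: title, schedule and binary path are assumptions.
# Once applied, the node keeps re-applying its local manifests.
cron { 'puppet_apply':
  ensure  => present,
  user    => 'root',
  minute  => '*/30',
  command => '/usr/bin/puppet apply /etc/puppet/manifests/site.pp >/dev/null 2>&1',
}
```

You would include this resource in a class that every masterless node gets, and combine it with whatever mechanism you use to sync /etc/puppet to the nodes.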

Reporting

TODO: write about the reporting features of Puppet.

Backup / Filestore

TODO: write about the filestore, where Puppet keeps revisions/backups of changed files.

Variables

TODO: Write about Facter

Hiera

Hiera can search through all the tiers in a hierarchy and merge the result into a single array. This is used in the hiera-puppet project to replace External Node Classifiers (ENC) by creating a Hiera compatible include function.

In the recent (3.x) versions of Puppet, Hiera is included, and is now the preferred way to provide “external” configuration for nodes.

Configuration

hiera.yaml

Hiera (when used through Puppet) will use the file hiera.yaml in the Puppet $confdir directory. In this file you define one or more backends (yaml, json, or both in order) and configure the hierarchy search paths. The important part is the hierarchy, which tells Hiera where and in what order to look for configuration files. The variables referenced below (e.g. %{hostname}) are provided by Puppet, and are the same ones that you can use in your classes or templates.

:hierarchy:
  - "node/%{hostname}"
  - "virtual/%{virtual}"
  - "osfamily/%{osfamily}"
  - common      

hierarchy files

The hierarchy files set variables or hashes which will be available for Puppet (and modules) to use. The files are merged so that you can have default values in the last file (usually called common) and override them in the more specific files ordered above it (e.g. node/apache01.yaml).

In our example we set up some standard configuration variables in the common hierarchy file, but we also define an array of classes to load. If you refer to the site.pp file mentioned earlier, you will notice the line with hiera_include('classes', ''). This tells Puppet to include any classes defined in Hiera. The second argument is the default: if classes is not defined anywhere in the hierarchy, the function gets an empty string instead. This argument is not required, but without it a missing classes value would result in an error. You could use another name instead of classes; just make sure to use the same name in both the hiera_include() call and in the hierarchy files.

# hiera/common.yaml
---
classes: [ 'base', 'rsyslog', 'ntp' ]
syslog_server: my-dk-syslog.domain.com
ntp_server: dk.pool.ntp.org

Example: in common above you could define a “default” syslog server (e.g. at the primary facility). For the node called ‘apache01-us’ (which would match the node/apache01-us.yaml Hiera file) you could override the syslog server (e.g. to use a nearby syslog server).

# hiera/node/apache01-us.yaml
---
syslog_server: my-us-syslog.domain.com
ntp_server: us.pool.ntp.org    
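To show how a class might consume these values, here is a small sketch. The class name rsyslog_client, the file path and the fallback value 'localhost' are assumptions made up for this example; only the hiera() lookup function and the syslog_server key come from the setup described here.

```puppet
# Sketch only: class name, file path and fallback are hypothetical.
class rsyslog_client {

  # hiera(key, default) returns the value from the most specific
  # hierarchy level that defines the key.
  $syslog_server = hiera('syslog_server', 'localhost')

  # Forward all log messages to the server selected above.
  file { '/etc/rsyslog.d/remote.conf':
    ensure  => file,
    owner   => 'root',
    group   => 'root',
    mode    => '0644',
    content => "*.* @${syslog_server}\n",
  }
}
```

With the example files above, the node apache01-us would forward to my-us-syslog.domain.com, while any node without its own hierarchy file would fall back to the value in common.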

If you are using virtual servers on e.g. VMware, the Hiera %{virtual} variable would be set to vmware and you could set specific options for these kinds of nodes. In the example below we want VMware nodes to include the ‘vmware’ class.

# hiera/virtual/vmware.yaml
---
classes: [ 'vmware' ]

Testing Hiera

To test hiera and see the outcome for a value, you can run the hiera command as shown below. This example uses our example hierarchy files shown before:

$ hiera -d -c /etc/puppet/hiera.yaml syslog_server
"my-dk-syslog.domain.com"

To see the outcome when a variable is set, you can provide it as an optional argument. Our hierarchy uses %{hostname} for the node level, so that is the variable to set:

$ hiera -d -c /etc/puppet/hiera.yaml syslog_server hostname=apache01-us
"my-us-syslog.domain.com"

Let’s try with the classes array value, that we also used in the example hierarchy files:

$ hiera -d -c /etc/puppet/hiera.yaml classes
["base", "rsyslog", "ntp"]

As expected, we see the list of classes typed into our common file. But what happens if we “fake” a VMware server and tell Hiera that the %{virtual} variable is set to vmware?

$ hiera -d -c /etc/puppet/hiera.yaml classes virtual=vmware
["vmware"]

Does this mean that a VMware server will not get the ‘base’, ‘rsyslog’ and ‘ntp’ classes? Luckily, no. The hiera_include() function merges the array values from every matching level of the hierarchy, so Puppet sees all four classes (‘base’, ‘rsyslog’, ‘ntp’ and ‘vmware’) and works as expected.
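If you want to inspect the merged list from within a manifest without including anything, a sketch like the following could be used. It assumes Puppet 3.x with Hiera configured as above; the variable name $merged_classes is made up for the example.

```puppet
# Sketch only: $merged_classes is a hypothetical variable name.
# hiera_array() collects the 'classes' value from every matching
# hierarchy level into one flat array, instead of stopping at the
# first match like hiera() does.
$merged_classes = hiera_array('classes', [])

# Write the merged list to the log when the catalog is compiled.
notice($merged_classes)
```

This is the same kind of array merge lookup that hiera_include() performs before it includes the classes.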