Managing top.sls and SLS files in SaltStack

Recently I inherited a 90 VM cluster which I have been converting from Puppet to SaltStack. I’ve also needed to build a few new systems to replace obsolete platforms like CentOS 6 and bring them up to a more “modern” release of Linux. In the process of building these systems, I’ve moved away from CentOS and over to Oracle Linux 8.6.

For each new VM I’ve been installing the Salt Minion and building the system purely from Salt scripts (and the odd “run once” shell script). This has largely been successful, with a few edge cases needing multiple custom scripts deployed at initial build time. As a rule, I put setup and “run once” scripts into /root and push them out with SaltStack, so at least the script is under config management and not randomly left on a server.

What I encountered was a lot of repetitive SLS scripts needed to initialize and maintain a system. Initially I placed each SLS script call in the /srv/salt/top.sls file for the High State, but it soon became apparent that this yielded a large file with copious amounts of repetitive lines for various related systems.

Common Scripts (roles)

Splitting out the common roles and functionality into self-contained packages like “/srv/salt/roles/resolv-conf” meant that they became easy to maintain and deploy. The following list shows some of the common roles I ended up with:

  • roles.resolv-conf – /etc/resolv.conf file download
  • roles.chrony – deploy chrony package, conf file and restart the service on changes
  • roles.snmp – install net-snmp, download the corporate config and restart the service
  • roles.firewalld.http – add port 80 to public zone
  • roles.firewalld.https – add port 443 to firewalld public zone
  • roles.firewalld.prtg – download a PRTG specific zone
  • roles.firewalld.logstash – add the logstash port to the public zone
  • etc.

I split the firewalld roles into a common directory and this made them more modular and easy to maintain. Anything bespoke to a system, like a web server running on a unique port, went into the server’s own directory as a standalone SLS file called via init.sls (example below).
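Each of these roles is just a small, self-contained state file. As an illustration, here is a minimal sketch of what something like roles/chrony/init.sls could contain (the source path and config file name are assumptions, not my exact files):

#
# /srv/salt/roles/chrony/init.sls (sketch - source path is an assumption)
#
chrony-package:
  pkg.installed:
    - name: chrony

chrony-config:
  file.managed:
    - name: /etc/chrony.conf
    - source: salt://roles/chrony/chrony.conf
    - user: root
    - group: root
    - mode: '0644'

chrony-service:
  service.running:
    - name: chronyd
    - enable: True
    - watch:
      - file: chrony-config

The watch requisite is what restarts the service whenever the managed config file changes.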

Below is a sample of a single VM’s configuration with the roles modularised. I also use it to deploy Sys Admin users who will SSH into the servers directly; System Admin staff have SSH keys, a password hash etc. and are grouped under a /srv/salt/users/admins directory structure:

base:
  'elk-dashboard.internal.my-domain.com':
    - elk.repos.elastic-8x-repo
    - elk.elk-dashboard.packages
    - elk.elk-dashboard.firewalld-webserver
    - roles.firewalld.prtg
    - roles.logrotate
    - roles.logrotate.firewalld
    - elk.elk-dashboard.directories
    - roles.resolv-conf
    - roles.snmpd
    - roles.chronyd
    - roles.log-cleanup
    - users.admins.fred
    - users.admins.barney
    - users.admins.wilma
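
For completeness, each admin user entry is its own small state file. A rough sketch of what something like /srv/salt/users/admins/fred.sls might hold, with placeholder values for the password hash and SSH key:

#
# /srv/salt/users/admins/fred.sls (sketch - hash and key are placeholders)
#
fred-user:
  user.present:
    - name: fred
    - fullname: Fred Flintstone
    - shell: /bin/bash
    - home: /home/fred
    - password: '$6$placeholderhash'
    - groups:
      - wheel

fred-ssh-key:
  ssh_auth.present:
    - user: fred
    - names:
      - ssh-rsa AAAAB3placeholderkey fred@workstation
    - require:
      - user: fred-user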

Cleaning up top.sls

In cleaning up the top.sls file I broke out systems into various groups. There is the ELK Stack group, the general servers group (which could be broken down further), the HPC group (High Performance Computing Cluster), the various HA clusters and then the various DEV environments.

If you read through the online doco for Salt, it shows how to use multiple environments (dev, prod, qa and base) for code/config deployments. I’m not going to cover this aspect in this particular blog article.

I started by creating a directory for each logical cluster of VMs. For the Elastic Stack, I created /srv/salt/elk and then a subdirectory for each VM in the ELK cluster, plus a “common” directory for all the stuff that was common to the ELK VMs. For the ELK Dashboard VM, the directory holding its unique configuration is /srv/salt/elk/elk-dashboard/ and in that directory is now an init.sls file. When I made the first iteration of changes I referenced that init.sls as “elk.elk-dashboard”, as shown below.
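The resulting layout under /srv/salt/elk looks roughly like this (the second node is a placeholder name standing in for the rest of the cluster):

/srv/salt/elk/
├── common/
├── repos/
│   └── elastic-8x-repo.sls
├── elk-dashboard/
│   ├── init.sls
│   ├── packages.sls
│   ├── firewalld-webserver.sls
│   └── directories.sls
└── elk-node-01/            <- placeholder
    └── init.sls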

base:
  'elk-dashboard.internal.my-domain.com':
    - elk.elk-dashboard
    - elk.repos.elastic-8x-repo
    - elk.elk-dashboard.packages
    - elk.elk-dashboard.firewalld-webserver
    - roles.firewalld.prtg
    - roles.logrotate
    - roles.logrotate.firewalld
    - roles.resolv-conf
    - roles.snmpd
    - roles.prtg
    - roles.chronyd
    - servers.common.log-cleanup
    - users.admins.fred
    - users.admins.barney
    - users.admins.wilma

Step 2 – Simplifying top.sls entries

Once the init.sls was present in the custom directory for the VM, I moved all the lines below elk.elk-dashboard into that init.sls file and referenced them via an “include”, as shown below:

#
# elk-dashboard init.sls file
# Also see: /srv/salt/top.sls to see what gets called.
#
include:
  - elk.repos.elastic-8x-repo
  - elk.elk-dashboard.packages
  - elk.elk-dashboard.firewalld-webserver
  - roles.firewalld.prtg
  - roles.logrotate
  - roles.logrotate.firewalld
  - roles.resolv-conf
  - roles.log-cleanup
  - users.admins.fred
  - users.admins.barney
  - users.admins.wilma
#
# End of File

In the process of removing the lines from top.sls, I found some roles are common to everything, so I added those back to top.sls and removed them from the init.sls for each system. The top.sls entry now looks like the following, with the simplified elk-dashboard config reduced to a single line. Ideally, in the next major re-work of the file I will have an “elk:” block and put those servers under that, maybe putting them into a top.sls file in /srv/salt/elk if that works as planned:

base:
  '*':
    - roles.snmpd
    - roles.chrony
    - roles.log-cleanup

  'elk-dashboard.internal.my-domain.com':
    - elk.elk-dashboard
While this has cleaned up the top.sls, it’s really just pushed the scripts into the /srv/salt/elk/elk-dashboard/init.sls file, and that’s not a bad thing. I also moved the three (3) common roles to the top of the top.sls file, being the SNMP and Chrony services and the generic log cleanup script I push to all Linux servers. I will add the corporate /etc/ssh/sshd_banner file with legal disclaimers later.
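When that re-work happens, the “elk:” grouping could be little more than a glob match in top.sls. A rough sketch, assuming the ELK hosts share an “elk-” hostname prefix and the common directory gains an init.sls of its own:

base:
  '*':
    - roles.snmpd
    - roles.chrony
    - roles.log-cleanup

  # assumption: all ELK hosts are named elk-*.internal.my-domain.com
  'elk-*.internal.my-domain.com':
    - elk.common

  'elk-dashboard.internal.my-domain.com':
    - elk.elk-dashboard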

On checking the resolv.conf file role, I discovered I had made two, so that’s a cleanup task for me to standardise on one config for servers and put that into the top section of top.sls.

The HPC group has a slightly different resolv file config, so that lives under /srv/salt/hpc/roles where it doesn’t pollute the general server configs.
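For reference, the standardised role would be little more than a managed file; a minimal sketch, assuming the config is shipped alongside the state:

#
# /srv/salt/roles/resolv-conf/init.sls (sketch - source path is an assumption)
#
resolv-conf:
  file.managed:
    - name: /etc/resolv.conf
    - source: salt://roles/resolv-conf/resolv.conf
    - user: root
    - group: root
    - mode: '0644'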

As I re-organize my /srv/salt layout I now have the following layout:

  • dev
  • elk
  • ha-clusters
  • hpc
  • roles
  • servers
  • top.sls
  • users

The function of each becomes obvious and the streamlining becomes easier each time I deploy a new VM, but I will clarify each environment here:

The “dev” environment is where I am testing a new idea; it’s not a “dev” environment where matching “test” and “prod” environments already exist. The organization I’m contracting with tends to build three configs for a web-server-based system, i.e. ws-xxxxx-prod, ws-xxxxx-test and ws-xxxxx-dev. Other organizations typically use the DNS notation dev.xxxx.com for a dev site, and the VM naming reflects this.

Deploying Run-Once Scripts

Highstating is great for keeping the config under control, but at “build” time you typically have a range of one-off tasks like generating some SSH keys, creating an empty database, cat’ing a mysqldump into said DB and adding some users. You may also have to deploy a repo and import the GPG keys for it. I usually build these as a script with a Y/N readline check and put them into /root.

For example, my init.sls script might deploy the /root/generate-ssh-keys.sh script which looks like:

#!/bin/bash
#
# Initial Setup for a new server - RUN ONCE from client!
#

function install()
{
        # Remove any existing root key pair and generate a fresh 2048-bit RSA pair
        mkdir -p /root/.ssh
        rm -f /root/.ssh/id_rsa* > /dev/null 2>&1
        ssh-keygen -b 2048 -t rsa -f /root/.ssh/id_rsa -q -N ""
}

#
# MAIN
#
echo " "
echo "Auto Generate a NEW SSH key pair. "
echo " "
while true; do
    read -p "Do you wish to generate a new SSH key for ROOT (y/n)? " yn
    case $yn in
        [Yy]* ) install; break;;
        [Nn]* ) exit;;
        * ) echo "Please answer yes or no.";;
    esac
done
#
# End of File

These types of scripts can be pushed out in a highstate without being executed; ideally you run them once at build time, then run the highstate for the host, which checks that the newly set up services are now enabled and running.
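Getting the script onto the box is itself just another managed file in the VM’s init.sls. A minimal sketch (the state ID and source path are assumptions):

#
# Deploy the run-once key generation script to /root (sketch)
#
generate-ssh-keys-script:
  file.managed:
    - name: /root/generate-ssh-keys.sh
    - source: salt://elk/elk-dashboard/generate-ssh-keys.sh
    - user: root
    - group: root
    - mode: '0700'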

End Results

The end result of my initial cleanup is a top.sls file which is significantly smaller, puts the common roles at the top under the base directive and puts each server’s config in its own unique directory.

Stage two of cleaning up the top.sls file will see the top file segmented by environment, with a top.sls moved into each environment. I’m not sure if that will add to the maintenance overhead, but it’s an exercise to test and play with.
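That segmentation would most likely hang off the master’s file_roots configuration, along these lines (the environment names and paths are assumptions about how I might split things):

# /etc/salt/master (sketch - environment names and paths are assumptions)
file_roots:
  base:
    - /srv/salt
  elk:
    - /srv/salt/elk
  hpc:
    - /srv/salt/hpc

Each of those roots could then carry its own top.sls; how they get merged is governed by the master’s top_file_merging_strategy setting, so that’s part of what the experiment will need to settle.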

-oOo-