Configuring static IP on new Ubuntu 20 builds with cloud-init

For a long time I built VMs on my Nutanix AHV cluster using DHCP, because unlike VMware, Nutanix didn't offer an easy way to hard-code an IP on a VM that isn't already on the network. I considered giving each VM a temporary DHCP address and then, once it was networked, ssh-ing in, hard-coding that or another IP, and doing some network magic in the infrastructure to make everything work, but that was more trouble than it was worth. Once AHV assigns an IP to a VM, it will always give it that same IP…unless you migrate it between clusters, which happens all too frequently, since I have to rebuild my primary cluster every few months after some minor error the cluster couldn't survive. After the last rebuild, when I went back in to manually apply the correct IPs to a couple dozen DHCP-enabled VMs, I decided to figure out static IPs.

I knew that Nutanix offered the ability to run a cloud-init script when creating a VM, but I'd never been able to get it to work. That was partly Red Hat/CentOS's fault: they don't seem to have implemented cloud-init properly enough to let you use the "network:" block to set things up. I ended up manually "cat"-ing the necessary information into the ifcfg files in /etc/sysconfig/network-scripts, which worked quite well.

Ubuntu 16 and 18 work pretty much the same way; just overwrite the /etc/network/interfaces file and "ifup" the virtual NIC. Ubuntu 20, however, is a real headache, because by default it no longer uses /etc/network/interfaces, relying instead on a tool named "netplan". It's also more of a bear when it comes to cloud-init, because it doesn't play well with how Nutanix delivers the scripts: Nutanix provides the script as a file on a virtual CD-ROM that cloud-init is supposed to mount and read, but Ubuntu 20 doesn't mount it automatically. On top of that, I use packer to build the "golden image" that gets cloned when I create VMs; packer reboots the builder VM a couple of times while generating the disk image, and cloud-init generally only runs on first boot. Getting around all this is a multistep process, and I had to work backwards from the desired end state: a VM with an active network using a static IP configuration.

The "netplan" process reads configuration files from /etc/netplan; those files use the same YAML syntax as cloud-init. The files are read in lexical (alphanumeric) order, so you need a file in there that supersedes everything else alphabetically. By default there's a 50-cloud-init.yaml file that's set for DHCP and that can supposedly get regenerated automatically, so I named my file 51-cloud-init.yaml (and I remove the 50-* one just to be sure). On the VM it has to look something like this:

#cloud-config
# Configure networking, set a static IP
network:
  version: 2
  renderer: networkd
  ethernets:
    ens3:
      addresses:
      - 10.22.48.8/24
      gateway4: 10.22.48.1
      nameservers:
        addresses:
        - 4.4.4.4
        - 8.8.8.8

The trick, obviously, is that the IP needs to be dynamically generated at build time. Because I use Ansible (and the nifty new Nutanix Ansible collection) to create my VMs, I can use the “template” module and Jinja2, passing in the necessary variables:

#cloud-config
# Configure networking, set a static IP
network:
  version: 2
  renderer: networkd
  ethernets:
    ens3:
      addresses:
      - {{ip_with_prefix}}
      gateway4: {{ ip_with_prefix | ansible.utils.ipaddr('address/prefix') | ansible.utils.ipaddr('1') | ansible.utils.ipaddr('address') }}
      nameservers:
        addresses:
        - 4.4.4.4
        - 8.8.8.8

The “ip_with_prefix” variable is calculated from inputs elsewhere in the playbook and should always be in “xxx.xxx.xxx.xxx/yyy” format, with yyy being the subnet prefix (not the IP-formatted netmask). The Jinja2 code calculates the gateway for it based on that variable. For now I just hard-code DNS servers, but those could obviously be provided via variables as well.
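If the filter chain looks opaque, the same gateway derivation can be sketched in plain Python with the standard ipaddress module; this is just an illustration of the math (assuming, as I do here, that the gateway is always the first usable host in the subnet):

```python
import ipaddress

def gateway_for(ip_with_prefix: str) -> str:
    """Return the first usable host in the subnet, e.g. the .1 address."""
    network = ipaddress.ip_interface(ip_with_prefix).network
    return str(next(network.hosts()))

print(gateway_for("10.22.48.8/24"))  # → 10.22.48.1
```

That is essentially what ipaddr('1') does: take the network that contains the address and return its first host.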

When my Ansible code builds the VM, it generates a netplan file from that template and passes it in as the cloud-init script, even though technically it's a netplan file that we're just using cloud-init to deliver. Within the OS, it shows up as a file on the virtual CD-ROM. Unfortunately, Ubuntu 20 doesn't automatically mount the CD-ROM and process its contents as part of cloud-init on bootup.
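In Ansible terms the flow looks roughly like this. This is a sketch rather than my exact playbook; the task and file names are made up, and the guest_customization parameters of the nutanix.ncp VM module may differ by collection version, so check the docs before copying:

```yaml
- name: Render the netplan file from the Jinja2 template
  ansible.builtin.template:
    src: netplan-static.yaml.j2        # hypothetical template name
    dest: "/tmp/{{ vm_name }}-netplan.yaml"

- name: Create the VM, passing the rendered file as cloud-init user data
  nutanix.ncp.ntnx_vms:
    name: "{{ vm_name }}"
    # ...cluster, disk-clone, and NIC settings elided...
    guest_customization:
      type: cloud_init
      script_path: "/tmp/{{ vm_name }}-netplan.yaml"
```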

Here’s where the trickery, which took me days to figure out, comes in. What needs to happen is that the disk template that Nutanix clones to create a VM needs to have an actual cloud-init script on it that mounts the CDROM and copies the netplan script to the right location. On my Ubuntu 20 disk template, that cloud-init script is named 07_networkcopy.cfg, and looks like this:

#cloud-config
runcmd:
  - "sudo mount -v /dev/sr0 /mnt"
  - "sudo cp /mnt/openstack/latest/user_data /etc/netplan/51-cloud-init.yaml"
  - "sudo rm /etc/netplan/50-cloud-init.yaml"
  - "sudo netplan generate"
  - "sudo netplan apply"

I should have been able to use the "mounts" directive to mount the CDROM, but that was very hit-or-miss. Cloud-init was also weirdly sensitive to which file these commands lived in. I originally had the mount step in a separate file named 06_mountcd.cfg, but it never seemed to run; 07_networkcopy.cfg definitely ran, because I got errors about /mnt/openstack/latest/user_data (the location where AHV puts cloud-init scripts) not existing. When I put everything in 07_networkcopy.cfg it worked fine. I should probably also add a step to unmount the CD; I'll do that next.
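That unmount step would just be one more line in runcmd; something like this should work, though I haven't tested it yet (and since runcmd already runs as root, the sudo prefixes are optional anyway):

```yaml
#cloud-config
runcmd:
  - "mount -v /dev/sr0 /mnt"
  - "cp /mnt/openstack/latest/user_data /etc/netplan/51-cloud-init.yaml"
  - "rm /etc/netplan/50-cloud-init.yaml"
  - "umount /mnt"
  - "netplan generate"
  - "netplan apply"
```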

I create my disk templates using packer. The code that gets the 07_networkcopy.cfg file into place consists of a few “provisioner” steps in my packer build config file. First is a file directive that copies the file over:

    {
      "destination": "~ubuntu/07_networkcopy.cfg",
      "source": "07_networkcopy.cfg",
      "type": "file"
    },

Then there are a few steps inside an “inline” provisioner (just a list of commands packer runs on the build VM once it’s up):

    {
      "inline": [
        "sudo mv ~ubuntu/07_networkcopy.cfg /etc/cloud/cloud.cfg.d/07_networkcopy.cfg",
        "sudo rm -f /etc/netplan/00-installer-config.yaml",
        "sudo rm -f /etc/netplan/50-cloud-init.yaml",
        "sudo rm -f /etc/cloud/cloud.cfg.d/subiquity-disable-cloudinit-networking.cfg",
        "sudo cloud-init clean"
      ],
      "type": "shell"
    }

Obviously this assumes that the ubuntu user can sudo without a password; I have a separate inline block to set that up. The 07_networkcopy.cfg "true cloud-init" file gets copied to the cloud-init config location, and packer cleans up the /etc/netplan directory and removes a cloud-init file that would otherwise prevent cloud-init from making network changes. (That "subiquity" step may not actually be necessary, since cloud-init itself isn't running any network commands here, merely copying a file to netplan and telling it to use it; but it was an artifact from earlier testing, and since it worked this way, I left it.) Then packer runs cloud-init clean, which tells the VM to re-run the cloud-init scripts on the next boot.

After running the packer build, you get a disk image that contains no network information but that, when booted, attempts to mount a CD-ROM, copy a network file from it to /etc/netplan, and tell netplan to apply it. Every time I build a VM, that disk image gets cloned, and I generate a fresh netplan config file that gets passed in via the CD-ROM to be consumed by cloud-init when the VM boots.

Clear as mud?