CI on Power9 (PPC64LE) LPAR using snapper

Running a CI testsuite on Intel (or ARM) based hardware is quite common these days. Most of the time the software stacks involved are containers (insert your favourite container orchestrator here) or other neat solutions such as vagrant that allow to spin up pristine test environments.

This works nice, if the software you have to run your tests against does not require access to real hardware. While some of the mentioned solutions are available on PPC64 architecture too, it was not sufficient for the case i had to solve this time. I had to execute tests directly on a Power 9 LPAR with SLES12/15 installed.

While this seems like an easy task, keep in mind that Virtualization on Power is not something that involves the possibility to create snapshots and alike. An LPAR (logical partition) is a subset of a Power systems hardware which is managed by a VIO server running AIX.

Wouldnt it be easy to setup an LPAR running linux and then virtualize using KVM and alike? (using hardware passthrough?) No: KVM is currently not supported on Power 9 and running qemu in full virtualization mode is not a performant solution.

So how to solve this? The Requirements are as follows:

1) Have multiple logical partitions with different os flavours.
2) Install third party software into the running system
4) Execute CI suite
5) Reset the system to a pristine state
6) Execute tests for the next build.

The main obstacle was step 5): how to reset the system into a pristine state?

Implementing a complete re-install of the instance using tftp and autoinstall would be possible, but is too complicated.

Hypervisor based snapshots are not possible: no easy way to reset the system via snapshot.

Solution: use Snapper

SLES comes with a nice tool called snapper which allows to create snapshots of your BTRFS volumes and simply rollback the sytems root filesystem to an older snapshot!

It works by using the btrfs subvolume snapshot feature to create snapshots of your btrfs volumes. You can easily rollback a complete (sub)volume or undo changes, even view differences between snapshots. By default it even creates snapshots automatically if administators change system settings or install software components!

The current solution for this particular CI case via snapper looks as follows:

1) Setup an lpar and install an SLE based distribution

2) Remove the zypper plugin that auto-create snapshots on sofware installation, disable automatic snapshots for the root volume:

 zypper remove -y snapper-zypp-plugin
 sed -i 's/USE_SNAPPER="yes"/USE_SNAPPER="no"/g' /etc/sysconfig/yast2
 snapper -c root set-config "TIMELINE_CREATE=no"

3) Bring the node online in the local Jenkins environment.

4) Create the master snapshot of the finalized setup:

snapper create --description "ci-snapshot"

Now the system is ready for executing testcases. The Testcase would then rollback the snapshot after finalization (based on success or not). The steps executed by a Jenkins build step might look like:

set -e
# test something
grep root /etc/passwd
SNAP_ID=$(snapper list | grep ci-snapshot | awk '{print $1}')
# rollback the system snapshot
snapper rollback $SNAP_ID
echo "Rebooting system"
# dont shutdown immediately otherwise jenkins agent marks
# run with failure due to EOF
shutdown -r -t 1 

The snapper rollback command creates a new writable btrfs volume based on the pristine system snapshot (and does also create a copy of the current running state of the volume for later investigation) and defines it as the default subvolume for the next system boot.

After reboot, the System state is reset to its base snapshot state and can execute the next testrun! Jenkins will mark the agents state online as soon as its reachable again ..

If you didnt know about snapper yet, Have a look at its documentation .

Written on August 31, 2021