How the latest Windows Update KB5005716 broke my CI environment

I depend on full provisioned virtual machine images for CI and unfortunately a major part of our customers still uses parts of the software i need to test on Windows 10. My environment uses vagrant for provisioning Windows 10 based images and then executes several tasks using the vagrant provisioning feature.

The workflow is as follows:

  • Windows 10 image boots on a kvm/libvirt host
  • vagrant waits for the image to be reachable via WinRM
  • vagrant uploads the provisioning script via WinRM and executes it
  • testsuite is executed

Just a few days ago, this workflow broke, without any changes either to the executing environment or the provisioning scripts involved.

Spinning up the vagrant box just hung after:

==> default: Running provisioner: shell...
    default: Running: inline PowerShell script

to find the cause for the problem we need to understand:

How provisioning works on windows

While vagrant can upload powershell scripts via WinRM and execute them, it has to overcome certain restrictions if the provisioning scripts need to be executed with elevated system rights.

In order to get these elevated system rights, vagrant creates a Scheduled Windows task to execute them (yes,.. yes), this is the default setting, but can be changed with the privileged option

With this in mind, we can start..

Analyzing the issue

Once the vagrant box is spinned up, i could issue commands via winrm:

 vagrant winrm -s powershell -c "Write foo"
 foo

which worked flawlessly, but didnt use the elevated execution. So the next step involved this command:

 vagrant winrm -e -s powershell -c "Write foo"

Which reproduced the problem and the command never returned .. until i logged on to the system on the console with any user configured in the image. WTF? From that point on, everyhing behaved as normal!

Windows Scheduled Tasks

I redeployed the image and executed the command again, additionally used another powershell command to get the task state via powershell:

vagrant winrm  -s powershell -c "Get-ScheduledTask" | grep WinRM_Elevated_Shell
     WinRM_Elevated_Shell_8baef852-... Queued

So the windows scheduler decided to queue the task because: .. no user ever logged on to the system.

Update: KB5005716

The Windows images i use are sysprepped windows installations. The images are setup in a way that the default vagrant user is set to Auto-Logon.

Unfortunately, during the first boot the Windows 10 image pulled in available updates and one update recently released is:

KB5005716 Out of Box Experience Update für Windows 10

It turns out, if this update is pulled by a sysprepped system, it updates various user related OOB components and disables the auto logon.

As such, no user ever logged on to the system and the Windows scheduler simply refused to execute any Schedules created by vagrant during the provisioning step.

I was able to fix the situation by re-building the Windows 10 image with the update already applied pre sysprep stage and had things back working.

Thanks again, Microsoft!

Written on October 6, 2021