Internally, we use AWS Backup to back up our EC2 instances. The backup plan is set to use Windows VSS-aware snapshots for these backups.
During monitoring, we noticed that this worked for some servers and not for others. On the servers where it failed, the backup job reported the following error:
Completed with issues.
Windows VSS Backup attempt failed because of insufficient privileges to perform this operation.
This seemed odd. All our EC2 instances have the same IAM role attached to them, and we know that it worked for some of the instances.
So we started to dig. The first thing we needed to figure out was what was causing the insufficient privileges error. A good place to start was to check whether the Systems Manager Run Command output contained any error messages. We navigated to AWS Systems Manager -> Run Command, opened Command history, and added a filter for Status, set to Failed.
We found a failed AWSEC2-CreateVSSSnapshot command for one of the impacted servers and clicked on the Command ID to open the details page.
On the details page, you can see the output for the impacted instance by selecting its Instance ID under Targets and outputs and then clicking the View output button.
The output showed the following error message:
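The same console search can be scripted. The following is a minimal sketch rather than the method we used: it assumes a boto3-style Systems Manager client whose `list_commands` call accepts `Status` and `DocumentName` filters and returns pages with `Commands` and `NextToken`. The helper name `failed_vss_commands` is hypothetical.

```python
def failed_vss_commands(ssm_client, document_name="AWSEC2-CreateVSSSnapshot"):
    """Return the Command IDs of failed Run Command invocations of one document.

    ssm_client is assumed to behave like boto3.client("ssm").
    """
    kwargs = {
        "Filters": [
            {"key": "Status", "value": "Failed"},
            {"key": "DocumentName", "value": document_name},
        ]
    }
    command_ids = []
    while True:
        page = ssm_client.list_commands(**kwargs)
        command_ids.extend(cmd["CommandId"] for cmd in page.get("Commands", []))
        token = page.get("NextToken")
        if not token:
            # No further pages: everything matching the filters has been collected.
            return command_ids
        kwargs["NextToken"] = token
```

With boto3 installed and credentials configured, `failed_vss_commands(boto3.client("ssm"))` should return the IDs to open in the Run Command console.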
Call to Get-EC2Instance threw and Exception, Verify that your instance role has the Describe-Instances permission
This seemed a little off to us. When we looked at the IAM role attached to the EC2 instance, we could see the following policy associated with it, which has the “ec2:DescribeInstances” IAM action for all resources (line 15). We also knew that this role/policy combination worked for other servers.
So, to test that the role was working properly on the impacted EC2 instances, we connected remotely using Fleet Manager, opened a PowerShell window, and ran the command Get-EC2Instance.
This command should have picked up the IAM role associated with the EC2 instance and returned some results, but instead it reported that we didn’t have any permissions.
Making a call to the IMDSv2 service, we could see that the instance had the correct IAM role associated with it. So why was it not working?
A clue to why this was not working was back in the output of the Run Command we looked at earlier. I’ve highlighted two sections: the second shows the error, but the first shows the PowerShell module version.
Having done some prior checking for this problem, we had looked at the prerequisites for VSS-based snapshots listed here: https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/application-consistent-snapshots-prereqs.html. There is a requirement that AWS PowerShell is above version 3.3.48.0. We initially ignored this, as 3.3.542.0 is higher than 3.3.48.0, so it should work, right?
Previously, we had enabled IMDSv2 on all our EC2 instances, which means that you can’t just call the metadata service on 169.254.169.254 and get a set of temporary access keys; you have to request a token first and use that (see this article for more information: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-metadata-v2-how-it-works.html).
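To make the token step concrete, here is a minimal Python sketch of the IMDSv2 flow using only the standard library. The helper names are hypothetical; the endpoint paths and header names follow the AWS article linked above. It only returns a result when run on an EC2 instance.

```python
import urllib.request

IMDS = "http://169.254.169.254"


def token_request(ttl_seconds=21600):
    """Build the PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path, token):
    """Build a GET request for a metadata path, presenting the session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def attached_role_name(timeout=2):
    """Return the name of the IAM role attached to this instance."""
    # Step 1: request a session token. Clients that skip this step are
    # rejected when the instance requires IMDSv2.
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    # Step 2: use the token to read the role name.
    req = metadata_request("iam/security-credentials/", token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode().strip()
```

A client that only knows the older single-request style never sends the token header, which is why it ends up with no credentials even though the role is attached correctly.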
It turns out that AWS PowerShell versions below 4.0.1.0 do not support this, so they weren’t able to retrieve a set of temporary access keys using IMDSv2. This in turn meant that the calls the module was making using its IAM role failed, which stopped the backups from performing VSS snapshots.
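The two version thresholds are easy to confuse, so here is a small Python sketch that checks a module version against both. Treating each threshold as an inclusive minimum is an assumption, and the names `parse_version` and `check_module` are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '3.3.542.0' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


# Thresholds taken from the text above; inclusive comparison is an assumption.
VSS_MINIMUM = parse_version("3.3.48.0")
IMDSV2_MINIMUM = parse_version("4.0.1.0")


def check_module(version_text):
    """Report whether a module version passes each of the two requirements."""
    version = parse_version(version_text)
    return {
        "meets_vss_prerequisite": version >= VSS_MINIMUM,
        "supports_imdsv2": version >= IMDSV2_MINIMUM,
    }
```

For our installed version, `check_module("3.3.542.0")` passes the documented VSS prerequisite but fails the IMDSv2 check, which matches what we saw on the failing servers.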
To test this, we installed the newest version of the AWS PowerShell modules on the instance and reran our test command, Get-EC2Instance. As you can see, it now returns a list of instances, which means it was able to use the IAM role attached to the EC2 instance.
Then, to see if this had fixed our original problem, we ran a manual VSS backup and looked at the Run Command results. You can see below that the backup completed successfully without any warnings.