Using a certificate stored in Key Vault in an Azure App Service

For the last two days, I’ve been trying to deploy some new microservices using a certificate stored in Key Vault in an Azure App Service. By now, you’ve probably figured out that we love them around here. I’ve also been slamming my head against the wall because of some poorly documented behavior around granting permissions on the Key Vault.

As a quick primer, here are the basics of what I was trying to do:

resource "azurerm_app_service" "centralus-app-service" {
  name                = "${var.service-name}-centralus-app-service-${var.environment_name}"
  location            = "${azurerm_resource_group.centralus-rg.location}"
  resource_group_name = "${azurerm_resource_group.centralus-rg.name}"
  app_service_plan_id = "${azurerm_app_service_plan.centralus-app-service-plan.id}"

  # System-assigned identity; its principal is granted Key Vault access below
  identity {
    type = "SystemAssigned"
  }
}

# Existing Key Vault that holds the certificate
data "azurerm_key_vault" "cert" {
  name                = "${var.key-vault-name}"
  resource_group_name = "${var.key-vault-rg}"
}

# Let the App Service's own identity read the secret and the certificate
resource "azurerm_key_vault_access_policy" "centralus" {
  key_vault_id = "${data.azurerm_key_vault.cert.id}"
  tenant_id    = "${azurerm_app_service.centralus-app-service.identity.0.tenant_id}"
  object_id    = "${azurerm_app_service.centralus-app-service.identity.0.principal_id}"

  secret_permissions = [
    "get"
  ]

  certificate_permissions = [
    "get"
  ]
}

# Import the certificate from Key Vault once the access policy exists
resource "azurerm_app_service_certificate" "centralus" {
  name                = "${local.full_service_name}-cert"
  resource_group_name = "${azurerm_resource_group.centralus-rg.name}"
  location            = "${azurerm_resource_group.centralus-rg.location}"
  key_vault_secret_id = "${var.key-vault-secret-id}"
  depends_on          = [azurerm_key_vault_access_policy.centralus]
}

and these are the relevant values I was passing into the module:

  key-vault-secret-id       = "https://example-keyvault.vault.azure.net/secrets/cert/0d599f0ec05c3bda8c3b8a68c32a1b47"
  key-vault-rg              = "example-keyvault"
  key-vault-name            = "example-keyvault"

But no matter what I did, I kept bumping up against this error:

Error: Error creating/updating App Service Certificate "example-app-dev-cert" (Resource Group "example-app-centralus-rg-dev"): web.CertificatesClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="BadRequest" Message="The service does not have access to '/subscriptions/[SUBSCRIPTIONID]/resourcegroups/example-keyvault/providers/microsoft.keyvault/vaults/example-keyvault' Key Vault. Please make sure that you have granted necessary permissions to the service to perform the request operation." Details=[{"Message":"The service does not have access to '/subscriptions/[SUBSCRIPTIONID]/resourcegroups/example-keyvault/providers/microsoft.keyvault/vaults/example-keyvault' Key Vault. Please make sure that you have granted necessary permissions to the service to perform the request operation."},{"Code":"BadRequest"},{"ErrorEntity":{"Code":"BadRequest","ExtendedCode":"59716","Message":"The service does not have access to '/subscriptions/[SUBSCRIPTIONID]/resourcegroups/example-keyvault/providers/microsoft.keyvault/vaults/example-keyvault' Key Vault. Please make sure that you have granted necessary permissions to the service to perform the request operation.","MessageTemplate":"The service does not have access to '{0}' Key Vault. Please make sure that you have granted necessary permissions to the service to perform the request operation.","Parameters":["/subscriptions/[SUBSCRIPTIONID]/resourcegroups/example-keyvault/providers/microsoft.keyvault/vaults/example-keyvault"]}}]

I checked and re-checked and triple-checked and had colleagues check, but no matter what I did, it kept puking with this permissions issue. I confirmed that the App Service’s identity was being provided and saved, but nothing seemed to work.

Then I found this blog post from 2016 talking about a magic Service Principal (or more specifically, a Resource Principal) that requires access to the Key Vault too. All I did was add the following resource with the magic SP, and everything worked perfectly.

resource "azurerm_key_vault_access_policy" "azure-app-service" {
  key_vault_id = "${data.azurerm_key_vault.cert.id}"
  tenant_id    = "${azurerm_app_service.centralus-app-service.identity.0.tenant_id}"

  # This object is the Microsoft Azure Web App Service magic SP
  # as per https://azure.github.io/AppService/2016/05/24/Deploying-Azure-Web-App-Certificate-through-Key-Vault.html
  object_id = "abfa0a7c-a6b6-4736-8310-5855508787cd"

  secret_permissions = [
    "get"
  ]

  certificate_permissions = [
    "get"
  ]
}

It’s frustrating that Microsoft hasn’t documented this piece (at least officially), but hopefully with this knowledge, you’ll be able to automate using a certificate stored in Key Vault in your next Azure App Service.

Generate Terraform files for existing resources

You may find yourself in a position where a resource already exists in your cloud environment but was created in the respective provider’s GUI rather than in Terraform. You may feel a bit overwhelmed at first, but there are a few ways to generate Terraform files for existing resources, and we’re going to talk about the various ways today. This is also not an exhaustive list; if you have any other suggestions, please leave a comment and I’ll be sure to update this post.

Method 1 – Manual

Be warned, the manual method takes a little more time, but is not restricted to certain resource types. I prefer this method because it means that you’ll be able to see every setting that is already set on your resource with your own two eyes, which is good for sanity checking.

First, you’re going to want to create a .tf file with just the outline of the resource type you’re trying to import or generate.

For example, if I wanted to create the Terraform for a resource group called example-resource-group that had several tags attached to it, I would do:

resource "azurerm_resource_group" "example-resource-group" {
}

and then save it.

Next, I would go to the Azure GUI, find and open the resource group, and then open the ‘Properties’ section from the blade.

I would look for the Resource ID, for example /subscriptions/54ba8d50-7332-4f23-88fe-f88221f75bb3/resourceGroups/example-resource-group and copy it.

I would then open up a command prompt / terminal and import the state by running: terraform import azurerm_resource_group.example-resource-group /subscriptions/54ba8d50-7332-4f23-88fe-f88221f75bb3/resourceGroups/example-resource-group

Finally, and this is the crucial part, I would immediately run terraform plan. There may be required fields that you need to fill in before this command works, but in general it compares the state you just imported against the blank resource in the .tf file and shows you every difference. You can then copy those differences into your new Terraform file and be confident that you have captured all of the settings.

Example:

# azurerm_resource_group.example-resource-group will be updated in-place
   ~ resource "azurerm_resource_group" "example-resource-group" {
         id       = "/subscriptions/54ba8d50-7332-4f23-88fe-f88221f75bb3/resourceGroups/example-resource-group"
         location = "centralus"
         name     = "example-resource-group"
       ~ tags     = {
           ~ "environment" = "dev" -> null
           ~ "owner"       = "example.person" -> null
           ~ "product"     = "internal" -> null
         }
     }

A shortcut I’ve found is to just copy the entire resource section, and then replace all of the tildes (~) with spaces, and then find and remove all instances of -> null.
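The same shortcut can be scripted. What follows is only a minimal sketch: clean_plan is a hypothetical helper name, and it handles just the two markers mentioned above (a leading tilde and a trailing -> null), not additions or deletions marked with + or -.

```shell
# clean_plan: read a copied `terraform plan` resource block on stdin and
# print it with the in-place-update markers stripped.
clean_plan() {
  # 1st expression: swap a leading "~ " for two spaces so the indentation stays aligned.
  # 2nd expression: drop a trailing "-> null" together with the whitespace around it.
  sed -E \
    -e 's/^([[:space:]]*)~ /\1  /' \
    -e 's/[[:space:]]*->[[:space:]]*null[[:space:]]*$//'
}
```

Assuming the plan output was pasted into a file called plan-snippet.txt (a made-up name), running clean_plan < plan-snippet.txt prints text that is close to a usable resource block. Computed attributes such as id cannot be set in configuration, so those lines still have to be removed by hand.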

Method 2 – Az2tf (Azure only)

Andy Thomas (Microsoft employee) put together a tool called Az2tf which iterates over your entire subscription, and generates .tf files for most of the common types of resources, and he’s adding more all the time. Requesting a specific resource type is as simple as opening an issue and explaining which resource is missing. In my experience, he’s responded within a few hours with a solution.

Method 3 – Terraforming (AWS only)

Daisuke Fujita put together a tool called Terraforming that, with a little bit of scripting, can generate Terraform files for all of your AWS resources.

Method 4 – cf-terraforming (Cloudflare only)

Cloudflare put together a fantastic tool called cf-terraforming which rips through your Cloudflare tenant and generates .tf files for everything Cloudflare related. The great thing about cf-terraforming is that because it’s written by the vendor of the original product, they treat it as a first-class citizen and keep it very up-to-date with any new resources they themselves add to their product. I wish all vendors would do this.

To sum things up, there are plenty of ways to generate Terraform files for existing resources. Some are more time-consuming than others, but they all have the goal of making your environment less brittle and your processes more repeatable, which will save time, money, and most importantly stress, when an inevitable incident takes place.

Do you know of any other tools for these or other providers that can assist in bringing previously unmanaged resources under Terraform management? Leave a comment and we’ll add them to this page as soon as possible!

Terraform: “Error: insufficient items for attribute "sku"; must have at least 1”

Last week, we were attempting to deploy a new Terraform-owned resource, but every time we ran terraform plan or terraform apply, we got the error Error: insufficient items for attribute "sku"; must have at least 1. We keep our Terraform code in an Azure DevOps project, with approvals required for any new commits, even into our dev environment, so we were flummoxed.

Our first thought was that we had upgraded the Terraform azurerm provider from 1.28.0 to 1.32.0 and we knew for a fact that the azurerm_key_vault resource had been changed from accepting a sku {} block to simply requiring a sku_name property. We tried every combination of having either, both, and none of them defined, and we still received the error. We even tried downgrading back to 1.28.0 as a fallback, but it made no change. At this point we were relatively confident that it wasn’t the provider.

The next thing we looked for was any other resources that had a sku {} block defined. This included our azurerm_app_service_plans, our azurerm_virtual_machines, and our azurerm_vpn_gateway. We searched for and commented out all of the respective declarations from our .tf files, but still we received the error.
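For anyone repeating that search, here is a rough sketch of how it could be done from a terminal. The function name find_sku_blocks is made up for illustration, and the pattern only catches a sku block that opens on its own line, which is how the blocks are normally written.

```shell
# find_sku_blocks: list every line in *.tf files under a directory that
# opens a `sku { ... }` block. Arguments such as `sku_name = ...` are not matched.
find_sku_blocks() {
  # $1: directory to search (defaults to the current directory)
  grep -rnE --include='*.tf' '^[[:space:]]*sku[[:space:]]*\{' "${1:-.}"
}
```

Each match is printed as path:line:content, which makes it quick to jump to the declaration and comment it out.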

Now we were starting to get nervous. Nothing we tried would solve the problem, and we were starting to get a backlog of requests for new resources that we couldn’t deploy because no matter what we did, whether adding or removing potentially broken code, we couldn’t deploy any new changes. To say the tension on our team was palpable would be the understatement of the year.

At this point we needed to take a step back and analyze the problem logically, so we all took a break from Terraform to clear our minds and de-stress a bit. We started to suspect something in the state file was causing the problem, but we weren’t really sure what. We decided to take the sledgehammer approach and using terraform state rm, we removed every instance of those commented out resources we found above.
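A sledgehammer like this is a little safer with a filter in front of it. The sketch below is an assumption about how one might do it rather than exactly what we ran: select_state_addresses is a hypothetical helper that narrows the output of terraform state list to the resource types passed as arguments.

```shell
# select_state_addresses: read `terraform state list` output on stdin and
# print only the addresses whose resource type is one of the arguments.
select_state_addresses() {
  # Refuse to run without at least one type, otherwise the pattern would be empty.
  [ "$#" -gt 0 ] || return 2
  local pattern
  # Join the arguments with "|" to build a regex alternation.
  pattern="$(IFS='|'; printf '%s' "$*")"
  # The type must start the address or follow a "." (e.g. inside a module path),
  # and must be followed by "." and the resource name.
  grep -E "(^|\.)(${pattern})\."
}

# Example usage (not executed here):
#   terraform state pull > backup.tfstate
#   terraform state list | select_state_addresses azurerm_app_service_plan
#   terraform state list | select_state_addresses azurerm_app_service_plan \
#     | xargs -n1 terraform state rm
```

Taking a copy of the state with terraform state pull first, and reading the filtered list before piping it into terraform state rm, keeps the operation reversible if the wrong addresses were selected.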

This worked. Now we could run terraform plan and terraform apply without issue, but we still weren’t sure why. That didn’t bode well if the problem recurred; we couldn’t just keep taking a sledgehammer to the environment, because it’s too disruptive. We needed to figure out the root cause.

We opened an issue on the provider’s GitHub page for further investigation, and after some digging by other community members and Terraform employees themselves, it seems that Microsoft’s API returns a different response for App Service Plans than any other resource when it is found to be missing. An assumption was being made that it would be the same for all resources, but it turned out that this was a bad assumption to make.

This turned out to be the key for us. Someone had deleted several App Service Plans from the Azure portal (thinking they were not being used) and so our assumption is that when the provider is checking for the status of a missing App Service Plan, the broken response makes Terraform think it actually exists, even though there’s no sku {} data in it, causing Terraform to think that that specific data was missing.

Knowing the core problem, the error message Error: insufficient items for attribute "sku"; must have at least 1 kind of makes sense now: the sku attribute is missing at least 1 item, it just doesn’t make clear that the “insufficient items” are on the Azure side, not the Terraform / .tf side.

They’ve added a workaround in the provider until Microsoft updates the API to respond like all of the other resources.

Have you seen this error before? What did you do to solve it?
