Deploying an Azure App Service from scratch, including DNS and TLS

As many of you have probably gathered, over the past few weeks, I’ve been working on building a process for deploying an Azure App Service from scratch, including DNS and TLS in a single Terraform module.

Today, I write this post with success in my heart, and at the bottom, I provide copies of the necessary files for your own usage.

One of the biggest hurdles I faced was trying to integrate Cloudflare’s CDN services with Azure’s Custom Domain verification. Typically, I’ll rely on the options available in the GUI as the inclusive list of “things I can do” so up until now, if we wanted to stand up a multi-region App Service, we had to do the following:

  1. Build and deploy the App Service, using the azurewebsites.net hostname for HTTPS for each region (R1 and R2)

    e.g. example-app-eastus.azurewebsites.net (R1), example-app-westus.azurewebsites.net (R2)
  2. Create the CNAME record for the service at Cloudflare pointing at R1, turning off proxying (orange cloud off)

    e.g. example-app.domain.com -> example-app-eastus.azurewebsites.net
  3. Add the Custom Domain on R1, using the CNAME verification method
  4. Once the hostname is verified, go back to Cloudflare and update the CNAME record for the service to point to R2

    e.g. example-app.domain.com -> example-app-westus.azurewebsites.net
  5. Add the Custom Domain on R2, using the CNAME verification method
  6. Once the hostname is verified, go back to Cloudflare and update the CNAME record for the service to point to the Traffic Manager, and also turn on proxying (orange cloud on)

While this eventually accomplishes the task, the failure mode it introduces is that if you ever want to add a third (or fourth or fifth…) region, you temporarily have to not only direct all traffic to your brand new single instance momentarily to verify the domain, but you also have to turn off proxying, exposing the fact that you are using Azure (bad OPSEC).

After doing some digging however, I came across a Microsoft document that explains that there is a way to add a TXT record which you can use to verify ownership of the domain without a bunch of messing around with the original record you’re dealing with.

This is great because we can just add new awverify records for each region and Azure will trust we own them, but Terraform introduces a new wrinkle in that it creates the record at Cloudflare so fast that Cloudflare’s infrastructure often doesn’t have time to replicate the new entry across their fleet before you attempt the verification, which means that the lookup will fail and Terraform will die.

To get around this, we added a null_resource that just executes a 30 second sleep to allow time for the record to propagate through Cloudflare’s network before attempting the lookup.

I’ve put together a copy of our Terraform modules for your perusal and usage:

Using this module will allow you to easily deploy all of your micro-services in a Highly Available configuration by utilizing multiple regions.

Using a certificate stored in Key Vault in an Azure App Service

For the last two days, I’ve been trying to deploy some new microservices using a certificate stored in Key Vault in an Azure App Service. By now, you’ve probably figured out that we love them around here. I’ve also been slamming my head against the wall because of some not-well-documented functionality about granting permissions to the Key Vault.

As a quick primer, here’s the basics of what I was trying to do:

resource "azurerm_app_service" "centralus-app-service" {
   name                = "${var.service-name}-centralus-app-service-${var.environment_name}"
   location            = "${azurerm_resource_group.centralus-rg.location}"
   resource_group_name = "${azurerm_resource_group.centralus-rg.name}"
   app_service_plan_id = "${azurerm_app_service_plan.centralus-app-service-plan.id}"

   identity {
     type = "SystemAssigned"
   }
 }

data "azurerm_key_vault" "cert" {
   name                = "${var.key-vault-name}"
   resource_group_name = "${var.key-vault-rg}"
 }
resource "azurerm_key_vault_access_policy" "centralus" {
   key_vault_id = "${data.azurerm_key_vault.cert.id}"
   tenant_id = "${azurerm_app_service.centralus-app-service.identity.0.tenant_id}"
   object_id = "${azurerm_app_service.centralus-app-service.identity.0.principal_id}"
   secret_permissions = [
     "get"
   ]
   certificate_permissions = [
     "get"
   ]
 }
resource "azurerm_app_service_certificate" "centralus" {
   name                = "${local.full_service_name}-cert"
   resource_group_name = "${azurerm_resource_group.centralus-rg.name}"
   location            = "${azurerm_resource_group.centralus-rg.location}"
   key_vault_secret_id = "${var.key-vault-secret-id}"
   depends_on          = [azurerm_key_vault_access_policy.centralus]
 }

and these are the relevant values I was passing into the module:

  key-vault-secret-id       = "https://example-keyvault.vault.azure.net/secrets/cert/0d599f0ec05c3bda8c3b8a68c32a1b47"
  key-vault-rg              = "example-keyvault"
  key-vault-name            = "example-keyvault"

But no matter what I did, I kept bumping up against this error:

Error: Error creating/updating App Service Certificate "example-app-dev-cert" (Resource Group "example-app-centralus-rg-dev"): web.CertificatesClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="BadRequest" Message="The service does not have access to '/subscriptions/[SUBSCRIPTIONID]/resourcegroups/example-keyvault/providers/microsoft.keyvault/vaults/example-keyvault' Key Vault. Please make sure that you have granted necessary permissions to the service to perform the request operation." Details=[{"Message":"The service does not have access to '/subscriptions/[SUBSCRIPTIONID]/resourcegroups/example-keyvault/providers/microsoft.keyvault/vaults/example-keyvault' Key Vault. Please make sure that you have granted necessary permissions to the service to perform the request operation."},{"Code":"BadRequest"},{"ErrorEntity":{"Code":"BadRequest","ExtendedCode":"59716","Message":"The service does not have access to '/subscriptions/[SUBSCRIPTIONID]/resourcegroups/example-keyvault/providers/microsoft.keyvault/vaults/example-keyvault' Key Vault. Please make sure that you have granted necessary permissions to the service to perform the request operation.","MessageTemplate":"The service does not have access to '{0}' Key Vault. Please make sure that you have granted necessary permissions to the service to perform the request operation.","Parameters":["/subscriptions/[SUBSCRIPTIONID]/resourcegroups/example-keyvault/providers/microsoft.keyvault/vaults/example-keyvault"]}}]

I checked and re-checked and triple-checked and had colleagues check, but no matter what I did, it kept puking with this permissions issue. I confirmed that the App Service’s identity was being provided and saved, but nothing seemed to work.

Then I found this blog post from 2016 talking about a magic Service Principal (or more specifically, a Resource Principal) that requires access to the Key Vault too. All I did was add the following resource with the magic SP, and everything worked perfectly.

resource "azurerm_key_vault_access_policy" "azure-app-service" {
   key_vault_id = "${data.azurerm_key_vault.cert.id}"
   tenant_id = "${azurerm_app_service.centralus-app-service.identity.0.tenant_id}"

   # This object is the Microsoft Azure Web App Service magic SP 
   # as per https://azure.github.io/AppService/2016/05/24/Deploying-Azure-Web-App-Certificate-through-Key-Vault.html
   object_id = "abfa0a7c-a6b6-4736-8310-5855508787cd" 

   secret_permissions = [
     "get"
   ]

   certificate_permissions = [
     "get"
   ]
 }

It’s frustrating that Microsoft hasn’t documented this piece (at least officially), but hopefully with this knowledge, you’ll be able to automate using a certificate stored in Key Vault in your next Azure App Service.

How to diagnose per-instance issues in Azure App Service

We have several micro-services that we run as App Services within Azure. In the past few weeks, multiple times we’ve experienced problems where a single instance has decided to go crazy but not crazy enough that Azure knows to take it out of rotation. Being able to diagnose these per-instance issues is imperative when it comes to offering a functioning App Service.

The first thing we’ve done is setup Alerts to monitor each App Service not just as a whole, but also per-instance. (NOTE: These are services we have not migrated to Terraform yet, so we have created these alerts manually, just like the services were as well. The alert definitions have been built into our Terraform stack and automatically get deployed as we move these micro-services over to the new Terraform-managed stack.) To get notified of broken single instances:

  1. Open the App Service in the Portal
  2. Under the ‘Monitoring’ section of the blade, select ‘Alerts’
  3. Select ‘New Alert Rule’
  4. Under ‘Condition’, choose ‘Add’
  5. Choose the ‘Http Server Errors’ signal
  6. Place a check in the box next to the ‘Instance’ dimension
  7. Set your Threshold settings according to your preference (for the record, we are using Dynamic / Medium / 5 minutes / Every 5 Minutes)
  8. Click ‘Done’
  9. Set up your Action Group accordingly
  10. Save the alert

Within 10 minutes, it will begin monitoring each instance which should give you better insight not just into the health of your application, but how each of the pieces that comprise your app are operating.

This is important because you may have 10 instances running a single App Service but the failure of a single instance may not create enough failures to trip an alert, depending on your routing scheme or traffic levels.

I personally believe that more visibility is always a positive, and so being able to detect per-instance issues in your Azure App Service can result in shorter outages and ultimately happier customers.

Posts navigation