Cloudflare – Error 520: What is wrong and how to fix it?

We recently ran into an issue setting up a new DNS entry on Cloudflare, using the orange-cloud (reverse proxying) feature, but we were receiving Error 520 and were curious what was wrong and how to fix it.  The error page itself doesn’t give a lot of information and since it’s a custom error they’ve created, it wasn’t easy to find out or even intuit much information about what it might mean.

To give some backstory, we are using a SaaS provider of a service for our employees that we want to protect behind our own domain. For example, instead of using ourcompany.saascompany.com, we wanted to use something like saasservicename.ourcompany.com. The provider supported this and so we set up the record within Cloudflare but as soon as we tried to visit the page, we received Cloudflare’s infamous 520 error: “Web server is returning an unknown error”.

After trying to troubleshoot the problem through Cloudflare, we turned off the orange-cloud and figured out that the SaaS provider hadn’t installed our TLS certificate correctly and so when Cloudflare was attempting to retrieve our instance from their server, they were receiving the NET::ERR_CERT_COMMON_NAME_INVALID. In response to that, they were throwing their own custom error 520 (it is not an official error code).

As soon as the vendor fixed the certificate issue, the 520 went away and we were able to re-enable orange-cloud, confirm that the site was up and working, and continue on with life confident that an attacker would not be able to determine who is providing the SaaS service for us.

Deploying an Azure App Service from scratch, including DNS and TLS

As many of you have probably gathered, over the past few weeks, I’ve been working on building a process for deploying an Azure App Service from scratch, including DNS and TLS in a single Terraform module.

Today, I write this post with success in my heart, and at the bottom, I provide copies of the necessary files for your own usage.

One of the biggest hurdles I faced was trying to integrate Cloudflare’s CDN services with Azure’s Custom Domain verification. Typically, I’ll rely on the options available in the GUI as the inclusive list of “things I can do” so up until now, if we wanted to stand up a multi-region App Service, we had to do the following:

  1. Build and deploy the App Service, using the azurewebsites.net hostname for HTTPS for each region (R1 and R2)

    e.g. example-app-eastus.azurewebsites.net (R1), example-app-westus.azurewebsites.net (R2)
  2. Create the CNAME record for the service at Cloudflare pointing at R1, turning off proxying (orange cloud off)

    e.g. example-app.domain.com -> example-app-eastus.azurewebsites.net
  3. Add the Custom Domain on R1, using the CNAME verification method
  4. Once the hostname is verified, go back to Cloudflare and update the CNAME record for the service to point to R2

    e.g. example-app.domain.com -> example-app-westus.azurewebsites.net
  5. Add the Custom Domain on R2, using the CNAME verification method
  6. Once the hostname is verified, go back to Cloudflare and update the CNAME record for the service to point to the Traffic Manager, and also turn on proxying (orange cloud on)

While this eventually accomplishes the task, the failure mode it introduces is that if you ever want to add a third (or fourth or fifth…) region, you temporarily have to not only direct all traffic to your brand new single instance momentarily to verify the domain, but you also have to turn off proxying, exposing the fact that you are using Azure (bad OPSEC).

After doing some digging however, I came across a Microsoft document that explains that there is a way to add a TXT record which you can use to verify ownership of the domain without a bunch of messing around with the original record you’re dealing with.

This is great because we can just add new awverify records for each region and Azure will trust we own them, but Terraform introduces a new wrinkle in that it creates the record at Cloudflare so fast that Cloudflare’s infrastructure often doesn’t have time to replicate the new entry across their fleet before you attempt the verification, which means that the lookup will fail and Terraform will die.

To get around this, we added a null_resource that just executes a 30 second sleep to allow time for the record to propagate through Cloudflare’s network before attempting the lookup.

I’ve put together a copy of our Terraform modules for your perusal and usage:

Using this module will allow you to easily deploy all of your micro-services in a Highly Available configuration by utilizing multiple regions.

Using a certificate stored in Key Vault in an Azure App Service

For the last two days, I’ve been trying to deploy some new microservices using a certificate stored in Key Vault in an Azure App Service. By now, you’ve probably figured out that we love them around here. I’ve also been slamming my head against the wall because of some not-well-documented functionality about granting permissions to the Key Vault.

As a quick primer, here’s the basics of what I was trying to do:

resource "azurerm_app_service" "centralus-app-service" {
   name                = "${var.service-name}-centralus-app-service-${var.environment_name}"
   location            = "${azurerm_resource_group.centralus-rg.location}"
   resource_group_name = "${azurerm_resource_group.centralus-rg.name}"
   app_service_plan_id = "${azurerm_app_service_plan.centralus-app-service-plan.id}"

   identity {
     type = "SystemAssigned"
   }
 }

data "azurerm_key_vault" "cert" {
   name                = "${var.key-vault-name}"
   resource_group_name = "${var.key-vault-rg}"
 }
resource "azurerm_key_vault_access_policy" "centralus" {
   key_vault_id = "${data.azurerm_key_vault.cert.id}"
   tenant_id = "${azurerm_app_service.centralus-app-service.identity.0.tenant_id}"
   object_id = "${azurerm_app_service.centralus-app-service.identity.0.principal_id}"
   secret_permissions = [
     "get"
   ]
   certificate_permissions = [
     "get"
   ]
 }
resource "azurerm_app_service_certificate" "centralus" {
   name                = "${local.full_service_name}-cert"
   resource_group_name = "${azurerm_resource_group.centralus-rg.name}"
   location            = "${azurerm_resource_group.centralus-rg.location}"
   key_vault_secret_id = "${var.key-vault-secret-id}"
   depends_on          = [azurerm_key_vault_access_policy.centralus]
 }

and these are the relevant values I was passing into the module:

  key-vault-secret-id       = "https://example-keyvault.vault.azure.net/secrets/cert/0d599f0ec05c3bda8c3b8a68c32a1b47"
  key-vault-rg              = "example-keyvault"
  key-vault-name            = "example-keyvault"

But no matter what I did, I kept bumping up against this error:

Error: Error creating/updating App Service Certificate "example-app-dev-cert" (Resource Group "example-app-centralus-rg-dev"): web.CertificatesClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="BadRequest" Message="The service does not have access to '/subscriptions/[SUBSCRIPTIONID]/resourcegroups/example-keyvault/providers/microsoft.keyvault/vaults/example-keyvault' Key Vault. Please make sure that you have granted necessary permissions to the service to perform the request operation." Details=[{"Message":"The service does not have access to '/subscriptions/[SUBSCRIPTIONID]/resourcegroups/example-keyvault/providers/microsoft.keyvault/vaults/example-keyvault' Key Vault. Please make sure that you have granted necessary permissions to the service to perform the request operation."},{"Code":"BadRequest"},{"ErrorEntity":{"Code":"BadRequest","ExtendedCode":"59716","Message":"The service does not have access to '/subscriptions/[SUBSCRIPTIONID]/resourcegroups/example-keyvault/providers/microsoft.keyvault/vaults/example-keyvault' Key Vault. Please make sure that you have granted necessary permissions to the service to perform the request operation.","MessageTemplate":"The service does not have access to '{0}' Key Vault. Please make sure that you have granted necessary permissions to the service to perform the request operation.","Parameters":["/subscriptions/[SUBSCRIPTIONID]/resourcegroups/example-keyvault/providers/microsoft.keyvault/vaults/example-keyvault"]}}]

I checked and re-checked and triple-checked and had colleagues check, but no matter what I did, it kept puking with this permissions issue. I confirmed that the App Service’s identity was being provided and saved, but nothing seemed to work.

Then I found this blog post from 2016 talking about a magic Service Principal (or more specifically, a Resource Principal) that requires access to the Key Vault too. All I did was add the following resource with the magic SP, and everything worked perfectly.

resource "azurerm_key_vault_access_policy" "azure-app-service" {
   key_vault_id = "${data.azurerm_key_vault.cert.id}"
   tenant_id = "${azurerm_app_service.centralus-app-service.identity.0.tenant_id}"

   # This object is the Microsoft Azure Web App Service magic SP 
   # as per https://azure.github.io/AppService/2016/05/24/Deploying-Azure-Web-App-Certificate-through-Key-Vault.html
   object_id = "abfa0a7c-a6b6-4736-8310-5855508787cd" 

   secret_permissions = [
     "get"
   ]

   certificate_permissions = [
     "get"
   ]
 }

It’s frustrating that Microsoft hasn’t documented this piece (at least officially), but hopefully with this knowledge, you’ll be able to automate using a certificate stored in Key Vault in your next Azure App Service.