# Deploying Sourcegraph executors using Terraform on AWS or GCP [Terraform modules](https://learn.hashicorp.com/tutorials/terraform/module-use?in=terraform/modules) are provided to provision machines running executors on [AWS](https://sourcegraph.com/github.com/sourcegraph/terraform-aws-executors) and [Google Cloud](https://sourcegraph.com/github.com/sourcegraph/terraform-google-executors). ## Basic Definition The following is the minimum required definition to deploy an executor. ```terraform module "executors" { source = "sourcegraph/executors/" # Find the latest version matching your Sourcegraph version here: # - https://github.com/sourcegraph/terraform-google-executors/tags # - https://github.com/sourcegraph/terraform-aws-executors/tags version = "" # AWS specific availability_zone = "" # Google specific region = "" zone = "" executor_sourcegraph_external_url = "" executor_sourcegraph_executor_proxy_password = "" # Either: executor_queue_name = "" # Or: executor_queue_names = "codeintel,batches" executor_instance_tag = "" executor_metrics_environment_label = "" } ``` | Variable | Description | | ---------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `availability_zone` | The **AWS** availability zone to create the instance in | | `region` | The **Google** region to provision the executor resources in. | | `zone` | The **Google** zone to provision the executor resources in. | | `executor_sourcegraph_external_url` | The public URL of your Sourcegraph instance. This corresponds to the `externalURL` value in your Sourcegraph instance’s site configuration and must be resolvable from the provisioned executor compute resources. | | `executor_sourcegraph_executor_proxy_password` | The access token corresponding to the `executors.accessToken` in your Sourcegraph instance's site configuration. | | `executor_queue_name` | The single queue from which the executor should pull jobs - [`codeintel`](/code-navigation/auto-indexing) or [`batches`](/batch-changes/server-side). Either this or `executor_queue_names` must be set. | | `executor_queue_names` | The multiple queues from which the executor should pull jobs - one or more of [`codeintel`](/code-navigation/auto-indexing) and [`batches`](/batch-changes/server-side). Either this or `executor_queue_name` must be set. | | `executor_instance_tag` | A label tag to add to all the executors; can be used for filtering out the right instances in stackdriver monitoring | | `executor_metrics_environment_label` | The value for environment by which to filter the custom metrics. | See the Terraform Modules for additional configurations. - [AWS](https://sourcegraph.com/github.com/sourcegraph/terraform-aws-executors/-/blob/modules/executors/variables.tf) - [Google](https://sourcegraph.com/github.com/sourcegraph/terraform-google-executors/-/blob/modules/executors/variables.tf) ## Terraform Version Terraform modules `4.2.x` and above allow Terraform from `1.1.x` to `< 2.x` to be used. If using a Terraform module `4.1.x` or below, use [tfenv](https://sourcegraph.com/github.com/tfutils/tfenv) to install Terraform 1.1+. ```shell tfenv install 1.1.9 tfenv use 1.1.9 ``` ## Permissions In order to provision executor resources, specific permissions must be granted. ### AWS Access to get and create in the following resources. - Auto Scaling - CloudWatch Logs - EBS (EC2) - EC2 (Elastic Compute Cloud) - IAM (Identity & Access Management) - VPC (Virtual Private Cloud) ### Google Ensure the [IAM API](https://console.cloud.google.com/apis/api/iam.googleapis.com/overview) is enabled. - appengine.applications.get - clientauthconfig.brands.\* - clientauthconfig.clients.\* - cloudasset.assets.searchAllResources - cloudnotifications.activities.list - cloudtrace.insights.get - cloudtrace.insights.list - cloudtrace.stats.get - cloudtrace.tasks.\* - cloudtrace.traces.list - compute.addresses.\* - compute.autoscalers.\* - compute.disks.\* - compute.firewalls.\* - compute.globalOperations.get - compute.instanceGroupManagers.\* - compute.instanceGroups.create - compute.instances.\* - compute.instanceTemplates.\* - compute.networks.\* - compute.regionOperations.get - compute.subnetworks.\* - compute.zoneOperations.get - compute.zones.get - container.clusters.list - iam.roles.\* - iam.serviceAccountKeys.\* - logging.logEntries.list - logging.privateLogEntries.list - monitoring.timeSeries.list - oauthconfig.testusers.update - oauthconfig.verification.update - orgpolicy.policy.get - resourcemanager.projects.\* - secretmanager.locations.list - secretmanager.secrets.\* - secretmanager.versions.\* ## Supported Regions ### AWS - `us-east-1` - `us-east-2` - `us-west-1` - `us-west-2` - `eu-west-1` - `eu-west-2` - `eu-west-3` - `eu-north-1` - `eu-south-1` - `eu-central-1` - `ap-northeast-1` - `ap-northeast-2` - `ap-southeast-1` - `ap-southeast-2` - `ap-southeast-3` - `ap-east-1` - `ap-south-1` - `sa-east-1` - `me-south-1` - `af-south-1` - `ca-central-1` ### Google All regions are supported. ## Examples ### Single Executor The following examples provision a single executor to pull from the `codeintel` queue. - [AWS example](https://sourcegraph.com/github.com/sourcegraph/terraform-aws-executors/-/tree/examples/single-executor) - [Google example](https://sourcegraph.com/github.com/sourcegraph/terraform-google-executors/-/tree/examples/single-executor) ### Multiple Executors The following examples provision two executors, one to pull from the `codeintel` queue and the other for the `batches` queue. - [AWS example](https://sourcegraph.com/github.com/sourcegraph/terraform-aws-executors/-/tree/examples/multiple-executors) - [Google example](https://sourcegraph.com/github.com/sourcegraph/terraform-google-executors/-/tree/examples/multiple-executors) ### Step-by-step Guide The following is a step-by-step guide on provisioning a single `codeintel` executor on GCP. #### Provision 1. [Install Terraform](#terraform-version). 2. Install the [`gcloud CLI`](https://cloud.google.com/sdk/docs/install) 3. Run `gcloud auth application-default login` 4. Set up your Sourcegraph instance's Site configuration for executors 1. Click on your profile picture in the top right corner 2. Select **Site admin** 3. Example the **Configuration** section 4. Select **Site configuration** 5. Set the following, - `"externalURL": ""` - A URL that is accessible from the GCP VM (e.g. a public URL such as `https://sourcegraph.example.com`) - `"executors.accessToken": ""` - Can be generated by running `cat /dev/random | base64 | head -c 20` - The secret will be as displayed `REDACTED` once it's saved. - `"codeIntelAutoIndexing.enabled": true` - _This is only for `codeintel` executors._ 5. Download the [example files](https://sourcegraph.com/github.com/sourcegraph/terraform-google-executors/-/blob/examples/single-executor) 6. Change the following in `providers.tf` - `project` to the GCP project to provision the executor in - `region` to the GCP region to provision the executor in - `zone` to the GCP zone to provision the executor in 7. Change the following in `main.tf` - `executor_sourcegraph_external_url` to the URL configured in your instance's **Site configuration** - `executor_sourcegraph_executor_proxy_password` to the access token configured in your instance's **Site configuration** 8. Run `terraform init` to download the Sourcegraph executor modules. 9. Run `terraform plan` to preview the changes that will occur to your GCP infrastructure. 10. Run `terraform apply` and enter "yes" after reviewing the proposed changes to create the executor VM - Ensure `terraform apply` exited with code 0 and did not print any errors 11. Go back to the site admin page, expand **Executors**, click **Instances**, and check to see if your executor shows up in the list with a green dot 🟢 #### Validation The following can be done to troubleshoot or double-check that the executor has been properly provisioned. Ensure the executor is listed in the Compute Engine. Either go to **Compute Engine** in the GCP Console for your project or run the following command. ```shell $ gcloud compute instances list NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS sourcegraph-executor-h0rv us-central1-c n1-standard-4 10.0.1.16 RUNNING sourcegraph-executors-docker-registry-mirror us-central1-c n1-standard-2 10.0.1.2 RUNNING ``` You can ssh into to the instance to ensure the service is running. You can open an ssh connection either via the GCP Console or by running the following command. ```shell gcloud compute ssh sourcegraph-executor-h0rv ``` Then run the following command to check if the service is running. ```shell you@sourcegraph-executor-h0rv:~$ systemctl status executor 🟢 executor.service - User code executor Loaded: loaded (/etc/systemd/system/executor.service; enabled; vendor preset: enabled) Active: active (running) since Thu 2021-11-18 02:28:48 UTC; 19s ago ``` To check the logs, you can either query the **Log Explorer** in the GCP Console or by running the following command while connected to the instance. ```shell you@sourcegraph-executor-h0rv:~$ journalctl -u executor | less Nov 18 02:31:01 sourcegraph-executor-h0rv executor[2465]: t=2021-11-18T02:31:01+0000 lvl=dbug msg="TRACE internal" host=... path=/.executors/queue/codeintel/dequeue code=204 duration=92.131237ms Nov 18 02:31:01 sourcegraph-executor-h0rv executor[2465]: t=2021-11-18T02:31:01+0000 lvl=dbug msg="TRACE internal" host=... path=/.executors/queue/codeintel/canceled code=200 duration=90.630467ms Nov 18 02:31:02 sourcegraph-executor-h0rv executor[2465]: t=2021-11-18T02:31:02+0000 lvl=dbug msg="TRACE internal" host=... path=/.executors/queue/codeintel/dequeue code=204 duration=91.269106ms Nov 18 02:31:02 sourcegraph-executor-h0rv executor[2465]: t=2021-11-18T02:31:02+0000 lvl=dbug msg="TRACE internal" host=... path=/.executors/queue/codeintel/canceled code=200 duration=161.469685ms ``` Ensure the `EXECUTOR_FRONTEND_URL` and `EXECUTOR_FRONTEND_PASSWORD` in `/etc/systemd/system/executor.env` are correct ``` cat /etc/systemd/system/executor.env ``` Ensure the VM can hit your `externalURL`: ```shell you@sourcegraph-executor-h0rv:~$ curl Found ``` #### Configure Auto-indexing 1. Go to the **Site admin** page 2. Expand **Code graph**, 3. Select **Configuration** 4. Click **Create new policy**, and fill in: - Name: `TEST` - Click _add a repository pattern_ - Repository pattern #1: set this to an existing repository on your Sourcegraph instance ( e.g. `github.com/gorilla/mux`) - Type: `HEAD` - Retention: Disabled - Auto-indexing: Enabled 5. Expand **Code graph** 6. Select **Auto-indexing**, and check to see if an indexing job has appeared. If nothing is there: - Try clicking **Enqueue** - Try setting a higher update frequency: `PRECISE_CODE_INTEL_AUTO_INDEXING_TASK_INTERVAL=10s` - Try setting a lower delay: `PRECISE_CODE_INTEL_AUTO_INDEXING_REPOSITORY_PROCESS_DELAY=10s` 7. Once you have a completed indexing job, click **Uploads** and check to see that an index has been uploaded. 8. Once the index has been uploaded, you should see the **`PRECISE`** badge in the hover! 🎉 ## Auto-scaling > NOTE: Auto scaling is currently not supported > when [downloading and running executor binaries yourself](/self-hosted/executors/deploy-executors-binary), > and on managed instances when using self-hosted executors, since it requires deployment adjustments. Auto-scaling of executor instances can help to increase concurrency of jobs, without paying for unused resources. With auto-scaling, you can scale down to 0 instances when no workload exist and scale up as far as you like and your cloud provider can support. Auto-scaling needs to be configured separately. Auto-scaling makes use of the auto-scaling capabilities of the respective cloud provider (**AutoScalingGroups** on AWS and **Instance Groups** on GCP). Sourcegraph's `worker` service publishes a scaling metric (that is, the number of jobs in queue) to the cloud providers. Then, based on that reported value, the auto-scalers add and remove compute resources to match the required amount of compute. The autoscaler will attempt to hold 1 instance running per each [`executor_jobs_per_instance_scaling`](https://sourcegraph.com/search?q=context:global+repo:%5Egithub.com/sourcegraph/terraform-.*-executors%24+variable+%22executor_jobs_per_instance_scaling%22&patternType=literal) items in queue. For example, if `executor_jobs_per_instance_scaling` is set to `20` and the queue size is currently `400`, then `20`instances would be determined as required to handle the load. You might want to tweak this number based on the [machine type](https://sourcegraph.com/search?q=context:global+repo:%5Egithub.com/sourcegraph/terraform-.*-executors%24+variable+%22machine_type%22+-f:docker-mirror&patternType=literal), [concurrency per machine](https://sourcegraph.com/search?q=context:global+repo:%5Egithub.com/sourcegraph/terraform-.*-executors%24+variable+%22maximum_num_jobs%22&patternType=literal) and desired processing speed. With the Terraform variables [`executor_min_replicas`](https://sourcegraph.com/search?q=context:global+repo:%5Egithub.com/sourcegraph/terraform-.*-executors%24+variable+%22executor_min_replicas%22&patternType=literal) and [`executor_max_replicas`](https://sourcegraph.com/search?q=context:global+repo:%5Egithub.com/sourcegraph/terraform-.*-executors%24+variable+%22executor_max_replicas%22&patternType=literal) in the Terraform modules linked to above, you can configure the minimum and maximum number of compute machines to be run at a given time. For auto-scaling to work, two things must be true: 1. `executor_min_replicas` must be `>= 0` and `executor_max_replicas` must be `> executor_min_replicas`. 2. The Sourcegraph instance (its `worker` service, specifically) needs to publish scaling metrics to the used cloud provider. For the latter to work, the Sourcegraph instance needs to be configured with the correct credentials that allow it to access the cloud provider. The `credentials` submodule in both the [AWS](https://sourcegraph.com/github.com/sourcegraph/terraform-aws-executors/-/tree/modules/credentials) and [Google](https://sourcegraph.com/github.com/sourcegraph/terraform-google-executors/-/tree/modules/credentials) executor modules exists for that purpose. When used, the `credentials` module sets up the credentials on the cloud provider and returns them in the Terraform outputs. Here's an example of how one would configure auto-scaling. ```terraform module "executors" { source = "sourcegraph/executors/" version = "" # Basic configuration... # Auto-scaling executor_min_replicas = 0 # Spin down when not in use executor_max_replicas = 30 executor_jobs_per_instance_scaling = 20 } module "my-credentials" { source = "sourcegraph/executors///modules/credentials" version = "" # AWS availability_zone = "" # Removed in 4.2 # Google zone = "" # Removed in 4.1 resource_prefix = "" } # AWS output "metric_writer_access_key_id" { value = module.my-credentials.metric_writer_access_key_id } output "metric_writer_secret_key" { value = module.my-credentials.metric_writer_secret_key sensitive = true } # Google output "metric_writer_credentials_file" { value = module.my-credentials.metric_writer_credentials_file sensitive = true } ``` After running `terraform apply`, the outputs are retrieved by running the following commands. ```shell # AWS $ terraform output metric_writer_access_key_id $ terraform output metric_writer_secret_key # Google $ terraform output metric_writer_credentials_file ``` These outputs are used to configure the Sourcegraph instance (see below). ### AWS The AWS EC2 auto-scaling groups configured by the Sourcegraph Terraform module respond to changes in metric values written to **CloudWatch**. The target Sourcegraph instance is expected to continuously write these values. To write the scaling metric to **CloudWatch**, the `worker` service must have defined the following environment variables. | Environment Variable | Description | | --------------------------------------- | -------------------------------------------------------- | | `EXECUTOR_METRIC_ENVIRONMENT_LABEL` | Same value as `executor_metrics_environment_label` | | `EXECUTOR_METRIC_AWS_NAMESPACE` | Must be set to `sourcegraph-executor` | | `EXECUTOR_METRIC_AWS_REGION` | The target AWS region | | `EXECUTOR_METRIC_AWS_ACCESS_KEY_ID` | The value of the output of `metric_writer_access_key_id` | | `EXECUTOR_METRIC_AWS_SECRET_ACCESS_KEY` | The value of the output of `metric_writer_secret_key` | ### Google The Google Compute Engine auto-scaling groups configured by the Sourcegraph Terraform module respond to changes in metric values written to Cloud Monitoring. The target Sourcegraph instance is expected to continuously write these values. To write the scaling metric to **Cloud Monitoring**, the `worker` service must have defined the following environment variables. | Environment Variable | Description | | ----------------------------------- | -------------------------------------------------- | | `EXECUTOR_METRIC_ENVIRONMENT_LABEL` | Same value as `executor_metrics_environment_label` | | `EXECUTOR_METRIC_GCP_PROJECT_ID` | The GCP Project ID | Then either one of the following environment variables must be set. | Environment Variable | Description | | ------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- | | `EXECUTOR_METRIC_GOOGLE_APPLICATION_CREDENTIALS_FILE_CONTENT` | The **base64-decoded** output of `metric_writer_credentials_file` | | `EXECUTOR_METRIC_GOOGLE_APPLICATION_CREDENTIALS_FILE` | The path to the file containing the **base64-decoded** output of `metric_writer_credentials_file` | ### Testing auto scaling Once the environment variables have been set and the worker service has been restarted, you should be able to find the scaling metrics in your cloud providers dashboards. To test if the metric is correctly reported into the Cloud provider: - On AWS, this can be found in the **CloudWatch** metrics section. Under **All metrics**, select the namespace `sourcegraph-executor` and then the metric `environment`, `queueName`. Make sure there are entries returned. - On Google Cloud, this can be found in the **Metrics explorer**. Select **Resource type: Global** and then **Metric: `custom/executors/queue/size`**. You should see values reported here. `0` is also an indicator that it works correct. Next, you can test whether the number of executors rises and shrinks as load spikes occur. Keep in mind that auto-scaling is not a real-time operation on most cloud providers and usually takes a short moment and can have some delays between the metric going down and the desired machine count adjusting. ## Upgrading executors Upgrading executors is relatively uninvolved. Simply follow the instructions below. Also, check the [changelog](https://sourcegraph.com/changelog) for any Executors related breaking changes or new features or flags that you might want to configure. See [Executors maintenance](/self-hosted/executors/deploy-executors#Maintaining-and-upgrading-executors) for version compatability. ### **Step 1:** Update the source version of the terraform modules > NOTE: Keep in mind that only one minor version bumps are guaranteed to be disruption-free. ```diff module "executors" { source = "sourcegraph/executors/" # Find the latest version matching your Sourcegraph version here: # - https://github.com/sourcegraph/terraform-google-executors/tags # - https://github.com/sourcegraph/terraform-aws-executors/tags - version = "4.0.0" + version = "4.1.0" # AWS specific availability_zone = "" # Google specific region = "" zone = "" executor_sourcegraph_external_url = "" executor_sourcegraph_executor_proxy_password = "" executor_queue_name = "" executor_instance_tag = "" executor_metrics_environment_label = "" } ``` ### **Step 2:** Reapply the terraform configuration Simply reapply the terraform configuration and executors will be ready to go again. ```bash terraform apply ```