Secure Attachment Access

We recommend using AWS S3, Google Cloud, or Azure to host your data.

S3 IAM Access

If you store data in AWS S3 and submit tasks with attachments as s3: protocol URIs, rather than http: or https:, we will use the S3 API to fetch your data. For example, instead of sending https://s3-us-west-2.amazonaws.com/bucket/key, you would send s3://bucket/key.
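If your attachment URLs are already stored in the path-style HTTPS form, the rewrite described above could be sketched as follows. This is a minimal illustration, not part of Scale's API: the function name to_s3_uri is hypothetical, only the path-style form (https://<s3 endpoint>/<bucket>/<key>) is handled, and percent-encoding in the key is left untouched.

```python
from urllib.parse import urlparse


def to_s3_uri(url: str) -> str:
    """Convert a path-style S3 HTTP(S) URL into an s3://bucket/key URI.

    Hypothetical helper: virtual-hosted-style URLs
    (https://bucket.s3.amazonaws.com/key) are not handled here.
    """
    parsed = urlparse(url)
    if parsed.scheme == "s3":
        return url  # already in the desired form
    host = parsed.hostname or ""
    if parsed.scheme not in ("http", "https") or not (
        host.startswith("s3") and host.endswith(".amazonaws.com")
    ):
        raise ValueError(f"not a path-style S3 URL: {url}")
    # In a path-style URL the first path segment is the bucket, the rest is the key.
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    if not bucket or not key:
        raise ValueError(f"URL is missing a bucket or key: {url}")
    return f"s3://{bucket}/{key}"


print(to_s3_uri("https://s3-us-west-2.amazonaws.com/bucket/key"))
```

The result is the string you would send as the attachment instead of the original HTTPS URL.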

We can either fetch your data using IAM Delegated Access (preferred, more secure) or Cross-account Access.

IAM Delegated Access

To access S3 data in your AWS account, Scale can assume a role in your account, which has permission to access data in your S3 buckets. This role must be named ScaleAI-Integration.

To set up IAM Delegated Access:

As a team admin or manager, go to dashboard.scale.com/settings/integrations.
In another window, create a new role in the AWS IAM Console:

Select Another AWS account for the Role Type.
Enter 307185671274 (Scale's Account ID) as the Account ID.
Check Require external ID, and enter the external ID displayed in the AWS section of the Integrations Settings page.
Do not check Require MFA.

For permissions, either attach a policy that grants appropriate access, or create a policy. A sample role policy is shown below.
Name the role ScaleAI-Integration.
Return to the Scale Dashboard and enter your AWS account ID.

Sample Role Policy for IAM Delegated Access

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "scale-s3-access",
            "Action": [
                "s3:GetObject"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
        }
    ]
}
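If you create the role programmatically rather than through the console, the console choices in the steps above (another AWS account, Scale's account ID, a required external ID, no MFA) correspond to a trust policy on the role. The sketch below builds such a document using the standard IAM trust-policy shape. The function name build_trust_policy and the placeholder external ID are assumptions for illustration; use the external ID shown on your Integrations Settings page.

```python
import json

# Scale's AWS account ID, as given in the setup steps above.
SCALE_ACCOUNT_ID = "307185671274"


def build_trust_policy(external_id: str) -> dict:
    """Return a trust policy letting Scale's account assume the role.

    Hypothetical helper: the external ID condition mirrors the
    "Require external ID" checkbox, and no MFA condition is added.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{SCALE_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder value; replace with the external ID from the dashboard.
print(json.dumps(build_trust_policy("EXTERNAL_ID_FROM_DASHBOARD"), indent=4))
```

The trust policy controls who may assume the role, while the sample role policy above controls what the role may read; both are needed.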

Note that if you enable the AWS integration for your account, we will not attempt to fetch attachments from our account (307185671274) directly; the policies described in Cross-account Access will not work.

Cross-account Access

If IAM delegated access is not configured, we will directly fetch attachments from your S3 bucket, using AWS account ID 307185671274 (canonical ID ae2259599e139df6cedb60b6300bcafa1c652aff129aa3d887477b6d4abf2e47), which you can grant access to on a per-object basis using ACLs or using bucket policies.

For most customers, we recommend setting a Bucket Policy that shares the bucket's contents with Scale's account.

A sample Bucket Policy is shown below. Be sure to replace YOUR_BUCKET_NAME with the name of your bucket, leaving the /* as shown or replacing it with a more specific path within the bucket to further restrict access.

Please note that if using Access Control Lists (ACLs), each object must have its ACL individually updated to grant read access to our account, as Bucket ACLs cannot grant read permissions to the objects inside.

Sample Bucket Policy for Cross-account Access

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "scale-s3-access",
            "Action": [
                "s3:GetObject"
            ],
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::307185671274:root"
                ]
            },
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
        }
    ]
}
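To restrict access to a more specific path, as mentioned above, only the Resource ARN needs to change. The following sketch generates the same policy for a given bucket and optional key prefix. The function name build_bucket_policy and the example bucket and prefix are hypothetical.

```python
import json

# Scale's AWS account ID, as used in the sample policy above.
SCALE_ACCOUNT_ID = "307185671274"


def build_bucket_policy(bucket: str, prefix: str = "") -> dict:
    """Return a bucket policy granting Scale read access to objects.

    An empty prefix shares the whole bucket; a non-empty prefix limits
    access to keys that start with "<prefix>/".
    """
    prefix = prefix.strip("/")
    path = f"{prefix}/*" if prefix else "*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "scale-s3-access",
                "Action": ["s3:GetObject"],
                "Effect": "Allow",
                "Principal": {"AWS": [f"arn:aws:iam::{SCALE_ACCOUNT_ID}:root"]},
                "Resource": f"arn:aws:s3:::{bucket}/{path}",
            }
        ],
    }


# Hypothetical bucket and prefix, for illustration only.
print(json.dumps(build_bucket_policy("YOUR_BUCKET_NAME", "scale/uploads"), indent=4))
```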

Please note that this authentication mechanism suffers from the confused deputy problem: a third party that can guess your S3 URLs will be able to submit tasks with your data.

Google Cloud Storage Access

If you store data in Google Cloud Storage and submit tasks with attachments as gs: protocol URIs, rather than http: or https:, we will use the Google Cloud Storage API to fetch your data. For example, instead of sending https://storage.googleapis.com/bucket/key, you would send gs://bucket/key.
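The same rewrite can be sketched for Cloud Storage URLs. This is an illustration only: the function name to_gs_uri is hypothetical, only the https://storage.googleapis.com/<bucket>/<key> form shown above is handled, and percent-encoding in the key is left untouched.

```python
from urllib.parse import urlparse

GCS_HOST = "storage.googleapis.com"


def to_gs_uri(url: str) -> str:
    """Convert https://storage.googleapis.com/bucket/key into gs://bucket/key."""
    parsed = urlparse(url)
    if parsed.scheme == "gs":
        return url  # already in the desired form
    if parsed.scheme not in ("http", "https") or parsed.hostname != GCS_HOST:
        raise ValueError(f"not a {GCS_HOST} URL: {url}")
    # The first path segment is the bucket, the remainder is the object key.
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    if not bucket or not key:
        raise ValueError(f"URL is missing a bucket or key: {url}")
    return f"gs://{bucket}/{key}"


print(to_gs_uri("https://storage.googleapis.com/bucket/key"))
```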

We can either fetch your data using Service Account Impersonation (preferred, more secure) or Cross-project Access.

Service Account Impersonation

To access Cloud Storage data in your GCP project, Scale can impersonate a service account within that project, which has permission to access data in Cloud Storage.

To set up Service Account Impersonation:

As a team admin or manager, go to dashboard.scale.com/settings/integrations.
In another window, navigate to the GCP Service Accounts page for the appropriate project.
Create a service account.

The service account ID must contain an 8-character user identifier as a substring. This identifier can be found in the Google Cloud Platform section of the Integrations Settings page.
We suggest the ID scaleai-integrations-{uid}.

Grant Scale's service account the ability to impersonate the newly created service account:

In the Service Accounts page on GCP, check the box associated with the newly created service account.
In the permissions pane on the right, click Add Member. You may need to click "Show Info Panel" in the top right to see this option.
Specify backend-bucket-access@attachment-storage-243718.iam.gserviceaccount.com as the member, and Service Account Token Creator as the role.
Save the permissions.

In Google Cloud Storage, assign the Storage Object Data Reader permissions for the requisite buckets to the newly created service account.

If you use fine-grained access controls, add the service account email as a Reader for any objects you would like to upload (if not already granted by bucket-level access).

Return to the Scale Dashboard and enter the email of the service account.
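Before creating the service account, it can help to check that the chosen ID satisfies the naming requirement from the steps above. The sketch below builds the suggested ID and checks a candidate against both the substring rule and GCP's general service account ID format (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter). The function names and the example identifier ab12cd34 are hypothetical, and the sketch assumes the identifier contains only lowercase letters and digits.

```python
import re

# GCP service account IDs: 6-30 chars, start with a lowercase letter,
# then lowercase letters, digits, or hyphens, not ending in a hyphen.
_SA_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def suggested_service_account_id(uid: str) -> str:
    """Build the suggested ID, scaleai-integrations-{uid}."""
    return f"scaleai-integrations-{uid}"


def is_acceptable_service_account_id(sa_id: str, uid: str) -> bool:
    """Check that sa_id is a well-formed ID containing the 8-character uid."""
    return (
        len(uid) == 8
        and uid in sa_id
        and _SA_ID_PATTERN.fullmatch(sa_id) is not None
    )


uid = "ab12cd34"  # hypothetical; use the identifier from the Integrations Settings page
sa_id = suggested_service_account_id(uid)
print(sa_id, is_acceptable_service_account_id(sa_id, uid))
```

Note that the suggested prefix plus an 8-character identifier comes to 29 characters, so it fits within the 30-character limit with little room for a longer prefix.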

Note that if you enable the GCP integration for your account, we will not attempt to fetch attachments from the default service account (backend-bucket-access@attachment-storage-243718.iam.gserviceaccount.com) directly; the policies described in Cross-project Access will not work.

Cross-project Access

If Service Account Impersonation is not configured, we will directly fetch attachments from your GCS bucket, using the GCP service account backend-bucket-access@attachment-storage-243718.iam.gserviceaccount.com. You can grant access to this service account on a per-object basis with ACLs, or on a per-bucket basis with Cloud IAM Permissions.

Please note that this authentication mechanism suffers from the confused deputy problem: a third party that can guess your Cloud Storage URLs will be able to submit tasks with your data.

Azure Blob Storage Access

If you use Azure Blob Storage, you can grant access to your Blob Storage resources by completing the Azure access delegation process. Scale has registered the Scale AI application as an Azure multi-tenant application that can access resources in your Azure subscription on your behalf.

After completing the access delegation process, blob storage resource URIs (i.e. of the form https://{storageaccount}.blob.core.windows.net/{container}/{key}) will be fetched using the Scale AI service principal, and you will be able to submit blob URIs to the API that are not publicly accessible.
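To check that an attachment URI matches the form above before submitting it, a small parser along the following lines could be used. This is a sketch under stated assumptions: the names BlobLocation and parse_blob_uri and the example URI are hypothetical, and only the default blob.core.windows.net endpoint is recognized.

```python
from typing import NamedTuple
from urllib.parse import urlparse

_BLOB_SUFFIX = ".blob.core.windows.net"


class BlobLocation(NamedTuple):
    storage_account: str
    container: str
    key: str


def parse_blob_uri(uri: str) -> BlobLocation:
    """Split https://{storageaccount}.blob.core.windows.net/{container}/{key}."""
    parsed = urlparse(uri)
    host = parsed.hostname or ""
    if parsed.scheme != "https" or not host.endswith(_BLOB_SUFFIX):
        raise ValueError(f"not a Blob Storage URI: {uri}")
    account = host[: -len(_BLOB_SUFFIX)]
    # The first path segment is the container, the remainder is the blob key.
    container, _, key = parsed.path.lstrip("/").partition("/")
    if not account or not container or not key:
        raise ValueError(f"URI is missing an account, container, or key: {uri}")
    return BlobLocation(account, container, key)


# Hypothetical URI, for illustration only.
print(parse_blob_uri("https://myaccount.blob.core.windows.net/images/cats/1.png"))
```

The storage account and container returned here are the scopes at which the reader role would need to be assigned for the blob to be fetchable.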

The process involves the following steps:

Consenting to grant Scale AI the permissions it requires to access resources in your subscription.
Assigning the Scale AI app an appropriate role.

Application Consent

As an administrator or manager of your Scale AI team, go to the integrations tab in the settings page and click the Connect to Azure button. Azure displays the resource permissions requested by the application.

Click Accept to allow Azure to grant permission to Scale AI to access resources in your subscription. You will still need to grant the application a role to access Blob Storage data. Note that after providing application consent, the Scale AI app will stop using anonymous credentials to fetch attachments sent in by your team.

Role-Based Access

As part of the access delegation process, you must assign a role to the Scale AI application service principal so that it can read data from your storage accounts. We recommend assigning the Storage Blob Data Reader role on the particular storage accounts or containers that data will be retrieved from. Alternatively, you can create a custom role that provides only the minimum permissions necessary. See the Azure docs for instructions on how to assign the role.

Disconnecting from Azure

To stop the Scale AI service principal from authenticating via Azure AD to access your users' storage accounts, use the Unlink from Azure button in the integrations tab in the settings page. Note that this does not revoke permissions from the Scale AI service principal in Azure, nor does it uninstall the Scale AI app from your subscription; those must be done using the Azure portal or the Azure CLI.

IP Whitelisting

For non-AWS customers*, Scale uses a consistent set of IP addresses to fetch data and send callbacks, allowing for IP whitelisting of attachments sent to us, as well as for callback endpoints, to increase data security.

If you are enabling IP whitelisting, we request that you whitelist access to your data for all 5 IP addresses listed below, and we will only fetch content using these IP addresses. In this way, you can secure your content from the public while still allowing Scale to access it.

Scale static IP addresses

52.38.24.56
35.160.30.43
35.167.66.86
52.11.250.38
54.203.55.239

*If you are using AWS S3, do not use IP whitelisting; use S3 IAM Access instead. Requests to S3 will not necessarily originate from our static IPs.
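On a server that hosts attachments or receives callbacks, the allowlist check could look like the sketch below. It only illustrates the comparison itself. The function name is_scale_ip is hypothetical, and how you obtain the caller's address (for example, behind a proxy or load balancer) depends on your setup and is not covered here.

```python
from ipaddress import ip_address

# Scale's static IP addresses, as listed above.
SCALE_STATIC_IPS = frozenset(
    ip_address(ip)
    for ip in (
        "52.38.24.56",
        "35.160.30.43",
        "35.167.66.86",
        "52.11.250.38",
        "54.203.55.239",
    )
)


def is_scale_ip(remote_addr: str) -> bool:
    """Return True if remote_addr is one of Scale's static IP addresses."""
    try:
        return ip_address(remote_addr) in SCALE_STATIC_IPS
    except ValueError:
        # Malformed addresses are treated as not allowed.
        return False


print(is_scale_ip("52.38.24.56"), is_scale_ip("10.0.0.1"))
```

In practice this kind of rule is often enforced at the firewall or storage-provider level rather than in application code; the comparison is the same either way.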

Updated 5 years ago