We recommend using AWS S3, Google Cloud, or Azure to host your data.
If you store data in AWS S3 and submit tasks with attachments as s3: protocol URIs, rather than http: or https:, we will use the S3 API to fetch your data. For example, instead of sending https://s3-us-west-2.amazonaws.com/bucket/key, you would send s3://bucket/key.
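As a rough illustration of the rewrite described above, the following Python sketch converts a path-style S3 HTTPS URL into an s3: URI. The helper name to_s3_uri is hypothetical, and the sketch assumes the path-style layout used in the example (bucket as the first path segment). Virtual-hosted-style URLs and percent-encoded keys are not handled.

```python
from urllib.parse import urlparse


def to_s3_uri(url: str) -> str:
    """Convert a path-style S3 HTTPS URL to an s3:// URI (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme == "s3":
        return url  # already in the desired form
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    # Assumption: path-style URL, i.e. /bucket/key after the S3 endpoint host.
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    if not bucket or not key:
        raise ValueError(f"expected /bucket/key in path, got {parsed.path!r}")
    return f"s3://{bucket}/{key}"


print(to_s3_uri("https://s3-us-west-2.amazonaws.com/bucket/key"))  # s3://bucket/key
```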
We can either fetch your data using IAM Delegated Access (preferred, more secure) or Cross-account Access.
To access S3 data in your AWS account, Scale can assume a role in your account that has permission to access data in your S3 buckets. This role must be named ScaleAI-Integration.
To set up IAM Delegated Access:
Select Another AWS account for the Role Type.
Enter 307185671274 (Scale's Account ID) as the Account ID.
Select Require external ID, and enter the external ID displayed in the AWS section of the Integrations Settings page.
Leave Require MFA unselected.
Name the role ScaleAI-Integration.
Sample Role Policy for IAM Delegated Access
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "scale-s3-access",
      "Action": [
        "s3:GetObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
    }
  ]
}
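The role policy above covers only what the role may do; the role also needs a trust policy that allows Scale's account to assume it. The Python sketch below builds a trust policy of the kind AWS normally attaches to a cross-account role with an external ID. The function name build_trust_policy and the placeholder YOUR_EXTERNAL_ID are hypothetical; use the external ID shown on your Integrations Settings page, and treat the exact policy generated by the AWS console as authoritative.

```python
import json

SCALE_ACCOUNT_ID = "307185671274"  # Scale's AWS account ID, as given in this guide


def build_trust_policy(external_id: str) -> dict:
    """Trust policy letting Scale's account assume the role (illustrative sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{SCALE_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                # Only callers that present the matching external ID may assume the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# "YOUR_EXTERNAL_ID" is a placeholder, not a real value.
print(json.dumps(build_trust_policy("YOUR_EXTERNAL_ID"), indent=2))
```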
Note that if you enable the AWS integration for your account, we will not attempt to fetch attachments from our account (307185671274) directly; the policies described in Cross-account Access will not work.
If IAM Delegated Access is not configured, we will directly fetch attachments from your S3 bucket using AWS account ID 307185671274 (canonical ID ae2259599e139df6cedb60b6300bcafa1c652aff129aa3d887477b6d4abf2e47), which you can grant access to on a per-object basis using ACLs or using bucket policies.
For most customers, we recommend setting a Bucket Policy that shares the bucket's contents with Scale's account.
A sample Bucket Policy is shown below. Be sure to replace YOUR_BUCKET_NAME with the name of your bucket, leaving the /* as shown or replacing it with a more specific bucket path to further restrict access.
Please note that if using Access Control Lists (ACLs), each object must have its ACL individually updated to grant read access to our account, as Bucket ACLs cannot grant read permissions to the objects inside.
Sample Bucket Policy for Cross-account Access
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "scale-s3-access",
      "Action": [
        "s3:GetObject"
      ],
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::307185671274:root"
        ]
      },
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
    }
  ]
}
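To show how the /* suffix can be narrowed to a more specific path, here is a small Python sketch that generates the same bucket policy for a given bucket and optional key prefix. The function name build_bucket_policy and the example bucket and prefix values are hypothetical.

```python
import json

SCALE_ACCOUNT_ID = "307185671274"  # Scale's AWS account ID, as given in this guide


def build_bucket_policy(bucket: str, prefix: str = "") -> dict:
    """Bucket policy granting Scale's account read access (illustrative sketch).

    `prefix` optionally restricts access to keys under that path.
    """
    # Normalize so "images", "/images" and "images/" all yield "images/*".
    prefix = prefix.strip("/")
    resource_path = f"{prefix}/*" if prefix else "*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "scale-s3-access",
                "Action": ["s3:GetObject"],
                "Effect": "Allow",
                "Principal": {"AWS": [f"arn:aws:iam::{SCALE_ACCOUNT_ID}:root"]},
                "Resource": f"arn:aws:s3:::{bucket}/{resource_path}",
            }
        ],
    }


# Hypothetical bucket and prefix; only objects under tasks/ become readable.
print(json.dumps(build_bucket_policy("my-bucket", "tasks/"), indent=2))
```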
Please note that this authentication mechanism suffers from the confused deputy problem — a third party that can guess your S3 URLs will be able to submit tasks with your data.
If you store data in Google Cloud Storage and submit tasks with attachments as gs: protocol URIs, rather than http: or https:, we will use the Google Cloud Storage API to fetch your data. For example, instead of sending https://storage.googleapis.com/bucket/key, you would send gs://bucket/key.
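The same rewrite can be sketched for Cloud Storage. The Python helper below, whose name to_gs_uri is hypothetical, converts a URL of the storage.googleapis.com form used in the example into a gs: URI; other Cloud Storage URL layouts are deliberately rejected rather than guessed at.

```python
from urllib.parse import urlparse

GCS_HOST = "storage.googleapis.com"  # host used in the example above


def to_gs_uri(url: str) -> str:
    """Convert https://storage.googleapis.com/bucket/key to gs://bucket/key (sketch)."""
    parsed = urlparse(url)
    if parsed.scheme == "gs":
        return url  # already in the desired form
    if parsed.scheme not in ("http", "https") or parsed.netloc != GCS_HOST:
        raise ValueError(f"not a {GCS_HOST} URL: {url!r}")
    # The first path segment is the bucket; the remainder is the object key.
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    if not bucket or not key:
        raise ValueError(f"expected /bucket/key in path, got {parsed.path!r}")
    return f"gs://{bucket}/{key}"


print(to_gs_uri("https://storage.googleapis.com/bucket/key"))  # gs://bucket/key
```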
We can either fetch your data using Service Account Impersonation (preferred, more secure) or Cross-project Access.
To access Cloud Storage data in your GCP project, Scale can impersonate a service account within that project, which has permission to access data in Cloud Storage.
To set up Service Account Impersonation:
Create a service account named scaleai-integrations-{uid}.
Click Add Member; you may need to click "Show Info Panel" in the top right to see this option.
Enter backend-bucket-access@attachment-storage-243718.iam.gserviceaccount.com as the member, and Service Account Token Creator as the role.
Grant Storage Object Data Reader permissions for the requisite buckets to the newly created service account.
Note that if you enable the GCP integration for your account, we will not attempt to fetch attachments from the default service account (backend-bucket-access@attachment-storage-243718.iam.gserviceaccount.com) directly; the policies described in GCP IAM Access will not work.
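For readers who manage IAM policies as data rather than through the console, the member and role assignment above corresponds to a binding on the service account's IAM policy. The Python sketch below adds that binding to a policy dictionary in the standard IAM policy JSON shape. The function name with_token_creator_binding is hypothetical, and roles/iam.serviceAccountTokenCreator is assumed to be the role ID behind the Service Account Token Creator display name.

```python
import copy

SCALE_SERVICE_ACCOUNT = (
    "backend-bucket-access@attachment-storage-243718.iam.gserviceaccount.com"
)
# Assumed role ID for the "Service Account Token Creator" display name.
TOKEN_CREATOR_ROLE = "roles/iam.serviceAccountTokenCreator"


def with_token_creator_binding(policy: dict) -> dict:
    """Return a copy of an IAM policy with Scale's account bound as Token Creator."""
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    member = f"serviceAccount:{SCALE_SERVICE_ACCOUNT}"
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == TOKEN_CREATOR_ROLE:
            # Reuse the existing binding; avoid adding the member twice.
            if member not in binding.setdefault("members", []):
                binding["members"].append(member)
            return updated
    bindings.append({"role": TOKEN_CREATOR_ROLE, "members": [member]})
    return updated


print(with_token_creator_binding({}))
```

Applying the function twice yields the same policy, so it is safe to run as part of a repeated setup script.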
If Service Account Impersonation is not configured, we will directly fetch attachments from your GCS bucket using the GCP service account backend-bucket-access@attachment-storage-243718.iam.gserviceaccount.com. You can grant access to this service account on a per-object basis with ACLs, or on a per-bucket basis with Cloud IAM Permissions.
Please note that this authentication mechanism suffers from the confused deputy problem — a third party that can guess your Cloud Storage URLs will be able to submit tasks with your data.
If you use Azure Blob Storage, you can grant access to your Blob Storage resources by completing the Azure access delegation process. Scale has registered the Scale AI application as an Azure multi-tenant application that can access resources in your Azure subscription on your behalf.
After completing the access delegation process, blob storage resource URIs (i.e. of the form https://{storageaccount}.blob.core.windows.net/{container}/{key}) will be fetched using the Scale AI service principal, and you will be able to submit blob URIs to the API that are not publicly accessible.
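To make the URI form concrete, the following Python sketch splits a blob URI into its storage account, container, and key, which can be handy for checking attachment URLs before submitting them. The names BlobLocation and parse_blob_uri, and the example account and container, are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlparse

BLOB_HOST_SUFFIX = ".blob.core.windows.net"


class BlobLocation(NamedTuple):
    storage_account: str
    container: str
    key: str


def parse_blob_uri(uri: str) -> BlobLocation:
    """Split https://{storageaccount}.blob.core.windows.net/{container}/{key} (sketch)."""
    parsed = urlparse(uri)
    host = parsed.netloc
    if parsed.scheme != "https" or not host.endswith(BLOB_HOST_SUFFIX):
        raise ValueError(f"not a blob storage URI: {uri!r}")
    storage_account = host[: -len(BLOB_HOST_SUFFIX)]
    # The first path segment is the container; the remainder is the blob key.
    container, _, key = parsed.path.lstrip("/").partition("/")
    if not storage_account or not container or not key:
        raise ValueError(f"missing storage account, container, or key: {uri!r}")
    return BlobLocation(storage_account, container, key)


# Hypothetical account and container names.
print(parse_blob_uri("https://myaccount.blob.core.windows.net/images/cats/1.png"))
```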
The process involves the following steps:
As an administrator or manager of your Scale AI team, go to the integrations tab in the settings page and click the Connect to Azure button. Azure displays the resource permissions requested by the application.
Click Accept to allow Azure to grant permission to Scale AI to access resources in your subscription. You will still need to grant the application a role to access Blob Storage data. Note that after providing application consent, the Scale AI app will stop using anonymous credentials to fetch attachments sent in by your team.
As part of the access delegation process, you must assign a role to the Scale AI application service principal to read data from your storage accounts. We recommend assigning the Storage Blob Data Reader role for the particular storage accounts or containers to retrieve data from. Alternatively, you can create a custom role that provides only the minimum permissions necessary. See the Azure docs for instructions on how to assign the role.
To stop the Scale AI service principal from authenticating via Azure AD to access your users' storage accounts, use the Unlink from Azure button in the integrations tab in the settings page. Note that this does not revoke permissions from the Scale AI service principal in Azure, nor does it uninstall the Scale AI app from your subscription; those must be done using the Azure portal or the Azure CLI.
For non-AWS customers*, Scale uses a consistent set of IP addresses to fetch data and send callbacks, allowing for IP whitelisting of attachments sent to us, as well as for callback endpoints, to increase data security.
If you are enabling IP whitelisting, we request that you whitelist access to your data for all 5 IP addresses listed below, and we will only fetch content using these IP addresses. In this way, you can secure your content from the public while still allowing Scale to access it.
Scale static IP addresses
52.38.24.56
35.160.30.43
35.167.66.86
52.11.250.38
54.203.55.239
*If you are using AWS S3, do not use IP whitelisting; use S3 IAM Access instead. Requests to S3 will not necessarily originate from our static IPs.
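On the receiving side, a callback endpoint or attachment server can check the caller's address against the list above. The Python sketch below shows one way to do that with the standard ipaddress module. The function name is_scale_ip is hypothetical, and the sketch assumes you already have the true client address (for example, after your proxy or load balancer has been accounted for).

```python
import ipaddress

# Scale static IP addresses, copied from the list above.
SCALE_STATIC_IPS = frozenset(
    ipaddress.ip_address(ip)
    for ip in (
        "52.38.24.56",
        "35.160.30.43",
        "35.167.66.86",
        "52.11.250.38",
        "54.203.55.239",
    )
)


def is_scale_ip(remote_addr: str) -> bool:
    """Return True if remote_addr is one of Scale's static IPs (illustrative sketch)."""
    try:
        return ipaddress.ip_address(remote_addr) in SCALE_STATIC_IPS
    except ValueError:
        # Malformed addresses are never on the allowlist.
        return False


print(is_scale_ip("52.38.24.56"))  # True
print(is_scale_ip("10.0.0.1"))     # False
```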