AWS Clean Rooms FAQs

General

AWS Clean Rooms makes it easier for you and your partners to analyze and collaborate on your collective datasets to gain new insights without revealing underlying data to one another. You can create your own clean rooms in minutes and start analyzing your collective datasets with your partners with just a few steps. With AWS Clean Rooms, you can easily collaborate with hundreds of thousands of companies already using AWS without needing to move data out of AWS or load it into another platform.

AWS Clean Rooms collaborations are secure logical boundaries that allow collaboration members to run SQL queries and perform ML modeling without sharing raw data with their partners. Only companies that have been invited to a collaboration can join it. Multiple participants can contribute data to a collaboration, and one member can receive results.

From the AWS Management Console, you can choose what type of analysis you want to perform, the partners you want to collaborate with, and which datasets you would like to contribute to a collaboration. With AWS Clean Rooms, you can perform two types of analyses: SQL queries and machine learning.

When you run SQL or Spark SQL queries, AWS Clean Rooms reads data where it lives and applies built-in, flexible analysis rules to help you maintain control over your data. AWS Clean Rooms provides a broad set of privacy-enhancing SQL controls—including query controls, query output restrictions, and query logging—that allow you to customize restrictions on the queries run by each clean room participant. You can use the Spark analytics engine to run queries using the Spark SQL dialect in AWS Clean Rooms collaborations. AWS Clean Rooms Spark SQL offers configurable compute sizes, giving you the flexibility to allocate resources for your SQL queries based on your performance, scale, and cost requirements. AWS Clean Rooms Spark SQL is only available for the custom analysis rule. AWS Clean Rooms Differential Privacy helps you protect the privacy of your users with mathematically backed and intuitive controls in a few clicks. With the SQL analytics engine, you can use AWS Clean Rooms Differential Privacy by selecting the custom analysis rule and then configuring your desired differential privacy parameters. Cryptographic Computing for Clean Rooms (C3R) helps you keep sensitive data encrypted during your SQL analyses when using either the Spark or the SQL analytics engine. To apply AWS Clean Rooms Differential Privacy, or to use the aggregation or list analysis rules in a collaboration, you must use SQL as the analytics engine.

AWS Clean Rooms ML helps you and your partners apply privacy-enhancing machine learning (ML) to generate predictive insights without having to share raw data with each other. AWS Clean Rooms ML supports custom and lookalike machine learning (ML) modeling. With custom modeling, you can bring a custom model for training and run inference on collective datasets, without sharing underlying data or intellectual property among collaborators. With lookalike modeling, you can use an AWS-authored model to generate an expanded set of similar profiles based on a small sample of profiles that your partners bring to a collaboration.

AWS Clean Rooms ML lookalike modeling, using an AWS-authored model, was built and tested across a wide variety of datasets such as e-commerce and streaming video, and can help customers improve accuracy on lookalike modeling by up to 36% when compared with representative industry baselines. In real-world applications such as prospecting for new customers, this accuracy improvement can translate into savings of millions of dollars.

Using the AWS Management Console or API operations, you create a clean room collaboration, invite the companies you want to collaborate with, and select the abilities that each participant has within the collaboration. Participants can then set up rules for how structured data can be queried and train ML models on their data. Datasets are not copied from participants' accounts and are only accessed when needed. With AWS Clean Rooms, you can choose what type of analysis you want to perform: SQL queries or ML modeling using AWS Clean Rooms ML. When using SQL queries, you can also use additional capabilities such as the no-code analysis builder, AWS Clean Rooms Differential Privacy, and cryptographic computing. Once collaboration participants have associated data or models with a collaboration and analyses have run, the collaboration outputs are stored in a designated Amazon Simple Storage Service (Amazon S3) bucket.
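
As an illustration of this setup flow, here is a minimal sketch using the AWS SDK for Python (boto3). The account ID, display names, abilities, and other values are placeholders, and the exact set of parameters should be checked against the current cleanrooms API reference.

```python
# Minimal sketch: creating an AWS Clean Rooms collaboration with boto3.
# Account IDs, display names, and abilities below are placeholders.
import boto3

cleanrooms = boto3.client("cleanrooms")

response = cleanrooms.create_collaboration(
    name="campaign-measurement",
    description="Joint campaign measurement between advertiser and publisher",
    creatorDisplayName="Advertiser",
    creatorMemberAbilities=["CAN_QUERY"],  # the creator runs the queries
    members=[
        {
            "accountId": "111122223333",  # placeholder partner account
            "displayName": "Publisher",
            "memberAbilities": ["CAN_RECEIVE_RESULTS"],  # this member receives results
        }
    ],
    queryLogStatus="ENABLED",  # publish query logs for review
)
print(response["collaboration"]["id"])
```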

AWS Clean Rooms supports up to five participants per collaboration.

You control who can participate in your AWS Clean Rooms collaboration, and you can create a collaboration or accept an invitation to collaborate. Participation is transparent to each party in a collaboration, and new accounts cannot be added after the collaboration is created. However, you can set up new collaborations with different customers or partners if needed. You establish and manage access to your content, and you also set access to AWS services and resources through users, groups, permissions, and credentials that you control.

Customers can generate insights using SQL or AWS Clean Rooms ML modeling on their collective datasets with their partners—without sharing or revealing underlying data.

With SQL, multiple collaborators can contribute data, but only one collaborator can run SQL queries and only one can receive the results. When joining a collaboration, collaborators agree on which party will run the queries, which party will receive the results, and which party will be responsible for the compute charges. Only those who you invite to that collaboration can gain insights based on the analysis rules you establish. When you set up an AWS Clean Rooms collaboration, you can specify different abilities for each collaboration member to suit your specific use cases. For example, if you want the query output to go to a different member, you can designate one member as the query runner who can write queries and another member as the query result receiver who can receive the results. This gives the collaboration creator the ability to make sure that the member who can query doesn't have access to the query results.

With AWS Clean Rooms ML, one collaborator brings the sample set of records for which they want to find lookalike segments; the other party brings the larger population from which the lookalike segments are generated, based on similarity to the sample records. AWS Clean Rooms ML sends the output lookalike segments to a destination specified by the party that brings the larger population.

AWS Entity Resolution is natively integrated in AWS Clean Rooms. You can use rule-based or data service provider-based matching to prepare, match, and link your user data with your partner’s data using any common key you choose to use (such as pseudonymized identifiers), inside a privacy-enhanced AWS Clean Rooms collaboration.

AWS Clean Rooms is available in US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), Europe (London), and Europe (Stockholm).

With AWS Clean Rooms, you can use flexible SQL analysis rules and privacy-enhancing ML to meet your business needs. When you use SQL analysis, you can flexibly choose which collaborator pays for the compute capacity of the SQL queries run in a collaboration, in clean rooms processing unit (CRPU)–hours on a per-second basis (with a 60-second minimum charge). When you use AWS Clean Rooms ML, you only pay for the model trainings you request, and for the lookalike segments created, on a price-per-1,000-profiles basis. For more information, see AWS Clean Rooms pricing.
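
As a back-of-the-envelope illustration of per-second CRPU-hour billing with a 60-second minimum, consider the following sketch; the price used is a made-up placeholder, not an actual rate.

```python
# Hypothetical cost estimate for SQL compute billed in CRPU-hours,
# per second with a 60-second minimum. The rate is a placeholder only;
# see the AWS Clean Rooms pricing page for actual prices.
HYPOTHETICAL_PRICE_PER_CRPU_HOUR = 0.50  # USD, placeholder

def estimated_query_cost(crpus: int, runtime_seconds: float) -> float:
    billed_seconds = max(runtime_seconds, 60)   # 60-second minimum charge
    crpu_hours = crpus * billed_seconds / 3600  # per-second billing
    return crpu_hours * HYPOTHETICAL_PRICE_PER_CRPU_HOUR

# A query using 32 CRPUs for 45 seconds is billed as 60 seconds:
print(f"${estimated_query_cost(32, 45):.4f}")
```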

With AWS Entity Resolution on AWS Clean Rooms, you can use rule-based or data service provider-based matching leveraging provider data sets (such as LiveRamp).

When you use rule-based matching, at least one member in a collaboration is required to prepare their data prior to matching with their partners' data sets, unless they have already prepared their data using AWS Entity Resolution prior to creating or joining the collaboration. This member will pay for data preparation only if used. Any member participating in a collaboration can pay for data matching. Data matching also requires a one-time fee per collaboration, and this fee is assigned to any collaborator paying for data matching.

When you use data service provider-based matching, all collaboration members are required to have a provider subscription in place in order to prepare their data using provider IDs. All collaboration members are required to prepare their data using provider IDs prior to matching with their partners' data sets, unless they have already prepared their data using AWS Entity Resolution prior to creating or joining the collaboration. Any member participating in a collaboration can pay for data matching using providers' IDs. Additionally, the member who pays for data matching is required to have a provider subscription in place. You can use the public subscriptions listed on AWS Data Exchange (ADX), or purchase a private subscription directly with the data service provider of your choice, and then use Bring Your Own Subscription (BYOS) to ADX. 

For more information, see AWS Entity Resolution on AWS Clean Rooms pricing.

AWS Clean Rooms ML

AWS Clean Rooms ML helps you and your partners apply privacy-enhancing machine learning (ML) to generate predictive insights without having to share raw data with each other. AWS Clean Rooms ML supports custom and lookalike machine learning (ML) modeling. With custom modeling, you can bring a custom model for training and run inference on collective datasets, without sharing underlying data or intellectual property among collaborators. With lookalike modeling, you can use an AWS-authored model to generate an expanded set of similar profiles based on a small sample of profiles that your partners bring to a collaboration.

AWS Clean Rooms ML helps customers with multiple use cases. For example, advertisers can bring their proprietary model and data into a Clean Rooms collaboration, and invite publishers to join their data to train and deploy a custom ML model that helps them increase campaign effectiveness; financial institutions can use historical transaction records to train a custom ML model, and invite partners into a Clean Rooms collaboration to detect potentially fraudulent transactions; research institutions and hospital networks can find candidates that are similar to existing clinical trial participants to help accelerate clinical studies; and brands and publishers can model lookalike segments of in-market customers and deliver highly-relevant advertising experiences, without either company sharing their underlying data with the other.

With AWS Clean Rooms ML custom modeling, you can bring your own machine learning (ML) models, algorithms, and data into a collaboration with your partners to train ML models and run inference on collective datasets without having to share sensitive data or proprietary ML models.

AWS Clean Rooms ML custom modeling supports ML training and ML inference workflows. For both workflows, you start by defining an AWS Clean Rooms Spark SQL query, which is used to generate a dataset for the training or inference step. The intermediate dataset is kept within the clean room collaboration and can only be used for approved AWS Clean Rooms ML tasks. The second step is the ML model training or inference. ML models and code are packaged in a container image. A trained model can be retained in the collaboration and used as part of an inference workflow. With AWS Clean Rooms ML, your data is only used to train your custom model, and your data is not shared among collaborators or used for AWS model training. You can remove your data from Clean Rooms ML or delete a custom model whenever you want, and you can apply privacy-enhancing controls to safeguard sensitive data that you bring to a collaboration. To apply AWS Clean Rooms ML custom modeling, you must use Spark SQL as the analytics engine.

With AWS Clean Rooms ML lookalike modeling, you can use an AWS-authored model to generate an expanded set of similar profiles based on a small sample of profiles that your partners bring to a collaboration, while protecting your and your partners' underlying data. You can invite your partners to a clean room and apply the AWS-authored ML model, which is trained for each collaboration, to generate lookalike datasets in a few steps, saving months of development work to build, train, tune, and deploy your own model. AWS Clean Rooms ML lookalike modeling was built and tested across various datasets, such as e-commerce and streaming video, and can help customers improve accuracy on lookalike modeling by up to 36% when compared with representative industry baselines. In real-world applications like prospecting for new customers, this accuracy improvement can translate into savings of millions of dollars.

AWS Clean Rooms ML lookalike modeling takes a small sample of records from one party and finds a much larger set of records, or lookalike segment, from another collaborator's dataset. You can specify the desired size of the resulting lookalike segment, and AWS Clean Rooms ML will privately match the unique profiles in your sample list with those in your partner's dataset and then train an ML model that predicts how similar each profile in your collaborator's dataset is to those in your sample. AWS Clean Rooms ML will automatically group the profiles that are similar to the sample list and output the resulting lookalike segment. AWS Clean Rooms ML removes the need to share data to build, train, and deploy ML models with your partners. With AWS Clean Rooms ML, your data is only used to train your model and is not used for AWS model training. You can use intuitive controls that help you and your partners tune the model's predictive results.

Security and data protection

Data protection starts with the security foundation of AWS, and AWS Clean Rooms is built on top of AWS security services including AWS Identity and Access Management (IAM), AWS Key Management Service (KMS), and AWS CloudTrail. This allows you to extend your existing data protection strategy to data collaboration workloads. With AWS Clean Rooms, you no longer need to store or maintain a copy of your data outside your AWS environment and send it to another party to conduct analysis for consumer insights, marketing measurement, forecasting, or risk assessment.

When you set up an AWS Clean Rooms collaboration and use SQL analysis, you can specify different abilities for each collaboration member to suit your specific use cases. For example, if you want the output of the query to go to a different member, you can designate one member as the query runner who can write queries and another member as the query result receiver who can receive the results. This gives the collaboration creator the ability to make sure that the member who can query doesn't have access to the query results.

AWS Clean Rooms also has SQL query controls that allow you to restrict the kinds of queries, or the specific queries, that can be run on your data tables through analysis rules configuration. AWS Clean Rooms supports three types of SQL analysis rules: aggregation, list, and custom. With the aggregation analysis rule, you can configure your table so that only queries that generate aggregate statistics are allowed (such as campaign measurement or attribution). With the list analysis rule, you can configure controls so that queries can only analyze the intersection of your datasets with that of the member who can query. With the custom analysis rule, you can configure query-level controls that allow only specific queries, or queries from specific accounts, to run on your dataset. When using custom analysis rules, you can choose to use Differential Privacy. AWS Clean Rooms Differential Privacy helps you protect the privacy of your users with mathematically backed and intuitive controls in a few clicks. As a fully managed capability of AWS Clean Rooms, no prior differential privacy experience is needed to help you prevent the re-identification of your users. Another control is aggregation thresholds, which prevent queries from drilling down to small, potentially re-identifiable groups.
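
As a sketch of how such a control might be configured programmatically, the following uses the boto3 cleanrooms client to attach an aggregation analysis rule to a configured table. The table identifier, column names, and threshold value are placeholders, and the policy shape should be verified against the current API reference.

```python
# Sketch: attaching an aggregation analysis rule to a configured table.
# Identifiers and column names are placeholders.
import boto3

cleanrooms = boto3.client("cleanrooms")

cleanrooms.create_configured_table_analysis_rule(
    configuredTableIdentifier="example-configured-table-id",  # placeholder
    analysisRuleType="AGGREGATION",
    analysisRulePolicy={
        "v1": {
            "aggregation": {
                # Columns partners may aggregate, and with which functions
                "aggregateColumns": [
                    {"columnNames": ["purchase_amount"], "function": "SUM"}
                ],
                # Columns partners may use to join with your table
                "joinColumns": ["hashed_email"],
                # Columns allowed as GROUP BY dimensions
                "dimensionColumns": ["campaign_id"],
                "scalarFunctions": [],
                # Minimum aggregation threshold: output rows describing fewer
                # than 100 distinct users are filtered from the results
                "outputConstraints": [
                    {"columnName": "hashed_email", "minimum": 100, "type": "COUNT_DISTINCT"}
                ],
            }
        }
    },
)
```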

With AWS Clean Rooms ML, your data is only used to train your model, and is not used for AWS model training. AWS Clean Rooms ML does not combine any company's training or lookalike segment data with another's, and you can delete your model and training data whenever you want.

No. Datasets are stored in collaborators' AWS accounts. AWS Clean Rooms temporarily reads data from collaborators' accounts to run queries, match records, train ML models, or expand seed segments. Results of an analysis are sent to the S3 location designated for the analysis.

AWS Entity Resolution on AWS Clean Rooms generates a dataset that maps among each party’s identifiers in a collaboration. The mapping dataset is managed by AWS Clean Rooms. No members in the collaboration can view or download the mapping table. If all members in the collaboration agree to relax this privacy enforcement, the mapping table can be queried for particular use cases. Either party can delete the table at any point.

Models generated by AWS Clean Rooms ML are stored by the service, can be encrypted with a customer managed AWS KMS key, and can be deleted by the customer at any point.

AWS Clean Rooms encryption and analysis rules allow you to have granular control on the type of information you want to share. As a data collaborator, you are responsible for assessing the risk of each collaboration, including the risk of reidentification, and conducting your own additional due diligence to ensure compliance with any data privacy laws. If the data you are sharing is sensitive or regulated, we recommend you also use appropriate legal agreements and audit mechanisms to further reduce privacy risks.

Yes. The AWS Service Terms prohibit certain use cases for collaborations in AWS Clean Rooms.

Yes, the AWS HIPAA compliance program includes AWS Clean Rooms as a HIPAA eligible Service. If you have an executed Business Associate Agreement (BAA) with AWS, you can now use AWS Clean Rooms to create HIPAA-compliant collaborations. If you don't have a BAA or have other questions about using AWS for your HIPAA-compliant applications, contact us for more information.

To learn more, see the following resources:

AWS HIPAA Compliance page

AWS Cloud Computing in Healthcare page

SQL analyses

You can choose to use the Spark analytics engine to run queries using the Spark SQL dialect in AWS Clean Rooms collaborations. AWS Clean Rooms Spark SQL offers configurable compute sizes to provide more control over price performance when running SQL workloads. To apply AWS Clean Rooms Differential Privacy, or to use the aggregation or list analysis rules in a collaboration, you must use SQL as the analytics engine.

AWS Clean Rooms Spark SQL uses the default instance type CR.1X, which provides 4 vCPUs, 30 GB of memory, and 100 GB of storage. You can allocate more resources to your Spark SQL workloads by selecting the larger CR.4X instance type, which provides 16 vCPUs, 120 GB of memory, and 400 GB of storage. Larger instance sizes can benefit SQL workloads that process large volumes of data and perform complex analytics by distributing the work across more resources. For the vCPUs, memory, and storage associated with each configuration, see the AWS Clean Rooms documentation.
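
As a sketch, the following boto3 call requests the larger CR.4X workers when starting a protected Spark SQL query. The membership ID, bucket, query, and the computeConfiguration shape are illustrative assumptions that should be checked against the current cleanrooms API reference.

```python
# Sketch: running a Spark SQL query with larger CR.4X workers.
# Membership ID, bucket, table names, and worker settings are placeholders.
import boto3

cleanrooms = boto3.client("cleanrooms")

cleanrooms.start_protected_query(
    type="SQL",
    membershipIdentifier="example-membership-id",  # placeholder
    sqlParameters={
        "queryString": (
            "SELECT campaign_id, COUNT(DISTINCT hashed_email) AS reach "
            "FROM publisher_exposures GROUP BY campaign_id"
        )
    },
    resultConfiguration={
        "outputConfiguration": {
            "s3": {"bucket": "example-results-bucket", "resultFormat": "PARQUET"}
        }
    },
    # Allocate larger workers for heavier Spark SQL workloads
    computeConfiguration={"worker": {"type": "CR.4X", "number": 4}},
)
```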

In the SQL analysis rules, you configure column-level controls that help you define how each column can be used in queries. For example, you can specify which columns can be used to calculate aggregate statistics—such as SUM(price)—and which columns can be used to join your table with other collaboration members. In the aggregation analysis rule, you can also define a minimum aggregation threshold that each output row must meet. Rows that do not meet the minimum threshold are automatically filtered out by AWS Clean Rooms.
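
For illustration, here is the kind of query those controls are designed to permit, assuming price is an allowed aggregate column, hashed_email an allowed join column, and region an allowed dimension column; all table and column names are hypothetical.

```python
# Illustrative query permitted by column-level controls of the kind
# described above. Table and column names are placeholders.
allowed_query = """
SELECT  a.region,
        SUM(a.price)                   AS total_spend,
        COUNT(DISTINCT a.hashed_email) AS distinct_users
FROM    advertiser_purchases a
JOIN    publisher_exposures  p
        ON a.hashed_email = p.hashed_email
GROUP BY a.region
"""
# Output rows whose distinct-user count falls below the configured minimum
# aggregation threshold are removed from the results by AWS Clean Rooms.
```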

Yes. You can configure AWS Clean Rooms to publish query logs to Amazon CloudWatch Logs. With the custom analysis rule, you can also review queries (stored in analysis templates) before they run in the collaboration.

AWS Clean Rooms Differential Privacy

Differential privacy is a mathematically proven framework for helping protect data privacy. Its core idea is to protect data at the individual level by adding a controlled amount of randomness, or noise, to obscure the presence or absence of any single individual in the dataset being analyzed.
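
For background, the standard (epsilon, delta)-differential privacy guarantee can be written as follows; this is the general definition rather than anything specific to the AWS Clean Rooms implementation.

```latex
% (\varepsilon, \delta)-differential privacy: for any two datasets D and D'
% that differ in one individual's records, and any set of outputs S,
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
```

Smaller values of epsilon and delta give a stronger privacy guarantee at the cost of noisier aggregate results.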

AWS Clean Rooms Differential Privacy helps you protect the privacy of your users with mathematically backed and intuitive controls in a few steps. As a fully managed capability of AWS Clean Rooms, no prior differential privacy experience is needed to help you prevent the re-identification of your users. AWS Clean Rooms Differential Privacy obfuscates the contribution of any individual’s data in generating aggregate insights in collaborations so that you can run a broad range of SQL queries to generate insights about advertising campaigns, investment decisions, clinical research, and more.

You can begin using AWS Clean Rooms Differential Privacy in just a few steps after starting or joining an AWS Clean Rooms collaboration as a member with the ability to contribute data. After you have created a configured table, which is a reference to your table in the AWS Glue Data Catalog, you simply choose to turn on differential privacy while adding a custom analysis rule to the configured table when using the SQL analytics engine. Next, you associate the configured table to your AWS Clean Rooms collaboration and configure a differential privacy policy in the collaboration to make your table available for querying. You can use a default policy to quickly complete the setup or customize it to meet your specific requirements. To apply AWS Clean Rooms Differential Privacy in a collaboration, you must use SQL as the analytics engine.
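
As a sketch of that last step, the following boto3 call defines a differential privacy policy (privacy budget) for a collaboration membership. The membership ID, epsilon, and per-query noise values are placeholder assumptions, and the parameter shape should be verified against the current cleanrooms API reference.

```python
# Sketch: configuring a differential privacy policy for a membership.
# Membership ID and budget values below are placeholders.
import boto3

cleanrooms = boto3.client("cleanrooms")

cleanrooms.create_privacy_budget_template(
    membershipIdentifier="example-membership-id",  # placeholder
    privacyBudgetType="DIFFERENTIAL_PRIVACY",
    autoRefresh="CALENDAR_MONTH",  # refresh the budget each calendar month
    parameters={
        "differentialPrivacy": {
            "epsilon": 3,              # overall privacy budget (placeholder)
            "usersNoisePerQuery": 30,  # noise added per query, in users (placeholder)
        }
    },
)
```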

Once AWS Clean Rooms Differential Privacy is set up, your collaboration partners can start running queries on your table without needing any expertise in differential privacy or any additional setup. With AWS Clean Rooms Differential Privacy, query runners can run custom and flexible analyses, including complex query patterns with common table expressions (CTEs) and commonly used aggregate functions such as COUNT and SUM.
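
For illustration, a query of this shape could be run against a table protected by differential privacy; the table and column names are hypothetical.

```python
# Illustrative query combining a common table expression with COUNT and SUM.
# Table and column names are placeholders.
dp_query = """
WITH matched AS (
    SELECT a.hashed_email, a.purchase_amount
    FROM   advertiser_purchases a
    JOIN   publisher_exposures  p
           ON a.hashed_email = p.hashed_email
)
SELECT COUNT(DISTINCT hashed_email) AS converted_users,
       SUM(purchase_amount)         AS attributed_revenue
FROM   matched
"""
# AWS Clean Rooms adds calibrated noise to the returned aggregates so that
# the presence or absence of any single user cannot be inferred from results.
```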

Cryptographic computing

Cryptographic computing is a method of protecting and encrypting sensitive data while it is in use. Data can be encrypted at rest when it is stored, in motion when it is transmitted, and while it is in use. Encryption means converting plaintext data into encoded data that cannot be deciphered without a specific key. Private set intersection (PSI) is a type of cryptographic computing that allows two or more parties holding datasets to compute over encrypted versions of their data, for example to find the records they have in common. The encryption occurs on premises using a secret key shared among the collaborators. C3R is available with both the Spark SQL and SQL analytics engines.

AWS Clean Rooms includes Cryptographic Computing for Clean Rooms (C3R), which provides the option to pre-encrypt data using a client-side encryption tool—an SDK or command line interface (CLI)—that uses a secret key shared with the other participants in an AWS Clean Rooms collaboration. Your data stays encrypted while queries are run.