
Amazon SageMaker Unified Studio (preview) provides a unified experience for using data, analytics, and AI capabilities. You can use familiar AWS services for model development, generative AI, data processing, and analytics—all within a single, governed environment. Users can now build, deploy, and execute end-to-end workflows from a single interface. SageMaker Unified Studio is built on the foundations of Amazon DataZone, where it uses domains to categorize and structure the data assets, while offering project-based collaboration features that allow teams to securely share artifacts and work together across various compute services. This experience allows multiple personas to seamlessly collaborate, while operating under appropriate access controls and governance policies.
In this post, we focus on the admin persona and deep dive into the foundational building blocks while implementing the self-service access to all your data.
Conceptual framework
SageMaker Unified Studio offers an integrated development experience organized into three distinct planes, each serving different personas and purposes within the development lifecycle. This architecture enables seamless collaboration while maintaining clear boundaries of responsibility.
As shown in the following figure, each plane represents a distinct layer of functionality that works in harmony with the others to create a complete data and machine learning (ML) solution.
The planes are as follows:
- Infrastructure plane – The infrastructure plane forms the foundation of SageMaker Unified Studio. Here administrators and domain owners of the organization provision the underlying infrastructure and define rules for users of the data factory plane to deploy the compute resources for data and ML operations in self-service mode. They can also decide to onboard existing resources or pre-create them. They can set up access controls and permissions to enforce and allocate resources to different teams and projects. This layer makes sure that all necessary computational resources are available and properly governed for downstream computation.
- Data factory plane – The data factory plane functions like a sophisticated vending machine for compute resources, where data scientists and ML engineers can select and utilize preconfigured compute resources or deploy new ones. The data product developers, data engineers, and data scientists can create collaboration spaces and build data products by consuming infrastructure resources, with all the underlying complexity abstracted away.
- Product experience plane – At the outermost layer, the product experience plane serves as a discovery and collaboration hub where business units (data producers and data consumers) can explore available data products from the asset catalog. This plane drives users to engage in data-driven conversations with knowledge and insights shared across the organization. Through the product experience plane, data product owners can use automated workflows to capture data lineage and data quality metrics and oversee access controls. They can track how their data products are being used and continuously improve the value proposition of their data assets.
In this post, we focus on the infrastructure plane deployment steps from an administrator’s perspective, outlining key responsibilities and actions required and how to configure and organize your assets under specific business units and teams and authorize policies during the initial setup phase.
Roles and responsibilities of the domain owner (admin) for the infrastructure plane
As shown in the following figure, the infrastructure plane revolves around three pivotal operational paradigms: onboard, organize, and authorize.
The details of the three essential functions in the foundational layer are as follows:
- Onboard – The domain owner establishes a foundational environment by creating a domain, which represents an organization entity for you to connect together your assets, users, resources, and code repository configs. They can onboard the users who have authorization to access the self-serve unified studio. The self-serve unified studio is a browser-based web application where you can analyze, discover, catalog, govern, and share data in self-serve manner. The admin can enable the necessary blueprints and create project profiles to set up the underlying data infrastructure. In a multi-account (Mesh) scenario, the admin can also onboard the business units by associating the AWS accounts.
- Organize – Here the domain owner creates hierarchies to organize and isolate projects within individual business units. The method of creating hierarchical representation of business units or team-level organization is through domain units. This makes sure that each business unit takes ownership of their assets. The admin can also delegate ownership within these business units.
- Authorize – The admin or owners of individual business units or line of business (domain unit owners) can manage user policies—project-specific policies that dictate certain actions these principals can perform under a domain unit.
Now that we have discussed the core functions, let’s delve into the workflow that brings these concepts together.
Process workflow (infrastructure plane)
In the following figure, we break down the roles and responsibilities of domain owners to unit administrators through a sequence of operations, providing infrastructure deployment and management.
The workflow consists of the following steps:
- The root domain owner (admin) creates a SageMaker Unified Studio domain from the console. After the domain is created, you get a SageMaker Unified Studio URL—a browser-based web application that can authenticate you with your AWS Identity and Access Management (IAM) user credentials or with credentials from your identity provider (IdP) through AWS IAM Identity Center or with your SAML credentials.
- As part of the onboarding process, the admin onboards single sign-on (SSO) users, SSO groups, and IAM users who are authorized to log in to SageMaker Unified Studio. IAM roles can be onboarded on the domain as well, but can be used for programmatic access only. During the quick setup deployment of the domain, default project profile templates are created. A project profile is a collection of blueprints that holds configurations of AWS tools and services. You can create following project profiles:
- Generative AI application development – Provides you with the tooling capabilities to build generative AI applications using Amazon Bedrock foundation models (FMs) and tools.
- SQL analytics – Provides you with a SQL editor to query the data in Amazon SageMaker Lakehouse, Amazon Redshift, and Amazon Athena.
- Data analytics and AI-ML model development – Provides you tools to build and orchestrate ML and generative AI models powered by AWS Glue, Athena, Amazon Managed Workflows for Apache Airflow (Amazon MWAA), Amazon SageMaker AI, and SageMaker Lakehouse.
- Custom project profile – Provides capabilities to build custom templates that can bundle multiple blueprints with varied tooling capabilities to suit your business needs.
Admins can also authorize project profile templates to specific users and groups, enforcing the capability to control resource deployment based on user personas. By default, all users are authorized to use default project profiles. However, this can be changed by the admin to limit the access of certain project profiles to certain users and groups.
The quick setup also establishes a default Git connection to AWS CodeCommit for users to manage their code repository. However, you also have the option to create and enable new Git connections to GitHub, GitHub Enterprise Server, GitLab, and GitLab self-managed. The Free Tier release of Amazon Q is enabled by default to all users of SageMaker Unified Studio domain. Amazon Q Developer Pro can be configured if IAM Identity Center is configured for users of the domain.
Finally, as part of the initial setup, the admin provides access to Amazon Bedrock serverless models.
In a multi-account scenario, the central admin associates AWS accounts, and the associated account admins accept the association and enable the blueprints for the project profiles that the central admin would create. Refer to the appendix at the end of this post for more details.
- To organize the data assets within the organization, the admin logs in to the SageMaker Unified Studio URL and creates domain units aligned with the business divisions.
- Each domain unit receives delegated ownership, enabling autonomous management of assets within their designated scope. This domain-based isolation provides clear boundaries while allowing unit owners to independently govern their assets and enforce relevant policies.
Steps 3 and 4 are optional as part of the quick deployment setup. Users can directly log in to SageMaker Unified Studio to build data products for their business use case if domain units are not part of immediate requirement. If no domain units are created, all users and groups fall back under the root domain level and authorization policies are applied on the root domain.
Behind the scenes
While users interact with a streamlined project creation interface in SageMaker Unified Studio, a sophisticated orchestration of components operates beneath the surface. This abstraction allows the admin to deploy infrastructure through simple selections while the system handles resource provisioning automatically. Let’s examine the underlying process behind the scenes, as illustrated in the following figure.
This workflow consists of the following steps:
- Administrators enable the blueprints containing the AWS CloudFormation templates that have information on how to create and set up the underlying data infrastructure. These blueprints are automatically enabled during the quick setup deployment.
- Project profiles bundle these blueprint configurations into templates. These templates determine which infrastructure components deploy when a project is created.
- When users select a project profile within SageMaker Unified Studio, the system automatically triggers the relevant CloudFormation stack and deploys the necessary infrastructure resources in the form of environments. Environments are the actual data infrastructure behind a project.
In a multi-account scenario, the associated account admin enables the blueprints. However, the project profile creation happens at the root domain account. The project profile template will include the associated account details and the linked blueprints from the associated account. Refer to the appendix at the end of this post for more details.
Now that we have understood the functional building blocks of SageMaker Unified Studio, let’s proceed with the deployment walkthrough. We will create a domain using the quick setup deployment for single account. Refer to the appendix for multi-account deployment steps.
Prerequisites
You will need to complete the following prerequisites before you can follow the instructions in the next section:
- Sign up for an AWS account.
- Create a user with administrative access.
- Enable IAM Identity Center in the same AWS Region you want to create your SageMaker Unified Studio domain. Confirm in which Region SageMaker Unified Studio is currently available. Set up your IdP and synchronize identities and groups with IAM Identity Center. For more information, refer to IAM Identity Center Identity source tutorials.
- To use Amazon Bedrock FMs, grant access to base models.
Set up domain
Complete the following steps to create a new SageMaker Unified Studio domain:
- Sign in to the SageMaker console in the Region in which IAM Identity Center is enabled.
- Choose Create a Unified Studio domain.
- Select the Quick setup (recommended for exploration).
- Choose Create VPC (you can also use your own VPC but to simplify the cleanup, we opted to use a new VPC).
This will open a new tab to deploy the CloudFormation stack to create the VPC and the necessary private and public subnets.
- For Stack name, enter a unique name to the stack (if the default name already exists).
- Keep the parameter for useVpcEndpoints as false.
- Choose Create stack.
- After the stack is created, go to the domain creation page and refresh the page, as shown in the following screenshot.
- For Name, enter a unique name for the domain.
- Keep the default selections for Domain Execution role, Domain Service role, Provisioning role, and Manage Access role.
- The configuration automatically selects the VPC and private subnets.
- Keep the default selection for Model provisioning role and Model consumption role.
- Choose Continue.
- Provide the email address of the SSO user that exists in IAM Identity Center.
The SSO user selected here is used as the administrator in SageMaker Unified Studio. If the account doesn’t have IAM Identity Center set up, then it will create an IAM Identity Center account instance, so long as the account is permitted to do so. An SSO or IAM user is required so that a user is able to log in to the studio after the domain is created.
- Choose Create domain.
- After the domain is created, a dialog box pops up. You can close dialog box to set up authorization policies and onboard users.
On the domain detail page, the Amazon SageMaker Unified Studio URL is listed. You can authenticate with your IAM user credentials or with credentials from your IdP through IAM Identity Center or with your SAML credentials. To authorize users to log in to the URL, the administrator must onboard the users to the domain. We see this as part of the next steps.
Onboard users and associated accounts
Complete the following steps:
- To onboard users, go to the User management tab and choose Add.
- On the Add menu, choose either Add SSO users and groups or Add IAM users.
You can also add IAM roles for the purpose of managing the domain programmatically. However, you can’t use IAM roles to log in to the SageMaker Unified Studio URL. After you add the users, they will appear with the status Assigned. The status changes to Activated only when the user logs in to the SageMaker Unified Studio URL.
- If you want to onboard multiple AWS accounts to your domain account, go to the Account associations tab and choose Request association.
This enables domain users to publish and consume data from these AWS accounts.
For a multi-account setup, by sending an association request to another AWS account, you share the root domain with the other AWS account with AWS Resource Access Manger (AWS RAM). The associated admin domain owner accepts the invitation. To access the compute resources of the associated accounts from SageMaker Unified Studio, the associated domain owner must enable the necessary blueprints. Refer to the appendix to understand the cross-account deployment steps.
Project profiles and authorizing users
For the quick setup deployment, when you navigate to the Blueprints tab, you will notice all the blueprints are automatically enabled. Also, on the Project profiles tab, you will find default project profiles are available to the user.
Leave the rest of the tabs with the default options.
Create a custom project profile and authorize users (optional)
In the following example, we show the steps to create a custom project profile by bundling selected blueprints. We also show the steps to authorize only restricted users to use this project profile template. This example creates a custom project profile with selective blueprints. This enables the user to create a data lake environment with AWS Glue database and Athena workgroup to query the data. The user can also create an Amazon MWAA environment for orchestration. You can also change or override the configuration parameters of the blueprint by using the Tooling configurations option within the project profile.
Because SageMaker Unified Studio is in preview mode, the naming conventions of some visual elements might appear different in the current version.
When you create a project profile, you can add blueprint deployment settings in two modes: on create and on demand. On create mode allows you to deploy the blueprint deployment settings as soon as the project is created. On demand mode allows you to deploy the blueprint deployment settings when users need it.
Create a project, create domain units, and delegate ownership (optional)
In the following example, the administrator logs in to SageMaker Unified Studio and creates the retail
domain unit. The admin also delegates ownership to the retail business user. The retail business user logs in to SageMaker Unified Studio and creates a project with the authorized project profile template.
With these configurations in place, you have successfully completed the initial infrastructure plane deployment from an administrative perspective.
Authorization of blueprints (optional)
By default, all domain users have authorization to create projects with the enabled blueprints across domain units. If you want to restrict the usage of the blueprint within a specific domain unit (in this case, the retail
domain unit, as shown in the following screenshot), you need to revoke the existing permissions and authorize the specific domain units. By limiting the use of blueprints to a particular domain unit, users can only create projects using the blueprint within that domain unit. To apply authorization settings to child domain units, enable the Cascade to all child domain units option.
Clean up
Make sure you remove the SageMaker Unified Studio resources to mitigate any unexpected costs. This involves a few steps:
- If you had multiple projects and subscribed to assets, unsubscribe to all assets.
- Note the names of all AWS Glue databases and Athena workgroups created by your projects.
- Delete any connections you created in the data explorer that you don’t want to keep.
- Note the project IDs.
- Delete the projects. If you encounter any errors, check the AWS CloudFormation console and find the failed stack. Fix the error that failed the stack deletion and delete the projects.
- Note down the domain ID.
- Delete the domain.
- Delete the S3 bucket named
amazon-datazone-AWSACCOUNTID-AWSREGION-DOMAINID
. - Delete the AWS Glue databases and Athena workgroups you noted earlier.
- Delete the CloudFormation stack for the VPC (if you followed that step in the setup).
If you have additional resources that haven’t been deleted, you can also use tags to identify and delete specific resources.
Conclusion
In this post, we discussed the foundational building blocks of SageMaker Unified Studio and how, by abstracting complex technical implementations behind user-friendly interfaces, organizations can maintain standardized governance while enabling efficient resource management across business units. This approach provides consistency in infrastructure deployment while providing the flexibility needed for diverse business requirements.
To learn more, refer to the Amazon SageMaker Unified Studio Administrator Guide and the following resources:
Appendix: Multi-account administration
This section illustrates the cross-account association. After the account invitation is accepted by the associated account owner, follow the instructions as shown in the following example to understand how to enable the blueprints. After the blueprints are enabled in the associate accounts, the root domain account can create project profile templates with the parameters of the associated account, including its linked blueprints. The example then demonstrates how the retail domain unit user can deploy compute resources and create data using the resources from the associated account.
About the Authors
Lakshmi Nair is a Senior Analytics Specialist Solutions Architect at AWS. She specializes in designing advanced analytics systems across industries. She focuses on crafting cloud-based data platforms, enabling real-time streaming, big data processing, and robust data governance. She can be reached via LinkedIn.
Fabrizio Napolitano is a Principal Specialist Solutions Architect for DB and Analytics. He has worked in the analytics space for the last 20 years, and has recently and quite by surprise become a Hockey Dad after moving to Canada.