AWS re:Invent 2024, the flagship annual conference, took place December 2–6, 2024, in Las Vegas, bringing together thousands of cloud enthusiasts, innovators, and industry leaders from around the globe. This premier event showcased groundbreaking advancements, keynotes from AWS leadership, hands-on technical sessions, and exciting product launches.

Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. From enhancing data lakes to empowering AI-driven analytics, AWS unveiled new tools and services that are set to shape the future of data and analytics.

In this post, we walk you through the top analytics announcements from re:Invent 2024 and explore how these innovations can help you unlock the full potential of your data.

Amazon SageMaker

Introducing the next generation of Amazon SageMaker

AWS announces the next generation of Amazon SageMaker, a unified platform for data, analytics, and AI. This launch brings together widely adopted AWS machine learning (ML) and analytics capabilities and provides an integrated experience for analytics and AI with unified access to data and built-in governance.

The next generation of SageMaker also introduces new capabilities, including Amazon SageMaker Unified Studio (preview), Amazon SageMaker Lakehouse, and Amazon SageMaker Data and AI Governance. Amazon SageMaker Unified Studio brings together functionality and tools from the range of standalone studios, query editors, and visual tools available today in Amazon EMR, AWS Glue, Amazon Redshift, Amazon Bedrock, and the existing Amazon SageMaker Studio. Amazon SageMaker Lakehouse provides an open data architecture that reduces data silos and unifies data across Amazon Simple Storage Service (Amazon S3) data lakes, Redshift data warehouses, and third-party and federated data sources. Amazon SageMaker Data and AI Governance, including Amazon SageMaker Catalog built on Amazon DataZone, empowers you to securely discover, govern, and collaborate on data and AI workflows.

Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse

Amazon DynamoDB zero-ETL integration with SageMaker Lakehouse automates the extraction and loading of data from a DynamoDB table into SageMaker Lakehouse, an open and secure lakehouse. Using the no-code interface, you can maintain an up-to-date replica of your DynamoDB data in the data lake by quickly setting up your integration to handle the complete process of replicating data and updating records. This zero-ETL integration reduces the complexity and operational burden of data replication to let you focus on deriving insights from your data. You can create and manage integrations using the AWS Management Console, the AWS Command Line Interface (AWS CLI), or the SageMaker Lakehouse APIs.
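
For programmatic setup, here is a minimal boto3 sketch. It assumes the AWS Glue CreateIntegration API that backs zero-ETL integrations; the ARNs are placeholders for your own DynamoDB table and target lakehouse database, and the exact parameter shapes may differ, so verify against the current SDK documentation.

```python
import boto3

glue = boto3.client("glue")

# Create a zero-ETL integration that replicates a DynamoDB table into a
# SageMaker Lakehouse target. Both ARNs below are placeholders.
response = glue.create_integration(
    IntegrationName="orders-ddb-to-lakehouse",
    SourceArn="arn:aws:dynamodb:us-east-1:111122223333:table/Orders",
    TargetArn="arn:aws:glue:us-east-1:111122223333:database/lakehouse_db",
    Description="Keep an up-to-date replica of the Orders table",
)
print(response)
```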

Amazon S3 Tables

Amazon S3 Tables – Fully managed Apache Iceberg tables optimized for analytics workloads

Amazon S3 Tables deliver the first cloud object store with built-in Apache Iceberg support, and the most straightforward way to store tabular data at scale. S3 Tables are specifically optimized for analytics workloads, resulting in up to 3 times faster query throughput and up to 10 times higher transactions per second compared to self-managed tables. S3 Tables are designed to perform continual table maintenance to automatically optimize query efficiency and storage cost over time, even as your data lake scales and evolves. S3 Tables integration with the AWS Glue Data Catalog is in preview, allowing you to stream, query, and visualize data—including Amazon S3 Metadata tables—using AWS analytics services such as Amazon Data Firehose, Amazon Athena, Amazon Redshift, Amazon EMR, and Amazon QuickSight.
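
To give a feel for the API, here is a minimal boto3 sketch that creates a table bucket, a namespace, and an Iceberg table. The method and parameter names follow the S3 Tables API as we understand it at launch; treat them as an assumption and check the current SDK documentation.

```python
import boto3

s3tables = boto3.client("s3tables")

# Create a table bucket, a namespace inside it, and an Iceberg table.
bucket = s3tables.create_table_bucket(name="analytics-tables")

s3tables.create_namespace(
    tableBucketARN=bucket["arn"],
    namespace=["sales"],
)

s3tables.create_table(
    tableBucketARN=bucket["arn"],
    namespace="sales",
    name="daily_orders",
    format="ICEBERG",
)
```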

Amazon S3 Metadata (Preview) – Easiest and fastest way to manage your metadata

Amazon S3 Metadata is the simplest and fastest way to help you instantly discover and understand your S3 data, with automated, queryable metadata that updates in near real time. S3 Metadata supports object metadata, which includes system-defined details like the size and source of the object, and custom metadata, which lets you use tags to annotate your objects with information like product SKU, transaction ID, or content rating.

S3 Metadata is designed to automatically capture metadata from objects as they are uploaded into a bucket, and to make that metadata queryable in a read-only table. These metadata tables are stored in S3 Tables, the new S3 storage offering optimized for tabular data. Additionally, S3 Metadata integrates with Amazon Bedrock, allowing for the annotation of AI-generated videos with metadata that specifies its AI origin, creation timestamp, and the specific model used for its generation.
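
Because metadata tables live in S3 Tables, you can query them with standard SQL once they are available to your analytics services. The sketch below uses Athena through boto3; the database, table, and column names are illustrative placeholders, so substitute the names S3 Metadata generates for your bucket.

```python
import boto3

athena = boto3.client("athena")

# List recently uploaded objects by querying the read-only metadata table.
# Database, table, and column names below are illustrative placeholders.
athena.start_query_execution(
    QueryString="""
        SELECT key, size, last_modified_date
        FROM "s3_metadata_db"."my_bucket_metadata"
        ORDER BY last_modified_date DESC
        LIMIT 100
    """,
    WorkGroup="primary",
    ResultConfiguration={"OutputLocation": "s3://my-query-results/"},
)
```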

AWS Glue

Introducing AWS Glue 5.0

With AWS Glue 5.0, you get improved performance, enhanced security, support for SageMaker Unified Studio and SageMaker Lakehouse, and more. AWS Glue 5.0 enables you to develop, run, and scale your data integration workloads and get insights faster.

AWS Glue 5.0 upgrades the engines to Apache Spark 3.5.2, Python 3.11, and Java 17, with new performance and security improvements. It also updates open table format support to Apache Hudi 0.15.0, Apache Iceberg 1.6.1, and Delta Lake 3.2.0. AWS Glue 5.0 adds Spark native fine-grained access control with AWS Lake Formation so you can apply table-, column-, row-, and cell-level permissions on S3 data lakes. Finally, AWS Glue 5.0 adds support for SageMaker Lakehouse to unify all your data across S3 data lakes and Redshift data warehouses.
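
Adopting the new engine versions is a matter of pinning GlueVersion when you create or update a job. Here is a minimal boto3 sketch, with a placeholder role and script location:

```python
import boto3

glue = boto3.client("glue")

# Create a Spark ETL job on Glue 5.0 (Spark 3.5.2, Python 3.11, Java 17).
glue.create_job(
    Name="sales-aggregation",
    Role="arn:aws:iam::111122223333:role/GlueJobRole",  # placeholder role
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-glue-scripts/sales_aggregation.py",
        "PythonVersion": "3",
    },
    GlueVersion="5.0",
    WorkerType="G.1X",
    NumberOfWorkers=10,
)
```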

Amazon S3 Access Grants now integrate with AWS Glue

Amazon S3 Access Grants now integrate with AWS Glue for analytics, ML, and application development workloads in AWS. S3 Access Grants map identities from your identity provider (IdP), such as Entra ID or Okta, or AWS Identity and Access Management (IAM) principals, to datasets stored in Amazon S3. This integration lets you manage Amazon S3 permissions for end users running jobs with AWS Glue 5.0 or later, without the need to write and maintain bucket policies or individual IAM roles. When end users in the appropriate user groups access Amazon S3 using AWS Glue ETL for Apache Spark, they automatically have the necessary permissions to read and write data.
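
As a rough illustration, the boto3 sketch below grants an IdP directory group read/write access to a prefix inside a registered Access Grants location. The account ID, location ID, and group identifier are placeholders from a hypothetical setup.

```python
import boto3

s3control = boto3.client("s3control")

# Grant an IdP directory group read/write access to a prefix within a
# registered S3 Access Grants location. All IDs below are placeholders.
s3control.create_access_grant(
    AccountId="111122223333",
    AccessGrantsLocationId="default",
    AccessGrantsLocationConfiguration={"S3SubPrefix": "sales-data/*"},
    Grantee={
        "GranteeType": "DIRECTORY_GROUP",
        "GranteeIdentifier": "11111111-2222-3333-4444-555555555555",
    },
    Permission="READWRITE",
)
```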

AWS Glue Data Catalog now automates generating statistics for new tables

The AWS Glue Data Catalog now automates generating statistics for new tables. These statistics are integrated with the cost-based optimizer (CBO) in Amazon Redshift and Athena, resulting in improved query performance and potential cost savings. Previously, creating statistics for Iceberg tables in the Data Catalog required you to continuously monitor and update configurations for your tables. Now, the Data Catalog lets you generate statistics automatically for new tables with a one-time catalog configuration. Amazon Redshift and Athena use the updated statistics to optimize queries, applying optimizations such as optimal join order or cost-based aggregation pushdown. The Data Catalog console provides you with visibility into the updated statistics and statistics generation runs.

AWS expands data connectivity for Amazon SageMaker Lakehouse and AWS Glue

SageMaker Lakehouse announces unified data connectivity capabilities to streamline the creation, management, and usage of connections to data sources across databases, data lakes, and enterprise applications. SageMaker Lakehouse unified data connectivity provides a connection configuration template, support for standard authentication methods like basic authentication and OAuth 2.0, connection testing, metadata retrieval, and data preview. You can create SageMaker Lakehouse connections through SageMaker Unified Studio (preview), the AWS Glue console, or a custom-built application using APIs under AWS Glue.

With the ability to browse metadata, you can understand the structure and schema of the data source and identify relevant tables and fields. SageMaker Lakehouse unified connectivity is available where SageMaker Lakehouse or AWS Glue is available.
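
For example, a connection can be defined once through the AWS Glue APIs and then reused across engines. Here is a minimal boto3 sketch with a placeholder endpoint, keeping credentials in AWS Secrets Manager rather than inline:

```python
import boto3

glue = boto3.client("glue")

# Define a reusable JDBC connection; credentials live in Secrets Manager.
glue.create_connection(
    ConnectionInput={
        "Name": "postgres-sales",
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:postgresql://db.example.com:5432/sales",
            "SECRET_ID": "sales-db-credentials",  # placeholder secret name
        },
    }
)
```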

Announcing generative AI troubleshooting for Apache Spark in AWS Glue (Preview)

AWS Glue announces generative AI troubleshooting for Apache Spark, a new capability that helps data engineers and scientists quickly identify and resolve issues in their Spark jobs. Spark troubleshooting uses ML and generative AI technologies to provide automated root cause analysis for Spark job issues, along with actionable recommendations to fix identified issues. With Spark troubleshooting, you can initiate automated analysis of failed jobs with a single click in the AWS Glue console. Powered by Amazon Bedrock, Spark troubleshooting reduces debugging time from days to minutes.

The generative AI troubleshooting for Apache Spark preview is available for jobs running on AWS Glue 4.0.

Amazon EMR

Introducing Advanced Scaling in Amazon EMR Managed Scaling

We are excited to announce Advanced Scaling, a new capability in Amazon EMR Managed Scaling that gives you increased flexibility to control the performance and resource utilization of your Amazon EMR on EC2 clusters. With Advanced Scaling, you configure the desired resource utilization or performance level for your cluster, and Amazon EMR Managed Scaling uses your intent to intelligently scale the cluster and optimize its compute resources.
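
The sketch below shows where this intent would plug into the existing PutManagedScalingPolicy API. The ComputeLimits block is the long-standing Managed Scaling configuration; the ScalingStrategy and UtilizationPerformanceIndex fields are our assumption of how the new utilization intent is expressed, so verify them against the current EMR API reference.

```python
import boto3

emr = boto3.client("emr")

emr.put_managed_scaling_policy(
    ClusterId="j-EXAMPLECLUSTER",  # placeholder cluster ID
    ManagedScalingPolicy={
        # Long-standing managed scaling limits.
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": 2,
            "MaximumCapacityUnits": 50,
        },
        # Assumed field names for the new advanced-scaling intent.
        "ScalingStrategy": "ADVANCED",
        "UtilizationPerformanceIndex": 50,
    },
)
```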

Advanced Scaling is available with Amazon EMR release 7.0 and later and is available in all AWS Regions where Amazon EMR Managed Scaling is available.

Amazon Athena

Amazon SageMaker Lakehouse integrated access controls now available in Amazon Athena federated queries

SageMaker now supports connecting to, discovering, querying, and enforcing fine-grained data access controls on federated sources when querying data with Athena. Athena is a query service that makes it simple to analyze your data lake and federated data sources such as Amazon Redshift, DynamoDB, or Snowflake using SQL, without extract, transform, and load (ETL) scripts. Now, data workers can connect to and unify these data sources within SageMaker Lakehouse. Federated source metadata is unified in SageMaker Lakehouse, where you apply fine-grained policies in one place, helping to streamline analytics workflows and secure your data.
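
Once a federated source is connected and governed in SageMaker Lakehouse, a single Athena query can join it with data lake tables. Here is a sketch using boto3, where lake_db stands in for a data lake database and ddb_catalog for a registered federated catalog; all names are placeholders.

```python
import boto3

athena = boto3.client("athena")

# Join an S3 data lake table with a federated DynamoDB table. Catalog,
# database, and table names are placeholders; fine-grained permissions
# are enforced at query time.
athena.start_query_execution(
    QueryString="""
        SELECT o.order_id, o.amount, c.segment
        FROM lake_db.orders o
        JOIN ddb_catalog.default.customers c
          ON o.customer_id = c.customer_id
    """,
    WorkGroup="primary",
    ResultConfiguration={"OutputLocation": "s3://my-query-results/"},
)
```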

Amazon Managed Service for Apache Flink

Amazon Managed Service for Apache Flink now supports Amazon Managed Service for Prometheus as a destination

AWS announced support for a new Apache Flink connector for Amazon Managed Service for Prometheus. The new connector, contributed by AWS to the Flink open source project, adds Amazon Managed Service for Prometheus as a new destination for Flink. You can use the new connector to send processed data to an Amazon Managed Service for Prometheus destination starting with Flink version 1.19. With Amazon Managed Service for Apache Flink, you can transform and analyze data in real time. There are no servers or clusters to manage, and no compute or storage infrastructure to set up.

Amazon Managed Service for Apache Flink now delivers to Amazon SQS queues

AWS announced support for a new Flink connector for Amazon Simple Queue Service (Amazon SQS). The new connector, contributed by AWS to the Flink open source project, adds Amazon SQS as a new destination for Apache Flink. You can use the new connector to send processed data from Amazon Managed Service for Apache Flink to Amazon SQS queues using Flink, a popular framework and engine for processing and analyzing streaming data.

Amazon Managed Service for Apache Flink releases a new Amazon Kinesis Data Streams connector

Amazon Managed Service for Apache Flink now offers a new Flink connector for Amazon Kinesis Data Streams. This open source connector, contributed by AWS, supports Flink 2.0 and provides several enhancements. It enables in-order reads during stream scale-up or scale-down, supports Flink’s native watermarking, and improves observability through unified connector metrics. Additionally, the connector uses the AWS SDK for Java 2.x, which supports enhanced performance and security features and a native retry strategy. You can use the new connector to read data from a Kinesis data stream starting with Flink version 1.19.

Amazon Redshift

Amazon SageMaker Lakehouse and Amazon Redshift support for zero-ETL integrations from eight applications

SageMaker Lakehouse and Amazon Redshift now support zero-ETL integrations from applications, automating the extraction and loading of data from eight applications, including Salesforce, SAP, ServiceNow, and Zendesk. As an open, unified, and secure lakehouse for your analytics and AI initiatives, SageMaker Lakehouse enhances these integrations to streamline your data management processes. These zero-ETL integrations are fully managed by AWS and minimize the need to build ETL data pipelines, so you can optimize your data ingestion processes and focus instead on analysis and gaining insights.

Amazon Redshift multi-data warehouse writes through data sharing is now generally available

AWS announces the general availability of Amazon Redshift multi-data warehouse writes through data sharing. You can now start writing to Redshift databases from multiple Redshift data warehouses in just a few clicks. With Redshift multi-data warehouse writes through data sharing, you can keep ETL jobs more predictable by splitting workloads between multiple warehouses, helping you meet your workload performance requirements with less time and effort. Your data is immediately available across AWS accounts and Regions after it’s committed, enabling better collaboration across your organization.
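
Under the hood this is standard data sharing SQL. Here is a simplified sketch using the Redshift Data API: the producer shares a schema, and the consumer mounts the share and writes through it. Workgroup names and namespace GUIDs are placeholders, and the grants shown are abbreviated, so check the data sharing documentation for the exact privileges your write workloads need.

```python
import boto3

rsd = boto3.client("redshift-data")

def run(workgroup: str, sql: str):
    """Run one SQL statement on a Redshift Serverless workgroup."""
    return rsd.execute_statement(WorkgroupName=workgroup, Database="dev", Sql=sql)

# Producer warehouse: create and share a schema with the consumer namespace.
run("producer-wg", "CREATE DATASHARE etl_share")
run("producer-wg", "ALTER DATASHARE etl_share ADD SCHEMA staging")
run("producer-wg",
    "GRANT USAGE ON DATASHARE etl_share TO NAMESPACE "
    "'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee'")  # consumer namespace GUID

# Consumer warehouse: mount the share and write into it.
run("consumer-wg",
    "CREATE DATABASE etl_db FROM DATASHARE etl_share OF NAMESPACE "
    "'ffffffff-0000-1111-2222-333333333333'")  # producer namespace GUID
run("consumer-wg",
    "INSERT INTO etl_db.staging.orders SELECT * FROM local_staging_orders")
```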

Announcing Amazon Redshift Serverless with AI-driven scaling and optimization

Amazon Redshift Serverless introduces the next generation of AI-driven scaling and optimization in cloud data warehousing. Redshift Serverless uses AI techniques to automatically scale with workload changes across all key dimensions—such as data volume changes, number of concurrent users, and query complexity—to meet and maintain your price-performance targets. Amazon internal tests demonstrate that this optimization can provide you up to 10 times better price performance for variable workloads, without manual intervention.
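
Outside the console, the target behaves roughly like a workgroup setting. The pricePerformanceTarget shape in the boto3 sketch below is our assumption of how the AI-driven target is configured programmatically; verify it against the current Redshift Serverless API reference.

```python
import boto3

rss = boto3.client("redshift-serverless")

# Assumed shape of the AI-driven price-performance target setting.
rss.update_workgroup(
    workgroupName="analytics-wg",  # placeholder workgroup
    pricePerformanceTarget={
        "status": "ENABLED",
        "level": 50,  # midpoint between optimizing for cost and for speed
    },
)
```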

Redshift Serverless with AI-driven scaling and optimization is available in all AWS Regions where Redshift Serverless is available.

Amazon Redshift now supports incremental refresh on Materialized Views (MVs) for data lake tables

Amazon Redshift now supports incremental refresh of materialized views on data lake tables. Instead of fully recomputing a materialized view when its base data lake tables change, Amazon Redshift processes only the changes, so you can keep views up to date more quickly and at lower cost while improving query performance for your data lake queries.
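
Here is a quick sketch with the Redshift Data API: create a materialized view over an external (data lake) table and refresh it. The external schema lake, its orders table, and the workgroup name are placeholders; where the view is eligible, the refresh applies only incremental changes.

```python
import boto3

rsd = boto3.client("redshift-data")

# "lake" is a placeholder external schema over Data Catalog tables.
rsd.batch_execute_statement(
    WorkgroupName="analytics-wg",
    Database="dev",
    Sqls=[
        """CREATE MATERIALIZED VIEW daily_revenue AS
           SELECT order_date, SUM(amount) AS revenue
           FROM lake.orders
           GROUP BY order_date""",
        # Incremental where eligible, instead of a full recompute.
        "REFRESH MATERIALIZED VIEW daily_revenue",
    ],
)
```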

Support for incremental refresh for materialized views on data lake tables is now available in all commercial Regions. To get started and learn more, visit Materialized views on external data lake tables in Amazon Redshift Spectrum.

AWS announces Amazon Redshift integration with Amazon Bedrock for generative AI

AWS announces the integration of Amazon Redshift with Amazon Bedrock, a fully managed service offering high-performing foundation models (FMs), making it simpler and faster for you to build generative AI applications. This integration enables you to invoke large language models (LLMs) from simple SQL commands alongside your data in Amazon Redshift.
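
The integration builds on Redshift ML. The sketch below follows the general shape of the Bedrock model syntax as we understand it; the model ID, prompt, and table names are illustrative, and the documentation referenced below is the authoritative source for the syntax.

```python
import boto3

rsd = boto3.client("redshift-data")

# Register a Bedrock-backed model as a SQL function, then call it per row.
# Model ID, prompt, and table names are illustrative placeholders.
rsd.batch_execute_statement(
    WorkgroupName="analytics-wg",
    Database="dev",
    Sqls=[
        """CREATE EXTERNAL MODEL review_summarizer
           FUNCTION summarize_review
           IAM_ROLE DEFAULT
           MODEL_TYPE BEDROCK
           SETTINGS (
               MODEL_ID 'anthropic.claude-3-haiku-20240307-v1:0',
               PROMPT 'Summarize this product review:')""",
        "SELECT review_id, summarize_review(review_text) FROM reviews LIMIT 10",
    ],
)
```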

The Amazon Redshift integration with Amazon Bedrock is now generally available in all Regions where Amazon Bedrock and Amazon Redshift ML are supported. To get started, see Amazon Redshift ML integration with Amazon Bedrock.

Announcing general availability of auto-copy for Amazon Redshift

Amazon Redshift announces the general availability of auto-copy, which simplifies data ingestion from Amazon S3 into Amazon Redshift. This new feature enables you to set up continuous file ingestion from your S3 prefix and automatically load new files to tables in your Redshift data warehouse without the need for additional tools or custom solutions.
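
In SQL terms, auto-copy extends the familiar COPY command with a job definition. Here is a minimal sketch via the Redshift Data API, with placeholder table, bucket, and workgroup names:

```python
import boto3

rsd = boto3.client("redshift-data")

# "JOB CREATE ... AUTO ON" turns a one-time COPY into continuous ingestion
# that loads new files as they arrive under the prefix.
rsd.execute_statement(
    WorkgroupName="analytics-wg",
    Database="dev",
    Sql="""
        COPY sales
        FROM 's3://my-ingest-bucket/sales/'
        IAM_ROLE DEFAULT
        FORMAT AS CSV
        JOB CREATE sales_autocopy_job AUTO ON
    """,
)
```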

Amazon Redshift auto-copy from Amazon S3 is now generally available for both Redshift Serverless and Amazon Redshift RA3 Provisioned data warehouses in all AWS commercial Regions.

Amazon DataZone

Data Lineage is now generally available in Amazon DataZone and next generation of Amazon SageMaker

AWS announces the general availability of Data Lineage in Amazon DataZone and the next generation of SageMaker, a capability that automatically captures lineage from AWS Glue and Amazon Redshift to visualize lineage events from source to consumption. Because the feature is OpenLineage compatible, data producers can augment the automated lineage with events captured from OpenLineage-enabled systems or through an API, giving data consumers a comprehensive view of data movement. The feature automatically captures the schema and transformations of data assets and columns from AWS Glue, Amazon Redshift, and Spark executions to maintain consistency and reduce errors. Additionally, it versions lineage with each event, enabling you to visualize lineage at any point in time or compare transformations across an asset’s or job’s history.
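
For producers emitting their own events, the flow looks roughly like the boto3 sketch below: build an OpenLineage RunEvent and post it to the domain. The domain identifier and all event fields are placeholders, and we assume the DataZone PostLineageEvent API accepts a serialized OpenLineage event; check the API reference for the exact contract.

```python
import json
from datetime import datetime, timezone

import boto3

datazone = boto3.client("datazone")

# A minimal OpenLineage RunEvent; all identifiers are placeholders.
event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "run": {"runId": "11111111-2222-3333-4444-555555555555"},
    "job": {"namespace": "nightly_etl", "name": "load_orders"},
    "inputs": [{"namespace": "redshift://my-cluster", "name": "staging.orders"}],
    "outputs": [{"namespace": "redshift://my-cluster", "name": "analytics.orders"}],
    "producer": "https://example.com/my-etl",
    "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
}

datazone.post_lineage_event(
    domainIdentifier="dzd_exampledomain",  # placeholder domain ID
    event=json.dumps(event),
)
```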

Amazon DataZone now enhances data access governance with enforced metadata rules

Amazon DataZone now supports enforced metadata rules for data access workflows, providing organizations with enhanced capabilities to strengthen governance and compliance with their organization’s needs. This new feature allows domain owners to define and enforce mandatory metadata requirements, making sure data consumers provide essential information when requesting access to data assets in Amazon DataZone. By streamlining metadata governance, this capability helps organizations meet compliance standards, maintain audit readiness, and simplify access workflows for greater efficiency and control.

Amazon DataZone expands data access with tools like Tableau, Power BI, and more

Amazon DataZone now supports authentication with the Athena JDBC driver, enabling data consumers to query their project’s subscribed data lake assets in Amazon DataZone using popular business intelligence (BI) and analytics tools such as Tableau, Domo, Power BI, Microsoft Excel, SQL Workbench, and more. Data analysts and scientists can seamlessly access and analyze governed data in Amazon DataZone using a standard JDBC connection with their preferred tools.

This feature is now available in all the AWS commercial Regions where Amazon DataZone is supported. Check out Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more and Connecting Amazon DataZone with external applications via JDBC connectivity to learn more about how to connect Amazon DataZone to external analytics tools via JDBC.

Amazon QuickSight

Announcing scenario analysis capability of Amazon Q in QuickSight (preview)

A new scenario analysis capability of Amazon Q in QuickSight is now available in preview. This new capability provides an AI-assisted data analysis experience that helps you make better decisions, faster. Amazon Q in QuickSight simplifies in-depth analysis with step-by-step guidance, saving hours of manual data manipulation and unlocking data-driven decision-making across your organization. You can ask a question or state your goal in natural language and Amazon Q in QuickSight guides you through every step of advanced data analysis—suggesting analytical approaches, automatically analyzing data, surfacing relevant insights, and summarizing findings with suggested actions.

Amazon QuickSight now supports prompted reports and reader scheduling for pixel-perfect reports

We are enabling QuickSight readers to generate filtered views of pixel-perfect reports and create schedules to deliver reports through email. Readers can create up to five schedules per dashboard for themselves. Previously, only dashboard owners could create schedules, and only on the default (author-published) view of the dashboard. Now, if an author has added controls to a pixel-perfect report, schedules can be created or updated to respect selections in the filter controls.

Prompted reports and reader scheduling are now available in all supported QuickSight Regions—see Amazon QuickSight endpoints and quotas for QuickSight Regional endpoints.

Amazon Q in QuickSight unifies insights from structured and unstructured data

Amazon Q in QuickSight provides you with unified insights from structured and unstructured data sources through integration with Amazon Q Business. With data stories in Amazon Q in QuickSight, you can upload documents, or connect to unstructured data sources from Amazon Q Business, to create richer narratives or presentations explaining your data with additional context. This integration enables organizations to harness insights from all their data without the need for manual collation, leading to more informed decision-making, time savings, and a significant competitive edge.

Amazon Q Business now provides insights from your databases and data warehouses (preview)

AWS announces the public preview of the integration between Amazon Q Business and QuickSight, delivering a transformative capability that unifies answers from structured data sources (databases, warehouses) and unstructured data (documents, wikis, emails) in a single application.

With the QuickSight integration, you can now link your structured sources to Amazon Q Business through the extensive set of data source connectors available in QuickSight. This integration unifies insights across knowledge sources, helping organizations make more informed decisions while reducing the time and complexity traditionally required to gather insights.

Amazon OpenSearch Service

Amazon OpenSearch Service zero-ETL integration with Amazon Security Lake

Amazon OpenSearch Service now offers a zero-ETL integration with Amazon Security Lake, enabling you to query and analyze security data in-place directly through OpenSearch. This integration allows you to efficiently explore voluminous data sources that were previously cost-prohibitive to analyze, helping you streamline security investigations and obtain comprehensive visibility of your security landscape.

Amazon OpenSearch Ingestion now supports writing security data to Amazon Security Lake

Amazon OpenSearch Ingestion now allows you to write data into Amazon Security Lake in real time, so you can ingest security data from both AWS and custom sources and uncover valuable insights into potential security issues in near real time. With this feature, you can use OpenSearch Ingestion to ingest and transform security data from popular third-party sources like Palo Alto, CrowdStrike, and SentinelOne into OCSF format before writing it into Amazon Security Lake. After the data is written to Amazon Security Lake, it is available in the AWS Glue Data Catalog and AWS Lake Formation tables for the respective source.

AWS Clean Rooms

AWS Clean Rooms now supports multiple clouds and data sources

AWS Clean Rooms announces support for collaboration with datasets from multiple clouds and data sources. This launch allows companies and their partners to collaborate with data stored in Snowflake and Athena, without having to move or share their underlying data among collaborators.

Conclusion

re:Invent 2024 showcased how AWS continues to push the boundaries of data and analytics, delivering tools and services that empower organizations to derive faster, smarter, and more actionable insights. From advancements in data lakes, data warehouses, and streaming solutions to the integration of generative AI capabilities, these announcements are designed to transform the way businesses interact with their data.

As we look ahead, it’s clear that AWS is committed to helping organizations stay ahead in an increasingly data-driven world. Whether you’re modernizing your analytics stack or exploring new possibilities with AI and ML, the innovations from re:Invent 2024 provide the building blocks to unlock value from your data.

Stay tuned for more deep dives into these announcements, and don’t hesitate to explore how these tools can accelerate your journey toward data-driven success!


About the Authors

Sakti Mishra serves as Principal Data and AI Solutions Architect at AWS, where he helps customers modernize their data architecture and define end-to-end data strategies, including data security, accessibility, governance, and more. He is also the author of the books Simplify Big Data Analytics with Amazon EMR and AWS Certified Data Engineer Study Guide. Outside of work, Sakti enjoys learning new technologies, watching movies, and visiting places with family. He can be reached via LinkedIn.

Navnit Shukla serves as an AWS Specialist Solutions Architect with a focus on analytics. He is passionate about helping customers discover valuable insights from their data and builds solutions that empower businesses to make informed, data-driven decisions. He is also the author of the book Data Wrangling on AWS. He can be reached via LinkedIn.