Category: Big Data

  • Extract insights in a 30TB time series workload with Amazon OpenSearch Serverless

    In today’s data-driven landscape, managing and analyzing vast amounts of data, especially logs, is crucial for organizations to derive insights and make informed decisions. However, handling large volumes of data while extracting insights is a significant challenge, prompting organizations to seek scalable solutions without the complexity of infrastructure management. Amazon OpenSearch Serverless reduces the burden of manual infrastructure provisioning…

  • Build a dynamic rules engine with Amazon Managed Service for Apache Flink

    Imagine you have some streaming data. It could be from an Internet of Things (IoT) sensor, log data ingestion, or even shopper impression data. Regardless of the source, you have been tasked with acting on the data—alerting or triggering when something occurs. Martin Fowler says: “You can build a simple rules engine yourself. All you…

  • Deprecation of Lake Formation’s Governed Tables Feature

    After careful consideration, we have made the decision to end support for Governed Tables, effective December 31, 2024, to focus on open source transactional table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. This decision stems from customer preference for these open source solutions, which offer ACID-compliant transactions, compaction, time travel,…

  • Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

    Amazon Redshift enables you to efficiently query and retrieve structured and semi-structured data from open format files in an Amazon S3 data lake without having to load the data into Amazon Redshift tables. Amazon Redshift extends SQL capabilities to your data lake, enabling you to run analytical queries. Amazon Redshift supports a wide variety of tabular data…

  • Amazon EMR Serverless observability, Part 1: Monitor Amazon EMR Serverless workers in near real time using Amazon CloudWatch

    Amazon EMR Serverless allows you to run open source big data frameworks such as Apache Spark and Apache Hive without managing clusters and servers. With EMR Serverless, you can run analytics workloads at any scale with automatic scaling that resizes resources in seconds to meet changing data volumes and processing requirements. We have launched job…

  • Enrich your serverless data lake with Amazon Bedrock

    Organizations are collecting and storing vast amounts of structured and unstructured data like reports, whitepapers, and research documents. By consolidating this information, analysts can discover and integrate data from across the organization, creating valuable data products based on a unified dataset. For many organizations, this centralized data store follows a data lake architecture. Although data…

  • Apply enterprise data governance and management using AWS Lake Formation and AWS IAM Identity Center

    In today’s rapidly evolving digital landscape, enterprises across regulated industries face a critical challenge as they navigate their digital transformation journeys: effectively managing and governing data from legacy systems that are being phased out or replaced. This historical data, often containing valuable insights and subject to stringent regulatory requirements, must be preserved and made accessible…

  • Achieve cross-Region resilience with Amazon OpenSearch Ingestion

    Cross-Region deployments provide increased resilience to maintain business continuity during outages, natural disasters, or other operational interruptions. Many large enterprises design and deploy special plans for readiness during such situations. They rely on solutions built with AWS services and features to improve their confidence and response times. Amazon OpenSearch Service is a managed service for…

  • How to track Amazon OpenSearch Service domain-level cost

    Amazon OpenSearch Service is a managed service that makes it easy to deploy, operate, and scale OpenSearch domains in AWS to perform interactive log analytics, real-time application monitoring, website search, and more. Understanding OpenSearch Service spend per domain is crucial for effective cost management, optimization, and informed decision-making. Amazon OpenSearch Service pricing is based on three dimensions: instances, storage, and…

  • Amazon OpenSearch Service: Managed and community driven

    I’ve always loved the problem of search. At its core, search is about receiving a question, understanding that question, and then retrieving the best answer for it. A long time ago, I did an AI robotics project for my PhD that married a library of plan fragments to a real-world situation, through search. I’ve worked…