Data Analysis

Build Business Intelligence System from Scratch on AWS

Describes the concepts of lambda architecture and the actual deployment process, using the example of building a serverless business intelligence system with Amazon Kinesis, S3, Athena, OpenSearch Service, and QuickSight.

CDC(Change Data Capture) Data Pipeline

Data pipeline for CDC data from a MySQL DB to Amazon OpenSearch Service through Amazon Kinesis, using AWS Database Migration Service (DMS).

CDC(Change Data Capture) Data Pipeline using Amazon MSK and MSK Connect with Debezium

Data Pipeline for CDC data from MySQL DB to Amazon S3 through Amazon MSK using Amazon MSK Connect (Debezium)

CDC(Change Data Capture) Data Pipeline using Amazon MSK Serverless and MSK Connect with Debezium

Data Pipeline for CDC data from MySQL DB to Amazon S3 through Amazon MSK Serverless using Amazon MSK Connect (Debezium)

Transactional Data Lake supporting CDC-based Upsert operation

Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming and DMS
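
To make the "CDC-based upsert" idea concrete: each change event either inserts, updates, or deletes the row identified by its primary key, and replaying the events in order reproduces the source table. The sketch below is a minimal in-memory illustration under stated assumptions. The event shape (`op`, `key`, `data`) and the function name `apply_cdc_events` are hypothetical and are not the record format emitted by DMS or Debezium, nor the code used in the project. On an Apache Iceberg table the same logic is typically expressed as a `MERGE INTO` statement.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_cdc_events(table: Dict[Any, Row], events: Iterable[Dict[str, Any]]) -> Dict[Any, Row]:
    """Apply CDC events, in order, to a table keyed by primary key.

    The event layout used here is a hypothetical, simplified one:
    {"op": "insert" | "update" | "delete", "key": <primary key>, "data": {...}}
    """
    for event in events:
        op = event["op"]
        key = event["key"]
        if op in ("insert", "update"):
            # Upsert: create the row if missing, otherwise overwrite changed columns.
            table[key] = {**table.get(key, {}), **event.get("data", {})}
        elif op == "delete":
            # Deleting a row that is already gone is treated as a no-op.
            table.pop(key, None)
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return table


events = [
    {"op": "insert", "key": 1, "data": {"name": "alice", "email": "old@example.com"}},
    {"op": "insert", "key": 2, "data": {"name": "bob", "email": "bob@example.com"}},
    {"op": "update", "key": 1, "data": {"email": "new@example.com"}},
    {"op": "delete", "key": 2},
]
current_state = apply_cdc_events({}, events)
print(current_state)
```

Applying events strictly in commit order matters here: swapping the update and the insert for key 1 would leave the stale email in place.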

Transactional Data Lake using Amazon MSK and Apache Iceberg on AWS Glue

Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK and MSK Connect (Debezium)

Transactional Data Lake using Amazon MSK Serverless and Apache Iceberg on AWS Glue

Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK Serverless and MSK Connect (Debezium)

Streaming Data Pipeline from Apache Kafka to Amazon S3 using Amazon Kinesis Data Firehose

Streaming data pipeline to continuously load data from an Amazon MSK or MSK Serverless cluster to Amazon S3 using Amazon Kinesis Data Firehose.

Redshift Streaming ingestion from Kinesis Data Streams, MSK, or MSK Serverless (3 examples)

This is a collection of CDK projects that show how to load data from streaming services into Amazon Redshift.

OpenSearch Serverless 4 Common Usage Patterns

Typical use cases of OpenSearch Serverless: search, time-series log analysis, Kinesis Data Firehose integration, and securing access with a VPC

  • (1) Search

  • (2) Time-series Log Analysis

  • (3) Streaming Ingestion through Kinesis Firehose

  • (4) Securing OpenSearch Serverless with VPC

Web Analytics System on AWS (a simplified version of Google Analytics)

This web analytics demo shows how to collect web logs with API Gateway and store them in S3 through Amazon Kinesis, and then how to analyze the logs with Amazon Athena.

AWS Glue Streaming ETL example with Apache Iceberg

Streaming ETL job examples in AWS Glue that integrate Apache Iceberg to create an in-place updatable data lake on Amazon S3

AWS Glue Streaming Ingestion from Kafka to Apache Iceberg table in S3

This is a collection of AWS CDK projects that show how to ingest streaming data directly from Amazon Managed Streaming for Apache Kafka (MSK) and MSK Serverless into an Apache Iceberg table in S3 with AWS Glue Streaming.

AWS Glue Streaming ETL example with Delta Lake

Streaming ETL job examples in AWS Glue that integrate Delta Lake to create an in-place updatable data lake on Amazon S3

Building CQRS Pattern using Amazon Athena

Example of the CQRS (Command and Query Responsibility Segregation) pattern using Amazon Athena

Streaming Count Sketches with HyperLogLog in Amazon MemoryDB for Redis

This repository provides CDK scripts and sample code that show how to count unique items (e.g., unique visitors) with HyperLogLog in Amazon MemoryDB for Redis. HyperLogLog (HLL) is a probabilistic data structure that estimates the cardinality of a set. As a probabilistic data structure, HyperLogLog trades perfect accuracy for efficient space utilization.
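
The accuracy-for-space trade-off can be seen in a small pure-Python sketch of the HyperLogLog idea. This is an illustration only, not the repository's code: Redis exposes HyperLogLog through the `PFADD` and `PFCOUNT` commands, whereas the class below (`HyperLogLog`, with an assumed precision of `p = 12`, i.e. 4096 registers, and SHA-1 as the hash) is a hypothetical stand-in written to show how a fixed number of small registers can estimate a large cardinality.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Derive a 64-bit hash from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                  # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> int:
        # Raw estimate from the harmonic mean of 2**register values.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


hll = HyperLogLog()
for i in range(10_000):
    visitor = f"visitor-{i}"
    hll.add(visitor)
    hll.add(visitor)   # duplicates do not change the estimate

print("estimated unique visitors:", hll.count())
```

With 4096 registers the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%, regardless of how many items are added; the memory used stays fixed at the register array.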

Real-time Image Analysis System

This sample project is a real-time image analysis system. When an image is uploaded, the system tags it using Amazon Rekognition and ingests the image tags into Amazon Elasticsearch Service so that the image labels can be analyzed.
