Data Analysis
Build Business Intelligence System from Scratch on AWS
Describes the concepts of the lambda architecture and walks through the actual deployment process, using the example of building a serverless business intelligence system with Amazon Kinesis, S3, Athena, OpenSearch Service, and QuickSight.
Zero-ETL integrations with Amazon Redshift
(1) Aurora MySQL to Amazon Redshift
An Amazon Aurora MySQL zero-ETL integration with Amazon Redshift enables near real-time analytics and machine learning (ML) using Amazon Redshift on petabytes of transactional data from Aurora.

(2) Aurora PostgreSQL to Amazon Redshift
An Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift enables near real-time analytics and machine learning (ML) using Amazon Redshift on petabytes of transactional data from Aurora.

(3) Amazon RDS MySQL to Amazon Redshift
An Amazon RDS MySQL zero-ETL integration with Amazon Redshift enables near real-time analytics and machine learning (ML) using Amazon Redshift on petabytes of transactional data from RDS.

CDC (Change Data Capture) Data Pipeline
Data Pipeline for CDC data from MySQL DB to Amazon OpenSearch Service through Amazon Kinesis using AWS Database Migration Service (DMS).
CDC (Change Data Capture) Data Pipeline using Amazon MSK and MSK Connect with Debezium
Data Pipeline for CDC data from MySQL DB to Amazon S3 through Amazon MSK using Amazon MSK Connect (Debezium)

CDC (Change Data Capture) Data Pipeline using Amazon MSK Serverless and MSK Connect with Debezium
Data Pipeline for CDC data from MySQL DB to Amazon S3 through Amazon MSK Serverless using Amazon MSK Connect (Debezium)

Transactional Data Lake supporting CDC-based Upsert operation
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming and DMS
Transactional Data Lake using Amazon MSK and Apache Iceberg on AWS Glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK and MSK Connect (Debezium)

Transactional Data Lake using Amazon MSK Serverless and Apache Iceberg on AWS Glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK Serverless and MSK Connect (Debezium)

Transactional Data Lake using Amazon Data Firehose and Apache Iceberg
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with Amazon Data Firehose and DMS
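
The transactional data lake entries above all depend on applying CDC events to a table as upserts: an insert or update replaces the row with the same key, and a delete removes it. The following is a minimal in-memory sketch of that idea in plain Python, not the code used in these projects. The event shape (`op` and `data` fields), the `apply_cdc` function, and the `id` key column are assumptions made for illustration; DMS and Debezium each emit their own record formats.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_cdc(table: Dict[Any, Row], events: Iterable[Dict[str, Any]],
              key: str = "id") -> Dict[Any, Row]:
    """Apply change events in order, keyed by the primary-key column.

    The event layout {"op": ..., "data": {...}} is hypothetical.
    """
    for event in events:
        op, row = event["op"], event["data"]
        pk = row[key]
        if op in ("insert", "update"):
            # Upsert: merge the changed columns into the existing row,
            # or create the row if the key has not been seen yet.
            table[pk] = {**table.get(pk, {}), **row}
        elif op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            table.pop(pk, None)
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return table


events = [
    {"op": "insert", "data": {"id": 1, "name": "alice", "qty": 1}},
    {"op": "insert", "data": {"id": 2, "name": "bob", "qty": 5}},
    {"op": "update", "data": {"id": 1, "qty": 3}},
    {"op": "delete", "data": {"id": 2}},
]
table = apply_cdc({}, events)
print(table)  # {1: {'id': 1, 'name': 'alice', 'qty': 3}}
```

A table format such as Apache Iceberg lets the same merge-by-key logic run against files in S3 instead of a Python dictionary, which is what makes the data lake "transactional".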

Streaming Data to Amazon S3 Tables using Amazon Kinesis Data Firehose
This is a CDK Python project to build a fully managed data lake using Amazon Data Firehose and S3 Tables to store and analyze real-time streaming data.

Streaming Data Pipeline from Apache Kafka to Amazon S3 using Amazon Kinesis Data Firehose
Streaming data pipeline to continuously load data from an Amazon MSK or MSK Serverless cluster to Amazon S3 using Amazon Kinesis Data Firehose.

Redshift Streaming ingestion from Kinesis Data Streams, MSK, or MSK Serverless (3 examples)
This is a collection of CDK projects that show how to load data from streaming services into Amazon Redshift.
OpenSearch Serverless 4 Common Usage Patterns
Typical use cases of OpenSearch Serverless: search, time-series analysis, Kinesis Firehose integration, and securing with VPC
(1) Search
(2) Time-series Log Analysis
(3) Streaming Ingestion through Kinesis Firehose
(4) Securing OpenSearch Serverless with VPC

Web Analytics System on AWS (a simplified version of Google Analytics)
Web Log Analytics System with Parquet data format
This web analytics demo shows how to collect web logs with API Gateway and store them in S3 through Amazon Kinesis, and then how to analyze the web logs with Amazon Athena.
Web Log Analytics System with Apache Iceberg Table
This repository provides CDK scripts and sample code showing how to implement a simple web analytics system that stores web logs in an Apache Iceberg table.

Web Log Analytics System using API Gateway integrated with Data Firehose with Apache Iceberg table
This repository provides CDK scripts and sample code showing how to implement a simple web analytics system in which API Gateway is integrated with Data Firehose and web logs are stored in an Apache Iceberg table.

AWS Glue Streaming ETL example with Apache Iceberg
Streaming ETL job examples in AWS Glue that integrate Iceberg and create an in-place updatable data lake on Amazon S3
AWS Glue Streaming Ingestion from Kafka to Apache Iceberg table in S3
This is a collection of AWS CDK projects that show how to ingest streaming data directly from Amazon Managed Streaming for Apache Kafka (MSK) and MSK Serverless into an Apache Iceberg table in S3 with AWS Glue Streaming.
AWS Glue Streaming ETL example with Delta Lake
Streaming ETL job examples in AWS Glue that integrate Delta Lake and create an in-place updatable data lake on Amazon S3
Building CQRS Pattern using Amazon Athena
Example of the CQRS (Command Query Responsibility Segregation) pattern using Amazon Athena
Streaming Count Sketches with HyperLogLog in Amazon MemoryDB for Redis
This repository provides CDK scripts and sample code showing how to count unique items (e.g., unique visitors) with HyperLogLog in Amazon MemoryDB for Redis. HyperLogLog (HLL) is a probabilistic data structure that estimates the cardinality of a set. As a probabilistic data structure, HyperLogLog trades perfect accuracy for efficient space utilization.
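
To make the accuracy-for-space trade-off concrete, here is a minimal pure-Python sketch of the HyperLogLog idea. It is an illustration only, not the repository's code: Redis-compatible stores expose HyperLogLog through the `PFADD` and `PFCOUNT` commands, whereas the `HyperLogLog` class, the `p` parameter, and the choice of SHA-1 for hashing below are assumptions made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p                    # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant; this formula applies for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Take a 64-bit hash from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> int:
        # Harmonic mean of 2**register across all registers.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


hll = HyperLogLog()
for i in range(10_000):
    hll.add(f"visitor-{i}")
    hll.add(f"visitor-{i}")  # repeated items leave the registers unchanged
print(hll.count())           # an estimate near 10000, not an exact count
```

With `p = 12` the sketch keeps 4,096 small integers regardless of how many items are added, which is the space saving that the loss of exactness pays for.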

Real-time Image Analysis System
This sample project is a real-time image analysis system. When an image is uploaded, the system tags it using Amazon Rekognition and ingests the tags into Amazon Elasticsearch Service so that the image labels can be analyzed.
