Generative AI

Hosting DeepSeek models on Amazon SageMaker

DeepSeek-R1 Distill Llama 8B using SGLang

This is a CDK Python project to host DeepSeek-R1-Distill-Llama-8B on an Amazon SageMaker Real-time Inference Endpoint. In this example, we'll demonstrate how to adapt the SGLang framework to run on SageMaker AI endpoints. SGLang is a serving framework for large language models that provides state-of-the-art performance, including a fast backend runtime for efficient serving with RadixAttention, extensive model support, and an active open-source community. For more information, refer to https://docs.sglang.ai/index.html and https://github.com/sgl-project/sglang.
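Once deployed, the endpoint is invoked like any SageMaker real-time endpoint. A minimal sketch, assuming the SGLang container accepts an OpenAI-compatible chat payload behind SageMaker's invocation contract (the payload fields and endpoint name are illustrative, not taken from the project):

```python
import json


def build_chat_payload(prompt: str, max_tokens: int = 512) -> dict:
    # OpenAI-compatible chat payload; SGLang's server is assumed to accept
    # this shape. Field names are illustrative, not verified against the stack.
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.6,
    }


def invoke(endpoint_name: str, prompt: str) -> str:
    # boto3 is imported lazily so the payload helper stays usable without AWS deps.
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(build_chat_payload(prompt)),
    )
    body = json.loads(response["Body"].read())
    # OpenAI-style response shape assumed from SGLang's compatible API.
    return body["choices"][0]["message"]["content"]
```

The same pattern applies to the other SGLang-based projects in this list.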

DeepSeek-R1 Distill Llama 8B

This is a CDK Python project to deploy DeepSeek-R1-Distill-Llama-8B to a SageMaker real-time endpoint with the scale-down-to-zero feature. This project demonstrates how you can scale your SageMaker endpoint down to zero instances during idle periods, eliminating the previous requirement of maintaining at least one running instance.
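Scale-to-zero is built on inference components whose copy count is registered with Application Auto Scaling with a minimum of zero. A rough sketch of that registration, assuming an inference-component-based endpoint (the component name is a placeholder; the CDK stack in the project wires this up declaratively):

```python
def scalable_target_resource_id(inference_component_name: str) -> str:
    # Application Auto Scaling identifies an inference component by this
    # resource-id format.
    return f"inference-component/{inference_component_name}"


def enable_scale_to_zero(inference_component_name: str, max_copies: int = 2) -> None:
    # boto3 imported lazily so the helper above stays importable without AWS deps.
    import boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=scalable_target_resource_id(inference_component_name),
        ScalableDimension="sagemaker:inference-component:DesiredCopyCount",
        MinCapacity=0,  # zero copies allowed during idle periods
        MaxCapacity=max_copies,
    )
```

A scaling policy (e.g. target tracking on invocations per copy) would then drive the copy count between these bounds.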

DeepSeek-R1 Distill Qwen 14B

This is a CDK Python project to host DeepSeek-R1-Distill-Qwen-14B on an Amazon SageMaker Real-time Inference Endpoint. DeepSeek-R1 is one of the first generation of DeepSeek reasoning models, along with DeepSeek-R1-Zero. DeepSeek-R1-Distill-Qwen-14B is an open-source model fine-tuned using samples generated by DeepSeek-R1.

DeepSeek-R1 Distill Qwen 32B

This is a CDK Python project to host DeepSeek-R1-Distill-Qwen-32B on an Amazon SageMaker Real-time Inference Endpoint using SageMaker JumpStart.

DeepSeek-V2 Lite Chat

This is a CDK Python project to host DeepSeek-V2 Lite Chat, a strong, economical, and efficient Mixture-of-Experts (MoE) language model, on an Amazon SageMaker Real-time Inference Endpoint using the SageMaker DJL Serving DLC.

Janus-Pro 7B

This is a CDK Python project to host deepseek-ai/Janus-Pro-7B on an Amazon SageMaker Real-time Inference Endpoint. Janus-Pro-7B is a unified understanding-and-generation MLLM that decouples visual encoding for multimodal understanding and generation.

Hosting LG AI EXAONE-Deep models on Amazon SageMaker

EXAONE-Deep 7.8B using SGLang

This is a CDK Python project to host LG AI EXAONE Deep 7.8B on an Amazon SageMaker Real-time Inference Endpoint. EXAONE Deep is a family of models ranging from 2.4B to 32B parameters, developed and released by LG AI Research, that exhibits superior capabilities on various reasoning tasks, including math and coding benchmarks. In this example, we'll demonstrate how to adapt the SGLang framework to run on SageMaker AI endpoints. SGLang is a serving framework for large language models that provides state-of-the-art performance, including a fast backend runtime for efficient serving with RadixAttention, extensive model support, and an active open-source community. For more information, refer to https://docs.sglang.ai/index.html and https://github.com/sgl-project/sglang.

LLM Observability Tools

Langfuse on AWS

This project is an AWS CDK Python project for deploying the Langfuse application using Amazon Elastic Container Registry (ECR) and Amazon Elastic Container Service (ECS). Langfuse is an open-source LLM engineering platform that helps teams collaboratively debug, analyze, and iterate on their LLM applications.

(1) Langfuse v3

(2) Langfuse v2

RAG (Retrieval-Augmented Generation)

With Knowledge Bases for Amazon Bedrock

This project is a Question Answering application with Large Language Models (LLMs) and Knowledge Bases for Amazon Bedrock. An application using the RAG (Retrieval Augmented Generation) approach retrieves the information most relevant to the user's request from the enterprise knowledge base or content, bundles it as context with the user's request into a prompt, and then sends it to the LLM to get a GenAI response. In this project, Amazon OpenSearch Serverless is used as the vector store for the Knowledge Base for Amazon Bedrock.
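With Knowledge Bases for Amazon Bedrock, the retrieve, bundle, and generate steps described above collapse into a single API call. A hedged sketch of that flow (the knowledge base ID and model ARN are placeholders, not values from the project):

```python
def rag_config(kb_id: str, model_arn: str) -> dict:
    # Configuration payload for Bedrock's RetrieveAndGenerate API.
    return {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": kb_id,
            "modelArn": model_arn,
        },
    }


def ask_knowledge_base(question: str, kb_id: str, model_arn: str) -> str:
    # boto3 imported lazily so the config helper stays usable without AWS deps.
    import boto3

    client = boto3.client("bedrock-agent-runtime")
    # Bedrock retrieves the most relevant chunks from the knowledge base,
    # bundles them as context with the question, and calls the model.
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration=rag_config(kb_id, model_arn),
    )
    return response["output"]["text"]
```

The same call shape works regardless of which vector store (OpenSearch Serverless, Aurora PostgreSQL, etc.) backs the knowledge base.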

Version using generative-ai-cdk-constructs
Version using AWS CDK L1 Constructs

With Amazon Aurora PostgreSQL used for a Knowledge Base for Amazon Bedrock

This project is a Question Answering application with Large Language Models (LLMs) and Amazon Aurora PostgreSQL using pgvector. An application using the RAG (Retrieval Augmented Generation) approach retrieves the information most relevant to the user's request from the enterprise knowledge base or content, bundles it as context with the user's request into a prompt, and then sends it to the LLM to get a GenAI response. In this project, Amazon Aurora PostgreSQL with pgvector is used for a Knowledge Base for Amazon Bedrock.

With LLMs and Amazon Kendra

This project is a Question Answering application with Large Language Models (LLMs) and Amazon Kendra. An application using the RAG (Retrieval Augmented Generation) approach retrieves the information most relevant to the user's request from the enterprise knowledge base or content, bundles it as context with the user's request into a prompt, and then sends it to the LLM to get a GenAI response.

With Amazon Bedrock and Kendra

This project is a Question Answering application with Large Language Models (LLMs) and Amazon Kendra. An application using the RAG (Retrieval Augmented Generation) approach retrieves the information most relevant to the user's request from the enterprise knowledge base or content, bundles it as context with the user's request into a prompt, and then sends it to the LLM to get a GenAI response. In this project, the LLM is provided through Amazon Bedrock.

With Amazon Bedrock and OpenSearch

With LLMs and Amazon OpenSearch

This project is a Question Answering application with Large Language Models (LLMs) and Amazon OpenSearch Service. An application using the RAG (Retrieval Augmented Generation) approach retrieves the information most relevant to the user's request from the enterprise knowledge base or content, bundles it as context with the user's request into a prompt, and then sends it to the LLM to get a GenAI response.

With LLMs and Amazon OpenSearch Serverless

Question Answering Generative AI application with Large Language Models (LLMs) and Amazon OpenSearch Serverless.

With Amazon Bedrock and Amazon Aurora PostgreSQL using pgvector

This project is a Question Answering application with Large Language Models (LLMs) and Amazon Aurora PostgreSQL using pgvector. An application using the RAG (Retrieval Augmented Generation) approach retrieves the information most relevant to the user's request from the enterprise knowledge base or content, bundles it as context with the user's request into a prompt, and then sends it to the LLM to get a GenAI response. In this project, Amazon Aurora PostgreSQL with pgvector is used as the knowledge base.

With LLMs and Amazon Aurora PostgreSQL using pgvector

This project is a Question Answering application with Large Language Models (LLMs) and Amazon Aurora PostgreSQL using pgvector. An application using the RAG (Retrieval Augmented Generation) approach retrieves the information most relevant to the user's request from the enterprise knowledge base or content, bundles it as context with the user's request into a prompt, and then sends it to the LLM to get a GenAI response. In this project, Amazon Aurora PostgreSQL with pgvector is used as the knowledge base.

With Amazon Bedrock and MemoryDB for Redis

Question Answering Generative AI application with Large Language Models (LLMs), Amazon Bedrock, and Amazon MemoryDB for Redis.

With Amazon MemoryDB for Redis and SageMaker

Question Answering Generative AI application with Large Language Models (LLMs) deployed on Amazon SageMaker, and Amazon MemoryDB for Redis as a Vector Database.

With Amazon Bedrock and DocumentDB

Question Answering Generative AI application with Large Language Models (LLMs), Amazon Bedrock, and Amazon DocumentDB (with MongoDB Compatibility)

With Amazon DocumentDB and SageMaker

Question Answering Generative AI application with Large Language Models (LLMs) deployed on Amazon SageMaker, and Amazon DocumentDB (with MongoDB Compatibility) as a Vector Database.

Semantic Vector Search in PostgreSQL using Amazon SageMaker and pgvector

This project is a search solution using pgvector for an online retail store's product catalog. We'll build a search system that lets customers provide an item description to find similar items. For more information, check the blog post "Building AI-powered search in PostgreSQL using Amazon SageMaker and pgvector" (May 2023).
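At its core this pattern is a nearest-neighbor query over embedding vectors. A minimal sketch, assuming a products table with a pgvector `embedding` column (table and column names are illustrative; in the full solution the query embedding would come from a SageMaker text-embedding endpoint):

```python
def similarity_query(table: str = "products", k: int = 5) -> str:
    # "<->" is pgvector's Euclidean-distance operator; "<=>" would give
    # cosine distance instead. The query embedding is bound as a parameter.
    return (
        f"SELECT id, description, embedding <-> %s::vector AS distance "
        f"FROM {table} ORDER BY distance LIMIT {k}"
    )


def search_similar_items(conn, query_embedding, k: int = 5):
    # conn is an open psycopg2 connection to the PostgreSQL instance.
    # pgvector accepts vectors as bracketed literals, e.g. "[0.1,0.2,...]".
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(similarity_query(table="products", k=k), (vector_literal,))
        return cur.fetchall()
```

The blog post additionally covers creating an index (e.g. IVFFlat) over the embedding column to keep these queries fast at catalog scale.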
