Generative AI
Hosting DeepSeek models on Amazon SageMaker
DeepSeek-R1 Distill Llama 8B using SGLang
This is a CDK Python project to host DeepSeek-R1-Distill-Llama-8B on an Amazon SageMaker real-time inference endpoint. This example demonstrates how to adapt the SGLang framework to run on SageMaker AI endpoints. SGLang is a serving framework for large language models. It offers a fast backend runtime for efficient serving with RadixAttention, extensive model support, and an active open-source community. For more information, refer to https://docs.sglang.ai/index.html and https://github.com/sgl-project/sglang.
DeepSeek-R1 Distill Llama 8B
This is a CDK Python project to deploy DeepSeek-R1-Distill-Llama-8B on a SageMaker real-time endpoint with the scale down to zero feature. The project demonstrates how to scale in a SageMaker endpoint to zero instances during idle periods, eliminating the previous requirement of maintaining at least one running instance.
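A minimal sketch of how scale to zero might be registered, assuming the model is served through a SageMaker inference component whose copy count is managed by Application Auto Scaling. The helper name `build_scalable_target` and the component name are hypothetical, and the project itself may configure this differently in CDK.

```python
def build_scalable_target(inference_component_name: str, max_copies: int) -> dict:
    """Build keyword arguments for Application Auto Scaling's
    register_scalable_target call, allowing the copy count to drop to zero."""
    if max_copies < 1:
        raise ValueError("max_copies must be at least 1")
    return {
        "ServiceNamespace": "sagemaker",
        # Assumption: the model runs as an inference component with this name.
        "ResourceId": f"inference-component/{inference_component_name}",
        "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
        "MinCapacity": 0,  # zero copies are allowed while the endpoint is idle
        "MaxCapacity": max_copies,
    }


# Hypothetical component name, for illustration only.
params = build_scalable_target("deepseek-r1-distill-llama-8b", max_copies=2)
print(params["ResourceId"])
```

The resulting dictionary could be passed to boto3 as `boto3.client("application-autoscaling").register_scalable_target(**params)`. A scaling policy is also needed to bring copies back when requests arrive; that part is omitted here.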
DeepSeek-R1 Distill Qwen 14B
This is a CDK Python project to host DeepSeek-R1-Distill-Qwen-14B on an Amazon SageMaker real-time inference endpoint. DeepSeek-R1 is one of the first-generation DeepSeek reasoning models, along with DeepSeek-R1-Zero. DeepSeek-R1-Distill-Qwen-14B is fine-tuned from an open-source base model using samples generated by DeepSeek-R1.
DeepSeek-R1 Distill Qwen 32B
This is a CDK Python project to host DeepSeek-R1-Distill-Qwen-32B on an Amazon SageMaker real-time inference endpoint using SageMaker JumpStart.
DeepSeek-V2 Lite Chat
This is a CDK Python project to host DeepSeek-V2 Lite Chat, a model from the DeepSeek-V2 family ("A Strong, Economical, and Efficient Mixture-of-Experts Language Model"), on an Amazon SageMaker real-time inference endpoint using the SageMaker DJL Serving DLC.
Janus-Pro 7B
This is a CDK Python project to host deepseek-ai/Janus-Pro-7B on an Amazon SageMaker real-time inference endpoint. Janus-Pro-7B is a unified multimodal large language model (MLLM) for understanding and generation that decouples visual encoding for the two tasks.
Hosting LG AI EXAONE-Deep models on Amazon SageMaker
EXAONE-Deep 7.8B using SGLang
This is a CDK Python project to host LG AI EXAONE Deep 7.8B on an Amazon SageMaker real-time inference endpoint. EXAONE Deep is a family of models ranging from 2.4B to 32B parameters, developed and released by LG AI Research, that exhibits superior capabilities in various reasoning tasks, including math and coding benchmarks. This example demonstrates how to adapt the SGLang framework to run on SageMaker AI endpoints. SGLang is a serving framework for large language models. It offers a fast backend runtime for efficient serving with RadixAttention, extensive model support, and an active open-source community. For more information, refer to https://docs.sglang.ai/index.html and https://github.com/sgl-project/sglang.
LLM Observability Tools
Langfuse on AWS
This project is an AWS CDK Python project for deploying the Langfuse application using Amazon Elastic Container Registry (ECR) and Amazon Elastic Container Service (ECS). Langfuse is an open-source LLM engineering platform that helps teams collaboratively debug, analyze, and iterate on their LLM applications.
(1) Langfuse v3

(2) Langfuse v2

RAG (Retrieval Augmented Generation)
With Knowledge Bases for Amazon Bedrock
This project is a Question Answering application with Large Language Models (LLMs) and Knowledge Bases for Amazon Bedrock. An application using the RAG (Retrieval Augmented Generation) approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a GenAI response. In this project, Amazon OpenSearch Serverless is used for a Knowledge Base for Amazon Bedrock.
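The retrieve, bundle, and send steps described above can be sketched in a few lines. This is a toy illustration only: the keyword-overlap `retrieve` function stands in for a real vector search, both function names are hypothetical, and the final call to an LLM is left out.

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by how many lowercase words they share with the query.
    A real RAG application would use a vector or semantic search instead."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Bundle the retrieved passages as context along with the user's request."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


knowledge_base = [
    "Our return policy allows refunds within 30 days.",
    "Shipping takes three to five business days.",
    "The store opens at 9 am.",
]
question = "what is the return policy"
prompt = build_prompt(question, retrieve(question, knowledge_base, top_k=1))
print(prompt)  # this string is what would be sent to the LLM
```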

With Amazon Aurora PostgreSQL used for a Knowledge Base for Amazon Bedrock
This project is a Question Answering application with Large Language Models (LLMs) and Amazon Aurora PostgreSQL using pgvector. An application using the RAG (Retrieval Augmented Generation) approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a GenAI response. In this project, Amazon Aurora PostgreSQL with pgvector is used for a Knowledge Base for Amazon Bedrock.

With LLMs and Amazon Kendra
This project is a Question Answering application with Large Language Models (LLMs) and Amazon Kendra. An application using the RAG (Retrieval Augmented Generation) approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a GenAI response.

With Amazon Bedrock and Kendra
This project is a Question Answering application with Large Language Models (LLMs) and Amazon Kendra. An application using the RAG (Retrieval Augmented Generation) approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a GenAI response.

With Amazon Bedrock and OpenSearch

With LLMs and Amazon OpenSearch
This project is a Question Answering application with Large Language Models (LLMs) and Amazon OpenSearch Service. An application using the RAG (Retrieval Augmented Generation) approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a GenAI response.
With LLMs and Amazon OpenSearch Serverless
Question Answering Generative AI application with Large Language Models (LLMs) and Amazon OpenSearch Serverless.

With Amazon Bedrock and Amazon Aurora PostgreSQL using pgvector
This project is a Question Answering application with Large Language Models (LLMs) and Amazon Aurora PostgreSQL using pgvector. An application using the RAG (Retrieval Augmented Generation) approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a GenAI response. In this project, Amazon Aurora PostgreSQL with pgvector is used as the knowledge base.

With LLMs and Amazon Aurora PostgreSQL using pgvector
This project is a Question Answering application with Large Language Models (LLMs) and Amazon Aurora PostgreSQL using pgvector. An application using the RAG (Retrieval Augmented Generation) approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a GenAI response. In this project, Amazon Aurora PostgreSQL with pgvector is used as the knowledge base.
With Amazon Bedrock and MemoryDB for Redis
Question Answering Generative AI application with Large Language Models (LLMs), Amazon Bedrock, and Amazon MemoryDB for Redis.

With Amazon MemoryDB for Redis and SageMaker
Question Answering Generative AI application with Large Language Models (LLMs) deployed on Amazon SageMaker, and Amazon MemoryDB for Redis as a Vector Database.

With Amazon Bedrock and DocumentDB
Question Answering Generative AI application with Large Language Models (LLMs), Amazon Bedrock, and Amazon DocumentDB (with MongoDB Compatibility).

With Amazon DocumentDB and SageMaker
Question Answering Generative AI application with Large Language Models (LLMs) deployed on Amazon SageMaker, and Amazon DocumentDB (with MongoDB Compatibility) as a Vector Database.

Semantic Vector Search in PostgreSQL using Amazon SageMaker and pgvector
This project is a search solution using pgvector for an online retail store product catalog. We’ll build a search system that lets customers provide an item description to find similar items. For more information, see the blog post "Building AI-powered search in PostgreSQL using Amazon SageMaker and pgvector" (May 2023).
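As a rough illustration of what "find similar items" means here, the sketch below ranks catalog entries by cosine distance between embedding vectors, which is the quantity pgvector's `<=>` operator computes. The two-dimensional vectors, item names, and function names are made up for the example; in the project the embeddings would come from a model and the ranking would run inside PostgreSQL.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def most_similar(query_vec: list[float],
                 catalog: dict[str, list[float]],
                 top_k: int = 3) -> list[str]:
    """Return the names of the top_k items closest to the query embedding."""
    ranked = sorted(catalog.items(),
                    key=lambda item: cosine_distance(query_vec, item[1]))
    return [name for name, _ in ranked[:top_k]]


# Toy embeddings, invented for illustration.
catalog = {
    "red dress": [1.0, 0.0],
    "blue jeans": [0.0, 1.0],
    "crimson gown": [0.9, 0.1],
}
print(most_similar([1.0, 0.0], catalog, top_k=2))
```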
