This is a CDK Python project to host DeepSeek-R1-Distill-Llama-8B on an Amazon SageMaker Real-time Inference Endpoint. In this example, we'll demonstrate how to adapt the SGLang framework to run on SageMaker AI endpoints. SGLang is a serving framework for large language models that provides state-of-the-art performance, including a fast backend runtime for efficient serving with RadixAttention, extensive model support, and an active open-source community. For more information, refer to https://docs.sglang.ai/index.html and https://github.com/sgl-project/sglang.
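As a quick smoke test, the deployed endpoint can be invoked with boto3. This is a minimal sketch, not the project's verified client: the endpoint name is a placeholder, and it assumes the SGLang container forwards the invocation body to its OpenAI-compatible chat-completions API.

```python
import json

import boto3

# Hypothetical endpoint name; replace with the name created by your CDK stack.
ENDPOINT_NAME = "deepseek-r1-distill-llama-8b"

smr = boto3.client("sagemaker-runtime")

# Assumption: the container passes this OpenAI-style payload through to SGLang.
payload = {
    "messages": [{"role": "user", "content": "What is 17 * 24?"}],
    "max_tokens": 512,
    "temperature": 0.6,
}

response = smr.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```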
This is a CDK Python project to deploy DeepSeek-R1-Distill-Llama-8B to a SageMaker real-time endpoint with the scale-to-zero feature. This project demonstrates how you can scale in your SageMaker endpoint to zero instances during idle periods, eliminating the previous requirement of maintaining at least one running instance.
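Scale-to-zero relies on endpoints built from inference components, whose copy count is registered with Application Auto Scaling using a minimum capacity of zero. A minimal sketch (the inference component name is hypothetical, and a real deployment also attaches a scaling policy to drive scale-out):

```python
import boto3

aas = boto3.client("application-autoscaling")

# Hypothetical inference component name created by the CDK stack.
resource_id = "inference-component/deepseek-r1-ic"

# Registering the scalable target with MinCapacity=0 is what allows the
# endpoint's copy count to scale in to zero during idle periods.
aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:inference-component:DesiredCopyCount",
    MinCapacity=0,
    MaxCapacity=1,
)
```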
This is a CDK Python project to host deepseek-ai/Janus-Pro-7B on an Amazon SageMaker Real-time Inference Endpoint. Janus-Pro-7B is a unified understanding and generation multimodal large language model (MLLM), which decouples visual encoding for multimodal understanding and generation.
Hosting LG AI EXAONE-Deep models on Amazon SageMaker
EXAONE-Deep 7.8B using SGLang
This is a CDK Python project to host LG AI EXAONE Deep 7.8B on an Amazon SageMaker Real-time Inference Endpoint. EXAONE Deep is a family of models, ranging from 2.4B to 32B parameters, developed and released by LG AI Research, that exhibits superior capabilities in various reasoning tasks including math and coding benchmarks. In this example, we'll demonstrate how to adapt the SGLang framework to run on SageMaker AI endpoints. SGLang is a serving framework for large language models that provides state-of-the-art performance, including a fast backend runtime for efficient serving with RadixAttention, extensive model support, and an active open-source community. For more information, refer to https://docs.sglang.ai/index.html and https://github.com/sgl-project/sglang.
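For reasoning models such as EXAONE Deep, streaming the response is often more practical than waiting for the full chain of thought. A sketch using the SageMaker runtime's response-streaming API; the endpoint name is hypothetical, and it assumes the SGLang container supports streaming and honors an OpenAI-style payload:

```python
import json

import boto3

smr = boto3.client("sagemaker-runtime")

payload = {
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    "max_tokens": 1024,
    "stream": True,  # assumption: the container honors OpenAI-style streaming
}

response = smr.invoke_endpoint_with_response_stream(
    EndpointName="exaone-deep-7-8b",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)

# Each event carries a raw payload chunk; decode and print tokens as they arrive.
for event in response["Body"]:
    chunk = event.get("PayloadPart", {}).get("Bytes")
    if chunk:
        print(chunk.decode("utf-8"), end="", flush=True)
```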
This project is an AWS CDK Python project for deploying the Langfuse application using Amazon Elastic Container Registry (ECR) and Amazon Elastic Container Service (ECS). Langfuse is an open-source LLM engineering platform that helps teams collaboratively debug, analyze, and iterate on their LLM applications.
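For orientation, the shape of such a deployment in CDK Python looks roughly like the following. This is a simplified sketch, not the project's actual stack: it runs the public langfuse/langfuse image behind an ALB on Fargate, and omits the database, cache, and secrets the real deployment provisions.

```python
from aws_cdk import App, Stack
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_ecs as ecs
from aws_cdk import aws_ecs_patterns as ecs_patterns
from constructs import Construct


class LangfuseEcsStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        vpc = ec2.Vpc(self, "LangfuseVpc", max_azs=2)
        cluster = ecs.Cluster(self, "LangfuseCluster", vpc=vpc)

        # Run the Langfuse container behind an Application Load Balancer.
        ecs_patterns.ApplicationLoadBalancedFargateService(
            self, "LangfuseService",
            cluster=cluster,
            desired_count=1,
            task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                image=ecs.ContainerImage.from_registry("langfuse/langfuse"),
                container_port=3000,
            ),
        )


app = App()
LangfuseEcsStack(app, "LangfuseEcsStack")
app.synth()
```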
This project is a Question Answering application with Large Language Models (LLMs) and Knowledge Bases for Amazon Bedrock. An application using the RAG (Retrieval-Augmented Generation) approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a GenAI response. In this project, Amazon OpenSearch Serverless is used as the vector store for the Knowledge Base for Amazon Bedrock.
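Once the Knowledge Base is in place, the whole retrieve-then-generate loop can be driven with a single Bedrock API call. A minimal sketch, with a placeholder knowledge base ID and model ARN:

```python
import boto3

brt = boto3.client("bedrock-agent-runtime")

# Hypothetical knowledge base ID and model ARN; substitute your own values.
response = brt.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID123456",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)
print(response["output"]["text"])
```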
With Amazon Aurora PostgreSQL used for a Knowledge Base for Amazon Bedrock
This project is a Question Answering application with Large Language Models (LLMs) and Amazon Aurora PostgreSQL using pgvector. An application using the RAG (Retrieval-Augmented Generation) approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a GenAI response. In this project, Amazon Aurora PostgreSQL with pgvector is used as the vector store for the Knowledge Base for Amazon Bedrock.
This project is a Question Answering application with Large Language Models (LLMs) and Amazon Kendra. An application using the RAG (Retrieval-Augmented Generation) approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a GenAI response.
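Here the retrieval step can be as simple as calling Amazon Kendra's Retrieve API and packing the returned passages into the prompt. A minimal sketch with a placeholder index ID:

```python
import boto3

kendra = boto3.client("kendra")

# Hypothetical Kendra index ID; replace with the index your stack creates.
result = kendra.retrieve(
    IndexId="YOUR-KENDRA-INDEX-ID",
    QueryText="How do I rotate my access keys?",
)

# Each retrieved passage can be concatenated into the LLM prompt as context.
for item in result["ResultItems"]:
    print(item["DocumentTitle"], "->", item["Content"][:200])
```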
This project is a Question Answering application with Large Language Models (LLMs) and Amazon OpenSearch Service. An application using the RAG (Retrieval-Augmented Generation) approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a GenAI response.
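In this variant, retrieval is a k-NN query against an OpenSearch index of embedded passages. A minimal sketch using opensearch-py, with hypothetical domain, index, and field names (authentication is omitted for brevity):

```python
from opensearchpy import OpenSearch

# Hypothetical domain endpoint; a real client would also configure auth.
client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

query_embedding = [0.01] * 1024  # produced by your embedding model

# k-NN search over the vector field returns the most similar passages,
# which are then packed into the LLM prompt as context.
response = client.search(
    index="rag-documents",  # hypothetical index name
    body={
        "size": 3,
        "query": {"knn": {"embedding": {"vector": query_embedding, "k": 3}}},
    },
)
for hit in response["hits"]["hits"]:
    print(hit["_source"]["text"])
```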
With Amazon Bedrock and Amazon Aurora PostgreSQL using pgvector
This project is a Question Answering application with Large Language Models (LLMs) and Amazon Aurora PostgreSQL using pgvector. An application using the RAG (Retrieval-Augmented Generation) approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a GenAI response. In this project, Amazon Aurora PostgreSQL with pgvector is used as the knowledge base.
With LLMs and Amazon Aurora PostgreSQL using pgvector
This project is a Question Answering application with Large Language Models (LLMs) and Amazon Aurora PostgreSQL using pgvector. An application using the RAG (Retrieval-Augmented Generation) approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a GenAI response. In this project, Amazon Aurora PostgreSQL with pgvector is used as the knowledge base.
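With pgvector, retrieval is a plain SQL query ordered by vector distance. A minimal sketch using psycopg2, with hypothetical connection settings and schema (a "documents" table with an "embedding" column of pgvector's vector type):

```python
import os

import psycopg2

# Hypothetical connection string, e.g. exported as DATABASE_URL.
conn = psycopg2.connect(os.environ["DATABASE_URL"])

query_embedding = [0.01] * 1024  # produced by your embedding model
vec_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"

# "<->" is pgvector's L2 distance operator; the closest rows come first.
with conn.cursor() as cur:
    cur.execute(
        "SELECT content FROM documents ORDER BY embedding <-> %s::vector LIMIT 3",
        (vec_literal,),
    )
    for (content,) in cur.fetchall():
        print(content)
```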
A Question Answering Generative AI application with Large Language Models (LLMs) deployed on Amazon SageMaker, using Amazon MemoryDB for Redis as a vector database.
A Question Answering Generative AI application with Large Language Models (LLMs) deployed on Amazon SageMaker, using Amazon DocumentDB (with MongoDB Compatibility) as a vector database.