Title: Democratizing a ChatGPT-like service with publicly available LLMs
Author: Chansung (chansung#5308)
Date posted: 2023/04/18
Summary
- Development of a general serving framework for instruction-following fine-tuned LLMs such as Alpaca, GPT4-Alpaca, the Flan series, Dolly, StackLLaMA, and more.
- Production of instruction-following fine-tuned LLMs from publicly available datasets such as Alpaca, GPT4-Alpaca, and Dolly.
- Showcase of a working chatbot application (with 7B, 13B, and 30B models)
Background
- Democratizing a ChatGPT-like chatbot service for everyone to use.
- A case study of why ChatGPT works so well beyond the GPT model itself, unveiling the importance of the chatbot framework: pre/post-processing of text (prompts) and context management.
Scope of Work
- Goals: fine-tuned models (at least the 7B, 13B, and 30B versions of LLaMA-based Alpaca), a general serving framework, and a general conversation management framework
- Features: dynamically loading/serving different models, dynamically constructing different styles of prompts based on the model type and the data structures of different UI systems, and context management
Timeline
4/1: Starting project
4/14: Announcing fine-tuned models on the Hugging Face Hub
4/21: Announcing conversation management framework (open source project)
4/30: Announcing general serving framework (open source project)
~5/31: Advancing the frameworks by serving the models to end users in the communities, and possibly enhancing the existing fine-tuned models with improved or corrected publicly available datasets
Specification
In order to serve a ChatGPT-like service based on LLMs, we need at least three technologies, applied in sequence.
- Instruction-following fine-tuned LLMs: without any concrete models available, we cannot build and test a ChatGPT-like service. Even though some fine-tuned models are already publicly available, they need to be adjusted frequently (because of ChatGPT, everyone in the field of AI tends to move so fast that lots of flaws slip in; for example, the Alpaca dataset contains many flaws such as empty text, repetitive strings, and hallucinations). Hence, as time moves on, we need to keep updating the fine-tuned models, and I think this will help us understand the importance of the datasets and how differently curated datasets generate different texts (see the first sketch at the end of this section).
- General conversation management with context: the datasets used to fine-tune different models often have different structures. That means we should lay out the prompt structure according to the dataset, so we need to abstract the prompt layouts in order to support multiple models. Furthermore, it is important to manage the context of the conversation due to the technical limitations of current LLM architectures (about 4,096 tokens at most), but context management should be done differently for different models since they were fine-tuned on differently structured datasets. Hence, context management should be abstracted too (see the second sketch at the end of this section).
- For instance, ### Instruction:, ### Input:, and ### Response: are the delimiters in Alpaca-like models, while Question: and Answer: are used in StackLLaMA.
- General serving framework: too many claimed-to-be open source models come out every week, and we don't know whether one works better than another. Also, the release of an open source model does not mean it works as a context-aware chatbot across a conversation history (see the third sketch at the end of this section).
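To make the dataset curation point concrete, here is a minimal sketch of filtering obviously flawed records from an Alpaca-style dataset. The dataset id, field names, and heuristics below are assumptions for illustration only, not part of the proposed frameworks.

```python
# Minimal sketch: drop obviously flawed records (empty or highly repetitive
# outputs) from an Alpaca-style instruction dataset before fine-tuning.
# The dataset id and field names are assumptions for illustration.
from datasets import load_dataset

def looks_flawed(example):
    output = example["output"].strip()
    if not output:                                    # empty responses
        return True
    words = output.split()
    if len(set(words)) < max(1, len(words) // 4):     # crude repetition check
        return True
    return False

raw = load_dataset("tatsu-lab/alpaca", split="train")  # assumed dataset id
cleaned = raw.filter(lambda ex: not looks_flawed(ex))
print(f"kept {len(cleaned)} of {len(raw)} examples")
```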
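The second sketch shows one way the prompt layouts and context management could be abstracted per model family, based on the Alpaca and StackLLaMA delimiters above. All names here (PromptTemplate, ALPACA, STACK_LLAMA, build_prompt) are hypothetical, and a real implementation would trim by token count rather than by number of turns.

```python
# Minimal sketch (hypothetical names): abstract prompt layouts and context
# trimming per model family so one chat history can target different models.
from dataclasses import dataclass

@dataclass
class PromptTemplate:
    turn_fmt: str       # how a single (user, bot) exchange is rendered
    max_turns: int = 6  # naive context management: keep only the last N turns

ALPACA = PromptTemplate(turn_fmt="### Instruction:\n{user}\n\n### Response:\n{bot}\n\n")
STACK_LLAMA = PromptTemplate(turn_fmt="Question: {user}\n\nAnswer: {bot}\n\n")

def build_prompt(template, history, user_msg):
    # Drop the oldest turns first; a real implementation would count tokens instead.
    recent = history[-template.max_turns:]
    prompt = "".join(template.turn_fmt.format(user=u, bot=b) for u, b in recent)
    # Leave the final bot slot empty so the model completes it.
    prompt += template.turn_fmt.format(user=user_msg, bot="").rstrip() + "\n"
    return prompt

# The same conversation rendered for two model families.
history = [("What is LLaMA?", "A family of open LLMs released by Meta.")]
print(build_prompt(ALPACA, history, "How large is the 30B variant?"))
print(build_prompt(STACK_LLAMA, history, "How large is the 30B variant?"))
```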
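Finally, a minimal sketch of the dynamic loading/serving idea, assuming FastAPI and Hugging Face transformers as building blocks. The /chat endpoint, request fields, and in-memory cache are illustrative assumptions, not the actual design of LLM-Chatbot-Serve.

```python
# Minimal sketch (illustrative only): one endpoint that dynamically loads and
# serves different fine-tuned LLMs by their Hugging Face Hub checkpoint id.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

app = FastAPI()
_loaded = {}  # cache so each model is loaded at most once

def get_model(checkpoint_id: str):
    if checkpoint_id not in _loaded:
        tok = AutoTokenizer.from_pretrained(checkpoint_id)
        # device_map="auto" needs the accelerate package installed
        model = AutoModelForCausalLM.from_pretrained(checkpoint_id, device_map="auto")
        _loaded[checkpoint_id] = (tok, model)
    return _loaded[checkpoint_id]

class ChatRequest(BaseModel):
    checkpoint_id: str  # e.g. a fine-tuned checkpoint on the Hugging Face Hub
    prompt: str         # already laid out by the conversation-management layer

@app.post("/chat")
def chat(req: ChatRequest):
    tok, model = get_model(req.checkpoint_id)
    inputs = tok(req.prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]  # strip the prompt
    return {"response": tok.decode(new_tokens, skip_special_tokens=True)}
```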
Request
- Description: Resource request for fine-tuning and serving LLMs
- Resource (support) type: DGX workstation (8x 40GB A100)
- Amount: 1
- Date: 4/1 ~ 5/31
- Impact: It will boost our understanding of, and provide an actual implementation of, a ChatGPT-like service built with open source models. No currently available open model is comparable to ChatGPT yet, but I think it is important to build a general serving framework now to prepare for when such models arrive. This will increase the value for communities of both experts who might want to design their own chatbot and non-experts who simply want to host their own chatbot with the default solutions out of the box. Furthermore, with support for multiple models at once, people can understand and compare how different models behave.
Targets
- LLM-Chatbot-Serve (open source project): a general framework to serve fine-tuned LLMs
- Bing Bong (open source project): a general framework to manage conversation and context
- Fine-tuned models (open source models): fine-tuned LLMs (at least the 7B, 13B, and 30B variants of Alpaca based on LLaMA)
Participants
- Chansung Park: he has worked as a software engineer for 11 years building a software management platform for optical networks (L0~L3), and for the last 2 years he has been building many MLOps scenarios as a GDE (Google Developer Expert) and Hugging Face Fellow. He is currently interested in how ChatGPT works, both from the perspective of the model itself and of the other moving parts besides the model. He will likely commit 2-3 hours of his time per day to this project.