Democratizing ChatGPT like service with publicly available LLMs

Title: Democratizing ChatGPT like service with publicly available LLMs
Author: Chansung (chansung#5308)
Date posted: 2023/04/18


  • Development of a general serving framework for instruction-following fine-tuned LLMs such as Alpaca, GPT4-Alpaca, the Flan series, Dolly, StackLLaMA, and more.
  • Production of instruction-following fine-tuned LLMs from publicly available datasets such as Alpaca, GPT4-Alpaca, and Dolly.
  • Showcase of a working chatbot application (with 7B, 13B, and 30B models).


  • Democratizing a ChatGPT-like chatbot service for everyone to use.
  • A case study on why ChatGPT works so well beyond the GPT model itself. It aims to show the importance of a chatbot framework that handles pre- and post-processing of text (prompts) and context management.

Scope of Work

  • Goals: Fine-tuned models (at least the 7B, 13B, and 30B versions of LLaMA-based Alpaca), a general serving framework, and a general conversation management framework
  • Features: Dynamic loading and serving of different models, dynamic construction of different prompt styles based on the model type and the data structures of different UI systems, and context management
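
As a rough illustration of the "dynamic loading and serving" feature, here is a minimal sketch of a registry that loads each model only when it is first requested. All names (`ModelRegistry`, `make_stub_loader`, the model names) are hypothetical and not taken from the project; the loaders are stubs, where a real one would load model weights.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# For this sketch, a "model" is just a function from prompt text to generated text.
Generator = Callable[[str], str]


@dataclass
class ModelRegistry:
    """Maps a model name to a loader and loads each model on first use."""
    loaders: Dict[str, Callable[[], Generator]] = field(default_factory=dict)
    loaded: Dict[str, Generator] = field(default_factory=dict)

    def register(self, name: str, loader: Callable[[], Generator]) -> None:
        self.loaders[name] = loader

    def get(self, name: str) -> Generator:
        if name not in self.loaders:
            raise KeyError(f"unknown model: {name}")
        if name not in self.loaded:
            # Load lazily, and only once per model.
            self.loaded[name] = self.loaders[name]()
        return self.loaded[name]


def make_stub_loader(tag: str, load_log: List[str]) -> Callable[[], Generator]:
    """Hypothetical loader; a real one would load weights from disk or a hub."""
    def loader() -> Generator:
        load_log.append(tag)  # record that the (expensive) load happened
        return lambda prompt: f"[{tag}] reply to: {prompt}"
    return loader


load_log: List[str] = []
registry = ModelRegistry()
registry.register("alpaca-7b", make_stub_loader("alpaca-7b", load_log))
registry.register("alpaca-13b", make_stub_loader("alpaca-13b", load_log))

print(registry.get("alpaca-7b")("Hello"))
```

Only the requested model is loaded, which matters when several large checkpoints share one machine.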


4/1: Starting project
4/14: Announcing fine-tuned models at Hugging Face Hub
4/21: Announcing conversation management framework (open source project)
4/30: Announcing general serving framework (open source project)
~5/31: Advancing the frameworks by serving the models to end users in the communities, and possibly enhancing the existing fine-tuned models with better or corrected publicly available datasets


To serve a ChatGPT-like service based on LLMs, we need at least three technologies, built in sequence.

  1. Instruction-following fine-tuned LLMs: without concrete models available, we cannot build and test a ChatGPT-like service. Even though some fine-tuned models are already publicly available, they need to be adjusted frequently. Because of ChatGPT, everyone in the AI field tends to move very fast, which leads to many flaws; for example, the Alpaca dataset contains flaws such as empty text, repetitive strings, and hallucinations. Hence, as time goes on, we need to update the fine-tuned models, and I think this will help us understand the importance of the datasets and how differently curated datasets lead to different generated text.

  2. General conversation management with context: the datasets used to fine-tune different models often have different structures. That means we should lay out the prompt according to each dataset, so we need to abstract the prompt layouts to support multiple models. Furthermore, it is important to manage the context of the conversation because of the technical limitations of the current LLM architecture (<4096 tokens at most). Context management must be done differently for different models, since they were fine-tuned on differently structured datasets. Hence, context management should be abstracted too.

    • For instance, ### Instruction:, ### Input:, and ### Response: are the delimiters in Alpaca-like models, while Question: and Answer: are used in StackLLaMA.
  3. General serving framework: many models claimed to be open source come out every week, but we don’t know whether any of them works better than the others. Also, the release of an open source model does not mean that it works as a context-aware chatbot that uses the conversation history.
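
To make the second point more concrete, here is a minimal sketch of how prompt layouts and context management could be abstracted, using the Alpaca-style and StackLLaMA-style delimiters mentioned above. It is an illustration under stated assumptions, not the project's actual code: `PromptLayout`, `LAYOUTS`, `count_tokens`, and `build_prompt` are hypothetical names, the exact whitespace around the delimiters is assumed, and whitespace splitting stands in for each model's real tokenizer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Turn = Tuple[str, str]  # (user message, model reply)


@dataclass(frozen=True)
class PromptLayout:
    """Delimiters that a model family was fine-tuned with."""
    user_prefix: str
    reply_prefix: str
    sep: str  # text between a delimiter and its content (assumed spacing)

    def render_turn(self, user: str, reply: str = "") -> str:
        # With an empty reply, the text ends right after the reply delimiter,
        # which is where the model is expected to continue.
        return (f"{self.user_prefix}{self.sep}{user}\n\n"
                f"{self.reply_prefix}{self.sep}{reply}")


# Hypothetical layout table keyed by model family.
LAYOUTS: Dict[str, PromptLayout] = {
    "alpaca": PromptLayout("### Instruction:", "### Response:", sep="\n"),
    "stackllama": PromptLayout("Question:", "Answer:", sep=" "),
}


def count_tokens(text: str) -> int:
    """Assumption: whitespace split as a stand-in for a real tokenizer."""
    return len(text.split())


def build_prompt(layout: PromptLayout, history: List[Turn],
                 new_message: str, max_tokens: int = 4096) -> str:
    """Keep the newest turns that fit the budget, then add the open turn."""
    open_turn = layout.render_turn(new_message)
    budget = max_tokens - count_tokens(open_turn)
    kept: List[str] = []
    for user, reply in reversed(history):  # walk from newest to oldest
        text = layout.render_turn(user, reply)
        cost = count_tokens(text)
        if cost > budget:
            break  # older turns no longer fit, so drop them
        kept.append(text)
        budget -= cost
    kept.reverse()  # restore chronological order
    return "\n\n".join(kept + [open_turn])


history = [("Hi", "Hello!"), ("What is LoRA?", "A low-rank adapter method.")]
print(build_prompt(LAYOUTS["alpaca"], history, "Thanks"))
print(build_prompt(LAYOUTS["stackllama"], history, "Thanks"))
```

The same conversation history yields differently shaped prompts per model family, and the oldest turns are dropped first when the token budget is tight. A real implementation would also reserve part of the budget for the generated reply.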


  • Description: Resource request for fine-tuning and serving LLMs
  • Resource(Support) type: DGX (40G A100 x 8) Workstation
  • Amount: 1
  • Date: 4/1 ~ 5/31
  • Impact: It will boost our understanding of, and our ability to implement, a ChatGPT-like service with open source models. No currently available model is comparable to ChatGPT yet, but I think it is important to build a general serving framework so that we are prepared when such models arrive. This will increase the value of the communities for both experts who want to design their own chatbot and non-experts who simply want to host their own chatbot with the default solutions out of the box. Furthermore, with support for multiple models at once, people could understand and compare how different models behave.


  • LLM-Chatbot-Serve (open source project): general framework to serve fine-tuned LLM
  • Bing Bong (open source project): general framework to manage conversation and context
  • Fine-tuned models (open source model): fine-tuned LLMs (at least 7B, 13B, 30B variants of Alpaca based on LLaMA)


  • Chansung Park: he has worked as a software engineer for 11 years, building a software management platform for optical networks (L0~L3), and for the last 2 years he has been building many MLOps scenarios as a GDE (Google Developers Expert) and Hugging Face Fellow. He is currently interested in how ChatGPT works, from the perspective of both the model itself and the other moving parts around it. He will likely commit 2-3 hours per day to this project.

AIN DAO members fully approve this proposal. Thanks for your proposal, and we will contact you directly. Hope your project goes well and good luck!

  1. Discord
  2. Snapshot

Hi @chansung18, how’s the progress so far? If you need any help, please reach out. A team at Stanford has done this in a pretty efficient way, although I have not yet researched it thoroughly. Here is the link for your reference: Chat with Open Large Language Models

This project has been going well so far. Since the whole timeline has now passed, let me share the news:

  1. I made more than 10 SFT (supervised fine-tuned) models, including Alpaca-LoRA (7/13/30/65B), GPT4-generated Alpaca LoRA (7/13B), and EvolInstruct Vicuna LoRA (7/13B). Additionally, I have experimented with fine-tuning OpenLLaMA and StarCoder, even though the results are not good enough yet. You can find the full list of the produced models here

  2. Along the way, I created GradioChat, which is very similar to HuggingChat from Hugging Face but built entirely on top of Gradio. This is a helpful side project that lets users keep multiple chat histories with saving and loading features. It is also particularly helpful when you want to host the models inside the app rather than on separate servers. Check out the project repo

I want to extend this project into the next stage of its journey:

  1. Advancing the ongoing project with more capabilities. LLM As Chatbot will eventually include features for comparing different models’ outputs in a single view. It will also have a feature to generate an entire conversation through the collaboration of different models. I hope this will be helpful, since small LLMs are not generalists but could work together to produce much better output.

    • And of course, more SOTA models, such as RedPajama V1, will be added continuously.
  2. Advancing the UI and supporting a plug-in-like feature. I am working closely with the Gradio team at Hugging Face to discuss enhancing the current Gradio.Chatbot component with more features. Hopefully, this will make a much better environment for chatting with large language models.

  3. More fine-tuning studies to share. Together with a member of the Hugging Face staff, I am going to fine-tune StarCoder on selected repositories of selected GitHub users to see whether the fine-tuned model can generate code in a person’s own coding style.

  4. More fine-tuning studies to share. By teaming up with Korean psychiatrists and psychological counselors, we are going to fine-tune LLMs on actual consultation records to see whether the fine-tuned models could serve as a good mental care chat assistant.

For these efforts, I want to extend this project proposal for 3 more months (~ 08/31/2023) with the same system spec. I will share intermediate results whenever I get something useful, and this time I plan to serve a working application and share it with the community.
