8. Large Language Models
Large Language Models serve as an intelligent intermediary between the user and their data. They collect and format information as requested, and make inferences on requests, performing evaluations or simulations as needed to give accurate information about the expected result.
LLMs in World Modeler™ utilize a Retrieval Augmented Generation (RAG) architecture to ensure that the appropriate context is maintained and that the returned information is valid.
As an asset, WMLLMs follow many of the rules described in the AssetDirectory section. They are treated as files in the filesystem, and are returned whenever their parent directory is requested via the REST API. WMLLMs also have the following properties:
- Every WMLLM possesses an API Key and an llmType, which determine the provider that should be accessed and the credentials to access that provider. API Keys are held on behalf of the client, and every client must have their own API Key.
- All of the supported llmTypes are as follows:
LLM Type | Description |
ChatGPT | This WMLLM uses OpenAI as its LLM provider. "modelName" must be one of the models in the OpenAI specifications, such as "gpt-3.5-turbo". |
- Every WMLLM possesses a distinct prompt, which determines the context of the conversation that can be held with it.
- WMLLMs also possess additional model information, such as model name, temperature, and max tokens. These settings are model-specific, and it is up to each particular WMLLM to determine whether they are used.
- The API Key and prompt are encrypted when written to and read from the database.
When sent to and from World Modeler™ via the REST API, WMLLMs are represented with JSON strings as follows:
{
"id": "unique_GUID",
"name": "unique_name_among_siblings",
"summary": "Short summary of this data source",
"documentation": "string",
"documentationMIMEType": "string",
"parent": "/Path/To/Parent",
"prompt": "You are an assistant for ...",
"apiKey": "...",
"llmType": "ChatGPT",
"modelName": "gpt-3.5-turbo",
"temperature": 0.2,
"maxTokens": 4000,
"tools": ""
}
Note that "parent"
may be either the unique ID of the parent, or the FullPathName of the parent.
WMLLMs support the following REST API endpoints. Note that all {dir} parameters accept both the unique ID of the directory and the URL-encoded FullPathName of the directory. Only WMLLMs owned by the user or affiliated organizations can be accessed.
Method | Operation | Description |
GET | /llm | Retrieve all LLMs the user has access to. |
GET | /llm/{dir} | Retrieve the LLM specified by {dir}. |
POST | /llm | Creates a new LLM according to the supplied JSON string (see above). Note that the ID of the supplied JSON will be checked for uniqueness, and the operation will fail if it is not unique. If the ID is not supplied, or is left blank, a GUID will be generated. |
PUT | /llm/{dir} | Updates the LLM specified by {dir} according to the supplied JSON string (see above). |
DELETE | /llm/{dir} | Deletes the LLM specified with {dir}. |
GET | /llm/{dir}/apikey | Retrieve the plaintext API key for the LLM specified by {dir}. |
POST | /llmConversation | Initiates a conversation according to the supplied JSON string (see below). The LLM provider's response is streamed to the receiver piece by piece as plaintext JSON strings (see below). To continue the conversation, the conversation ID supplied with the responses should be added to the input JSON string on subsequent calls to this endpoint. |
PUT | /llmConversation/{conversationID}/close | Ends the conversation assigned {conversationID}. This removes the specified conversation from the memory cache and stores it in a journal. |
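For illustration, here is a minimal Python sketch that creates a WMLLM and retrieves it again. The base URL and bearer-token authorization are hypothetical placeholders, not part of this specification, and the sketch assumes the POST response echoes the created asset:

import requests

BASE = "https://worldmodeler.example.com/api"    # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}    # hypothetical auth scheme

# Create a new WMLLM; omitting "id" causes the server to generate a GUID.
llm = {
    "name": "RecipeAssistant",
    "summary": "Answers cooking questions",
    "parent": "/Demo/Assistants",                # hypothetical parent directory
    "prompt": "You are an assistant for home cooks.",
    "apiKey": "<provider API key>",
    "llmType": "ChatGPT",
    "modelName": "gpt-3.5-turbo",
    "temperature": 0.2,
    "maxTokens": 4000,
    "tools": ""
}
resp = requests.post(f"{BASE}/llm", json=llm, headers=HEADERS)
resp.raise_for_status()
llm_id = resp.json()["id"]    # assumes the created asset is echoed back

# Retrieve it again; {dir} accepts the GUID or the URL-encoded FullPathName.
fetched = requests.get(f"{BASE}/llm/{llm_id}", headers=HEADERS).json()
print(fetched["name"], fetched["llmType"])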
Conversation Input JSON String
In order to call the "/llmConversation" endpoint, the ID of the WMLLM to use and the message to send in the conversation must be populated, resulting in a JSON string such as
{
"llm": "aa136a1b-d00a-49f1-be9b-018fbe060cd4",
"message": "Please tell me how to make marinara sauce."
}
On subsequent calls to continue this conversation, the "conversationID" must also be supplied, such as
{
"conversationID": "...",
"llm": "aa136a1b-d00a-49f1-be9b-018fbe060cd4",
"message": "Rewrite this recipe using weights instead of volumes."
}
Conversation Output JSON String
LLM providers often stream results token by token, rather than returning the whole result at once, to avoid long wait times; World Modeler™ streams its results in the same way. Each message received from the LLM provider is immediately streamed to the receiver via the HTTP response body stream. This means that the receiver must be designed to listen for each write operation on the stream and write the results to the UI as they arrive.
Each write to the stream is formatted as a plaintext JSON string with the following structure.
{
"message": {
"content": "?",
"role": "assistant"
},
"ConversationID": "ed0d11fe-9d49-4f0d-8166-d1ea0a5e9463"
}
The message sent at that moment is in the "message" field, and the "ConversationID" field contains the ID that must be supplied with subsequent messages sent in the conversation.
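As a sketch of a streaming consumer, the following Python example starts a conversation, continues it, and closes it. It assumes the same hypothetical base URL and authorization header as above, and additionally assumes that each write to the stream arrives as one JSON object per line; adjust the chunk parsing to match how your deployment actually delimits writes.

import json
import requests

BASE = "https://worldmodeler.example.com/api"    # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}    # hypothetical auth scheme

def converse(llm_id, message, conversation_id=None):
    """Send one message, print the reply as it streams in,
    and return the conversation ID for follow-up calls."""
    payload = {"llm": llm_id, "message": message}
    if conversation_id:
        payload["conversationID"] = conversation_id
    with requests.post(f"{BASE}/llmConversation", json=payload,
                       headers=HEADERS, stream=True) as resp:
        resp.raise_for_status()
        # Assumption: one JSON object per line of the response body.
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            print(chunk["message"]["content"], end="", flush=True)
            conversation_id = chunk["ConversationID"]
    print()
    return conversation_id

llm_id = "aa136a1b-d00a-49f1-be9b-018fbe060cd4"
cid = converse(llm_id, "Please tell me how to make marinara sauce.")
cid = converse(llm_id, "Rewrite this recipe using weights instead of volumes.", cid)

# Close the conversation so it is journaled and evicted from the memory cache.
requests.put(f"{BASE}/llmConversation/{cid}/close", headers=HEADERS)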
Retrieval Augmented Generation (RAG) Implementation
In addition to LLMs, World Modeler™ makes use of vector databases to augment LLM prompts with additional information, improving the fidelity of LLM output. This is achieved with the VectorIndex asset, which acts as a store of text bodies, each of which is fragmented and vectorized for use as context for prompts.
There is a many-to-many association between LLMs and VectorIndices: many LLMs can make use of the same VectorIndex for RAG, and a single LLM can make use of many VectorIndices. These associations are created and destroyed with the appropriate endpoints (see below). Additionally, an LLM and a VectorIndex must use the same embedding provider; otherwise it is impossible to compare the vectors.
As an asset, VectorIndices follow many of the rules described in the AssetDirectory section. They are treated as files in the filesystem, and are returned whenever their parent directory is requested via the REST API. VectorIndices also have the following properties:
- Every VectorIndex possesses an API Key and an llmType, which determine the provider that should be accessed and the credentials to access that provider. API Keys are held on behalf of the client, and every client must have their own API Key.
- All of the supported llmTypes are as follows:
LLM Type | Description |
ChatGPT | This VectorIndex uses OpenAI as its embedding provider. |
- Every VectorIndex possesses a window size, which represents how many characters should be in a text fragment.
When sent to and from World Modeler™ via the REST API, VectorIndices are represented with JSON strings as follows:
{
"id": "unique_GUID",
"name": "unique_name_among_siblings",
"summary": "Short summary of this vector index",
"documentation": "string",
"documentationMIMEType": "string",
"parent": "/Path/To/Parent",
"apiKey": "...",
"llmType": "ChatGPT",
"windowSize": "500",
}
Note that "parent"
may be either the unique ID of the parent, or the FullPathName of the parent.
VectorIndices support the following REST API endpoints. Note that all {dir} parameters accept both the unique ID of the directory and the URL-encoded FullPathName of the directory. Only VectorIndices owned by the user or affiliated organizations can be accessed.
Method | Operation | Description |
GET | /llm/vectorindex | Retrieve all VectorIndices the user has access to. |
GET | /llm/vectorindex/{dir} | Retrieve the VectorIndex specified by {dir}. |
POST | /llm/vectorindex | Creates a new VectorIndex according to the supplied JSON string. (see above) Note that the ID of the supplied JSON will be checked for uniqueness, and will fail if not unique. If the ID is not supplied, or left blank, a GUID will be generated. |
PUT | /llm/vectorindex/{dir} | Updates the VectorIndex specified with {dir} according to the supplied JSON string. (see above) |
DELETE | /llm/vectorindex/{dir} | Deletes the VectorIndex specified with {dir}. |
GET | /llm/vectorindex/{dir}/text | Retrieve all text bodies contained in the VectorIndex specified by {dir}. |
PUT | /llm/vectorindex/{dir}/insert | Inserts a new text body into the VectorIndex specified with {dir} according to a supplied JSON string with the single parameter "message". |
DELETE | /llm/vectorindex/{dir}/remove/{textID} | Deletes the text body from the VectorIndex specified with {dir}, where {textID} is the ID of the text body in question. |
PUT | /llm/vectorindex/query/{dir} | Queries the VectorIndex specified with {dir} for text fragments contextually similar to the supplied text. The text should be supplied as a JSON string with a single parameter "message". |
PUT | /llm/vectorindex/{vi_dir}/associate/{llm_dir} | Associates the VectorIndex specified with {vi_dir} with the LLM specified with {llm_dir}. This association will result in all conversation messages sent to the specified LLM utilizing the specified VectorIndex for RAG. |
DELETE | /llm/vectorindex/{vi_dir}/disassociate/{llm_dir} | Disassociates the VectorIndex specified with {vi_dir} from the LLM specified with {llm_dir}. If they were previously associated, this stops the VectorIndex from being used with the LLM in RAG. |
GET | /llm/vectorindex/associated/{llm_dir} | Returns an array of all VectorIndices associated with the LLM specified with {llm_dir}. |
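To tie the endpoints above together, here is a Python sketch that creates a VectorIndex, inserts a text body, associates the index with an existing WMLLM, and runs a similarity query. The base URL, authorization header, parent path, and response shapes are hypothetical placeholders, not part of this specification.

import requests

BASE = "https://worldmodeler.example.com/api"    # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}    # hypothetical auth scheme

# Create a VectorIndex. It uses the same provider (OpenAI) as the target LLM,
# satisfying the shared-embedding-provider requirement.
index = {
    "name": "RecipeNotes",
    "summary": "Reference notes for the recipe assistant",
    "parent": "/Demo/Indices",                   # hypothetical parent directory
    "apiKey": "<provider API key>",
    "llmType": "ChatGPT",
    "windowSize": 500
}
resp = requests.post(f"{BASE}/llm/vectorindex", json=index, headers=HEADERS)
resp.raise_for_status()
vi_id = resp.json()["id"]     # assumes the created asset is echoed back

# Insert a text body; the server fragments and vectorizes it.
requests.put(f"{BASE}/llm/vectorindex/{vi_id}/insert", headers=HEADERS,
             json={"message": "San Marzano tomatoes make a sweeter marinara."})

# Associate the index with an existing WMLLM so its conversations use RAG.
llm_id = "aa136a1b-d00a-49f1-be9b-018fbe060cd4"
requests.put(f"{BASE}/llm/vectorindex/{vi_id}/associate/{llm_id}", headers=HEADERS)

# Query the index directly for fragments similar to some text.
hits = requests.put(f"{BASE}/llm/vectorindex/query/{vi_id}", headers=HEADERS,
                    json={"message": "Which tomatoes should I use?"})
print(hits.json())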