Learn how to integrate Language Models (LLMs) with SQL databases using LangChain.js

Learn how to integrate Language Models (LLMs) with SQL databases using LangChain.js

A Step-By-Step Tutorial

·

7 min read

(This tutorial is intentionally oversimplified and overly instructive with the objective of helping make AI more approachable. Examples are from the LangChain library with more "hand holding" for those who need help getting started)

The modern enterprise runs on data. The majority of the data that drives the enterprise is stored in relational databases. The value of LLMs in the enterprise increases by orders of magnitude when the power of LLMs is extended to the enterprise's relational database. When LLMs can access the enterprise's relational databases, companies' business analytics and intelligence is going to increase exponentially as LLMs uncover deeper, previously unknown insights. Most importantly, with access to enterprise data, LLMs can connect the dots for businesses at both micro and macro levels.

LangChain.js has out-of-the-box components that extremely simplify the integration of LLMs with SQL databases. The objective of this tutorial is to help you understand the step-by-step process of integrating LLMs with SQL databases using LangChain.js.

Pre-requisites & Setup

If your environment is not set up for LangChainers LangChain.js tutorials, follow the instructions on the LangChain.js Tutorial Setup page before proceeding to the next section.

Q&A using SQL Toolkit

Setup

  • SQL Lite

    This example uses SQL Lite. SQL is a serverless relational database that you can run on the local computer that you are using for this tutorial. Follow the instructions for downloading and installing SQL Lite here.

  • Chinook Database
    The Chinook database is required for this Tutorial. Follow the instructions for downloading and installing the Chinook database here. Set up the Chinook database in your langchainjs-tutorial-1 folder.

Code

Copy the following code into src/app.ts

//Import the OpenAPI Large Language Model (you can import other models here eg. Cohere)
import { OpenAI } from "langchain";

//Import the SqlDatabase LangChain tool that enables us to access a SQL database
import { SqlDatabase } from "langchain/tools";

//Import the function to create a SQL agent and the SqlToolkit that contas the essentioal tools for querying a SQL databse
import { createSqlAgent, SqlToolkit } from "langchain/agents";

//Import SQLLite. This is the relational database that we are going to use to query for data
import sqlite3 from "sqlite3";

//Load environment variables (populate process.env from .env file)
import * as dotenv from "dotenv";
dotenv.config();

export const run = async () => {

  //Initialize the SQL Lite relational datbase. In this intialization, we are also telling it that we'll be using the 
  //Chinook database.
  const db = await new sqlite3.Database("Chinook.db");

  //Initialize the SqlToolkit passing to it the SQL Lite object that was initialized with the Chinook database above.
  //The SqlToolkit is a container for 4 SQL related tools:
  // 1 - QuerySqlTool: Queries the database using SQL that is generated by LLM
  // 2 - InfoSqlTool: Provides the schema of a table (aka table description) and a sample of the rows in the table
  // 3 - ListTablesSqlTool: List tables in the SQL databse. This is used to ensure that the tables identified in the SQL 
  //     generated by LLM actual exist in the database
  // 4 - QueryCheckerTool:  A query cheker that double cheks that a query is correct before executing it
  const tookit = new SqlToolkit(new SqlDatabase(db));

  //Instantiante the OpenAI model 
  //Pass the "temperature" parameter which controls the RANDOMNESS of the model's output. A lower temperature will result 
  //in more predictable output, while a higher temperature will result in more random output. The temperature parameter is 
  //set between 0 and 1, with 0 being the most predictable and 1 being the most random
  const model = new OpenAI({ temperature: 0 });

  //Construct an agent from an LLM and the SqlToolkit. The advantage of using a Toolkit is that, instead of passing the 4 tools 
  //contained in SQLToolKit, we pass only the SQLToolKit to the agent creator.
  //
  //The agent created by this function uses ZeroShotAgent (ie. "zero-shot-react-description") by default.  ZeroShotAgent uses 
  //the ReAct framework to determine which tool to use. The ReAct framework determines which tool to use based solely on the
  //tool’s description. Any number of tools can be provided. This agent requires that a description is provided for each tool.
  //
  //The key to uderstanding how this whole agent works is to understand that createSqlAgent also sets up a PROMPT that tells 
  //the LLM to generate SQL statement for the {input} (i.e. th question) provided.
  //Here is the PROMPT:
  //
  //           *********START PROMPT******
  //  You are an agent designed to interact with a SQL database.
  //  Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
  //  Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results using the LIMIT clause.
  //  You can order the results by a relevant column to return the most interesting examples in the database.
  //  Never query for all the columns from a specific table, only ask for a the few relevant columns given the question.
  //  You have access to tools for interacting with the database.
  //  Only use the below tools. Only use the information returned by the below tools to construct your final answer.
  //  You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.
  //
  //  DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.
  //
  //  If the question does not seem related to the database, just return "I don't know" as the answer.
  //  Question: {input}
  //  Thought: I should look at the tables in the database to see what I can query.
  //           *********END PROMPT******
  const executor = createSqlAgent(model, tookit);

  //Set the question that you want to ask.  Note that this question is the {input} variable in the prompt. This means the string below will 
  //be used to replace {input} in the prompt
  const input = `Who is our highest spending customer? How much have they spent so far? How many years have they been a customer?`;

  console.log(`YOUR QUESTION IS:  "${input}"...`);

  //Execute the chain passing the question that will be used to replace {input} in the prompt. Generally, here are the steps 
  //of what happens on this call (intentionally staying high level to enable general understandig of concept):
  //STEP 1: Prompt containing the question is passed to the LLM 
  //STEP 2: LLM responds with a SQL query to execute
  //STEP 3: Agent uses the SQLToolKit to confirm that the query suggested by LLM is executable on the tables in the database. If
  //        SQL queery suggested by the LLM can be executed in databse, the agent executes the SQL query
  //STEP 4: Agent send the results of the query to LLM to form answer to question
  //STEP 5: LLM returns  answer to question
  const result = await executor.call({ input });

  //Display the results
  console.log(`THE ANSWER TO YOUR QUESTION IS: ${result.output}`);

  console.log(
    `\n\nHERE ARE THE INTERMEDIATE STEPS THAT WERE EXECUTED TO COME UP WITH ANSWER  ${JSON.stringify(
      result.intermediateSteps,
      null,
      2
    )}`
  );

  db.close();
};
run();
  • Execute the code with the following command

      $ npm run dev
    
  • Result of the execution

      YOUR QUESTION IS: "Who is our highest spending customer? How much have they spent so far? How many years have they been a customer?"...
    
      (node:7199) ExperimentalWarning: The Fetch API is an experimental feature. This feature could change at any time (Use node --trace-warnings ... to show where the warning was created)
    
      THE ANSWER TO YOUR QUESTION IS: Helena Holý has spent $49.62 and has been a customer for 7 years.
    
      HERE ARE THE INTERMEDIATE STEPS THAT WERE EXECUTED TO COME UP WITH ANSWER
    
[
  {
    "action": {
      "tool": "list-tables-sql",
      "toolInput": "",
      "log": "Action: list-tables-sql\nAction Input: "
    },
    "observation": "Album\nArtist\nCustomer\nEmployee\nGenre\nInvoice\nInvoiceLine\nMediaType\nPlaylist\nPlaylistTrack\nTrack"
  },
  {
    "action": {
      "tool": "info-sql",
      "toolInput": "Customer",
      "log": " I should look at the schema of the Customer table to see what columns I can query.\nAction: info-sql\nAction Input: Customer"
    },
    "observation": "CREATE TABLE [Customer]\n(\n    [CustomerId] INTEGER  NOT NULL,\n    [FirstName] NVARCHAR(40)  NOT NULL,\n    [LastName] NVARCHAR(20)  NOT NULL,\n    [Company] NVARCHAR(80),\n    [Address] NVARCHAR(70),\n    [City] NVARCHAR(40),\n    [State] NVARCHAR(40),\n    [Country] NVARCHAR(40),\n    [PostalCode] NVARCHAR(10),\n    [Phone] NVARCHAR(24),\n    [Fax] NVARCHAR(24),\n    [Email] NVARCHAR(60)  NOT NULL,\n    [SupportRepId] INTEGER,\n    CONSTRAINT [PK_Customer] PRIMARY KEY  ([CustomerId]),\n    FOREIGN KEY ([SupportRepId]) REFERENCES [Employee] ([EmployeeId]) \n\t\tON DELETE NO ACTION ON UPDATE NO ACTION\n)\n{\"CustomerId\":1,\"FirstName\":\"Luís\",\"LastName\":\"Gonçalves\",\"Company\":\"Embraer - Empresa Brasileira de Aeronáutica S.A.\",\"Address\":\"Av. Brigadeiro Faria Lima, 2170\",\"City\":\"São José dos Campos\",\"State\":\"SP\",\"Country\":\"Brazil\",\"PostalCode\":\"12227-000\",\"Phone\":\"+55 (12) 3923-5555\",\"Fax\":\"+55 (12) 3923-5566\",\"Email\":\"luisg@embraer.com.br\",\"SupportRepId\":3}\n{\"CustomerId\":2,\"FirstName\":\"Leonie\",\"LastName\":\"Köhler\",\"Company\":null,\"Address\":\"Theodor-Heuss-Straße 34\",\"City\":\"Stuttgart\",\"State\":null,\"Country\":\"Germany\",\"PostalCode\":\"70174\",\"Phone\":\"+49 0711 2842222\",\"Fax\":null,\"Email\":\"leonekohler@surfeu.de\",\"SupportRepId\":5}\n{\"CustomerId\":3,\"FirstName\":\"François\",\"LastName\":\"Tremblay\",\"Company\":null,\"Address\":\"1498 rue Bélanger\",\"City\":\"Montréal\",\"State\":\"QC\",\"Country\":\"Canada\",\"PostalCode\":\"H2G 1A7\",\"Phone\":\"+1 (514) 721-4711\",\"Fax\":null,\"Email\":\"ftremblay@gmail.com\",\"SupportRepId\":3}"
  },
  {
    "action": {
      "tool": "query-sql",
      "toolInput": "SELECT FirstName, LastName, SUM(Total) AS TotalSpent, COUNT(DISTINCT Invoice.InvoiceId) AS YearsAsCustomer FROM Customer INNER JOIN Invoice ON Customer.CustomerId = Invoice.CustomerId GROUP BY Customer.CustomerId ORDER BY TotalSpent DESC LIMIT 1",
      "log": " I should query the Customer table to get the highest spending customer.\nAction: query-sql\nAction Input: SELECT FirstName, LastName, SUM(Total) AS TotalSpent, COUNT(DISTINCT Invoice.InvoiceId) AS YearsAsCustomer FROM Customer INNER JOIN Invoice ON Customer.CustomerId = Invoice.CustomerId GROUP BY Customer.CustomerId ORDER BY TotalSpent DESC LIMIT 1"
    },
    "observation": "{\"FirstName\":\"Helena\",\"LastName\":\"Holý\",\"TotalSpent\":49.620000000000005,\"YearsAsCustomer\":7}"
  }
]

That's it!! That's how easy it is to provide and perform Q&A using relational data in a SQL database using LangChain.js.