Introduction to LangChain

Updated: Jan 2

Introduction

LangChain is an open-source framework that makes it easy to build applications using large language models (LLMs). It was created by Harrison Chase and released in October 2022. LangChain has over 41,900 stars on GitHub and over 800 contributors.



LangChain enables applications that:

  • Are context-aware: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)

  • Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)

This framework consists of several parts.

  • LangChain Libraries: The Python and JavaScript libraries. These contain interfaces and integrations for a myriad of components, a basic runtime for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.

  • LangChain Templates: A collection of easily deployable reference architectures for a wide variety of tasks.

  • LangServe: A library for deploying LangChain chains as a REST API.

  • LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.

Together, these products simplify the whole application lifecycle:

  • Develop: Write your applications in LangChain/LangChain.js. Hit the ground running using Templates for reference.

  • Productionize: Use LangSmith to inspect, test and monitor your chains, so that you can constantly improve and deploy with confidence.

  • Deploy: Turn any chain into an API with LangServe.


Why did LangChain become so popular?

The debut of ChatGPT in November 2022 paved the way for a paradigm shift in the realm of generative AI, democratising its accessibility. This groundbreaking achievement served as a catalyst for the subsequent unveiling of Large Language Models (LLMs) with manifold applications. Based on OpenAI’s GPT-3.5 LLM, ChatGPT played a pivotal role in the popularity of generative AI and ushered in a new era of boundless possibilities.


Large Language Models (LLMs) like GPT-3.5 have revolutionised the field of natural language processing (NLP) by enabling machines to understand and generate human-like text. These powerful AI models have been extensively trained on vast amounts of data, allowing them to capture intricate patterns and nuances of language. LLMs excel in various NLP tasks such as machine translation, text summarization, sentiment analysis, and more. They have proven instrumental in enhancing human-computer interactions, powering virtual assistants, chatbots, and other conversational agents.


As LLMs gained popularity, Harrison Chase started LangChain as an open-source project in October 2022. Soon after, with the launch of ChatGPT and the subsequent popularity of LLMs, interest rose in LLM-powered applications that solve complex problems. LangChain provided easy, efficient integrations with most LLMs, along with add-on functionality to further empower applications. This led to a meteoric rise in LangChain's popularity for building LLM-powered applications.


LangChain Libraries

The main value props of the LangChain packages are:

  1. Components: composable tools and integrations for working with language models. Components are modular and easy to use, whether you are using the rest of the LangChain framework or not.

  2. Off-the-shelf chains: built-in assemblages of components for accomplishing higher-level tasks

Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.


LangChain Expression Language (LCEL)

LCEL is a declarative way to compose chains. LCEL was designed from day 1 to support putting prototypes in production, with no code changes, from the simplest “prompt + LLM” chain to the most complex chains.

  • Overview: LCEL and its benefits

  • Interface: The standard interface for LCEL objects

  • How-to: Key features of LCEL

  • Cookbook: Example code for accomplishing common tasks.
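The piping idea behind LCEL can be illustrated with a toy sketch in plain Python. This is not LangChain's actual implementation, and the `Runnable` class here is a hand-rolled stand-in: each step is a "runnable", and the `|` operator chains them left to right, which is the shape an LCEL "prompt + LLM" chain takes.

```python
# Toy illustration of LCEL-style composition (NOT LangChain's real classes):
# each step is a "runnable", and | chains them left to right.

class Runnable:
    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Chaining: the output of this runnable feeds the next one.
        return Runnable(lambda value: other.invoke(self.invoke(value)))

# A "prompt + LLM + parser" chain, with a fake LLM for demonstration.
prompt = Runnable(lambda topic: f"Tell me a joke about {topic}.")
fake_llm = Runnable(lambda text: f"LLM response to: {text}")
parser = Runnable(lambda text: text.upper())

chain = prompt | fake_llm | parser
print(chain.invoke("bears"))  # → LLM RESPONSE TO: TELL ME A JOKE ABOUT BEARS.
```

In real LCEL, swapping the fake LLM for a production model requires no change to how the chain is composed, which is what "prototype to production with no code changes" means in practice.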

LangChain Modules

The framework is organised into modules, each serving a distinct purpose in managing a different aspect of interactions with Large Language Models (LLMs).


These modules provide developers with a structured approach to handle different facets of LLM interactions, allowing for efficient management and control over the behaviour and capabilities of the models. By utilising the specific functionalities offered by each module, developers can effectively tailor and customise their LLM interactions based on their application requirements, enhancing the overall performance and usability of their projects.


  • Models – LangChain provides a standard interface to connect to different types of models, such as LLMs, Chat Models (like ChatGPT), and text embedding models.

  • Prompts – The Prompts module provides standard PromptTemplates for LLMs, Chat Models, etc. Some commonly used PromptTemplates are available to use as-is. This helps in constructing complex prompts and makes prompts more reusable, enabling better prompt management and optimisation. Output Parsers are also available to parse LLM outputs into any format we want.

  • Memory – LangChain provides the ability to persist state between multiple interactions, making any LLM/Chat Model/Chain/Agent stateful. Memory refers to this persistent state: the ability to preserve context from previous and current calls so that it is available to subsequent calls. This is very useful when creating chat-like applications.

  • Indexes – To make our LLMs interact with data, we need interfaces and integrations for loading, querying, and updating external data. LangChain provides multiple building blocks for this, such as Text Splitters, Document Loaders, VectorStores, and Retrievers.

  • Chains – Chains are the key building block of LangChain. Chains combine LLMs and Prompts to solve tasks, and multiple Chains can be combined to perform a series of tasks that solve complex problems. LangChain provides many chains to use out of the box, like the SQL chain, LLM Math chain, Sequential Chain, Router Chain, etc.

  • Agents – An agent uses an LLM/Model to decide which action to take, executes that action, observes the outcome, and repeats this until it is able to complete its objective. Agents enable LLMs to use tools like search engines, APIs, etc. LangChain has a lot of built-in tools, like Wikipedia, Twilio, Zapier, and the Python REPL.

  • Callbacks – The Callbacks module empowers users to monitor, log, trace, and stream the output of LLM applications. Callbacks can be called independently, deeply nested, or scoped to specific requests, and can be placed at any stage of the application.
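The agent loop described above (decide, act, observe, repeat) can be sketched in plain Python. This is a toy illustration, not LangChain's Agent implementation: the `decide` function is a hypothetical stand-in for the LLM's reasoning, and `calculator` is a hypothetical tool.

```python
# Toy sketch of an agent loop: decide on an action, execute it, observe
# the result, and repeat until the objective is met.

def calculator(expression):
    # A hypothetical "tool" the agent can call.
    return str(eval(expression))

def decide(question, observations):
    # Stand-in for an LLM decision step: if we already have an
    # observation, finish with it; otherwise call the calculator tool.
    # A real agent would prompt a model to choose the action.
    if observations:
        return ("finish", observations[-1])
    return ("calculator", question)

def run_agent(question, max_steps=5):
    observations = []
    for _ in range(max_steps):
        action, arg = decide(question, observations)
        if action == "finish":
            return arg
        observations.append(calculator(arg))  # execute and observe
    return None

print(run_agent("2 + 3 * 4"))  # → 14
```

A real LangChain agent follows the same decide-act-observe shape, but the decision comes from prompting an LLM and the tools can be search engines, APIs, and so on.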

Types of Models in LangChain

LangChain provides standard interfaces for using three different types of language models:


LLMs

Large Language Models are machine learning models capable of performing natural language tasks like text generation, summarization, text classification, translation, etc.


These models take text as input and also provide their output as text. LangChain provides a standard interface to interact with a variety of LLMs, like GPT-3 by OpenAI, BERT by Google, RoBERTa by Facebook AI, T5 by Google, CTRL by Salesforce Research, Megatron-Turing by NVIDIA, etc.


Chat Models

Chat Models are a variation of LLMs. These models use LLMs under the hood, but they expose a chat-like interface that takes a list of chat messages as input and returns a chat message as output. They specialise in conversing with a user.


Text Embedding Models

Embeddings create a vector representation of a piece of text. They are useful because representing text in vector space lets us do things like semantic search.


These models take text as input and return a list of numbers: the embedding of the text.

The Embedding class is designed for interfacing with embeddings. LangChain provides a standard interface for all types of embedding models from providers like OpenAI, Cohere, etc.
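Semantic search over embeddings can be sketched with toy vectors. The hand-made three-dimensional vectors below stand in for a real embedding model (a real model returns vectors with hundreds or thousands of dimensions); cosine similarity then ranks which stored text is closest in meaning to a query.

```python
import math

# Toy "embeddings": hand-made vectors standing in for the output of a
# real embedding model.
embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.85, 0.15, 0.05],
    "car": [0.0, 0.2, 0.95],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def most_similar(query):
    # Semantic search: rank stored texts by similarity to the query vector.
    query_vec = embeddings[query]
    candidates = [t for t in embeddings if t != query]
    return max(candidates, key=lambda t: cosine_similarity(query_vec, embeddings[t]))

print(most_similar("dog"))  # → puppy
```

Because "dog" and "puppy" point in nearly the same direction in vector space, they score a much higher similarity than "dog" and "car", which is exactly the property semantic search exploits.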


What are the integrations of LangChain?


LangChain typically builds applications using integrations with LLM providers and external sources where data can be found and stored. For example, LangChain can build chatbots or question-answering systems by integrating an LLM -- such as those from Hugging Face, Cohere and OpenAI -- with data sources or stores such as Apify Actors, Google Search and Wikipedia. This enables an app to take user-input text, process it and retrieve the best answers from any of these sources. In this sense, LangChain integrations make use of the most up-to-date NLP technology to build effective apps.


Other potential integrations include cloud storage platforms, such as Amazon Web Services, Google Cloud and Microsoft Azure, as well as vector databases. A vector database can store large volumes of high-dimensional data -- such as videos, images and long-form text -- as mathematical representations that make it easier for an application to query and search for those data elements. Pinecone is an example vector database that can be integrated with LangChain.
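The core behaviour of a vector database can be sketched as a minimal in-memory store. The `ToyVectorStore` class below is a hypothetical stand-in for a real integration like Pinecone: it just keeps (vector, document) pairs and returns the stored documents nearest to a query vector.

```python
import math

class ToyVectorStore:
    """A toy in-memory vector store: a stand-in for a real vector
    database like Pinecone, which adds persistence, indexing, and scale."""

    def __init__(self):
        self.entries = []  # list of (vector, document) pairs

    def add(self, vector, document):
        self.entries.append((vector, document))

    def query(self, vector, k=1):
        # Return the k stored documents closest to the query vector
        # (Euclidean distance here; real stores often use cosine or
        # dot-product similarity instead).
        def distance(entry):
            return math.dist(vector, entry[0])
        return [doc for _, doc in sorted(self.entries, key=distance)[:k]]

store = ToyVectorStore()
store.add([0.1, 0.9], "a note about cats")
store.add([0.9, 0.1], "a note about finance")
print(store.query([0.2, 0.8], k=1))  # → ['a note about cats']
```

In an application, the vectors would come from an embedding model, so querying by a question's embedding retrieves the stored documents most relevant to it.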






How does LangChain work?


At LangChain’s core is a development environment that streamlines the programming of LLM applications through the use of abstraction: the simplification of code by representing one or more complex processes as a named component that encapsulates all of its constituent steps.

Abstractions are a common element of everyday life and language. For example, “π” allows us to represent the ratio of the length of a circle’s circumference to that of its diameter without having to write out its infinite digits. Similarly, a thermostat allows us to control the temperature in our home without needing to understand the complex circuitry this entails—we only need to know how different thermostat settings translate to different temperatures.

LangChain is essentially a library of abstractions for Python and JavaScript, representing common steps and concepts necessary to work with language models. These modular components—like functions and object classes—serve as the building blocks of generative AI programs. They can be “chained” together to create applications, minimizing the amount of code and fine-grained understanding required to execute complex NLP tasks. Though LangChain’s abstracted approach may limit the extent to which an expert programmer can finely customize an application, it empowers specialists and newcomers alike to quickly experiment and prototype.


LangChain use cases

Applications made with LangChain provide great utility for a variety of use cases, from straightforward question-answering and text generation tasks to more complex solutions that use an LLM as a “reasoning engine.”

  • Chatbots: Chatbots are among the most intuitive uses of LLMs. LangChain can be used to provide proper context for the specific use of a chatbot, and to integrate chatbots into existing communication channels and workflows with their own APIs.

  • Summarization: Language models can be tasked with summarizing many types of text, from breaking down complex academic articles and transcripts to providing a digest of incoming emails.

  • Question answering: Using specific documents or specialized knowledge bases (like Wolfram, arXiv or PubMed), LLMs can retrieve relevant information from storage and articulate helpful answers. If fine-tuned or properly prompted, some LLMs can answer many questions even without external information.

  • Data augmentation: LLMs can be used to generate synthetic data for use in machine learning. For example, an LLM can be trained to generate additional data samples that closely resemble the data points in a training dataset.

  • Virtual agents: Integrated with the right workflows, LangChain’s Agent modules can use an LLM to autonomously determine next steps and take action using robotic process automation (RPA).


Key Features of LangChain

Here are some of the key features of LangChain:

  • Connects LLMs to external sources: LangChain makes it easy to connect LLMs to external sources like Google, Wikipedia, Notion, and Wolfram. This allows developers to access a wider range of data and information when building their applications.

  • Provides abstractions and tools: LangChain provides a variety of abstractions and tools to help developers interface between text input and output. This makes it easier for developers to build applications that can understand and respond to natural language.

  • Links LLM models and components into a pipeline: LangChain links LLM models and components together in a pipeline. This makes it easy for developers to rapidly prototype robust applications.


Benefits of using LangChain

  • It makes it easier to develop LLM-powered applications.

  • It provides a standard interface for interacting with LLMs.

  • It has a library of pre-built components for common tasks.

  • It supports multiple LLMs.

  • It has tools for debugging and monitoring applications.
