This chapter focuses on setting up everything needed to follow along with tutorial examples. We then introduce several model integrations, including OpenAI’s ChatGPT, Hugging Face models, Jina AI, and others, showing how to configure each provider and obtain the required API keys.
Finally, we walk through a practical, real-world example: an LLM-powered application for customer service. This use case highlights where LLMs can deliver the most impact and introduces best practices for working effectively with LangChain.
Installing LangChain
LangChain is a Python library and can be installed like any other Python package. It is recommended to install it inside an isolated environment (managed with venv and pip, Poetry, Conda, or Docker) to avoid dependency conflicts.
LangChain evolves quickly, so using a pinned version (as provided in this book’s repository) ensures compatibility with the examples.
Install with pip
If you are using pip, install LangChain from PyPI:
pip install langchain
To include common integrations (such as OpenAI support), install optional dependencies:
pip install "langchain[openai]"
Install with Poetry
If you are using Poetry, add LangChain as a project dependency:
poetry add langchain
Optional integrations can be added in the same way:
poetry add langchain openai
Install with Conda
LangChain itself is installed via pip inside a Conda environment. First activate your environment, then run:
conda activate langchain_ai
pip install langchain
Install in Docker
When using Docker, LangChain is installed as part of the image build process. The project’s Dockerfile already includes LangChain and all required dependencies. If you need to add it manually, include the following line:
RUN pip install langchain
Verify the installation
After installation, verify that LangChain is available by importing it in Python:
import langchain
print(langchain.__version__)
If no error is raised, LangChain is installed correctly and ready to use.
Exploring API Model Integrations
Before diving into generative AI, we need access to models like LLMs or text-to-image models to integrate into applications. Popular LLMs include GPT-4 (OpenAI), BERT and PaLM-2 (Google), LLaMA (Meta), and more.
LangChain supports many providers, including OpenAI, Hugging Face, Cohere, Anthropic, Azure, Google Cloud’s Vertex AI, and Jina AI. A full, up-to-date list is at LangChain LLM integrations.

LangChain supports three main model interfaces:
- Chat models – handle lists of messages as input, generate chat responses, ideal for conversational apps. (Chat models docs)
- LLMs – process text input and produce text output.
- Embedding models – convert text to numerical embeddings for NLP tasks (sentiment analysis, classification, search). (Embedding models docs)
For image models, popular options include OpenAI (DALL-E), Midjourney, and Stability AI (Stable Diffusion). LangChain doesn’t natively support non-text models, but Replicate provides an interface for Stable Diffusion.
Setting API Keys
Most providers require an API key. In Python, you can set keys like this:
import os
os.environ["OPENAI_API_KEY"] = "<your token>"
Or via terminal:
Linux/macOS:
export OPENAI_API_KEY=<your token>
Windows (cmd):
set OPENAI_API_KEY=<your token>
For permanent setup, add to ~/.bashrc, ~/.bash_profile, or a Windows batch script.
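For example, on Linux or macOS you can append the export to your shell profile (shown here for bash; adjust the file name for your shell):

```shell
# Persist the key for future shell sessions (bash example)
echo 'export OPENAI_API_KEY=<your token>' >> ~/.bashrc
source ~/.bashrc   # reload the profile in the current session
```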
Recommended: Store keys in a config.py file (not tracked in Git) for security:
import os

OPENAI_API_KEY = "..."  # Add other keys as needed

def set_environment():
    variable_dict = globals().items()
    for key, value in variable_dict:
        if "API" in key or "ID" in key:
            os.environ[key] = value
Use it like this in your app:
from config import set_environment
set_environment()
This loads all keys into the environment, avoiding hardcoding them in your code.
Next, we’ll explore a few prominent model providers with examples, starting with a fake LLM for testing purposes.
Fake LLM
The fake LLM classes let you simulate LLM responses for testing without making real API calls. This is ideal for rapid prototyping, unit testing, and avoiding rate limits. You can mock responses to validate agent behavior quickly.
For example, a simple fake LLM that always returns "Hello":
from langchain.llms.fake import FakeListLLM
fake_llm = FakeListLLM(responses=["Hello"])
For more complex agent testing, LangChain provides FakeListLLM. Here’s an example that uses a Python REPL tool and the ZERO_SHOT_REACT_DESCRIPTION strategy:
from langchain.llms.fake import FakeListLLM
from langchain.agents import load_tools, initialize_agent, AgentType
tools = load_tools(["python_repl"])
responses = [
    "Action: Python_REPL\nAction Input: print(2 + 2)",
    "Final Answer: 4"
]
llm = FakeListLLM(responses=responses)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("whats 2 + 2")
Explanation:
- The agent decides its actions using the ReAct strategy.
- The Python_REPL tool executes Python code when called:

class PythonREPLTool(BaseTool):
    """A tool for running python code in a REPL."""
    name = "Python_REPL"
    description = (
        "A Python shell. Use this to execute python commands. "
        "Input should be a valid python command. "
        "If you want to see the output of a value, you should print it out with `print(...)`."
    )

- FakeListLLM returns predetermined outputs, so the agent's final answer is controlled (4 in this case).
- Changing the second response to "Final Answer: 5" would change the output, showing how mock responses control agent behavior.
This setup allows you to test agent logic before connecting to real LLMs like OpenAI.
OpenAI
As discussed in Chapter 1, OpenAI is a leading American AI research lab, especially known for generative AI and LLMs. They provide models of varying power for different tasks. In this chapter, we’ll explore interacting with OpenAI models via LangChain and the OpenAI Python client, including their Embedding class for text embeddings.
We’ll focus on OpenAI but also experiment with LLMs from other providers. When you send a prompt to an LLM API, it tokenizes the text. Token count affects API costs and usage, so strategies like smaller models, summarization, and input preprocessing help optimize results within budget.
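As a rough illustration of token-based costing (a heuristic sketch, not the tokenizer any provider actually uses), English text averages around four characters per token, which is handy for quick back-of-envelope estimates:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

prompt = "Summarize the customer email below in two sentences."
print(estimate_tokens(prompt))  # a ballpark figure, not an exact count
```

For exact counts, use the provider's own tokenizer.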
Obtaining an OpenAI API Key
- Sign up at OpenAI Platform.
- Set up billing information.
- Go to Personal | View API Keys.
- Click Create new secret key and name it.
After generating the key, copy it and set it as an environment variable (OPENAI_API_KEY) or pass it when initializing the OpenAI class.
Using OpenAI LLMs
Here’s an example agent that calculates using OpenAI’s LLM:
from langchain.llms import OpenAI
llm = OpenAI(temperature=0.0, model="text-davinci-003")
agent = initialize_agent(
    tools, llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)
agent.run("whats 4 + 4")
Expected output:
> Entering new chain...
I need to add two numbers
Action: Python_REPL
Action Input: print(4 + 4)
Observation: 8
Thought: I now know the final answer
Final Answer: 4 + 4 = 8
> Finished chain.
'4 + 4 = 8'
Even simple tasks highlight how natural language prompts can produce precise results. Later, we’ll tackle more complex problems and explore other providers.
Hugging Face
Hugging Face is a major player in NLP, known for open-source tools and hosting solutions. They develop the Transformers Python library, supporting models like Mistral 7B, BERT, and GPT-2, compatible with PyTorch, TensorFlow, and JAX.
They also run the Hugging Face Hub, hosting over 120k models, 20k datasets, and 50k demo apps (Spaces). The Hub enables collaboration and easy access to models, embeddings, and datasets. Integrations like HuggingFaceHub allow text generation and classification, while HuggingFaceEmbeddings works with sentence-transformer models.
Other ecosystem libraries include:
- Datasets – dataset processing
- Evaluate – model evaluation
- Simulate – simulation
- Gradio – ML demos
Hugging Face contributed to the BigScience Workshop, releasing the open LLM BLOOM with 176B parameters. They’ve raised significant funding (Series B: $40M; Series C: $2B valuation) and partnered with Graphcore and AWS to expand their reach.
To use Hugging Face models, create an account and API key at Hugging Face Profile and set it as HUGGINGFACEHUB_API_TOKEN.
Example: Flan-T5-XXL
from langchain.llms import HuggingFaceHub
llm = HuggingFaceHub(
    model_kwargs={"temperature": 0.5, "max_length": 64},
    repo_id="google/flan-t5-xxl"
)
prompt = "In which country is Tokyo?"
completion = llm(prompt)
print(completion)
Output:
japan
The model takes a text input and returns a completion, demonstrating its knowledge and ability to answer questions naturally.
Jina AI
Founded in 2020 by Han Xiao and Xuanbin He, Jina AI is a Berlin-based company focused on cloud-native neural search for text, image, audio, and video. Its open-source ecosystem enables scalable information retrieval, and recent tools like Finetuner support fine-tuning neural networks for specific use cases.
Jina AI has raised $37.5M across three funding rounds, with investors including GGV Capital and Canaan Partners.
You can create an account at https://chat.jina.ai/api, where APIs are available for tasks such as text and image embeddings, image captioning, visual reasoning, and visual question answering (VQA).
Although Jina APIs are not yet directly supported by LangChain, they can be integrated by subclassing LangChain’s LLM interface.
Jina AI Chat Example
After generating an API token (JINACHAT_API_KEY), we can use Jina AI via LangChain’s chat model. Here’s a translation example:
from langchain.chat_models import JinaChat
from langchain.schema import HumanMessage
chat = JinaChat(temperature=0.0)
messages = [
    HumanMessage(
        content="Translate this sentence from English to French: I love generative AI!"
    )
]
chat(messages)
Output:
AIMessage(content="J'adore l'IA générative !", ...)
A lower temperature produces more predictable responses.
Let’s try a recommendation task:
from langchain.schema import SystemMessage, HumanMessage
chat = JinaChat(temperature=0.0)
chat(
    [
        SystemMessage(
            content="You help a user find a nutritious and tasty food to eat in one word."
        ),
        HumanMessage(
            content="I like pasta with cheese, but I need to eat more vegetables, what should I eat?"
        )
    ]
)
Example response:
AIMessage(content="A tasty and nutritious option could be a vegetable pasta dish...", ...)
The model ignored the one-word constraint but provided useful suggestions.
Replicate
Founded in 2019, Replicate Inc. is a San Francisco–based startup that simplifies deploying, running, and fine-tuning AI models via cloud infrastructure. It supports public and private models and raised $12.5M in Series A funding led by Andreessen Horowitz, with support from Y Combinator and Sequoia.
Replicate’s founders created Cog, an open-source tool that packages ML models into production-ready containers with auto-generated APIs, enabling scalable GPU deployment.
Models can be explored at https://replicate.com/explore. After authenticating, copy your API token and set it as REPLICATE_API_TOKEN.
Image Generation Example
from langchain.llms import Replicate
text2image = Replicate(
    model="stability-ai/stable-diffusion:db21e45d3f7023abc2a46ee38a23973f6dce16bb082a930b0c49861f96d1e5bf",
    input={"image_dimensions": "512x512"},
)
image_url = text2image(
    "a book cover for a book about creating generative ai applications in Python"
)
Other Providers
Many other providers exist, and we’ll encounter more throughout the book. Two notable ones are Azure and Anthropic.
Azure
Microsoft Azure integrates OpenAI models such as GPT, Codex, and embeddings, supporting use cases like summarization, code generation, and semantic search. Accounts can be created at https://azure.microsoft.com, with API keys available under Cognitive Services | Azure OpenAI.
While Azure models are accessible via LangChain’s AzureOpenAI() interface, the setup process can be complex and may require additional account validation.
Anthropic
Founded in 2021 by former OpenAI researchers, Anthropic focuses on responsible AI development. The company has raised $1.5B and is best known for Claude, a ChatGPT-like assistant. Access is currently limited and requires approval, along with setting the ANTHROPIC_API_KEY.
Exploring Local Models
LangChain also supports running models locally, giving you full control over your data and eliminating the need for API tokens or internet access.
⚠️ Note on resources: LLMs are large and memory-intensive. Even quantized models require significant RAM (roughly 1B parameters ≈ 1 GB RAM). While the examples here run on modest hardware, larger models may be slow or crash notebooks. Hosted environments like Google Colab, Kubernetes, GPUs, or TPUs can help with heavier workloads.
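The rule of thumb above can be turned into a quick calculation (an illustrative sketch; real memory use also depends on context length, KV cache, and runtime overhead):

```python
def approx_weight_ram_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate RAM needed just to hold the model weights, in GB."""
    return params_billions * bytes_per_param

# float16 weights use 2 bytes per parameter; 4-bit quantized weights use ~0.5 bytes
print(approx_weight_ram_gb(7, 2.0))   # 7B model in float16 -> 14.0 GB
print(approx_weight_ram_gb(7, 0.5))   # 7B model at 4 bits  -> 3.5 GB
```

This is why quantization (covered below with llama.cpp and GPT4All) makes 7B-class models practical on laptops.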
We’ll briefly explore Hugging Face Transformers, llama.cpp, and GPT4All—powerful tools with far more features than we can fully cover here.
Hugging Face Transformers
A common way to run local models is via the Hugging Face pipeline API:
from transformers import pipeline
import torch
generate_text = pipeline(
    model="aisquared/dlite-v1-355m",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    framework="pt"
)
generate_text(
    "In this chapter, we'll discuss first steps with generative AI in Python."
)
This downloads the tokenizer and model weights automatically. The example model (355M parameters) is small, efficient, and instruction-tuned.
If needed, install dependencies:
pip install transformers accelerate torch
You can integrate this pipeline directly into LangChain:
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline

template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm = HuggingFacePipeline(pipeline=generate_text)
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "What is electroencephalography?"
print(llm_chain.run(question))
This example also demonstrates structured prompting using PromptTemplate.
llama.cpp
llama.cpp, maintained by Georgi Gerganov, is a high-performance C++ implementation for running LLaMA-style models efficiently on CPUs (with optional GPU support).
On macOS, prerequisites include an MD5 checksum tool, which can be installed with Homebrew:
brew install md5sha1sum
Clone and install dependencies:
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt
If needed:
pip install 'blosc2==2.0.0' cython FuzzyTM
Compile the project:
make -C . -j4
After downloading model weights (for example, OpenLLaMA 3B) and placing them in models/3B/, convert the model:
python3 convert.py models/3B/ --ctx 2048
Optional quantization reduces memory usage:
./quantize ./models/3B/ggml-model-f16.gguf ./models/3B/ggml-model-q4_0.bin q4_0
You can then use the model in LangChain:
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./ggml-model-q4_0.bin",
    verbose=True
)
Quantized models are smaller and run on more modest hardware.
GPT4All
GPT4All builds on llama.cpp but offers a simpler installation and user experience, supporting model execution, serving, and customization. It supports multiple architectures, including:
- GPT-J
- LLaMA
- MPT
- Replit
- Falcon
- StarCoder
Available models and benchmarks are listed at: https://gpt4all.io/
Example usage:
from langchain.llms import GPT4All
model = GPT4All(
    model="mistral-7b-openorca.Q4_0.gguf",
    n_ctx=512,
    n_threads=8
)
response = model(
    "We can run large language models locally for all kinds of applications, "
)
This downloads a Mistral 7B OpenOrca model (~3.8 GB disk, ~8 GB RAM), demonstrating that strong chat models can run entirely offline.
Building an Application for Customer Service
Customer service agents handle inquiries, resolve issues, and manage complaints—directly impacting customer satisfaction, loyalty, and business success. Generative AI can significantly support their work by automating and augmenting common tasks:
- Sentiment classification – detect customer emotions to tailor responses
- Summarization – condense long messages into key points
- Intent classification – identify the customer’s goal for faster resolution
- Answer suggestions – propose accurate, consistent replies
Combined, these capabilities enable faster, more accurate responses and improved customer experience.
LangChain’s flexibility allows us to combine different models and providers. In this prototype, we’ll process a customer email to:
- Classify sentiment
- Summarize the content
- Identify customer intent
We’ll explore question answering in more depth later (Chapter 5).
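The three steps above can be tied together in a single triage function. This is a hypothetical orchestration sketch: the helper functions are trivial stand-ins, not the actual models wired up in the following sections.

```python
# Hypothetical stand-ins for the sentiment, summarization, and intent models
def classify_sentiment(email: str) -> str:
    return "negative" if "broken" in email.lower() else "positive"

def summarize_email(email: str) -> str:
    return email.strip().split(".")[0]  # naive: first sentence only

def classify_intent(email: str) -> str:
    return "product issues" if "machine" in email.lower() else "other"

def triage_email(email: str) -> dict:
    """Run all three steps over one incoming email."""
    return {
        "sentiment": classify_sentiment(email),
        "summary": summarize_email(email),
        "intent": classify_intent(email),
    }

print(triage_email("My coffee machine arrived broken. Please advise."))
```

The rest of the chapter fills in each stand-in with a real model.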
Sentiment Analysis with LLMs and Smaller Models
LLMs are highly effective at open-domain sentiment classification—often without additional training—especially with well-designed prompts. This has been shown in studies such as Is ChatGPT a Good Sentiment Analyzer? (Wang et al., April 2023).
A simple sentiment prompt might look like:
Given this text, what is the sentiment conveyed? Is it positive, neutral, or negative?
Text: {sentence}
Sentiment:
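Filled in with Python's built-in string formatting (the same pattern that LangChain's PromptTemplate automates), the prompt looks like this:

```python
# The sentiment prompt from above as a format string
template = (
    "Given this text, what is the sentiment conveyed? "
    "Is it positive, neutral, or negative?\n"
    "Text: {sentence}\n"
    "Sentiment:"
)
print(template.format(sentence="The coffee machine arrived broken."))
```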
While LLMs are powerful, they can be slower and more expensive. For focused tasks like sentiment analysis, smaller transformer models (via Hugging Face or spaCy-based providers) are often sufficient and more efficient.
Popular Text Classification Models
We can list the most downloaded text-classification models on Hugging Face:
from huggingface_hub import list_models
def list_most_popular(task: str):
    for rank, model in enumerate(
        list_models(filter=task, sort="downloads", direction=-1)
    ):
        if rank == 5:
            break
        print(f"{model.id}, {model.downloads}\n")
list_most_popular("text-classification")
Table 1: Most popular text classification models on Hugging Face Hub
| Model | Downloads |
|---|---|
| distilbert-base-uncased-finetuned-sst-2-english | 40,672,289 |
| cardiffnlp/twitter-roberta-base-sentiment | 9,292,338 |
| MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli | 7,907,049 |
| cardiffnlp/twitter-roberta-base-irony | 7,023,579 |
| SamLowe/roberta-base-go_emotions | 6,706,653 |
These models typically focus on narrow categories such as sentiment, emotion, or irony—ideal for customer service pipelines.
Sentiment Analysis Example
Using a shortened customer email complaining about a broken coffee machine:
from transformers import pipeline
customer_email = """
I am writing to pour my heart out about the recent unfortunate experience I had
with one of your coffee machines that arrived broken...
"""
sentiment_model = pipeline(
    task="sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment"
)
print(sentiment_model(customer_email))
Output:
[{'label': 'LABEL_0', 'score': 0.5822020173072815}]
LABEL_0 corresponds to negative sentiment. While this Twitter-trained model isn’t ideal for long emails, it still captures overall tone effectively.
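The mapping from raw labels to sentiment names comes from the model card for cardiffnlp/twitter-roberta-base-sentiment; a small lookup table makes the pipeline output readable:

```python
# Label mapping per the cardiffnlp/twitter-roberta-base-sentiment model card
ID2LABEL = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}

result = [{"label": "LABEL_0", "score": 0.5822020173072815}]  # pipeline output from above
print(ID2LABEL[result[0]["label"]])  # -> negative
```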
Summarization
Popular summarization models on Hugging Face include:
Table 2: Most popular summarization models
| Model | Downloads |
|---|---|
| facebook/bart-large-cnn | 4,637,417 |
| t5-small | 2,492,451 |
| t5-base | 1,887,661 |
| sshleifer/distilbart-cnn-12-6 | 715,809 |
| t5-large | 332,854 |
Let’s summarize the email using a hosted model (requires HUGGINGFACEHUB_API_TOKEN):
from langchain import HuggingFaceHub
summarizer = HuggingFaceHub(
    repo_id="facebook/bart-large-cnn",
    model_kwargs={"temperature": 0, "max_length": 180}
)

def summarize(llm, text) -> str:
    return llm(f"Summarize this: {text}!")

summarize(summarizer, customer_email)
The result is usable but still verbose—highlighting why LLM-based summarization with tailored prompts is often preferable. We’ll revisit this in Chapter 4.
Intent Classification with Vertex AI
To identify the customer’s issue, we can use an LLM with constrained categories:
from langchain.llms import VertexAI
from langchain import PromptTemplate, LLMChain
template = """Given this text, decide what is the issue the customer is concerned about.
Valid categories are:
* product issues
* delivery problems
* missing or late orders
* wrong product
* cancellation request
* refund or exchange
* bad support experience
* no clear reason to be upset
Text: {email}
Category:
"""
prompt = PromptTemplate(template=template, input_variables=["email"])
llm = VertexAI()
llm_chain = LLMChain(prompt=prompt, llm=llm, verbose=True)
print(llm_chain.run(customer_email))
Output:
product issues
This correctly identifies the intent of the complaint.
Wrapping Up
This example shows how quickly we can combine sentiment analysis, summarization, and intent classification in LangChain to build a practical customer service tool. Such systems can handle routine requests and assist human agents with faster context and better insights.
Summary
In this chapter, we:
- Covered multiple ways to set up LangChain environments
- Explored several model providers for text and image generation
- Built a customer service prototype for sentiment analysis and intent classification
By chaining models and tools, LangChain makes it easy to create useful, production-ready AI workflows.
Next, in Chapter 4 (Building Capable Assistants) and Chapter 5 (Building a Chatbot Like ChatGPT), we’ll dive deeper into assistants, tools, and retrieval-augmented question answering.