Harness LLM Output Parsers for Structured AI Output

Rupak (Bob) Roy - II
5 min read · Aug 13, 2024


Unlock the full guide to an easy setup of output parsers: from CommaSeparatedListOutputParser to Pydantic, JSON, and more.

Hi everyone, today we will look into some powerful pre-built LLM helper functions that format the model's output/results. We will call them LLM output parsers.

Here are the commonly used:

  1. CommaSeparatedListOutputParser
  2. StructuredOutputParser / ResponseSchema
  3. JsonOutputParser: pydantic & without pydantic
  4. PydanticOutputParser
  5. DatetimeOutputParser

Let’s get started.


As in our previous articles, we will be using Hugging Face model API calls, which provide better free token limits than OpenAI.

First, log in to Hugging Face and generate an API key (Access Token).

#######################################################
# Set up the LLM environment
#######################################################

from langchain_community.llms.huggingface_endpoint import HuggingFaceEndpoint
from langchain.prompts import PromptTemplate

##################################################
# Model API call
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_length=128,
    temperature=0.5,
    huggingfacehub_api_token="hf_yourkey",
)

CommaSeparatedListOutputParser: Parse the output of an LLM call to a comma-separated list.


#############################################
# CommaSeparatedListOutputParser ############
#############################################
from langchain.output_parsers import CommaSeparatedListOutputParser

#CommaSeparatedListOutputParser: Parse the output of an LLM call to a comma-separated list.
output_parser = CommaSeparatedListOutputParser()

#view the format
format_instructions = output_parser.get_format_instructions()
format_instructions

prompt = PromptTemplate(
    template="Provide 5 examples of {query}.\n{format_instructions}",
    input_variables=["query"],
    partial_variables={"format_instructions": format_instructions},
)

prompt_text = prompt.format(query="Currencies")

output = llm.invoke(prompt_text)
print(output)

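Under the hood, CommaSeparatedListOutputParser.parse simply splits the raw model text on commas and strips the whitespace around each item. A minimal plain-Python sketch of that behavior (the helper name `split_csv` is mine, not part of LangChain):

```python
# Minimal sketch of what CommaSeparatedListOutputParser.parse does:
# split the raw LLM text on commas and strip surrounding whitespace.
def split_csv(text: str) -> list[str]:
    return [part.strip() for part in text.split(",") if part.strip()]

raw = "US Dollar, Euro, Indian Rupee, Japanese Yen, British Pound"
print(split_csv(raw))
# → ['US Dollar', 'Euro', 'Indian Rupee', 'Japanese Yen', 'British Pound']
```

This is why the parser returns a Python list you can index directly, instead of one long string.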
# Another approach, using LCEL ------------
output_parser = CommaSeparatedListOutputParser()
prompt = PromptTemplate(
    template="Provide 5 examples of {query}.\n{format_instructions}",
    input_variables=["query"],
    partial_variables={"format_instructions": format_instructions},
)
chain = prompt | llm | output_parser

results = chain.invoke({"query": "chocolates"})
print(results)

2. JSON format using StructuredOutputParser and ResponseSchema

StructuredOutputParser can be used when you want the model to return multiple named fields. While the Pydantic/JSON parsers are more powerful, this one is useful for less capable models. Each ResponseSchema describes one field of the structured response.


from langchain.output_parsers import StructuredOutputParser, ResponseSchema

response_schemas = [
    ResponseSchema(name="currency", description="answer to the user's question"),
    ResponseSchema(name="abbreviation", description="What's the abbreviation of that currency"),
]

output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
print(output_parser)

format_instructions = output_parser.get_format_instructions()
print(format_instructions)

prompt = PromptTemplate(
    template="Answer the user's question as best as possible.\n{format_instructions}\n{query}",
    input_variables=["query"],
    partial_variables={"format_instructions": format_instructions},
)

output = llm.invoke(prompt.format(query="list me the currencies of Europe"))
print(output)

# Scenario without a currency in the query
output = llm.invoke(prompt.format(query="list me the chocolates?"))
print(output)

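StructuredOutputParser expects the model to reply with a markdown ```json block matching the schema; its parse step essentially extracts that block and decodes it with json.loads. A rough sketch of that behavior (the helper name `extract_json` is my own, for illustration):

```python
import json
import re

# Rough sketch of StructuredOutputParser's parse step: pull the
# ```json ... ``` block out of the model's reply and decode it.
def extract_json(text: str) -> dict:
    match = re.search(r"```json\s*(\{.*?\})\s*```", text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)

reply = '```json\n{"currency": "Euro", "abbreviation": "EUR"}\n```'
print(extract_json(reply))
# → {'currency': 'Euro', 'abbreviation': 'EUR'}
```

If the model fails to emit a valid JSON block, the real parser raises an OutputParserException, which is why the format instructions matter for weaker models.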
If you wish to use it for chocolates, change the ResponseSchema accordingly:

 ResponseSchema(name="chocolates", description="answer to the user's question"),
 ResponseSchema(name="abbreviation", description="What's the abbreviation of that chocolate"),

3. JsonOutputParser: with and without Pydantic

Parses the output of an LLM call into a JSON object.

#Pydantic
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field

# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."

# Set up a parser + inject instructions into the prompt template.
parser = JsonOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | llm | parser
chain.invoke({"query": joke_query})


# Without Pydantic --------------------------

joke_query = "Tell me a joke."
parser = JsonOutputParser()  # no pydantic_object

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | llm | parser
chain.invoke({"query": joke_query})

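The practical difference between the two variants is in the format instructions: with a pydantic_object, JsonOutputParser embeds the model's JSON schema (field names, types, and descriptions) so the LLM knows exactly which keys to emit; without one, the instructions can only ask for generic JSON. A sketch of the kind of instructions each variant produces (the schema dict below is hand-written for illustration, not generated by LangChain):

```python
import json

# With a pydantic_object, the format instructions embed a JSON schema
# describing each field, so the LLM is steered toward the right keys.
joke_schema = {
    "properties": {
        "setup": {"description": "question to set up a joke", "type": "string"},
        "punchline": {"description": "answer to resolve the joke", "type": "string"},
    },
    "required": ["setup", "punchline"],
}

with_pydantic = (
    "Return a JSON object conforming to this schema:\n" + json.dumps(joke_schema)
)

# Without a pydantic_object, all the parser can ask for is generic JSON.
without_pydantic = "Return a JSON object."

print(with_pydantic)
```

In both cases the parsed result is a plain Python dict; the pydantic_object only shapes the prompt, it does not validate the output the way PydanticOutputParser does.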
4. PydanticOutputParser

Parses an output using a pydantic model.


#######################################################
# PydanticOutputParser ################################
#######################################################
#PydanticOutputParser: Parse an output using a pydantic model.

from langchain.prompts import ChatPromptTemplate
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

# Define your desired data structure.
class BrandInfo(BaseModel):
    brand_name: str = Field(description="This is the name of the brand")
    reasoning: str = Field(description="These are the reasons for the score")
    likelihood_of_success: int = Field(description="This is an integer score between 1-10")

    # You can add custom validation logic easily with Pydantic.
    @validator("likelihood_of_success")
    def check_score(cls, field):
        if field > 10:
            raise ValueError("Badly formed Score")
        return field

# Set up a parser + inject instructions into the prompt template.
pydantic_parser = PydanticOutputParser(pydantic_object=BrandInfo)

format_instructions = pydantic_parser.get_format_instructions()

template_string = """You are a master branding consultant who specializes in naming brands. \
You come up with catchy and memorable brand names.

Take the brand description below, delimited by triple backticks, and use it to create a name for the brand.

brand description: ```{brand_description}```

Then, based on the description and your hot new brand name, give the brand a score of 1-10 for how likely it is to succeed.

{format_instructions}
"""

prompt = ChatPromptTemplate.from_template(template=template_string)

messages = prompt.format_messages(
    brand_description="a cool hip new sneaker brand aimed at rich kids",
    format_instructions=format_instructions,
)
messages[0].content

output = llm.invoke(messages[0].content)
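PydanticOutputParser then decodes the model's JSON reply into a BrandInfo instance, and the validator rejects out-of-range scores. The same two steps in plain Python, to show what happens to the raw reply (the helper name `parse_brand` and the sample reply are mine, for illustration):

```python
import json

# Plain-Python sketch of what PydanticOutputParser + the validator do:
# decode the model's JSON reply, then reject a badly formed score.
def parse_brand(text: str) -> dict:
    data = json.loads(text)
    if data["likelihood_of_success"] > 10:
        raise ValueError("Badly formed Score")
    return data

reply = '{"brand_name": "KickRich", "reasoning": "memorable and aspirational", "likelihood_of_success": 8}'
print(parse_brand(reply))
```

The real parser gives you a typed BrandInfo object instead of a dict, so downstream code can use `result.brand_name` with attribute access and IDE completion.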

5. DatetimeOutputParser


from langchain.output_parsers import DatetimeOutputParser

output_parser = DatetimeOutputParser(format="%d-%m-%Y")  # the default format is "%Y-%m-%dT%H:%M:%S.%fZ"
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

prompt = PromptTemplate(
    template="{question}\n{format_instructions}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions},
)

chain = prompt | llm | output_parser
result = chain.invoke({"question": "when is the independence day of India?"})
print(result)

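DatetimeOutputParser's parse step is essentially datetime.strptime applied with the configured format string, which is why the chain returns a real datetime object rather than text. A minimal sketch (`parse_date` is my own helper name):

```python
from datetime import datetime

# Sketch of DatetimeOutputParser.parse: strptime with the configured
# format string ("%d-%m-%Y" in the example above).
def parse_date(text: str, fmt: str = "%d-%m-%Y") -> datetime:
    return datetime.strptime(text.strip(), fmt)

print(parse_date("15-08-1947"))
# → 1947-08-15 00:00:00
```

Because the result is a datetime, you can immediately do arithmetic on it (e.g. `datetime.now() - result`) without any extra string handling.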
That’s it. We have learned different ways of formatting LLM outputs, which plays an important role in API calls and in integrating with external applications.

Once again, thanks for your time. I hope you enjoyed this. I tried my best to gather the details and simplify them as much as I could.

In the next article, we will explore ways to prompt our LLM, because prompt tuning is not prompt engineering.

Until then, feel free to reach out. Thanks for your time; if you enjoyed this short article, there are tons of topics in advanced analytics, data science, and machine learning available in my Medium repo. https://medium.com/@bobrupakroy

Some of my alternative internet presences are Facebook, Instagram, Udemy, Blogger, Issuu, Slideshare, Scribd, and more.

Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy

Let me know if you need anything. Talk Soon.

Check out the links; I hope they help.

udemy: https://www.udemy.com/user/rupak-roy-2/
I walk a lonely road the only one that I have ever known don’t know where it goes. But it’s home to me… Green Day


Rupak (Bob) Roy - II

Things I write about frequently on Medium: data science, machine learning, deep learning, NLP, and many other random topics of interest. ~ Let’s stay connected!