Taking Pandas To The Next Level With LLMs

Rajat Roy
3 min read · May 14, 2023
Photo by Andrew Neel on Unsplash

Introduction

If you have worked on any data science task, you have almost certainly used pandas, a Python library for data ingestion and transformation.

Pandas Code

For this example, I use the Supermart Grocery Sales dataset, which contains information about products, sales, discounts, profit, and more.

Import dataset

import pandas as pd
df = pd.read_csv('./sample_data/Supermart Grocery Sales - Retail Analytics Dataset.csv', parse_dates=['Order Date'])

List columns

df.columns
Index(['Order ID', 'Customer Name', 'Category', 'Sub Category', 'City',
       'Order Date', 'Region', 'Sales', 'Discount', 'Profit', 'State'],
      dtype='object')

Get Order Date info, such as the first and last dates

df['Order Date'].describe()
count                             9994
mean     2017-04-30 05:17:08.056834048
min                2015-01-03 00:00:00
25%                2016-05-23 00:00:00
50%                2017-06-26 00:00:00
75%                2018-05-14 00:00:00
max                2018-12-30 00:00:00
Name: Order Date, dtype: object

Average sales per year

df['year'] = df['Order Date'].dt.year
df.groupby('year')['Sales'].mean()
year
2015 1493.025088
2016 1489.990010
2017 1496.680325
2018 1502.871981
Name: Sales, dtype: float64

All of these simple explorations required you to write some code and do some cleaning to get the desired output.
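
As a rough illustration of the cleaning step mentioned above, the sketch below converts a text date column to datetimes before grouping by year. The four-row DataFrame is invented for the example and is not the supermart dataset, and `errors="coerce"` is just one possible way to handle values that cannot be parsed.

```python
import pandas as pd

# Toy data standing in for the real dataset (values are invented).
toy = pd.DataFrame({
    "Order Date": ["2015-01-03", "2015-06-10", "2016-02-01", "not a date"],
    "Sales": [100.0, 300.0, 500.0, 50.0],
})

# Convert text to datetimes; unparseable entries become NaT instead of raising.
toy["Order Date"] = pd.to_datetime(toy["Order Date"], errors="coerce")

# Drop rows whose date could not be parsed, then derive the year.
clean = toy.dropna(subset=["Order Date"]).copy()
clean["year"] = clean["Order Date"].dt.year

# Average sales per year on the cleaned rows.
yearly_mean = clean.groupby("year")["Sales"].mean()
print(yearly_mean)
```

Whether to drop or repair unparseable rows depends on the data, so treat the `dropna` step as a placeholder for whatever cleaning your dataset actually needs.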

Pandas AI

Pandas AI is a new library that lets you query pandas DataFrames in natural language with the help of LLMs. The project is available on GitHub.

Now let's try it out.

from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
llm = OpenAI(api_token="OPENAI_API_KEY")  # replace with your own OpenAI API key
pandas_ai = PandasAI(llm)
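
A key typed directly into a notebook is easy to leak when the notebook is shared. Here is a minimal sketch of one alternative, assuming the key is stored in an environment variable named `OPENAI_API_KEY`. The helper name `load_api_key` is hypothetical and is not part of pandasai.

```python
import os

def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key

# The returned string could then be passed to the OpenAI wrapper
# in place of the literal key used in the setup above.
```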

First order date.

pandas_ai.run(
df,
"What is the first order date?",
)
The first order date was on January 3rd, 2015 at midnight.

Latest order date.

pandas_ai.run(
df,
"What is the latest order date?",
)
The latest date you can place an order is December 30th, 2018 at midnight.
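
The wording of a generated answer can drift from the question, so it can be worth confirming the dates directly in pandas. A small sketch, using three toy dates copied from the `describe()` output earlier rather than the full dataset:

```python
import pandas as pd

# Toy series: the min, median and max dates reported by describe() above.
dates = pd.Series(
    pd.to_datetime(["2015-01-03", "2017-06-26", "2018-12-30"]),
    name="Order Date",
)

first_order = dates.min()   # earliest order date
latest_order = dates.max()  # most recent order date
print(first_order.date(), latest_order.date())
```

On the real DataFrame the same check would be `df['Order Date'].min()` and `df['Order Date'].max()`, assuming the column has already been parsed as datetimes.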

Yearly average sales.

pandas_ai.run(
df,
"year wise average sale",
)
On average, the sales for each year were as follows: 
in 2015 it was $1493.03,
in 2016 it was $1489.99,
in 2017 it was $1496.68,
and in 2018 it was $1502.87

Convert it into a graph.

pandas_ai.run(
df,
"plot year wise average sale",
)
[Figure: Year-wise average Sales]
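
For comparison, a plain pandas version of such a chart might look like the sketch below. It assumes matplotlib is installed, uses invented toy values rather than the real dataset, and picks a bar chart arbitrarily. It is not the code PandasAI generates internally.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data standing in for the real dataset (values are invented).
toy = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016],
    "Sales": [100.0, 300.0, 400.0, 600.0],
})

# Average sales per year, then one bar per year.
yearly_mean = toy.groupby("year")["Sales"].mean()
ax = yearly_mean.plot(kind="bar", title="Year-wise average Sales")
ax.set_xlabel("year")
ax.set_ylabel("average Sales")

plt.tight_layout()
plt.savefig("yearly_average_sales.png")  # write the chart to a PNG file
```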

Conclusion

Super easy, right? In this article, we did a quick comparison between plain pandas and the pandas-ai library by answering the same exploratory questions with each. Natural-language queries like these can save time on routine data manipulation and exploration tasks.

