Introduction
If you have worked on any data science task, you have almost certainly used pandas, a library for data ingestion and transformation.
Pandas Code
For this example, I use the Supermart Grocery Sales dataset, which contains information about products, sales, discounts, profit, and more.
Import dataset
import pandas as pd
df = pd.read_csv('./sample_data/Supermart Grocery Sales - Retail Analytics Dataset.csv', parse_dates=['Order Date'])  # read dates as datetimes
List columns
df.columns
Index(['Order ID', 'Customer Name', 'Category', 'Sub Category', 'City',
'Order Date', 'Region', 'Sales', 'Discount', 'Profit', 'State'],
dtype='object')
Get Order Date info, such as the first and last dates
df['Order Date'].describe()
count 9994
mean 2017-04-30 05:17:08.056834048
min 2015-01-03 00:00:00
25% 2016-05-23 00:00:00
50% 2017-06-26 00:00:00
75% 2018-05-14 00:00:00
max 2018-12-30 00:00:00
Name: Order Date, dtype: object
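If only the first and last dates are needed, `min()` and `max()` on the column return them directly. The sketch below uses a tiny hypothetical DataFrame (`orders`) in place of the real dataset, with three dates borrowed from the summary above, and assumes the column already holds datetimes.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real dataset.
orders = pd.DataFrame(
    {"Order Date": pd.to_datetime(["2015-01-03", "2017-06-26", "2018-12-30"])}
)

# min() and max() on a datetime column give the earliest and latest dates.
first_order = orders["Order Date"].min()
last_order = orders["Order Date"].max()

print(first_order.date(), last_order.date())
```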
Average sales per year
df['year'] = df['Order Date'].dt.year  # vectorized year extraction
df.groupby('year')['Sales'].mean()
year
2015 1493.025088
2016 1489.990010
2017 1496.680325
2018 1502.871981
Name: Sales, dtype: float64
Each of these simple explorations required writing some code, and doing some cleaning, to get the desired output.
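To make the cleaning step concrete, here is a minimal sketch of what it might look like when dates arrive as strings. The sample rows, the `%Y-%m-%d` format, and the names `raw` and `clean` are assumptions for illustration; the real dataset may store its dates differently.

```python
import pandas as pd

# Hypothetical raw rows: dates arrive as strings, one of them malformed.
raw = pd.DataFrame(
    {
        "Order Date": ["2015-01-03", "2016-05-23", "not a date"],
        "Sales": [1200, 1500, 900],
    }
)

# errors="coerce" turns unparseable values into NaT instead of raising.
raw["Order Date"] = pd.to_datetime(
    raw["Order Date"], format="%Y-%m-%d", errors="coerce"
)

# Drop rows whose date could not be parsed, then derive the year.
clean = raw.dropna(subset=["Order Date"]).copy()
clean["year"] = clean["Order Date"].dt.year

# Same aggregation as in the article, on the cleaned rows.
yearly_mean = clean.groupby("year")["Sales"].mean()
print(yearly_mean)
```

Coercing bad values to `NaT` and dropping them is only one option. Depending on the data, it may be better to inspect those rows before discarding them.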
Pandas AI
PandasAI is a new library that lets you work with pandas using plain natural language, with the help of LLMs. The project is available on GitHub.
Now let's try it out.
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
llm = OpenAI("OPENAI_API_KEY")  # placeholder: pass your own OpenAI API key
pandas_ai = PandasAI(llm)
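Rather than pasting the key into the notebook, one option is to read it from an environment variable. The helper below is a hypothetical sketch, not part of pandas-ai; the name `load_api_key` and the choice of the `OPENAI_API_KEY` variable are assumptions.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

The returned string can then be passed to `OpenAI(...)` in place of the literal key.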
First order date.
pandas_ai.run(
df,
"What is the first order date?",
)
The first order date was on January 3rd, 2015 at midnight.
Latest order date.
pandas_ai.run(
df,
"What is the latest order date?",
)
The latest date you can place an order is December 30th, 2018 at midnight.
Yearly average sales.
pandas_ai.run(
df,
"year wise average sale",
)
On average, the sales for each year were as follows:
in 2015 it was $1493.03,
in 2016 it was $1489.99,
in 2017 it was $1496.68,
and in 2018 it was $1502.87
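Since the answer comes from an LLM, it is worth checking it against the pandas result. The sketch below copies the yearly means from the groupby output earlier in this article and the figures quoted in the answer above, then checks that they agree to the nearest cent. The variable names are illustrative only.

```python
import pandas as pd

# Yearly means copied from the pandas groupby output earlier in the article.
pandas_means = pd.Series(
    {2015: 1493.025088, 2016: 1489.990010, 2017: 1496.680325, 2018: 1502.871981}
)

# Figures quoted in the PandasAI answer.
llm_reported = pd.Series(
    {2015: 1493.03, 2016: 1489.99, 2017: 1496.68, 2018: 1502.87}
)

# Two values agree to the cent when they differ by less than half a cent.
matches = (pandas_means - llm_reported).abs() < 0.005
print(matches)
```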
Convert it into a graph.
pandas_ai.run(
df,
"plot year wise average sale",
)
Conclusion
Super easy, right? That is all it takes to use LLMs with pandas. In this article, we did a quick comparison between the pandas and pandas-ai libraries. A tool like this can save a lot of time on data manipulation and exploration tasks.