An Input Data tool contains 100 records, each representing a unique customer transaction. The Customer ID (a unique identifier for each customer) and Customer Segment fields provide basic information about the customer. The Weekly Sales column shows the average weekly sales for each customer. In this exercise, drop the columns “Regions”, “Store Volume” and “TransInYear”. Then find the 3 customers with the highest average weekly sales in the “Consumer” and “Corporate” segments. Rename the Customer_Segment field to Customer Type.
Answer and explanation:
To process your dataset and identify the top 3 customers with the highest average weekly sales in the “Consumer” and “Corporate” segments, follow these steps:
- Import Necessary Libraries:
Begin by importing the pandas library, which is essential for data manipulation in Python.
import pandas as pd
- Load the Dataset:
Assuming your data is in a CSV file named ‘customer_transactions.csv’, load it into a pandas DataFrame:
df = pd.read_csv('customer_transactions.csv')
- Drop Unnecessary Columns:
Remove the “Regions”, “Store Volume”, and “TransInYear” columns as they are not required for this analysis:
df = df.drop(columns=['Regions', 'Store Volume', 'TransInYear'])
- Rename the ‘Customer Segment’ Column:
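As the exercise requires, rename the ‘Customer Segment’ column to ‘Customer Type’ (if your file names the column Customer_Segment, as the exercise text does, use that name as the key instead):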
df = df.rename(columns={'Customer Segment': 'Customer Type'})
- Filter for Relevant Segments:
Focus on the “Consumer” and “Corporate” customer types:
df_filtered = df[df['Customer Type'].isin(['Consumer', 'Corporate'])]
- Identify Top 3 Customers by Segment:
Group the data by ‘Customer Type’ and use the nlargest function to find the top 3 customers based on ‘Weekly Sales’ within each segment:
top_customers = df_filtered.groupby('Customer Type').apply(
    lambda x: x.nlargest(3, 'Weekly Sales')
).reset_index(drop=True)
- Display the Results:
The top_customers DataFrame now contains the top 3 customers for each segment:
print(top_customers)
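The steps above can also be tried end to end without the CSV file. The following is a minimal sketch, not the exercise's official solution: the function name top_customers_by_segment and the sample rows are made up for illustration, and the column names are assumed to match the ones used in the steps. It also swaps groupby(...).apply for sort_values followed by groupby(...).head, because recent pandas versions warn when apply operates on the grouping column.

```python
import pandas as pd


def top_customers_by_segment(df, segments, n=3):
    """Return the n rows with the highest 'Weekly Sales' in each listed segment.

    Hypothetical helper: assumes the column names used in the steps above.
    """
    cleaned = (
        df.drop(columns=["Regions", "Store Volume", "TransInYear"])
          .rename(columns={"Customer Segment": "Customer Type"})
    )
    selected = cleaned[cleaned["Customer Type"].isin(segments)]
    # Sort so the largest sales come first within each segment,
    # then keep the first n rows of every group.
    return (
        selected.sort_values(["Customer Type", "Weekly Sales"],
                             ascending=[True, False])
                .groupby("Customer Type")
                .head(n)
                .reset_index(drop=True)
    )


# Made-up sample rows, only to exercise the function.
sample = pd.DataFrame({
    "Customer ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Customer Segment": ["Consumer"] * 4 + ["Corporate"] * 4 + ["Home Office"],
    "Weekly Sales": [100.0, 400.0, 300.0, 200.0, 50.0, 80.0, 70.0, 60.0, 999.0],
    "Regions": ["East"] * 9,
    "Store Volume": [10] * 9,
    "TransInYear": [5] * 9,
})

result = top_customers_by_segment(sample, ["Consumer", "Corporate"])
print(result)
```

With the sample rows, the "Home Office" customer is filtered out even though it has the highest sales, and each remaining segment contributes exactly three rows.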
Explanation:
- Data Preparation: After loading the dataset, we eliminate irrelevant columns to streamline our analysis. Renaming ‘Customer Segment’ to ‘Customer Type’ ensures clarity in our subsequent operations.
- Filtering Data: By isolating the ‘Consumer’ and ‘Corporate’ segments, we ensure our analysis is targeted towards these specific customer types.
- Grouping and Selection: The groupby function organizes the data by ‘Customer Type’. Within each group, the nlargest function retrieves the 3 rows with the highest ‘Weekly Sales’, which keeps the selection to a single short expression.
- Resetting Index: After applying groupby and nlargest, the result is indexed by the group key together with the original row labels. Using reset_index(drop=True) replaces that with a clean, consecutive index in the final DataFrame, enhancing readability.
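To see why the index needs resetting, the small sketch below applies nlargest to a grouped ‘Weekly Sales’ column and inspects the index at each stage. The demo rows are invented for illustration. It uses the Series form of nlargest rather than the apply call from the steps, so it only illustrates the index shape.

```python
import pandas as pd

# Invented rows, only to show how the index changes.
demo = pd.DataFrame({
    "Customer ID": [101, 102, 103, 104],
    "Customer Type": ["Consumer", "Consumer", "Corporate", "Corporate"],
    "Weekly Sales": [250.0, 400.0, 310.0, 120.0],
})

# nlargest on a grouped column returns a Series whose index has two levels:
# the group key and the original row label.
per_group = demo.groupby("Customer Type")["Weekly Sales"].nlargest(1)
print(per_group.index.nlevels)

# Moving the group key back into a column leaves the original row labels
# as the index.
flat = per_group.reset_index(level=0)

# drop=True discards those leftover labels and renumbers from 0.
clean = flat.reset_index(drop=True)
print(clean)
```

Note that drop=True discards every index level it resets, so any level that should survive as a column has to be moved out first, as done here with reset_index(level=0).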
Visual Representation:
To visualize the top customers, you can create a bar chart:
import matplotlib.pyplot as plt
import seaborn as sns
# Set the aesthetic style of the plots
sns.set(style="whitegrid")
# Initialize the matplotlib figure
plt.figure(figsize=(10, 6))
# Create a bar plot
sns.barplot(
    x='Customer ID',
    y='Weekly Sales',
    hue='Customer Type',
    data=top_customers,
    # Each customer belongs to one segment, so the bars need no side-by-side offset
    dodge=False
)
# Add titles and labels
plt.title('Top 3 Customers by Weekly Sales in Each Segment')
plt.xlabel('Customer ID')
plt.ylabel('Weekly Sales')
# Display the plot
plt.show()
This script utilizes Seaborn and Matplotlib to generate a bar chart that showcases the top 3 customers in each segment based on their average weekly sales. Such visualizations can provide immediate insights into customer performance across different segments.
By following these steps, you can effectively process your dataset to identify and visualize the top-performing customers in the “Consumer” and “Corporate” segments.