Skip to the content.

Supply Chain Analytics: A Comprehensive Approach

Supply Chain Analytics

Table of Contents

Objective

The main objective of this project is to use analytical techniques to gather insights to ensure seamless flow of products from suppliers to stores. The objectvies can be outlines as follows:

  1. Segment customers based on purchasing behavior and demographics to identify the most profitable segments.
  2. Identify factors influencing customer spending and predict future spending patterns.
  3. Optimize inventory levels to maximize profit and minimize costs while meeting customer demand.
  4. Test hypotheses regarding spending behaviors among different customer groups.
  5. Identify trends and seasonal patterns in customer purchasing behavior over time.
  6. Improve customer retention rates by identifying factors leading to churn.

User Story

As the Supply Chain Manager, I want to apply various analytical techniques to our customer data so that we can derive insights and support decision-making processes for improving our supply chain operations. I need probability calculations, hypothesis testing, regression modeling, and linear programming to provide actionable recommendations for our upcoming sales and operations planning meeting.

Target Audience

  1. General Manager
  2. Supply Chain Manager
  3. Sales Manager
  4. Operations Manager
  5. Procurement Manager

Data Source

The data for this analysis has been obtained from a confidential source and mimics real-time events, names, products, and places. It has been synthesized to mask the original source.

Stages

Design

Based on the requirements, we can create dashboards that contain the following components, enabling the stakeholders to make informed decisions about their supply chain:

  1. Overview Section
  2. Customer Demographics Overview
  3. Sales Performance Dashboard
  4. Product Purchase Analysis
  5. Customer Engagement Dashboard
  6. Marketing Campaign Response
  7. Interactive Filters and Drill downs

DashBoard Mockups

What should the Dashboard Look like?

Some of the data visuals that may be appropriate in answering our questions include:

  1. Table
  2. Bar Chart
  3. Line Chart
  4. Scatter Plot
  5. Text Box
  6. KPI Card
  7. Slicers

Tools Used

Tool Purpose
Excel Exploring and analyzing the data
MySQL Server Cleaning and testing the data
Power BI Visualizing the data via interactive dashboards
GitHub Hosting the project documentation

Development

Pseudocode

Pseudocode

Data exploration notes

This is the stage where we scan the data for errors, inconcsistencies, bugs, weird and corrupted characters etc.

Data Cleaning and Testing

The aim is to refine the dataset to ensure it is structured and ready for analysis. The cleaned data should meet the following criteria and constraints:

  1. Only relevant columns should be retained.
  2. All data types should be appropriate for the contents of each column.
  3. No column should contain null values, indicating complete data for all records.

Below is a table outlining the constraints on our cleaned dataset,

Tool Purpose
Number of Rows 2241
Number of Columns 18

Expected Schema for Clean Data

Column Name Data Type Description
CustomerID INT Unique identifier for the customer
Year_Birth INT Year of birth of the customer
Education VARCHAR(50) Highest education level attained by the customer
Marital_Status VARCHAR(50) Marital status of the customer
Income DECIMAL(10,2) Annual income of the customer
Kidhome INT Number of children in the customer’s household
Dt_Customer DATETIME Date when the customer was enrolled
Recency INT Number of days since the last purchase
Complain INT Whether the customer has made a complaint (0 or 1)
MntWines DECIMAL(10,2) Amount spent on wines
MntFruits DECIMAL(10,2) Amount spent on fruits
MntMeatProducts DECIMAL(10,2) Amount spent on meat products
MntFishProducts DECIMAL(10,2) Amount spent on fish products
MntSweetProducts DECIMAL(10,2) Amount spent on sweet products
MntGoldProds DECIMAL(10,2) Amount spent on gold products
NumDealsPurchases INT Number of purchases made with a discount
NumWebPurchases INT Number of purchases made through the web
NumCatalogPurchases INT Number of purchases made using a catalog
NumStorePurchases INT Number of purchases made directly in stores
NumWebVisitsMonth INT Number of visits to the company’s website in the last month
Response INT Whether the customer responded to a campaign (0 or 1)

Transform and test the data

Before data transformation, a new database is created in the MySQL server. After the database is created, the raw data file is imported into the server using the table import wizard. After the import is complete, we can now being cleaning the data.

Convert Date Columns to DateTime format

ALTER TABLE customers_data MODIFY COLUMN Dt_Customer DATETIME;

Check for Missing Values and Outliers

SELECT column_name, COUNT(*) - COUNT(column_name) AS missing_count
FROM customer_data
GROUP BY column_name;

This step is repeated for multiple columns to check for any missing values and outliers

Handle Outliers

-- Update the Education column to change '2n Cycle' to 'Diploma'
UPDATE customers_data
SET Education = 'Diploma'
WHERE Education = '2n Cycle';


-- Update the Education column to change '2n Cycle' to 'Diploma'
UPDATE customers_data
SET marital_status = 'Single'
WHERE marital_status = 'YOLO' or marital_status ='Absurd' or marital_status ='Alone';

-- Delete rows with negative values in the Income column
DELETE FROM customers_data
WHERE Income < 0;

-- Update the MntWines column to change negative values to positive values
UPDATE customers_data
SET MntWines = ABS(MntWines)
WHERE MntWines < 0;

-- Update the MntFruits column to change negative values to positive values
UPDATE customers_data
SET MntFruits = ABS(MntFruits)
WHERE MntFruits < 0;

-- Update the MntMeatProducts column to change negative values to positive values
UPDATE customers_data
SET MntMeatProducts = ABS(MntMeatProducts)
WHERE MntMeatProducts < 0;

-- Update the MntFishProducts column to change negative values to positive values
UPDATE customers_data
SET MntFishProducts = ABS(MntFishProducts)
WHERE MntFishProducts < 0;

-- Update the MntSweetProducts column to change negative values to positive values
UPDATE customers_data
SET MntSweetProducts = ABS(MntSweetProducts)
WHERE MntSweetProducts < 0;

-- Update the MntGoldProds column to change negative values to positive values
UPDATE customers_data
SET MntGoldProds = ABS(MntGoldProds)
WHERE MntGoldProds < 0;

-- Update the NumDealsPurchases column to change negative values to positive values
UPDATE customers_data
SET NumDealsPurchases = ABS(NumDealsPurchases)
WHERE NumDealsPurchases < 0;

-- Update the NumWebPurchases column to change negative values to positive values
UPDATE customers_data
SET NumWebPurchases = ABS(NumWebPurchases)
WHERE NumWebPurchases < 0;

-- Update the NumCatalogPurchases column to change negative values to positive values
UPDATE customers_data
SET NumCatalogPurchases = ABS(NumCatalogPurchases)
WHERE NumCatalogPurchases < 0;

-- Update the NumStorePurchases column to change negative values to positive values
UPDATE customers_data
SET NumStorePurchases = ABS(NumStorePurchases)
WHERE NumStorePurchases < 0;

-- Update the NumWebVisitsMonth column to change negative values to positive values
UPDATE customers_data
SET NumWebVisitsMonth = ABS(NumWebVisitsMonth)
WHERE NumWebVisitsMonth < 0;

-- Change column name for ID column
Alter table customers_data
Change Column ID ID INT;

Standardise the datatypes

ALTER TABLE customers_data
MODIFY COLUMN ID INT,
MODIFY COLUMN Year_Birth INT,
MODIFY COLUMN Education VARCHAR(50),
MODIFY COLUMN Marital_Status VARCHAR(50),
MODIFY COLUMN Income DECIMAL(10,2),
MODIFY COLUMN Kidhome INT,
MODIFY COLUMN Recency INT,
MODIFY COLUMN Complain INT,
MODIFY COLUMN MntWines DECIMAL(10,2),
MODIFY COLUMN MntFruits DECIMAL(10,2),
MODIFY COLUMN MntMeatProducts DECIMAL(10,2),
MODIFY COLUMN MntFishProducts DECIMAL(10,2),
MODIFY COLUMN MntSweetProducts DECIMAL(10,2),
MODIFY COLUMN MntGoldProds DECIMAL(10,2),
MODIFY COLUMN NumDealsPurchases INT,
MODIFY COLUMN NumWebPurchases INT,
MODIFY COLUMN NumCatalogPurchases INT,
MODIFY COLUMN NumStorePurchases INT,
MODIFY COLUMN NumWebVisitsMonth INT,
MODIFY COLUMN Response INT;

Revalidate the data types

DESCRIBE customers_data;

Revalidated data types

Now that the data is cleaned, we can perform a series of checks to ensure data validity, consistency and correctness. Some of the tests performed on this dataset are as follows:

  1. Check for Null Values
  2. Check for negative values
  3. Check for consistency in categorical data
  4. Check for logical consistency
  5. Check for duplicates

Ensure no Null Values

SELECT COUNT(*) AS NullCount
FROM customers_data
WHERE ID IS NULL
   OR Year_Birth IS NULL
   OR Education IS NULL
   OR Marital_Status IS NULL
   OR Income IS NULL
   OR Kidhome IS NULL
   OR Dt_Customer IS NULL
   OR Recency IS NULL
   OR Complain IS NULL
   OR MntWines IS NULL
   OR MntFruits IS NULL
   OR MntMeatProducts IS NULL
   OR MntFishProducts IS NULL
   OR MntSweetProducts IS NULL
   OR MntGoldProds IS NULL
   OR NumDealsPurchases IS NULL
   OR NumWebPurchases IS NULL
   OR NumCatalogPurchases IS NULL
   OR NumStorePurchases IS NULL
   OR NumWebVisitsMonth IS NULL
   OR Response IS NULL;

Check for Negative Values

SELECT COUNT(*) AS NegativeCount
FROM customers_data
WHERE Income < 0
   OR MntWines < 0
   OR MntFruits < 0
   OR MntMeatProducts < 0
   OR MntFishProducts < 0
   OR MntSweetProducts < 0
   OR MntGoldProds < 0
   OR NumDealsPurchases < 0
   OR NumWebPurchases < 0
   OR NumCatalogPurchases < 0
   OR NumStorePurchases < 0
   OR NumWebVisitsMonth < 0;

Check for Consistency in Categorical Data

-- Check distinct values in Education column
SELECT DISTINCT Education
FROM customers_data;

-- Check distinct values in Marital_Status column
SELECT DISTINCT Marital_Status
FROM customers_data;

Check for Logical Consistency

SELECT ID, (YEAR(CURDATE()) - Year_Birth) AS Age
FROM customers_data
WHERE (YEAR(CURDATE()) - Year_Birth) < 0
   OR (YEAR(CURDATE()) - Year_Birth) > 100;

Check for Duplicates

SELECT ID, COUNT(*) AS DuplicateCount
FROM customers_data
GROUP BY ID
HAVING COUNT(*) > 1;

We can now export this cleaned data into PowerBI to create visualisations.

Visualisation

Results

Analysis

Now that we have cleaned and tested the data using SQL, we can take a deeper dive into the data with the help of Excel Solver add-in to gather insights and to strategise performance improvement opportunities.

Probability

What is the probability that for at most 10 out of 100 randomly selected customers to have complained within the last two years?

Probability

The 1.092% probability suggests that the complaints are relatively rate. This indicates that the customer satisfaction is generally high. Further investigation into the reasons for the complaints can help improve customer satisfaction even more.

Actions to be taken: