Jennifer Kim's Playground
Shopping: Spending Score Prediction Model

Here, I introduce a model that predicts a customer’s spending score.

I chose to build this model based on my work experience at Massimo Dutti, a well-known Spanish fashion brand that, like Zara, represents fast-fashion trends.

The analysis explores which variables have a significant impact on the resulting spending score.

How does the spending score increase or decrease when a customer’s age increases by one year?

Does a customer’s work experience significantly affect how they spend on shopping?

Does a customer’s family size affect how they spend?

About Dataset:

Shop Customer Data is a detailed analysis of an imaginary shop's ideal customers. It helps a business better understand its customers. The shop owner collects information about customers through membership cards.

The dataset consists of 2,000 records and 8 columns:

  • Customer ID

  • Gender

  • Age

  • Annual Income K

  • Spending Score - Score assigned by the shop, based on customer behavior and spending nature

  • Profession

  • Work Experience - in years

  • Family Size

  • Exploration: how are average spending scores distributed across professions, by gender?

  • Model 1, the basic model.

    The prediction target is the spending score.

    Which variables are significant for the spending score?

    How does the spending score change when a customer’s age increases by one year?

  • Model 2, for a proportional response variable.

    Because this model requires a proportion, I computed Spending.Score * 0.01.

    What differs from the first model?

    Which variables have a meaningful impact on the spending score at the 5%, 10%, and 20% significance levels (95%, 90%, and 80% confidence)?

Data cleaning

  1. Excluded customers younger than 17. (Some customers were recorded as 4 or 10 years old; I treated these cases as invalid.)

  2. Filled in "Unknown" for Profession wherever the cell was empty.

  3. Converted Annual Income to thousands (Annual Income K).

  4. Deleted the Customer ID variable, so the whole prediction process was performed on anonymous data.
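
The four cleaning steps above could be sketched in base R roughly as follows. This is only an illustration on a small invented data frame, not the original cleaning script; the column names CustomerID and Annual.Income are assumptions about the raw data.

```r
# Toy stand-in for the raw data; all values are invented for illustration.
raw <- data.frame(
  CustomerID    = 1:5,                                   # assumed column name
  Age           = c(4, 25, 10, 40, 17),
  Annual.Income = c(15000, 58000, 20000, 91000, 30000),  # assumed column name
  Profession    = c("Artist", "", "Doctor", NA, "Engineer"),
  stringsAsFactors = FALSE
)

# 1. Keep only customers aged 17 or older.
cleaned <- raw[raw$Age >= 17, ]

# 2. Label empty or missing professions as "Unknown".
missing_prof <- is.na(cleaned$Profession) | cleaned$Profession == ""
cleaned$Profession[missing_prof] <- "Unknown"

# 3. Express annual income in thousands, then drop the original column.
cleaned$Annual.Income.K <- cleaned$Annual.Income / 1000
cleaned$Annual.Income <- NULL

# 4. Drop the customer identifier so the data are anonymous.
cleaned$CustomerID <- NULL

print(cleaned)
```

Each step works on the result of the previous one, so the age filter runs first and the later steps only touch rows that are kept.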

Normality check

I initially assumed that Spending Score was not normally distributed.

However, based on the result of the Shapiro-Wilk test, I rejected that initial assumption and treated Spending Score as normally distributed. (The test's null hypothesis is normality, so this conclusion rests on the test not rejecting it.)

The histogram of Spending Score, shown on the right, also has a roughly normal shape: it is unimodal, with a single bump in the middle.
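
As a reminder of how a Shapiro-Wilk result is read, here is a minimal sketch on simulated data (not the customer data). The 0.05 threshold is an assumed, conventional choice.

```r
set.seed(1)

# Simulated scores for illustration only; these are not the shop data.
x <- rnorm(500, mean = 50, sd = 10)

# H0: the sample comes from a normal distribution.
result <- shapiro.test(x)
print(result)

alpha <- 0.05  # assumed significance level
if (result$p.value > alpha) {
  message("p > alpha: no evidence against normality (H0 is not rejected)")
} else {
  message("p <= alpha: normality is rejected")
}
```

Note that shapiro.test() accepts between 3 and 5000 observations, and that a large p-value means normality was not rejected rather than proven.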

Distribution of Average Spending Score for Profession by gender

3D Scatter Plotting

Prediction Model 1: Generalized linear model (Gaussian family, identity link)

Source Code

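library(rcompanion)  # provides plotNormalHistogram()

# Check the distribution of the response variable
plotNormalHistogram(customers_cleaned$Spending.Score)
shapiro.test(customers_cleaned$Spending.Score)

# Use "Unknown" as the reference level for Profession
customers_cleaned$Profession <- factor(customers_cleaned$Profession)
rel_profession <- relevel(customers_cleaned$Profession, ref = "Unknown")

# Model 1: Gaussian GLM with identity link
fitted.model_1 <- glm(Spending.Score ~ Gender + Age + Annual.Income.K + rel_profession +
                        Work.Experience + Family.Size, data = customers_cleaned,
                      family = gaussian(link = "identity"))
summary(fitted.model_1)

# Likelihood-ratio (deviance) test against the intercept-only model.
# df = 14 should equal the number of coefficients the fitted model adds to the null model.
null.model_1 <- glm(Spending.Score ~ 1, data = customers_cleaned, family = gaussian(link = "identity"))
deviance_test <- as.numeric(-2 * (logLik(null.model_1) - logLik(fitted.model_1)))
print(deviance_test)
p_value <- pchisq(deviance_test, df = 14, lower.tail = FALSE)
print(p_value)

# Predict the spending score for one customer profile
print(predict(fitted.model_1, data.frame(Gender = "Female", Age = 20, Annual.Income.K = 58,
      rel_profession = "Doctor", Work.Experience = 0, Family.Size = 1)))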

Prediction Profile

Gender = Female

Age = 20

Annual Income = 58,000

Profession = Doctor

Work Experience = 0 years

Family Size = 1 (Single)

The expected spending score for such a customer is

50.14097.
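
The deviance test in the source code above hard-codes df = 14. As a hedged sketch, the degrees of freedom can instead be derived from the two fitted models. The example uses R's built-in mtcars data purely as a stand-in, so the variables and results are unrelated to the shop data.

```r
# Stand-in models on the built-in mtcars data (illustration only).
full <- glm(mpg ~ wt + hp + factor(cyl), data = mtcars,
            family = gaussian(link = "identity"))
null <- glm(mpg ~ 1, data = mtcars,
            family = gaussian(link = "identity"))

# Likelihood-ratio statistic: -2 * (logLik of null - logLik of full)
lr_stat <- as.numeric(-2 * (logLik(null) - logLik(full)))

# Degrees of freedom = extra parameters estimated by the full model.
# Here: wt (1) + hp (1) + factor(cyl) (3 levels, so 2 dummies) = 4
lr_df <- attr(logLik(full), "df") - attr(logLik(null), "df")

p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p.value = p_value))
```

Applied to Model 1, the same subtraction would give 14 if Profession has ten levels including "Unknown" (9 dummy variables plus Gender, Age, Annual Income, Work Experience, and Family Size); the ten-level count is an inference from df = 14, not something stated in the data description.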

Interpretation of significant variables

At the 22% significance level,

0.01589 is the estimated difference in Spending Score between a customer whose profession is "Artist" and a customer whose profession is "Unknown."

This suggests that, on average and holding the other variables constant, the spending score for Artists is 0.01589 points higher than that for customers with an Unknown profession, though this difference is relatively small.

6.64289 is the estimated difference in Spending Score between a customer working in the Entertainment field and a customer whose profession is "Unknown."

This suggests that, on average, customers in Entertainment score 6.64289 points higher than those with an Unknown profession.

This difference is notably larger than the one observed for Artists.

At the 30% significance level, the customer's Annual Income also becomes a significant variable.

0.01589 is the estimated change in Spending Score for every $1,000 increase in a customer's annual income. This means that,

on average, when a customer's annual income increases by $1,000, their Spending Score increases by 0.01589 points, holding all other factors constant.

Prediction model 2: Beta regression

Source Code

library(dplyr)    # provides %>%, mutate(), and select()
library(betareg)  # run install.packages("betareg") once if it is not installed
library(statmod)  # run install.packages("statmod") once if it is not installed

# Rescale Spending Score from 0-100 to a proportion and drop the original column
customers_cleaned_2 <- customers_cleaned %>%
  mutate(Spending.Score.Prop = Spending.Score * 0.01) %>%
  select(-Spending.Score)
print(customers_cleaned_2)

# Use "Unknown" as the reference level for Profession
customers_cleaned_2$Profession <- factor(customers_cleaned_2$Profession)
relevel_profession <- relevel(customers_cleaned_2$Profession, ref = "Unknown")

# Model 2: beta regression with logit link (betareg() needs a response strictly between 0 and 1)
fitted.model_2 <- betareg(Spending.Score.Prop ~ Gender + Age + Annual.Income.K + relevel_profession +
                            Work.Experience + Family.Size, data = customers_cleaned_2, link = "logit")
summary(fitted.model_2)

# Likelihood-ratio test against the intercept-only model
null.model_2 <- betareg(Spending.Score.Prop ~ 1, data = customers_cleaned_2, link = "logit")
deviance_2 <- as.numeric(-2 * (logLik(null.model_2) - logLik(fitted.model_2)))
print(deviance_2)
p_value_2 <- pchisq(deviance_2, df = 14, lower.tail = FALSE)
print(p_value_2)

# Predict the mean spending proportion for one customer profile
print(predict(fitted.model_2, data.frame(Gender = "Female", Age = 20, Annual.Income.K = 58,
      relevel_profession = "Doctor", Work.Experience = 0, Family.Size = 1)))

Prediction Profile

Gender = Female

Age = 20

Annual Income = 58,000

Profession = Doctor

Work Experience = 0 years

Family Size = 1 (Single)

Prediction result

from the beta regression model

The estimated spending score for such a customer, using the profile shown above, is

0.4978756 (49.78756 on the original 0-100 scale)
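
To make the link between the two numbers explicit, here is a small sketch of how a logit-link prediction maps between the linear predictor, the proportion, and the original 0-100 scale. Only the value 0.4978756 comes from the model output; the variable names are illustrative.

```r
predicted_prop <- 0.4978756  # mean spending proportion reported by the model

# Logit link: the linear predictor is log(p / (1 - p)).
eta <- qlogis(predicted_prop)

# The inverse logit maps the linear predictor back to a proportion in (0, 1).
back_to_prop <- plogis(eta)

# Undo the * 0.01 rescaling to return to the original 0-100 score.
score <- back_to_prop * 100

print(c(linear_predictor = eta, proportion = back_to_prop, score = score))
```

A proportion just below 0.5 corresponds to a slightly negative linear predictor, which is why this profile lands just under a score of 50.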

Interpretation of significant variables: How much impact do the predictors have on Spending Score?

At the 20% significance level,

Annual income, the Artist profession, and the Entertainment profession are the predictors in the second model that have a significant impact on the resulting spending score.

Because the model uses a logit link, exp(coefficient) is the multiplicative change in the odds of the mean spending proportion, μ/(1-μ), holding the other predictors constant.

For Annual Income (in thousands of dollars), each additional $1,000 multiplies these odds by exp(0.0007589) ≈ 1.00076, an increase of (exp(0.0007589)-1)*100% ≈ 0.076%. This effect applies to every profession, not only relative to Unknown.

For Artist customers, the estimated odds are exp(0.2937506)*100% ≈ 134.1% of those for customers whose profession is Unknown.

For customers who work in Entertainment, the estimated odds are exp(0.2681346)*100% ≈ 130.75% of those for customers whose profession is Unknown.
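
The exponentiated coefficients can be checked directly. The sketch below simply re-computes them from the three estimates quoted above; the vector names are illustrative. It also shows why the placement of the parenthesis matters when converting a coefficient to a percentage change.

```r
# Coefficients quoted from the beta regression summary (logit link).
coefs <- c(Annual.Income.K = 0.0007589,
           Artist          = 0.2937506,
           Entertainment   = 0.2681346)

# exp(coefficient) = multiplicative change in the odds mu / (1 - mu).
odds_multiplier <- exp(coefs)

# Percentage change in the odds: (exp(b) - 1) * 100.
pct_change <- (odds_multiplier - 1) * 100

print(round(odds_multiplier, 5))
print(round(pct_change, 3))

# exp(b - 1) is a different quantity from exp(b) - 1.
misplaced <- exp(coefs["Annual.Income.K"] - 1)
print(unname(misplaced))
```

For the income coefficient, exp(b) - 1 is a small positive change per $1,000, whereas exp(b - 1) is close to exp(-1) and does not describe the effect of income.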

Conclusion

The analysis of Spending Score using both a generalized linear model and a beta regression model provides valuable insights into how different factors, such as profession and annual income, influence customer spending behavior.

In the generalized linear model, at the 22% significance level, a customer’s profession affects their Spending Score. Specifically, Artists have a slightly higher estimated Spending Score (by 0.01589 points) than those with an Unknown profession, though the effect is relatively small. In contrast, customers working in the Entertainment industry show a notably larger increase, with an estimated difference of 6.64289 points compared to those whose profession is Unknown.

At the 30% significance level, Annual Income also becomes a significant predictor. A $1,000 increase in annual income is associated with an estimated 0.01589-point increase in Spending Score, indicating a weak but positive relationship between income and spending behavior.

The beta regression model, at the 20% significance level, further confirms the importance of Annual Income, the Artist profession, and the Entertainment profession in predicting Spending Score. Because the model uses a logit link, its effects are expressed on the odds of the mean spending proportion: a $1,000 increase in income multiplies these odds by exp(0.0007589), an increase of about 0.076%. Artist customers have estimated odds that are about 134.1% of those for customers with an Unknown profession, and Entertainment professionals about 130.75%.

Thank you