A list of statistical models for marketing: overviews, use cases, issues, and requirements explained in plain language (no equations)

Column
Data science · Data analysis · Statistics

Statistical analysis has become an essential tool in the modern business environment. Companies are increasingly looking to make data-driven decisions, and statistical models are key to their success.

This article is for people who think, "Our company uses statistical models, but I don't really understand them," or "I'd like to start using them to make our marketing more effective." It gives a clear overview of the main statistical modeling methods, examples of their use, and the challenges and prerequisites involved.

What is a statistical model?

A statistical model is a method of finding patterns and relationships in data and expressing them as mathematical formulas. By quantifying the relationship between causal factors (explanatory variables) and results (target variables), it can be used to make predictions and support decisions.

A familiar example is weather forecasting, where factors such as air pressure, humidity, temperature, and wind speed are used to predict when, where, and how much rain will fall. This is exactly how statistical models work.

Marketing use cases

Statistical models are powerful in many areas of marketing.

  • Market segmentation: how to group customers
  • Predicting customer behavior: who buys what, and when?
  • Optimizing advertising and promotions: which measures are effective?

Three benefits of using statistical models

1. Transparency and accountability

Statistical models can scientifically explain marketing results. They clarify "why a result was achieved" and "which factors influenced it," making them more persuasive to superiors, team members, and other departments.

2. Highly accurate predictions

Accurately forecasting product demand and sales allows you to optimize budget allocation, inventory management, and production planning, resulting in cost savings and increased efficiency.

3. Decision support

Management and marketing managers can formulate strategies based on the analysis results, allowing them to make decisions that are backed by data, rather than relying solely on intuition and experience.

8 statistical models you can use in marketing

1. Multiple Regression Analysis

Overview of Multiple Regression Analysis

Multiple regression analysis is a method for understanding how a result relates to several factors at once. It expresses the relationship between the result (target variable) and multiple factors (explanatory variables) as an equation and estimates a coefficient for each factor.

Each coefficient shows how much the result changes when that factor changes while the other factors are held constant. This lets you evaluate which factors affect the result and by how much. Whether a coefficient can also be read as a causal effect depends on the hypothesis behind the model and on how the data were collected.
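The sketch below shows in code what "calculating coefficients" means. It fits ordinary least squares by solving the normal equations with only the Python standard library.

Everything in it is hypothetical and only for illustration: the function names `solve` and `fit_ols`, the variables (ad spend, social media posts), and the numbers. The sales figures are generated from a known rule, 50 + 3 × ad spend + 2 × posts, so you can see that the fit recovers those coefficients.

```python
# Minimal multiple regression (ordinary least squares), standard library only.
# Hypothetical example: sales explained by ad spend and number of social media posts.


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(features, y):
    """Return [intercept, coef_1, ..., coef_k] minimizing the sum of squared errors."""
    X = [[1.0] + list(row) for row in features]  # prepend a column of 1s for the intercept
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y


ad_spend = [10, 20, 15, 30, 25, 5]
posts = [3, 1, 4, 2, 5, 6]
# Synthetic sales built from a known rule so the estimated coefficients can be checked.
sales = [50 + 3 * a + 2 * p for a, p in zip(ad_spend, posts)]

coefs = fit_ols(list(zip(ad_spend, posts)), sales)
print([round(c, 3) for c in coefs])  # → [50.0, 3.0, 2.0]
```

Real data contain noise, so the coefficients will not come out as round numbers. In practice, a library such as statsmodels (`statsmodels.api.OLS`) also reports the standard errors and p-values you need to judge each coefficient.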

Usage example: Analysis of advertising effectiveness

To analyze advertising effectiveness, you can run a multiple regression with advertising costs, social media posts, seasonal factors, and so on as explanatory variables, and an outcome such as sales or customer acquisition as the target variable. This shows which advertising channels and factors have the most impact on results and helps you develop an effective advertising strategy.

Issues with multiple regression analysis

The main challenges include:

  • Multicollinearity: high correlation between explanatory variables.
  • Overfitting: a model that is too complex and fits noise.
  • Underfitting: a model that is too simple.

In addition, multiple regression assumes a linear relationship between the explanatory variables and the outcome. In other words, the outcome changes by the same amount for each unit change in an explanatory variable, which does not always hold in reality.
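Multicollinearity is commonly checked with the variance inflation factor (VIF). For explanatory variable j, VIF = 1 / (1 − R²), where R² comes from regressing variable j on all the other explanatory variables. With exactly two explanatory variables, that R² is simply their squared correlation.

The sketch below assumes that two-variable case and uses hypothetical channel names and numbers. A frequently quoted rule of thumb treats a VIF above 10 (some analysts use 5) as a warning sign.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two explanatory variables."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


search_ads = [1, 2, 3, 4, 5]
display_ads = [2, 4, 6, 8, 11]  # hypothetical: moves almost in lockstep with search_ads
newsletter = [3, 1, 4, 1, 5]    # hypothetical: largely unrelated to search_ads

high = vif_two_predictors(search_ads, display_ads)
low = vif_two_predictors(search_ads, newsletter)
print(f"{high:.1f}", f"{low:.2f}")  # → 122.0 1.14
```

A VIF of around 122 means the two ad channels carry nearly the same information. In that situation the model cannot reliably split the effect between them, so you would usually drop or combine one of them.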

Conditions under which multiple regression analysis is valid

Multiple regression analysis is useful when there is low multicollinearity and you want to verify hypotheses about linear relationships between multiple factors.

2. Hierarchical Multiple Regression Analysis

Hierarchical multiple regression analysis overview

Hierarchical multiple regression analysis evaluates factors step by step to determine their impact on an outcome. The most basic factors are entered first, and other factors are then added in a predetermined order. At each step, the increase in the coefficient of determination (R², the share of the outcome's variation that the model explains) shows how much the newly added factors contribute beyond those already in the model.

This helps identify the main factors behind the outcome and their relative importance. The order of entry is based on hypotheses about the causal relationships and interrelationships between factors, and the contribution of each step is then analyzed statistically.
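The step-by-step logic can be sketched as two regressions and a comparison of their R². This is a minimal sketch under stated assumptions:

  • The price, competitor price, and noise figures are hypothetical.
  • `r_squared_ols` is an illustrative helper, not a library function.
  • The sketch omits the significance test. In practice, the increase in R² is usually tested with an F-test for the change.

```python
def r_squared_ols(features, y):
    """Fit ordinary least squares with an intercept and return R^2 (share of variance explained)."""
    X = [[1.0] + list(row) for row in features]  # column of 1s for the intercept
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    m = [
        [sum(r[i] * r[j] for r in X) for j in range(k)] + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for other in range(k):
            if other != col:
                f = m[other][col] / m[col][col]
                m[other] = [a - f * b for a, b in zip(m[other], m[col])]
    beta = [m[i][k] / m[i][i] for i in range(k)]
    preds = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical weekly data.
price = [10, 12, 11, 14, 13, 15, 9, 16]
competitor_price = [12, 11, 14, 13, 16, 14, 10, 18]
noise = [1, -2, 0, 2, -1, 1, -1, 0]
sales = [200 - 5 * p + 3 * c + e for p, c, e in zip(price, competitor_price, noise)]

# Step 1: the basic factor (own price) only.
r2_step1 = r_squared_ols([[p] for p in price], sales)
# Step 2: add competitor price and measure how much explained variance increases.
r2_step2 = r_squared_ols(list(zip(price, competitor_price)), sales)
delta_r2 = r2_step2 - r2_step1
print(f"Step 1 R^2={r2_step1:.3f}  Step 2 R^2={r2_step2:.3f}  increase={delta_r2:.3f}")
```

Adding variables to an ordinary least squares model never lowers R². So the question is not whether R² rises, but whether the increase is large enough to matter for the business decision at hand.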

Use case: Pricing optimization

For pricing, you can run a hierarchical regression with sales as the target variable:

  • Enter product price first as the basic factor.
  • Then add other factors, such as advertising costs and competitor prices.
  • Compare the steps to see how price relates to sales once those factors and their interrelationships are taken into account.

This helps identify the optimal pricing strategy and maximize profits.

Issues and points to note in hierarchical multiple regression analysis

As with multiple regression analysis, there are issues with multicollinearity and overfitting. In addition, because the selection of variables and the input order in model construction are based on hypotheses, there are also issues with errors and bias in the hypotheses.

Conditions under which hierarchical multiple regression analysis is effective

The conditions under which hierarchical regression analysis is valid are similar to those for ordinary multiple regression analysis. It is appropriate when you have specific hypotheses about which variables to include and in what order to enter them.

3. Path Analysis

Path Analysis Overview

Path analysis represents the relationships between factors or variables as a diagram and evaluates how those relationships affect an outcome. Each variable is drawn as a node, and arrows (paths) show the hypothesized direction of influence. A coefficient is estimated for each arrow, which makes it possible to separate a factor's direct effect on the outcome from its indirect effect through other variables.

Path analysis can be regarded as a special case of covariance structure analysis (SEM) that uses only observed variables.
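The smallest useful path diagram is X → M → Y plus a direct arrow X → Y. With standardized variables, each path coefficient can be computed from pairwise correlations, and the indirect effect is the product of the coefficients along the route X → M → Y. A useful check is that the direct and indirect effects add up to the total effect.

The sketch below assumes this three-variable model and uses hypothetical touch-point data and labels. Real path analysis software usually estimates larger diagrams by maximum likelihood, but the decomposition idea is the same.

```python
import math


def corr(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mediation_paths(x, m, y):
    """Standardized path coefficients for X -> M -> Y with an additional direct X -> Y arrow."""
    r_xm, r_xy, r_my = corr(x, m), corr(x, y), corr(m, y)
    a = r_xm                                           # X -> M
    b = (r_my - r_xy * r_xm) / (1 - r_xm ** 2)         # M -> Y, controlling for X
    direct = (r_xy - r_my * r_xm) / (1 - r_xm ** 2)    # X -> Y, controlling for M
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": r_xy}


# Hypothetical weekly data: ad exposure (X) -> product review views (M) -> purchases (Y).
ad_exposure = [1, 2, 3, 4, 5, 6, 7, 8]
review_views = [2, 1, 4, 3, 6, 5, 8, 7]
purchases = [1, 3, 2, 5, 4, 7, 6, 9]

paths = mediation_paths(ad_exposure, review_views, purchases)
for name, value in paths.items():
    print(f"{name:>8}: {value:.3f}")
```

If most of the total effect flows through the indirect path, the intermediate touch point (here, review views) is a lever worth investing in.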

Use case: Improving customer experience

Today, customers interact with brands through countless touch points, such as websites, mobile apps, and offline channels. Path analysis can help you understand the customer journey in more depth:

  • Initial touch points (e.g., social media, websites)
  • Intermediate steps (e.g., product reviews, price comparisons, sales)
  • Conversions (e.g., purchases, downloads, newsletter registrations)

By identifying the touch points along the path that raise customer lifetime value, you can allocate resources more effectively and maximize cost-effectiveness, for example by improving conversion rates or upselling premium products.

Challenges and points to note in path analysis

The main challenges are:

  • Specifying an appropriate model.
  • The assumption of a multivariate normal distribution, which the commonly used maximum likelihood estimation relies on.
  • The inability to handle latent variables (variables that are not observed directly but are estimated from other observed variables).

Conditions under which path analysis is valid

Path analysis is appropriate when there is a clearly defined theoretical model with hypotheses about the causal relationships between variables and when all variables are observable.

4. Logistic Regression Analysis

Overview of Logistic Regression

Logistic regression analysis is a method for estimating the probability that an event or outcome occurs, such as whether or not a customer makes a purchase. It has two goals: predicting the probability of the event, and understanding which factors influence whether it occurs.

The model expresses the log-odds of the event as a weighted sum of the factors (explanatory variables), and the logistic function converts this into a probability between 0 and 1. The weights show how strongly each factor raises or lowers that probability.
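To make "log-odds converted into a probability" concrete, the sketch below fits a one-factor logistic regression by plain gradient ascent on the log-likelihood. The page-view data and the helper names `sigmoid` and `fit_logistic` are hypothetical.

The learning rate and number of iterations are illustrative settings chosen for this small dataset. In practice you would use a statistics package, such as statsmodels `Logit` or scikit-learn `LogisticRegression` (note that scikit-learn applies regularization by default).

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number (log-odds) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """One-factor logistic regression fitted by batch gradient ascent on the log-likelihood."""
    w0 = w1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(w0 + w1 * x)  # observed outcome minus predicted probability
            g0 += err
            g1 += err * x
        w0 += lr * g0 / n
        w1 += lr * g1 / n
    return w0, w1


pages_viewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical sessions
purchased = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]     # 1 = purchased, 0 = did not

w0, w1 = fit_logistic(pages_viewed, purchased)
for pages in (2, 9):
    print(pages, "pages -> purchase probability", round(sigmoid(w0 + w1 * pages), 2))
print("odds multiply per extra page by:", round(math.exp(w1), 2))
```

Exponentiating a weight gives an odds ratio: how much the odds of purchase are multiplied for each extra page viewed. This is often easier to explain to non-specialists than the raw weight.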

Use case: Customer segmentation

Logistic regression analysis can be useful for customer segmentation. For example, an online shop can use logistic regression analysis to predict whether a customer will view a particular product, add it to their cart, and ultimately purchase it. This information can be used to understand customer purchasing intent and optimize advertising and promotion delivery and product recommendations. Understanding the factors and characteristics that lead customers to the final purchase stage can improve sales efficiency and increase revenue.

Challenges and points to note in logistic regression analysis

Logistic regression assumes a linear relationship between the explanatory variables and the log-odds of the outcome, so the relationship with the probability itself is S-shaped rather than linear. A large sample size is required to obtain reliable results, and overfitting can be a problem when there are many explanatory variables.

Conditions under which logistic regression analysis is valid

It is useful when the target variable is binary (a two-category outcome such as "yes" or "no") and the relationship between the explanatory variables and the log-odds of the outcome can reasonably be treated as linear.

5. Covariance structure analysis (Structural Equation Modeling/SEM)

Overview of Covariance Structure Analysis

Covariance structure analysis is a method for understanding causal relationships and correlations among many variables at once. It combines two parts:

  • A measurement model, in which latent variables (concepts such as "brand attitude" that cannot be observed directly) are measured through several observed variables.
  • A structural model of paths that represent hypothesized causal relationships.

The model is estimated by making the covariances it implies as close as possible to the covariances observed in the data. This reveals the patterns and structure behind the data.
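The phrase "covariance structure" refers to the covariance matrix that a model implies. The sketch below computes that matrix for the simplest case: one latent factor measured by three survey items. It then scores how far the implied matrix is from an observed one, using the unweighted least squares fit function.

The loadings, error variances, and observed covariances are hypothetical, and the function names are illustrative. Real SEM software searches for the parameter values that minimize such a discrepancy, usually by maximum likelihood. This sketch only evaluates one candidate set.

```python
# Hypothetical one-factor measurement model: three survey items driven by one latent "brand attitude".


def implied_covariance(loadings, error_vars, factor_var=1.0):
    """Covariance matrix implied by a one-factor model.

    Off-diagonal: loading_i * loading_j * factor_var.
    Diagonal: loading_i ** 2 * factor_var + error variance of item i.
    """
    k = len(loadings)
    return [
        [loadings[i] * loadings[j] * factor_var + (error_vars[i] if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]


def uls_discrepancy(observed, implied):
    """Unweighted least squares fit function: 0.5 * sum of squared residuals (0 = perfect fit)."""
    return 0.5 * sum(
        (s - m) ** 2
        for row_s, row_m in zip(observed, implied)
        for s, m in zip(row_s, row_m)
    )


# Observed (standardized) covariances of the three items; made-up numbers.
observed = [
    [1.00, 0.58, 0.46],
    [0.58, 1.00, 0.40],
    [0.46, 0.40, 1.00],
]
loadings = [0.8, 0.7, 0.6]       # candidate parameter values, not estimates
error_vars = [0.36, 0.51, 0.64]  # chosen so each item's implied variance is 1.0

implied = implied_covariance(loadings, error_vars)
print(round(uls_discrepancy(observed, implied), 4))  # → 0.0012
```

A small discrepancy means the hypothesized structure reproduces the observed relationships between survey items well. A large one means the hypothesis (which items measure which concept, and which paths exist) should be revisited.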

Use case: Interpreting survey results

Covariance structure analysis can help you interpret survey results. For example, if you conduct a consumer survey, you can use covariance structure analysis to clarify the relationship between different questions or survey items. This allows you to understand how a specific advertising measure affects customer purchase intentions and evaluate the relationship between product characteristics or price and purchasing behavior. Using covariance structure analysis can lead to improved marketing strategies and identification of target consumers.

Issues and points to note in covariance structure analysis

Covariance structure analysis requires a large sample size and well-developed hypotheses. Because it is difficult to identify and build an appropriate model, you need two kinds of people: those with expert knowledge of statistical theory, and those with business domain knowledge who can form solid hypotheses. Either kind of knowledge alone is not enough. The same applies to other methods, but it is especially true for covariance structure analysis.

Conditions under which covariance structure analysis is valid

Like path analysis, it is appropriate when there is a clearly defined theoretical model with hypotheses about the causal relationships between variables.

6. ARIMA (Autoregressive Integrated Moving Average) Model

Overview of ARIMA Models

ARIMA (Autoregressive Integrated Moving Average) is the main model in time series analysis. It is used to understand patterns and trends in data that change over time and to forecast future values. It combines three parts:

  • Autoregression: the current value depends on past values.
  • Integration: the series is differenced to remove trends.
  • Moving average: the current value depends on past forecast errors.

Because it models a single series from its own past, ARIMA describes patterns over time rather than revealing the causal relationships behind the data.
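The sketch below is a stripped-down ARIMA(1,1,0) under stated assumptions. It differences the series once (the "I" part), fits an AR(1) coefficient to the changes by least squares (the "AR" part), and then adds the forecast changes back to the last level. There is no moving-average part and no constant term.

The sales figures and function names are hypothetical. In practice, `statsmodels.tsa.arima.model.ARIMA` estimates all three parts by maximum likelihood.

```python
# ARIMA(1,1,0) sketch: difference once, fit AR(1) to the changes, then undo the differencing.
# No moving-average part and no constant term, to keep the example short.


def difference(series):
    """First differences: change from one period to the next."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(x):
    """Least-squares AR(1) coefficient phi for x_t ≈ phi * x_(t-1) (no intercept)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def forecast_arima_110(series, steps):
    """Fit ARIMA(1,1,0) and forecast `steps` periods ahead."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, change = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        change = phi * change   # predicted next change from the AR(1) part
        level = level + change  # integrate the change back into a sales level
        forecasts.append(level)
    return phi, forecasts


monthly_sales = [100, 110, 115, 118, 120]  # hypothetical: growth is slowing down
phi, forecasts = forecast_arima_110(monthly_sales, steps=3)
print(round(phi, 3), [round(f, 1) for f in forecasts])  # → 0.53 [121.1, 121.6, 121.9]
```

Here phi is about 0.53, so each forecast change is roughly half the previous one, and the forecast levels off as growth fades. This continues the slowdown visible in the data.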

Use case: Demand forecasting

The ARIMA model can be used for demand forecasting in the marketing field. For example, by using the ARIMA model based on past sales data to predict future demand, and optimizing product order quantities and inventory management based on this prediction, you can effectively respond to fluctuations in demand. By using ARIMA, you can reduce inventory surpluses and shortages and achieve efficient supply chain management.

Challenges and points to note in ARIMA models

ARIMA assumes that the data become stationary after differencing, but in reality this does not always hold. Stationary means that statistical properties such as the mean, variance, and autocorrelation do not change over time.

Seasonal data are another issue. A basic ARIMA model does not capture seasonality, so a seasonal extension (SARIMA) or other handling is needed.
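One common way to handle seasonality is seasonal differencing: subtracting the value from the same season one cycle earlier. This is the idea behind the seasonal part of SARIMA models.

The sketch below uses hypothetical quarterly sales with a Q4 peak and steady yearly growth. The repeating pattern disappears after differencing, leaving only the growth.

```python
def seasonal_difference(series, period):
    """Subtract the value from the same season one cycle earlier (period=4 quarterly, 12 monthly)."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Hypothetical quarterly sales: a strong Q4 peak every year plus growth of 5 per year.
quarterly_sales = [100, 80, 90, 130, 105, 85, 95, 135, 110, 90, 100, 140]
print(seasonal_difference(quarterly_sales, period=4))  # → [5, 5, 5, 5, 5, 5, 5, 5]
```

In statsmodels, the `seasonal_order` argument of the ARIMA and SARIMAX classes lets the model apply this kind of differencing and estimate seasonal terms itself.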

Conditions for ARIMA model to be valid

It is suitable for analyzing time series data that are stationary or can be made stationary by differencing.

7. State-Space Model

Overview of State-Space Models

State-space models are a technique for analyzing data that change over time. They assume that hidden states, such as an underlying demand level or trend, evolve over time, and that the observed data are those states plus noise.

A state-space model therefore consists of two parts: a model of how the states change over time, and a model of how the observations are generated from the states. The goal is to estimate the hidden states behind the data, understand characteristics such as fluctuations and periodicity, and forecast future data.
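The simplest state-space model is the local level model. Its hidden "true demand level" follows a random walk, and each observation is that level plus noise. The Kalman filter estimates the hidden level one observation at a time.

The sketch below assumes the two noise variances are already known; in practice they are estimated from the data, which is where the difficulty described below comes in. The demand figures and the function name are hypothetical.

```python
def local_level_filter(observations, obs_var, level_var, init_level=0.0, init_var=1e6):
    """Kalman filter for a local level (random walk plus noise) state-space model.

    State equation:       level_t = level_(t-1) + state noise (variance level_var)
    Observation equation: y_t = level_t + observation noise (variance obs_var)
    Returns the filtered estimate of the hidden level after each observation.
    """
    level, var = init_level, init_var  # a large init_var means "no prior knowledge"
    filtered = []
    for y in observations:
        var += level_var                # predict: level carries over, uncertainty grows
        gain = var / (var + obs_var)    # Kalman gain: how much to trust the new observation
        level += gain * (y - level)     # update: move the estimate toward the observation
        var *= 1 - gain                 # uncertainty shrinks after seeing the data
        filtered.append(level)
    return filtered


weekly_demand = [100, 104, 98, 101, 130, 128, 133, 131]  # hypothetical; demand jumps in week 5
levels = local_level_filter(weekly_demand, obs_var=25.0, level_var=10.0)
print([round(x, 1) for x in levels])
```

The filtered level smooths out week-to-week noise and then adapts within a few weeks to the jump in demand. This is the behavior you want when setting inventory levels.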

Usage example: Inventory management

State space models can be used in inventory management. For example, they can be used to understand the stock status of products and respond to fluctuations in demand. Using this model, future demand can be predicted based on past sales data and inventory levels, and appropriate inventory levels can be maintained. This helps optimize inventory costs and avoid stock-outs, resulting in efficient inventory management.

Challenges and points to note in state-space models

The main challenge is the difficulty of parameter estimation due to the presence of latent variables.

Conditions for validity of state-space model

It is suitable for time series data that can be described in terms of "states" (the underlying conditions at each point in time) and "observations" (the measured values of those states). State-space models are also more flexible than ARIMA and do not necessarily assume stationarity. They can therefore be applied to a wide range of time series patterns and are well suited to capturing complex dynamic systems.

8. Bayesian Network

Overview of Bayesian Networks

A Bayesian network represents the relationships between events and factors as a graph and uses it for probabilistic inference. Each factor or event is a node, and arrows (edges) show which factors directly influence which. Each node has a probability distribution that depends on its parent nodes, and together these define the probability of every combination of events.

The goal is to understand how events and factors affect one another and to calculate the probability of a particular event, including how that probability changes when other factors are known. The model can also be used to predict the impact of different factors on an outcome.
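The sketch below shows what "calculating the probability of an event" looks like in a tiny network. It assumes a hypothetical structure, Price → Purchase ← Features, with made-up probabilities, and answers queries by enumerating every combination of values. Enumeration is only practical for very small networks; dedicated libraries use more efficient inference algorithms.

```python
from itertools import product

# Hypothetical network: Price -> Purchase <- Features (all binary).
P_LOW_PRICE = 0.5
P_RICH_FEATURES = 0.4
# P(purchase = yes | price_is_low, features_are_rich); made-up numbers.
P_BUY = {
    (True, True): 0.6,
    (True, False): 0.3,
    (False, True): 0.4,
    (False, False): 0.1,
}


def joint(low_price, rich, buy):
    """Joint probability from the factorization P(price) * P(features) * P(buy | price, features)."""
    p = P_LOW_PRICE if low_price else 1 - P_LOW_PRICE
    p *= P_RICH_FEATURES if rich else 1 - P_RICH_FEATURES
    p_buy = P_BUY[(low_price, rich)]
    return p * (p_buy if buy else 1 - p_buy)


def query(condition, evidence):
    """P(condition | evidence) by enumerating every combination of the three variables."""
    num = den = 0.0
    for low_price, rich, buy in product([True, False], repeat=3):
        world = {"low_price": low_price, "rich": rich, "buy": buy}
        if not evidence(world):
            continue
        p = joint(low_price, rich, buy)
        den += p
        if condition(world):
            num += p
    return num / den


p_buy = query(lambda w: w["buy"], lambda w: True)
p_rich_given_buy = query(lambda w: w["rich"], lambda w: w["buy"])
print(round(p_buy, 3), round(p_rich_given_buy, 3))  # → 0.32 0.625
```

The second query runs "backwards" along the arrows. Knowing that a customer bought raises the probability that the feature-rich version was involved from 40% to 62.5%. This kind of reasoning from evidence is what distinguishes Bayesian networks from simple forecasting models.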

Usage example: New product development

Bayesian networks can be used in new product development. For example, a company considering the features and pricing of a new product can use a Bayesian network to predict market reaction and evaluate how different combinations of features and prices are likely to be received. By also incorporating competitors' strategies and market conditions, it can formulate an optimal new product strategy. In this way, a Bayesian network can help improve a new product's chance of success and reduce risk.

Challenges and points to note in Bayesian networks

Learning the structure of a Bayesian network from data is computationally expensive, especially for large networks, and a sufficient amount of data is required to produce reliable results.
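One reason structure learning is expensive is the sheer number of candidate structures. A Bayesian network's graph must be a directed acyclic graph (DAG), and Robinson's recurrence counts the possible DAGs on n labeled nodes. The sketch below simply evaluates that recurrence; it is not a structure-learning algorithm.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of distinct directed acyclic graphs on n labeled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(1, 8):
    print(n, count_dags(n))
```

With only 5 variables there are already 29,281 possible structures, and the count grows super-exponentially from there. This is why practical tools rely on heuristic search or constraint-based methods rather than checking every structure, and why domain knowledge that rules out implausible arrows is so valuable.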

Conditions under which Bayesian networks are valid

They are useful when you need to model uncertainty in business decisions using probability theory. They suit problems where the probabilistic relationships between variables can be represented as a graph (network).

In closing

Statistical models bring transparency, predictive accuracy, and decision support to marketing.

However, the right model depends on the problem you want to solve and the resources available to you (knowledge, time, data). The key to success is for business teams and data scientists to work together to understand the problem accurately and choose the right model.

No statistical model can perfectly describe the reality of marketing, but using these models effectively is essential to being able to act with confidence.

It is crucial for data scientists to validate and tune the accuracy of the model. The combination of reliable data and accurate models allows for marketing optimization.

Statistical Modeling FAQs

Q1. Do I need advanced mathematical knowledge to use statistical models?

When using statistical models in practice, not everyone needs to understand the formulas. What is important is to understand the characteristics of each model and the situations in which it can be applied.

Typically, data scientists handle the actual analysis, while marketers focus on defining the problem and interpreting the results. By working together, they can achieve the best results.

Q2. I don't know which model to choose.

Model selection depends on three factors:

The problem to be solved

  • Do you want to make predictions, or understand the factors behind a result?
  • Is the result numeric, binary, or a time series?

Available data

  • Is the sample size sufficient (at least several hundred, ideally several thousand or more)?
  • Data type (numeric, categorical, time series)
  • Data quality (presence of missing values and outliers)

Resources

  • Time available for analysis
  • Available expertise
  • Budget

If you're unsure, we recommend starting with simple models such as multiple regression or logistic regression.

Q3. How reliable are the results of statistical models?

The reliability of a statistical model depends on the following factors:

Data quality

  • Are there enough samples?
  • Is there any bias in the data?
  • Are missing values and outliers handled appropriately?

Model adequacy

  • Has the appropriate model been selected for the task?
  • Are the model assumptions met?
  • Is there overfitting or underfitting?

Verification Process

  • Has the prediction accuracy been verified?
  • Has it been confirmed across multiple datasets?

While perfect predictions are impossible, a well-constructed model can support decision-making with greater accuracy than intuition or experience alone. It is important to regularly verify the accuracy of your model and adjust it as necessary.

Q4. What is the difference between statistical models and AI or machine learning?

Statistical models and machine learning have different goals and approaches.

Statistical models

  • Emphasis on understanding the relationship between causes and effects
  • Can explain why something happened
  • Often work with relatively little data
  • Results are easy to interpret

Machine learning

  • Focus on maximizing prediction accuracy
  • Predicts what will happen
  • Often requires large amounts of data
  • Tends to be a black box

In marketing, it is effective to use both methods appropriately. Statistical models are suitable for understanding factors, while machine learning is suitable for purely improving prediction accuracy.

Q5. How should I explain the results to management?

When explaining the results of your statistical model to management, keep the following points in mind:

Avoid jargon

  • "Coefficient of determination" → "how well the model fits the data"
  • "Statistically significant" → "unlikely to be a coincidence"

Demonstrate business impact

  • Show the impact on sales and profits in monetary terms, not just statistical figures
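  • For example, if the model estimates an advertising elasticity of 0.8, explain it as "a 10% increase in advertising spending is expected to increase sales by about 8%." A correlation coefficient of 0.8 alone does not support this kind of statement.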
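A figure of 0.8 translates into "+8% sales for +10% ad spend" only if it is an elasticity, meaning the slope from a regression of log sales on log advertising spend. The sketch below converts such an elasticity into a percentage and a monetary amount.

The elasticity, baseline sales, and function name are hypothetical. The exact figure (about 7.9%) is close to the simple "10% × 0.8 = 8%" rule of thumb.

```python
def sales_change_pct(elasticity, spend_change_pct):
    """% change in sales implied by a constant elasticity (slope of a log-log regression)."""
    return ((1 + spend_change_pct / 100) ** elasticity - 1) * 100


baseline_sales = 10_000_000  # hypothetical annual sales (yen)
change = sales_change_pct(0.8, 10)
print(f"+{change:.1f}% sales, roughly {baseline_sales * change / 100:,.0f} yen")
```

Presenting the yen amount next to the percentage makes the business impact immediately clear to management.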

Use visualization

  • Make it easy to understand with graphs and diagrams
  • Visualize complex relationships using path diagrams, etc.

Communicate risks and limitations

  • Be honest about the uncertainty of your forecasts
  • Clarify what you know and what you don't know

Q6. How do I evaluate the accuracy of the model?

The method for evaluating a model varies depending on the model type.

For predictive models

  • Measure the difference (error) between the predicted value and the actual value
  • Split the data to check the accuracy of predictions on unknown data
  • Multifaceted evaluation using multiple indicators (coefficient of determination, root mean square error (RMSE), etc.)

For classification models

  • Check accuracy, precision, recall, etc.
  • Use a confusion matrix to see the breakdown of correct and incorrect predictions

General Checks

  • Are the model assumptions met?
  • Are any outliers or anomalous values distorting the results?
  • Is overfitting occurring?

The most important thing is to continually check the accuracy through practical use and continue to improve it.
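The checks above can be sketched in a few small functions:

  • RMSE and the coefficient of determination for predictive models.
  • A confusion matrix with accuracy, precision, and recall for classification models.
  • A simple holdout split that keeps the most recent data as "unknown" test data.

All the numbers are hypothetical and the function names are illustrative. Libraries such as scikit-learn provide tested versions of these metrics.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: typical size of a prediction error, in the target's units."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def confusion_matrix(y_true, y_pred):
    """Counts (true positives, false positives, false negatives, true negatives) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return pairs.count((1, 1)), pairs.count((0, 1)), pairs.count((1, 0)), pairs.count((0, 0))


def classification_scores(y_true, y_pred):
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # of predicted buyers, share who bought
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # of actual buyers, share the model caught
    }


def holdout_split(rows, test_fraction=0.25):
    """Hold out the last rows as unseen test data (keeps time order intact)."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


# Hypothetical numbers for illustration only.
actual_sales = [100, 150, 200]
predicted_sales = [110, 140, 200]
print(round(rmse(actual_sales, predicted_sales), 2), round(r_squared(actual_sales, predicted_sales), 2))

bought = [1, 1, 1, 0, 0, 0, 1, 0]
predicted_buy = [1, 0, 1, 0, 1, 1, 1, 0]
print(confusion_matrix(bought, predicted_buy), classification_scores(bought, predicted_buy))

train, test = holdout_split(list(range(8)))
print(len(train), len(test))  # → 6 2
```

Holding out the most recent rows, rather than a random sample, mimics how the model will actually be used: trained on the past and judged on a period it has not seen.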

Q7. What should I do if the analysis results are different from what I expected?

If the results are different from what you expected, try the following steps:

Double-check the data

  • Check for input errors or abnormal values
  • Are the data collection period and scope appropriate?
  • Is the missing value handling appropriate?

Rethinking the model

  • Is the selected model appropriate for the task?
  • Are the prerequisites met?
  • Are any important variables missing?

Reexamining the hypothesis

  • Was the original hypothesis correct?
  • Are there any factors that I'm overlooking?
  • Are changes in the market environment having an impact?

Unexpected results are also opportunities for new discoveries, so it's important to think outside the box and embrace what the data tells you.

About XICA

XICA has over 10 years of experience in the field of marketing data science, and has a track record of supporting over 280 companies, primarily domestic enterprises.

Data scientists and consultants with expertise in a variety of industries use statistical methods to develop data-driven strategies. We support the use of data for better decision-making across customer understanding, creative analysis and production, media planning, effectiveness verification, and budget optimization.

Please contact us to discuss the optimal statistical model and analysis support based on your company's specific challenges, goals, and available data.
