Understanding the causal structure of marketing strategies using DAG (Directed Acyclic Graph)

This series focuses on causal inference. It covers the basic concepts and practical approaches to verifying the effectiveness of marketing measures. In this second article, we explain how to visualize causal structures and organize hypotheses using a directed acyclic graph (DAG). Understanding the basics of DAGs lets you grasp the causal relationships between variables intuitively and clarify the factors behind the results of a measure. We also introduce analysis designs that help avoid erroneous conclusions. These include closing backdoor paths (non-causal paths that create an apparent association between two variables) and controlling for confounding factors (external variables that affect both the explanatory variable and the target variable). Open backdoor paths and uncontrolled confounders can both prevent correct inference of causal relationships.
The articles in this series are listed below. Please also refer to the other articles when putting causal inference into practice.
- Part 1: "The Basics and Importance of Causal Inference in Marketing": Explains the basics of causal inference in marketing
- Part 2: "Understanding the causal structure of marketing strategies using DAG (Directed Acyclic Graph)" (this article): Explains the basic principles for correctly understanding causal relationships
- Part 3: "How to use causal inference in marketing practice: Evidence from observational data analysis": Learn how causal inference can be applied to practical effectiveness testing
What is a DAG?
In causal inference, a directed acyclic graph (DAG) is used to visualize the relationships between variables and to check whether hypotheses are valid. A DAG is like a "map of cause and effect": variables are represented as nodes and causal relationships as arrows (edges). When analyzing the effectiveness of marketing measures, this tool lets you organize complex causal relationships systematically, which makes the effect of a measure easier to identify. DAGs are especially valuable when a measure affects multiple factors, so drawing a DAG to organize the causal structure should be the first step in causal inference. Once your hypotheses are organized in a DAG, choose the appropriate data to analyze and the appropriate analysis method.
Main notation rules
- Node: Variables (advertising costs, sales, etc.) are drawn as circles.
- Edge (arrow): Indicates a direct causal relationship. In A → B, A is the cause and B is the effect.
- Acyclic: There must be no loops. For example, a loop such as A → B → C → A is not allowed.
- Conditioning: A box drawn around a node indicates that the variable is conditioned on (more on this below).
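The "no loops" rule can be checked mechanically. Below is a minimal Python sketch. It assumes a DAG is stored as a plain dictionary mapping each node to the nodes its arrows point to. The function name `is_acyclic` and the node names are illustrative only and do not come from any library. The sketch uses Kahn's algorithm. It repeatedly removes nodes that have no remaining incoming arrows. If some nodes can never be removed, they lie on a loop.

```python
from collections import deque


def is_acyclic(edges: dict[str, list[str]]) -> bool:
    """Return True if the directed graph has no cycles (Kahn's algorithm)."""
    # Collect every node, including nodes that only appear as arrow targets
    nodes = set(edges)
    for children in edges.values():
        nodes.update(children)

    # Count the incoming arrows of each node
    indegree = {node: 0 for node in nodes}
    for children in edges.values():
        for child in children:
            indegree[child] += 1

    # Repeatedly remove nodes that have no remaining incoming arrows
    queue = deque(node for node in nodes if indegree[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in edges.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Nodes that could never be removed sit on a cycle
    return removed == len(nodes)


# Hypothetical marketing DAG: TV commercial -> brand awareness -> sales
marketing_dag = {
    "tv_commercial": ["brand_awareness"],
    "brand_awareness": ["sales"],
}
# The forbidden loop from the rules above: A -> B -> C -> A
loop = {"A": ["B"], "B": ["C"], "C": ["A"]}

print(is_acyclic(marketing_dag))  # True
print(is_acyclic(loop))           # False
```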
Four basic causal structures
DAGs are drawn according to the rules above and are built from four basic structures. The arrows in a DAG represent causal relationships rather than statistical correlations. Their directions must therefore be decided from domain expertise, prior knowledge, the literature, and so on. In return, once the structure is drawn, a DAG lets you read off which variables should be associated and which associations are causal without any numerical calculation.
1. Complete independence
This is a structure in which no path of any kind connects A and B.
For example, suppose social media (SNS) advertising (A) has no effect at all on sales of a certain product (B), and nothing links the two. That relationship falls under "complete independence." If sales have risen in such a situation, factors other than SNS advertising must be examined to explain the increase.

2. Chain
This is a structure in which A and B are linked by a "causal chain" of arrows that all point in one direction, from A through an intermediate variable M to B (A → M → B). Because the arrows in a DAG go from cause to effect, this path represents a serial, indirect causal relationship that passes through M.
For example, if a TV commercial (A) increases brand awareness (M), which in turn increases sales (B), this can be represented as shown in the diagram below, where M (the mediator) acts as an intermediate variable in the causal chain.

3. Fork
This is a structure in which a common variable C has arrows pointing into both A and B (A ← C → B). C is a common cause of A and B and is often called a "confounding factor."
For example, if temperature (C) affects both ice cream sales (A) and the number of beachgoers (B), it can be expressed as shown in the diagram below. In this case, A and B are correlated even though neither causes the other. Without organizing the structure in a DAG, you might wrongly infer a causal relationship between ice cream sales and the number of beachgoers.

4. Collider
This is a structure in which arrows from A and B both point into a common result variable D (A → D ← B). D is called a "collider variable," meaning that A and B are both causes of D.
For example, if the number of new customers acquired (A) and the drop-off rate of existing customers (B) both affect sales (D), this can be represented as shown in the diagram below.
Conditioning on the collider D, for example by analyzing only periods with high sales, can create a spurious correlation between A and B even when they are independent. This bias is called "collider bias."

Backdoor paths and controlling for confounding factors
In causal inference, it is important to control for confounding factors (see "The Basics and Importance of Causal Inference in Marketing"). The strength of DAGs is that concepts such as "backdoor paths" and "collider bias" let you design analyses that reliably control for confounding factors. Here, we explain how to control for confounding factors using backdoor paths as an example.
What is a backdoor path?
A backdoor path is "another path that distorts the causal relationship from A to B." More precisely, it is a non-causal path between A and B that begins with an arrow pointing into A. For example, suppose you want to estimate the effect of advertising expenses (A) on coat sales (B), and seasonality (C) affects both A and B, as in the fork structure below. Seasonality then acts as a confounding factor and forms a path that distorts the causal relationship from A to B.
More specifically, suppose coat sales rise in winter (B↑) and advertising expenses also rise in winter (A↑). It may then look as though "sales increased because advertising expenses increased," when in reality the season may simply be affecting both. A path between A and B other than the direct effect from A to B (here, A ← C → B) is called a backdoor path. In this example, the direct effect of A on B cannot be distinguished from the association that C creates between A and B through the backdoor path. In this state, the backdoor path is said to be open.
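Backdoor paths can be listed mechanically from a DAG. Below is a minimal Python sketch, assuming the same dictionary representation as a plain "node → list of nodes its arrows point to." The function `backdoor_paths` and the node names are hypothetical, written only for illustration. The function finds every simple path between treatment and outcome whose first edge points into the treatment. It only lists these paths. It does not check whether a collider on a path already blocks it, which dedicated tools handle.

```python
def backdoor_paths(edges: dict[str, list[str]], treatment: str, outcome: str) -> list[list[str]]:
    """List every simple path from treatment to outcome that starts with an arrow INTO treatment."""
    # Undirected view of the DAG: who is connected to whom, ignoring arrow direction
    neighbours: dict[str, list[str]] = {}
    for parent, children in edges.items():
        for child in children:
            neighbours.setdefault(parent, []).append(child)
            neighbours.setdefault(child, []).append(parent)

    found: list[list[str]] = []

    def walk(node: str, path: list[str]) -> None:
        # Depth-first search over simple paths (no node visited twice)
        if node == outcome:
            found.append(path)
            return
        for nxt in neighbours.get(node, []):
            if nxt not in path:
                walk(nxt, path + [nxt])

    # Only paths whose first edge is "neighbour -> treatment" are backdoor paths
    for neighbour in neighbours.get(treatment, []):
        if treatment in edges.get(neighbour, []):
            walk(neighbour, [treatment, neighbour])
    return found


coat_dag = {
    "season": ["ad_spend", "coat_sales"],  # C -> A and C -> B
    "ad_spend": ["coat_sales"],            # A -> B, the effect we want
}
paths = backdoor_paths(coat_dag, "ad_spend", "coat_sales")
print(paths)  # [['ad_spend', 'season', 'coat_sales']]
```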

Controlling for confounding factors
So how do we eliminate the effects of confounding factors? One solution is to close the backdoor path by conditioning on the confounding factor C. In a DAG, a conditioned variable is drawn with a box around it, as shown in the figure below. Conditioning specifically means holding the value of that variable fixed. In the example above, this would mean restricting the analysis to winter data only, or to non-winter data only. The season is then constant within the analyzed data, so its effect is removed and the effects of other factors, such as advertising expenses, become clearer.
Alternatively, you can condition on seasonality by including it as a variable in a regression analysis. This closes the backdoor path. It separates the influence of C on A and B from the causal effect of A on B, so that the latter can be estimated. A DAG thus lets you visually organize the causal structure and check whether any confounding factors exist. If they do, it also helps you choose an analysis design that removes their effects before making causal inferences.

Summary
Key points to remember
- Drawing a DAG, with variables as nodes and causal relationships as arrows, makes your causal assumptions explicit and helps distinguish causal effects from mere correlation.
- Understanding confounding factors and backdoor paths reduces the risk of erroneous conclusions and enables more accurate evaluation of measures.
- Understanding causal inference from basic theory to application will help you communicate smoothly with data scientists and design analyses using actual marketing data.
In this way, organizing causal structures with DAGs is an essential skill for significantly improving the quality of marketing decision-making. The next article, "How to use causal inference in marketing practice: Evidence from observational data analysis," focuses on specific analytical methods and explains how to apply them to actual measures. If you would like to put the theory of causal inference into practice, please continue reading.