

The Beginner’s Guide to Data Analysis: The 8 Steps of Basic Data Analysis


It is no secret that data analysis is becoming increasingly important in every kind of business. Many companies and organizations are already moving toward data-driven management.

However, getting started can be difficult. You may be held back by thoughts like these:

  • “I actually don’t know what data analysis is.”
  • “I have the data, but I don’t know how to utilize it to solve the issues that I’m facing.”
  • “I am aware of the importance of utilizing data, but I’m not getting the results I want.”
  • “I’m not familiar with statistics, mathematics, or programming, so data analysis is not for me.”

In reality, however, data can often be put to use in business without specialized knowledge.

In this article, we explain the 8 foundational steps of data analysis for readers who have never studied the subject. Along the way, we cover the basics of data analysis, with key points and tips on using data to achieve business results.

Why your business needs data analysis

Speedometer and fuel gauge in a vehicle.

Imagine you are driving a car. While driving, most of us check the speedometer, the fuel gauge, and the navigation system displayed on the meter panel in front of the driver’s seat.

Even without meters, a car will still run and reach its destination. But if asked, “Would you buy a car without meters?”, most people would probably say no.

This is because a meter helps you answer questions such as “Can I make this turn at this speed?” and “Can we reach the destination without a gas refill?” It also allows us to enjoy the pleasure of driving on roads we have never traveled before, and to discover new ways to get around that we didn’t know existed.

The meter is a very important device that helps us get to our destination more efficiently and safely, and shows us new roads.

Thinking about data analysis in a similar way will help us understand why a business needs data analysis. Running a business without data is like driving a car without meters or navigation systems.

There are two main benefits of incorporating data analysis into your business.

The first is a higher chance of success.

Just as your car’s meters tell you a safe speed or how much fuel you need, data can be used to derive strategies with a higher chance of success. This is because data lets us identify which factors affect outcomes and measure by how much. Visualizing and quantifying the relationship between factors and outcomes makes success repeatable.

The second benefit is that data analysis highlights previously overlooked shortcomings and reveals unexpected growth potential. Surprisingly, data is also a source of creativity.

The more data you collect, the more valuable it becomes, because more data allows for more learning and more accurate results. The earlier you start collecting data, the better your chances of business success.
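One standard way to make "more data means more accuracy" concrete is the standard error of a sample mean. Let σ be the spread (standard deviation) of individual observations and n the number of independent observations. The uncertainty of the average then shrinks with the square root of n. This is a general statistical fact offered as an illustration, not a formula from the original article.

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}, \qquad
\frac{\mathrm{SE}_{4n}}{\mathrm{SE}_{n}} = \frac{\sigma/\sqrt{4n}}{\sigma/\sqrt{n}} = \frac{1}{2}
```

For example, collecting four times as much data halves the standard error, assuming the observations are independent.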

In an era of ever-increasing digitization, “data-based decision making” will become more necessary than ever before.

What does it mean to “leverage data”?

Before explaining the specifics of how to leverage data in your business, we will go over what “data,” “data analysis,” and “data utilization” mean respectively.

What is “data”?

If we were to define what “data” is, we could say that it is “the quantification of events happening in the world”. In other words, anything that can be quantified is data.

When asked what data is, some people may imagine purchase data such as “20 boxed lunches were sold on August 1, 2021,” or weather data such as “the average temperature in July 2021 was 25.9 degrees Celsius.”

Data can be broadly classified into “quantitative data (quantitative variables)” and “qualitative data (qualitative variables)”.

Quantitative data is data that can be expressed as numbers with units, such as counts of items, temperature, number of cases, frequency, height, and weight, as in the examples above. Qualitative data, on the other hand, distinguishes categories such as gender, blood type, favorite celebrities, or likes and dislikes. It is represented by labels such as “Yes/No” or “A, B, O, AB” rather than numerical values.

Quantitative and qualitative data.

Advances in technology have made it possible to quantify even qualitative data, and the types of data that can be handled in data analysis are expected to keep increasing.
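The Python snippet below is a minimal sketch of the difference between the two kinds of data. It summarizes a quantitative column with an average and a qualitative column with category counts. It then applies one-hot (dummy) encoding, one long-standing way to turn categories into numbers. The column names and values are hypothetical, invented for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical records: height is quantitative, blood type is qualitative.
rows = [
    {"height_cm": 162.0, "blood_type": "A"},
    {"height_cm": 175.5, "blood_type": "O"},
    {"height_cm": 158.3, "blood_type": "AB"},
    {"height_cm": 170.2, "blood_type": "A"},
]

# Quantitative data: numbers with units can be averaged directly.
avg_height = mean(r["height_cm"] for r in rows)

# Qualitative data: categories are summarized by counting them instead.
blood_counts = Counter(r["blood_type"] for r in rows)

# One-hot encoding: one 0/1 column per category, so qualitative data
# can be fed into numerical analysis such as regression.
CATEGORIES = ["A", "B", "O", "AB"]
one_hot = [{f"is_{c}": int(r["blood_type"] == c) for c in CATEGORIES} for r in rows]

print(f"Average height: {avg_height:.1f} cm")
print(dict(blood_counts))
print(one_hot[0])
```

Each encoded row contains exactly one 1, so no information about the category is lost.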

What is “data analysis”?

Data analysis is the process of extracting information from data. Its methods fall into two broad types: “descriptive statistics” and “inferential statistics”.

Descriptive statistics is a method of analysis in which the collected data is summarized in charts and graphs to make it easier to read and to explore its characteristics.

For example, let’s say we have numerical data on average height and weight for each grade. If this data set is visualized as bar graphs or line graphs, it becomes easier to see the differences in average height per grade, and the relationship between height and weight. In this way, descriptive statistics is an effort to understand the nature of the data collected by visualizing the data in a way that is easy to read.

Inferential statistics, on the other hand, is a method of analysis that extracts information that is not directly visible in the data at hand. It does so by examining a small sample of data to infer the overall trend.

Take, for example, early election projections. Have you ever wondered how a candidate can be declared certain to win with only 1% of the votes counted? Statistical analysis is often compared to soup: a ladleful taken from a well-stirred pot tastes the same as the whole pot. In an early election projection, a small sample of the votes is used to determine the overall trend and decide who has won or lost. This approach of taking a sample to get an overall picture is inferential statistics.

Just as it is difficult to open all election ballots cast nationwide, there are many cases where it is difficult to collect all of the data. In such cases, we use this method of inferential statistics.
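The soup analogy can be sketched in Python, under the following assumptions:

- A hypothetical electorate of 1,000,000 ballots in which candidate X truly holds 52% of the vote.
- A simple random sample of 1% of the ballots, the "ladleful" from a well-stirred pot.
- An approximate 95% margin of error for X's share, using the normal approximation.

Real election projections use more elaborate methods, such as exit polls and stratified sampling; this sketch only illustrates the principle.

```python
import math
import random

def estimate_share(sample):
    """Estimate a vote share from a random sample of ballots (1 = vote for X),
    with an approximate 95% margin of error (normal approximation)."""
    n = len(sample)
    p = sum(sample) / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, margin

# Hypothetical electorate: 520,000 votes for X and 480,000 for others.
population = [1] * 520_000 + [0] * 480_000

rng = random.Random(42)                    # fixed seed so the run is reproducible
sample = rng.sample(population, 10_000)    # simple random sample of 1% of ballots

p, margin = estimate_share(sample)
print(f"Estimated share for X: {p:.1%} ± {margin:.1%}")
if p - margin > 0.5:
    print("Even the low end of the estimate is above 50%: X is projected to win.")
```

The random draw plays the role of stirring the pot. A sample taken only from one district would be like tasting soup from the top of an unstirred pot.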

Differences between descriptive and inferential statistics

What is “data utilization”?

Through data analysis, information can be extracted, such as “height and weight gain are proportional” or “boys’ height growth is greatest between the sixth grade and the first grade of junior high school”.

Data utilization is the process of interpreting the extracted information and acting on it to achieve an objective.

For example, suppose there is an analysis result (information) that “eating food after 8:00 p.m. makes you gain weight”.

If this information is used toward the objective of “gaining x kg,” the action would be to “increase the amount of food you eat at night by y%.” If the objective is instead “to lose weight,” the action would be to “finish eating dinner by a certain time.”

In this way, the interpretation of the analysis results and the actions to be taken will depend on the objective. Setting objectives is essential when utilizing data in business.

The 8 Steps of Data Analysis

The process of utilizing data in business can be broken down into the following eight steps.

The most important thing when utilizing data in business is to follow the process below. Proceed in order from the first step, and if something seems “off” at any step, go back to an earlier step. If you don’t follow this process, your data analysis is more likely to end in failure.

8 steps to data analysis

Step 1: Objective (Clarify your goal)

As mentioned earlier, the interpretation of data and the resulting actions vary greatly depending on the objective. It is therefore important to start by clarifying why you are analyzing the data.

The first step in data analysis is to take the company’s larger goals, such as “to maximize sales” or “to make a new business venture successful,” and translate them into objectives that you personally want to achieve by utilizing data.

Pointer) Always have a sense of purpose.

This may seem obvious at first glance, but in many cases analysis with unclear objectives fails to provide useful insights, even after a great deal of effort and expense has been invested in it. A strong sense of purpose is what carries the work through analysis, decision-making, involving the organization, and implementation.

Step 2: Problem (Identify the issue)

Once the objectives of the data analysis have been identified, the issues that need to be solved to achieve the objectives can also be identified. There are two approaches to identifying issues.

1. When you have a hypothesis about the issue: identify the issue from actual data

e.g. If the objective is to maximize sales:

(i) Break down the components of sales
Diagram breaking sales down into its components.
(ii) Graph each element simply and compare results with costs
Line graph comparing existing sales with new sales.
Line graph comparing the number of conversions with the amount of inflow
(iii) Identify the issue.
Line graph comparing the effectiveness of display and listing advertising.
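Sub-steps (ii) and (iii) can be sketched as a small calculation. The Python below assumes that new-customer sales from advertising break down as inflow × conversion rate × average order value. It uses hypothetical monthly figures for display and listing ads, all invented for illustration. It compares results against costs per channel. It flags a candidate issue when the most cost-efficient channel also has the smallest inflow.

```python
# Hypothetical monthly figures per ad channel (inflow = site visits from the ads).
channels = {
    "display": {"inflow": 8_000, "conversions": 160, "cost_yen": 400_000},
    "listing": {"inflow": 1_500, "conversions": 75, "cost_yen": 150_000},
}
AVERAGE_ORDER_VALUE_YEN = 12_000  # hypothetical

# New-customer sales = inflow x conversion rate x average order value.
metrics = {}
for name, c in channels.items():
    cvr = c["conversions"] / c["inflow"]                      # conversion rate
    metrics[name] = {
        "cvr": cvr,
        "sales_yen": c["inflow"] * cvr * AVERAGE_ORDER_VALUE_YEN,
        "cpa_yen": c["cost_yen"] / c["conversions"],          # cost per acquisition
    }

for name, m in metrics.items():
    print(f"{name}: CVR {m['cvr']:.1%}, sales {m['sales_yen']:,.0f} yen, "
          f"CPA {m['cpa_yen']:,.0f} yen")

# Candidate issue: the most cost-efficient channel brings in the fewest visitors.
best = min(metrics, key=lambda k: metrics[k]["cpa_yen"])
if channels[best]["inflow"] == min(c["inflow"] for c in channels.values()):
    print(f"Candidate issue: low inflow of {best} ads")
```

With these made-up numbers, listing ads convert better and cost less per conversion but bring in few visitors. That is the kind of finding that leads to an issue such as "low inflow of listing ads".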

2. When there is no past data, so a hypothesis about the issue cannot be formed (e.g., when launching a new business): identify the issues you may encounter in the future

e.g. If the objective is to maximize sales:

(i) Formulate a hypothesis for the future

(ii) Use past data with similarities to the hypothesis to infer causality

(iii) Identify the issue

Diagram illustrating the procedure for identifying issues from a future hypothesis (vision to be achieved).

Pointer) Identify the issues using one of the two approaches.

If you try to identify issues without using one of these approaches, you will end up setting issues that have no concrete basis. You then run a high risk of failing to achieve your objectives even if you solve the issues you identified.

Step 3: Hypothesis (Speculate on the factors causing the issue)

Once the issues are identified, we can infer the factors causing them. As in “Step 2: Problem (Identify the issue),” start by identifying and structuring the factors.

e.g. If the issue is “low inflow from listing ads”

1. Identify the elements that make up the inflow from listing ads.
2. Structure the identified elements.
Figure identifying the components of “inflow from listing ads.”

Pointer) After structuring, check two things: “Is the relationship between cause and result correct?” and “Is it MECE (mutually exclusive, collectively exhaustive)?”

If errors creep into the cause-and-result relationships during structuring, or factors are omitted or duplicated, it will be impossible to formulate an accurate hypothesis. After structuring, check the following four points.

  • Make sure KPIs and their factors (measures) are not placed at the same level of the structure
  • Make sure the relationships between issues and KPIs, and between KPIs and their factors, are not reversed
  • Make sure no elements needed to explain the issue are missing
  • Make sure no elements overlap

3. Infer the factors causing the issue

Pointer) When formulating a hypothesis, exchange opinions with team members and people outside the organization.

There are often cases where one’s own experience narrows the scope of a hypothesis, or one proceeds without noticing inconsistencies in the hypothesis. When formulating a hypothesis, it is important to listen to the opinions of people inside and outside the company to enhance the accuracy of the hypothesis.

4. Infer the expected results of the analysis

Pointer) When creating a hypothesis, also assume the expected results of the analysis.

Once a hypothesis is formulated, it is important to consider what kind of analytical results will be obtained if the hypothesis is proven.

For example, if you hypothesize that “TV commercials and newspaper ads are effective in increasing sales,” you should also assume the results of the analysis, such as “TV commercials have an impact on XX% of sales and newspaper ads have an impact on XX% of sales.”
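The structuring checks described in this step can be partly automated. The Python sketch below represents a hypothetical factor tree for "inflow from listing ads". The decomposition and all figures are assumptions for illustration:

- Inflow = impressions × click-through rate.
- Impressions = search volume × impression share.

The sketch runs three checks:

- It flags factors listed under more than one parent (overlaps).
- It detects a factor that reappears below itself (a reversed cause and result).
- It multiplies the leaf values back up to see whether they reproduce the observed KPI, a rough sign that no major element is missing.

```python
from math import isclose, prod

# Hypothetical factor tree: each KPI maps to the factors that multiply to give it.
FACTOR_TREE = {
    "inflow": ["impressions", "click_through_rate"],
    "impressions": ["search_volume", "impression_share"],
}

def find_overlaps(tree):
    """Factors listed under more than one parent (duplicated elements)."""
    seen, overlaps = set(), set()
    for children in tree.values():
        for child in children:
            if child in seen:
                overlaps.add(child)
            seen.add(child)
    return overlaps

def has_reversed_link(tree, node, ancestors=()):
    """True if a node reappears among its own descendants, i.e. a result is
    also listed as one of its own causes."""
    if node in ancestors:
        return True
    return any(has_reversed_link(tree, child, ancestors + (node,))
               for child in tree.get(node, []))

def compute(tree, node, leaf_values):
    """Multiply measured leaf values up the tree to reconstruct a KPI."""
    if node not in tree:
        return leaf_values[node]
    return prod(compute(tree, child, leaf_values) for child in tree[node])

# Hypothetical monthly measurements.
leaves = {"search_volume": 50_000, "impression_share": 0.4, "click_through_rate": 0.03}
observed_inflow = 600

overlaps = find_overlaps(FACTOR_TREE)
reversed_link = has_reversed_link(FACTOR_TREE, "inflow")
reconstructed = compute(FACTOR_TREE, "inflow", leaves)
explains_kpi = isclose(reconstructed, observed_inflow, rel_tol=0.05)

print(f"Overlapping factors: {overlaps or 'none'}")
print(f"Reversed cause/result: {reversed_link}")
print(f"Reconstructed inflow {reconstructed:.0f} vs observed {observed_inflow}: "
      f"{'consistent' if explains_kpi else 'something may be missing'}")
```

These checks catch only mechanical errors. Whether each cause-and-result direction is right in reality, and whether KPIs and measures sit at the right level, still needs human review.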


Step 4: Data (Collect the data necessary to substantiate the hypothesis)

In “Step 4: Data,” we collect the data necessary to substantiate the hypothesis established in Step 3.

Data required for analysis.

Pointer) When collecting data, think about what data is necessary to prove the hypothesis.

If you try to prove your hypothesis using your existing data as-is, the analysis can go wrong in some cases. Consider what form the data needs to take to prove the hypothesis, as in the following example.

e.g. If you want to prove the hypothesis that “sales decrease on hot summer days,” do not use the temperature data as-is; convert it into the form needed to prove the hypothesis.

■ When using the original data as it is

Temperature data such as “30 degrees Celsius on Aug. 1, 32 degrees Celsius on Aug. 2 …”

→If the temperature data is used in the analysis as-is, the results are not limited to summer. You get findings such as “sales decrease/increase by ●● yen for every 1-degree rise in temperature, regardless of the season.” In some cases, this can lead to incorrect conclusions.

■ When using the data in a different form

Data with flags such as “a marker of 1 for days with a temperature of 30 degrees Celsius or higher, a marker of 0 for days with a temperature of less than 30 degrees Celsius.”

→You obtain results that show the relationship between hot days and sales, such as “sales decrease/increase by ●● yen on days when the temperature is 30 degrees Celsius or higher,” so the hypothesis can be verified.
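Converting raw temperatures into the 30 °C flag takes only a few lines. The Python below uses hypothetical August figures, invented for illustration. It builds the 0/1 flag and compares average sales on flagged days with the other days. A plain difference in averages ignores other influences such as weekdays or promotions. A common next step is to add the flag as a dummy variable in a regression alongside those factors.

```python
from statistics import mean

# Hypothetical daily data for one week in August.
temps_c = [28.5, 30.0, 32.1, 27.9, 31.4, 29.6, 33.0]
sales_yen = [52_000, 47_000, 44_500, 53_500, 46_000, 51_000, 43_000]

HOT_THRESHOLD_C = 30.0

# Flag: 1 for days at 30 degrees Celsius or higher, 0 otherwise.
hot_flag = [1 if t >= HOT_THRESHOLD_C else 0 for t in temps_c]

hot_days = [s for s, f in zip(sales_yen, hot_flag) if f == 1]
other_days = [s for s, f in zip(sales_yen, hot_flag) if f == 0]
difference = mean(hot_days) - mean(other_days)

print(f"Hot-day flags: {hot_flag}")
print(f"Sales on hot days differ by {difference:+,.0f} yen on average")
```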

Step 5: Analysis (Analyze the collected data)

Analyze the collected data using appropriate analytical methods to verify the hypothesis.
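Which method is "appropriate" depends on the hypothesis. One common example is multiple regression, sketched below for a hypothesis that TV commercials and newspaper ads drive sales. The sketch uses only the Python standard library, fitting ordinary least squares via the normal equations. The data is simulated with known effects, so you can see the method recover them. The spend figures, coefficients, and the simple contribution calculation are assumptions for illustration, not the author's specific method.

```python
import random

def solve(a, b):
    """Solve the small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(features, y):
    """Ordinary least squares with an intercept, via (X^T X) beta = X^T y."""
    X = [[1.0, *row] for row in features]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Simulated weekly data: sales = 100 + 3.0*TV + 1.5*newspaper + noise (hypothetical).
rng = random.Random(0)
tv = [rng.uniform(0, 50) for _ in range(200)]
news = [rng.uniform(0, 30) for _ in range(200)]
sales = [100 + 3.0 * t + 1.5 * p + rng.gauss(0, 5) for t, p in zip(tv, news)]

base, b_tv, b_news = ols(list(zip(tv, news)), sales)

# Share of total sales attributed to each factor by the fitted model.
total = sum(sales)
shares = {
    "base": base * len(sales) / total,
    "TV": b_tv * sum(tv) / total,
    "newspaper": b_news * sum(news) / total,
}
print(f"Estimated effects: TV {b_tv:.2f}, newspaper {b_news:.2f} per unit of spend")
for name, share in shares.items():
    print(f"{name}: {share:.0%} of sales")
```

Because the model includes an intercept, the fitted values sum to actual sales, so the shares add up to 100%. These shares correspond to the "XX% of sales" form of expected result from Step 3. Real marketing data raises issues this sketch ignores, such as carry-over effects of advertising and correlated spending across media.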

Step 6: Interpretation (Review the entire process from the objective to the analysis for any discrepancies)

The interpretation step is the final check that determines whether or not to take action. Think of it as looking back to confirm that everything from the objective to the analysis is in order.


Pointer) Review the “hypothesis” and “assumed analysis results” established in Step 3 and compare them with the actual analysis results.

(It is not necessary to go back and check each phase.)

1. When both the hypothesis and the results of the analysis differ from what was initially expected

(1) Go back to “Step 1: Objective” and reconfirm the objective of the analysis and the issues that were identified.

(2) Re-establish the hypothesis based on a causal relationship different from the initial hypothesis.

(3) Repeat Steps 4 through 6.


2. When the hypothesis is correct, but the results of the analysis differ from what was originally expected

(1) Go back to “Step 4: Data” and check if the data is correct.

(2) Collect the correct data.

(3) Proceed with “Step 5: Analysis” and “Step 6: Interpretation” again.
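The comparison in this step can be sketched as a small check. The Python below compares the contribution shares assumed in Step 3 with those obtained in Step 5. It then suggests where to go back, following the two cases above. Whether the hypothesis still holds is passed in as a human judgment. The share values and the 5-point tolerance are hypothetical choices for illustration. Treating every case where the hypothesis fails as case 1 is also an assumption of this sketch.

```python
def mismatches(expected, actual, tolerance=0.05):
    """Factors whose actual share differs from the Step 3 assumption by more
    than `tolerance` (absolute); a factor missing from `actual` counts as 0."""
    return {k for k, v in expected.items() if abs(actual.get(k, 0.0) - v) > tolerance}

def where_to_go_back(hypothesis_holds, differing):
    """Route the review according to the two cases described in Step 6."""
    if not hypothesis_holds:
        # Case 1: hypothesis and results both differ -> revisit objective and issues.
        return "Go back to Step 1: Objective, re-hypothesize, then redo Steps 4 to 6"
    if differing:
        # Case 2: hypothesis correct, results differ -> check the data.
        return "Go back to Step 4: Data, then redo Steps 5 and 6"
    return "Proceed to Step 7: Involvement"

# Hypothetical shares: assumed in Step 3 vs. obtained in Step 5.
assumed = {"TV": 0.40, "newspaper": 0.10}
actual = {"TV": 0.38, "newspaper": 0.25}

differing = mismatches(assumed, actual)
print(differing)
print(where_to_go_back(hypothesis_holds=True, differing=differing))
```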


Step 7: Involvement (Manage your organization based on the data)

Once the hypothesis is proven, proceed to implement the action, involving the organization.


Step 8: Execution (Execute the action decided upon)

In Conclusion

These are the 8 steps of data analysis.

If these 8 steps are not followed in order, data analysis easily becomes aimless and is unlikely to lead to action.

In particular, the worst mistake is to run the steps out of order. One example is starting with the kind of analysis you want to do, then collecting data, and only then formulating a hypothesis. Data analysis that does not lead to action cannot be used in business and ends up as analysis for analysis’s sake.

As you can see from these 8 steps, only “Step 5: Analysis” requires expertise in mathematics and statistical analysis. What is important in data analysis is a clear sense of purpose, highly accurate hypotheses, and the imagination and creativity that experience fosters.

If you are already involved in the business, you probably have all of these things.

If you are doing data analysis for the first time, we encourage you to work through all 8 steps.

XICA Co., Ltd. CEO Yoshiaki Hirao


A graduate of the Faculty of Policy Management at Keio University. Witnessing the bankruptcy of his father’s company drove Yoshiaki, while at university, to discover the potential of supporting management through statistical analysis. After spending some time as a musician, he founded XICA Co., Ltd. in February 2012.
