[For beginners] What is statistics? Explaining what statistics can do using familiar examples

Since the book "Statistics is the Most Powerful Science" was published in 2013, statistics has been attracting growing attention.
However, many people probably still think, "I don't really understand what statistics is in the first place," or "I don't see what statistics is useful for."
XICA, a data science company, was founded in 2012, when statistics was not yet widely known, and has consistently built its business around statistical analysis ever since.
This article is aimed at those who are new to statistics. Using some familiar examples, I would like to explain "What kind of discipline is statistics?" and "What can you do with statistics?"
What is statistics?
What is "Statistics"?
Statistics is a science that uncovers regularities and irregularities in empirically obtained data.
We are surrounded by an endless amount of data, but simply looking at a list of numbers tells us little.
Statistics analyzes these huge amounts of data and identifies the characteristics of the data, regularities and irregularities, and relationships between data. In this way, we can analyze the current situation and predict the future.
Information obtained from data through statistical analysis is used in a wide range of fields, from service development and marketing to company management.
The differences between "statistics," "data science," "data analysis," and "machine learning"
Terms similar to "statistics" that are difficult to distinguish from it include "data science," "data analysis," and "machine learning." These terms cannot be separated completely, but here we briefly summarize their differences and relationships to sharpen our picture of what statistics is.
The difference between statistics and data science/data analysis
First of all, what is "data science"? It is easiest to think of data science as the overall approach to deriving useful insights from data, "data analysis" as a process within it, and "statistics" as one of the methods for analyzing data.
In addition to statistics, methods for analyzing data include information science, algorithms, and mathematical techniques such as linear algebra, differential calculus, and integral calculus.

The difference between statistics and machine learning
Both statistics and machine learning are similar in that they analyze data, discover rules and patterns, and create models.
The biggest difference is their purpose.
Statistics often aims at "explanation," while machine learning often aims at "prediction."
Statistics can also make predictions, and machine learning can also explain things. However, because machine learning often handles large amounts of data with complex models, its analytical process is not intuitive and can become a black box. Statistics, by contrast, can logically explain why a model takes the form it does.
In terms of priorities, machine learning emphasizes "highly accurate prediction," while statistics emphasizes "being able to explain the data" rather than predictive accuracy.
Therefore, machine learning is generally used when the goal is to predict the future as accurately as possible, while statistics is used when you need to consider your next move or decide on a reliable course of action.
In practice, however, these boundaries are not so clear-cut.
Statistics is actually in our daily lives
Definitions of "statistics" tend to rely on technical terms, which can give the impression that it is difficult to understand. In fact, however, statistics is part of our everyday lives.
Here, we will introduce some examples of statistics used in daily life to help you feel more familiar with statistics.
Convenience store

Convenience stores, which have many customers, record purchasing data (POS data) such as customers' age group and gender, which products they buy, at what time of day, and in what combinations.
Companies use this data to decide which products to stock and how to display them, to understand customer demand, and to develop new products.
When you go to a convenience store, you may notice that the product layout has changed or that new products have been released. Often, this is because the store has decided on the layout based on data accumulated from customers' everyday purchases.
TV ratings

Statistics are also used in television ratings. Television ratings are calculated using a method called a "sample survey," based on data collected from several hundred households.
Several hundred households are randomly selected from all households that own a television set as "monitor households," and an automatic measuring device is attached to the television set to automatically collect information on whether the power is on or off and what channels are being watched. Based on this data (sample), the sample survey estimates the characteristics of all households that own a television set. This is how viewer ratings are calculated.
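The estimation step above can be illustrated with a short Python sketch. It assumes simple random sampling and uses the standard normal-approximation interval for a proportion; the population, the 12% figure, the sample size of 600, and the function name `estimate_rating` are all hypothetical, and actual ratings providers may use more elaborate sampling designs and weighting.

```python
import math
import random


def estimate_rating(watching: list[bool], z: float = 1.96) -> tuple[float, float, float]:
    """Estimate the share of all TV households tuned to a program.

    `watching` holds one True/False value per sampled monitor household.
    Returns (estimate, lower, upper), where the bounds form an approximate
    95% confidence interval (normal approximation, z = 1.96).
    """
    n = len(watching)
    p_hat = sum(watching) / n                   # sample proportion = estimated rating
    se = math.sqrt(p_hat * (1 - p_hat) / n)     # standard error of that proportion
    return p_hat, p_hat - z * se, p_hat + z * se


# Hypothetical population: 100,000 TV households, about 12% watching the program.
rng = random.Random(42)
population = [rng.random() < 0.12 for _ in range(100_000)]

# Randomly select 600 "monitor households" and estimate from them alone.
sample = rng.sample(population, 600)
est, low, high = estimate_rating(sample)
print(f"Estimated rating: {est:.1%} (95% CI {low:.1%} to {high:.1%})")
print(f"True rating:      {sum(population) / len(population):.1%}")
```

Changing the seed changes the sample; in most runs the interval contains the true rating, which is why a few hundred randomly chosen households can describe a much larger population.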
The viewer ratings collected in this way can be used to revise the time slots for broadcasting programs, and to create and edit programs that better suit the interests of viewers. They can also be used to more effectively sell advertisements, such as placing advertisements that are targeted to the desired audience.
Estimated score for the Center Mock Exam

Those who have taken university entrance exams may have taken mock exams for the National Center Test for University Admissions. The results sheet for such a mock exam shows a grade such as A or B, along with an estimated score (also called an estimated value): your mock exam score converted into an expected score on the actual National Center Test.
This estimated score is calculated with statistical methods that combine past test-takers' mock exam scores (from last year, the year before, or even earlier) with their scores on the actual National Center Test.
First, a hypothetical distribution is created on the assumption that "test-takers from past mock exams would get the same results if they took the actual National Center Test." Then this year's mock exam score distribution is adjusted to match that hypothetical distribution, and each score is converted accordingly.
This converted score can be thought of as a prediction of the score you would get on the actual National Center Test; in effect, it builds in how much past test-takers' scores changed between the mock exam and the real test. You can use it as a reference when deciding which school to apply to and setting your study goals.
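The article does not say exactly how the distribution is adjusted. One simple possibility, shown here purely as an assumption, is linear (mean-sigma) equating: each mock score keeps its standardized position but is re-expressed in the mean and spread of the hypothetical distribution. The scores and the function name `make_converter` are hypothetical.

```python
from statistics import mean, pstdev


def make_converter(mock_scores: list[float], target_scores: list[float]):
    """Build a function that maps a mock-exam score onto the target distribution.

    Linear (mean-sigma) equating: a score keeps its standardized position
    (how many standard deviations above or below the mean it is), but is
    re-expressed in the target distribution's mean and spread.
    """
    m_mock, s_mock = mean(mock_scores), pstdev(mock_scores)
    m_tgt, s_tgt = mean(target_scores), pstdev(target_scores)

    def convert(score: float) -> float:
        z = (score - m_mock) / s_mock   # position within this year's mock results
        return m_tgt + z * s_tgt        # same position in the hypothetical distribution

    return convert


# Hypothetical data (out of 200 points).
this_years_mock = [98, 112, 120, 125, 131, 140, 152, 160]
hypothetical_actual = [110, 124, 133, 138, 144, 153, 166, 175]

convert = make_converter(this_years_mock, hypothetical_actual)
print(f"Mock score 130 -> estimated score {convert(130):.0f}")
```

Matching only the mean and spread keeps the sketch short; matching percentiles across the whole distribution would follow the description more closely but needs more code.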
What statistics can and cannot do
As you can see, we all come into contact with statistics without even realizing it. So what can statistics do, and what can't it do? Let's look at some examples.
*Some of the examples in this chapter are based on statistics but are carried out in combination with machine learning and physical models.
What statistics can do
Explain complex data in an easy-to-understand way
Statistics is the work of finding features in large amounts of data. It can explain, in simple terms, characteristics that are hard to grasp just by looking at the numbers.
For example, the term "average," which we use in our daily lives, is one example that explains complex data in an easy-to-understand way.
Below is a list of the age ranges of our members. If you were suddenly asked to "tell us what characteristics you can tell from this," it might be difficult to give an immediate answer.

So, let's graph the distribution of these ages and calculate the average. This immediately shows that the average age is 32 and that many members are in their 20s and early 30s.

This is a very simple example, but it shows how statistics can analyze complex data and explain its characteristics in an easy-to-understand way.
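The same summary can be computed in a few lines of Python. The ages below are hypothetical, since the member list itself is not reproduced here, so the average will not come out to exactly 32; `summarize_ages` is an illustrative name.

```python
from collections import Counter
from statistics import mean

# Hypothetical member ages; the article's actual list is not reproduced here.
ages = [23, 24, 25, 26, 27, 28, 29, 29, 31, 32, 33, 34, 36, 41, 52]


def summarize_ages(ages: list[int]) -> tuple[float, dict[int, int]]:
    """Return the mean age and a count of members per decade (20 -> 20s, ...)."""
    by_decade = Counter((age // 10) * 10 for age in ages)
    return mean(ages), dict(sorted(by_decade.items()))


avg, by_decade = summarize_ages(ages)
print(f"Average age: {avg:.1f}")
for decade, count in by_decade.items():
    print(f"{decade}s: {'#' * count}")   # a simple text histogram
```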
Predict what the future holds

Election News
Election-night reporting uses the same methods as public opinion surveys. A candidate may be declared "sure to win" even when only 0% or 1% of the votes have been counted. Some people may have seen this and wondered, "Already?"
Here, too, statistical techniques are at work. If a sufficient number of randomly selected samples is available, the results can be used to infer the overall movement of votes with a fair degree of accuracy, so the outcome can be predicted and a candidate declared "sure to win" even when very few votes have been counted.
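A minimal sketch of such a decision rule, assuming a two-candidate race and a random sample of ballots, is shown below. The 99% level (z = 2.58) and the function `sure_to_win` are hypothetical choices; broadcasters' actual criteria are not described in this article and likely combine several data sources.

```python
import math


def sure_to_win(votes_for_leader: int, sample_size: int, z: float = 2.58) -> bool:
    """Hypothetical 'sure to win' rule for a two-candidate race.

    Treats the sampled ballots as a random sample, estimates the leader's
    vote share, and declares a win only if the lower end of an approximate
    99% confidence interval (z = 2.58) is still above 50%.
    """
    p_hat = votes_for_leader / sample_size
    se = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat - z * se > 0.5


print(sure_to_win(620, 1000))  # 62% of a 1,000-ballot sample → True
print(sure_to_win(520, 1000))  # 52% of a 1,000-ballot sample → False
```

A clear lead can be called from a small sample, while a close race needs far more ballots before the interval clears 50%.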
Weather forecast
Have you ever wondered how weather forecasts can tell us the weather for the next week, or even two weeks ahead? Statistics is actually used in the weather forecasts we see every day.
Weather forecasts are calculated from huge amounts of past weather data. Specifically, an area is divided into small blocks and the weather conditions within each block are measured. Similar patterns are then extracted from past data to predict future conditions. The probability of precipitation is the number of times out of 100 that rain falls when that pattern occurs.
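The "how many times out of 100" idea can be sketched as follows. This is a deliberately simplified illustration that assumes weather patterns can be matched by a label; the pattern names, records, and function name are hypothetical, and real forecasts also rely on numerical weather models.

```python
def probability_of_precipitation(past_cases, current_pattern):
    """Return the chance of rain (in %) for `current_pattern`.

    `past_cases` is a list of (pattern, rained) pairs from historical records.
    Among past cases with the same pattern, count how often it rained and
    express that as "how many times out of 100". Returns None if the
    pattern never occurred before.
    """
    outcomes = [rained for pattern, rained in past_cases if pattern == current_pattern]
    if not outcomes:
        return None
    return round(100 * sum(outcomes) / len(outcomes))


# Hypothetical records: pattern labels are illustrative, not real forecast categories.
history = (
    [("low_pressure_humid", True)] * 70
    + [("low_pressure_humid", False)] * 30
    + [("high_pressure_dry", True)] * 5
    + [("high_pressure_dry", False)] * 95
)
print(probability_of_precipitation(history, "low_pressure_humid"))  # → 70
```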
Statistics and data science are also used to study long-term climate change.
When estimating long-term climate change and its impacts, researchers first create an "emissions scenario" (an assumption about how much of the greenhouse gases that cause climate change will be emitted in the future), and then estimate how the climate will change under that emissions scenario (the result is called a "climate scenario"). Next, they estimate what impacts the climate changes in the climate scenario will have on society, using what is called an "impact model." This is how research into the impacts of long-term climate change is carried out.
Create a winning strategy

Baseball
Another example of statistics that surrounds us is baseball.
For example, a player's batting average, often displayed during baseball games, is calculated by dividing the number of hits by the number of at-bats.
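As a small worked example of this formula, the sketch below computes a batting average and prints it in the usual three-decimal style. The function names are illustrative.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by at-bats."""
    if at_bats == 0:
        raise ValueError("batting average is undefined with zero at-bats")
    return hits / at_bats


def format_average(avg: float) -> str:
    """Baseball convention: three decimals with the leading zero dropped (e.g. .300)."""
    return f"{avg:.3f}".lstrip("0")


print(format_average(batting_average(150, 500)))  # → .300
```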
Recently, the term "baseball statistics" has gradually become better known. Baseball statistics, also known as sabermetrics, is an analytical approach that uses statistics to evaluate players objectively and then uses those evaluations to consider team strategy. Baseball teams sometimes develop their strategies in this way.
Statistical analysis is becoming more and more common not only in baseball but in many other sports.
Help ensure safety

Statistics are also used in areas that are important to our daily lives, such as the development of new drugs and quality control.
New drug development
To confirm a new drug's effectiveness, clinical trials are conducted on patients.
In this study, patients are randomly divided into two groups, one of which is given the new drug, and the other is given an existing drug suitable for comparison, or a placebo that looks and tastes exactly like the new drug. Statistical methods such as "hypothesis testing" are used to determine the effectiveness of the drug from the data obtained from this study.
Specifically, we first establish a hypothesis (the null hypothesis) that "there is no difference in effect between the new drug and the existing drug," and then calculate the probability of obtaining data at least as extreme as the observed data if that hypothesis were true. If this probability is small, we conclude that "something that rarely occurs has occurred" under that hypothesis, and the null hypothesis is rejected as inappropriate. In other words, we judge that "there is a difference in effect between the new drug and the existing drug" (the alternative hypothesis).
The effectiveness of new drugs is verified using statistical theories like these.
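The test described above can be sketched with a two-proportion z-test, one common choice when the outcome is "recovered or not." This is an illustrative assumption rather than the method of any particular trial: the patient counts are hypothetical, the 5% significance level is a convention, and real trials fix their analysis plan in advance and may use other tests.

```python
import math


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int):
    """Two-sided z-test of the null hypothesis that groups A and B share
    the same success (e.g. recovery) rate. Returns (z, p_value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)  # common rate assumed under the null hypothesis
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))       # P(|Z| >= |z|) for a standard normal Z
    return z, p_value


# Hypothetical trial: 60/100 recovered on the new drug, 45/100 on the existing one.
z, p = two_proportion_z_test(60, 100, 45, 100)
alpha = 0.05  # conventional significance level
print(f"z = {z:.2f}, p = {p:.3f}")
print("Reject the null hypothesis" if p < alpha else "Cannot reject the null hypothesis")
```

The p-value here is exactly the "probability of data at least this extreme if there were no difference"; a small value is the "something that rarely occurs" that leads to rejecting the null hypothesis.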
Quality control
Statistical methods are also used in quality control.
For example, one type of inspection used in quality control is sampling inspection: a sample is taken from a group of items called a lot, the sample is tested, and the resulting data is used to decide whether the whole lot passes or fails.
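A single sampling plan can be sketched as below: inspect `n` items and pass the lot if at most `c` are defective. The plan values (50 items, at most 1 defect) and function names are hypothetical. The second function uses the binomial distribution to show how likely a lot is to pass at different true defect rates, which is how such plans are usually evaluated.

```python
from math import comb


def inspect_lot(defects_in_sample: int, acceptance_number: int) -> str:
    """Single sampling plan: pass the lot if the sample has at most
    `acceptance_number` defective items, otherwise fail it."""
    return "pass" if defects_in_sample <= acceptance_number else "fail"


def acceptance_probability(sample_size: int, acceptance_number: int, defect_rate: float) -> float:
    """Chance that a lot with the given true defect rate passes, using the
    binomial distribution for the number of defects in a random sample."""
    return sum(
        comb(sample_size, k) * defect_rate**k * (1 - defect_rate) ** (sample_size - k)
        for k in range(acceptance_number + 1)
    )


# Hypothetical plan: inspect 50 items per lot, pass if at most 1 is defective.
print(inspect_lot(defects_in_sample=2, acceptance_number=1))  # → fail
for rate in (0.01, 0.02, 0.05, 0.10):
    print(f"defect rate {rate:.0%}: P(pass) = {acceptance_probability(50, 1, rate):.2f}")
```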
Provide a basis for business decisions

In the business world, the use of statistics is becoming indispensable. Here are some examples of businesses that have achieved results by using statistics.
Insurance companies stay in business by using statistics
For example, how are so many insurance companies able to stay in business? A key reason is that they estimate the probability of death or illness based on age, gender, and past medical history.
Insurance companies calculate premiums for life insurance, non-life insurance, etc. based on the probability of death and accident occurrence rates obtained from past statistical data. Statistics are used here as well.
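The core arithmetic can be sketched as follows: the premium starts from the expected payout per policyholder (probability multiplied by payout), and a simulation shows why pooling many policyholders makes the total payout predictable. All figures, the function names, and the `loading` markup are hypothetical; real actuarial pricing also uses mortality tables by age and sex, interest rates, and expenses.

```python
import random


def premium(claim_probability: float, payout: float, loading: float = 0.0) -> float:
    """Expected payout per policyholder (claim probability x payout), marked up
    by a hypothetical `loading` for expenses and a safety margin."""
    return claim_probability * payout * (1 + loading)


def simulate_total_claims(n_policies: int, claim_probability: float, payout: float, seed: int = 0) -> float:
    """Simulate one year: each policyholder independently has a claim with the given probability."""
    rng = random.Random(seed)
    n_claims = sum(rng.random() < claim_probability for _ in range(n_policies))
    return n_claims * payout


# Hypothetical figures: 0.1% annual claim probability, 10 million yen payout, 20% loading.
annual = premium(0.001, 10_000_000, loading=0.2)
policies = 100_000
print(f"Premium per policy: {annual:,.0f} yen")
print(f"Premium income:     {annual * policies:,.0f} yen")
print(f"Simulated claims:   {simulate_total_claims(policies, 0.001, 10_000_000):,.0f} yen")
```

With 100,000 policies, simulated claims typically land near the expected 1 billion yen (about 100 claims), within the 1.2 billion yen of premium income in most simulated years; with only a handful of policies, a single 10-million-yen claim would exceed all premiums collected.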
Sushiro optimizes product management with IC chips and statistics
Statistics is also used to generate business results.
For example, Sushiro collects approximately 10 billion pieces of data per year by using IC readers hidden in the conveyor lanes to read IC chips attached to the plates.
This data is used to determine the popularity of sushi items and to adjust the quantity of ingredients ordered, reducing food waste. Furthermore, based on how many times a plate has been read by the IC readers, sushi that has been circulating for too long is removed from the conveyor belt, so the data also helps manage freshness.
DyDo DRINCO's sales increase by 3% thanks to statistical data on "people's gazes"
In addition, DyDo DRINCO has adopted eye-tracking technology to investigate purchasing behavior at its vending machines. Based on this data, it analyzes customers' unconscious behavior, their emotions at the time of purchase, and the deciding factors behind their final choice.
This work enabled more effective product placement and resulted in a 3% increase in sales.
Rakuten Group analyzes data from over 50 businesses
Rakuten Group collects data from about 50 businesses, and its data analysts analyze the products. This data is used for the recommendation function on Rakuten Ichiba and has achieved great results, for example by reducing the frequency of updates to best-selling product rankings and by segmenting categories.
In this way, by utilizing statistics, you can find valuable information that will lead to the next action required to succeed in your business, allowing you to correct the course of your policies and projects.
Rather than relying on experience and intuition, you can judge things from an objective perspective, thereby increasing your chances of success.
Provide evidence for hypotheses and theories

Statistics forms the basis of many academic disciplines, from the humanities, social sciences, and natural sciences (basic sciences) such as physics, economics, sociology, psychology, and linguistics, to applied sciences such as engineering, medicine, and pharmacy.
In these fields, researchers first formulate a hypothesis and then test it to establish whether it holds. Statistics is widely used as a way to provide evidence for ideas that begin as "maybe..."
In this way, statistical theory is used in a wide range of fields, from public opinion surveys and weather forecasts to the development of new medicines that affect our daily lives, helping us interpret society's past and present and predict its future.
What statistics cannot do
It cannot provide the answer
While utilizing statistics opens up a variety of possibilities, there are also some problems that cannot be solved by statistics alone.
That is, finding the answer itself.
The results of statistical analysis are not the answer. New value is created only when humans interpret those results.
For example, new ideas arise only when the suggestions derived from data analysis are combined with human creativity.
It is important to always draw on human creativity, rather than relying solely on statistics and data.
Statistics will become even more important in the future
Statistics is a very important academic field in today's world where we are overwhelmed with data. As in the examples introduced in this article, data analysis using statistics is the basis of corporate activities not only in the research field but also in the business world. It is now essential for business people to keep up with statistical knowledge.
If after reading this article you feel that statistics are at least a little more familiar to you, we encourage you to pick up a book for beginners or consider whether you can incorporate statistics into your own work.
▼ This is a series of articles that explains how to incorporate data analysis into business for people who are new to statistics and data analysis. The articles also introduce recommended books for beginners, so please take a look.
↓ "Data Analysis from Scratch" series
#1 "8 steps of analysis" that beginners should know first
#2 Three analytical techniques that data analysis beginners should remember
#3 Communication tips to get management involved that data analysis beginners should know
#4 What data analysis beginners need to know: What management expects from data analysis and analysts
#5 4 tips for field staff who are new to data analysis to get management and the organization involved