What is big data? Its meaning, use cases, analysis methods, etc. [Basic knowledge you need to know now]


The term "Big Data" became a big trend in Japan between 2011 and 2012, and was even nominated for the New Words and Buzzwords Award in 2013. More than 10 years have passed since then, and it seems like we hear it less and less.

However, this does not mean that the importance of big data has decreased. Now that an increasing number of business people are involved in data analysis, it is time to re-examine the meaning of big data, its use cases, and its analysis methods.

The use of big data is no longer limited to large or advanced companies; it is a benefit of IT that all companies can enjoy.

In this article, we explain the basics of big data, a topic that matters more now than ever, in an easy-to-understand way. Please use it to understand why big data is worth using and how it is being used.

What is Big Data?

Big data is, literally, a huge collection of data. The amount of data in the world does not grow in proportion to time; it grows exponentially.

The graph below is an excerpt from materials presented at Intel Architecture Day 2020.

Figure: Trend of data growth worldwide
Source: Intel Architecture Day 2020 | https://player.vimeo.com/video/447304765?dnt=1&app_id=122963 | Intel
*The graph appears at around the 12:00 mark of the linked video.

This graph shows how the amount of data in the world is growing. By 2025, the world's data is expected to reach a massive 175 zettabytes (ZB). A zettabyte is 1 billion times larger than a terabyte.

To help you visualize just how massive 175 zettabytes of data is, let's take an example.

For example, the New York Stock Exchange in the United States generates about 1 terabyte of trading data per day. In those terms, the 175 zettabytes expected by 2025 would equal roughly 480 million years' worth of NYSE trading data.
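The comparison above is simple arithmetic. The short Python sketch below checks it under one assumption: decimal (SI) units, so 1 TB = 10^12 bytes and 1 ZB = 10^21 bytes. It also ignores leap years.

```python
# Back-of-the-envelope check of the NYSE comparison.
# Assumption: decimal (SI) units, 1 TB = 10**12 bytes and 1 ZB = 10**21 bytes.
TB = 10**12
ZB = 10**21

global_data_2025 = 175 * ZB   # projected worldwide data volume in 2025
nyse_per_day = 1 * TB         # approximate NYSE trading data per day

days = global_data_2025 // nyse_per_day   # number of "NYSE days" in 175 ZB
years = days / 365                         # rough estimate, leap years ignored

print(f"1 ZB = {ZB // TB:,} TB")
print(f"{days:,} days, about {years / 1e6:,.0f} million years")
```

With these assumptions the result is about 479 million years. With binary units (TiB and ZiB), the figure would shift somewhat.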

Unimaginably large amounts of data, such as the example given above, are called big data.

The "Five V's" of Big Data

However, if big data is simply defined as "the massive amount of data generated all over the world," companies will not be able to make progress in utilizing big data.

For this reason, big data is commonly defined around the world in terms of the "Five V's."

The 5 Vs of Big Data

1. Volume (total amount of data)

2. Velocity (real-time data)

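3. Variety (diversity of data types and formats)

4. Veracity (accuracy and reliability of the data)

5. Value (business value that can be derived from the data)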

This definition adds two more Vs (4 and 5) to the "three Vs" (1 to 3) proposed by Douglas Laney, a former Gartner VP analyst. When deciding whether your own company's data qualifies as big data, use these "five Vs" as one of the criteria (*1).

(*1) Data that lacks some of the five V's listed here is not necessarily excluded from being big data. For example, even if the total volume is only about 1 terabyte, data that meets the other four V's can still serve business purposes as big data.

The difference between structured and unstructured data

Big data is broadly made up of two types of data: "structured data" and "unstructured data."

Structured data

Data that can be organized into a table with rows and columns, making it easy to query (search by specifying conditions).

Example: Data compiled in Excel, CSV, or relational databases

Unstructured Data

Data that has no regularity and cannot be organized in a table format. It has a high degree of freedom and is important in AI development and machine learning.

Examples: Chat, SNS posts, videos, IoT data, satellite images, etc.

In recent years, unstructured data has been attracting attention as a source of big data. However, unstructured data is difficult to work with and requires data science expertise.
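The difference is easier to see in practice. Below is a minimal Python sketch using small, made-up data; the column names and posts are hypothetical. The structured data can be queried directly by condition. The unstructured posts must first be converted into something countable. Simple word counts are used here, which is only one of many possible preprocessing steps.

```python
import csv
import io
import re
from collections import Counter

# Structured data: rows and columns (a small hypothetical CSV held in memory).
# Every row has the same fields, so a condition-based query is straightforward.
csv_text = """customer_id,age,prefecture,purchases
1,34,Tokyo,5
2,28,Osaka,2
3,45,Tokyo,8
"""
rows = list(csv.DictReader(io.StringIO(csv_text)))
tokyo_heavy_buyers = [r["customer_id"] for r in rows
                      if r["prefecture"] == "Tokyo" and int(r["purchases"]) >= 5]
print(tokyo_heavy_buyers)  # ['1', '3']

# Unstructured data: free text such as SNS posts has no fixed columns,
# so it must first be turned into countable features (here, word frequencies).
posts = [
    "Love the new cafe near Shibuya Station!",
    "Shibuya Station was so crowded today",
    "New cafe menu is great",
]
words = Counter(w for post in posts for w in re.findall(r"[a-z]+", post.lower()))
print(words.most_common(3))
```

Real unstructured data (images, video, long documents) needs much heavier processing. That extra work is why data science expertise matters here.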

Attention is focused on the use of unstructured data

It is said that approximately 80% of the data generated in the world today is unstructured.

The spread of the internet and social media, the increase in IoT devices, and advances in AI development are driving an increasing number of companies to use unstructured data to find new value in their business.

Unstructured data can be difficult to handle, but if the power of data science can make it possible, it can become a major driving force for companies' business.

For details on using unstructured data, please refer to the section "Expanding use cases of big data."

Big Data Analysis Now Available

Data science expertise is essential to prepare data that does not have a specific structure, such as unstructured data, so that it can be analyzed. However, there are various tools available that make it possible to find insights from big data, even if you do not have data science expertise.

For example, with Google Trends, a tool for checking Google search trends, you can analyze trends related to how many people use a station.

Figure: Trends for "Shibuya Station" over the past five years | Google Trends

The graph above shows the trend for "Shibuya Station" over the past five years on Google Trends. The trend dropped sharply between December 2019 and May 2020, the period when the COVID-19 pandemic spread rapidly and a state of emergency was declared.

After that, the trend for "Shibuya Station" rose and fell repeatedly, and the trend index returned to pre-COVID levels around February 2023. Incidentally, the dips in the graph coincide exactly with the first through seventh waves of infection.

In this way, anyone with unstructured data and the tools to visualize it can gain insights from big data.

What modern business people need most is the ability to analyze and interpret data with visualization tools like Google Trends, think about how to apply the findings to their business, and then put them into action.

Big data is essential for the digital shift of Japanese companies

It is believed that the future of Japan's digital industry will depend greatly on whether or not it can promote the use of big data.

In its "DX Report," published in September 2018, the Ministry of Economy, Trade and Industry sounded the alarm about the "2025 cliff" awaiting the Japanese economy.

A chart explaining the "2025 Cliff"
Source:DX Report - Overcoming the "2025 Cliff" of IT Systems and Full-Scale Deployment of DX - | Ministry of Economy, Trade and Industry

The "2025 Cliff" is a scenario in which delays in using IT and data result in economic losses of up to 12 trillion yen per year between 2025 and 2030.

In order to avoid this worst-case scenario, a fundamental review of the IT environment and promotion of digital transformation that enables data utilization are required. 

In order to promote DX, it is essential not only to fundamentally review the IT environment, but also to conduct big data analysis and take steps to create new products and services, new businesses, and a new corporate culture.

In other words, big data is essential to the digital shift of Japanese companies. It also plays an extremely important role in promoting DX and avoiding the "2025 cliff."

Big data expands the possibilities of AI and IoT

Any discussion of big data would be incomplete without AI (Artificial Intelligence) and IoT (Internet of Things).

AI covers a wide range of research fields, and among them, machine learning and deep learning are closely related to big data.

In machine learning, huge amounts of data are fed into a program, which learns the characteristics of that data so the AI can analyze and make predictions about new data. Deep learning is a subset of machine learning that uses multi-layered neural networks. It likewise learns from huge amounts of data and makes it possible to develop AI that can make decisions autonomously.

ChatGPT, which has been a hot topic in recent years, is the result of machine learning and deep learning, and big data is also used to train ChatGPT and improve its accuracy.

IoT refers to devices that enable data collection through sensors and internet communication, and is also closely related to big data. The data generated by IoT is itself big data, and through analysis, various insights and services can be provided to users.

Big data, AI, and IoT are mutually complementary technologies that will support the digital industry now and in the future.

The use of big data combined with AI and IoT is gradually spreading in Japan. We will introduce some specific examples of its use.

Planning, development and production of products and services

By analyzing consumer data and corporate purchasing data generated around the world, companies can understand the needs of consumers and businesses and plan and develop products and services that meet those needs.

The use of big data, AI, and IoT is also increasing in manufacturing, as exemplified by "Industry 4.0," the national strategic project announced by the German government in 2011.

By building a sales prediction model for products and services through this kind of big data analysis, more efficient planning, development, and production can be achieved.

Servitization

Servitization is a shift in business model in which products that were previously sold as "things" are given new added value and provided as "services."

For example, Rolls-Royce's "Power by the Hour" is a prime example of servitization, providing aircraft engines on a subscription basis.

By equipping the engine with sensors and connecting it to the IoT, it is possible to calculate the energy used to propel the aircraft and provide this as a pay-as-you-go service.
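The billing side of such a pay-as-you-go model can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the sensor-log format, the engine IDs, and the hourly rate. This is not a description of Rolls-Royce's actual system. The sketch totals the operating hours reported by each engine's sensors and bills by usage.

```python
from collections import defaultdict

# Hypothetical IoT sensor log: (engine_id, flight_id, operating_hours).
sensor_log = [
    ("ENG-1", "FL-100", 2.5),
    ("ENG-1", "FL-101", 3.0),
    ("ENG-2", "FL-200", 6.25),
]
HOURLY_RATE = 1_000  # hypothetical fee per engine operating hour (any currency)

def monthly_invoice(log, rate):
    """Bill each engine by its measured operating hours (pay-as-you-go)."""
    hours = defaultdict(float)
    for engine_id, _flight_id, op_hours in log:
        hours[engine_id] += op_hours
    return {engine: h * rate for engine, h in hours.items()}

print(monthly_invoice(sensor_log, HOURLY_RATE))
# {'ENG-1': 5500.0, 'ENG-2': 6250.0}
```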

Smart Agriculture and Fishing

In recent years, big data has also been used in agriculture and fishing, attracting attention as "smart agriculture" and "smart fishing."

Big data analysis has made it possible to determine the optimal harvest time and to fish more efficiently. Its use is even spreading to retail, for example in building sales models for direct shipments from producers.

In both smart agriculture and smart fishing, it will be possible to achieve "data-driven harvesting and fishing" that does not rely on intuition or experience.

Improving CX (Customer Experience)

Web services and e-commerce sites analyze the customer data collected through their services and sites and use it to improve CX.

A familiar example is the "recommendation function," which uses big data to identify similarities between customers and automatically recommend products and services to them.
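One common way to "identify similarities between customers" is user-based collaborative filtering with cosine similarity. Below is a minimal Python sketch of that technique on a hypothetical purchase matrix; the customer and product names are made up. Real recommendation engines use far larger data and often different algorithms. The sketch scores each product a customer has not bought by the similarity of the customers who did buy it.

```python
import math

# Hypothetical purchase matrix: 1 = customer bought the product, 0 = did not.
purchases = {
    "alice": {"A": 1, "B": 1, "C": 0, "D": 0},
    "bob":   {"A": 1, "B": 1, "C": 1, "D": 0},
    "carol": {"A": 0, "B": 1, "C": 0, "D": 1},
}
products = ["A", "B", "C", "D"]

def cosine(u, v):
    """Cosine similarity between two customers' purchase vectors."""
    dot = sum(u[p] * v[p] for p in products)
    norm_u = math.sqrt(sum(u[p] ** 2 for p in products))
    norm_v = math.sqrt(sum(v[p] ** 2 for p in products))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(target):
    """Rank products the target has not bought, weighting each buyer by similarity."""
    scores = {}
    for other, items in purchases.items():
        if other == target:
            continue
        sim = cosine(purchases[target], items)
        for p in products:
            if purchases[target][p] == 0 and items[p] == 1:
                scores[p] = scores.get(p, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("alice"))  # ['C', 'D']
```

Alice's purchases overlap more with Bob's than with Carol's. As a result, Bob's extra product (C) ranks above Carol's (D).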

In the future, customer data is expected to be collected and analyzed (in anonymized form) across the entire internet, not only within specific services or sites. This should improve CX and make services more comfortable to use.

Enhanced compliance and security

It is said that leaks of personal and confidential information are "more likely to result from internal fraud or operational errors than from external cyber attacks."

Companies can analyze big data such as internal data-usage and fraud patterns, as well as patterns of information leaks caused by operational errors. This lets them develop systems that prevent security incidents caused by internal fraud or operational errors.
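A very simple version of this idea is to compare each employee's latest activity against their own historical baseline. Below is a minimal Python sketch using a z-score on hypothetical daily download counts. The employee IDs, the data, and the threshold of 3 standard deviations are all assumptions. Commercial security products use much richer signals and models.

```python
import statistics

# Hypothetical daily file-download counts per employee (internal usage log).
downloads = {
    "emp01": [12, 15, 11, 14, 13, 12, 95],   # sudden spike on the last day
    "emp02": [20, 22, 19, 21, 23, 20, 22],
}

def flag_anomalies(history, threshold=3.0):
    """Flag the latest value if it exceeds the employee's own baseline
    (all earlier days) by more than `threshold` standard deviations."""
    baseline, latest = history[:-1], history[-1]
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        return latest != mean
    return (latest - mean) / stdev > threshold

for emp, history in downloads.items():
    if flag_anomalies(history):
        print(f"review {emp}: unusual download volume")
```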

In fact, security software development companies around the world are working to improve their security products through big data analysis.

Optimizing marketing activities

In digital marketing, which has become mainstream in recent years, the success or failure of marketing activities depends on whether or not the big data generated after a campaign can be processed and analyzed in real time.

For example, MMM (Marketing Mix Modeling) is a statistical method that measures the effectiveness of marketing initiatives across both online and offline channels.

This makes use of big data generated by marketing platforms such as digital advertising and social media.

Ad optimization by importing external data

One use of big data attracting attention in the advertising business is the DMP (Data Management Platform).

DMPs can combine first-party data held by a company with third-party data that the DMP operator has obtained and organized independently, and analyze it as a single big data set (*1).

Combining these two types of data will help optimize your advertising and improve your ROI (return on investment).

(*1) In recent years, with the growing awareness of personal information protection, it has become more difficult to obtain data linked to individuals, such as cookies. It is necessary for companies to clearly determine guidelines regarding the use of personal data.
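At its core, combining first-party and third-party data is a join on a shared identifier. The Python sketch below shows that step with hypothetical records and assumes both sources already share an anonymous ID. In practice, matching IDs across sources is the hard part and is subject to the privacy constraints noted above. The combined records are then used to build an ad-delivery segment.

```python
# Hypothetical first-party data (the company's own CRM), keyed by an anonymous ID.
first_party = {
    "id001": {"purchases": 3, "last_visit_days": 2},
    "id002": {"purchases": 0, "last_visit_days": 40},
    "id003": {"purchases": 1, "last_visit_days": 10},
}
# Hypothetical third-party data from a DMP operator, keyed by the same ID.
third_party = {
    "id001": {"interest": "travel"},
    "id003": {"interest": "outdoor"},
    "id004": {"interest": "travel"},
}

# Left join: keep every first-party record and attach third-party attributes if present.
merged = {uid: {**attrs, **third_party.get(uid, {})} for uid, attrs in first_party.items()}

# Example segment for ad delivery: recent visitors interested in travel.
segment = [uid for uid, a in merged.items()
           if a.get("interest") == "travel" and a["last_visit_days"] <= 7]
print(segment)  # ['id001']
```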

▼For future prospects for the use of personal data, please also refer to this article.

Future outlook for personal information protection regulations and actions that companies should take

Six systems for utilizing big data in business

In order to utilize big data as mentioned above, tools that enable data analysis are essential. In the marketing industry, these tools can even be used to execute measures such as ad delivery.

Here we will introduce six representative systems, but not all six are necessary; it is important to make the right selection depending on the purpose of using big data.

1. BI (Business Intelligence)

BI is a general term for systems and processes that collect, process, and analyze data generated through business operations to support decision-making by management and on the front lines.

BI specializes in data collection, accumulation, aggregation, analysis, and reporting. It provides functions such as data analysis in various formats, data mining (exploration), and report output, helping to speed up decision-making.

2. DMP (Data Management Platform)

A DMP is a system that makes big data usable by collecting and storing huge amounts of data and linking it with other systems.

DMPs generally refer to "open DMPs" that allow users to use the vast amounts of anonymous data collected and accumulated by service providers. In addition, the use of big data through "private DMPs" that collect, accumulate, and safely manage in-house business data is also on the rise.

DMP allows you to deliver ads and emails to each segmented user, making one-to-one marketing possible. It is one of the indispensable systems for advertising optimization in recent years.

3. ERP (Enterprise Resource Planning)

ERP is a large-scale business system that centrally manages data from each of a company's departments, such as accounting, sales, production, inventory, and human resources, through an integrated database.

By linking data from areas covered by ERP with BI and DMP, you can create an environment and foundation for utilizing big data.

It will also facilitate smoother data exchange between different business functions and enable real-time checking of a company's business status.

This is a tool for utilizing big data to improve data processing efficiency and contribute to optimal and rapid management decisions.

4. MA (Marketing Automation)

MA is a system that enables lead generation and lead nurturing based on predefined scenarios and imported lead data.

While analyzing online and offline prospect data, you can automate parts of your marketing initiatives by triggering pre-defined scenarios.

By utilizing big data, we can improve the efficiency of marketing operations, allowing marketers to focus on creative work.
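A "predefined scenario" is essentially a set of rules applied to each lead's data. Below is a minimal Python sketch with a hypothetical lead-scoring scheme. The weights, thresholds, and action names are illustrative assumptions, not the logic of any particular MA product; real tools let you configure these rules yourself. It combines online and offline activity into a score and routes each lead accordingly.

```python
from dataclasses import dataclass

@dataclass
class Lead:
    email: str
    page_views: int               # online activity
    downloaded_whitepaper: bool   # online activity
    attended_event: bool          # offline activity

def score(lead: Lead) -> int:
    """Hypothetical scoring weights for online and offline signals."""
    return (lead.page_views
            + (10 if lead.downloaded_whitepaper else 0)
            + (20 if lead.attended_event else 0))

def next_action(lead: Lead) -> str:
    """Predefined scenario: route each lead by its score."""
    s = score(lead)
    if s >= 30:
        return "notify sales"
    if s >= 10:
        return "send case-study email"
    return "add to newsletter"

leads = [
    Lead("a@example.com", 5, True, True),
    Lead("b@example.com", 12, False, False),
    Lead("c@example.com", 3, False, False),
]
for lead in leads:
    print(lead.email, "->", next_action(lead))
```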

5. MMM (Marketing Mix Modeling) Tools

MMM is a statistical method that comprehensively analyzes marketing-related big data and visualizes the direct and indirect impacts each marketing measure has on results. An MMM tool packages this method so that anyone can use it.

"MAGELLAN" is one such MMM tool. It has been adopted by companies in a wide range of industries and sectors to visualize advertising effectiveness and optimize budget allocation.

6. RDBMS (Relational Database Management System)

An RDBMS is a system for managing relational databases.

A relational database can store data in a table format. It can be simply thought of as a database that stores data in a table format like Excel. In other words, it is an environment or infrastructure that can manage structured data, which is one type of big data.

RDBMS uses a database language called SQL, which allows you to process the data stored in a relational database in various ways.
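For a feel of what SQL processing looks like, the Python sketch below uses the standard-library `sqlite3` module, a lightweight RDBMS, with an in-memory database. The `sales` table and its rows are hypothetical. A single SQL statement filters, groups, and aggregates the structured data.

```python
import sqlite3

# In-memory database with a hypothetical sales table (structured data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Tokyo", "A", 120), ("Tokyo", "B", 80), ("Osaka", "A", 60), ("Osaka", "A", 40)],
)

# Filter, group, and aggregate in one declarative statement.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE product = 'A'
    GROUP BY region
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Tokyo', 120), ('Osaka', 100)]
conn.close()
```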

Main analytical methods used for big data

When utilizing big data, it is important to work backwards from your business goals, select the appropriate analytical method, and collect and analyze the appropriate data.

Here we will explain the main analytical methods used for big data, so if you are planning to get involved in data analysis in the future, please use this as a reference.

1. Cross-tabulation analysis

A method for analyzing data collected through questionnaires in detail.

For example, when investigating the Cabinet's approval rating, rather than simply tallying whether respondents approve or disapprove, the data is analyzed by cross-referencing multiple items such as gender, age, and prefecture.

At first glance, this may seem like a simple analytical method, but the results will vary depending on the "axis of analysis," so it is important to be clear about the purpose of the data analysis.
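The Python sketch below illustrates the point about the "axis of analysis" using a handful of hypothetical survey responses. In the overall tally, approval is split evenly. Once the responses are cross-tabulated by age group, a clear difference appears. Choosing gender or prefecture as the axis instead would give yet another picture.

```python
from collections import Counter

# Hypothetical survey responses: (age_group, gender, approves_cabinet)
responses = [
    ("20s", "F", True), ("20s", "M", False), ("20s", "F", False),
    ("60s", "M", True), ("60s", "F", True), ("60s", "M", False),
]

# Simple tally: overall approval only.
overall = Counter(approve for _, _, approve in responses)
print("overall:", overall[True], "approve /", overall[False], "disapprove")

# Cross-tabulation: approval broken down by age group (the axis of analysis).
table = Counter((age, approve) for age, _, approve in responses)
for age in sorted({age for age, _, _ in responses}):
    yes, no = table[(age, True)], table[(age, False)]
    print(f"{age}: approve {yes}, disapprove {no}, rate {yes / (yes + no):.0%}")
```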

2. Logistic regression analysis

An analytical method for predicting and explaining the probability that a binary response variable (outcome) occurs, based on multiple explanatory variables (factors). "Binary" means that the response variable takes only two values, such as "YES" or "NO."

A typical example of how it can be used is direct mail (DM), a marketing measure. Users who make a purchase via DM are assigned a "1" and users who do not make a purchase are assigned a "0" to calculate the user's probability of purchase. By sending DM preferentially to users with a high probability of purchase, you can achieve results efficiently.
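The DM example can be sketched as a logistic regression fitted by plain gradient descent with NumPy. The features (past purchases and days since last visit) and all the data are hypothetical. A production analysis would typically use a statistics or machine learning library and far more data. The fitted model outputs a purchase probability for new candidates, and DM would then go to the highest-probability users first.

```python
import numpy as np

# Hypothetical DM history. Features: [past purchases, days since last visit].
# Label: 1 = bought after receiving DM, 0 = did not buy.
X = np.array([[5, 3], [4, 5], [6, 2], [1, 30], [0, 45], [2, 25]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0], dtype=float)

# Standardize features so gradient descent converges smoothly.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xb = np.hstack([np.ones((len(X), 1)), (X - mu) / sigma])  # intercept column first

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit weights by batch gradient descent on the log-loss.
w = np.zeros(Xb.shape[1])
for _ in range(2000):
    p = sigmoid(Xb @ w)
    w -= 0.1 * Xb.T @ (p - y) / len(y)

# Score new candidates; send DM to the most likely buyers first.
candidates = np.array([[5, 4], [1, 40]], dtype=float)
cb = np.hstack([np.ones((len(candidates), 1)), (candidates - mu) / sigma])
probs = sigmoid(cb @ w)
print(np.round(probs, 3))
```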

3. Association Analysis

Association analysis is an analytical method for finding correlations in consumer purchasing behavior based on retail POS (point-of-sale) data.

To put it simply, it is an analytical method for formulating hypotheses such as "Based on past POS data, women in their 30s are more likely to purchase Product A."

By incorporating association analysis into the algorithm, advanced recommendation functions can be implemented on e-commerce sites and VOD services.
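Association rules are usually evaluated with three standard measures:

- Support: how often the items appear together.
- Confidence: how often the rule holds when its condition is met.
- Lift: how much more likely the pair is than chance.

The Python sketch below computes them for the classic market-basket form ("customers who buy A also buy B") on a few hypothetical POS receipts. Production systems use algorithms such as Apriori to search all candidate rules efficiently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical POS baskets (one set of products per receipt).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]
n = len(baskets)

item_count = Counter(item for b in baskets for item in b)
pair_count = Counter(frozenset(p) for b in baskets for p in combinations(sorted(b), 2))

def rule_metrics(a, b):
    """Support, confidence, and lift for the rule 'customers who buy a also buy b'."""
    both = pair_count[frozenset((a, b))]
    support = both / n
    confidence = both / item_count[a]
    lift = confidence / (item_count[b] / n)
    return support, confidence, lift

for a, b in [("bread", "butter"), ("beer", "chips"), ("bread", "milk")]:
    s, c, l = rule_metrics(a, b)
    print(f"{a} -> {b}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A lift above 1 suggests the items are bought together more often than chance would predict. Such rules are candidates for recommendations.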

4. Cluster analysis

Cluster analysis is an analytical method that classifies the data in a population into groups (clusters) based on their characteristics. It then examines similarities and differences to observe the trends of each cluster.

It is an analytical method that is used not only in marketing and branding, but also in a wide range of fields such as machine learning. It is also possible to analyze unstructured data, and is one of the analytical methods that is emphasized in the use of big data.
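k-means is one of the most widely used clustering algorithms. Below is a minimal NumPy sketch of it, run on hypothetical customer data with two features. The algorithm repeatedly assigns each point to its nearest cluster center and then moves each center to the mean of its assigned points, stopping once the centers settle. Note that k-means is only one clustering method; hierarchical clustering is another common choice.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: [visits per month, average spend in thousands of yen]
X = np.array([[1, 2], [2, 1], [1, 1],       # occasional, low spend
              [10, 8], [11, 9], [9, 10]],   # frequent, high spend
             dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels, centroids.round(2))
```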

5. Decision Tree Analysis

Decision tree analysis is a type of data mining used for prediction, discrimination, and classification. Data mining refers to techniques for deriving new knowledge from big data using statistics and machine learning.

It is called "decision tree analysis" because the analysis results are in the form of a tree diagram.

Have you heard of the smartphone app "Akinator"? It guesses the famous person you are thinking of by asking you a series of questions. The Akinator algorithm is also said to use decision tree analysis.
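The core step in building a decision tree is deciding which question to ask first. A common approach is to pick the split that leaves the groups as "pure" as possible, measured by Gini impurity. The Python sketch below shows only that one step, on hypothetical yes/no data about characters. It is not Akinator's actual algorithm, which is not described here. A full tree would repeat this step recursively on each branch.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 when all labels are the same, higher when mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_question(rows, labels, features):
    """Return the yes/no feature whose split gives the lowest weighted Gini impurity."""
    best = None
    for f in features:
        yes = [lab for r, lab in zip(rows, labels) if r[f]]
        no = [lab for r, lab in zip(rows, labels) if not r[f]]
        if not yes or not no:
            continue  # a question everyone answers the same way tells us nothing
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if best is None or score < best[1]:
            best = (f, score)
    return best

# Hypothetical data: yes/no answers about each character.
rows = [
    {"is_real": True,  "is_musician": True},
    {"is_real": True,  "is_musician": False},
    {"is_real": False, "is_musician": False},
    {"is_real": False, "is_musician": False},
]
labels = ["singer", "athlete", "anime hero", "anime hero"]
print(best_question(rows, labels, ["is_real", "is_musician"]))  # ('is_real', 0.25)
```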

6. Principal Component Analysis

Principal component analysis condenses data with many explanatory variables into a smaller number of new variables (principal components). Analyzing these components lets you understand the balance of strengths in each data item.

Taking the analysis of restaurants in a chain as an example, we can use survey data collected from customers to analyze the overall strength of each store as well as its capabilities in specific areas.

The analytical results can vary greatly depending on which explanatory variables are used and what principal components are used, so analytical sense and skill are required.
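The restaurant example can be sketched with NumPy's singular value decomposition (SVD), which is one standard way to compute principal components. The survey scores and the four variables below are hypothetical. Because all variables share a 1 to 5 scale, the sketch only centers them. With mixed units, you would usually standardize first, which is one of the analyst's choices mentioned above. The signs of the components are arbitrary, so interpret them by looking at the loadings.

```python
import numpy as np

# Hypothetical customer-survey averages (1-5) for five restaurants in a chain.
# Columns: taste, service, cleanliness, price satisfaction
scores = np.array([
    [4.5, 4.2, 4.4, 3.0],
    [3.1, 3.0, 3.2, 4.5],
    [4.8, 4.6, 4.7, 2.8],
    [2.9, 3.2, 3.0, 4.6],
    [3.9, 3.8, 4.0, 3.6],
])

# Center each variable, then take the SVD; rows of Vt are the principal directions.
centered = scores - scores.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

explained = S**2 / np.sum(S**2)   # share of total variance per component
pc_scores = centered @ Vt.T       # each store's position on each component

print("explained variance ratio:", explained.round(3))
print("PC1 loadings:", Vt[0].round(2))
print("PC1 score per store:", pc_scores[:, 0].round(2))
```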

7. MMM (Marketing Mix Modeling)

MMM (Marketing Mix Modeling) is a statistical analysis that quantifies the impact of marketing initiatives on results. A distinctive feature of MMM is that it can quantify both the "influence on results (direct effect)" and the "influence on other marketing measures (indirect effect)."

As media and channels diversify, marketers are required to do more than just optimize each marketing measure. They are also required to use multiple media and channels and implement multiple marketing measures simultaneously. In order to maximize results in this situation, it is important to analyze the synergistic effects between measures and maximize results through overall optimization.
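To give a rough idea of the mechanics, here is a deliberately simplified MMM-style sketch in Python and NumPy. It shows a generic textbook approach, not the method used by MAGELLAN or any other specific tool. All data is synthetic, generated from known effects so that the fit can be checked. Two further simplifications apply: the carryover (adstock) decay rate of 0.5 is assumed rather than estimated, and indirect effects between measures are not modeled. The sketch transforms TV spend with geometric adstock, regresses sales on the media variables, and then computes each channel's contribution.

```python
import numpy as np

def adstock(spend, decay):
    """Geometric adstock: part of each period's ad effect carries over to later periods."""
    out = np.zeros_like(spend, dtype=float)
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry
        out[t] = carry
    return out

# Hypothetical weekly spend (millions of yen) for two channels.
tv = np.array([5, 0, 0, 4, 0, 0, 6, 0, 0, 3], dtype=float)
web = np.array([1, 2, 1, 2, 1, 2, 1, 2, 1, 2], dtype=float)

# Synthetic sales from known effects: base 100, TV 3.0 per adstocked unit, web 5.0 per unit.
rng = np.random.default_rng(42)
tv_ad = adstock(tv, 0.5)  # decay rate assumed, not estimated
sales = 100 + 3.0 * tv_ad + 5.0 * web + rng.normal(0, 0.5, len(tv))

# Ordinary least squares on the transformed media variables.
X = np.column_stack([np.ones(len(tv)), tv_ad, web])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
base, tv_effect, web_effect = coef
print(f"base={base:.1f}, TV={tv_effect:.2f}, web={web_effect:.2f}")

# Contribution of each channel to total sales over the period.
print(f"TV contribution={tv_effect * tv_ad.sum():.1f}, "
      f"web contribution={web_effect * web.sum():.1f}")
```

Real MMM work adds steps such as estimating decay and saturation curves and accounting for seasonality. Those steps are where dedicated tools earn their keep.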

▼ MMM is explained in detail in the following two articles. It is an essential tool for business people who want to use data as a weapon, so please take a look.

Three reasons why Japan should incorporate MMM into marketing in the cookieless era

Why you should use marketing mix modeling to accurately measure advertising effectiveness

XICA's thoughts on the future of big data analysis

More than 10 years have passed since the term "big data" became a trend in the IT industry. Even within the industry, many people see it as a dead word. So why explain big data again in this article?

The reason is that, far from being a dead word, big data is becoming more important to business every year.

Big data is everywhere and is not just for large companies; small and medium-sized businesses can also make full use of it. The development of the Internet and the explosive growth of social media have made this possible.

In reality, however, many companies simply keep generating big data without realizing it. Meanwhile, the global battle for dominance in industries that use big data has already begun.

Advanced foreign companies that are highly sensitive to big data are using data analysis systems to train ordinary business people into data analysts.

These companies leave most of the data analysis to systems and a small number of data scientists while developing "data analysts who can think from a field perspective." The systems that have made this possible over the past few years are the six systems introduced above.

Japan has finally started IT human resource development programs. However, they emphasize "developing IT engineers." The shortage of IT engineers is certainly a serious social problem, but much of it can be addressed with systems.

What matters is having "data analysts who can think from the field's perspective." These are people who select the right system based on business goals and objectives and put the analytical data it produces to use.

In this video, we talk about the data analysis skills that are in demand in the business world. Please take a look.

I would like all business people who read this article to redefine what kind of IT personnel their company really needs.
