What if the data is missing? Four small methods to share with you

When doing data projects, solving business problems, or researching a data application in depth, the biggest problem we usually run into is that there is no data, no data, no data.

In data applications especially, it is not only missing data that causes trouble; having too little data is a difficulty as well. This problem has plagued me for a long time. Because of it, leaders often complain that I can't get anything done, and I can't put the pain into words. You can't cook without rice!

First, create data first, then optimize

Sometimes we run into a business that has only just been set up and has not had time to collect much data: it has very little data, or none at all. In that case we can start with this method: first build simulated data according to the business logic.

The first step is to get the table structure of the business database. The table structure is the foundation of a data table; you can think of it as the header row of an Excel sheet. It gives the fields, data types, and data formats of the business content, so data created according to it fits the business logic better. If the business has multiple tables, you also need the relationships between them, that is, the ER diagram.
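To make the first step concrete, here is a minimal sketch of reading a table structure and its relationships out of a database. It assumes a SQLite database purely for convenience, and the two tables (customers, orders) and their columns are made up for illustration; a real business database will have its own tables and its own way of exposing this metadata.

```python
import sqlite3

# Hypothetical two-table business schema, used only for illustration.
DDL = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL,
    created_at  TEXT NOT NULL
);
"""


def describe_tables(conn):
    """Return {table: [(column, declared_type), ...]} for every table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {
        t: [(col[1], col[2]) for col in conn.execute(f"PRAGMA table_info({t})")]
        for t in tables
    }


def describe_relations(conn, table):
    """Return [(column, referenced_table, referenced_column), ...] for one table."""
    return [(fk[3], fk[2], fk[4])
            for fk in conn.execute(f"PRAGMA foreign_key_list({table})")]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

structure = describe_tables(conn)               # fields and data types per table
relations = describe_relations(conn, "orders")  # the "ER diagram" edges for orders
print(structure)
print(relations)
```

The field list tells you what each generated row must contain, and the relation list tells you which columns have to point at rows that really exist in another table.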

The second step is to create the data. There are many data creation tools on the market now. Once the product manager has agreed with the technical colleagues on what is needed, the data creation can be handed over to them. Sometimes the created data deviates a lot from what was expected, and then it has to be adjusted manually.
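If no dedicated tool is at hand, a few lines of Python can stand in for one. The sketch below is only an illustration under assumptions: the customers and orders tables, the city list, the date range, and the log-normal distribution for order amounts are all invented, and a real project would take them from the business side.

```python
import random
from datetime import date, timedelta


def make_customers(n, rng):
    """Create n customer rows with sequential ids."""
    cities = ["Beijing", "Shanghai", "Chengdu"]  # made-up value list
    return [
        {"customer_id": i, "name": f"customer_{i}", "city": rng.choice(cities)}
        for i in range(1, n + 1)
    ]


def make_orders(n, customers, rng, start=date(2024, 1, 1), days=90):
    """Create n order rows whose customer_id always points at an existing customer."""
    customer_ids = [c["customer_id"] for c in customers]
    orders = []
    for i in range(1, n + 1):
        orders.append({
            "order_id": i,
            "customer_id": rng.choice(customer_ids),  # keeps the relation valid
            # Assumed shape: skewed, positive amounts. Replace with business knowledge.
            "amount": round(rng.lognormvariate(4.0, 0.6), 2),
            "created_at": (start + timedelta(days=rng.randrange(days))).isoformat(),
        })
    return orders


rng = random.Random(42)  # fixed seed so the same data can be regenerated
customers = make_customers(20, rng)
orders = make_orders(200, customers, rng)
print(orders[0])
```

Because the generator is seeded, a manual adjustment (for example changing the amount distribution) can be made in code and the whole data set regenerated, instead of editing rows by hand.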

Use this method to first generate data that meets the needs of the business, and use that data to try to solve the problem. If the gap from reality is relatively large, keep optimizing the generated data until the error, estimated from experience and comparison tests, has been reduced to an acceptable level. Get something first, then improve it!
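The "generate, compare, adjust" loop can be sketched roughly as follows. Everything here is an assumption for illustration: the target mean stands in for a figure you know from experience, the exponential distribution stands in for whatever generator you actually use, and the 5% tolerance is an arbitrary choice.

```python
import random
import statistics


def generate_amounts(mean, n, rng):
    """Placeholder generator: exponentially distributed values with the given mean."""
    return [rng.expovariate(1.0 / mean) for _ in range(n)]


def calibrate(target_mean, start_mean, tolerance=0.05, max_rounds=20, n=5000, seed=0):
    """Adjust the generator's mean until the sample mean is within tolerance of the target.

    Returns (final_parameter, relative_error, rounds_used).
    """
    rng = random.Random(seed)
    mean = start_mean
    error = float("inf")
    for round_no in range(1, max_rounds + 1):
        sample = generate_amounts(mean, n, rng)
        observed = statistics.fmean(sample)
        error = abs(observed - target_mean) / target_mean
        if error <= tolerance:
            return mean, error, round_no
        # Rescale the parameter in proportion to how far off the sample was.
        mean *= target_mean / observed
    return mean, error, max_rounds


# Hypothetical numbers: experience says the average should be about 80,
# but the first guess for the generator was 30.
param, error, rounds = calibrate(target_mean=80.0, start_mean=30.0)
print(f"parameter={param:.1f}, relative error={error:.3f}, rounds={rounds}")
```

In practice the comparison would use more than one statistic, but the structure stays the same: measure the gap, adjust, regenerate, and stop once the gap is small enough.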

Second, ask the internal team to help with small tasks

If the simulated data we created is not enough to convince the leaders, then let's create some fairly realistic business data.

Take a case I have worked on as an example.

We wanted to build a tourist flow monitoring application for scenic spots. I found a lot of pictures of tourist crowds in scenic spots, and then marked the head of each person one by one to make a data set for the algorithm team.

Readers who know the industry may be aware that work like this is usually outsourced to a specialized data supply company, or posted as part-time jobs. An experienced data labeler can go through more than 10,000 pictures a day, so if 50 people take part in the labeling, more than one million pictures can be finished in two days.
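The estimate above is simple multiplication, and writing it down makes it easy to redo with your own numbers. The function names below are made up for this sketch, and the rate of 10,000 pictures per labeler per day is the lower bound quoted above.

```python
import math


def labeling_capacity(labelers, images_per_labeler_per_day, days):
    """Total number of pictures a team can label in the given number of days."""
    return labelers * images_per_labeler_per_day * days


def days_needed(total_images, labelers, images_per_labeler_per_day):
    """Whole days required to label total_images, rounded up."""
    return math.ceil(total_images / (labelers * images_per_labeler_per_day))


total = labeling_capacity(labelers=50, images_per_labeler_per_day=10_000, days=2)
print(total)  # 1000000

# The same arithmetic in reverse: how long would an internal team of 5 need?
print(days_needed(1_000_000, labelers=5, images_per_labeler_per_day=10_000))  # 20
```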


