Over the past few months, people from different industries have asked me whether I could give them an end-to-end view of how a data scientist thinks. To answer that question, I think it is not enough to describe the process from end to end; we also need to look more closely at what a data scientist is thinking when faced with an analysis problem.
Next, I will break the topic into five parts to walk you through the way data scientists think. The first half of the article covers how data scientists approach the business problem and the data behind a project, which sets the direction for the rest of the data science journey. We will also look at two other important stages in the life cycle, namely exploratory data analysis and feature engineering. These stages are important in formulating the right model for the problem.
When we try to unravel a data scientist's thought process, we need to go through five stages.
Together they give a bird's-eye view of the maze of thinking a data scientist works through when trying to pin down a problem. So let's follow these paths and step into the mind of a data scientist.
We begin with business discovery
A data science project always starts with a number of business challenges or problems, and these pave the way for the data science work that follows.
To make this concrete, let's take the example of an agricultural company that produces eggs and comes to us for help predicting egg production. To help us solve this forecasting problem, they give us the historical data available in their internal systems.
Where do you think we should start? The best way is to build our intuition and hypotheses about the variable we are trying to predict. We call this the response variable, which in this case is egg production. To develop an intuition for the key factors that affect the response, we need to do some background research and talk to the relevant people at the company. We can think of this stage as familiarization and business discovery. The key factors we identify are called independent variables, or features. Through business discovery, we might find that the key features affecting egg production are temperature, electricity, access to good water, nutrition, quality of chicken feed, disease, vaccination and so on. Besides identifying the key features, we also build an intuition about the relationship between each feature and the response variable.
For example:
What is the relationship between temperature and egg production?
Does the type of chicken feed affect production?
Is there an association between electricity and egg production?
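Once some data is in hand, questions like these can be given a rough first check. The sketch below is only an illustration: the records, the column names (temperature_c, feed_quality, eggs) and the pearson helper are all hypothetical, invented for this example rather than taken from the company's data. It computes the Pearson correlation between each candidate feature and the response variable.

```python
from math import sqrt

# Hypothetical daily records, invented purely for illustration.
# Column names and values are assumptions, not real company data.
records = [
    {"temperature_c": 30.0, "feed_quality": 2, "eggs": 880},
    {"temperature_c": 27.0, "feed_quality": 3, "eggs": 910},
    {"temperature_c": 25.0, "feed_quality": 3, "eggs": 930},
    {"temperature_c": 22.0, "feed_quality": 4, "eggs": 950},
    {"temperature_c": 21.0, "feed_quality": 4, "eggs": 960},
    {"temperature_c": 19.0, "feed_quality": 5, "eggs": 990},
]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# The response variable is egg production; the others are candidate features.
response = [r["eggs"] for r in records]
correlations = {
    name: pearson([r[name] for r in records], response)
    for name in ("temperature_c", "feed_quality")
}

for name, value in correlations.items():
    print(f"{name}: {value:+.2f}")
```

A coefficient near +1 or -1 suggests a strong linear association, and one near 0 suggests little linear association. It says nothing about cause, so it is a way to test intuition, not to replace the conversations with the business.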
The intuition we begin to build will help us