The Data Science Approach – Key Features of a Successful Analytics Initiative
Across industries, businesses have benefited from the falling cost of data acquisition and the ease of automated data collection. Smart companies have established processes and teams that use their data to gain a competitive advantage. In this post, we’ll explore the crucial features of an organization’s data science initiative.
Understand the Questions Data Can Answer
To begin, the business problem must be clearly communicated. You’ll need to identify the context of the problem, the rationale behind it, and the business units that will benefit from a solution. Next, project administrators must determine which internal or external data could help solve the business problem. Not every problem can be solved with data analytics, but a few key distinctions will support project success.
Supervised vs. Unsupervised Techniques
After the business problem is established, you can focus on the specific information that will provide insight; this determines which data mining technique to use. Essentially, a supervised technique has a specific target variable (e.g., sales or new customer acquisition), while an unsupervised technique has no target (e.g., natural customer grouping).
To understand the differences, let’s look at some examples of business problems:
- How can we improve sales?
- We can solve this question with a supervised technique because we have sales data that can be used as a target variable.
- How can we improve customer marketing?
- This question can use supervised or unsupervised techniques. We can either predict the probability of sales (yes or no target), or we can effectively group similar customers to increase marketing efficiency. (This is unsupervised because we do not know the groupings beforehand.)
- What factors affect the sales channel that our customer uses?
- We can solve this question with a supervised technique. We know that our customer purchased through channel 1, channel 2, channel 3, or none.
- What questions are important enough to remain in our customer survey?
- This question can be solved with an unsupervised technique because there isn’t information available on question importance.
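The distinction above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `suggest_family` and the labels "continuous" and "categorical" are hypothetical names chosen for this example, and the questions are paraphrased from the list above.

```python
from typing import Optional


def suggest_family(target_kind: Optional[str]) -> str:
    """Map the kind of target variable to a family of techniques.

    target_kind is a hypothetical label used only in this sketch:
    "continuous" (e.g. sales), "categorical" (e.g. sales channel),
    or None when no target variable exists.
    """
    if target_kind is None:
        return "unsupervised"
    if target_kind in ("continuous", "categorical"):
        return "supervised"
    raise ValueError(f"unknown target kind: {target_kind!r}")


# Example questions paraphrased from the list above.
questions = {
    "How can we improve sales?": "continuous",
    "Which sales channel will a customer use?": "categorical",
    "Which customers are naturally similar?": None,
    "Which survey questions should we keep?": None,
}

for question, kind in questions.items():
    print(f"{question} -> {suggest_family(kind)}")
```

The only input the helper needs is whether a target exists, which mirrors the point that the presence of a target variable is what separates the two families.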
Data Mining Techniques
Now that we know whether the problem calls for supervised or unsupervised methods, we need to choose a data mining technique to employ. In this discussion, we will introduce some common techniques to consider. Feel free to ask questions or look for future posts to learn more details.
Common supervised techniques include:
- Linear Regression – Model a continuous target variable using one or more predictors
- Logistic Regression – Model a categorical target variable using one or more predictors
- Time Series Regression – Model a continuous target variable using seasons (time) and one or more predictors
- Survival Analysis – Model the time until an event occurs (e.g. customer churn), measured from a specific baseline
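As a rough illustration of the first technique in this list, here is a minimal sketch of linear regression with a single predictor, fitted by ordinary least squares using only the Python standard library. The function name `fit_line` and the ad spend and sales figures are hypothetical, invented for this example.

```python
from statistics import mean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two or more paired observations")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical data: monthly ad spend (predictor) and sales (target), in $1,000s.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(ad_spend, sales)
print(f"sales = {slope:.2f} * ad_spend + {intercept:.2f}")
```

With this made-up data the fitted line is sales = 2.00 * ad_spend + 10.00. In practice a statistics package would also report measures of fit and handle multiple predictors.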
Common unsupervised techniques include:
- Clustering or Segmentation – Grouping customers based on similar customer attributes
- Principal Components Analysis – Combining correlated variables into a smaller set of components based on the similar information the variables provide
- Link Analysis – Creating connections between individuals based on similar preferences
- Market Basket Analysis – Suggesting likely purchases based on items that were bought together in previous transactions
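To make the first item in this list concrete, here is a minimal sketch of clustering with k-means (Lloyd's algorithm), one common way to group customers when no target variable exists. The customer attributes and the function name `k_means` are assumptions for illustration, and starting the centroids at the first k points is a simplification chosen to keep the result deterministic.

```python
import math

Point = tuple[float, float]


def k_means(points: list[Point], k: int, iterations: int = 20) -> list[int]:
    """Assign each point to one of k clusters using Lloyd's algorithm.

    Centroids start at the first k points so the result is deterministic;
    real projects usually use a smarter initialization.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels


# Hypothetical customers: (annual spend in $1,000s, store visits per month).
customers = [(1.0, 2.0), (9.0, 10.0), (1.5, 1.8), (8.5, 9.5), (1.2, 2.2), (9.2, 10.5)]
print(k_means(customers, k=2))
```

For this made-up data the low-spend and high-spend customers land in separate groups, printed as [0, 1, 0, 1, 0, 1]. Note that the number of groups, k, still has to be chosen by the analyst.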
Business Analytics Growth
As business questions are answered, more sophisticated questions will arise. This gives an in-house analytics team lasting value, because the team becomes more familiar with the data, underlying issues, and stakeholders as time progresses. The typical progress of data science within an organization is shown in the following figure:
Develop a Strong Team
- Team Leader – The person who manages the overall goals of the team. The leader focuses on assigning goals to each member for every sprint (a 1-2 week time frame) and on evaluating each sprint to ensure the end deliverables are completed on time. It is important for the team lead to focus on the big picture and allow team members to home in on the details.
- Scrum Master – The individual who motivates team members during each sprint. The Scrum Master handles the day-to-day issues and difficulties that the team faces, allowing the team lead to manage the overall progress.
- Technology Lead – The team member who has a strong technical background. This position focuses on ensuring that data collection, aggregation, and cleaning are efficient and correct. Other important responsibilities for this member are data security and interactions with outside stakeholders.
- Other Team Members – A scrum team typically consists of 5-10 contributing members. These could be programmers, analysts, UI developers or software engineers. Keeping the sprint goals in mind, the team works autonomously until the end of each sprint.
An effective team will collaborate often, although the majority of progress will be through individual contributions. All team members should contribute during each sprint.
Implement the Results and Evaluate
After the team has developed a solution to the business problem, they will need to successfully convey their findings to the project stakeholders and business units that will implement the solution. This could entail passing the prototype to a development team for implementation, designing a dashboard to share with others, or changing business policies behind the scenes. Before implementation, it is important that any data mining models are completely validated and project implications are thoroughly considered.
Following implementation, it is critical to re-evaluate the business problem. Some important questions to ask would be:
- Did the project actually solve the business problem?
- How can we improve the data models?
- What new questions can we ask to piggy-back on our project?
It is common to rebuild statistical models often to account for new data, changing customer behavior, or a shifting business landscape. We recommend evaluating implementations immediately after launch and at least annually thereafter.
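One way to make this kind of periodic evaluation concrete is to compare a model's error on recent data against the error measured at launch. The sketch below is an assumption-laden illustration, not a prescribed method: the function names, the 20% tolerance, and all figures are hypothetical.

```python
def mean_absolute_error(actual: list[float], predicted: list[float]) -> float:
    """Average absolute gap between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def needs_rebuild(baseline_error: float, current_error: float,
                  tolerance: float = 0.20) -> bool:
    """Flag a model when its error grows more than `tolerance` over the baseline."""
    return current_error > baseline_error * (1 + tolerance)


# Hypothetical numbers: error measured at launch vs. on this year's data.
baseline = mean_absolute_error([100.0, 120.0, 90.0], [98.0, 123.0, 91.0])
current = mean_absolute_error([110.0, 130.0, 95.0], [101.0, 122.0, 90.0])

print(needs_rebuild(baseline, current))
```

Here the error at launch is 2.0 and the error on the newer data is roughly 7.3, so the check prints True. The right metric and tolerance depend on the business problem and should be agreed with stakeholders.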