Problem Definition — Convert Business Problem to Data Problem

Problem definition is the first step in the data science project lifecycle. A well-defined problem statement encapsulates the core objectives and constraints of a project. It serves as a bridge between the business domain and the data scientist, translating real-world challenges into analytically tractable questions. By articulating the problem with precision, stakeholders gain clarity on the project’s purpose, scope, and potential outcomes. Without this groundwork, projects are susceptible to continuous scope changes, solutions that do not benefit users, stakeholder dissatisfaction, and time and resources wasted on irrelevant or inconclusive findings.

In this tutorial, we will guide you through a comprehensive framework to formulate a well-defined problem statement, comprising a five-step process:

Step 1: Understand Business Problem (TOSCAR)
Step 2: Frame the Problem Statement
Step 3: Break into Smaller Problems
Step 4: Convert Smaller Problems to Data Problems
Step 5: Find Solutions to the Data Problem

Step 1: Understand Business Problem (TOSCAR)

The TOSCAR framework serves as a tool for gaining a comprehensive understanding of the business problem, its significance, and the parameters within which the project must operate. It stands for Trouble, Owner, Success criteria, Constraints, Actors, and References.

Figure: TOSCAR Framework

Trouble: The first aspect to address is the underlying trouble that the project aims to solve. This involves asking critical questions such as:

What is the underlying trouble? Why do you need to predict …? Why now? Is there any time sensitivity? Are you working to a timeline? What is your current process? Being specific about these aspects ensures clarity in problem formulation.

Owner: Identifying the owner of the problem is essential to determine who should be involved in the project. Understanding whose problem is being solved and its significance to them, as well as who will finance the project, provides essential context for project management and stakeholder engagement. Here you can ask:

Whose problem are we solving? How critical is the problem for them? Who is going to pay for this?

Success Criteria: Defining success criteria is crucial for measuring the effectiveness of the solution. This involves understanding:

What does success look like? How would we know if the use case is successful? How would you act differently if we provided you with such a prediction? How will the AI model be used? How will it be integrated into your current process? What metrics will you be tracking?

Constraints: Every project operates within certain constraints, whether they be budgetary, temporal, or logistical. Identifying these constraints upfront enables realistic goal-setting and helps in determining acceptable trade-offs. Understanding what is within scope and what lies beyond it is vital for managing expectations and prioritizing objectives. Questions to ask about constraints:

Budget? Boundaries? Communications? ROI? Time? Is there a trade-off? What trade-off is acceptable? What is in scope? What is out of scope? What is the priority, and what do we want to achieve in this timeframe? How much historical data do we have? How much time does the user need to act on the predictions?

Actors: Identifying stakeholders and understanding their interests is essential for project buy-in and collaboration. Recognizing the roles and motivations of the various actors allows for effective communication and ensures that the project addresses the needs of all involved parties.

References: Reviewing past attempts to solve similar problems provides insights and learnings that can inform the current project. You can ask:

Did people try to solve this problem in the past? What was the outcome? Are there any learnings you can use?

Now let’s apply TOSCAR to an example. We will focus on what to ask in order to thoroughly understand the problem:

Business problem: The Sales Director of an ecommerce company expresses the need to increase revenue from existing customers.


What is the Trouble?

Why are we specifically targeting existing customers for revenue growth? What is prompting this initiative now? Where does it fit in your (the Sales Director’s) priorities? Do we have any cross-sell practices in place? Are there any competitor benchmarks?


Who are the Owners of the problem?

Should the Customer Management Director also be involved in addressing this problem? Who will own the implementation of proposed solutions? Are there other departments or stakeholders who need to be aligned with this objective?


What would success look like?

How do we measure success? What is the average spend from existing customers? What percentage of customers buy over a given time period? What do we want to achieve, and in what time frame? Can this success be achieved in some other way?


What are the constraints?

What limitations exist on our communication frequency with customers? What budgetary constraints do we need to consider for implementing revenue growth strategies? Do we have access to sufficient CRM data and behavioral insights? Are there any regulatory or compliance constraints related to customer communications or data usage? What potential trade-offs should we anticipate, such as impacts on customer satisfaction or unsubscribe rates? How much time does the business need to act on the predictions?


Who are the actors and stakeholders?

Who are the key players within the sales team who will be involved in implementing revenue growth strategies? Should the Customer Management team be consulted for their input and involvement? How will executive leadership, including the CEO, be engaged in this initiative? What roles will the Operations and Marketing teams play in executing revenue growth strategies?


How are current cross-sales to existing customers handled?

Have we previously undertaken initiatives to increase revenue from existing customers? What were the results of these past efforts? How effective are our current cross-selling practices in driving revenue from existing customers? What lessons have we learned from past experiences that can inform our approach to this initiative?

Step 2: Frame the Problem Statement

Based on the TOSCAR outcomes, we can derive the problem statement.

Step 3: Break into Smaller Problems

In this step, the objective is to break down the larger business problem into smaller, more manageable components that can be addressed using data tools and techniques.


To start, we decompose the larger problem into smaller ones. We express each problem as a mathematical equation, typically in the form of addition, subtraction, or multiplication. For instance, consider the goal of making a business profitable. We can represent this as:

Profit = Revenues - Costs

Next, we further decompose revenues and costs into their constituent parts to identify areas where data can offer insights. For instance:

Revenues = Number of Customers × Return Rate × Conversion Rate × Average Purchase Value

Here, we break down the revenue generation process into components such as the number of customers, their likelihood of returning, conversion rates, and average purchase values. This allows us to pinpoint specific areas where data analysis can inform decision-making and drive improvements.
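The revenue equation above can be sketched in a few lines of Python to see how each driver feeds into the total. This is a minimal illustration, not a prescribed model: the class name `RevenueDrivers`, the helper `uplift_from`, and all baseline numbers are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RevenueDrivers:
    """The four multiplicative drivers from the revenue equation."""
    customers: float            # number of customers
    return_rate: float          # share of customers who come back
    conversion_rate: float      # share of returning customers who buy
    avg_purchase_value: float   # average value of one purchase

    def revenue(self) -> float:
        return (self.customers * self.return_rate
                * self.conversion_rate * self.avg_purchase_value)


def uplift_from(drivers: RevenueDrivers, field: str, pct: float) -> float:
    """Revenue gained if one driver improves by `pct` percent, others fixed."""
    new_value = getattr(drivers, field) * (1 + pct / 100)
    improved = replace(drivers, **{field: new_value})
    return improved.revenue() - drivers.revenue()


# Hypothetical baseline, for illustration only.
baseline = RevenueDrivers(customers=10_000, return_rate=0.4,
                          conversion_rate=0.05, avg_purchase_value=60.0)

print(f"baseline revenue: {baseline.revenue():.2f}")
for name in ("customers", "return_rate", "conversion_rate", "avg_purchase_value"):
    print(f"+10% {name}: {uplift_from(baseline, name, 10):.2f}")
```

Because the equation is purely multiplicative, the same relative improvement in any one driver produces the same revenue gain. The useful question then becomes which driver is cheapest or fastest to move, which is where the data problems in the later steps come in.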

MECE Principles (Mutually Exclusive, Collectively Exhaustive):

Another approach is to apply MECE principles, which involve breaking down the entire problem into sub-problems that are mutually exclusive (they do not overlap) and collectively exhaustive (together they cover the whole problem). For example, if the objective is to improve customer satisfaction across a company, we may break it down into functions such as sales, marketing, operations, and customer service. These functions do not overlap, and together they address the entire problem.

Further decomposing each function, such as sales, may involve projects or initiatives like predicting customer buying behavior using data and proactively reaching out to them. Similarly, other functions like marketing and operations can be decomposed into smaller, more specific components, each of which can be addressed using data tools and techniques.

By employing decomposition strategies like mathematical equations and MECE principles, we can break down the larger problem into smaller, more manageable components, enabling targeted analysis and solution development using data tools and techniques.
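Whether a breakdown is mutually exclusive and collectively exhaustive can be checked mechanically once the items in scope are listed. The sketch below assumes the problem can be written as a set of items (here, hypothetical customer touchpoints) assigned to functions; the function name `check_partition` and the data are illustrative only.

```python
from itertools import combinations


def check_partition(universe: set, parts: dict) -> dict:
    """Report overlaps (not mutually exclusive) and gaps (not exhaustive)."""
    overlaps = {
        (a, b): parts[a] & parts[b]
        for a, b in combinations(parts, 2)
        if parts[a] & parts[b]
    }
    covered = set().union(*parts.values()) if parts else set()
    return {
        "overlaps": overlaps,
        "missing": universe - covered,
        "is_partition": not overlaps and covered == universe,
    }


# Hypothetical touchpoints that affect customer satisfaction.
touchpoints = {"ad click", "checkout", "delivery", "support ticket", "refund"}

breakdown = {
    "marketing": {"ad click"},
    "sales": {"checkout"},
    "operations": {"delivery", "refund"},
    "customer service": {"support ticket", "refund"},
}
report = check_partition(touchpoints, breakdown)
print(report["overlaps"])      # "refund" is claimed by two functions
print(report["is_partition"])  # False until the overlap is resolved
```

In this example "refund" appears under two functions, so the breakdown is exhaustive but not mutually exclusive. Assigning it to a single owner fixes the decomposition.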

Step 4: Convert Smaller Problems to Data Problems

Once we have decomposed the larger problem into smaller, more focused components, we leverage data tools and techniques to address each sub-problem effectively.

Continuing from the revenue example in Step 3:

Revenues = Number of Customers × Return Rate × Conversion Rate × Average Purchase Value

We can take each component in the equation and think of how we can use data techniques to improve it.

Number of customers:

Forecasting future customer acquisition based on historical trends and market data.

Customer segmentation to identify different customer groups with varying needs and preferences.

Return rate:

Predict the likelihood of a customer coming back to your site within 30 days.
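Turning the return-rate question into a data problem starts with defining the label precisely. The sketch below shows one possible definition: a customer known at a cutoff date is labelled 1 if they visit again within the following 30 days. The function name `return_labels`, the visit log, and the cutoff date are hypothetical.

```python
from datetime import date, timedelta


def return_labels(visits: dict, cutoff: date, window_days: int = 30) -> dict:
    """Label each customer seen on or before `cutoff`:
    1 if they visit again within `window_days` after it, else 0."""
    horizon = cutoff + timedelta(days=window_days)
    labels = {}
    for customer, dates in visits.items():
        if not any(d <= cutoff for d in dates):
            continue  # not yet a customer at the cutoff, so no label
        labels[customer] = int(any(cutoff < d <= horizon for d in dates))
    return labels


# Hypothetical visit log: customer id -> visit dates.
visits = {
    "c1": [date(2024, 1, 5), date(2024, 2, 10)],   # returns inside the window
    "c2": [date(2024, 1, 20)],                     # never returns
    "c3": [date(2024, 2, 15)],                     # first seen after the cutoff
    "c4": [date(2024, 1, 2), date(2024, 3, 20)],   # returns, but too late
}
labels = return_labels(visits, cutoff=date(2024, 1, 31))
print(labels)
```

Features for the model should then be computed only from data available on or before the cutoff, so that the label never leaks into the inputs.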

Conversion rate:

Predicting the likelihood of customer conversion based on website interaction data and past purchase behavior.

Optimizing website design and user experience to improve conversion rates through A/B testing and experimentation.
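For the A/B testing idea, a common way to compare the conversion rates of two variants is a two-proportion z-test. The sketch below is one standard formulation using only the Python standard library; it is not necessarily the test a given experimentation platform uses, and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: both variants convert equally."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))        # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical experiment: current page (A) vs redesigned page (B).
z, p = two_proportion_z_test(conv_a=200, n_a=4000, conv_b=260, n_b=4000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in conversion rate is unlikely to be due to chance alone. Whether the uplift is large enough to matter is still a business decision that goes back to the success criteria from Step 1.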

Average purchase value:

Forecasting future purchase values using predictive modeling techniques and customer lifetime value analysis.

Building recommender systems that personalize product recommendations and surface upsell and cross-sell opportunities, based on past purchase history and customer preferences, to increase the customer purchase value.
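As a rough illustration of the recommender idea, the sketch below counts which products are bought together and suggests the most frequent companions of what a customer already owns. This is a deliberately simple co-occurrence approach, swapped in for illustration; production recommender systems are usually more sophisticated. The function names and baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets: list) -> dict:
    """Count how often each ordered pair of items shares a basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(basket, 2):
            co[a][b] += 1
    return co


def recommend(co: dict, owned: set, k: int = 3) -> list:
    """Rank items the customer does not own by co-purchase counts."""
    scores = Counter()
    for item in owned:
        for other, count in co.get(item, {}).items():
            if other not in owned:
                scores[other] += count
    return [item for item, _ in scores.most_common(k)]


# Hypothetical purchase history, one set of products per order.
baskets = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "bag"},
    {"laptop", "bag"},
    {"phone", "case"},
    {"laptop", "mouse"},
]
co = build_cooccurrence(baskets)
suggestions = recommend(co, owned={"laptop"}, k=2)
print(suggestions)
```

Here a laptop owner is shown the mouse first (bought together three times) and the bag second (twice), which is the kind of cross-sell prompt that can raise the average purchase value.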

Step 5: Find Solutions to the Data Problem

Before implementing any solution, it’s crucial to assess whether it is feasible within the constraints of resources, technology, and time available. Additionally, understanding the complexity of each solution helps in determining the level of effort required for implementation and potential risks involved.

As a rule, start with the solutions that combine lower complexity with higher business value.
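One lightweight way to apply this rule is to score each candidate solution and rank them. The sketch below assumes 1 to 5 scores agreed with stakeholders and uses a simple value-to-complexity ratio; the scoring scheme, the `Candidate` class, and the example scores are all hypothetical, and other weighting schemes are equally valid.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    business_value: int   # 1 (low) to 5 (high), agreed with stakeholders
    complexity: int       # 1 (easy) to 5 (hard), estimated by the team
    feasible: bool        # fits the resource, technology, and time constraints

    @property
    def score(self) -> float:
        return self.business_value / self.complexity


def prioritize(candidates: list) -> list:
    """Drop infeasible options, then rank by value per unit of complexity."""
    viable = [c for c in candidates if c.feasible]
    return sorted(viable, key=lambda c: c.score, reverse=True)


# Hypothetical scores for the data problems listed in Step 4.
candidates = [
    Candidate("Product recommender", business_value=5, complexity=4, feasible=True),
    Candidate("30-day return prediction", business_value=4, complexity=2, feasible=True),
    Candidate("A/B test on checkout page", business_value=3, complexity=1, feasible=True),
    Candidate("Acquisition forecast", business_value=3, complexity=3, feasible=False),
]
ranked = prioritize(candidates)
for c in ranked:
    print(f"{c.name}: {c.score:.2f}")
```

The feasibility filter comes first on purpose: a high-scoring idea that cannot be delivered within the constraints identified in Step 1 should not crowd out options that can.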

Common pitfalls in defining the problem

1- Solution confirmation: Business leaders arrive with a solution and present it as the problem. We need to make sure the problem is actually a problem and not a solution in disguise. This commonly happens in organizations where senior business leaders are not challenged by their teams. For example, they may ask the data scientist to use clustering to create marketing segments without explaining the actual problem.

The best way to mitigate this is to ask "why" multiple times until you reach the reasons behind the proposed solution.

Also seek alternative solutions. For example, instead of creating clusters, consider ranking every customer individually.

2- Wrong framework: Understand your assumptions and blind spots. Here is an example:

Business Problem: To improve hiring decisions, a call center company is tempted to use machine-learning algorithms to select job applicants for the personality traits of current, longer-tenured employees. The company has data on its longer-tenured employees, which could be used to derive a set of desirable features and build the model.

The assumption: the longer-tenured employees are the best employees to hire. This is a flawed assumption, as there is a good chance that some of those employees have stayed because they were unable to find another job.

So, make sure that your assumptions are robust and correct.

3- Narrow framing: Reusing solutions from past experience to address current problems without thorough verification can lead to narrow framing. This occurs when solutions are applied without assessing whether they fit the current context.

4- Lack of alignment with goals: Problem definitions may not align with overarching business objectives, resulting in solutions that do not contribute meaningfully to organizational success. It is good to get the business sign-off before implementing.

5- Sunk Cost Bias: Continuing to invest resources (time, money, etc.) in a failing solution due to previous investments, rather than objectively assessing the viability of alternative options.

6- Anchoring Bias: Giving disproportionate weight to the first piece of information encountered (the “anchor”), which can influence subsequent judgments and decision-making related to the problem.

7- Assumption bias: Assumptions about the problem may go unchallenged, leading to solutions that do not address the root cause or fail to meet the needs of stakeholders.

Best practices to reduce biases in problem definition and decision-making:

Use a process: Implement a structured problem-solving process that guides decision-making and encourages systematic analysis of the problem from multiple angles.

Start with a clean slate: Approach each problem with an open mind, free from preconceived notions or biases. Avoid making assumptions based on past experiences or personal biases.

Challenge the status quo: Question existing assumptions, norms, and practices. Encourage critical thinking and innovation by challenging the status quo and exploring alternative perspectives.

Seek multiple perspectives: Gather input from diverse stakeholders with varying backgrounds, expertise, and viewpoints. Consider how different perspectives can enrich problem definition and decision-making.

Search for more information and data: Gather comprehensive data and information relevant to the problem at hand. Avoid relying solely on anecdotal evidence or limited sources of information.

Play Devil’s advocate: Encourage team members to challenge each other’s assumptions and proposed solutions. Adopting the role of Devil’s advocate can help uncover potential biases and weaknesses in problem definition.

Reflect on your own views and values: Take time to reflect on your own beliefs, values, and biases that may influence problem definition and decision-making. Consider how your personal perspectives may impact your analysis.


In conclusion, problem definition is the cornerstone on which successful data science initiatives are built, guiding efforts toward meaningful insights and actionable solutions. By investing time and effort in crafting a precise problem statement, organizations can mitigate risks, make better use of resources, and unlock the potential of data-driven decision-making. In this tutorial, we explored a process for crafting effective problem statements and provided practical guidelines for their formulation.

Crafting Effective Problem Statements for Data Science Projects was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.
