Addressing bad data for AI success in marketing analytics

AI drives business insights, especially in marketing. But here’s the catch: AI is only as good as the data it works with. If that data is flawed, whether through inaccuracies, outdated information, or bias, the AI will produce poor predictions and decisions. Bad data amplifies existing issues and leads to misguided strategies.

For AI to be truly effective, data needs to be clean, validated, and structured correctly. This is where companies should focus their resources. Data cleaning involves finding and fixing errors, while validation makes sure everything is accurate and up to the necessary standards. Governance is just as important, setting clear rules for how data is collected, stored, and used to keep things consistent. If you’re not doing this groundwork, your AI isn’t going to help you. It will make decisions based on bad data, which leads to poor results.

Analysts are the key here. They need to know the context of the business and how to optimize data so that AI can produce the best insights possible. It’s about making sure that data aligns with business goals and is truly actionable. It’s a foundational step, and one that shouldn’t be overlooked.

Identifying corroborating data

When data looks unreliable, it’s tempting to discard it, but that’s often a mistake. Instead, consider corroborating it with other data sources. This is where the power of cross-checking comes in. If your dataset seems off, find a secondary source that can confirm or challenge what you’re seeing.

Take, for example, a retailer struggling with inaccurate inventory data. Their stock levels seemed off, but digging into their point-of-sale (POS) data made things start to make sense. The POS numbers showed historically fast-moving products suddenly recording zero sales, an obvious sign that the inventory data didn’t match reality: those items were likely sitting out of stock even while the system listed them as available. The problem wasn’t just that their system was wrong; the sales data was telling a different story, one that helped identify stock issues and correct inventory practices. Validating data against a second source like this lets you find actionable insights faster, even when the primary data source isn’t perfect.
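As a rough sketch of how such a cross-check might look in practice, the snippet below joins two hypothetical pandas DataFrames, an inventory snapshot and recent point-of-sale totals, and flags products the system claims are in stock but that show no sales at all. The column names and figures are invented for illustration, not taken from the retailer example.

```python
import pandas as pd

# Hypothetical inventory snapshot and recent point-of-sale (POS) totals.
inventory = pd.DataFrame({
    "sku": ["A100", "A101", "A102"],
    "on_hand": [25, 0, 40],        # units the system claims are in stock
})
pos_sales = pd.DataFrame({
    "sku": ["A100", "A101", "A102"],
    "units_sold": [0, 12, 31],     # units actually sold in the last 30 days
})

# Join both sources on SKU so each row carries both views of the product.
merged = inventory.merge(pos_sales, on="sku", how="left")
merged["units_sold"] = merged["units_sold"].fillna(0)

# Flag suspicious rows: plenty of stock on paper, but zero recorded sales.
# These are candidates for phantom inventory or a data-entry problem.
suspect = merged[(merged["on_hand"] > 0) & (merged["units_sold"] == 0)]
print(suspect)
```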

Sometimes it’s just about making sure you’re using the right information at the right time. The beauty of corroborating data is that it can reveal blind spots, which may lead to smarter decisions and better outcomes, like making sure you have enough stock on hand to meet demand.

Noisy outliers often give datasets a bad reputation

Data often gets a bad reputation because of noisy outliers, those anomalies that seem way off from the rest of the data. These outliers are easy to focus on because they stick out, but they don’t always represent the bigger picture. In many cases, these noisy data points are small errors in a sea of otherwise accurate information. If you let them distract you too much, you could miss out on the valuable insights the data provides.

Take an insurer’s dataset of household policies as an example. It contained numerous errors: incorrect addresses, policies that were wrongly grouped, and records misassigned by different agents. Those errors made the data look unreliable at first glance. But once those specific issues were fixed, the dataset became much more useful. The majority of the data was accurate and valuable; it just needed a bit of cleaning to get rid of the noise.
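As a loose illustration of targeting anomalies rather than discarding the dataset, the sketch below applies a simple interquartile-range rule to a hypothetical premium column, separating a handful of suspect records for review while keeping the rest intact. The data, column names, and 1.5×IQR threshold are illustrative assumptions, not details from the insurer example.

```python
import pandas as pd

# Hypothetical household-policy data with a couple of obviously bad records.
policies = pd.DataFrame({
    "policy_id": range(1, 9),
    "annual_premium": [820, 790, 845, 910, 99999, 760, 835, -5],
})

# Interquartile-range rule: values far outside the middle 50% are flagged.
# It stays robust even when the outliers themselves are extreme.
q1 = policies["annual_premium"].quantile(0.25)
q3 = policies["annual_premium"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

in_range = policies["annual_premium"].between(lower, upper)
outliers = policies[~in_range]   # send these few records for correction
clean = policies[in_range]       # keep the bulk of the data as-is

print(f"{len(outliers)} of {len(policies)} records flagged for review")
```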

“The lesson here is simple: don’t let outliers trick you into thinking the whole dataset is bad. Target those specific anomalies, fix them, and keep the valuable data that’s hidden underneath.”

Understanding the difference between zero and null values

When you’re dealing with data, not all missing values are created equal. There’s a big difference between a value that’s truly missing and a value that’s recorded as zero. Understanding that distinction is essential to making the right decisions.

A “zero” value typically means there was no activity in that field, like no sales for a particular product in a given time period. It’s intentional and tells you something valuable. On the other hand, “null” means there’s no data at all, whether because the data point wasn’t collected, it was lost along the way, or there’s simply no relevant information to record. Knowing the difference between these two helps you determine the best way to handle the data.

In many cases, missing data (null) isn’t an insurmountable problem. If you know why it’s missing, you can sometimes estimate it by using related data (a process called imputation). This allows you to keep going with your analysis. Alternatively, if a value is zero, you know there’s no activity, so your analysis can proceed without needing to fill in that gap. Getting this distinction right makes sure you’re not misinterpreting data, which could lead to wrong conclusions.
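To make that distinction concrete, here is a minimal pandas sketch using hypothetical weekly sales figures: genuine zeros are kept as real observations, while nulls are imputed from related data (here, simply each product’s own average, standing in for whatever imputation method fits your case).

```python
import numpy as np
import pandas as pd

# Hypothetical weekly sales: 0 means "no units sold", NaN means "no data recorded".
sales = pd.DataFrame({
    "product": ["A", "A", "A", "B", "B", "B"],
    "week":    [1, 2, 3, 1, 2, 3],
    "units":   [5, 0, 7, np.nan, 4, 6],
})

# Zeros are real observations and must not be touched.
# Nulls are gaps; impute them from related data, e.g. the product's own mean.
sales["units_filled"] = sales["units"].fillna(
    sales.groupby("product")["units"].transform("mean")
)

print(sales)
```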

In short, don’t treat zero and null the same way. Distinguishing between them helps you manage missing data more effectively and keeps your insights on track.

Random errors in datasets can still provide useful insights

Not all data errors are created equal, and sometimes, random errors can actually work to your advantage. These are errors that don’t follow any clear pattern, essentially random noise. While you can’t fix them all, if the errors are random, they often cancel each other out when you look at the data in aggregate. This can still allow you to derive meaningful insights.

Consider the case of two brands merging their web traffic data. Both brands had their own analytics platforms, each with different ways of measuring traffic. The result? Slight differences in data that were random and not necessarily indicative of a bigger issue. In this case, assuming the errors were random allowed the team to continue analyzing their segment-level data without getting bogged down by imperfections. They didn’t need to fix every small error. Instead, they focused on what mattered: the trends at the segment level, which saved the company millions.
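A rough sketch of why that assumption helps: if each daily reading carries independent, zero-mean noise, the errors largely cancel once you aggregate to the segment level. The simulation below uses entirely made-up traffic numbers and noise levels purely to show the effect; it is not a model of the two brands’ actual platforms.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical daily visits for one segment, measured by two platforms whose
# errors are random (zero-mean) rather than systematic.
true_visits = rng.integers(900, 1100, size=90)
platform_a = true_visits + rng.normal(0, 50, size=90)  # noisy measurement A
platform_b = true_visits + rng.normal(0, 50, size=90)  # noisy measurement B

# Day by day, the two platforms disagree noticeably...
daily_gap = np.mean(np.abs(platform_a - platform_b) / true_visits)

# ...but aggregated over the segment, the random errors mostly cancel out.
segment_gap = abs(platform_a.sum() - platform_b.sum()) / true_visits.sum()

print(f"average daily discrepancy:  {daily_gap:.1%}")
print(f"segment-level discrepancy:  {segment_gap:.1%}")
```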

So, while perfect data is always the goal, random errors don’t have to be a dealbreaker. If you assume these errors cancel out, you can continue to make smart decisions, even when the data isn’t flawless. It’s all about focusing on the signal, not the noise.

Interim strategies for flawed datasets

One of the biggest mistakes companies make is waiting for perfect data before acting. The reality is that data isn’t always going to be flawless, and waiting for it to be perfect can slow you down. But there’s good news: you don’t need perfect data to make smart decisions. In fact, many businesses make great strides by using interim strategies to work with the data they have.

Corroborating data, cleaning up noisy outliers, and understanding the difference between zero and null values are all practical strategies that allow you to keep moving forward. These strategies help you extract value from the imperfect data available while you continue to improve the quality over time. The key here is not to get stuck in the weeds of data perfectionism. Instead, focus on extracting meaningful insights from what you have right now.

This is especially important in fast-moving fields like digital marketing, where new data is constantly being generated. You can’t afford to wait until every data point is perfect. By taking advantage of interim strategies, you can start making informed decisions right away while still improving the quality of your data over time. An agile approach lets businesses stay competitive and make progress without being paralyzed by imperfection.

Key takeaways

  • Data quality is key for AI effectiveness: Poor data leads to inaccurate predictions and flawed insights. Prioritize investment in data cleaning, validation, and governance to ensure your AI systems provide actionable, accurate results. Decision-makers must treat data optimization as foundational for AI success.

  • Use corroborating data to validate insights: Don’t discard unreliable data immediately. Use secondary sources to cross-check and confirm insights, allowing you to derive accurate conclusions even from imperfect datasets. This practice leads to more informed decision-making, especially when primary data is questionable.

  • Address noisy outliers to improve data reliability: Isolate and fix noisy outliers that distort dataset reliability. Many datasets contain errors that are isolated to a few data points. By focusing on correcting these, you can improve the overall accuracy of your data without throwing away valuable information.

  • Act quickly with interim strategies for imperfect data: Waiting for perfect data can slow decision-making. Use interim strategies, such as managing zeros versus null values or leveraging random errors, to extract insights from available data. This approach helps maintain momentum and ensures smarter, faster decisions even when data isn’t flawless.

Alexander Procter

January 27, 2025
