A team at Avito has built HealthScore, a system for measuring and improving data quality across more than 6,700 data warehouses. Dmitry Melezhikov, who leads Business Intelligence in the Marketing domain, emphasizes the importance of this metric for reliable AI applications. HealthScore is a composite measure of data quality: it is meant to ensure that only trusted data sources are used, which reduces the risk of errors when deploying AI Copilot technology.
As the volume of data warehouses grew, traditional methods of managing data quality became ineffective. With numerous data sources scattered across platforms like Vertica and Trino, the team faced challenges in quickly identifying usable data and understanding ownership amidst the chaos. For instance, when tasked with analyzing advertising click-through rates, analysts often struggled to find reliable data due to poorly named tables and a lack of documentation.
Looking into these challenges, the Avito team identified three core issues: blurred accountability, resource scarcity, and lack of trust in data. Previous attempts to assign BI partners to oversee business domains yielded incomplete results: not every data source had a designated owner, so resources were spent without any assurance that the data was accurate.
This prompted the creation of a unified HealthScore system that ranks data sources by both quality and importance, so that analytics resources are allocated efficiently. The HealthScore is calculated daily and incorporates factors such as data quality, performance, and governance. Considering quality and importance together lets the team prioritize critical issues and avoid spending effort on less significant tables.
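As a rough illustration of how such a composite could be computed and used for prioritization, the sketch below combines three component scores into a single HealthScore and ranks tables so that important but unhealthy ones come first. The weights, the 0 to 1 scales, the table names, and the `importance` field are all assumptions made for this example; the article does not disclose Avito's actual formula.

```python
from dataclasses import dataclass

# Hypothetical weights: the real weighting is not given in the source.
WEIGHTS = {"quality": 0.5, "performance": 0.3, "governance": 0.2}


@dataclass
class TableMetrics:
    name: str
    quality: float      # assumed scale: 0 (bad) .. 1 (good)
    performance: float  # assumed scale: 0 .. 1
    governance: float   # assumed scale: 0 .. 1
    importance: float   # assumed scale: 0 (unused) .. 1 (business critical)


def health_score(t: TableMetrics) -> float:
    """Weighted average of the three component scores."""
    return (
        WEIGHTS["quality"] * t.quality
        + WEIGHTS["performance"] * t.performance
        + WEIGHTS["governance"] * t.governance
    )


def fix_priority(t: TableMetrics) -> float:
    """Higher when a table is both important and unhealthy."""
    return t.importance * (1.0 - health_score(t))


def rank_for_fixing(tables: list[TableMetrics]) -> list[TableMetrics]:
    """Sort tables so the most urgent fixes come first."""
    return sorted(tables, key=fix_priority, reverse=True)


# Made-up example data, for illustration only.
tables = [
    TableMetrics("revenue_daily", 0.95, 0.9, 1.0, importance=1.0),
    TableMetrics("tmp_export", 0.2, 0.3, 0.1, importance=0.1),
    TableMetrics("ads_clicks", 0.4, 0.6, 0.5, importance=0.9),
]

for t in rank_for_fixing(tables):
    print(f"{t.name}: score={health_score(t):.2f}, priority={fix_priority(t):.3f}")
```

In this sketch the important but mediocre `ads_clicks` table ranks above the much worse `tmp_export` table, because the latter is barely used. That is the kind of trade-off a combined quality and importance ranking is meant to capture.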
The HealthScore's key components, namely data quality assurance, performance metrics, and governance criteria, are designed to protect the integrity and efficiency of data usage. By using this metric to automate the lifecycle management of data warehouses, Avito aims to move from merely monitoring data health to proactively managing it.
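To make the idea of score-driven lifecycle management more concrete, here is a minimal sketch of a rule that maps a table's score and usage to a next action. The thresholds, the `days_since_last_query` input, and the three actions are hypothetical and chosen only for illustration; the article does not describe the rules Avito actually applies.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep"        # healthy and in use: nothing to do
    FIX = "fix"          # in use but unhealthy: route to the owner
    ARCHIVE = "archive"  # unused for a long time: candidate for removal


def lifecycle_action(
    score: float,
    days_since_last_query: int,
    healthy_threshold: float = 0.8,  # hypothetical cut-off
    stale_after_days: int = 90,      # hypothetical cut-off
) -> Action:
    """Pick the next lifecycle step for a table (illustrative rule only)."""
    # Stale tables are archived regardless of score: fixing them wastes effort.
    if days_since_last_query > stale_after_days:
        return Action.ARCHIVE
    if score < healthy_threshold:
        return Action.FIX
    return Action.KEEP


print(lifecycle_action(0.9, days_since_last_query=5))
print(lifecycle_action(0.5, days_since_last_query=5))
print(lifecycle_action(0.9, days_since_last_query=200))
```

A rule like this, run after each daily score calculation, is one way monitoring could turn into an automated action rather than a dashboard that someone has to check.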
Better data quality streamlines analytics processes and may also serve as a reference point for other companies in the market. As companies increasingly rely on AI-driven insights, a comprehensive health assessment of data sources is likely to become crucial for maintaining a competitive edge.