Blog Search Click Analysis

Data Quality Enthusiasts:

Recently, while reviewing the click counts for the blog search on the Conformed Dimensions of Data Quality (CDDQ) website, I found some interesting stats that I thought I’d share with you. Below is a chart listing all the dimensions (and underlying concepts) within the CDDQ, along with respondents’ self-identified use of them in the 2019 Annual Dimensions of Data Quality survey. Alongside that, I added the percentage of clicks on the same dimensions and underlying concepts as listed on the CDDQ website’s blog search.

The blog search, if you haven’t used it, lists each of the Underlying Concepts. When a user clicks on the name of a concept, it searches the blogs to find a list of all blogs written on that topic (see screenshot below).

This directs the users to a list of blogs (like the following):

Now you're eager to see the stats, right? Here are my high-level observations:

  • I sorted the set by the 2019 survey data, in descending order, to see which dimensions were most used by organizations. I expected a similar proportion of users to click on the associated definitions, but this wasn't always the case. Compare the top 4 or 5 to the 3rd one (Record Population): for some reason, many people click to read blogs on Record Population.
  • Overall, perhaps due to bot clicks on every link on the page, the blog search clicks are high across the board, while the survey numbers drop off. In other words, organizations don't use some of the concepts listed at the bottom, but the click rate on those topics for blogs is still relatively high. Maybe people are searching for documentation on how to measure those dimensions.
  • "Accuracy- Agree with the Real-world" is zero because of a programming issue on the site, which I have now fixed. Because the data for that concept was not collected over that period, we'll never truly know where people placed it (in other words, it's an Existence DQ issue). I guess we'll start a new benchmark going forward to measure that area.
  • It seems generally safe to assume that a larger discrepancy between the two levels means that people either:
    • already understand the associated topic (underlying concept) and use it, so they don't click for blogs on that topic, OR
    • don't use the underlying concept but want to understand what it is, so they click
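
If you'd like to run the same comparison on your own numbers, here is a minimal Python sketch of one way to do it. It is not the method used to build the chart above. The concept names and figures are made-up placeholders (not the survey or click data), the function names are hypothetical, and it assumes clicks are expressed as a share of all blog search clicks.

```python
# Placeholder figures for illustration only; NOT the actual survey or click data.
survey_use_pct = {"Concept A": 60.0, "Concept B": 45.0, "Concept C": 10.0}
click_counts = {"Concept A": 30, "Concept B": 90, "Concept C": 80}


def click_share_pct(counts):
    """Convert raw click counts into each concept's percentage of all clicks."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


def discrepancies(survey_pct, counts):
    """Return (name, survey %, click %, gap) rows, sorted by survey % descending."""
    shares = click_share_pct(counts)
    rows = []
    for name, used in survey_pct.items():
        clicked = shares.get(name, 0.0)
        # Positive gap: clicked more than used. Negative gap: used more than clicked.
        rows.append((name, used, clicked, clicked - used))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


for name, used, clicked, gap in discrepancies(survey_use_pct, click_counts):
    print(f"{name}: survey {used:.1f}%, clicks {clicked:.1f}%, gap {gap:+.1f}")
```

A large negative gap would fit the first reading above (people already use the concept and don't click), and a large positive gap the second (people don't use it but want to read about it). Bot clicks would inflate every click count, so it's worth filtering those out first if you can identify them.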

How about you? What are your thoughts? Email me and I'll update this blog with a postscript!