Data analytics generalist. I publish notes, lessons, and tools for data analytics and investing.
Coming out of college with a background in mathematics, I fell upward into the rapidly growing field of data analytics. It wasn’t until years later that I realized the incredible power that comes with the position.
As Uncle Ben told Peter Parker (aka Spider-Man), “With great power comes great responsibility.” The proverb perfectly sums up an unspoken reality for data professionals of all levels and types. You have to wonder if Peter Parker’s real superpower was data expertise. Unlike Spider-Man’s, our enemies are not quite as obvious as a flying green monster. As data professionals, we must remain vigilant on topics such as data privacy, algorithmic bias, and presenting information objectively.
Data Ethics in the Government
My first encounter with sensitive data came at the U.S. Census Bureau back in 2016. My team was responsible for compiling and disseminating the U.S. International Trade in Goods and Services report each month. The reports show how much of various commodities the U.S. imports from and exports to other countries. To the average person, this might not seem to affect their life, but to an investor, this information is incredibly valuable.
Being an ambitious employee, I wanted to add a little pizzazz to the Bureau’s webpage. My plan was to display a fancy Tableau chart (yes, they were fancy back then) relating to the Trans-Pacific Partnership. This would be the equivalent of a news agency reporting the relevant facts for any major economic event. Sadly, I was shut down. I was told that the Census could not appear biased on the new free trade agreement. At the time, I did not quite understand. Looking back on it, however, I can fully appreciate the sensitivity. The Census controls incredibly valuable information that could have wide implications for the economy and its people. In order to be effective, it must remain non-partisan. Otherwise, the numbers become politicized and the truth becomes questionable.
When a measure becomes a target, it ceases to be a good measure
– Goodhart’s Law
I see the above statement quoted often, yet KPIs remain incredibly common in organizations. One of my previous digital transformation projects required my department to adopt a new CRM (Contact Relationship Management) software. With this new system, leadership requested KPIs to measure participation in the tool. Anyone who has installed a new system knows the challenges of culture change and adoption. The software and the process must go hand-in-hand to be successful. Therefore, we needed the best method for measuring and incentivizing user activity in the CRM.
In our system, users were expected to enter and update potential public policies that would impact the organization. We had users responsible for different regions around the globe. Some regions, such as Europe, had more policy activity than other regions. Some regions had more users to help keep the records up to date. Each region could vary in its importance from a financial perspective. In our CRM, you could measure logins, views, edits, added records, deleted records, and more. Each metric had an inherent bias in the calculation. To simplify things, we will assume that we can only calculate metrics at the region level and this will be on a biweekly basis. Let’s take a look at some of the options and their implications.
KPI | Bias Explanation
KPI #1: Added + Deleted Records by Region | Bias #1: Encourages users to create records that have little legitimacy
KPI #2: Total Edits by Region | Bias #2: Favors regions with more employees
KPI #3: Unique Count of Records Edited by Region | Bias #3: Favors regions with more policy activity
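To make the three options concrete, here is a minimal sketch of how they could be computed for a single biweekly period. The activity log, its fields (region, user, action, record ID), and the function name `region_kpis` are all hypothetical; this is not the actual CRM's data model or API, just an illustration of the calculations described above.

```python
from collections import defaultdict

# Hypothetical CRM activity log for one biweekly period.
# Each entry is (region, user, action, record_id); all values are made up.
EVENTS = [
    ("Europe", "ana", "add", "r1"),
    ("Europe", "ana", "edit", "r1"),
    ("Europe", "ben", "edit", "r1"),
    ("Europe", "ben", "edit", "r2"),
    ("Europe", "cara", "edit", "r3"),
    ("Asia", "dev", "add", "r4"),
    ("Asia", "dev", "delete", "r4"),
    ("Asia", "dev", "add", "r5"),
    ("Asia", "dev", "edit", "r5"),
]


def region_kpis(events):
    """Return the three candidate KPIs for each region in the log."""
    added_deleted = defaultdict(int)    # KPI #1: added + deleted records
    total_edits = defaultdict(int)      # KPI #2: total edits
    edited_records = defaultdict(set)   # KPI #3: distinct records edited

    for region, _user, action, record_id in events:
        if action in ("add", "delete"):
            added_deleted[region] += 1
        elif action == "edit":
            total_edits[region] += 1
            edited_records[region].add(record_id)

    regions = sorted({event[0] for event in events})
    return {
        region: {
            "added_plus_deleted": added_deleted[region],
            "total_edits": total_edits[region],
            "unique_records_edited": len(edited_records[region]),
        }
        for region in regions
    }


if __name__ == "__main__":
    for region, kpis in region_kpis(EVENTS).items():
        print(region, kpis)
```

In this toy log, the single user in Asia tops KPI #1 partly by adding and then deleting the same record, while Europe's three users lead on KPI #2 and KPI #3. The same activity looks very different depending on which number you pick.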
When designing the appropriate KPIs for this new system, there were biases, assumptions, and incentives at play no matter which metric we chose. While mindlessly scrolling through Twitter, I recently came upon a quote that perfectly sums up the above process.
The very act of turning something into a number is an assumption.
Integrity is a Must
A few months back, I was working with a colleague who needed some assistance with the analysis and presentation of information that would be available to the public. As soon as they hear the words “public data,” any data professional’s mind will immediately gravitate towards data security. Fortunately, this was not an issue.
My colleague proceeded to explain what data we had (i.e., very little) and the purpose of the presentation. After some exploration, I realized that we could not provide any summary statistics at the requested level of detail. We could only provide an estimate of the overall total. This was insufficient for their project. There was pressure to “make some magic happen,” especially if I wanted to impress a few senior-level colleagues. In the short term, doing so would have yielded a reputational boost for me, but over the long term, it risked significant reputational damage for the organization (and for me).
As data becomes seamlessly woven into every process, there come ethical risks that aren’t talked about enough. By the time data professionals start implementing black-box algorithms into an organization’s decision-making processes, it will be too late. Organizations need to instill a culture of ethical, data-driven decision making from the top.
As a data professional, you will frequently find yourself at the center of difficult decisions, especially if you work with colleagues who struggle with data and numbers. Your job is to bridge the gap between their subject matter expertise and the appropriate analysis or presentation of the information. In that gap lies an opportunistic, invisible enemy who wants you to take the shortcut. Follow in Spider-Man’s footsteps and proceed with integrity.
Previously published at https://thedatageneralist.com/the-arbiters-of-truth/