Applied Data Science Case Study, Part 2: Assessing the Data


That depends on what we’re building. Demographic data poses the highest risk where we are deciding either:
  1. how people are portrayed (in an ad, article, white paper, or educational material, for example), or
  2. whether people get access to a resource (loans, insurance, or safety, for example).

We’re marketing a cardiology-related course. That’s a very specific product whose customers are, by design, cardiologists. So, for this model, we may not need to contend with the bias baked into who becomes a cardiologist.

Suppose that, instead, we were marketing home loans aimed at luxury home buyers because we knew cardiologists make a lot of money. In this case, we’d need to account for the bias, because it’s ethically dodgy to offer different access to resources based on a variable that we know for a fact correlates with race. Home loans in particular are also covered by legal anti-discrimination protections, precisely because of historical and contemporary racist lending practices.
