Machine Learning: Using Regression Analysis to Detect Biases in Compensation


I made an app that fits a linear regression model to salary data, then reports on how heavily the model weights each of five pieces of employee information to predict salaries.

Bias Inspector Homepage

The idea for this app sprang from a discussion of the persuasive value of data for helping companies chart their course. Industry-wide data is usually too broad for holding an individual company accountable, so this tool gives individual companies a quick way to identify patterns in their own compensation practices. However, the app uses no identifying information on companies or employees, so its users are accountable only to themselves. The hope is that this anonymity will encourage companies that otherwise wouldn’t use such a tool to try it.

The tool has no minimum number of employees that it needs to form a model (though it helps to have at least 8 so an individual record is not wholly responsible for any given weight). We’re not going for actual prediction accuracy here, after all: we’re trying to get a sense of whether a company’s compensation practices follow a pattern, particularly with regard to demographic factors like gender and ethnicity. In a fair compensation system, the coefficients for both of these features should be very close to zero.
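To see why a near-zero coefficient is the signal to look for, here is a minimal sketch with made-up numbers. It is not the app's code: the eight records, the 0/1 gender indicator, and the pay rules are all assumptions chosen for illustration, and it uses NumPy's least-squares solver rather than whatever the app uses internally.

```python
import numpy as np

# Hypothetical data: years of experience and a 0/1 gender indicator for 8 employees.
years = np.array([1, 3, 5, 7, 2, 4, 6, 8], dtype=float)
gender = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)


def fit_coefficients(salary):
    """Ordinary least squares for salary ~ intercept + years + gender."""
    X = np.column_stack([np.ones_like(years), years, gender])
    coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
    return coef  # [intercept, per-year weight, gender weight]


# Fair pay rule (invented): salary depends on experience only.
fair = 50_000 + 4_000 * years
# Unfair pay rule (invented): same rule, minus a flat 6,000 for group 1.
unfair = fair - 6_000 * gender

fair_coef = fit_coefficients(fair)
unfair_coef = fit_coefficients(unfair)

print("fair gender weight:  ", round(float(fair_coef[2]), 2))
print("unfair gender weight:", round(float(unfair_coef[2]), 2))
```

Under the fair rule the gender weight comes out at essentially zero, while under the unfair rule the model recovers the flat gap. Real salary data is noisier, so "very close to zero" is a judgment call rather than an exact test.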

A company gives Bias Inspector a CSV that looks something like this:

Sample CSV

It runs a regression model based on that information, and then it outputs a report on the patterns it found in the employee data.
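The fit-and-report step could look roughly like the sketch below. Everything specific in it is an assumption: the column names, the sample rows, and the choice to one-hot encode the categorical columns with the first level as a baseline are illustrative, not taken from Bias Inspector itself.

```python
import csv
import io

import numpy as np

# Hypothetical CSV; the column names and values are invented for illustration.
SAMPLE = """role,years_experience,age,ethnicity,gender,salary
engineer,2,25,A,F,80000
engineer,6,31,B,M,98000
engineer,10,40,A,M,115000
product manager,3,29,B,F,85000
product manager,7,35,A,M,104000
product manager,12,45,B,F,120000
designer,4,30,A,F,78000
designer,9,38,B,M,99000
"""

NUMERIC = ["years_experience", "age"]
CATEGORICAL = ["role", "ethnicity", "gender"]


def build_design_matrix(rows):
    """Return (column names, X): intercept, numeric columns, one-hot columns."""
    names = ["intercept"] + NUMERIC
    columns = [[1.0] * len(rows)]
    columns += [[float(r[c]) for r in rows] for c in NUMERIC]
    for c in CATEGORICAL:
        levels = sorted({r[c] for r in rows})
        # Drop the first level so it acts as the baseline for that category.
        for level in levels[1:]:
            names.append(f"{c}={level}")
            columns.append([1.0 if r[c] == level else 0.0 for r in rows])
    return names, np.array(columns).T


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
names, X = build_design_matrix(rows)
y = np.array([float(r["salary"]) for r in rows])

# Least-squares fit; coef[i] is the weight the model gives to names[i].
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

report = dict(zip(names, coef))
for name, weight in report.items():
    print(f"{name:>24}: {weight:12.2f}")
```

Each categorical weight here is read relative to its dropped baseline level, so `gender=M` is the modeled salary difference from the baseline gender with the other columns held fixed.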

Employers have the opportunity to define roles (like engineer and product manager) as well as years of experience, both of which might reasonably correlate with compensation.

Employers also enter age, which might correlate with years of experience and for that reason might also reasonably correlate with compensation.

Should ethnicity or gender play a greater role than any of those three factors, that may be an indication to the company that their review and compensation practices have overlooked underrepresented employees. This gives those companies the opportunity to hold themselves accountable and look for ways to recognize and reward those employees’ achievements more fairly.
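One way to make that comparison concrete is sketched below. It assumes the report has already been reduced to one weight per feature on a comparable scale (for example, from standardized inputs). The feature names, the numbers, and the helper `outweighed_factors` are hypothetical. Rather than pick a single threshold, it lists which of the three job-related factors each demographic feature outweighs.

```python
# Hypothetical per-feature weights; the numbers are invented for illustration
# and assumed to be on a comparable scale.
weights = {
    "role": 0.40,
    "years_experience": 0.35,
    "age": 0.05,
    "gender": 0.20,
    "ethnicity": 0.02,
}

JOB_FACTORS = ["role", "years_experience", "age"]
DEMOGRAPHIC = ["gender", "ethnicity"]


def outweighed_factors(weights):
    """Map each demographic feature to the job factors it outweighs in magnitude."""
    result = {}
    for demo in DEMOGRAPHIC:
        result[demo] = [
            factor
            for factor in JOB_FACTORS
            if abs(weights[demo]) > abs(weights[factor])
        ]
    return result


flags = outweighed_factors(weights)
for demo, factors in flags.items():
    if factors:
        print(f"{demo} outweighs: {', '.join(factors)}")
    else:
        print(f"{demo} outweighs none of the job-related factors")
```

With these made-up weights, gender outweighs age but not role or experience, which would be a prompt for a closer look rather than a verdict.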

Because companies will want anonymity to use this tool, I’m looking to have it reviewed by another set of eyes for potential holes before I deploy it. However, anyone can clone it to their own machine from right here and run it locally on their own data.
