One-Page Notes: Weapons of Math Destruction, by Cathy O’Neil

Folks ask me about the dangers of trusting computer-generated algorithms and artificial intelligence. The conversation usually brings up a future scenario in which the machines outsmart humans.

But there’s a more immediate problem: we trust machines to build algorithms from the incomplete or biased data we feed them, and those algorithms perpetuate poor, unfounded decisions under the guise of ‘scientificness’ because a computer made them.
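
To make that concrete, here is a minimal sketch of how that happens, using invented loan data and field names (not an example from the book): a model trained on biased historical decisions reproduces the bias as if it were a real pattern.

```python
# Hypothetical historical loan decisions: the outcomes reflect past reviewers'
# bias against zip code "B", not applicants' actual repayment behavior.
history = [
    {"zip": "A", "income": 40, "approved": True},
    {"zip": "A", "income": 35, "approved": True},
    {"zip": "B", "income": 55, "approved": False},
    {"zip": "B", "income": 60, "approved": False},
]

def train(records):
    """'Learn' the historical approval rate for each zip code."""
    rates = {}
    for zip_code in {r["zip"] for r in records}:
        group = [r for r in records if r["zip"] == zip_code]
        rates[zip_code] = sum(r["approved"] for r in group) / len(group)
    return rates

def predict(rates, applicant):
    """Approve only applicants from zip codes with a high historical approval rate."""
    return rates.get(applicant["zip"], 0) >= 0.5

model = train(history)

# A well-qualified applicant from zip "B" is still rejected, because the model
# has encoded the reviewers' bias and now repeats it with a veneer of objectivity.
print(predict(model, {"zip": "B", "income": 80}))  # False
```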

Diagramming Data, Part 3: Preventing and Curing Data Deficiencies

In Part 1 of this series, we discussed the relationship between data and code complexity. In Part 2, we talked about some of the deficiencies datasets might have and how they happen.

Now we’ll talk about some starting points for building healthy datasets—and nursing deficient datasets to health as much as possible. It’s important to note that these starting points apply chiefly to datasets obtained via human data entry—generally via a form.
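
As one hedged illustration of what "prevention at the point of entry" might look like, here is a minimal sketch with hypothetical field names and rules: a form submission is checked for missing, malformed, and implausible values before it is allowed into the dataset.

```python
import re
from datetime import date

REQUIRED_FIELDS = ["name", "email", "birth_year"]

def validate_submission(form):
    """Return a list of problems; an empty list means the record is safe to store."""
    problems = []

    # Deficiency: missing values. Catch them before the row is written.
    for field in REQUIRED_FIELDS:
        if not str(form.get(field, "")).strip():
            problems.append(f"{field} is required")

    # Deficiency: malformed values that would pass a "not empty" check.
    email = str(form.get("email", ""))
    if email and not re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", email):
        problems.append("email does not look like an address")

    # Deficiency: implausible values (typos, wrong century, stray digits).
    year = str(form.get("birth_year", "")).strip()
    if year and (not year.isdigit() or not 1900 <= int(year) <= date.today().year):
        problems.append("birth_year must be a plausible four-digit year")

    return problems

print(validate_submission({"name": "Ada", "email": "ada@example.com", "birth_year": 1988}))
# [] -- a clean record
print(validate_submission({"name": "", "email": "not-an-email", "birth_year": "21"}))
# ['name is required', 'email does not look like an address',
#  'birth_year must be a plausible four-digit year']
```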


Diagramming Data, Part 2: Reversible and Irreversible Data Issues

In Part 1 of Diagramming Data, I talked about the relationship between code simplicity and the underlying data. I touched on some of the issues that organizations face with their data. Now, we’ll categorize those issues according to how easily they can be fixed.

We can take a sampling of data-related issues and place them on a continuum from easily reversible to irreversible. Some can be fixed with relative ease; others cannot be fixed without re-collecting all the data. In between are cases where we can fix the data, but only by writing complex code.
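
As one hypothetical example of that middle region, the sketch below (with invented values and formats) recovers dates that were entered as free text in inconsistent formats. The information was captured, so the issue is reversible, but recovering it takes code rather than a one-line cleanup.

```python
from datetime import datetime

# Dates entered as free text, in whatever format each person preferred.
raw_signup_dates = ["2019-03-04", "03/04/2019", "March 4, 2019", "4 Mar 19"]

# Every format we know was used at entry time.
FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %b %y"]

def normalize(raw):
    """Try each known format; return an ISO date or None if unrecoverable."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # irreversible without going back to the source

print([normalize(d) for d in raw_signup_dates])
# ['2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04']

# Note the limit of this approach: a value like "03/04/2019" is ambiguous
# (March 4 or April 3, depending on locale). Ambiguity the form never resolved
# sits at the irreversible end of the continuum, no matter how clever the code.
```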
