Suppose we’re writing an app called Patient Care Dashboard to help hospitals visualize their data. The hospitals store their data in CSVs.*
*Let’s say that legal limitations prevent our app from storing any patient data in databases, so all data has to come in via these CSVs.
On the homepage of the app, providers can choose from several types of visualizations: one to view patient infection rates by department, one to view a schedule of friend and family visits, one to view a pie chart of patient post-surgery food requests, et cetera.
Once providers choose a visualization type, they get a modal where they can upload their CSV file, click a button, and see their gorgeous visualization.
We want to store a record of each of our visualization types in the app’s database so the homepage can fetch their names and descriptions and display all the options. We also want to keep a record of each time a user runs a visualization task, along with the visualization type.
But each different visualization type requires different code to run on the CSV. First of all, each CSV has different data, so each visualization type has its own list of CSV column names to fetch. Second of all, each CSV has different chart code: some use calendars, others use pie charts, and so forth.
How can we store a representation of each visualization type in the database while running different code for each one?
We could make a model with a giant conditional.
Our Visualization model could contain a method that branches on the visualization type, which we then call from the view.
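Here is a minimal sketch of that idea, not the original code. The class is a plain-Python stand-in for the Django model so that it runs without a project, and the visualization names, column lists, chart labels, and the `run` and `visualize` names are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Visualization:
    """Plain-Python stand-in for the Django model (hypothetical fields)."""
    name: str
    description: str = ""

    def run(self, csv_file):
        rows = list(csv.DictReader(csv_file))
        # Every new visualization type adds another branch here.
        if self.name == "infection_rates":
            columns = ["department", "infection_rate"]
            chart = "bar"
        elif self.name == "visit_schedule":
            columns = ["visitor", "visit_date"]
            chart = "calendar"
        elif self.name == "food_requests":
            columns = ["food_item", "count"]
            chart = "pie"
        else:
            raise ValueError(f"Unknown visualization: {self.name}")
        # Keep only the columns this visualization type cares about.
        data = [{column: row[column] for column in columns} for row in rows]
        return {"chart": chart, "data": data}


def visualize(visualization, csv_file):
    """Stand-in for the view: it only has to call the one method."""
    return visualization.run(csv_file)


upload = io.StringIO("food_item,count,ward\njello,3,B\n")
print(visualize(Visualization("food_requests"), upload))
```

The view stays tiny, but all of the per-type knowledge piles up inside `run`.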
As we add new visualizations, this giant conditional becomes even longer. Maybe that’s not so bad. But suppose we want to add complexity to the visualizations? Maybe we want them to also include custom error messages, shell out to third party services, or track changes in the data from visualization run to visualization run.
Even if multiple visualizations are supposed to have the same new functionality, they either each need their own copy of the implementation in their conditional block, or we DRY things up by writing blocks of code before the conditional—or worse, before only certain parts of the conditional—until we have several branches of code to peruse before we can make any changes.
This approach would be quite testable, but the test file would be enormous because each conditional block’s tests would have to live together in the test class. Alternatively, we would have to make several test files for the same class, which can lead to confusion as well.
What if, instead, we made each type of visualization responsible for its own behavior?
In Django, several different classes can share the same underlying Model, and the same database table, through a feature called proxy models. Our regular old Visualization model stays as it is, and we subclass it for each particular type of visualization.
In this version, each subclass of Visualization could have its own test file and its own circumscribed area for its unique functionality. Meanwhile, if we want to add behavior that should apply to all instances, we can add that behavior to the original Visualization superclass or, if we prefer, to an intermediary subclass that inherits from Visualization, with all the final visualizations inheriting from the intermediary subclass.
A note on this: if you do it that way, every subclassed model down the tree must possess the in-class annotation `class Meta: proxy = True`.
That is, the final subclasses won’t work properly without this, even if the intermediary class from which they inherit has this annotation. I’m not sure why. If you know, I’d love to hear.
The Drawback: Dependency Injection
So how does Django know which of the subclasses to use?
One option is to inject it with a dictionary.
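A minimal sketch of the dictionary approach, assuming the dictionary maps a visualization's name to an instance of its implementation. The classes are plain-Python stand-ins for the proxy models so the snippet runs on its own, and every name here is hypothetical.

```python
class Visualization:
    """Plain-Python stand-in for the Django model."""
    chart = None

    def __init__(self, name):
        self.name = name

    def run(self, rows):
        return {"chart": self.chart, "rows": len(rows)}


class InfectionRates(Visualization):
    chart = "bar"


class VisitSchedule(Visualization):
    chart = "calendar"


class FoodRequests(Visualization):
    chart = "pie"


# Every implementation is instantiated as soon as this dictionary is built.
VISUALIZATIONS = {
    "infection_rates": InfectionRates("infection_rates"),
    "visit_schedule": VisitSchedule("visit_schedule"),
    "food_requests": FoodRequests("food_requests"),
}


def visualize(name, rows):
    """Stand-in for the view: look up the implementation by name."""
    try:
        visualization = VISUALIZATIONS[name]
    except KeyError:
        raise ValueError(f"Unknown visualization: {name}") from None
    return visualization.run(rows)


print(visualize("food_requests", [{"food_item": "jello"}]))
```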
Alternatively, we could do it with a method. Let’s define that method and call it from the view. Remember that a leading underscore on a function or method name in Python signals, by convention, that it is intended to be private to this file.
The advantage to this over a dictionary is that Django will only instantiate each model instance if that branch of the conditional gets executed, rather than eager-evaluating and instantiating them all. This approach can reduce your memory usage, especially if you have many bespoke implementations or very complicated bespoke implementations.
I think this approach looks clunky, and we also have to update it every time we add a visualization. I’m not a huge fan of this, and I’ve cast around for an alternative for a while now. It should be noted that one could do some fancy metaprogramming à la Ruby: convert the visualization’s name into a class name and look that class up at runtime.
Code credit here for the snake case conversion and to the below linked article for the constantize method.
This approach is specifically not Pythonic and, more generally, not recommended for a number of performance, stability, and security reasons.
That said, if I am adding developers to my Patient Care Dashboard team, I would rather include documentation and context around “you have to add a line to this method right here” than ask them to wade through a giant conditional.
What do you think?
Django’s Proxy Models generally get deployed when a Model has several categories that each require a separate implementation. This example takes that to the extreme and imagines a case for exactly one model instance in the database for each of our implementations. Does this seem like a reasonable application of the Proxy Model approach in Python to you? How would you solve the dependency injection issue? What sort of tradeoffs do you see in the different approaches we have discussed? Did you learn anything new for a project you’re working on?
If you found this example interesting, you might also like:
Streamlining your Automated Test Strategy with Risk Maps (RubyTapas listeners, this post was the initial rough draft for the screencast you saw—enjoy my sparkly hand drawn diagrams in this version!)