Over Thanksgiving break, I once again had the opportunity to pair with the lovely Coraline Ada Ehmke. This time, instead of working on an existing code base, we started from scratch on something I had spiked out: a CSV parser! The original version worked, but the design contained some duplicate code and included no tests.
I had spent so much time on the original version that I could not imagine letting go of it, so I intended to go back and add tests after the fact. The problem with this approach is that it gives the test suite no opportunity to reveal which code isn’t needed. We write tests one by one, each before the code that makes it pass, to ensure that we aren’t adding extra code and to keep the code’s design flexible and modular: that is, easily testable (a well-written test suite should pass regardless of the order in which the tests are run). For that reason, Coraline recommended that we start over from scratch, suggesting that, if there were any really important implementation details that I had learned while spiking, I would remember them.
“Spiking” is when a developer hacks on a solution quickly and without much regard to the code’s structure, just to figure out how to solve a problem. In this case, the “spiking” had encompassed an entire project. Do not do this. Instead, if you’re solving a problem for the first time, I’d recommend breaking the problem down into little pieces: “spike” a small piece of the problem, not the whole thing. Then, once you get that little piece working, trash the code you wrote, write a test, and do the thing properly from scratch. You’ll end up saving time in the long run and creating production-ready code more quickly, and production-ready code is generally what companies are looking for.*
* There is one possible exception to spike-test-implement-refactor: a hackathon. Maybe. There is significant time pressure, after all, at a hackathon, and there isn’t really a “long run”: just a bunch of judges who will look at your entry in 8-48 hours. That said, if you end up ever wanting or needing to take your hackathon project further, you will probably have to start over from your spiked version as explained above.
Okay, so we started over. How did we start over?
We started by setting up the basic skeleton of a Ruby project.
RoR people: not a Rails directory. It is valuable, as a Rubyist, to practice building self-contained project directories from scratch, rather than just yanking down Rails or a gem skeleton. You’ll understand better what all the stuff in a directory is for if you’ve done it yourself a time or sixty.
This is what our skeleton looked like. We shall take a tour of it:
A clarification on the way Sublime lists file names: sometimes Sublime makes it look like a folder’s siblings are actually inside the folder. Above, the spec, lib, and csvs folders are all siblings of .rspec, csv_parser.rb, Gemfile, and Gemfile.lock.
Let’s start with csv_parser.rb:
In here we’ve required each of our four models, all of which are stored in the “lib” folder. We put all models in the lib folder to keep them organized, and naming it “lib” is a standard practice that makes it easy for other developers to know where they are.
We’ve required a thing called ‘pry’ for debugging purposes—you can ignore it if you like.
Then we define a module called CSVParser. We do this so that we can namespace our models to that module (I’ll show you namespacing later).
Next, let’s look at Gemfile.
This is where we put all of the gems (external code libraries) that we will use in this project. There’s pry (you can still ignore it). Then there’s rspec, a gem that we use to build our testing framework. Ruby has a testing framework built into it (minitest), but rspec makes it easier to clearly express the intent of each test.
In the .rspec file, we specify our preferences for the output of the rspec tests. By default, rspec prints . for a passing test, F for a failing test, and * for a pending test. The --format documentation option replaces those characters with the description of each test, printed as readable nested lines, and marks failing and pending tests with the words FAILED and PENDING. The --color option turns the output red and green for failing and passing tests, respectively.
The spec_helper.rb file is inside the spec folder, where we put all of the files containing actual tests for the program. The spec helper requires rspec, as well as the csv_parser file, on behalf of all the test files, so every test file can access both of those things.
The actual test files
Here’s an example of one of the test files from the project. This one contains tests for the model called “course”:
Notice that spec_helper is required at the top. This test file exemplifies the testing syntax you get from rspec, too.
A syntactic note, unrelated to the skeleton of a Ruby project but nonetheless a useful distinction that I learned from Coraline: we use let() above to describe a :course for each test. rspec also has before() and after() methods, which allow you to set up and tear down state before and after each test. let() covers the common case for you: it builds :course the first time a test uses it, reuses that same value for the rest of that test, and discards it when the test is over. The next test gets a fresh :course, so if you change the state of :course during one test, that state change doesn’t leak into the next.
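To picture why state doesn’t leak between tests, here is a rough plain-Ruby analogy. It is not rspec’s actual implementation, and the names Course and ExampleContext are made up for illustration. The idea it sketches is that each test runs in its own fresh context object, and the value is built once per context on first use.

```ruby
# Plain-Ruby analogy for let(:course); this is NOT rspec's real implementation.
# Course is a hypothetical stand-in for the project's course model.
Course = Struct.new(:name, :students)

# Stands in for the fresh context object that each test runs in.
class ExampleContext
  # Like let(:course) { ... }: built on first use, then reused within this context.
  def course
    @course ||= Course.new('Ruby 101', [])
  end
end

first_test = ExampleContext.new
first_test.course.students << 'Ada'        # change state during the first test
same_object = first_test.course.equal?(first_test.course)

second_test = ExampleContext.new           # the next test starts from scratch

puts same_object                           # true: one course per test
puts first_test.course.students.inspect    # ["Ada"]
puts second_test.course.students.inspect   # []
```

The second context never sees the student added in the first one, which is the isolation that let() gives each test.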
The lib folder
We place all of the models inside the lib folder. Here’s what the parser model looks like:
Notice two things about this file containing the Parser class:
1. The class Parser has the same name as the file, parser.rb
2. The class Parser is placed inside (that is, namespaced to) the CSVParser module. In Ruby, a module can act as a namespace that holds any number of classes. (Modules have a second, separate use as mixins: many classes can include one module, and one class can include many modules. That is not what we are doing here.) We place Parser inside our CSVParser module so that, when we reference it later, we can do so like this: CSVParser::Parser. Does that look annoying? Let me tell you what would be much more annoying: dealing with conflicts and confusion if one of our dependencies, like rspec or pry, also had a Parser class, and we hadn’t namespaced ours. It’s much less likely that a dependency will have a CSVParser::Parser than just a Parser.
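As a minimal sketch of the conflict that namespacing avoids: the top-level Parser below is a hypothetical stand-in for a class that some dependency might define, and the label method exists only for this illustration.

```ruby
# Hypothetical: imagine a dependency defined this Parser at the top level.
class Parser
  def label
    'top-level Parser'
  end
end

module CSVParser
  # Our own Parser lives inside the CSVParser namespace,
  # so it is a separate class from the top-level one.
  class Parser
    def label
      'CSVParser::Parser'
    end
  end
end

puts Parser.new.label             # top-level Parser
puts CSVParser::Parser.new.label  # CSVParser::Parser
```

Without the module, the second class definition would reopen the first Parser and overwrite its label method rather than creating a separate class.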
The work in progress that Coraline and I had made when we ended our pairing session looked very different from the original, “spiked” one: the files all had a designated place, rather than just floating around as siblings of all the other files in a single directory. The models sat together, and the code looked organized. The setup for the test harness removed a big psychological barrier to testing each piece of the implementation before writing the code. Most importantly, perhaps, the test coverage made it relatively straightforward to design for the behavior we wanted and refactor without worrying about breaking things.
Sadly, I cannot open-source the parser itself to show you all the code. However, here is a repository containing the basic skeleton of a Ruby project. Feel free to clone it down and use it as a template for yourself:
$ git clone https://github.com/chelseatroy/skeleton.git
You can just rename the directory to the name of your project, like so:
$ mv skeleton YOUR_PROJECT_NAME
Alternatively, you can copy it and rename the copy, so you still have the skeleton for other projects:
$ cp -r skeleton YOUR_PROJECT_NAME