Uploading Metadata: A Case Study in Adjusting Undocumented Behavior on Legacy Code Bases

Reading Time: 4 minutes

I sometimes live stream my development work on the NASA Landsat Data Processing Pipeline. It’s an asynchronous task management app written in Python.

NASA Vehicle Assembly Building with Launch Pad 39A in the Background. Photo by @mgde_visuals.

The pipeline’s tasks include:

  • collecting and unpacking the massive datasets that Earth-orbiting satellites gather about the radiation reflected from Earth’s surface at various wavelengths
  • turning that data into image representations that the human eye can interpret
  • manipulating those images, say, to emphasize a particular wavelength or to divide an image into tiles for easier analysis
  • uploading the resulting images to the Zooniverse, a platform for people-powered scientific research.

Sound cool? It is. If you’d like to learn more about it and/or enjoy the live streams where I write code for it, I recommend taking a look at this technical briefing I made for you. I explain what the app does and why it matters, then I guide you through the code…all while sipping coffee, bikini-clad, in front of a gorgeous Space Coast sunrise 😉.

The technical briefing done, it was time to get to work.

What you see below are the recordings of about two hours of streams. In these streams, I’m working on including additional data with the images uploaded to the Zooniverse—data that records the latitude and longitude coordinates where the image was taken, the size of the image, and other information.
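To make that concrete: here’s a minimal sketch of what attaching that kind of metadata to a Zooniverse subject can look like, assuming the upload goes through the panoptes-client Python library (a common way to talk to the Zooniverse API). The credentials, project slug, file path, and field names below are hypothetical stand-ins, not the pipeline’s real ones.

```python
from panoptes_client import Panoptes, Project, Subject

# Hypothetical credentials and project slug, just for illustration.
Panoptes.connect(username="example-user", password="example-password")
project = Project.find(slug="example-team/landsat-image-review")

# One subject per image, with the extra data riding along as metadata.
subject = Subject()
subject.links.project = project
subject.add_location("tiles/tile_001.png")
subject.metadata.update({
    "latitude": 28.6,    # where the image was taken
    "longitude": -80.6,
    "width_px": 512,     # size of the image
    "height_px": 512,
})
subject.save()
```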

You get to see me make design decisions about how our app should work for Earth science research teams that need metadata uploaded versus teams that don’t.
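One way to show the shape of that decision is a hypothetical entry point where metadata is opt-in (none of these names exist in the app; this is just a sketch of the choice): teams that have metadata pass it, teams that don’t never do, and nothing extra happens for them.

```python
from typing import Optional


def upload_image(image_path: str, metadata: Optional[dict] = None) -> None:
    """Hypothetical upload entry point: metadata is opt-in per research team."""
    payload = metadata or {}
    # Placeholder for the real Zooniverse upload; just show what would be sent.
    print(f"uploading {image_path} with metadata {payload}")


# A team that needs metadata attached:
upload_image("tiles/tile_001.png", {"latitude": 28.6, "longitude": -80.6})

# A team that doesn't:
upload_image("tiles/tile_002.png")
```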

You also get to see me spelunk through a fairly complex app to identify:

  • What’s already working
  • Where functionality is missing
  • Where new functionality should live

In the past I’ve done detailed show notes for these streams, but nowadays I’m streaming enough that I don’t have time to do that. Instead, here’s how this will go:

  1. Eventually if enough people support the Patreon I’ll hire someone to help create show notes so you get them again 😉
  2. In the meantime, I will pull out enticing quotes from the streams to reeeel you into watching them. That’s right—just like you’re a fish.

6:23 – This is where I explain the decisions we (well, I, with you as an audience) have to make about how to make this app work for multiple potential use cases.

59:20 – “GAH! LOOK AT THIS!”

1:04:00 – “So we may need to parse a CSV which…I’m not super pumped about…but whatever”

1:17:30 – “So we open, blah blah blah, blah blah blah, then we do this dict reader AND THEN if we were to print row it would look like…this, supposedly.”
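If you’d rather read that step than squint at the stream, here’s roughly what the csv.DictReader bit looks like; the file name and column names below are made up for illustration.

```python
import csv

# Open the metadata CSV and read it row by row; each row comes back as a
# dict keyed by the header line. File and column names are hypothetical.
with open("image_metadata.csv", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # Printing a row shows something like:
        # {'filename': 'tile_001.png', 'latitude': '28.6', 'longitude': '-80.6'}
        print(row)
```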

1:26:50 – “I want to make reads fast, because we only have to do [the insertion] once, but we have to read constantly.”
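The usual Python shape for that trade-off (whether the real app ends up using a dict, a database index, or something else entirely): parse the CSV once up front and index the rows by key, so every later read is a cheap constant-time lookup. A sketch with hypothetical file and column names:

```python
import csv


def load_metadata(csv_path: str) -> dict:
    """Parse the CSV once and index rows by filename, so repeated reads are O(1)."""
    with open(csv_path, newline="") as f:
        return {row["filename"]: row for row in csv.DictReader(f)}


# Pay the insertion cost once, at startup...
metadata_by_file = load_metadata("image_metadata.csv")

# ...then read constantly, each lookup just a dict access.
row = metadata_by_file.get("tile_001.png")
```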

Here’s some additional footage after we took a break:

2:30 – “And then I don’t ever set any config. HelLO?…No…NO…..THERE we go!”

8:08 – Go here if you want to hear me sing a Backstreet Boys song for some reason

10:55 – Oh, this was probably the reason.

We left off on this stream asking the research team for the WRS (World Reference System) row of a location with a visible coast, which I ended up solving by using the Landsat Acquisition Map to locate a tile that likely included coastline and get the WRS row and path for that tile.

I finished this feature up off-stream, but in future streams we’ll look at details of UI design, deployment, and maybe even some researcher-facing features!

If you liked this piece, you might also like:

The debugging posts (a toolkit to help you respond to problems in software)

The Listening Series (Prepare to question much of what you know about how to be good at your job.)

Skills for working on distributed teams (including communication skills that will make your job easier)

 
