A Technical Tour of the NASA Landsat Data Processing Pipeline

Reading Time: 7 minutes

In the two-part series called The Case of the Failing Upload, I set out to turn software development into a murder mystery.

I failed—the similes were getting too absurd 😂. But I still got to walk you through a debugging exercise on a pretty cool piece of software: a pipeline to help researchers access NASA Landsat data, especially for community science initiatives.

I’d love to live stream my continuing development work on this so you can get involved, level up as a programmer, and even learn a thing or two about the data available from NASA! To facilitate that, I made a video of myself touring you through the code. You can think of this as your technical briefing: it will prepare you to fully engage in future live streams where I actively work on the code.

Before we begin: who will use this code, and why?

The piece of software we’ll see today has a foundation client: an individual customer with whom we develop a new technology that we expect to be useful for other customers later. In this case, the foundation client is the Floating Forests research team.

Some background:

The Floating Forests Project organizes images of Earth’s surface taken by NASA Landsat satellites. Volunteers help spot trends in kelp growth along our coastlines—which tells us about climate change, biome changes, and more.


Does that seem like a cool project? It is. We’d love to facilitate more Earth science projects like this one. The problem: getting the images ready is a lot of work. Researchers fetch the Landsat data, pluck specific channels (say, red light or blue light) from the available options, filter the data for minimal cloud coverage and maximum coastline, resize the resulting images, and add them to the project online to be kelp-searched.

We hope for the pipeline to make this process easy. Researchers should be able to choose which satellites and which locations they want to get data from. They should be able to mix and match the things they need done to the image, like resizing, choosing channels, or filtering for cloud cover, ocean, or land. Finally, they should be able to upload their newly readied data to community science projects.

Okay, time for the technical briefing.

Here’s a high-level diagram of how this app will work:


At the beginning, the application requests data from the NASA Landsat service, called ESPA. At the end, the application makes it easy to put the processed data onto a community science platform called The Zooniverse.

In between, researchers can insert a series of tasks to perform in sequence. They get to decide which tasks to perform, and in what order. This characteristic of the application—its modularity—is why it can be useful to lots of different research teams. Each team can customize their processing pipelines to the exact specifications of their own end products.
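To make that modularity concrete, here’s a minimal, framework-free sketch of the idea: each task is just a function that takes scene data and returns it transformed, and the pipeline applies the researcher’s chosen tasks in the researcher’s chosen order. The task names and data fields here are illustrative stand-ins, not the app’s actual functions.

```python
def select_channels(scene):
    """Keep only the requested spectral channels (hypothetical choice: red and blue)."""
    scene["channels"] = [c for c in scene["channels"] if c in ("red", "blue")]
    return scene

def filter_clouds(scene):
    """Flag the scene as usable only if cloud cover is low enough (simplified)."""
    scene["usable"] = scene["cloud_cover"] < 0.2
    return scene

def resize(scene):
    """Scale the image dimensions down for the web."""
    scene["width"] //= 2
    scene["height"] //= 2
    return scene

def run_pipeline(scene, tasks):
    """Apply each task in the researcher-chosen order."""
    for task in tasks:
        scene = task(scene)
    return scene

# A research team mixes and matches the tasks they need, in the order they need:
scene = {"channels": ["red", "green", "blue", "nir"],
         "cloud_cover": 0.1, "width": 8000, "height": 8000}
result = run_pipeline(scene, [select_channels, filter_clouds, resize])
```

Because the pipeline is just an ordered list of interchangeable steps, a different team could drop `resize`, reorder the filters, or add their own task without touching anyone else’s code.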

Now, let’s get our eyeballs on that code.

I recorded this screencast yesterday morning in Cocoa Beach, FL before heading to Titusville to watch some science experiments launch to the space station (which is, by the way, how we get all this cool data to work with).

If I am going to invite you into my software home, I should be a good host and give you a tour 😊. That’s what I do here: I describe the primary request that the Floating Forests team (or another research team) would make in this application, and I walk you through the high-level control flow of the application from “NASA service, please give us some data” to “let’s put our finished products on a platform where the world can help us analyze them.”

Outline: What we cover in this live stream

0:55 – I describe, at a high level, what this app does. We also talk about the data that we get from NASA Landsat satellites and why referring to it as “images” is inaccurate, even though that’s often how we see this data.

2:40 – I take you to see where I have the app running. We discuss the framework that the application runs on (django, and specifically django-rest). Here I also introduce the primary request that a research team would make: an ImageryRequest (I know, we should change the name to something more accurate).

4:42 – I describe a pipeline. This is the process that a research team builds and customizes for their specific project. I have put a lot of work into ensuring the modularity of the pipeline stages, and there’s even more work to come.

7:22 – I make an ImageryRequest for you to see. We talk about some of the custom controls that researchers can use when making the request.
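If you’re wondering what those custom controls might look like in code, here’s a hypothetical sketch of an ImageryRequest as a plain dataclass. The field names and defaults are my guesses based on the description above (satellites, locations, channels, cloud and coastline filters), not the real model.

```python
from dataclasses import dataclass, field

@dataclass
class ImageryRequest:
    """Illustrative guess at the shape of a research team's request."""
    satellites: list                      # e.g. ["Landsat 8"]
    locations: list                       # e.g. [(latitude, longitude)]
    channels: list = field(default_factory=lambda: ["red", "green", "blue"])
    max_cloud_cover: float = 0.2          # fraction of the scene covered by cloud
    min_coastline: float = 0.5            # fraction of the scene that is coastline

# A team requests imagery near a stretch of coastline:
req = ImageryRequest(satellites=["Landsat 8"], locations=[(34.4, -119.7)])
```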

10:28 – I switch over to the code itself to show you what happens when we make a request. We look at the ViewSet that accepts the incoming request and the libraries that enable our work. We notice that the bulk of the work happens in a class method that runs on our ImageryRequest model after saving it.
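The save-then-process pattern described here can be sketched framework-free. In the real app this is a django model saved through a django-rest ViewSet; in this toy version, `save` just records the request and then hands off to `process`, which stands in for the researcher-configured task sequence. Everything below (class name included) is an illustrative stand-in.

```python
class ImageryRequestRecord:
    """Framework-free sketch of the save-then-process control flow."""

    def __init__(self):
        self.saved = False
        self.log = []

    def save(self):
        # Persist the request (a no-op here), then kick off the real work,
        # mirroring how the bulk of the work runs after the model is saved.
        self.saved = True
        self.process()

    def process(self):
        # In the real app this runs the configured pipeline tasks;
        # here we just record that each hypothetical step ran.
        for step in ("fetch", "filter", "resize", "upload"):
            self.log.append(step)

request = ImageryRequestRecord()
request.save()
```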

14:46 – I show you the tasks that run in succession when we make the request. These are the things that researchers get to put into a custom sequence.

18:19 – We have not built a user interface for researchers yet. So in the meantime, I use a built-in piece of functionality from django to provide a nice experience for researchers to build and update their pipelines without having to curl the API or make raw API client requests.

20:10 – I show you the tasks running in a worker thread. We talk about how long these requests can take and why. We discuss how important it is to have ways to test and simulate the requests with a shorter feedback loop for efficient development.
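One common way to get that shorter feedback loop is dependency injection: let the processing code accept its fetch function as a parameter, so development and tests can swap in an instant fake instead of waiting on a real order. This is a generic sketch of the technique, not the app’s actual code; the function names and the scene ID are made up.

```python
import time

def fetch_from_espa(scene_id):
    """Stand-in for the real, slow fetch; ESPA orders can take a long time."""
    time.sleep(60 * 60)  # placeholder for a long-running download
    return {"id": scene_id, "data": b"..."}

def fake_fetch(scene_id):
    """Instant stand-in used during development and tests."""
    return {"id": scene_id, "data": b"fake"}

def process_request(scene_id, fetch=fetch_from_espa):
    """Injecting the fetch function lets us simulate requests quickly."""
    scene = fetch(scene_id)
    return {"id": scene["id"], "bytes": len(scene["data"])}

# In a test or a dev session, we pass the fake and get an answer immediately:
result = process_request("SCENE-001", fetch=fake_fetch)
```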

22:35 – I take you to the Zooniverse community science platform to see the products of our data processing pipeline. Then I show you the production version of Floating Forests to see how volunteer researchers analyze the images our pipeline creates.

25:20 – Why do we need this pipeline if researchers already have scripts? We talk a little more about the value of having a flexible, maintainable data model rather than a script.

26:57 – Where are we on the image processing pipeline so far? I provide a brief “state of the application” and let you know what to expect in upcoming live streams on its development.

I can’t wait to start sharing some of this development work with you. It’s an exciting project with exciting implications for science, and it’s also an excellent opportunity to introduce lots of interesting and valuable software engineering concepts!

If you liked this piece, you might also like:

This post on my entrée into live coding (plus, how to get a transcript of a YouTube video)

This series about Structure and Interpretation of Computer Programs—in which I share what I learned in a week-long course on the classic book

The listening series—Unrelated to live coding, but hopefully useful 🙂
