I come from a straight software engineering background, but I’m taking my career in a direction that incorporates machine learning. To that end, for the past year and a half I have toyed with several popular libraries for implementing machine learning solutions. One of those is OpenCV, a C++ and Python library for computer vision.
OpenCV Resources I Recommend
Ardit Sulce’s Python Mega Course on Udemy inspired me to look into this library, and his minimal example inspired the example in this post. If you’re more interested in the options available to you as a Python developer, I highly recommend Ardit’s course. He offers an editorialized collection that, in my view, is worth a look. That course also recommends a graphing library called Bokeh as an alternative to the ever-popular matplotlib, which I’ll be using in a regression visualization that I’ll share with you in a future post :).
Back to OpenCV. If you’re interested in further context on OpenCV specifically, Satya Mallick’s LearnOpenCV course provides the perfect deep dive—including an enlightening explanation of convolution kernels alongside the most comprehensive OpenCV documentation I’ve ever seen (yes, it’s in both languages).
OpenCV: My Review of the Library
I agree with Dr. Mallick’s sentiments about OpenCV: the library has surprising range and power. The reason for the surprise, though, is that the API is remarkably undiscoverable.
First of all, the API is inconsistent in its architecture. For example, many of the frame manipulation functions live at the module level and take a frame as an argument instead of living as instance methods on a frame class (in Python, a frame is just a NumPy array). This makes it hard to grasp the raison d’être of each class in the library. For another example, numerous methods (like the read() instance method on VideoCapture) return two values, the first of which is a check value that developers are unlikely to need in a quick script. So its primacy in the return value is confusing.
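One way to keep that check value out of the main loop is a thin wrapper. The sketch below is only an illustration: frames() is a hypothetical helper name (it is not part of OpenCV), and FakeCapture is a stand-in so the example runs without a camera. The only assumption is that the capture object has a read() method returning a (success, frame) pair, as VideoCapture.read() does.

```python
def frames(capture):
    """Yield frames from a capture object until a read fails.

    `frames` is a hypothetical helper, not an OpenCV function. It assumes
    `capture.read()` returns a (success, frame) pair, as
    cv2.VideoCapture.read() does.
    """
    while True:
        ok, frame = capture.read()
        if not ok:
            # The flag is False when no frame could be grabbed
            # (camera unplugged, end of a video file, and so on).
            return
        yield frame


class FakeCapture:
    """Stand-in for cv2.VideoCapture so the sketch runs without a camera."""

    def __init__(self, items):
        self._items = list(items)

    def read(self):
        if self._items:
            return True, self._items.pop(0)
        return False, None


collected = list(frames(FakeCapture(["frame-1", "frame-2"])))
print(collected)  # ['frame-1', 'frame-2']
```

With a real camera you would pass cv2.VideoCapture(0) in place of FakeCapture(...), and the loop body would only ever see frames.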
Second of all, the API is inconsistent in its naming. I’m not a fan of extraneous abbreviation because it makes names harder to guess and understand: if the point of the method is to convert the color of an image, I’m going to look for convertColor(). Why is it cvtColor()? Developers have to guess how things will be abbreviated. To add another layer of inconsistency, not everything is abbreviated. The GaussianBlur() method, for example, is spelled out in full. But that method has its own inconsistency: its name is capitalized, unlike cvtColor(). I understand that capitalized method names are common in some C-family languages, but this is not the case for Python. In Python (as well as Ruby, Java, Kotlin, and Swift), a capitalized camel case name typically denotes a class, while a name that starts with a lowercase letter (snake_case in Python and Ruby, lower camel case in Java, Kotlin, and Swift) denotes a method. So the capitalized method names in OpenCV’s Python API add another element of “wait…what?” to the developer experience.
Third of all, the API incorporates a number of historically inspired peculiarities, perhaps the most widely acknowledged of which is the use of BGR order for color channels instead of RGB. There isn’t even a global switch to make RGB the default; you have to convert explicitly (for example with cvtColor() and COLOR_BGR2RGB) whenever another library expects RGB. I’m all about nostalgic nods, but not at the cost of discoverability.
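To make the BGR quirk concrete, here is a minimal sketch of what the channel swap amounts to. It uses plain NumPy so it runs without OpenCV installed; the two-pixel image is made-up data for illustration.

```python
import numpy as np

# A 1x2 "image" in OpenCV's BGR channel order:
# one pure-blue pixel and one pure-red pixel (made-up data).
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels. With OpenCV
# installed, cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB) gives the same values.
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255]  (blue, now in RGB order)
print(rgb[0, 1].tolist())  # [255, 0, 0]  (red, now in RGB order)
```

This matters as soon as a frame leaves OpenCV: matplotlib, for instance, expects RGB, so an unconverted frame shows up with red and blue swapped.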
I don’t hate this library. It can do an impressive array of things with just a few lines of code. I wish the lines were easier to write because I can see this library enjoying much wider adoption and use with a more intuitive API.
But anyway, on to the thing I suspect you’re here for…
OpenCV: A Short Example
OK. Here we have just under 40 lines of code demonstrating a motion detector written with OpenCV.
```python
import cv2

first_frame = None
# Open the default camera (device 0).
video = cv2.VideoCapture(0)

while True:
    _, frame = video.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (21, 21), 0)
    # Keep the first processed frame as the baseline to compare against.
    if first_frame is None:
        first_frame = gray
        continue
    # Per-pixel difference from the baseline, then keep only large changes.
    delta_frame = cv2.absdiff(first_frame, gray)
    _, threshold_frame = cv2.threshold(delta_frame, 90, 255, cv2.THRESH_BINARY)
    # Dilate to fill gaps, then find outlines (OpenCV 4.x returns two values).
    threshold_frame = cv2.dilate(threshold_frame, None, iterations=2)
    contours, _ = cv2.findContours(threshold_frame.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Ignore small regions and box the rest in green on the color frame.
    for contour in contours:
        if cv2.contourArea(contour) < 850:
            continue

        (x, y, w, h) = cv2.boundingRect(contour)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 3)

    cv2.imshow("Bounding Rectangles", frame)
    key = cv2.waitKey(1)

    # Press 'q' to quit.
    if key == ord('q'):
        break

# Release the camera and close the window.
video.release()
cv2.destroyAllWindows()
```
Let’s go over what’s happening here. First, we open the default camera as our video source (line 5). Then we enter an infinite loop (line 7) that breaks when the user presses ‘q’ (lines 33 and 34).
Inside the while loop, we read a frame from the video (8), convert it to grayscale (9), and add a blur (10) to reduce the sensitivity of the motion detection (otherwise small changes in lighting and the like would trip the detector constantly). On the first pass through the loop, we save that frame off as a baseline (12-14). Note that the baseline is never updated, so every later frame is compared against the first frame captured, not against the one immediately before it. We take the difference between the current frame and the baseline (16) and keep only the pixels where that difference exceeds a given threshold (17). We then dilate the result and trace the outlines of those areas of difference (19, 20), skip any that are too small, and draw a rectangle around each of the rest (22-27). When we run this, we get green rectangles around movement areas on the video (plus a few false positive areas, depending on lighting).
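If the difference and threshold steps feel abstract, the sketch below reproduces them on two tiny synthetic frames using plain NumPy, so it runs without a camera or OpenCV. The frame values are made up for illustration, and the single bounding box over all changed pixels is a simplification: the real example boxes each contour separately.

```python
import numpy as np

# Two tiny synthetic grayscale "frames" (assumed values, not camera data).
baseline = np.full((4, 4), 50, dtype=np.uint8)
current = baseline.copy()
current[1:3, 1:3] = 200  # a bright 2x2 "object" enters the scene

# Same result as cv2.absdiff: widen to a signed type first so the
# subtraction cannot wrap around in uint8.
delta = np.abs(current.astype(np.int16) - baseline.astype(np.int16)).astype(np.uint8)

# Same result as cv2.threshold(delta, 90, 255, cv2.THRESH_BINARY):
# pixels strictly above 90 become 255, everything else becomes 0.
mask = np.where(delta > 90, 255, 0).astype(np.uint8)

# Bounding box of the changed pixels, in the (x, y, w, h) form that
# cv2.boundingRect returns.
ys, xs = np.nonzero(mask)
x, y = int(xs.min()), int(ys.min())
w, h = int(xs.max()) - x + 1, int(ys.max()) - y + 1
print((x, y, w, h))  # (1, 1, 2, 2)
```

Raising the threshold above 150 (the size of the change in this toy data) would make the object disappear from the mask, which is the same trade-off the value 90 controls in the real detector.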
This example demonstrates a limited amount of what OpenCV can do. In fact, chances are you can use OpenCV to implement your favorite Instagram or Snapchat filter! Professional applications of this library include changing a model’s lipstick color, eyeglass frames, or hairstyle based on a picture or video. These virtual fittings have an important role to play in the future of bespoke products—and that’s just one example!
OpenCV is a powerful library with a lot of flexibility. Its API is unintuitive, though, which works against growing adoption and building a developer community. Luckily, there are some excellent resources available to help you get started, though they come chiefly from third parties and not the OpenCV documentation itself. Also, you won’t have to struggle through too many lines of code, because it often takes just a few lines to do what you’d like to do. You can start with a short example like the one above and progress to some fun photo or video filters. After that, there is a wide array of potential applications for computer vision technology. Try out something you’ve read about or come up with your own thing!