Machine Learning in Action in the Classroom and the Field
Darin Stephenson, Ph.D. | Professor of Mathematics
Teaching is foundational for professors at Hope. In Dr. Darin Stephenson’s case, he’s teaching not only students, but also computers. And they speak a different language. From the binary language of dune topography (0 = on the ground, 1 = not on the ground) to a birdsong sound wave turned visual, Stephenson is fluent in what computers need to solve a problem consisting of hundreds to millions of data points.
“Mathematics doesn’t have to live in a silo by itself; it can be used to do real-world things,” Stephenson emphasizes. “These things can be beneficial to humanity, but also to individuals who want to accomplish a certain task, to understand a certain thing. I guess that’s what I think of: How do we solve problems of interest?”
Stephenson’s recent work has featured three machine-learning problems of interest — in notably different fields, but with a common need for math. He’s been training computers to match birdsong recordings to the correct species, differentiate between complete and incomplete engineering drawings, and map out dune topography. A report on the first project was published in early 2020; the others are ongoing.
The task of identifying bird species by their songs (or, rather, teaching a computer to do so) began with a database of some 24,000 audio recordings representing nearly 1,000 different species. Stephenson and his (human) team pared this down to 4,000 recordings and four species, and converted the audio files to visual sound maps in which the darkness of each pixel represented the sound amplitude. These were arranged on a graph: the vertical axis represented frequency, and the horizontal axis denoted time.
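For readers curious how audio becomes a picture of this kind, here is a minimal sketch in Python. It is not the team's code: the function name, the window and hop sizes, and the test tone are all assumptions chosen for illustration, and it uses a plain discrete Fourier transform rather than any particular audio library.

```python
import cmath
import math


def spectrogram(samples, window=64, hop=32):
    """Return a list of time columns; each column lists one magnitude per frequency bin."""
    columns = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = samples[start:start + window]
        column = []
        for k in range(window // 2):  # frequency bins below the Nyquist limit
            # Discrete Fourier transform of this frame at frequency bin k
            total = sum(x * cmath.exp(-2j * math.pi * k * n / window)
                        for n, x in enumerate(frame))
            column.append(abs(total))  # magnitude, i.e. how dark the pixel would be
        columns.append(column)
    return columns


# Hypothetical test tone: a pure sine wave with 8 cycles per 64-sample window
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)

# The strongest frequency bin in the first time column
loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each column is one slice of time and each entry in a column is one frequency, so a steady tone shows up as a single horizontal stripe. A birdsong, with its changing pitch, would trace a more intricate pattern.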
“It’s almost like sheet music for the sound,” Stephenson says. The computer reads this music, and — having been given many example problems with the answer already worked out — matches it to the correct bird species.
The project was a team effort that included Stephenson, his Department of Mathematics colleagues Dr. Mark Pearson and Dr. Paul Pearson, and Hope graduates Russell Houpt ’18, Sarah Seckler ’18, Taylor Rink ’19 and Allison VanderStoep ’19. Their jointly authored article about it is a chapter in An Introduction to Undergraduate Research in Computational and Mathematical Biology (Springer, 2020), an edited volume about preparing undergraduate students to direct mathematical research to biological applications.
“And that was kind of the goal,” Stephenson says, “to learn something about machine learning, and also to learn something about applications in the biology field.”
In the future, undergraduates may also be able to learn something from a machine about engineering. Stephenson and several collaborators are teaching a computer to read a student’s engineering drawing, compare it to an expert’s drawing of the same thing, and provide feedback about how the student’s work can get closer to matching the professional’s. The project is in its incipient stages, and getting a machine to provide constructive (as well as correct) feedback is a challenge. “That’s the part that is still mysterious at this point,” Stephenson says. “I’m looking forward to seeing what it brings.”
Much of Stephenson’s summer of 2020 was occupied with Michigan Space Grant Consortium-funded work applying computers and mathematics to the study of dune topography and movement. That work is itself part of a 20-plus-year exploration of the Great Lakes dunes by Hope’s Dune Research Group, a team that involves several members of Hope’s Department of Geological and Environmental Sciences together with Brian Yurk of the Department of Mathematics. Over the years the group has tapped other Hope math professors for specialized assistance.
The dunes shift and change shape in the face of waves and wind, particularly during large, dramatic weather events. Sand transport and vegetation coverage affect where sand accumulates and the direction in which dunes tend to move. This environment is a complex system demanding complex modeling. Thus: drones, machine learning, and mathematics.
“The overall goal of this drone imagery is to fly the drone high over the dune complex to create very specific land-shape models at various points in time,” Stephenson says. The drone collects vast tracts of data of many types: color, infrared, “red edge” (the part of the color spectrum lying between visible red and infrared), thermal, height and position. Each flight yields 100 million to 150 million data points.
“This becomes a big problem,” Stephenson says. “How do you take these data points that are spatial, and also have six bands of color information and heat information? How do we take that and make some sense out of that? What we’d really like is a very clear feeling for what the land looks like.” For humans, it’s an impossible task. For computers, it’s manageable — with some help from the Dune Research Group.
They literally lay the groundwork for the drone and computer, setting out PVC frames to be photographed at various heights to later help align imagery, and lugging the batteries and other materials needed by the drone. They also conduct “ground-truthing,” physically measuring the heights of trees and bushes to help the machine learn to classify land types and model surface topography using the drone imagery.
As with the other machine learning projects, the researchers feed the computer some of the data with the desired outcomes already determined.
“What we typically do is tell the machine what small parts of the data are,” Stephenson says. “We know all of this is sand, we know all of this is trees, we know all of this is the lake — and then we try to teach the computer to make those associations” with new data.
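The idea of labeling small parts of the data and letting the machine generalize can be sketched with a toy classifier. This is only an illustration, not the Dune Research Group's model: the three features (height, greenness, blueness), their values, and the nearest-centroid method are all assumptions made for the example.

```python
import math


def train_centroids(examples):
    """Average the feature vectors of each labeled class into one centroid per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid lies closest to the new point."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical labeled points: (height in meters, greenness 0-1, blueness 0-1)
labeled = [
    ((0.1, 0.1, 0.1), "sand"), ((0.3, 0.1, 0.2), "sand"),
    ((8.0, 0.8, 0.1), "trees"), ((10.0, 0.9, 0.1), "trees"),
    ((0.0, 0.1, 0.9), "lake"), ((0.0, 0.2, 0.8), "lake"),
]
centroids = train_centroids(labeled)

# New, unlabeled points are matched to the nearest known class
tall_green = classify(centroids, (7.5, 0.7, 0.1))
low_blue = classify(centroids, (0.1, 0.1, 0.8))
low_pale = classify(centroids, (0.25, 0.1, 0.1))
```

A real model working with six color bands and millions of points would be far more elaborate, but the pattern is the same: known examples first, then predictions on new data.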
Part of the challenge is that the new data is so extensive. With so many data points, the computer tends to smooth some over, averaging out a clump when each point should remain distinct. This creates a “ground-point problem,” whereby the computer incorrectly judges some ground points to be in the tree canopy because they lie close to trees.
“We don’t want the computer to smooth over that,” Stephenson says. “We want to find those points.”
Despite the challenges, having thousands to millions of data points is essential to machine learning models.
“Part of the reason they work well is that if you have enough data, you can use a complicated model that has millions of parameters,” Stephenson says. “If the model has millions of parameters, it can fit the data really well. It learns these associations between the data it’s given and the target data.”
The computer learns to minimize error as it goes, just as mathematicians do, but it can do so with millions of inputs, far beyond the reach of a single human mind. Mathematicians call this sort of problem-solving “successive approximations.” Given the example of a correct engineering diagram, or bird identification, or surface height, a computer tries an approximate value for a parameter on a piece of data and measures how large the error is with that value. It determines whether it needs to set the parameter higher or lower, and tests a new number. It does this time and time again, successively improving the output.
“The algorithm starts off knowing some correct answers. If the model predicts something that is contrary to those answers, the algorithm will recognize ‘there’s some error there,’” Stephenson says. “Whenever the computer’s getting a wrong answer, it wants to update the model to get a better answer. We give it the data, we set up what the model structure looks like, but then it automatically adjusts all those parameters to do as well as it can, based on the specified inputs and outputs. It’s what the computer’s really good for, that we can’t do.”
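One common form of this adjust-and-retest loop is gradient descent, sketched below for a model with a single parameter. The article does not say which algorithm Stephenson's projects use, so treat this as an assumed illustration: the function name, learning rate, step count, and data points are all made up for the example.

```python
def fit_slope(points, learning_rate=0.01, steps=200):
    """Fit y = w * x by repeatedly nudging w in the direction that reduces error."""
    w = 0.0  # initial guess for the parameter
    for _ in range(steps):
        # Slope of the mean squared error with respect to w:
        # positive means w is too high, negative means w is too low
        grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
        w -= learning_rate * grad  # move w a small step toward lower error
    return w


# Hypothetical examples with known answers; they all satisfy y = 3 * x
points = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = fit_slope(points)
```

With these points the estimate settles very close to 3. A model with millions of parameters applies the same kind of update to every parameter at once.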
Something the computer can’t do is come up with the questions. That’s a task rightly left to the researchers. Of all his research questions, one in particular guides much of Stephenson’s work.
“How do we use mathematics, computer science, statistics — whatever goes into this whole big balloon of data science — to make life better in some way?”