In my last two posts, I’ve talked about machine learning (ML) and how it can help you get more out of your analytics and data integration efforts. Because ML is not a specific technology, but rather a deep and complex set of mathematical algorithms, it’s important to understand which types of algorithms will help you get the insights you seek from your data—and which will give you insights you didn’t even know you wanted.
I’m going to discuss three very broad categories of ML algorithms: supervised learning, unsupervised learning, and hybrid models that combine elements of the other two. Which type you use depends on your goals and your appetite for uncertainty.
Supervised learning is the workhorse of ML. It involves training a machine with paired data—a series of inputs where the output is known. Feed the machine enough of these data pairs and it learns which data go together.
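To make the idea of paired data concrete, here is a minimal sketch of about the simplest supervised model there is: a straight line fitted by least squares. The numbers and the names (fit_line, training_pairs, predict) are made up for illustration; a real project would use far more data and usually an ML library.

```python
def fit_line(pairs):
    """Fit y = slope * x + intercept to (input, known output) pairs by least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # How x and y vary together, and how much x varies on its own
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    variance = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training pairs: (input, known output)
training_pairs = [(1, 3), (2, 5), (3, 7), (4, 9)]
slope, intercept = fit_line(training_pairs)


def predict(x):
    """Use the learned line to estimate the output for an unseen input."""
    return slope * x + intercept


print(predict(5))
```

The "learning" here is nothing more than finding the slope and intercept that best match the known outputs. More sophisticated supervised models follow the same pattern with more parameters.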
For example, if you feed the machine historical stock market data, along with dates and economic indicators, you can construct a relatively accurate predictive model. Of course, it won’t be 100% accurate (humans value and run companies, so there’s irrationality, and therefore unpredictability), but with enough time and data, the model can get better at forecasting an index such as the Dow Jones Industrial Average.
You can also build supervised learning models that classify things. For example, researchers can feed the machine population and epidemiological data and build a model of people who are likely to get cancer, heart disease, or diabetes.
Companies can build predictive models of customer segments that are likely to churn, demand forecasts, project outcomes, financial performance—the list goes on and on. The big benefit here is that with ML, you can more accurately predict events or behaviors, and you can devise and implement strategies that capitalize on those models.
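As a rough illustration of classification, here is a sketch of a k-nearest-neighbors classifier applied to churn. It labels a new customer by letting the most similar known customers vote. The features (months as a customer, support tickets filed), the labels, and the function name knn_predict are all hypothetical, and the data is invented for the example.

```python
import math
from collections import Counter


def knn_predict(labeled_examples, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled examples."""
    nearest = sorted(labeled_examples, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented history: ((months as customer, support tickets filed), known outcome)
history = [
    ((2, 5), "churn"), ((3, 4), "churn"), ((4, 6), "churn"),
    ((24, 0), "stay"), ((30, 1), "stay"), ((36, 0), "stay"),
]

print(knn_predict(history, (5, 5)))
print(knn_predict(history, (28, 1)))
```

Nearest neighbors is only one of many classification techniques. I picked it because it fits in a few lines, not because it is the best choice for churn in practice.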
Unsupervised learning is the powerful wildcard of ML, although its power is sometimes hindered by its unpredictability. With unsupervised learning, the inputs are known, but there are no known outputs to train against. Instead of being told the right answers, the machine looks for structure in the data on its own.
Groupings and relationships emerge from the data itself. Given enough data and time, the algorithm can show you patterns that you would be unlikely to discover using supervised methods. However, because the outputs aren’t known in advance, it’s often difficult to validate the model.
Clustering is one common technique used for unsupervised learning. It involves grouping set members with common traits together. For example, you can segment customers with similar buying habits or other behaviors. The difficulty lies in knowing whether or not these groups provide useful insights, how many of them should exist, or whether they’re even grouped correctly. You can refine the model over time, but there’s always a level of uncertainty. If you can live with that, though, unsupervised models can provide unique and very valuable insights.
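Here is a small sketch of one popular clustering method, k-means, to show what grouping without labels looks like. The customer data (orders per month, average order value) is invented, and the choice to seed the clusters with the first k points is a simplification I made to keep the example deterministic. Notice that you have to pick k, the number of groups, yourself, which is exactly the uncertainty described above.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: the first k points seed the centroids (a simplification)."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


# Invented customers: (orders per month, average order value); no labels supplied
customers = [(1, 20), (10, 95), (2, 22), (11, 100), (1.5, 18), (9, 105)]
centroids, labels = kmeans(customers, k=2)
print(labels)
print(centroids)
```

The algorithm returns group numbers, not meanings. Deciding that one group is "occasional small buyers" and the other "frequent big spenders" is still up to you. In real data you would also want to put the features on comparable scales first, since a raw distance lets the larger numbers dominate.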
Hybrid Algorithms—The Best of Both Worlds
Hybrid algorithms combine elements of both supervised and unsupervised learning methods, coupling the relative certainty of supervised learning with the power and novel insight generation of unsupervised learning. One approach often placed in this hybrid category is reinforcement learning.
You might have heard of this type of algorithm if you’ve read about computers that have been trained to beat opponents at games like Go, chess, and Atari video games. Reinforcement learning algorithms map observations and measurements to a set of possible actions while trying to earn and maximize a reward. The computer interacts with its environment in an attempt to learn how to master it.
The outcomes aren’t known in advance, but desired outcomes are rewarded. Reinforcement learning can be applied to all sorts of business activities, such as risk management, inventory management, logistics, and product design. The list is huge. The bottom-line benefit is that reinforcement learning can help you discover the optimal outcomes you seek, and it can reveal outcomes that you didn’t seek, but that you can leverage to optimize your operations.
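To show the reward loop in miniature, here is a sketch of tabular Q-learning, one common reinforcement learning algorithm. The environment is a toy I invented: a five-position corridor where the agent starts at the left end and is rewarded only for reaching the right end. The names (step, train, greedy_policy) and the parameter values are assumptions chosen for illustration.

```python
import random

N_STATES = 5        # positions 0..4; position 4 is the goal
ACTIONS = (-1, 1)   # step left or step right


def step(state, action):
    """Toy environment: move along the corridor; reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: try a random action
            else:
                best = max(q[state])  # exploit: best known action, ties broken randomly
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: nudge the estimate toward reward + discounted future value
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best learned action for each non-goal position."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])] for s in range(N_STATES - 1)]


q_table = train()
print(greedy_policy(q_table))
```

Nobody tells the agent that "right" is the correct move. It only ever sees a reward at the goal, and the update rule gradually passes that reward back to the earlier positions. Business problems are far messier than a corridor, but the loop of act, observe, get rewarded, and adjust is the same.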
There are two caveats to keep in mind with ML. One: it’s easy to let bias creep into ML algorithms, so you must constantly measure your results against your goals, applicable standards, and ethics. Two: it takes a huge volume of clean data to achieve valid, predictable results with ML algorithms. And I mean really, really large data volumes, so it’s a perfect use for all that big data you have, but the data has to be reliable. It’s that old GIGO rule: garbage in, garbage out.
I’ve only made a very small scratch on the vast surface of ML here. There are many other techniques—such as anomaly detection to help detect fraud and bolster risk management efforts—that can help you improve your analytics and increase your bottom line. From here, it’s a matter of learning all you can about the technology and selecting what’s right for your business goals and analytics ecosystem. So, what are you waiting for?