Mitigating Information Asymmetry – Unsupervised vs Supervised Learning

The distinction between “supervised” and “unsupervised” learning has existed for a long time. Only recently, however, has there been such a storm of misunderstanding about what these terms actually mean. Read this article to help you discern which consultants your organisation needs to fire.

Learn to discern the wolves

A supervised learning problem has labels and features. Labels are what you want to predict. Features are the inputs that you will have access to at the time when you make a prediction. What you are asking is “learn to predict these outcomes.” We have some examples below to make it concrete.

  • A real estate automated valuation model (AVM). The labels are the prices at which houses have previously sold. The features are location, number of bedrooms, number of bathrooms, land size, etc.
  • A credit risk model. The labels are whether or not a customer is deemed to have defaulted on their loan. The features are time with bank, number of credit enquiries, income, living expenses, etc.
  • A sales forecasting model. The labels are past sales numbers. The features are also past sales numbers, taken from periods earlier than the one being predicted. This is a time series forecasting example.
  • Image recognition. The labels are the description of what is in the image. These are usually discrete categories. For example, “cat”, “hot dog”, “tiger” and “number 7”. The features are the images.
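
To make the label/feature split concrete, here is a minimal sketch of the AVM example in plain Python. It assumes a single feature (land size) and a straight-line relationship, and the figures are invented for illustration; a real AVM would use many more features and a richer model.

```python
# Hypothetical training data, invented for illustration only.
# Feature: land size in square metres. Label: sale price in $1000s.
land_size = [300.0, 450.0, 600.0, 750.0]
price = [400.0, 550.0, 700.0, 850.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Learn to predict these outcomes": fit on features paired with labels.
slope, intercept = fit_line(land_size, price)


def predict(x):
    """Predict a price from the one feature we have at prediction time."""
    return intercept + slope * x


# A house the model has not seen; only its feature is known.
predicted = predict(500.0)
print(f"Predicted price: ${predicted:.0f}k")
```

The labels (`price`) are used only while fitting. At prediction time the model sees the feature alone, which is exactly what makes the problem supervised.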

An unsupervised learning problem does not have labels, only features. What you are asking is “find me some interesting patterns in this data.” We have to think harder to come up with some good examples – see below.

  • Customer feedback text clustering. Our company has a free text feedback form. We take customers’ text feedback and assign it to several groups. We then read a few responses from each group to understand the nature of the feedback in that group.
  • House exploration without prices. Suppose that we don’t have house prices, but we still want to see what kind of houses we can buy. The features are location, number of bedrooms, number of bathrooms, land size, etc. So we investigate further and examine how big the land size is in different suburbs.
  • Customer segmentation. Assign customers to segments. The marketing team uses the segments to plan their marketing strategy.
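
The customer segmentation example can be sketched with k-means clustering, one common unsupervised technique (the article does not prescribe a particular algorithm). The customer data, the choice of two segments and the starting centroids below are all assumptions made for illustration.

```python
import math

# Hypothetical customers, invented for illustration only.
# Features: (annual spend in $1000s, store visits per month). No labels.
customers = [
    (1.0, 1.0), (1.5, 2.0), (1.2, 1.5),
    (8.0, 9.0), (9.0, 8.5), (8.5, 9.5),
]


def k_means(points, centroids, iterations=10):
    """Basic k-means: alternate assignment and centroid update steps."""
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        centroids = updated
    return assignments, centroids


# Start from the first and last customer so the run is deterministic.
segments, centres = k_means(customers, [customers[0], customers[-1]])
print(segments)
```

Nothing here tells the algorithm what a "good" segment is. It only groups customers with similar features, and a person still has to inspect each group and decide what it means.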

How can we implement these systems in a large organisation? Check out our example data science architecture.

Productionise Properly – come see how it’s done. The Enterprise Data Science Architecture Conference focuses on how to properly productionise data science solutions at scale. We have confirmed speakers from ANZ Bank, Coles Group, SEEK, ENGIE, Latitude Financial, Microsoft, AWS and Growing Data. The combination of presentations is intended to paint a complete picture of what it takes to productionise a profitable data science solution. As an industry, we are figuring out how best to build end-to-end machine learning solutions. As the field matures, knowledge of best practices in end-to-end machine learning pipelines will become an essential skill. I invite you to view our list of confirmed speakers and talks at https://edsaconf.io because this is the right place to meet the right people and up-skill.

Meet the right people and up-skill. The conference will be on the 27th of March at the Melbourne Marriott Hotel. It is fully catered, with coffee, lunch, morning/afternoon tea and evening drinks & canapes. I invite you to reserve your place at https://edsaconf.io. This is the best place to learn the emerging best practices.

Slava Razbash
