Mitigating Information Asymmetry – Unsupervised vs Supervised Learning

Slava Razbash has worked in data science roles in multinational enterprises, startups and even a university. He has a solid track record that includes working in CBA’s big data team and helping start Sportsbet’s data science and personalisation capability. Slava is the Founder of the Enterprise Data Science Architecture Conference.

Reserve your place today at https://edsaconf.io because you must keep your skills current.
Slava Razbash

The distinction between “supervised” and “unsupervised” learning has existed for a long time. Only recently, however, has there been such a storm of misunderstanding about what these terms actually mean. Read this article to help you discern which consultants your organisation needs to fire.

Learn to discern the wolves

A supervised learning problem has labels and features. Labels are what you want to predict. Features are the inputs that you will have access to at the time when you make a prediction. What you are asking is “learn to predict these outcomes.” We have some examples below to make it concrete.

  • A real estate automated valuation model (AVM). The labels are the prices at which houses have previously sold. The features are location, number of bedrooms, number of bathrooms, land size, …etc.
  • A credit risk model. The labels are whether or not a customer is deemed to have defaulted on their loan. The features are time with bank, number of credit enquiries, income, living expenses…etc.
  • A sales forecasting model. The labels are past sales numbers. The features are also past sales numbers: earlier periods are used to predict later ones. This is a time series forecasting example.
  • Image recognition. The labels are the description of what is in the image. These are usually discrete categories. For example, “cat”, “hot dog”, “tiger” and “number 7”. The features are the images.
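To make "labels plus features" concrete, here is a minimal sketch of the valuation example in plain Python. It assumes a single feature (land size) and a straight-line relationship, and every number is made up for illustration. The helper names `fit_line` and `predict` are hypothetical, and a real AVM would use many more features and a richer model.

```python
# Toy supervised learning: features (inputs) plus labels (known outcomes).
# All numbers below are invented for illustration only.

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply the fitted line to a new feature value."""
    slope, intercept = model
    return slope * x + intercept


# Feature: land size in square metres. Label: past sale price in dollars.
land_size = [300, 450, 600, 750]
sale_price = [500_000, 650_000, 800_000, 950_000]

model = fit_line(land_size, sale_price)   # "learn to predict these outcomes"
estimate = predict(model, 500)            # a house we have no price for yet
print(f"Estimated price for 500 sqm: ${estimate:,.0f}")
```

The key point is that `sale_price` exists at training time. Without it, there is nothing to fit the line to and the problem is no longer supervised.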

An unsupervised learning problem does not have labels, only features. What you are asking is “find me some interesting patterns in this data.” We have to think harder to come up with some good examples – see below.

  • Customer feedback text clustering. Our company has a free text feedback form. We take customers’ text feedback and assign it to several groups. We then read a few responses from each group to understand the nature of the feedback in that group.
  • Suppose that we don’t have house prices, but we still want to see what kind of houses we can buy. The features are location, number of bedrooms, number of bathrooms, land size, …etc. So we investigate further and examine how big the land size is in different suburbs.
  • Customer segmentation. Assign customers to segments. The marketing team uses the segments to plan their marketing strategy.
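The segmentation example can be sketched the same way. The code below is a bare-bones k-means in plain Python, offered as one possible approach rather than the method any particular team uses. The customer numbers, the choice of two segments and the starting centroids are all assumptions made for illustration, and the function names are hypothetical.

```python
# Toy unsupervised learning: features only, no labels.
# The customer numbers are invented for illustration only.

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, centroid))
             for centroid in centroids]
    return dists.index(min(dists))


def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate assigning points and recomputing centroids."""
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster as is
        centroids = new_centroids
    return [nearest(p, centroids) for p in points], centroids


# Features per customer: (visits per month, monthly spend in dollars).
customers = [(2, 100), (3, 120), (2, 90), (20, 900), (22, 1000), (19, 950)]

# Assumption: two segments, seeded from two of the customers.
segments, centres = k_means(customers, [customers[0], customers[3]])
print(segments)
```

Notice that nobody told the algorithm which customers belong together. It only returns group numbers, and a person still has to look at each group and decide what it means. A real project would also scale the features first so that spend does not dominate the distance.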

How can we implement these systems in a large organisation? Check out our example data science architecture.

Productionise Properly – come see how it’s done. The Enterprise Data Science Architecture Conference focuses on how to properly productionise data science solutions at scale. We have confirmed speakers from ANZ Bank, Coles Group, SEEK, ENGIE, Latitude Financial, Microsoft, AWS and Growing Data. The combination of presentations is intended to paint a complete picture of what it takes to productionise a profitable data science solution. As an industry, we are figuring out how to best build end-to-end machine learning solutions. As the field matures, knowledge of best practices in end-to-end machine learning pipelines will become essential skills. I invite you to view our list of confirmed speakers and talks at https://edsaconf.io because this is the right place to meet the right people and up-skill.

Meet the right people and up-skill. The conference will be on the 27th March at the Melbourne Marriott Hotel. It is a fully catered conference with coffee, lunch, morning/afternoon tea and evening drinks & canapes. I invite you to reserve your place at https://edsaconf.io because this is the best place to learn the emerging best practices.

SWOT Analysis of Data Science Projects in Large Enterprises

Your organisation has started its data science journey – what should you watch out for? This broad-brush analysis will hopefully resonate with your experience. You will see these Strengths, Weaknesses, Opportunities and Threats (SWOT) in most large organisations, on most data science projects. The leaders in the field are overcoming the weaknesses and threats described below – some of these leaders will be presenting at the Enterprise Data Science Architecture Conference.

Strengths: The strength of a large and established enterprise is its current business. Your organisation has built up its business over a long time. You have an established customer base, established processes, economies of scale and brand equity. If you are a bank, then you probably have a competitive advantage in cost of funds.

Weaknesses: To work properly, data science must be integrated into your organisation. You will need new processes, new teams, new specialised roles, new ways of working and new infrastructure. The data science architecture presented in the previous article is radically different to what most large organisations currently have implemented. For example, many ASX50 companies do not have real-time personalisation on their websites. Change takes time in large organisations. The right companies are building their data science platforms right now.

Information asymmetry. Data science is a new field and everyone is a self-proclaimed expert. Senior leaders who have come up through the business side need to sift through the salespeople, despite the information asymmetry. Which consultants to engage? Whom to hire? Whom to promote? Whose experience is relevant?

You will also need to recruit leaders to take your company on this journey. These leaders will need to have relevant skills and experience. Managing an older style BI team is not relevant experience. This is the opportunity for someone who understands the latest technology to level up. Meet the right leaders at the Enterprise Data Science Architecture Conference.

Opportunities: Properly productionised data science can increase the profitability of any large enterprise. Although some companies are further ahead than others, everyone is just beginning their journey. This is the opportunity to pull ahead of your competitors – if you get it right. Employees at all levels have the opportunity to grow their careers by building up a track record with the right experience.

Threats: If your competitors go further along the data science journey by a meaningful amount, then you will lose market share. Properly productionised data science can be an unfair advantage against an unprepared competitor. For example, your competitor could send a just-in-time retention offer before you have even noticed that you have acquired a new customer.

Career threats for technical staff. Working on successful, cutting-edge projects is career gold. Working with stale technologies and irrelevant KPIs is career death because you will be de-skilling yourself. Team members who understand this will leave to work on cutting-edge projects. On the other hand, team members who work on cutting-edge projects will also leave when they find higher paying jobs.

Career threats for management staff. Building a track record as a leader in a leading company is a great career boost. However, if you recruit the wrong team, the implementation of data science solutions in your company may head in the wrong direction, and your track record will suffer with it.
