The problem that we are solving: Organisations have several places where different pieces of data are stored. We need information from each source to paint a complete picture that adds business value.
Imagine that we have the following data sources in our organisation:
- The company data warehouse, with most of the information required by the finance and risk teams. However, it’s missing the fields that we need to build a cross-sell model.
- The database with historical customer transactions – essential to your data science project, but it’s owned by marketing and your team can’t get access.
- The finance team’s special database. The company can’t calculate EBIT without it.
- The company CRM.
- The company CRM for B2B customers.
- The web analytics data store. The web analytics team are kind enough to provide your team with monthly extracts.
- The company data lake. It stores outputs from your team’s machine learning models. Some source system data has been loaded as well.
If we could access all of these pieces of information, we could build the best machine learning models, report the deepest insights and place our company firmly in first place. But how? Our data science team can’t get access to most of those databases. Copying them into the data lake is an ongoing two-year project. Data virtualisation could help.
The data virtualisation software will connect to and query our data sources. Our data users will connect to and query the data virtualisation software as if it were any other database. They will be able to query and join all of the data across all of the data sources.
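A rough sketch of what this feels like from the data user’s side. Here sqlite3 and its `ATTACH DATABASE` feature stand in for the virtualisation layer (which would really be federating remote warehouses and CRMs); the tables and column names are illustrative assumptions. The point is the shape of the interaction: one connection, one SQL statement, data joined across what are logically separate sources.

```python
import sqlite3

# sqlite3 stands in for the virtualisation layer: one connection,
# several attached "sources", one cross-source join.
conn = sqlite3.connect(":memory:")                  # plays the warehouse
conn.execute("ATTACH DATABASE ':memory:' AS crm")   # plays the CRM

# Illustrative tables: transactions in one source, customers in another.
conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, segment TEXT)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [(1, 120.0), (1, 80.0), (2, 30.0)])
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "retail"), (2, "business")])

# A single SQL statement joins data that lives in two different sources.
rows = conn.execute("""
    SELECT c.segment, SUM(t.amount)
    FROM transactions t
    JOIN crm.customers c ON c.customer_id = t.customer_id
    GROUP BY c.segment
    ORDER BY c.segment
""").fetchall()
print(rows)  # [('business', 30.0), ('retail', 200.0)]
```

To the analyst writing that query, it is just a database; the fact that `transactions` and `crm.customers` live in different systems is the virtualisation layer’s problem.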
Users will only need to apply for access to one system and their credentials only need to be removed from one system if they leave the company. The data virtualisation software may also be able to mask certain sensitive fields for certain types of users. For example, we can hide customer names from teams who don’t need to know them.
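The masking rule above can be sketched as a simple row filter. This is a minimal illustration of the policy a virtualisation layer could apply before results reach the user; the role names, field names and placeholder are all assumptions, not any particular product’s API.

```python
# Illustrative policy: which fields are sensitive, and which roles
# are allowed to see them unmasked.
SENSITIVE_FIELDS = {"customer_name", "email"}
PRIVILEGED_ROLES = {"risk", "finance"}

def mask_row(row: dict, role: str) -> dict:
    """Return the row unchanged for privileged roles; otherwise
    replace sensitive fields with a placeholder."""
    if role in PRIVILEGED_ROLES:
        return row
    return {k: ("***" if k in SENSITIVE_FIELDS else v)
            for k, v in row.items()}

row = {"customer_name": "Jane Cho", "balance": 1200.50}
print(mask_row(row, "marketing"))  # {'customer_name': '***', 'balance': 1200.5}
print(mask_row(row, "finance"))    # unchanged
```

In a real deployment this policy would live in the virtualisation layer’s configuration rather than in user code, so it is applied consistently no matter which tool issues the query.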
Data virtualisation is one piece of the picture. What you do with the data makes the difference between best practice and wasting money. As a specialist, you will have seen countless examples of adding value: increasing profit, saving lives, managing risk, automating manual labour. If you are a non-specialist, check out our example for non-specialists.
Productionise Properly – come see how it’s done. The Enterprise Data Science Architecture Conference focuses on how to properly productionise data science solutions at scale. We have confirmed speakers from ANZ Bank, Coles Group, SEEK, ENGIE, Latitude Financial, Microsoft, AWS and Growing Data. The combination of presentations is intended to paint a complete picture of what it takes to productionise a profitable data science solution. As an industry, we are still figuring out how best to build end-to-end machine learning solutions, and as the field matures, knowledge of best practices for end-to-end machine learning pipelines will become an essential skill. I invite you to view our list of confirmed speakers and talks at https://edsaconf.io – it’s the right place to meet the right people and up-skill.
Meet the right people and up-skill. The conference will be held on the 27th of March at the Melbourne Marriott Hotel. It is fully catered, with coffee, lunch, morning and afternoon tea, and evening drinks and canapés. I invite you to reserve your place at https://edsaconf.io: it’s the best place to learn the emerging best practices.