Agenda
Predictive Analytics World London
etc.venues, 200 Aldersgate, 11-12 October, 2017
Review – Agenda 2017
Predictive Analytics World for Business - London - Day 1 - Wednesday, 11th October 2017
Welcome!
- Every analytics challenge reduces, at its technical core, to optimizing a metric
- As algorithms improve over time, one could imagine obtaining a solution merely by defining the guiding metric
- But are our tools that good? Are we aiming them in the right direction?
Every analytics challenge reduces, at its technical core, to optimizing a metric. Product recommendation engines push items to maximize a customer's purchases; fraud detection algorithms flag transactions to minimize losses; and so forth. As modeling and classification (optimization) algorithms improve over time, one could imagine obtaining a solution merely by defining the guiding metric. But are our tools that good? More importantly, are we aiming them in the right direction? I think, too often, the answer is no. I'll argue for clear thinking about what exactly it is we ask our computer assistant to do for us, and recount some illustrative war stories. (Analytic heresy guaranteed.)
HAVI is a global, privately held company that serves customers in more than 100 countries. Thanks to decades of experience working for some of the biggest brands, we support our customers with cutting-edge cross-functional analytics that can drive success in both marketing and in the supply chain. Our award-winning Marketing Analytics tools apply the very latest in visualisation techniques to offer at-your-fingertips business insights. And our revolutionary new Prescriptive Marketing Platform uses big data analytics to predict the outcomes of planned limited-time-offers and promotions, and to prescribe marketing recommendations that can better meet business goals.
- How operational research techniques can enhance predictive analytics
- Optimisation methods to find the profit optimal footprint for Retailers
- Simulation techniques to model the flow of patients in Emergency departments
Powerful, user-friendly tools are critical for successful implementation and for empowering decision makers to make ongoing, more robust decisions. Two tools are presented to demonstrate how operational research techniques can enhance predictive analytics by exploring alternative scenarios under future uncertainty and change. The first tool employs optimisation methods to find the profit-optimal footprint for retailers, taking into account how business flows around the network as it changes. The second tool uses simulation techniques to model the flow of patients in Emergency departments. It has been used at two Emergency departments to help understand the impact of reconfiguring the departments, changes to workforce rotas and changes to capacity levels.
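The presenters' tools themselves are not public, but the underlying simulation idea can be illustrated with a toy discrete-event model built on the open-source SimPy library; the arrival rate, treatment time and staffing level below are illustrative assumptions, not figures from the two Emergency departments.

```python
# Toy sketch (not the presenters' tool) of the simulation idea: patients
# arrive at random intervals and compete for a limited number of clinicians,
# so we can measure how waiting times respond to capacity changes.
import random
import simpy

ARRIVAL_MEAN, TREATMENT_MEAN, N_CLINICIANS = 10.0, 25.0, 3   # minutes / staff (assumed)
waits = []

def patient(env, clinicians):
    arrived = env.now
    with clinicians.request() as req:
        yield req                                       # wait for a free clinician
        waits.append(env.now - arrived)
        yield env.timeout(random.expovariate(1 / TREATMENT_MEAN))

def arrivals(env, clinicians):
    while True:
        yield env.timeout(random.expovariate(1 / ARRIVAL_MEAN))
        env.process(patient(env, clinicians))

env = simpy.Environment()
clinicians = simpy.Resource(env, capacity=N_CLINICIANS)
env.process(arrivals(env, clinicians))
env.run(until=8 * 60)                                   # simulate one 8-hour shift
print(f"patients seen: {len(waits)}, "
      f"mean wait: {sum(waits) / max(len(waits), 1):.1f} min")
```

Re-running the model with a different capacity or rota assumption shows how such a tool can compare reconfiguration scenarios before they are tried in practice.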
- The core reason for data visualization
- Learning the patterns corresponding to various multivariate relations
- A topology of proximity opening the way for visualization in Big Data
A dataset with M items has 2^M subsets, any one of which may be the one satisfying our objective. With a good data display and interactivity, our remarkable pattern-recognition ability defeats this combinatorial explosion by extracting insights from the visual patterns. This is the core reason for data visualization. With parallel coordinates, the search for relations in multivariate data is transformed into a 2-D pattern recognition problem. Together with criteria for good query design, we illustrate this on several real datasets (financial, process control, credit-score, one with hundreds of variables) with stunning results. A geometric classification algorithm yields the classification rule explicitly and visually. The minimal set of variables (features) is found and ordered by predictive value. A model of a country's economy reveals sensitivities, impact of constraints, trade-offs and economic sectors unknowingly competing for the same resources. An overview of the methodology provides foundational understanding: learning the patterns corresponding to various multivariate relations. These patterns are robust in the presence of errors, and that is good news for the applications. A topology of proximity emerges, opening the way for visualization in Big Data. Learn how to answer questions you did not know how to ask.
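The parallel-coordinates transformation is straightforward to try for yourself. The sketch below is a generic illustration using pandas and the public Iris dataset (not the speaker's financial, process-control or credit-score data): each multivariate observation becomes a polyline across parallel axes, turning the search for relations into a 2-D pattern-recognition task.

```python
# Minimal sketch: render a small multivariate dataset in parallel coordinates
# so class-dependent relations appear as recognisable visual line patterns.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={"target": "species"})
df["species"] = df["species"].map(dict(enumerate(iris.target_names)))

# Each observation becomes one polyline across the parallel axes.
pd.plotting.parallel_coordinates(df, class_column="species", colormap="viridis")
plt.title("Parallel coordinates: one polyline per observation")
plt.tight_layout()
plt.show()
```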
- Anti-money-laundering (AML) efforts are a large part of financial crime compliance
- Successful money-laundering detection is inherently different from classical anomaly detection in time-series data
- Risk-based approach to AML and what this means for detection techniques
A large part of financial crime compliance for financial institutions lies in anti-money-laundering (AML) efforts, where banks are required to conduct ongoing client due diligence and transaction monitoring. Predictive analytics is increasingly relied upon for the latter. However, successful money-laundering detection is inherently different from classical anomaly detection in time-series data. Global financial action task forces advocate a risk-based approach to AML, and this presentation will cover what this means for detection techniques.
- How one of the largest logistics companies in the Middle East automated the delivery of shipments using machine learning
- No postcodes in the region; instead, customers need to give a description of the address
- How GPS was used to predict the delivery location using clustering and classification algorithms
This presentation will give an insight into how we helped one of the largest logistics companies in the Middle East to automate the delivery of their shipments using machine learning. The problem was particularly interesting because there are no postcodes in the region. Instead, our client would have to phone each customer and ask for a description of the address, e.g. the house with the brown door around the XYZ roundabout. This process became more difficult because of the cultural dynamics existing amongst the demographics. This presentation will describe how we leveraged the GPS data the client had to predict the delivery location using clustering and classification algorithms, and how we operationalised the model in order to drive actions in real time.
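The abstract does not disclose the production pipeline, so the sketch below is only a plausible reconstruction of the general idea: cluster historical GPS drop-off points into candidate delivery locations, then classify new shipments to a cluster from attributes known before dispatch. The file name, column names and parameter values are all hypothetical.

```python
# Illustrative sketch, not the client's production system.
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

history = pd.read_csv("deliveries.csv")       # hypothetical file of past deliveries

# Step 1: group nearby historical drop-off coordinates into candidate locations.
coords = history[["lat", "lon"]].to_numpy()
history["location_cluster"] = DBSCAN(eps=0.0005, min_samples=5).fit_predict(coords)
history = history[history["location_cluster"] != -1]         # drop noise points

# Step 2: predict the cluster from shipment features known up front
# (assumed to be numeric encodings in this sketch).
features = history[["customer_id", "area_code", "hour_of_day"]]
X_train, X_test, y_train, y_test = train_test_split(
    features, history["location_cluster"], test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```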
Neural networks have been proven to be universal approximators, but this still leaves the identification task a hard one. Besides using data, we should focus our attention on the underlying structure of our subject of interest. In the case of dynamical systems this is time, leading us to state space models and recurrent neural networks. After an introduction of small (open) dynamical systems, we will study dynamical systems on manifolds. Here manifold and dynamics have to be identified in parallel. We will move on to large (closed) dynamical systems with hundreds of state variables and will combine causal and retro-causal models of the observations. This combination leads us to an implicit description of dynamical systems on manifolds. Finally, we will discuss the quantification of uncertainty in forecasting. In our framework the uncertainty appears as a consequence of principally unidentifiable hidden variables in the description of large systems. Together with the mathematical concepts, we will see applications in economics and engineering.
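As a rough illustration of the state-space view described above (assumed notation, not the speaker's formulation), the sketch below unfolds a small open dynamical system as a recurrent network: the hidden state evolves as s_{t+1} = tanh(A s_t + B u_t) and observations are read out as y_t = C s_t.

```python
# Minimal sketch of an open dynamical system in state-space form,
# rolled out as a simple recurrent network with shared weights per step.
import numpy as np

rng = np.random.default_rng(0)
state_dim, input_dim, output_dim, T = 8, 3, 1, 50

A = rng.normal(scale=0.3, size=(state_dim, state_dim))   # internal dynamics
B = rng.normal(scale=0.3, size=(state_dim, input_dim))   # effect of external drivers
C = rng.normal(scale=0.3, size=(output_dim, state_dim))  # read-out of observations

def rollout(u, s0=None):
    """Unfold the recurrent state-space model over an input sequence u[t]."""
    s = np.zeros(state_dim) if s0 is None else s0
    outputs = []
    for t in range(len(u)):
        s = np.tanh(A @ s + B @ u[t])   # state transition
        outputs.append(C @ s)           # observation / forecast at time t
    return np.array(outputs)

u = rng.normal(size=(T, input_dim))     # hypothetical external drivers
y_hat = rollout(u)
# In practice A, B, C would be identified from data, e.g. by
# backpropagation through time against observed outputs.
```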
- Public construction contracts are awarded through the competitive bidding process
- Presentation of a model that accurately predicts competitors' bids
- Case study of client who achieved a CAGR of 46%
Each year, a significant number of public construction contracts are awarded through the competitive bidding process. For most contractors, this represents the proverbial "Catch-22": bid too high and you lose; bid too low and you jeopardize profit. What if you knew the other bids? Paul will present a model that accurately predicts competitors' bids. The results are astonishing. One contractor increased its share of winning bids from 58% to 80%, but more importantly maximized its profit potential for each award. With the help of "Confucius," this client achieved a CAGR of 46%.
- True innovation starts with asking Big Questions
- New perspectives on the Big Data and Data Science challenges we face today
- How learning from the past can help you solve the problems of the future
The “Big Data” and “Data Science” rhetoric of recent years seems to focus mostly on collecting, storing and analysing existing data. Data which many seem to think they have “too much of” already. However, the greatest discoveries in both science and business rarely come from analysing things that are already there. True innovation starts with asking Big Questions. Only then does it become apparent which data is needed to find the answers we seek. In this session, we relive the true story of an epic voyage in search of data. A quest for knowledge that will take us around the globe and into the solar system. Along the way, we attempt to transmute lead into gold, use machine learning to optimise email marketing campaigns, experiment with sauerkraut, investigate a novel “Data Scientific” method for sentiment analysis, and discover a new continent. This ancient adventure brings new perspectives on the Big Data and Data Science challenges we face today. Come and see how learning from the past can help you solve the problems of the future.
- GDPR regulations will bring some unsettling new requirements for data scientists
- Five topics of interest
- Illustrated with concrete examples built using open source software
The wide-ranging GDPR regulations to be implemented by all EU countries on 25 May 2018 will bring some unsettling new requirements for data scientists who use data considered personal in the EU – not just "consumer" data but also business-to-business data. This presentation focuses on five specific topics of interest: notification, permission for use, the right to be forgotten, discrimination and "pseudo-discrimination", as well as anonymization. It will be illustrated with concrete examples built using open source software, so you can try a few of the ideas yourself.
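As one hedged illustration of the kind of open-source example mentioned above (not material from the talk itself), the sketch below shows pseudonymisation: replacing a direct identifier with a keyed hash so records remain joinable for modelling while re-identification requires a separately stored secret.

```python
# Minimal pseudonymisation sketch; an illustration under stated assumptions,
# not legal guidance or the presenter's example.
import hashlib
import hmac

SECRET_KEY = b"store-this-key-separately"     # hypothetical secret, kept outside the dataset

def pseudonymise(customer_id: str) -> str:
    # Keyed hash: stable for joins, but not reversible without the key.
    return hmac.new(SECRET_KEY, customer_id.encode(), hashlib.sha256).hexdigest()

records = [{"customer_id": "C-1001", "spend": 120.0},
           {"customer_id": "C-1002", "spend": 75.5}]
for r in records:
    r["customer_id"] = pseudonymise(r["customer_id"])
print(records)
```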
Predictive Analytics World for Business - London - Day 2 - Thursday, 12th October 2017
- What Artificial Intelligence and Machine Learning are, how they work, and their relevance to forecasting
- Real case studies of AI and ML algorithms employed by leading manufacturers
- The power of forecasting algorithms that learn, adapt to context and find hidden insights
With more and more data becoming available, Artificial Intelligence (AI) and Machine Learning (ML) for analytics are the latest hot topics pushed by the media, with companies like Facebook, Google and Uber promising breakthroughs in areas such as speech recognition and predictive maintenance. However, in the forecasting world, reality looks very different. An industry survey of 200+ companies shows that despite substantially growing data sources, most companies employ very basic statistical algorithms from the 1960s, and even market leaders have been slow to adopt intelligent algorithms to enhance planning decisions. This reveals a huge gap between scientific innovations and industry capabilities, with opportunities to gain unprecedented market intelligence being missed. In this session, we will highlight examples of how industry thought leaders have successfully implemented artificial neural networks and advanced machine learning algorithms for forecasting, including the FMCG manufacturer Beiersdorf, the beer manufacturer Anheuser-Busch InBev, and the container shipping line Hapag-Lloyd. I will leave you with a vision not of the future, but of what's happening now, and how it is revolutionizing your field.
- Using machine learning to predict when product decline starts to occur
- Early warning system to proactively manage product life-cycle, generate additional margin and avoid costly write-offs
- How to compare performance from a variety of state of the art techniques
Consumer goods companies introduce on average 4 products each day and withdraw 3, and the trend is increasing. The timing of product introduction and withdrawal is essential to maximise enterprise value. Shell Lubricants has developed a unique approach using machine learning to predict when product decline starts to occur. This early warning system makes it possible to proactively manage the product life-cycle, generate additional margin and avoid costly write-offs. The approach has been developed and validated using historical data covering thousands of saleable products, and exhibits a high degree of sensitivity and specificity. In this session, Laks will show you how to compare performance across a variety of state-of-the-art techniques.
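Shell's actual system is not described in detail, so the sketch below only illustrates the comparison step generically: evaluate several off-the-shelf classifiers on a "decline / no decline" label using cross-validated sensitivity and specificity. The data are synthetic stand-ins for the product features.

```python
# Hedged sketch: compare candidate classifiers for product-decline detection
# using the two metrics named in the abstract (sensitivity and specificity).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_validate

# Synthetic stand-in for product features and an imbalanced decline flag.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)

scoring = {
    "sensitivity": make_scorer(recall_score, pos_label=1),   # recall on decline
    "specificity": make_scorer(recall_score, pos_label=0),   # recall on non-decline
}
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5, scoring=scoring)
    print(f"{name:18s} sensitivity={cv['test_sensitivity'].mean():.2f} "
          f"specificity={cv['test_specificity'].mean():.2f}")
```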
- Case Study Barclays
- Pulling different data sources together
- Creating an accurate forecasting/predictive model and intuitive simulator
Barclays had ambitious growth targets for 2016 against a backdrop of declining marketing investment. To achieve more with less, they needed to determine optimal media spend and mix, messaging, and targeting. In this case study, the Modellers and Barclays will present how different data sources were pulled together and how an accurate forecasting/predictive model and intuitive simulator were built, ensuring informed and aligned marketing decision-making within the organisation. This is now recognised as a key asset within Barclays, and it continues to evolve through a test, learn, adapt cycle.
- Data arrives in a continuous fashion and the data stream continuously refreshes and changes
- Fit regression models online, updating the parameters of the model as new data arrives, so that the model updates to reflect the data from the stream
- Spark Streaming regression model to illustrate how to produce a better forecast than the traditional batch prediction method
In many applications, data arrive in a continuous fashion, in data streams of sensor data, transaction data, communication data, etc. The data stream continuously refreshes and changes. If time to insight is crucial, it is useful to fit regression models online, updating the parameters of the model as new data arrives, so that the model continually updates to reflect the data from the stream. In this session, I will walk through a Spark Streaming regression model to illustrate how to produce a much better forecast than the traditional batch prediction method.
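A minimal sketch of this pattern using Spark's RDD-based streaming API (StreamingLinearRegressionWithSGD): the model's weights are updated with every micro-batch that arrives on the training stream, while predictions are served on a second stream. The HDFS paths and the line format are assumptions for illustration.

```python
# Sketch of online regression with Spark Streaming; not the speaker's exact code.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint, StreamingLinearRegressionWithSGD

sc = SparkContext(appName="StreamingRegression")
ssc = StreamingContext(sc, batchDuration=5)          # 5-second micro-batches

def parse(line):
    # Assumed line format: "label,f1 f2 f3"
    label, feats = line.split(",")
    return LabeledPoint(float(label), Vectors.dense([float(x) for x in feats.split()]))

train_stream = ssc.textFileStream("hdfs:///streams/train").map(parse)  # hypothetical path
test_stream = ssc.textFileStream("hdfs:///streams/test").map(parse)    # hypothetical path

model = StreamingLinearRegressionWithSGD(stepSize=0.1, numIterations=50)
model.setInitialWeights(Vectors.dense([0.0, 0.0, 0.0]))

model.trainOn(train_stream)   # parameters update as each micro-batch arrives
model.predictOnValues(test_stream.map(lambda lp: (lp.label, lp.features))).pprint()

ssc.start()
ssc.awaitTermination()
```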
- Building a price prediction model for AutoScout24, making this model available and learning from user feedback
- Gaining transparency in current and future market values
- Helping each seller to find his individually optimized selling price
Scout24 is a leading operator of digital marketplaces specializing in the real estate and automotive sectors in Germany and other selected European countries. Scout24 aims to transform data into market insights to empower their users to make informed decisions. This session is a journey about building a price prediction model for AutoScout24, making this model available and learning from user feedback. After gaining transparency in current and future market values, the next steps are about helping each seller to find his individually optimized selling price, dependent on the seller's personal preferences regarding speed of sale and revenue.
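The talk's actual model is not specified, so the sketch below only illustrates the general shape of a listing-price model: a regression from vehicle attributes to sale price, with held-out error as a sanity check. The file and column names are hypothetical, not AutoScout24's schema.

```python
# Generic price-prediction sketch with assumed column names.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

listings = pd.read_csv("listings.csv")        # hypothetical export of past listings
features = ["make_id", "model_id", "first_registration_year", "mileage_km", "power_kw"]

X_train, X_test, y_train, y_test = train_test_split(
    listings[features], listings["sold_price_eur"], test_size=0.2, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)
print("mean absolute error (EUR):", round(mean_absolute_error(y_test, pred), 2))
```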
- Time-sensitive machine learning model to mine customers’ demographic and behavior features and predict time of likely conversion
- Deep learning and survival analysis for extending traditional linear survival analysis to non-linear data
- Applying DeepSurvival allows us to outperform any of the traditional targeted advertising algorithms
Targeted advertising is a form of advertising that focuses on certain attributes of the customers. The advertisement should reach the best consumer for the company's product at the right time. In this talk, we demonstrate a time-sensitive machine learning model to mine customers' demographic and behavior features and predict when the customer is likely to convert. We adopted deep learning and survival analysis (a Deep Cox Proportional Hazards Network) to extend traditional linear survival analysis to non-linear data. Our results demonstrated significant effectiveness for time-sensitive marketing campaigns. Applying DeepSurvival allows us to outperform any of the traditional targeted advertising algorithms.
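As a point of reference for the linear baseline that DeepSurvival extends, the sketch below fits a classical Cox proportional hazards model of time-to-conversion with the open-source lifelines package; the column names are illustrative and the neural-network extension itself is not reproduced here.

```python
# Linear survival-analysis baseline (Cox PH) for time-to-conversion;
# a sketch with assumed columns, not the speakers' DeepSurvival model.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("customers.csv")   # hypothetical: one row per customer, numeric features
# Expected columns: days_to_conversion, converted (1/0), plus behaviour/demographic features.

cph = CoxPHFitter()
cph.fit(df, duration_col="days_to_conversion", event_col="converted")
cph.print_summary()                              # hazard ratios per feature

# Rank customers by predicted conversion risk to time the campaign.
df["risk_score"] = cph.predict_partial_hazard(df)
target_list = df.sort_values("risk_score", ascending=False).head(1000)
```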
- How to design a predictive model to carry out a segmentation process with the massive data coming from Twitter
- Behavior, cluster analysis, feature engineering and ensemble methods
- Python and R languages to build the model
Customer segmentation is one of the most important aspects of marketing and is a key factor when launching any kind of campaign, whether for commercial purposes or for political targeting. In this session, Miguel Barros will explain how to design a predictive model to carry out a segmentation process with the massive data coming from Twitter. Topics covered will include user behavior, cluster analysis, feature engineering and ensemble methods. Python and R were used to build our models, making use of the advantages of each.
- What are the latest software and technical developments in credit scoring of retail borrowers?
- How can new sources of data (social network analysis and text mining) add value to the credit scoring process?
- How can we overcome regulatory challenges to leverage new developments in machine learning?
In this tutorial, we will showcase best practices in modern credit scoring. The first part will show how off-the-shelf software can quickly streamline the process of creating in-house scores, while giving an introduction to what credit scoring is and which problems it can help to solve. The second part of the tutorial will show modern techniques to approach the problem, and the added value they bring. We will focus on two: social network analysis and text analytics. This will be done by implementing techniques such as an adaptation of PageRank to credit scoring and deep learning for text analytics, using a combination of Python and R.
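A hedged sketch of the social-network idea (the exact adaptation presented in the tutorial may differ): build a graph over borrowers, seed known defaulters, and use personalised PageRank to propagate "default exposure" through the network as an additional scorecard feature. The edge list below is invented for illustration.

```python
# Personalised PageRank over a borrower graph; illustrative data only.
import networkx as nx

# Edges = observed relationships (e.g. shared guarantors or transfers).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("B", "E")]
G = nx.Graph(edges)

known_defaulters = {"C"}
personalization = {n: (1.0 if n in known_defaulters else 0.0) for n in G.nodes}

# Random walks restart at known defaulters, so scores decay with distance.
exposure = nx.pagerank(G, alpha=0.85, personalization=personalization)
for borrower, score in sorted(exposure.items(), key=lambda kv: -kv[1]):
    print(borrower, round(score, 3))    # higher score = closer to known defaults
```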
- Real image recognition at Badoo (expression detection)
- Recurrent Neural Network to predict the facial expression out of a photo
- How to create a multi-layer network
Pictures are a crucial factor for the online dating business, and Badoo has 4 million photos uploaded every day. Leonardo will present Deep Learning concepts and will examine a real image recognition project at Badoo (expression detection). Facial expression recognition is used for user segmentation and analysis. A Recurrent Neural Network was created to predict the facial expression from a photo. This is part of a pipeline that recognises the faces, crops them and passes them to the algorithm; the algorithm then answers with a probability for each facial expression. In this session, Leonardo will present how to create a multi-layer network and then go through more sophisticated topics, showing the iterations that led to the final deliverable.
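Badoo's production architecture is not public, so the sketch below is only a generic multi-layer network for expression classification in Keras: it takes an aligned face crop and returns a probability per expression class, mirroring the final step of the pipeline described above.

```python
# Generic multi-layer network sketch (not Badoo's model): face crop in,
# probability per expression class out.
import tensorflow as tf

NUM_EXPRESSIONS = 7     # e.g. neutral, happy, sad, angry, ... (assumed label set)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(48, 48, 1)),              # grayscale aligned face crop
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_EXPRESSIONS, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(face_crops, expression_labels, epochs=10, validation_split=0.1)
```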
- How do we get started to move towards predictive analytics
- How to enable prediction in businesses with multiple, complex dependencies
- Lessons from healthcare that can be applied across other businesses
In an emerging market like India, the Government does not fund healthcare. Patients bear the cost, which is often quite high. Apollo Hospitals, India envisions a system to predict the cost to the patient, but without affecting quality and accessibility. By employing data mining, sampling and simple statistical tools, a predictive model was created. First, a surgery package price was standardised where volumes were unaffected but value-based variances were lower. Next, the focus was on tackling the dependencies for 100 surgical procedures among a large group of patients. Which questions must be asked to predict a patient's cost? What factors could be controlled to reduce the probability of an incorrect prediction? The same logic can be applied to any business that wishes to adopt PA: study the data to confirm patterns, identify dependencies after removing outliers, and improve the probability of prediction.
- Creating user lists based on purchase probability
- How machine learning can increase the overall e-commerce earnings and the revenue per customer
- Open source tools and query samples in a step-by-step guide to implement your own predictive customer journey
Together with our client 220-volt.ru, one of the largest Russian DIY online stores, we set out to understand customer behaviour and to find an algorithm to redistribute the marketing budget according to the contribution of each channel. To do this, we took historical user data and calculated the probability of purchase. In this deep-dive case study we will show how we used machine learning to increase overall e-commerce earnings by up to 20%. Revenue per customer also increased significantly: twice that of the control group. We started by testing different machine learning models: logistic regression, random forest and XGBoost. We then evaluated them with the ROC AUC metric, using holdout and cross-validation. Finally, to make sure our predictions were working, we applied an A/B test. In our workshop-like session we will provide all the tools and query samples in a step-by-step guide, so that everyone can try playing with their own raw data using free and open source tools: Yandex.Metrica Logs API, ClickHouse open-source DBMS, Python, Pandas, XGBoost. This session enables you to implement your own predictive customer journey.
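The model-comparison step described above can be sketched as follows (synthetic stand-in data, not 220-volt.ru's Logs API export): train logistic regression, random forest and XGBoost on a purchase label and compare ROC AUC on a holdout set.

```python
# Hedged sketch of comparing purchase-probability models by ROC AUC.
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for user-session features and a rare "purchased" label.
X, y = make_classification(n_samples=5000, n_features=30, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "xgboost": xgb.XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss"),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name:20s} ROC AUC = {auc:.3f}")   # rank users by purchase probability
```

The predicted probabilities can then be exported as user lists for bidding or budget re-allocation, with an A/B test against a control group confirming the uplift.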