Abstract:
With the rapid rise of machine learning and big data technologies, it is becoming easier every day for companies to take advantage of the data they hold about their customers. One of the hottest topics is personalization to improve the user experience, for example the “Did you forget this product?” suggestions that a few supermarkets implement, or recommendations based on the user’s basket and purchase history.
One of the companies that uses machine learning in its daily operations is REWE Digital, one of the biggest supermarket chains in Germany, which handles millions of customer events every day.
The broad goal of this thesis is to present a way to effectively predict the persona (for example “Family with two high school children”, “Couple without children”, “Working single”, “Older family with a teenager”, “Old couple” or “Student”) of a web session in real time based on the user’s behavior, such as events of visiting and buying certain products, basket size and price, et cetera.
First, it will be shown how web sessions can be clustered without the hundreds of URL features expected by the paper of Olfa Nasraoui, Hichem Frigui, Raghu Krishnapuram and Anupam Joshi, provided that pages can be broken down into a fixed number of attributes. Then, the customer events shared by REWE will be prepared and transformed into feature vectors, with which several clustering methods will be tested and compared to find out which ones are useful for web sessions. Once the clusters have been formed, this thesis will introduce possible ways to predict the corresponding cluster of a new session in real time. Lastly, possible methods to validate the effectiveness of the formed clusters will be discussed.
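
As a rough illustration of two of the steps named above, turning session events into a feature vector and assigning a new session to an existing cluster, the following minimal sketch may help. It is not the method used in the thesis: the event format, the three features, the centroid values and the nearest-centroid rule are all assumptions chosen for illustration, and the names `session_to_features` and `nearest_cluster` are hypothetical.

```python
from math import dist

# Assumed event format: (event_type, price) tuples. This is a hypothetical
# schema for illustration, not the actual structure of the REWE events.
def session_to_features(events):
    """Turn a list of (event_type, price) events into a fixed-length vector."""
    views = sum(1 for kind, _ in events if kind == "view")
    basket_prices = [price for kind, price in events if kind == "add_to_basket"]
    # Features: number of product views, basket size, basket price.
    return [float(views), float(len(basket_prices)), float(sum(basket_prices))]

def nearest_cluster(features, centroids):
    """Return the index of the centroid closest to the vector (Euclidean)."""
    return min(range(len(centroids)), key=lambda i: dist(features, centroids[i]))

# Made-up centroids, standing in for the result of an offline clustering step.
centroids = [
    [2.0, 1.0, 5.0],    # sessions with small baskets
    [10.0, 8.0, 80.0],  # sessions with large baskets
]

session = [("view", 0.0), ("view", 0.0), ("add_to_basket", 3.5), ("view", 0.0)]
features = session_to_features(session)
cluster = nearest_cluster(features, centroids)
```

Because only a distance to each centroid has to be computed, an assignment of this kind is cheap enough to run while the session is still in progress. In practice the features would likely need scaling so that the basket price does not dominate the distance.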