Behind Great Product: Machine Learning and AI for a Marketplace Ecosystem

Published in

Tokopedia Product

5 min read · Nov 7, 2020

On September 30, we hosted a Behind a Great Product event on Machine Learning (ML) and Artificial Intelligence (AI) in a Marketplace Ecosystem. Jose Curto, Professor at IE University in Spain, opened the session by discussing use cases of Machine Learning in e-commerce. Leonardo Saerang, PM Lead at Tokopedia, then went into more detail on the product development process we follow at Tokopedia and explained how Tokopedia brings an ML product to life.

Algorithms are mechanisms for us to accelerate our biases to help users achieve their desired outcome faster. Machine learning is the study of computer algorithms that improve automatically through experience and is a subset of Artificial Intelligence.

Machine Learning is quite commonly used in products, so it’s great for us to learn what Machine Learning is, the applications of Machine Learning and how we develop machine learning models for our use cases.

Session 1: Machine Learning for a Marketplace Ecosystem by Professor Jose Curto

There are 4 common use cases of machine learning:
1. Assessing customer churn

2. Dynamic forecasting

3. Customer segmentation

4. Recommendation generation

Jose focused the discussion on recommendation generation, which helps customers discover better products faster. Recommendations help companies to (1) initiate user engagement, (2) increase user engagement and (3) promote the discovery of new content.

Building recommendation systems

Recommendation systems can first be divided by whether they are personalized or not. If information about the user is available, we can provide personalized recommendations, where different users receive different suggestions. If information about the user is not available, we provide non-personalized, generic recommendations that are typically rule-based. Examples of non-personalized recommendations are popular products and trending articles.
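
As a rough illustration of a rule-based, non-personalized recommendation, the sketch below ranks products by purchase count. The `purchases` log, the product names and the `top_popular` helper are all hypothetical and not taken from the talk.

```python
from collections import Counter

def top_popular(purchases, n=3):
    """Return the n most purchased product IDs, most popular first."""
    counts = Counter(product_id for _user_id, product_id in purchases)
    # Sort by count (descending), then by product ID so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product_id for product_id, _count in ranked[:n]]

# Hypothetical purchase log of (user_id, product_id) pairs.
purchases = [
    ("u1", "phone_case"), ("u2", "phone_case"), ("u3", "phone_case"),
    ("u1", "charger"), ("u2", "charger"),
    ("u3", "headphones"),
]

# Every user sees the same list, because no user information is used.
print(top_popular(purchases, n=2))  # ['phone_case', 'charger']
```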

In practice, companies start by building non-personalized recommendations to initiate their recommendation system when user information is not available, while building the foundations for personalized recommendation systems.

There are 3 major types of personalized recommendation systems:

1. Content-based filtering

This approach is based on item content. Items are described by keywords, while the user model is built from the user's preferred keywords (or a classifier). One pro is that we do not need data from other users to build it. Another major pro is that the model can recommend to users with unique tastes, and can recommend new and unpopular items, which are long-tail in nature.

Content-based filtering is also more “explainable”: you can understand the attributes behind the algorithm, make sure that the algorithm is aligned with the logic you want to promote, and avoid information bubbles. The first con of content-based recommendation systems (RS) is that items need content that can be encoded as meaningful features. Second, feature extraction may be difficult, and there is a danger of overfitting the data. Finally, recommendations do not draw on the tastes of other users.
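
The keyword idea can be sketched in a few lines. This is a minimal illustration, assuming each item is tagged with a set of keywords and the user profile is simply a count of keywords across liked items. The catalogue, the keywords and the function names are made up for the example.

```python
from collections import Counter

def build_user_profile(liked_items, item_keywords):
    """Count how often each keyword appears among the items a user liked."""
    profile = Counter()
    for item in liked_items:
        profile.update(item_keywords[item])
    return profile

def recommend_by_content(liked_items, item_keywords, n=2):
    """Rank items the user has not liked yet by keyword overlap with the profile."""
    profile = build_user_profile(liked_items, item_keywords)
    scores = {}
    for item, keywords in item_keywords.items():
        if item in liked_items:
            continue
        # A Counter returns 0 for keywords the user has never shown interest in.
        scores[item] = sum(profile[k] for k in keywords)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:n] if score > 0]

# Hypothetical catalogue: item -> descriptive keywords.
item_keywords = {
    "running_shoes": {"sport", "shoes", "running"},
    "trail_shoes": {"sport", "shoes", "outdoor"},
    "yoga_mat": {"sport", "fitness"},
    "office_chair": {"furniture", "office"},
}

print(recommend_by_content({"running_shoes"}, item_keywords))
# ['trail_shoes', 'yoga_mat']
```

Because each score is a sum over named keywords, it is easy to explain why an item was recommended, which is the "explainable" property mentioned above.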

2. Collaborative filtering

In collaborative filtering, recommendations are based on users' opinions of items, whether explicit (e.g. star ratings or reviews) or implicit (e.g. clicks or purchases). The company has to define a metric to measure similarity between users, such as the Jaccard Index, Intra-list Similarity or another similarity measure.

The pro is that this model produces good enough results in most cases. The con is that this model requires a large number of reliable user feedback data points (e.g. star rating, reviews) and products have to be standardized. Additionally, this model assumes that prior behavior determines current behavior.
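
A small user-based sketch, assuming the Jaccard index as the similarity measure and a "liked" set per user as the feedback signal. The users, the items and the scoring rule (summing the similarity of every user who liked an item) are illustrative choices, not the method presented in the session.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_from_similar_users(target, liked_by_user, n=2):
    """Score unseen items by the similarity of the users who liked them."""
    target_items = liked_by_user[target]
    scores = {}
    for other, items in liked_by_user.items():
        if other == target:
            continue
        similarity = jaccard(target_items, items)
        # Only items the target user has not liked yet are candidates.
        for item in items - target_items:
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:n] if score > 0]

# Hypothetical feedback: user -> set of liked items.
liked_by_user = {
    "ana": {"phone", "case", "charger"},
    "budi": {"phone", "case", "earbuds"},
    "citra": {"phone", "tripod"},
    "dewi": {"sofa", "lamp"},
}

print(jaccard(liked_by_user["ana"], liked_by_user["budi"]))  # 0.5
print(recommend_from_similar_users("ana", liked_by_user))    # ['earbuds', 'tripod']
```

With only four users the similarities are fragile, which mirrors the point above that this approach needs a large amount of reliable feedback.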

3. Hybrid

A hybrid approach combines the best features of two or more recommendation systems. A typical choice is to pair a collaborative filtering system with one or more other recommendation systems.

Netflix, for example, combines different recommendation systems. On your Netflix homepage, you will typically see generic recommendations such as new releases, as well as personalized recommendations based on your taste in sections like “You might like this.”
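
One simple way to build a hybrid is to blend the scores of two recommenders with a weight. The sketch below assumes both systems output a score per item on a comparable scale. The scores, the `weight` parameter and the helper names are hypothetical, and this is not a description of how Netflix combines its systems.

```python
def blend_scores(personalized, popularity, weight=0.7):
    """Weighted average of two score dictionaries; a missing score counts as 0."""
    items = set(personalized) | set(popularity)
    return {
        item: weight * personalized.get(item, 0.0)
        + (1 - weight) * popularity.get(item, 0.0)
        for item in items
    }

def top_n(scores, n=3):
    """Return the n highest-scoring items, breaking ties by item name."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _score in ranked[:n]]

# Hypothetical outputs of a personalized and a non-personalized recommender.
personalized = {"earbuds": 0.5, "tripod": 0.25}
popularity = {"phone_case": 1.0, "earbuds": 0.2}

blended = blend_scores(personalized, popularity, weight=0.5)
print(top_n(blended, n=2))  # ['phone_case', 'earbuds']
```

Lowering `weight` leans on the generic signal, which can help new users with little history. Raising it favors personalization once enough data exists.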

Final tips from Jose

Start with data governance and ensure high quality data before machine learning
Collect data in the right way. Think about the attributes of products and capture the relevant user engagements and attributes to fuel your RS and ML in the future
Give time for your company to generate enough data
Take into account seasonality in your startup when analyzing data and designing your models

Session 2: Building Machine Learning in Tokopedia by Leonardo Saerang

Machine Learning models are commonly used in Tokopedia. We use ML for product recommendation, search, ads placement, fraud detection, chatbots and many more. We continually improve our models and try to shift away from rule-based models toward more scalable and automated ML models that can serve customers better.

Leonardo gave a sneak peek inside the collaboration that happens behind every new product or product iteration.

The product development cycle for ML products

The Search team follows the typical product development cycle:

1. Problem statement: Defining a clear problem statement for the customer

2. Defining solution: Design the model or logic to solve the customer pain point

3. Developing the product: Bringing the product (in this case, a new model) to life, with data and development

4. Validating the product: Various testing to validate the product hypothesis and performance

5. Rollout: Introduce the product to users and experiences
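
As one possible way to approach step 4 offline, the sketch below computes precision@k: the share of the top k recommended items that a user actually engaged with in held-out data. The metric choice, the `held_out` data and the function names are assumptions for illustration. The talk does not specify which validation metrics Tokopedia uses.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def mean_precision_at_k(results, k):
    """Average precision@k over (recommended, relevant) pairs, one per user."""
    if not results:
        return 0.0
    return sum(precision_at_k(rec, rel, k) for rec, rel in results) / len(results)

# Hypothetical held-out data: what the model recommended vs. what each user
# actually interacted with afterwards.
held_out = [
    (["earbuds", "tripod"], {"earbuds"}),
    (["lamp", "sofa"], {"lamp", "sofa"}),
]

print(mean_precision_at_k(held_out, k=2))  # 0.75
```

A check like this can compare a new model against the current one before any live experiment.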

Data Scientists, Data Analysts and Data Engineers are a key part of the development team. Data Analysts are experts in analyzing data and help the product team develop and validate the product hypothesis. Data Scientists, on the other hand, are the experts on the ML model and build it. Finally, Data Engineers manage the data and create the data flow to the products.

Final tips from Leonardo

Write a clear and sharp problem statement and the expected output from the product
Choose an iterative model: Plan models that can be created in stages
Manage the development timeline: Build buffers into the timeline and manage it proactively. Overcommunicate, especially during the development stage
Validate and learn: Continue to test and iterate the model with offline validation as well when necessary
Ensure that data quality is high and continue to improve data quality. “Garbage in, garbage out” — your model is only going to be as good as the data input
Learn more about data science from YouTube and online classes available on Coursera