Test Drive Google Discovery AI

Google-quality Search and Recommendations for all of your digital properties

Cloud Jake
Google Cloud - Community


Retail API logo — an outline of a blue shopping cart with an increasing bar graph inside.

In this article, we’ll explore Google’s Discovery AI suite of products (Search/Browse and Recommendations AI) by creating a demo environment and test-driving each of the core components using sample data.

What is Discovery AI?

While ChatGPT, Bard, and Generative AI are currently in the headlines, Google has been quietly infusing AI into its products for years to make them more intelligent, more predictive, and more accessible to the masses. Google’s Discovery AI suite is no different.

Discovery AI combines a deep understanding of user intent and context, Natural Language Processing (NLP), and Google’s expertise in AI to deliver advanced query understanding and personalization, all exposed through familiar API integration patterns with client libraries for the major programming languages. With Discovery AI, everyone now has the ability to provide Google-quality search, browse, and recommendations on their own digital properties.

Creating a Discovery AI Instance in under 10 minutes!

Using an open dataset of movie ratings from MovieLens, we’ll construct a product catalog of movie titles and user-event data consisting of over 25 million movie ratings to simulate customer behavior on an ecommerce site.

Clone the quickstart GitHub repo

I’ve created a GitHub repo that will do most of the heavy lifting for you so you can focus on the fun stuff in the Google Cloud Console. ;) To start, you’ll need a terminal with git and the Google Cloud SDK installed. The Google Cloud Shell is a great place to run these commands since both git and the Cloud SDK are already installed!

The full explanation can be found here:
https://github.com/cloud-jake/recai-moviedb

Otherwise, let’s get started by cloning the repo.

git clone https://github.com/cloud-jake/recai-moviedb.git

This will create a new folder with the quickstart code inside. cd into the folder.

cd recai-moviedb 

Setup variables

Now that you are in the recai-moviedb folder, list the files by running the ls command.

$ ls -n

-rwxr-xr-x 1 1000 1000 632 May 17 01:33 00-project-setup.sh
-rwxr-xr-x 1 1000 1000 1024 May 17 01:33 01-prepare-dataset.sh
-rwxr-xr-x 1 1000 1000 3396 May 17 01:33 02-create-views.sh
-rwxr-xr-x 1 1000 1000 1348 May 17 01:33 98-import-retail-data.sh
-rwxr-xr-x 1 1000 1000 878 May 17 01:33 99-create-bq-tables.sh
-rw-r--r-- 1 1000 1000 982 May 17 01:33 README.md
drwxr-xr-x 2 1000 1000 4096 May 17 01:33 schema
-rw-r--r-- 1 1000 1000 184 May 17 01:33 variables.inc

Notice a file called variables.inc. We’ll edit the file to update the following variables that will be needed by the quickstart scripts:

# Name of project to create
PROJECT=

# Billing account ID to attach to project
BILLING_ACCOUNT=

# Location and Region
LOCATION=US
REGION=us-central1

PROJECT — the name that you will give to your project. Make sure that this is globally unique and adheres to the Google Cloud project naming conventions.

BILLING_ACCOUNT — an existing billing account ID in the format of 012345-678910-ABCDEF

Optionally, update the LOCATION and REGION to match your locale. These parameters are used for BigQuery and Cloud Storage bucket locations.
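For reference, a filled-in variables.inc might look like this (the project ID and billing account below are placeholders — substitute your own values):

# Name of project to create
PROJECT=recai-moviedb-demo-12345

# Billing account ID to attach to project
BILLING_ACCOUNT=012345-678910-ABCDEF

# Location and Region
LOCATION=US
REGION=us-central1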

Run the quickstart scripts

Now that you’ve set the variables, you are ready to run the quickstart scripts. Run the scripts in order, one by one, and take note of any error messages or output. Only the following scripts are required:

  • 00-project-setup.sh
  • 01-prepare-dataset.sh
  • 02-create-views.sh

The remaining scripts provide sample code for scheduling the data import jobs (98-import-retail-data.sh) and creating the template BigQuery tables using the Retail User Event schema (99-create-bq-tables.sh).
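With variables.inc filled in, the three required scripts can be run back to back from inside the recai-moviedb folder, stopping if any step fails:

./00-project-setup.sh && \
./01-prepare-dataset.sh && \
./02-create-views.sh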

Congrats! You have just created a basic Discovery AI foundation. Now it’s time to head over to the Cloud Console to load data and test drive Search and Recommendations AI!!

Access the Discovery AI Cloud Console

To access the Discovery AI Cloud Console, open your web browser to:
https://console.cloud.google.com/ai/retail

The first time you access the console, you’ll need to activate the Retail API and click through the Data Use Terms.
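If you prefer the command line, the Retail API itself can also be enabled with gcloud (you’ll still need to click through the Data Use Terms and turn on Retail Search in the console):

# Assumes $PROJECT is set to the project ID you created with the quickstart scripts
gcloud services enable retail.googleapis.com --project=$PROJECT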

Turn on the Retail API

Screenshot of the Retail API page in the Google Cloud Console. Instructions for enabling the Retail API

Accept the data use terms.

Screenshot of Retail API in the Google Cloud Console showing the Data Use Terms.

Turn on Retail Search.

Screenshot of Retail API in the Google Cloud Console prompting to enable Retail Search

You should receive a confirmation that the Retail API has been enabled and the Recommendations AI and Retail Search components are both on.

Screenshot of Retail API in the Google Cloud Console showing the welcome screen with all services enabled

Load Product Catalog and User Event data

The quickstart scripts that we already ran populated a number of BigQuery tables with the MovieLens data and created 5 views in the movielens dataset that follow the Retail Schema format:

  • products — full list of movies in the format of a product catalog
  • user_events_homepageview — user ratings ≥ 0 to simulate a customer accessing the site homepage and firing the home-page-view tag
  • user_events_detailpageview — user ratings ≥ 4.0 to simulate a customer accessing a product detail page and firing the detail-page-view tag
  • user_events_addtocart — user ratings ≥ 4.5 to simulate a customer adding an item to their cart and firing the add-to-cart tag
  • user_events_purchasecomplete — user ratings ≥ 5.0 to simulate a customer completing a purchase and firing the purchase-complete tag

Reference: User Event Types for Discovery AI https://cloud.google.com/retail/docs/user-events#types

We’ll need to complete the data load process 5 times, once for each of the BigQuery views listed above.

To load the Product Catalog, start by clicking Data from the Retail menu.

Click Import at the top of the screen to open the import dialogue.

In the Import Data dialogue, select the following:

  • Import type = Product Catalog
  • Source of data = BigQuery
  • Import branch = Branch 0
  • Schema of data = Retail Product Catalogs Schema
  • BigQuery table = select the products table from the movielens dataset
Product Data import dialogue box showing the options for importing product catalog data

Click Import to kick off the import process. You should get a black pop-up box with a confirmation and a code snippet at the bottom of the screen that you can use to automate future imports (we can ignore that for now). You can safely close the Import Data dialogue box by clicking out of the box or clicking Cancel.
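The console dialogue is the easiest path, but the catalog import can also be scripted against the Retail API’s products:import method, which is the kind of automation the sample 98-import-retail-data.sh script is there to illustrate. A minimal curl sketch, assuming $PROJECT holds your project ID and using the movielens.products view created by the quickstart scripts:

# Assumes $PROJECT is set and the quickstart scripts created the
# movielens.products view in BigQuery.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  --data @- \
  "https://retail.googleapis.com/v2/projects/${PROJECT}/locations/global/catalogs/default_catalog/branches/0/products:import" <<EOF
{
  "inputConfig": {
    "bigQuerySource": {
      "projectId": "${PROJECT}",
      "datasetId": "movielens",
      "tableId": "products",
      "dataSchema": "product"
    }
  },
  "reconciliationMode": "INCREMENTAL"
}
EOF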

To check on the status of the product catalog import, click the Activity Status menu at the top right of the screen.

We should see the “import-products” job in progress.

We also need to load the 4 different event types from the user_event_* views that we created earlier. We’ll repeat the same process that we used to import catalog data, but instead of selecting Import type = Product Catalog, we’ll now select Import type = User Events. For each of the 4 user_event tags, we’ll perform a separate import, selecting the appropriate BigQuery view for each.

Click Import at the top of the screen to open the import dialogue.

In the Import Data dialogue, select the following:

  • Import type = User Events
  • Source of data = BigQuery
  • Schema of data = Retail User Events Schema
  • BigQuery table = select one of the following tables (views) from the movielens dataset for each user_event type (repeat the process for each view; a scripted alternative is sketched after this list):

— user_events_homepageview
— user_events_detailpageview
— user_events_addtocart
— user_events_purchasecomplete
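As with the catalog import, each of these user-event loads can be scripted against the userEvents:import method; a sketch under the same assumptions, looping over the four views:

# Assumes $PROJECT is set and the four user_events_* views exist in the
# movielens dataset.
for VIEW in user_events_homepageview user_events_detailpageview \
            user_events_addtocart user_events_purchasecomplete; do
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    --data @- \
    "https://retail.googleapis.com/v2/projects/${PROJECT}/locations/global/catalogs/default_catalog/userEvents:import" <<EOF
{
  "inputConfig": {
    "bigQuerySource": {
      "projectId": "${PROJECT}",
      "datasetId": "movielens",
      "tableId": "${VIEW}",
      "dataSchema": "user_event"
    }
  }
}
EOF
done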

After you’ve completed the import for each of the 4 user_event types, check your import status by again clicking the Activity Status link at the top of the page. Note that you’ll need to click the User Events tab. Since we are importing millions of user events, it may take a few minutes for the imports to complete.
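If you kicked off any of the imports with curl rather than the console, each call returns a long-running operation in its response; you can poll that operation’s name directly instead of (or in addition to) watching the Activity Status panel. OPERATION_NAME below is whatever "name" value the import response returned:

# Poll a long-running import operation returned by products:import or
# userEvents:import; the "done" field flips to true when the import finishes.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://retail.googleapis.com/v2/OPERATION_NAME"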

Explore Data Quality

Once you’ve completed importing data, it’s a good idea to review any warnings about data quality. Since this is sample data using only the minimal required fields, we should expect to see some warnings, especially related to the product catalog. For this demo, we can safely ignore warnings about missing descriptions and searchable attributes (we omitted those fields in our sample data import). It may take some time after your initial import for the data quality results to populate.

One of the most important warnings that you’ll want to reconcile before moving forward is any unjoined events in the Events data. Since we constructed the data imports from a single dataset, we already ensured that each user_event mapped back to a product (movie) in the product catalog. In real-world scenarios, you’ll want to design your data imports to ensure consistency between the product catalog and user_events — and have monitoring in place to address inconsistencies as they arise.

Once you’ve loaded catalog and user_event data and verified data quality, you are ready to test-drive Search and Recommendations AI!

Test Drive Discovery AI features

Search

Everyone is familiar with search and has come to expect site-search to perform as well as Google. That’s rarely been the case, but now Discovery AI lets you use the same technology as Google on your website and/or app. To start evaluating the search capabilities of Discovery AI, head on over to the Evaluate link in the Retail menu.

Click on the Search tab at the top of the screen and enter a query in the search box; for this walkthrough, try “Toy Story”. Note that we can also evaluate personalized results for a particular website visitor based on the Visitor ID or User ID that we capture in our user_event data. In a real-world scenario, this data would likely come from GA4 (Google Analytics) and/or GTM (Google Tag Manager).

You’ll see in the results section below that a number of Toy Story movies are returned and ranked at the top of the list. There are also a number of related results that do not include the keyword “Toy Story” such as “Buzz Lightyear of Star Command: The Adventure Begins (2000)” and “Pixar Story, The (2007)” — which are ranked higher than a number of other less-related matches.

Thinking about the main character in Toy Story, the cowboy named “Woody”, let’s try a search with just the character’s name:

Note that the results include 2 prominent “Woody” characters, Woody Allen and Woody Woodpecker, but the next 2 most relevant results are the top 2 Toy Story movies. Recall from the data quality section that we omitted both description and searchable attributes in our product catalog import. None of the data that we loaded includes the names of the characters or any keywords for “Woody”, yet Google’s understanding of intent identifies the Toy Story movies as relevant results for the search term “Woody”. Pretty cool, huh?

Discovery AI also takes care of misspellings and synonyms automatically, one of the top pain points of site search and an area where manual effort to maintain synonym lists and get ahead of search issues can be eliminated! For example, try your worst at misspelling a movie name: “yiy stry”… (I can’t believe that actually worked…)
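All of these queries can also be issued programmatically. A minimal curl sketch against the default_search placement, assuming $PROJECT is still set (visitorId is required on search requests, but any test value will do here):

# Assumes $PROJECT is set; visitorId can be any test value for evaluation.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  --data @- \
  "https://retail.googleapis.com/v2/projects/${PROJECT}/locations/global/catalogs/default_catalog/placements/default_search:search" <<EOF
{
  "query": "Toy Story",
  "visitorId": "test-visitor-1",
  "pageSize": 10
}
EOF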

Beyond the default Search serving config, you can explore adding controls to further customize the out-of-the-box results. While controls are a powerful way to layer domain- or business-specific configuration on top of Discovery AI’s search capabilities, oftentimes over 90% of manual rules and configurations can be eliminated with Discovery AI.

Recommendations AI

The initial configuration for Recommendations AI includes a simple model called recently viewed based on the past user_event history for a particular Visitor ID. TBH, it’s not really a model but rather a list of past detail-page-view events. To test out this “model”, navigate back to the Evaluate link in the Retail menu. In the Recommendations tab, enter the following Visitor ID: 210

We can see that this particular customer recently viewed 3 movies. This will make a good example when we build a recommendations model. You can find other Visitor IDs in the ratings table in BigQuery. Here is the query that I used to identify customers with a detail-page-view event for American Pie (that is, customers who submitted a rating ≥ 4.0). Try some variations to get customers with a purchase-complete event (rating ≥ 5.0).

-- Get users who have rated American Pie (movieId 2706) 4.0 or greater.
-- These users have the detail-page-view event for the movie.
WITH AmericanPie40 AS (
  SELECT userId
  FROM `movielens.ratings`
  WHERE movieId = 2706
    AND rating >= 4.0
)
SELECT
  A.userId,
  COUNT(DISTINCT M.movieId) AS countmov
FROM `movielens.ratings` M
JOIN AmericanPie40 A
  ON A.userId = M.userId
GROUP BY A.userId
ORDER BY countmov ASC
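The Evaluate page is the quickest way to poke at the recently viewed list, but the same lookup can be made with the predict method. A hedged curl sketch: it assumes the automatically created recently_viewed_default serving config (check the Serving Configs page for the exact ID in your project) and uses a home-page-view event as the query context; if the API rejects that event type for this config, adjust it accordingly.

# Assumptions: $PROJECT is set, the recently_viewed_default serving config
# exists, and home-page-view is accepted as the query event type here.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  --data @- \
  "https://retail.googleapis.com/v2/projects/${PROJECT}/locations/global/catalogs/default_catalog/placements/recently_viewed_default:predict" <<EOF
{
  "userEvent": {
    "eventType": "home-page-view",
    "visitorId": "210"
  }
}
EOF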

Build your first Recommendations AI model

Billing Alert: Building and Training models will result in accelerated billing consumption. Refer to the Pricing page for Discovery AI for details.

Up until now, loading data and performing evaluation queries has accumulated little to no billing activity. Building, training, and tuning Recommendations AI models will start to consume billing resources. Be sure that you have created billing alerts and monitor costs accrued from using Recommendations AI.

With that said, Discovery AI is still a very cost-effective solution!

Creating RecAI Models

To create our first RecAI model, click the Models link in the Retail menu.

You’ll see the default recently viewed model that we already queried in the list. Next, click Create Model at the top of the screen.

Let’s first explore the different model types and the data requirements for each. Pay close attention to the “Data requirements met?” section. The Recommended for you model requires 5 different data metrics to be satisfied; in our case, all are green and meet the requirements.

Notice, too, that changing the Business Objective changes the data requirements for the model. Change the business objective from Click-through rate (CTR) to Revenue per session and you’ll see that there are now 8 data metrics to satisfy.

To continue creating the Recommended for you model, give the model a name (I named mine rfy), choose your Tuning preference frequency and Filtering by attribute values settings, then click Create. Your model will take 2–5 days to complete training and be ready for querying.

Training a model will incur costs, which may be significant if not monitored.

In order to query your model, you’ll need to create a Serving Config. From the Retail menu, select Serving Configs.

Click Create Serving Config to open the dialogue. Select Recommendation. Give your serving config a name and click Continue.

Choose the model that we just created: rfy

Select your Price reranking and Result diversification rules. You can leave the defaults.

Finally, click Create.

You’ll need to wait until your model has completed training to query and evaluate it. You can create up to 20 different variations of the models that are currently available, optimized for different business objectives.

Summary

In this article you learned how to stand up a Discovery AI environment in Google Cloud, load sample data in the Retail Schema format, and evaluate search and recommendations queries. If you found this useful, please comment below and check out the Google Cloud Discovery AI solutions:
https://cloud.google.com/solutions/retail-product-discovery

Attribution

This article is based on the tutorial Create personalized movie recommendations here:
https://cloud.google.com/retail/docs/movie-rec-tutorial
