Machine Learning on Google Cloud using AutoML Vision

Train and evaluate a classification model on the MNIST images using AutoML Vision

Hrishi Shirodkar
8 min read · May 21, 2021

This is the second story in a 3-part series on performing Machine Learning on Google Cloud. In this story, we will focus on training a classification model on the MNIST dataset using AutoML Vision. Before getting started, I would encourage you to review the first story to make sure you meet the prerequisites and to learn more about the MNIST dataset. The first story can be found here.

AutoML Vision enables you to train machine learning models to classify your images according to your own defined labels.

1. Train models from labeled images and evaluate their performance.

2. Leverage a human labeling service for datasets with unlabeled images.

3. Register trained models for serving through the AutoML API.

More information on AutoML Vision can be found here.

What is covered in this story?

In this story, we will train a classification model on a labeled dataset of MNIST images and then evaluate its performance. We will then register the trained model using the AutoML API and serve predictions from it.

Step 1: Download the MNIST train and test images from GitHub.

Go ahead and download the AutoML_Training and AutoML_Test directories from the git repository to your local machine.

Step 2: Create a cloud storage bucket and upload the MNIST images.

Log in to the GCP console and confirm on the Billing page that you have a $300 trial credit.

From the Project Dashboard, copy the Project ID. We will use this Project ID to set up a globally unique cloud storage bucket for the MNIST datasets.

From the navigation menu, select Cloud Storage and click on Browser.

Click on Create Bucket.

Use <Project-ID>-automl as the name of your bucket and pick the region closest to you. You may keep the default values for storage class, access control and advanced settings. Click on CREATE.

Once the cloud storage bucket is created, click on Upload Folder to upload the AutoML_Training folder from your local machine into the bucket.

Once the folder is uploaded, the cloud storage bucket should look as follows. There is no need to upload the AutoML_Test folder. We will use it directly from your local machine to generate predictions later in this tutorial.

Step 3: Data Preparation.

In order to train a custom model with AutoML Vision, you will need to supply labeled examples of the kinds of images (inputs) you would like to classify, and the categories or labels (the answer) you want the ML systems to predict.

The data.csv file inside the AutoML_Training folder provides the cloud storage location of each training image and its associated label. We only need to replace gs://placeholder in each record of this file with the full path of the cloud storage bucket where we uploaded the training MNIST images.

Activate the cloud shell from the top right corner of the console.

Once you click on activate cloud shell, a terminal should open up on the same page at the bottom of the screen. You may choose to open the terminal in a new window using the icon in the bottom right corner.

We will execute the below commands in the cloud shell terminal. Before executing the first command, simply replace the <Project-ID> with your Project ID from the GCP dashboard.

These commands will accomplish the following:

  • The data.csv file is first copied locally to the shell storage from cloud storage.
  • Each record in data.csv is updated to replace gs://placeholder with the path of your cloud storage bucket, which is built from your Project-ID.
  • The updated data.csv file is then copied back to cloud storage.
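The replacement step above can also be sketched in Python. This is a minimal sketch and not the original commands from this story: it assumes each record in data.csv starts with a gs://placeholder URI followed by a label, and that your bucket is named <Project-ID>-automl as in Step 2. The function name, the sample records and the project ID are hypothetical.

```python
PLACEHOLDER = "gs://placeholder"


def rewrite_records(csv_text: str, project_id: str) -> str:
    """Point every record at the real bucket instead of gs://placeholder."""
    # Assumption: the bucket was named "<Project-ID>-automl" in Step 2.
    bucket_uri = f"gs://{project_id}-automl"
    return csv_text.replace(PLACEHOLDER, bucket_uri)


if __name__ == "__main__":
    # Hypothetical records; the real data.csv may use different paths.
    sample = (
        "gs://placeholder/AutoML_Training/0/img_1.png,0\n"
        "gs://placeholder/AutoML_Training/5/img_2.png,5\n"
    )
    print(rewrite_records(sample, "my-project-id"), end="")
```

The sketch only covers the text replacement. Copying data.csv out of cloud storage and back again still has to happen in the cloud shell or through a storage client.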

Step 4: Enable AutoML Vision API.

Look up AutoML Vision in the products and resources search bar. Click on Vision.

From the dashboard, click on Get started under Image Classification.

Enable the AutoML API.

Step 5: Upload MNIST images to AutoML Dataset from Cloud Storage.

Click on new dataset.

Use mnist as the dataset name and select single-label classification as the model objective. Click on create dataset.

Choose the option to select a CSV file on Cloud Storage and then browse to select the data.csv file in the AutoML_Training folder of your cloud storage.

Click on continue.

Once continue is pressed, AutoML will start importing the images from the AutoML_Training folder in your cloud storage. Click on Images to see the imported images.

It will take a few minutes for the import to finish. Once imported, you should be able to see the images. Explore the images for each label by choosing a label from the left menu.

Now click on Label Stats.

We imported 60 images for every label. AutoML will automatically split the instances into Train, Validation and Test sets.
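To get a feel for how many images land in each set, here is a rough sketch. It assumes the 80/10/10 Train/Validation/Test split that AutoML Vision documents as its default when the CSV does not assign sets itself; the function name is hypothetical, and the counts shown on the Label Stats tab are the authoritative ones.

```python
def split_counts(total: int, train_pct: int = 80, validation_pct: int = 10) -> dict:
    """Approximate Train/Validation/Test sizes for `total` images."""
    n_train = total * train_pct // 100
    n_validation = total * validation_pct // 100
    # Whatever is left over goes to the test set.
    n_test = total - n_train - n_validation
    return {"train": n_train, "validation": n_validation, "test": n_test}


if __name__ == "__main__":
    total_images = 60 * 10  # 60 images for each of the 10 digits
    print(split_counts(total_images))
```

For the 600 images imported here, this works out to roughly 480 training, 60 validation and 60 test images.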

Step 6: Train the Model.

Click on the Train tab and then Start Training.

Go ahead and use the default name for the model and select Cloud hosted so that we can serve predictions later in real-time. Click on continue.

Set the node hour budget to 8 node hours and check the box to deploy the model to 1 node after training. Click on start training.

Training will take about 1–2 hours. Once the training is complete, you should receive an email notification. You may then log back in to continue with the rest of this tutorial.

Once the model is trained, you can click on the Evaluate tab to review the metrics. At a confidence threshold of 0.5, the precision and recall of our classifier both stand at 91.67%. You may experiment with different values of the confidence threshold by dragging the slider to the left or right and see how it affects the precision and recall metrics.
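To see why moving the slider trades precision against recall, here is a small sketch for a single label. The confidence scores are made up for illustration and are not taken from our trained model, and the function name is hypothetical. A prediction counts as positive only when its confidence is at or above the threshold.

```python
def precision_recall_at(threshold, scored):
    """scored: list of (confidence, is_actually_positive) pairs for one label."""
    tp = sum(1 for score, positive in scored if score >= threshold and positive)
    fp = sum(1 for score, positive in scored if score >= threshold and not positive)
    fn = sum(1 for score, positive in scored if score < threshold and positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical confidence scores for one label.
SCORED = [
    (0.99, True), (0.85, True), (0.60, False),
    (0.55, True), (0.30, True), (0.20, False),
]

if __name__ == "__main__":
    for threshold in (0.25, 0.5, 0.9):
        precision, recall = precision_recall_at(threshold, SCORED)
        print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, raising the threshold from 0.25 to 0.9 lifts precision from 0.80 to 1.00 while recall drops from 1.00 to 0.25, which is the same trade-off the slider exposes.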

Based on the confusion matrix, we can observe that the model performs really well on all digits except 2, 4 and 5. If you would like to know more about the confusion matrix, please refer to the first story.

Precision: It is the accuracy of positive predictions (TP/(TP+FP))

Recall: It is the sensitivity or detection rate of our classifier (TP/(TP+FN))
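The two formulas can be applied label by label to a confusion matrix. The sketch below assumes rows are true labels and columns are predicted labels; the 3-label matrix is hypothetical and much smaller than the 10-digit matrix AutoML shows, and the function name is my own.

```python
def per_label_metrics(matrix):
    """matrix[i][j] = number of images with true label i predicted as label j."""
    metrics = {}
    for label in range(len(matrix)):
        tp = matrix[label][label]
        # Images of this label that were predicted as some other label.
        fn = sum(matrix[label]) - tp
        # Images of other labels that were predicted as this label.
        fp = sum(row[label] for row in matrix) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[label] = (precision, recall)
    return metrics


# Hypothetical confusion matrix for three labels.
MATRIX = [
    [6, 0, 0],
    [0, 5, 1],
    [0, 1, 5],
]

if __name__ == "__main__":
    for label, (precision, recall) in per_label_metrics(MATRIX).items():
        print(f"label {label}: precision={precision:.2f}, recall={recall:.2f}")
```

Labels that get confused with each other, like the second and third here, show up as lower precision and recall for both of them.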

Step 7: Time to make some predictions.

Now, click on Test & Use and upload the images from the AutoML_Test folder on your local machine.

Once the images are uploaded, the model will give you a prediction for every image. I’m including samples for 0 and 5. As you can see, the model predicted the digit 0 with a very high confidence of almost 1, while the digit 5 has a slightly lower confidence of 0.85.

Step 8: Disable the Cloud AutoML API.

Once your assessment is complete, look up Cloud AutoML API in the products search bar. Click on Cloud AutoML API from the search results.

Click on Manage.

Go ahead and disable the API. If you forget to disable the service, you will continue to get charged for it as long as the service is up and running.

Step 9: Delete the Cloud Storage bucket.

Go to the cloud storage console and check the box for the <Project-ID>-automl bucket. Click Delete.

Confirm that the bucket does not exist any more.

Conclusion:

As the name suggests, you can use AutoML Vision to build an ML model for images with no code and no ML expertise. However, it comes with its own downsides, such as limited options to customize and fine-tune the model.

Thank you again for sticking around till the end.

Here are links to the other two stories in this series:

Word of Caution:

Please be careful while using the services on Google Cloud to stay within the $300 trial budget. Do not forget to turn off APIs, delete any unused instances and clear out the cloud storage bucket after use, so as to avoid any additional costs billed to your account.

Hrishi Shirodkar

Passionate about building products using ML & Big Data Technologies to solve real world problems! Have Masters in CS [ML specialization] from Georgia Tech Univ.