Setting Up a Machine Learning Pipeline on Google Cloud Platform

by SkillAiNest

Photo by Editor | ChatGPT

Introduction

Machine learning has become an integral part of many companies, and businesses that do not adopt it risk being left behind. It is natural, then, that many companies want to integrate it into their systems.

There are many ways to set up a machine learning pipeline system to support a business, and one option is to host it with a cloud provider. The cloud offers many advantages for developing and deploying machine learning models, including scalability, cost efficiency, and a simpler process than building the entire pipeline in-house.

The choice of cloud provider depends on the business, but in this article, we will explore how to set up a machine learning pipeline on the Google Cloud Platform (GCP).

Let’s start.

Preparation

You must have a Google account before proceeding, as we will be using GCP. Once you have created an account, open the Google Cloud Console.

Once in the console, create a new project.


Then, before anything else, you need to set up billing. GCP requires you to register your payment information before you can use most services, even with a free trial account. You don't have to worry, though; this example will not use much of your credit.


Add all the billing information needed to start the project. You may also need your tax information and a credit card, so make sure they are ready.

With everything in place, let's start building our machine learning pipeline on GCP.

Machine Learning Pipeline with Google Cloud Platform

We need an example dataset to build our machine learning pipeline. We will use the Heart Attack Prediction dataset from Kaggle for this tutorial. Download the data and store it somewhere accessible.

Next, we need to set up storage for the dataset that the machine learning pipeline will use. To do this, we create a storage bucket. Search for 'Cloud Storage' to create a bucket; it must have a globally unique name. For now, you don't have to change any of the default settings; just click the Create button.


Once the bucket is created, upload your CSV file to it. If you have done this properly, you will see the dataset inside the bucket.
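If you prefer to do the upload in code rather than through the console, here is a minimal sketch using the google-cloud-storage client library. The bucket name and file path are placeholders; replace them with your own, and make sure your environment is authenticated with GCP.

from google.cloud import storage

# TODO: Replace with your bucket name and local file path
bucket_name = "your-unique-bucket-name"
local_file = "heart_attack.csv"

# The client picks up your default GCP credentials and project
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)

# Upload the CSV as an object named "heart_attack.csv" in the bucket
blob = bucket.blob("heart_attack.csv")
blob.upload_from_filename(local_file)

print(f"Uploaded {local_file} to gs://{bucket_name}/{blob.name}")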


Next, we will create a new table that we can query using the BigQuery service. Search for 'BigQuery' and click 'Add data'. Choose 'Google Cloud Storage' and select the CSV file we uploaded to the bucket.


Fill in the information, especially the project, the dataset (create a new one or choose an existing one), and the table name. For the schema, select 'Auto detect', then create the table.
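This step can also be done in code instead of the console. Below is a minimal sketch that loads the CSV from the bucket into a BigQuery table with schema auto-detection; the project, dataset, table, and bucket names are placeholders, and the dataset must already exist.

from google.cloud import bigquery

# TODO: Replace with your project ID, dataset, table, and bucket names
table_id = "your-project-id.your_dataset.heart_attack"
gcs_uri = "gs://your-unique-bucket-name/heart_attack.csv"

client = bigquery.Client()

# Load the CSV with an auto-detected schema, skipping the header row
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)
load_job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config)
load_job.result()  # Wait for the load job to finish

print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")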


If the table was created successfully, you can query it to check that you can access the dataset.

Next, look for Vertex AI and enable all the recommended APIs. Once that is done, select 'Colab Enterprise'.


Select 'Create notebook' to create the notebook we will use for our simple machine learning pipeline.


If you are familiar with Google Colab, the interface will look very similar. You can also import a notebook from an external source if you want.

With the notebook ready, connect to a runtime. For now, the default machine type will be enough, as we do not need many resources.

Let's start the machine learning pipeline development by querying data from our BigQuery table. First, we need to initialize the BigQuery client with the following code.

from google.cloud import bigquery

client = bigquery.Client()

Then, let's query the dataset in the BigQuery table using the following code. Change the project ID, dataset, and table name to match what you created earlier.

# TODO: Replace with your project ID, dataset, and table name
query = """
SELECT *
FROM `your-project-id.your_dataset.heart_attack`
LIMIT 1000
"""
query_job = client.query(query)

df = query_job.to_dataframe()

The data is now in a Pandas DataFrame in our notebook. Let's turn the target variable ('Outcome') into a numerical label.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df['Outcome'] = df['Outcome'].apply(lambda x: 1 if x == 'Heart Attack' else 0)

Next, let's prepare the training and test datasets.

df = df.select_dtypes('number')

X = df.drop('Outcome', axis=1)
y = df['Outcome']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

⚠️ Note: the line df = df.select_dtypes('number') drops all non-numeric columns to simplify the example. In a real-world scenario, this is an aggressive step that can throw away useful categorical features; it is done here for simplicity, and proper feature engineering or encoding would normally be preferred.
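If you would rather keep the categorical columns, one alternative is to one-hot encode them with pandas before splitting the data. This is only a sketch; the exact column names depend on the dataset, so they are detected generically here.

import pandas as pd
from sklearn.model_selection import train_test_split

# Alternative to df.select_dtypes('number'): keep categorical columns by one-hot encoding them
categorical_cols = df.select_dtypes(include='object').columns
df_encoded = pd.get_dummies(df, columns=list(categorical_cols))

X = df_encoded.drop('Outcome', axis=1)
y = df_encoded['Outcome']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)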

Once the data is ready, let’s train a model and evaluate its performance.

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"Model Accuracy: {accuracy_score(y_test, y_pred)}")

The model accuracy is only around 0.5. It can definitely be improved, but for this example, we will move forward with this simple model.
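If you do want to squeeze a bit more out of this step, one simple thing to try is standardizing the features before the logistic regression using a scikit-learn Pipeline. This is only a sketch, and whether it actually helps depends on the data.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Standardize the features, then fit logistic regression, as a single pipeline object
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)
print(f"Pipeline Accuracy: {accuracy_score(y_test, pipeline.predict(X_test))}")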

Now, let's use the model to make predictions and assemble the results.

result_df = X_test.copy()
result_df['actual'] = y_test.values
result_df['predicted'] = y_pred
result_df.reset_index(inplace=True)

Finally, we will save the model predictions to a new BigQuery table. Note that the following code will overwrite the destination table if it already exists, rather than appending to it.

# TODO: Replace with your project ID and destination dataset/table
destination_table = "your-project-id.your_dataset.heart_attack_predictions"
job_config = bigquery.LoadJobConfig(write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE)
load_job = client.load_table_from_dataframe(result_df, destination_table, job_config=job_config)
load_job.result()
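Once the load job completes, you can query the new table to confirm the predictions landed where you expect. A minimal check, assuming the same destination table name as above, might look like this.

# Count the rows written to the predictions table
check_query = f"""
SELECT COUNT(*) AS n_rows
FROM `{destination_table}`
"""
print(client.query(check_query).to_dataframe())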

With that, you have created a simple machine learning pipeline inside a Vertex AI notebook.

To streamline this process, you can schedule the notebook to run automatically. Go to your notebook's actions and select 'Schedule'.


Choose the frequency at which you need the notebook to run, for example, every Tuesday or on the first day of the month. This is an easy way to ensure the machine learning pipeline runs as needed.

That's all it takes to set up a simple machine learning pipeline on GCP. There are many more production-ready ways to set up such a pipeline, such as using Kubeflow Pipelines (KFP) or the more integrated Vertex AI Pipelines service.

Conclusion

Google Cloud Platform provides an easy way to set up a machine learning pipeline. In this article, we learned how to set one up using several cloud services, including Cloud Storage, BigQuery, and Vertex AI. By building the pipeline in notebook form and scheduling it to run automatically, we can create a simple, functional pipeline.

I hope this has helped!

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time in Indonesia, he loves to share data tips and insights through social media and writing media. Cornellius writes on a variety of AI and machine learning topics.
