

Photo by Author | Canva
According to Anaconda’s data science report, data scientists spend about 60% of their time on data cleaning and management. These are routine, time-consuming tasks, which makes them ideal candidates for ChatGPT to take over.
In this article, we will explore five routine tasks that ChatGPT can handle if you use the right prompts, including cleaning and organizing data. We will use a real data project from Gett, a London black taxi app, which the company has used in its recruitment process, to show how this works in practice.
Case Study: Analyzing Failed Ride Orders from Gett
In this data project, Gett asks you to examine key matching metrics and analyze failed rider orders to understand why some customers could not successfully get a car.
Here is the data description.
![]()
![]()
Now, let’s upload it to ChatGPT and explore it.
In the next five steps, we will walk through the routine tasks that ChatGPT can handle in a data project. The steps are shown below.


Step 1: Data Exploration and Analysis
In data exploration, we use the same functions every time, such as head, info, or describe.
When we prompt ChatGPT, we will include these key functions. We will also paste the project description and attach the dataset.


We will use the prompt below. Just replace the text inside the brackets with the project description. You can find the project description here:
Here is the data project description: [paste here]
Perform basic EDA: show head, info, and summary stats, missing values, and a correlation heatmap.
Here is the output.


As you can see, ChatGPT summarizes the dataset by highlighting the key columns and missing values, and then creates a correlation heatmap to explore relationships.
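For reference, the EDA requested in the prompt maps onto a handful of pandas calls. The block below is only a minimal sketch on a tiny made-up table: the column names and values are assumptions for illustration, not the project's actual data, and it is not the code ChatGPT produced.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the orders table; names and values are assumptions.
orders = pd.DataFrame(
    {
        "order_datetime": ["18:08:07", "20:57:32", "12:07:50", "13:50:20", "21:24:45"],
        "m_order_eta": [60.0, np.nan, 477.0, 658.0, np.nan],
        "order_status_key": [4, 4, 4, 4, 9],
        "is_driver_assigned_key": [1, 0, 1, 1, 0],
    }
)

# First rows of the table.
print(orders.head())

# info() writes to a stream, so capture it in a buffer to keep the text.
buffer = io.StringIO()
orders.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)

# Summary statistics for the numeric columns.
summary = orders.describe()
print(summary)

# Number of missing values per column.
missing = orders.isna().sum()
print(missing)

# Correlation matrix of the numeric columns; this is the table a heatmap would draw.
corr = orders.select_dtypes("number").corr()
print(corr)
```

The `corr` table can be passed to a plotting library such as seaborn or matplotlib to render the heatmap itself.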
Step 2: Data Cleaning
Both datasets contain missing values.
![]()
![]()
Let’s write a prompt to handle them.
Clean this dataset: identify and handle missing values appropriately (e.g., drop or impute based on context). Provide a summary of the cleaning steps.
Here’s a summary of what ChatGPT did:


ChatGPT converted the date column, dropped invalid orders, and imputed the missing values in m_order_eta.
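The three cleaning steps summarized above could look roughly like this in pandas. This is a sketch under stated assumptions: the sample rows are invented, "invalid" is taken to mean a missing order id or an unparseable time, and median imputation is one reasonable choice rather than necessarily the one ChatGPT made.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rows; names and values are assumptions for illustration.
raw = pd.DataFrame(
    {
        "order_datetime": ["18:08:07", "20:57:32", "not a time", "13:50:20"],
        "order_gk": [1.0, 2.0, np.nan, 4.0],
        "m_order_eta": [60.0, np.nan, 477.0, 658.0],
    }
)

clean = raw.copy()

# 1) Convert the time column; values that cannot be parsed become NaT.
clean["order_datetime"] = pd.to_datetime(
    clean["order_datetime"], format="%H:%M:%S", errors="coerce"
)

# 2) Drop rows treated as invalid here: no order id or no usable time.
clean = clean.dropna(subset=["order_gk", "order_datetime"])

# 3) Impute the remaining missing ETAs with the column median.
eta_median = clean["m_order_eta"].median()
clean["m_order_eta"] = clean["m_order_eta"].fillna(eta_median)

# Short summary of what was done.
print(f"rows kept: {len(clean)} of {len(raw)}")
print(f"ETA median used for imputation: {eta_median}")
```

Whether to drop or impute depends on why a value is missing, so it is worth checking the summary ChatGPT returns against the project description before accepting it.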
Step 3: Create Visualizations
To get the most out of your data, it is important to visualize the right things. Instead of generating random plots, we can guide ChatGPT by providing a link to a source, an approach called retrieval-augmented generation.
We will use this article. The prompt is:
Before generating visualizations, read this article on choosing the right plots for different data types and distributions: [LINK]. Then, show the most suitable visualizations for this dataset, explain why each was selected, and produce the plots in this chat by running code on the dataset.
Here is the output.


We have six different graphs that we produced with ChatGPT.


For each one, you will see why the graph was selected, the graph itself, and a description of it.
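The idea of matching plot type to data type can also be written down as a small rule table. The sketch below is an illustration only: `suggest_plot` is a hypothetical helper, the rules are a simplified assumption (line plot for datetimes, bar plot for categories and low-cardinality codes, histogram for other numerics), and the sample columns are made up.

```python
import pandas as pd
from pandas.api import types as ptypes


def suggest_plot(series: pd.Series, max_codes: int = 5) -> str:
    """Suggest a plot type for one column using simple dtype-based rules."""
    if ptypes.is_datetime64_any_dtype(series):
        return "line plot"
    if ptypes.is_bool_dtype(series):
        return "bar plot"
    if ptypes.is_numeric_dtype(series):
        # Numeric columns with only a few distinct values act like categories
        # (for example status codes), so a bar plot reads better.
        if series.nunique() <= max_codes:
            return "bar plot"
        return "histogram"
    # Strings and categoricals fall through to a bar plot of counts.
    return "bar plot"


# Hypothetical sample columns; names and values are assumptions.
sample = pd.DataFrame(
    {
        "order_time": pd.to_datetime(
            ["2024-01-01 08:00", "2024-01-01 09:00", "2024-01-01 10:00",
             "2024-01-01 11:00", "2024-01-01 12:00", "2024-01-01 13:00"]
        ),
        "m_order_eta": [60.0, 477.0, 658.0, 120.5, 300.25, 90.0],
        "order_status_key": [4, 9, 4, 4, 9, 4],
        "area": ["north", "south", "north", "east", "south", "north"],
    }
)

suggestions = {name: suggest_plot(sample[name]) for name in sample.columns}
for name, plot in suggestions.items():
    print(f"{name}: {plot}")
```

A rule table like this is much cruder than the guidance in a full article, which is why the prompt points ChatGPT at a source instead of relying on defaults.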
Step 4: Prepare Your Dataset for Machine Learning
Now that we have handled the missing values and explored the data, the next step is to prepare it for machine learning. This includes steps such as encoding categorical variables and scaling numerical features.
Here is our prompt.
Prepare this dataset for machine learning: encode categorical variables, scale numerical features, and return a clean DataFrame ready for modeling. Explain each step.
Here is the output.


Now your features have been scaled and encoded, so your dataset is ready for a machine learning model.
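Encoding and scaling can be sketched with pandas and scikit-learn as follows. The column names and values are assumptions for illustration, and one-hot encoding plus standard scaling is just one common combination, not necessarily what ChatGPT chose.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical cleaned table; names and values are assumptions.
df = pd.DataFrame(
    {
        "m_order_eta": [60.0, 359.0, 658.0, 120.0],
        "cancel_seconds": [198.0, 128.0, 46.0, 62.0],
        "time_of_day": ["evening", "evening", "midday", "midday"],
    }
)

categorical_cols = ["time_of_day"]
numeric_cols = ["m_order_eta", "cancel_seconds"]

# One-hot encode the categorical column into 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=categorical_cols, dtype=float)

# Standardize numeric columns to zero mean and unit variance.
scaler = StandardScaler()
encoded[numeric_cols] = scaler.fit_transform(encoded[numeric_cols])

print(encoded)
```

In a real project the scaler should be fitted on the training split only and then applied to the test split, so that test statistics do not leak into training.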
Step 5: Applying a Machine Learning Model
Let’s move on to machine learning modeling. We will use the following prompt structure to apply a basic machine learning model.
Use this dataset to predict [target variable]. Apply [model type] and report machine learning evaluation metrics (accuracy, precision, recall, F1-score). Use only the 5 most relevant features and explain your modeling steps.
Let’s update this prompt based on our project.
Use this dataset to predict order_status_key. Apply a multi-class classification model (e.g., Random Forest), and report evaluation metrics such as accuracy, precision, recall, and F1-score. Use only the 5 most relevant features and explain your modeling steps.
Now, paste it into the ongoing conversation and review the output.
Here is the output.


As you can see, the model performed well, perhaps too well?
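The modeling prompt can be approximated by the scikit-learn sketch below. It runs on synthetic data from `make_classification`, not on the ride-order data, and ranking features by random forest importance is an assumed way to pick "the 5 most relevant features"; ChatGPT may select them differently.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic 3-class problem standing in for the real dataset (an assumption).
X, y = make_classification(
    n_samples=600,
    n_features=8,
    n_informative=5,
    n_redundant=1,
    n_classes=3,
    random_state=42,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Rank features on the training split only, then keep the top 5 indices.
ranker = RandomForestClassifier(n_estimators=100, random_state=42)
ranker.fit(X_train, y_train)
top5 = np.argsort(ranker.feature_importances_)[::-1][:5]

# Refit on the selected features and evaluate on the held-out split.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train[:, top5], y_train)
y_pred = model.predict(X_test[:, top5])

accuracy = accuracy_score(y_test, y_pred)
print(f"accuracy: {accuracy:.3f}")

# Per-class precision, recall, and F1-score.
print(classification_report(y_test, y_pred))
```

When scores look suspiciously high, one thing worth checking is whether any selected feature is only known after the outcome, since that would leak the target into the model.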
Bonus: Gemini CLI
Gemini has launched an open-source agent that you can interact with from your terminal (60 model requests per minute and 1,000 requests per day at no charge).
In addition to ChatGPT, you can also use Gemini CLI to handle routine data science tasks, such as cleaning and exploration, and even to build a dashboard that automates these tasks.
Gemini CLI provides a straightforward command-line interface and is available at no cost. Let’s start by installing it using the code below.
sudo npm install -g @google/gemini-cli
After running the above code, open your terminal and paste the following code to start building with it:
Once you run the above commands, you will see Gemini CLI as shown in the screenshot below.


Gemini CLI lets you run code, ask questions, or build apps directly from your terminal. In this case, we will use Gemini CLI to build a Streamlit app that automates everything we have done so far: EDA, cleaning, visualization, and modeling.
To build the Streamlit app, we will use a prompt that covers all the steps. It is shown below.
Create a Streamlit app that automates data cleaning, automatically creates data visualizations, prepares the dataset for machine learning, and applies a machine learning model after the user selects the target variable.
Step 1 – Basic EDA:
• Display .head(), .info(), and .describe()
• Show missing values per column
• Show a correlation heatmap of the numerical features
Step 2 – Data Cleaning:
• Detect columns with missing values
• Handle missing data appropriately (drop or impute)
• Display a summary of the cleaning steps taken
Step 3 – Auto Visualizations:
• Before plotting, use these visualization principles:
• Use histograms for numerical distributions
• Use bar plots for categorical distributions
• Use box plots or violin plots to compare categories
• Use scatter plots for numeric relationships
• Use correlation heatmaps for multicollinearity
• Use line plots for time series (if applicable)
• Generate the most relevant plots for this dataset
• Explain why each plot was chosen
Step 4 – Machine Learning Preparation:
• Encode categorical variables
• Scale numerical features
• Return a clean DataFrame ready for modeling
Step 5 – Apply Machine Learning Model:
• Offer the user a choice of target variable.
• Apply several machine learning models.
• Report evaluation metrics.
Each step should be displayed in a different tab. After building it, run the Streamlit app.
When it creates a directory or runs code on your terminal, it will ask for your permission.
![]()
![]()
After a few approval steps, the Streamlit app will be ready, as shown below.
![]()
![]()
Now, let’s examine it.


Final Thoughts
In this article, we first used ChatGPT to handle routine tasks, such as data cleaning, exploration, and data visualization. Next, we went a step further by using it to prepare our dataset for machine learning and to apply machine learning models.
Finally, we used Gemini CLI to create a Streamlit dashboard that performs all these steps with just one click.
To demonstrate all of this, we used a data project from Gett. Although AI is not yet fully reliable for every task, you can use it for routine tasks, which can save you a lot of time.
Nate Rosidi is a data scientist and in product strategy. He is also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform that helps data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.