Python for Data Science (Free 7-Day Mini-Course)

by SkillAiNest

Photo by Editor | ChatGPT

. Introduction

Welcome to Python for Data Science, a free 7-day mini course for beginners! If you are just getting started with data science or want to build up the basic skills, this is the beginner-friendly course for you. Over the next seven days, you will learn how to work on data tasks using core Python only.

Here is what you will learn:

  • Work with basic Python data structures
  • Clean and process messy text data
  • Summarize and group data with dictionaries (much like you would in SQL or Excel)
  • Write reusable functions that keep your code clean and efficient
  • Handle errors gracefully so your scripts don’t crash on dirty input data
  • And finally, build a simple data profiling tool to inspect any CSV dataset

Let’s start!

🔗 Link to the code on GitHub

. Day 1: Variables, Data Types, and File I/O

In data science, everything starts with raw data: survey responses, logs, spreadsheets, forms, scraped websites, etc. Before you can analyze or model anything, you need to:

  • Load the data
  • Understand its shape and types
  • Start cleaning and inspecting it

Today, you will learn:

  • Basic Python data types
  • How to read and write raw .txt files

!! 1. Variables

In Python, a variable is a named reference to a value. In data terms, you can think of variables as fields, columns, or metadata.

filename = "responses.txt"
survey_name = "Q3 Customer Feedback"
max_entries = 100

!! 2. Data types you will often use

Don’t worry about the more obscure types yet. You will mostly use the following:

Type     What it’s used for            Example
str      Raw text, column names        "Age", "unknown"
int      Counts, discrete variables    42, 0, -3
float    Continuous variables          3.14, 0.0, -100.5
bool     Flags / binary outcomes       True, False
None     Missing/null values           None

Knowing when you are dealing with None values – and how to check for or replace them – is step zero of data cleaning.
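
Here is a minimal sketch (with made-up values) of checking for missing values and replacing them with a default:

# A minimal sketch: replace missing (None) values with a default
raw_ages = [34, None, 29, None, 41]

cleaned_ages = []
for age in raw_ages:
    if age is None:              # check for missing values with `is None`
        cleaned_ages.append(0)   # or skip the row, or use another sentinel
    else:
        cleaned_ages.append(age)

print(cleaned_ages)  # [34, 0, 29, 0, 41]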

!! 3. File input: Reading raw data

Most real-world data lives in .txt, .csv, or .log files. You often need to load files line by line, not all at once (especially if they are large).

Say you have a file responses.txt with the following contents:

Yes
No
Yes
Maybe
No

Here’s how you read it:

with open("responses.txt", "r") as file:
    lines = file.readlines()

for i, line in enumerate(lines):
    cleaned = line.strip()  # removes \n and spaces
    print(f"{i + 1}: {cleaned}")

Output:

1: Yes
2: No
3: Yes
4: Maybe
5: No

!! 4. File output: Writing processed data

Say you want to save only the “yes” responses to a new file:

with open("responses.txt", "r") as infile:
    lines = infile.readlines()

yes_responses = []

for line in lines:
    if line.strip().lower() == "yes":
        yes_responses.append(line.strip())

with open("yes_only.txt", "w") as outfile:
    for item in yes_responses:
        outfile.write(item + "\n")

This filter is a very simple version of a read-transform-write pipeline, a concept used daily in data preprocessing.

!! ⏭ Exercise: Write your first data script

Create a file called survey.txt and copy in the following lines:

Now write a Python script that:

  1. Reads the file
  2. Counts how often “yes” appears (case-insensitive); you will learn more about working with strings later in the course, but give it a go!
  3. Prints the count
  4. Writes a cleaned version of the data (capitalized, no extra whitespace) to cleaned_survey.txt
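
If you get stuck, here is one possible solution sketch, assuming survey.txt contains one response per line:

with open("survey.txt") as infile:
    lines = [line.strip() for line in infile if line.strip()]

yes_count = sum(1 for line in lines if line.lower() == "yes")
print(f"'Yes' count: {yes_count}")

with open("cleaned_survey.txt", "w") as outfile:
    for line in lines:
        outfile.write(line.capitalize() + "\n")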

. Day 2: Basic Data Structures

Data science is about organizing and shaping data so it can be cleaned, analyzed, or modeled. Today you will learn four essential data structures in core Python and how to use them for real data tasks:

  • List: for ordered sequences of values
  • Tuple: for fixed-position records
  • Dict: for labeled data (like columns)
  • Set: for tracking unique values

!! 1. Lists: for ordered sequences of data

Lists are the most flexible and common structure, suitable for representing:

  • A column of values
  • A collection of records
  • A dataset of unknown size

For example: read the values from a file into a list.

with open("scores.txt", "r") as file:
    scores = [float(line.strip()) for line in file]

print(scores)

This prints:

Now you can:

average = sum(scores) / len(scores)
print(f"Average score: {average:.2f}")

Output:

!! 2. Tuples: for fixed-structure records

Tuples are like lists, but immutable, which makes them a great fit for rows with a fixed structure, such as (name, age).
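
A quick illustration: you can read tuple fields by position, but reassigning a field raises an error.

person = ("Alice", 34)
print(person[0])     # positional access works: Alice
# person[0] = "Bob"  # would raise TypeError: tuples do not support item assignment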

Example: read names and ages from a file.
Suppose we have the following people.txt:

Alice, 34
Bob, 29
Eve, 41

Now read in the contents of the file:

with open("people.txt", "r") as file:
    records = []
    for line in file:
        name, age = line.strip().split(",")
        records.append((name.strip(), int(age.strip())))

Now you can access fields by position:

for person in records:
    name, age = person
    if age > 30:
        print(f"{name} is over 30.")

!! 3. Dicts: for labeled data (like columns)

Dictionaries store key-value pairs and are the closest thing in core Python to a table row with named columns.

For example: turn each person’s record into a dict:

people = []

with open("people.txt", "r") as file:
    for line in file:
        name, age = line.strip().split(",")
        person = {
            "name": name.strip(),
            "age": int(age.strip())
        }
        people.append(person)

Now your data is much more readable and flexible:

for person in people:
    if person("age") < 60:
        print(f"{person('name')} is perhaps a working professional.")

!! 4. Sets: for uniqueness and fast membership checks

Sets automatically remove duplicates, which makes them great for:

  • Counting unique values
  • Checking whether a value has been seen before
  • Detecting distinct values regardless of order

Example: find all unique domains in a file of email addresses.

domains = set()

with open("emails.txt", "r") as file:
    for line in file:
        email = line.strip().lower()
        if "@" in email:
            domain = email.split("@")(1)
            domains.add(domain)

print(domains) 

Output:

{'gmail.com', 'yahoo.com', 'example.org'}
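
Sets also make membership checks fast, which is handy for filtering. Continuing with the domains set from above:

# Membership tests on a set are O(1) on average
if "gmail.com" in domains:
    print("At least one Gmail address was found.")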

!! ⏭ Exercise: Code a mini data inspector

Create a file called dataset.txt with the following content:

Now write a Python script that:

  1. Reads every line and stores it as a dictionary with the keys: name, age, role
  2. Counts how many people are in each role (use a dictionary) and the number of unique ages (use a set)

. Day 3: Working with strings

Messy text is everywhere in real-world datasets.

Today, you will learn:

  • Clean and standardize raw text
  • Extract information from strings
  • Build simple text-based features (the kind you can use for filtering or modeling)

!! 1. Basic string cleaning

Say you get this raw list of job titles from a CSV:

titles = [
    "  Data Scientist\n",
    "data scientist",
    "Senior Data Scientist ",
    "DATA scientist",
    "Data engineer",
    "Data Scientist"
]

Your job? Normalize them.

cleaned = [title.strip().lower() for title in titles]

Now everything is lowercase and free of extra whitespace.

Output:

['data scientist', 'data scientist', 'senior data scientist', 'data scientist', 'data engineer', 'data scientist']

!! 2. Standardizing values

Say you are only interested in identifying data scientists.

standardized = []

for title in cleaned:
    if "data scientist" in title:
        standardized.append("data scientist")
    else:
        standardized.append(title)

!! 3. Counting words, checking patterns

Useful text features:

  • The number of words
  • Whether a string contains a keyword
  • Whether a string looks like a number or an email

Example:

text = " The price is $5,000!  "

# Clean up
clean = text.strip().lower().replace("$", "").replace(",", "").replace("!", "")
print(clean)  

# Word count
word_count = len(clean.split())

# Contains digit
has_number = any(char.isdigit() for char in clean)

print(word_count)
print(has_number)

Output:

"the price is 5000"
4
True

!! 4. Splitting and extracting parts

Let’s take an email example:

email = "  Alice.Johnson@Example.com  "
email = email.strip().lower()

username, domain = email.split("@")

print(f"User: {username}, Domain: {domain}")

This prints:

User: alice.johnson, Domain: example.com

This type of extraction is used in user behavior analysis, spam detection, and so on.

!! 5. Detecting specific patterns in text

You do not need regular expressions for simple pattern checks.

For example: check whether someone mentioned “Python” in a free-text response:

comment = "I'm learning Python and SQL for data jobs."

if "python" in comment.lower():
    print("Mentioned Python")

!! ⏭ Exercise: Clean survey comments

Create a file called comments.txt with the following lines:

Great course! Loved the pacing.
Not enough Python examples.
Too basic for experienced users.
python is exactly what I needed!
Would like more SQL content.
Excellent – very beginner-friendly.

Now write a Python script that:

  1. Cleans each comment (strip, lowercase, remove punctuation)
  2. Prints the total number of comments, how many mention “Python”, and the average word count per comment

. Day 4: Grouping, Counting, and Summarizing with Dictionaries

You have used dicts to store labeled records. Today, you will go one level deeper: using dictionaries to group, count, and summarize data – like a pivot table or GROUP BY in SQL.

!! 1. Grouping by a field

Say you have this data:

data = [
    {"name": "Alice", "city": "London"},
    {"name": "Bob", "city": "Paris"},
    {"name": "Eve", "city": "London"},
    {"name": "John", "city": "New York"},
    {"name": "Dana", "city": "Paris"},
]

Goal: count how many people live in each city.

city_counts = {}

for person in data:
    city = person["city"]
    if city not in city_counts:
        city_counts[city] = 1
    else:
        city_counts[city] += 1

print(city_counts)

Output:

{'London': 2, 'Paris': 2, 'New York': 1}
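
As an aside, once you are comfortable with the manual pattern, the standard library’s collections.Counter does the same counting in one line; a minimal equivalent:

from collections import Counter

city_counts = Counter(person["city"] for person in data)
print(city_counts)  # Counter({'London': 2, 'Paris': 2, 'New York': 1})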

!! 2. Summarizing a field by category

Now say we have:

salaries = [
    {"role": "Engineer", "salary": 75000},
    {"role": "Analyst", "salary": 62000},
    {"role": "Engineer", "salary": 80000},
    {"role": "Manager", "salary": 95000},
    {"role": "Analyst", "salary": 64000},
]

Goal: calculate the total and average salary per role.

totals = {}
counts = {}

for person in salaries:
    role = person["role"]
    salary = person["salary"]

    totals[role] = totals.get(role, 0) + salary
    counts[role] = counts.get(role, 0) + 1

averages = {role: totals[role] / counts[role] for role in totals}

print(averages)

Output:

{'Engineer': 77500.0, 'Analyst': 63000.0, 'Manager': 95000.0}

!! 3. Frequency tables (finding the mode)

Find the most common age in a dataset:

ages = [29, 34, 29, 41, 34, 29]

freq = {}

for age in ages:
    freq[age] = freq.get(age, 0) + 1

most_common = max(freq.items(), key=lambda x: x[1])

print(f"Most common age: {most_common(0)} (appears {most_common(1)} times)")

Output:

Most common age: 29 (appears 3 times)

!! ⏭ Exercise: Analyze an employee dataset

Create a file employees.txt with the following content:

Alice,London,Engineer,75000
Bob,Paris,Analyst,62000
Eve,London,Engineer,80000
John,New York,Manager,95000
Dana,Paris,Analyst,64000

Write a Python script that:

  1. Loads the data into a list of dictionaries
  2. Prints the number of employees per city and the average salary per role

. Day 5: Writing Functions

You have written code that loads, cleans, filters, and summarizes data. Now you will package this logic into functions, so you can:

  • Reuse your code
  • Build processing pipelines
  • Keep your scripts readable and testable

!! 1. Cleaning text inputs

Let’s write a function for basic text cleaning:

def clean_text(text):
    return text.strip().lower().replace(",", "").replace("$", "")

Now you can apply it to every field you read from a file.
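
For example, applied to a messy price field:

print(clean_text("  The price is $5,000  "))  # the price is 5000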

!! 2. Parsing rows into records

Next, here is a simple function to parse each row of a file into a record:

def parse_row(line):
    parts = line.strip().split(",")
    return {
        "name": parts(0),
        "city": parts(1),
        "role": parts(2),
        "salary": int(parts(3))
    }

Now loading your file becomes:

with open("employees.txt") as file:
    rows = [parse_row(line) for line in file]

!! 3. Aggregation helpers

So far, you have computed averages and counts inline. Let’s write some basic helper functions for this:

def average(values):
    return sum(values) / len(values) if values else 0

def count_by_key(data, key):
    counts = {}
    for item in data:
        k = item[key]
        counts[k] = counts.get(k, 0) + 1
    return counts
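
Putting the helpers together (assuming the employees.txt file from Day 4):

with open("employees.txt") as file:
    rows = [parse_row(line) for line in file]

print(count_by_key(rows, "city"))  # {'London': 2, 'Paris': 2, 'New York': 1}

salaries = [row["salary"] for row in rows]
print(f"Average salary: {average(salaries):.2f}")  # Average salary: 75200.00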

!! ⏭ Exercise: Refactor your previous work

Refactor your Day 4 solution into reusable functions:

  • load_data(filename)
  • average_salary_by_role(data)
  • count_by_city(data)

Then use them in a script that prints the same output as Day 4.

. Day 6: File Reading, Writing, and Basic Error Handling

Data files are often incomplete, corrupted, or incorrectly formatted. So how do you deal with them?

Today you will learn:

  • How to read and write structured files
  • How to handle errors gracefully
  • How to skip or log bad rows without crashing

!! 1. Reading files safely

What happens when you try to read a file that does not exist? You should open the file inside a try block and catch the FileNotFoundError:

try:
    with open("employees.txt") as file:
        lines = file.readlines()
except FileNotFoundError:
    print("Error: File not found.")
    lines = []

!! 2. Handling bad rows gracefully

Now let’s skip bad rows and only process the complete ones.

records = []

for line in lines:
    try:
        parts = line.strip().split(",")
        if len(parts) != 4:
            raise ValueError("Incorrect number of fields")
        record = {
            "name": parts(0),
            "city": parts(1),
            "role": parts(2),
            "salary": int(parts(3))
        }
        records.append(record)
    except Exception as e:
        print(f"Skipping bad line: {line.strip()} ({e})")

!! 3. Writing clean data to a file

Finally, let’s write the cleaned data to a file.

with open("cleaned_employees.txt", "w") as out:
    for r in records:
        out.write(f"{r('name')},{r('city')},{r('role')},{r('salary')}\n")

!! ⏭ Exercise: Build an error-tolerant loader

Create a file raw_employees.txt with some incomplete or dirty lines, such as:

Alice,London,Engineer,75000
Bob,Paris,Analyst
Eve,London,Engineer,eighty thousand
John,New York,Manager,95000

Write a script that:

  1. Loads only the valid records
  2. Prints the number of valid rows
  3. Writes them to validated_employees.txt

. Day 7: Create a Mini Data Profiler (Project Day)

Great job making it this far! Today, you will create a standalone Python script that:

  • Loads a CSV file
  • Detects column names and types
  • Computes useful stats
  • Writes a summary report

!! Step by step

1. Load the file:

def load_csv(filename):
    with open(filename) as f:
        lines = [line.strip() for line in f if line.strip()]
    header = lines[0].split(",")
    rows = [line.split(",") for line in lines[1:]]
    return header, rows

2. Detect column types:

def detect_type(value):
    try:
        float(value)
        return "numeric"
    except ValueError:
        return "text"

3. Profile each column:

def profile_columns(header, rows):
    summary = {}
    for i, col in enumerate(header):
        values = [row[i].strip() for row in rows if len(row) == len(header)]
        col_type = detect_type(values[0])
        unique = set(values)
        summary[col] = {
            "type": col_type,
            "unique_count": len(unique),
            "most_common": max(set(values), key=values.count)
        }
        if col_type == "numeric":
            nums = [float(v) for v in values if v.replace('.', '', 1).isdigit()]
            summary[col]["average"] = sum(nums) / len(nums) if nums else 0
    return summary

4. Write the summary report:

def write_summary(summary, out_file):
    with open(out_file, "w") as f:
        for col, stats in summary.items():
            f.write(f"Column: {col}\n")
            for k, v in stats.items():
                f.write(f"  {k}: {v}\n")
            f.write("\n")

You can use these functions like so:

header, rows = load_csv("employees.csv")
summary = profile_columns(header, rows)
write_summary(summary, "profile_report.txt")
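
One caveat: this hand-rolled loader assumes fields never contain commas. For real-world CSVs with quoted fields, the standard library’s csv module is the safer choice; a minimal drop-in replacement:

import csv

def load_csv(filename):
    # csv.reader handles quoted fields, embedded commas, and so on
    with open(filename, newline="") as f:
        rows = [row for row in csv.reader(f) if row]
    return rows[0], rows[1:]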

!! ⏭ Final exercise

Use your own CSV file (or reuse an earlier one). Run the profiler and inspect the report.

. Conclusion

Congratulations! You have completed the Python for data science mini course. 🎉

Over this week, you have moved from basic data structures to writing modular functions and scripts that handle real data issues. These are the basics – and I do mean really basic things – so I suggest you use this as a starting point and explore the standard library further (of course).

Thank you for learning with me. Happy coding and data wrangling!

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she is learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, and more. She also creates engaging resource overviews and coding tutorials.
