Star Wars Survey Analysis using Azigar and Pandas

by SkillAiNest

https://www.youtube.com/watch?v=5bgr1ynlsyk

In this project walkthrough, we will find how to clean and analyze the real data of the survey using Star Wars and Pandas, while diving into the fascinating world of star warfare. By working with the results of the Five Treaty Survey, we will expose the viewer’s preferences, film ratings and insights about settlement trends that are clearly ahead.

Survey data analysis is an important skill for any data analyst. Unlike clean, structured datases, the survey response comes with unique challenges: contradictory formatting, mixed data types, checkbox responses that require strategic handling, and lost values that tell their story. This project faces these real -world challenges, and prepares you in your career -to -dirty dirty datases.

During this lesson, we will develop professional quality concepts that tell a compelling story about star warfare, which shows how proper data cleaning and thinking can transform the raw survey data into a stakeholder.

Why does this project make a difference

The survey analysis represents the applicable basic data science skills in industries. Whether you are analyzing customer satisfaction survey, employee engagement data, or market research, the technique shown here is the basis for professional data analysis.

  • Data cleaning skills To handle the dirty, real world datasis
  • Techniques of Bowlin conversion Survey check box answers.
  • The settlement segment analysis To naked the group’s differences
  • Professional Image. Design of the night For stakeholder presentations
  • The synthesis of insights To translate data results in business intelligence

Star War theme makes learning pleasant, but this skill is directly transmitted to business context. Missing these techniques, and you will be ready to remove meaningful insights from any survey dataset that crosses your desk.

By the end of this tutorial, you will know how:

  • Clear Dirt Survey Data by Maping of Yes/No Columns and Changing Checkbox Answers
  • Handle unknown columns and make meaningful columns name for analysis
  • Use Bowen Mapping Techniques to avoid data abuse when Japrot Cells reopen
  • Calculate summary data and rating from the survey’s response
  • Create professional -looking horizontal bar charts with custom styling
  • Create comparative concepts for settlement analysis
  • Apply Object -based metaplatlib for precise control over the appearance of the chart
  • Offer clear, viable insights to stakeholders

Before you start: Pre -instruction

To take the maximum of this project, follow these initial steps:

Review the project

Access the project and familiarize yourself with goals and structures: Star Wars Survey Project

Access the solution notebook

You can see it here and download to see what we will cover: Solution notebook

Prepare your environment

  • If you are using the Data Quest platform, everything has already been configured for you
  • If working locally, make sure you have Pandas, Metaplatlib, and Nimp Install
  • Download Datasit from Fywetti Et Gut Hub Repeatory

Provisions

  • The main things of Azigar and comfortable with Pandas data frames
  • Familiar with dictionaries, loops and methods in azagar
  • Basic understanding of metaplatlib (we will use intermediate techniques)
  • Survey data structure is helpful, but it is not required

New in Mark Dowan? We recommend learning the basics to format the header and adding context to your Japter notebook: Mark Dowan Guide.

To set your environment

Let’s start by importing the necessary libraries and loading your dataset:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

%matplotlib inline Command is a magic that offers our plots directly in the notebook. This interactive data is essential for exploration work flu.

star_wars = pd.read_csv("star_wars.csv")
star_wars.head()

Environment for Star Wars Data ProjectEnvironment for Star Wars Data Project

Our datasate includes more than 1,100 respondents surveying about their star -wise viewing habits and preferences.

Learning Visual: See unknown columns (unknown: 4, unknown: 5, etc.) and extremely tall column name? This survey is a special thing about the survey data exported from a platform like Maneki. The unknown columns actually represent different films in the franchise, and cleaning them will be our first big task.

Data Challenge: Survey structure defined

Survey data offers a unique structure challenge. Consider this common question of the survey:

“Which of the following star warfare films have you seen? Please select all the application.”

This checkbox -style question is exported as multiple columns where:

  • Column 1 contains “Star Wars: Episode I Phantom Menis” if selected, if not, if not
  • Column 2 contains “Star Wars: Clones Episode II attack”, if selected, if not
  • And so for all six movies …

This structure makes the analysis difficult, so we will turn it into a clean bold columns.

The data cleaning process

Step 1: Bowlin to change yes/no reaction

Survey’s answers often come as text (“yes”/”no”) but bolin values (True).).False) Are very easy to work with the program:

yes_no = {"Yes": True, "No": False, True: True, False: False}

for col in (
    "Have you seen any of the 6 films in the Star Wars franchise?",
    "Do you consider yourself to be a fan of the Star Wars film franchise?",
    "Are you familiar with the Expanded Universe?",
    "Do you consider yourself to be a fan of the Star Trek franchise?"
):
    star_wars(col) = star_wars(col).map(yes_no, na_action='ignore')

Learning Visual: Why seemingly useless? True: True, False: False Entries? When it comes to re -operating the cells, it prevents overwriting data. Without these entries, if you accidentally run the cell twice, all your True Will become values NaN Because the mapping dictionary is no longer included True As a key. This is a danger of a common gapter that can quietly eliminate your analysis!

Step 2: Changing Movie viewing data

The toughest section involves changing the data of the checkbox movie. Each unknown column represents whether someone has seen a special event of the star war:

movie_mapping = {
    "Star Wars: Episode I  The Phantom Menace": True,
    np.nan: False,
    "Star Wars: Episode II  Attack of the Clones": True,
    "Star Wars: Episode III  Revenge of the Sith": True,
    "Star Wars: Episode IV  A New Hope": True,
    "Star Wars: Episode V The Empire Strikes Back": True,
    "Star Wars: Episode VI Return of the Jedi": True,
    True: True,
    False: False
}

for col in star_wars.columns(3:9):
    star_wars(col) = star_wars(col).map(movie_mapping)

Step 3: Changing strategic column name

The names of the tall, incredible column make it difficult to analyze. We will rename their name changing things:

star_wars = star_wars.rename(columns={
    "Which of the following Star Wars films have you seen? Please select all that apply.": "seen_1",
    "Unnamed: 4": "seen_2",
    "Unnamed: 5": "seen_3",
    "Unnamed: 6": "seen_4",
    "Unnamed: 7": "seen_5",
    "Unnamed: 8": "seen_6"
})

We will also clear the classification columns:

star_wars = star_wars.rename(columns={
    "Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.": "ranking_ep1",
    "Unnamed: 10": "ranking_ep2",
    "Unnamed: 11": "ranking_ep3",
    "Unnamed: 12": "ranking_ep4",
    "Unnamed: 13": "ranking_ep5",
    "Unnamed: 14": "ranking_ep6"
})

Analysis: naked the data story

Which movie is supreme rule?

Let’s calculate the average rating for each film. Remember, in the classification questions, the low number indicates the higher the priority:

mean_ranking = star_wars(star_wars.columns(9:15)).mean().sort_values()
print(mean_ranking)
ranking_ep5    2.513158
ranking_ep6    3.047847
ranking_ep4    3.272727
ranking_ep1    3.732934
ranking_ep2    4.087321
ranking_ep3    4.341317

The results are decisive: Episode V (Empire Strikesback) emerges as clear fans’ favorite with an average rating of 2.51. The original trinity (IV-VI) significantly improves the Perucel triangle (episodes I-III).

Samples of Movie Viewers

What movies have people actually seen?

total_seen = star_wars(star_wars.columns(3:9)).sum()
print(total_seen)
seen_1    673
seen_2    571
seen_3    550
seen_4    607
seen_5    758
seen_6    738

Episodes guide V and VI in viewers, while especially showing the number of views. Episode III has the lowest audience among the 550 respondents.

Professional Concept: Ready to Stakeholder from Basic

Make our first chart

Let’s start with a basic concept and gradually increase it:

plt.bar(range(6), star_wars(star_wars.columns(3:9)).sum())

It creates a functional chart, but it is not ready for stakeholders. Let’s upgrade to Object -based metaplatlib for precise control:

fig, ax = plt.subplots(figsize=(6,3))
rankings = ax.barh(mean_ranking.index, mean_ranking, color='#fe9b00')

ax.set_facecolor('#fff4d6')
ax.set_title('Average Ranking of Each Movie')

for spine in ('top', 'right', 'bottom', 'left'):
    ax.spines(spine).set_visible(False)

ax.invert_yaxis()
ax.text(2.6, 0.35, '*Lowest rank is the most\n liked', fontstyle='italic')

plt.show()

Star Wars average rating for each movieStar Wars average rating for each movie

Learning Visual: Think about fig As your canvas and ax As a panel or chart area on this canvas. Object -based metaphotleb may seem initially scary, but it provides precise control over every visual element. fig Objects handle the overall data features ax Individual chart controls elements.

Advanced concept: gender comparison

Our most sophisticated scenario compares the classification and viewers by gender using side -by -side bars:

# Create gender-based dataframes
males = star_wars(star_wars("Gender") == "Male")
females = star_wars(star_wars("Gender") == "Female")

# Calculate statistics for each gender
male_ranking_avgs = males(males.columns(9:15)).mean()
female_ranking_avgs = females(females.columns(9:15)).mean()
male_tot_seen = males(males.columns(3:9)).sum()
female_tot_seen = females(females.columns(3:9)).sum()

# Create side-by-side comparison
ind = np.arange(6)
height = 0.35
offset = ind + height

fig, ax = plt.subplots(1, 2, figsize=(8,4))

# Rankings comparison
malebar = ax(0).barh(ind, male_ranking_avgs, color='#fe9b00', height=height)
femalebar = ax(0).barh(offset, female_ranking_avgs, color='#c94402', height=height)
ax(0).set_title('Movie Rankings by Gender')
ax(0).set_yticks(ind + height / 2)
ax(0).set_yticklabels(('Episode 1', 'Episode 2', 'Episode 3', 'Episode 4', 'Episode 5', 'Episode 6'))
ax(0).legend(('Men', 'Women'))

# Viewership comparison
male2bar = ax(1).barh(ind, male_tot_seen, color='#ff1947', height=height)
female2bar = ax(1).barh(offset, female_tot_seen, color='#9b052d', height=height)
ax(1).set_title('# of Respondents by Gender')
ax(1).set_xlabel('Number of Respondents')
ax(1).legend(('Men', 'Women'))

plt.show()

By the classification of Star Wars moviesBy the classification of Star Wars movies

Learning Visual: Offset techniques (ind + height) The key to making the supplement is repeated. This leads to women’s bars slightly down from male bars, which produces a comparative effect. The boundaries of the same axis ensure a fair visual comparison between the chart.

Key results and insights

Through our systematic analysis, we have discovered:

Movie’s preferences:

  • Episode v (Empire Straitsback) emerges as the final fans’ favorite in all population items
  • The original trinity rating and the audience significantly improve the prevalence
  • Episode III receives the lowest rating and has the lowest viewer

Gender analysis:

  • Both men and women rank Episode V as their obvious favorite
  • Gender differences in priorities are minimal but in favor of permanently masculine engagement
  • Men used to rank a bit more event than female
  • More men have seen each of the six movies than women, but the samples remain permanent

Distinguished Visual:

  • Most films are not equal to the differences between gender
  • Epishes V and VI represent the most global appellate content of the VI franchise
  • Scientific concepts about gender preferences in science fi show some support in the engagement level, but the taste preferences remain the same.

Summary of the stakeholder

Each analysis should end with clear, viable insights. Stakeholders here need to know:

  • Episode V (Empire Strikesback) is the final fans’ favorite With average rating in all settlements
  • Gender differences in film preferences are minimalChallenging ordinary stereotypes about science -fi priorities
  • The original trinity improves the prevalent Critical welcome and arrive at both the audience
  • Male respondents demonstrate high overall engagement With franchise, on average watching more movies

Beyond this analysis: next steps

This datastate is worth looking for extra dimensions:

  • Character analysis: Which characters are loved, hated, or are controversial in the fan base?
  • “Han shot first” discussion: Analyze this infamous star war dispute and what it appears about
  • Cross franchise’s preferences: Detect the concrete between Star Wars and Star Track Fandem
  • The concrete of education and age: Do the samples of viewing vary from gender to the distinctive factors?

This project balances the development of technical skills with engaging topics. You will emerge with a polished portfolio piece of data cleaning skills, advanced concepts. The ability to transform the data capabilities, and dirty survey data into viable business insights.

Whether you are a team or Seth, data tells a compelling story. And now you have the skill of telling it beautifully.

If you let this project go, please share your results in this Data Quest Community And tag me (Anna_strahl). I would love to see which samples you discover!

More plans to try

We have some more project walkthrough tutorials you can enjoy:

You may also like

Leave a Comment

At Skillainest, we believe the future belongs to those who embrace AI, upgrade their skills, and stay ahead of the curve.

Get latest news

Subscribe my Newsletter for new blog posts, tips & new photos. Let's stay updated!

@2025 Skillainest.Designed and Developed by Pro