Hiring the right candidate starts with a time-consuming task: screening resumes. If you’ve ever posted a job opening, you know the pain of hundreds of applications landing in your inbox and the hours spent manually reviewing each resume.
In this article, you’ll build a resume screening system in pure Python, focusing on core programming concepts and the power of multiprocessing. The system automates the evaluation process by turning unstructured resume documents into a ranked leaderboard.
By the end of this guide, you will:
Parse documents by extracting text from PDF and DOCX files
Extract information from resume content by identifying skills and keywords
Design a scoring algorithm that uses weighted logic to rank candidates fairly
Create a web interface using Streamlit
Deploy the application to Streamlit Community Cloud for public access
By following this tutorial, you’ll build a tool capable of processing hundreds of resumes in seconds.
Here is the source code: GitHub repository
Table of Contents
Prerequisites
To follow along with this tutorial, you should have:
Basic knowledge of Python (functions, loops, dictionaries)
Python 3.8 or higher installed
Familiarity with installing packages using pip
A code editor such as VS Code, PyCharm, or any editor you prefer
Project overview
In this guide, you will develop a system that takes a folder of resumes and a job description (JD) as input. The system processes each résumé, extracts relevant information, and calculates a score for how well the candidate matches the job requirements.
How does the system work?
The project includes four main components:
Resume Parser: Reads PDF and DOCX files and extracts text
JD Parser: Analyzes job descriptions to identify required skills
Keyword Extractor: Matches resume content against a skills taxonomy
Scoring Engine: Ranks candidates using a weighted algorithm
Scoring Formula
Here is the scoring formula we will use:
Total Score =
(Required Skills × 50%) +
(Preferred Skills × 25%) +
(Experience × 15%) +
(Keywords × 10%)
This approach ensures that essential skills carry more weight than secondary keywords.
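The weighted formula above can be sketched as a small Python function. This is a minimal illustration, assuming each component score has already been normalized to a 0–100 scale (the function name and weights dictionary are illustrative, not the exact code from the repository):

```python
# Weights mirror the formula: required 50%, preferred 25%,
# experience 15%, keywords 10%.
WEIGHTS = {"required": 0.50, "preferred": 0.25, "experience": 0.15, "keywords": 0.10}

def total_score(required, preferred, experience, keywords):
    """Combine four normalized component scores (0-100) into one weighted total."""
    return (required * WEIGHTS["required"]
            + preferred * WEIGHTS["preferred"]
            + experience * WEIGHTS["experience"]
            + keywords * WEIGHTS["keywords"])

# A candidate matching all required skills but few extras still ranks well:
print(total_score(required=100, preferred=40, experience=50, keywords=20))
```

Because required skills carry half the weight, a candidate who misses them cannot be rescued by keyword stuffing alone.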
How does this approach help reduce bias?
This system evaluates resumes using predetermined criteria rather than subjective judgment. Each resume is scored against the same set of required skills, preferred skills, experience indicators, and keywords.
Because all candidates are evaluated using the same weighting formula, personal factors such as writing style, formatting, or unconscious preferences do not affect the ranking. The scoring logic focuses only on how closely the job requirements are matched.
By standardizing the evaluation process, the system promotes more consistent and objective screening, which helps reduce bias during the initial résumé review stage.
System architecture
Input Processing Output
───── ────────── ──────
Résumés ──► Résumé Parser ──► Keyword Extractor ──┐
(PDF/DOCX) │
├──► Scoring Engine ──► Ranked Results
Job Description ──► JD Parser ────────────────────┘
(TXT/PDF)
The system follows a simple input-process-output flow.
Resumes and job descriptions are provided as inputs. The resume parser extracts text from each resume, while the JD parser identifies required and preferred skills from the job description.
The extracted resume text is then fed to the keyword extractor, which matches skills and keywords against a predefined skills taxonomy.
Finally, the scoring engine applies a weighted formula to calculate a score for each candidate and produces a ranked list of resumes.
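The JD parser’s role can be sketched as a simple section scanner. This is a minimal sketch, assuming the job description labels its sections with headings such as “Required Skills:” and “Preferred Skills:” (as in the sample JD used later in this article); the `parse_jd` name and return shape are illustrative, not the repository’s exact API:

```python
import re

def parse_jd(jd_text: str) -> dict:
    """Split a job description into required and preferred skill lists,
    assuming labeled sections with one '- skill' bullet per line."""
    sections = {"required": [], "preferred": []}
    current = None
    for line in jd_text.splitlines():
        line = line.strip()
        if re.match(r"(?i)required skills?:", line):
            current = "required"
        elif re.match(r"(?i)preferred skills?:", line):
            current = "preferred"
        elif line.startswith("-") and current:
            # Collect bullet items under the active section, normalized to lowercase.
            sections[current].append(line.lstrip("- ").lower())
        elif line:
            # Any other non-bullet line starts a new, unlabeled section.
            current = None
    return sections
```

A real parser would need to be more forgiving about heading variants, but the scanner pattern stays the same.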
Project structure
resume_screening_system/
├── app.py # Streamlit web interface
├── main.py # Command-line interface
├── parsers/
│ ├── resume_parser.py # PDF/DOCX text extraction
│ └── jd_parser.py # Job description parsing
├── extractors/
│ └── keyword_extractor.py # Skills and experience extraction
├── matcher/
│ └── scorer.py # Scoring algorithm
├── data/
│ ├── config.json # Scoring weights
│ └── skills_taxonomy.json # Skills database
└── requirements.txt # Dependencies
The project is organized into clear, modular directories. Parsing logic, keyword extraction, and scoring are separated into their own folders, while configuration files and data are kept apart. This structure makes the code base easy to navigate, maintain, and extend.
Step 1: Set up the project
Create the folder structure and set up the virtual environment:
mkdir resume_screening_system
cd resume_screening_system
mkdir parsers extractors matcher data input output
python -m venv venv
Then go ahead and activate the virtual environment:
# On Windows (Git Bash):
source venv/Scripts/activate
# On macOS/Linux:
source venv/bin/activate
Install the required dependencies like this:
pip install PyPDF2 python-docx streamlit pandas
Step 2: Create the resume parser
The resume parser handles different file formats by using a separate extraction method for each type.
For PDF files, the parser opens the document page by page and extracts the text from each page using a PDF reader. The extracted text is concatenated into a single string for further processing.
For DOCX files, the parser reads each paragraph in the document and joins the paragraph text into a single block. This ensures consistent text output regardless of resume format.
By converting all resumes to plain text, the parser allows downstream components such as keyword extraction and scoring to work efficiently.
file: parsers/resume_parser.py
import PyPDF2
from pathlib import Path

def _extract_pdf(self, file_path: Path) -> str:
    """Extract text from a PDF, page by page."""
    text = ""
    with open(file_path, "rb") as file:
        pdf_reader = PyPDF2.PdfReader(file)
        for page in pdf_reader.pages:
            page_text = page.extract_text()
            if page_text:
                text += page_text + "\n"
    return text.strip()

def _extract_docx(self, file_path: Path) -> str:
    """Extract text from a DOCX by joining all paragraphs."""
    from docx import Document
    doc = Document(file_path)
    return "\n".join(
        para.text for para in doc.paragraphs
    ).strip()
Step 3: Build the keyword extractor
A resume dataset from Kaggle is used in this project to ensure the logic works with real-world professional data. The keyword extractor identifies skills by scanning the resume text.
The resume text is first converted to lowercase so that matching is case-insensitive. A skills taxonomy stores each skill with its possible variations, and the extractor checks the resume text against these variations to find matches.
Word boundaries are used during matching to avoid partial matches, such as matching “java” within “javascript”. Matching skills are stored in a set to prevent duplication.
This approach ensures consistent and controlled skill detection across resumes.
file: extractors/keyword_extractor.py
import re
from typing import Set

def extract_skills(self, text: str) -> Set[str]:
    text_lower = text.lower()
    found_skills = set()
    for category, skills_dict in self.skills_taxonomy.items():
        for skill_name, variations in skills_dict.items():
            for variation in variations:
                # Word boundaries prevent partial matches
                # (e.g. "java" inside "javascript").
                pattern = r"\b" + re.escape(variation) + r"\b"
                if re.search(pattern, text_lower):
                    found_skills.add(skill_name)
                    break
    return found_skills
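The taxonomy the extractor iterates over lives in data/skills_taxonomy.json. The entries below are a hypothetical illustration of its shape (category → skill → variations); the actual categories and skills in the repository may differ:

```json
{
  "programming_languages": {
    "python": ["python", "python3"],
    "javascript": ["javascript", "js", "node.js"]
  },
  "databases": {
    "sql": ["sql", "mysql"],
    "postgresql": ["postgresql", "postgres"]
  }
}
```

Storing variations per skill is what lets “postgres” on a resume count as a match for the canonical skill name “postgresql”.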
Step 4: Implement the scoring engine
To generate an objective ranking, the system uses a weighted scoring formula.
| Component | Weight | Reasoning |
| --- | --- | --- |
| Required skills | 50% | Core technical requirements |
| Preferred skills | 25% | Competitive differentiation |
| Experience | 15% | Professional depth |
| Keywords | 10% | Domain relevance |
Total Score =
(S_req × 0.50) +
(S_pref × 0.25) +
(E_exp × 0.15) +
(K_key × 0.10)
The scoring engine calculates a final score for each resume using the weighted values.
It counts how many required skills, preferred skills, experience indicators, and keywords appear in a resume. Each count is multiplied by its assigned weight, with required skills contributing the most.
The weighted values are summed to produce a single score. Resumes are then sorted by this score to produce a ranked list of candidates.
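Because each resume is scored independently, the batch can be parallelized with Python’s multiprocessing module, which is where the speed promised in the introduction comes from. This is a minimal sketch: `score_resume` is a placeholder worker with illustrative scoring only, standing in for the real parse-extract-score pipeline:

```python
from multiprocessing import Pool

def score_resume(args):
    """Placeholder worker: in the real system this would parse the file,
    extract keywords, and apply the full weighted formula."""
    name, required_matches = args
    return name, required_matches * 0.50 * 20  # illustrative scoring only

def rank_candidates(candidates):
    """Score all candidates in parallel, then sort best-first."""
    with Pool() as pool:
        results = pool.map(score_resume, candidates)
    return sorted(results, key=lambda r: r[1], reverse=True)

if __name__ == "__main__":
    print(rank_candidates([("alice.pdf", 4), ("bob.pdf", 2)]))
```

Since resumes never depend on one another, this is an embarrassingly parallel workload and scales nearly linearly with CPU cores.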
Step 5: Create the web interface
Streamlit provides a simple web interface for interacting with the resume screening system.
A text area allows users to paste a job description, while a file uploader lets them upload multiple resume files. When the button is clicked, Streamlit triggers the back-end logic to parse resumes, extract data, and calculate scores.
The results are then displayed in the browser, allowing users to run the screening process without using the command line.
file: app.py
import streamlit as st
jd_text = st.text_area(
"Paste the job description here:",
height=300
)
uploaded_files = st.file_uploader(
"Upload resume files:",
type=("pdf", "docx", "txt"),
accept_multiple_files=True
)
if st.button("Screen Resumes", type="primary"):
st.success("Processing resumes...")
Run the application:
streamlit run app.py
The app will be available at http://localhost:8501.
Step 6: Test the system
Sample job description input
Below is an example of a job description you can use for system testing:
We are looking for a Senior Python Developer with strong experience in backend development.
Required Skills:
- Python
- Django
- REST APIs
- SQL
Preferred Skills:
- PostgreSQL
- Docker
- AWS
Experience:
- 3+ years of professional Python development
- Experience building web applications
This input helps the system identify required skills, preferred skills, and experience keywords, which the scoring engine then uses to rank resumes.
Run the command-line interface:
python main.py
Sample output
============================================================
SCREENING RESULTS
============================================================
Rank #1: Alice Johnson | Score: 85.42/100 | Matched: python, django, postgresql
Rank #2: Carol Davis | Score: 72.50/100 | Matched: python, django
Step 7: Deploy the application
To make the system publicly accessible:
Push the code to GitHub
Create a new app on Streamlit Community Cloud and select your repository
Choose app.py as the main file and deploy the application
Your app will then be live at a public URL.
Conclusion
In this tutorial, you built a complete resume screening system from scratch using Python. By combining text processing, structured scoring, and automation, the project demonstrates how manual resume screening can be transformed into an efficient and objective workflow.
This system helps reduce bias, save time, and evaluate candidates more consistently. Happy coding!