Hiring the right candidate starts with a time-consuming task: screening resumes. If you’ve ever posted a job opening, you know the pain of hundreds of applications landing in your inbox and the hours spent manually reviewing each resume.
In this article, you’ll build a resume screening system in pure Python, focusing on core programming concepts and the power of multiprocessing. The system automates the evaluation process by turning unstructured resume documents into a ranked leaderboard.
By the end of this guide, you will:
Parse documents by extracting text from PDF and DOCX files
Extract information from resume content by identifying skills and keywords
Design a scoring algorithm that uses weighted logic to rank candidates fairly
Create a web interface using Streamlit
Deploy the application to Streamlit Community Cloud for public access
By following this tutorial, you’ll build a tool capable of processing hundreds of resumes in seconds.
Here is the source code: GitHub repository
Table of Contents
Prerequisites
To follow along with this tutorial, you should have:
Basic knowledge of Python (functions, loops, dictionaries)
Python 3.8 or higher installed
Familiarity with installing packages using pip
A code editor such as VS Code, PyCharm, or any editor you prefer
Project overview
In this guide, you will develop a system that takes a folder of resumes and a job description (JD) as input. The system processes each résumé, extracts relevant information, and calculates a score for how well the candidate matches the job requirements.
How does the system work?
The project includes four main components:
Resume Parser: Reads PDF and DOCX files and extracts text
JD Parser: Analyzes job descriptions to identify required skills
Keyword Extractor: Matches resume content against a skills taxonomy
Scoring Engine: Ranks candidates using a weighted algorithm
Scoring Formula
Here is the scoring formula we will use:
Total Score =
(Required Skills × 50%) +
(Preferred Skills × 25%) +
(Experience × 15%) +
(Keywords × 10%)
This approach ensures that essential skills carry more weight than secondary keywords.
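The weighted formula above can be sketched as a small Python function. This is a minimal illustration, assuming each component score has already been normalized to a 0–100 scale (the function name and weights dictionary are illustrative, not the exact code from the repository):

```python
# Weights mirror the formula: required 50%, preferred 25%,
# experience 15%, keywords 10%.
WEIGHTS = {"required": 0.50, "preferred": 0.25, "experience": 0.15, "keywords": 0.10}

def total_score(required, preferred, experience, keywords):
    """Combine four normalized component scores (0-100) into one weighted total."""
    return (required * WEIGHTS["required"]
            + preferred * WEIGHTS["preferred"]
            + experience * WEIGHTS["experience"]
            + keywords * WEIGHTS["keywords"])

# A candidate matching all required skills but few extras still ranks well:
print(total_score(required=100, preferred=40, experience=50, keywords=20))
```

Because required skills carry half the weight, a candidate who misses them cannot be rescued by keyword stuffing alone.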
How does this approach help reduce bias?
This system evaluates resumes using predetermined criteria rather than subjective judgment. Each resume is scored against the same set of required skills, preferred skills, experience indicators, and keywords.
Because all candidates are evaluated using the same weighting formula, personal factors such as writing style, formatting, or unconscious preferences do not affect the ranking. The scoring logic focuses only on how closely the job requirements are matched.
By standardizing the evaluation process, the system promotes more consistent and objective screening, which helps reduce bias during the initial résumé review stage.
System architecture
Input Processing Output
───── ────────── ──────
Résumés ──► Résumé Parser ──► Keyword Extractor ──┐
(PDF/DOCX) │
├──► Scoring Engine ──► Ranked Results
Job Description ──► JD Parser ────────────────────┘
(TXT/PDF)
The system follows a simple input-process-output flow.
Resumes and job descriptions are provided as inputs. The resume parser extracts text from each resume, while the JD parser identifies required and preferred skills from the job description.
The extracted resume text is then fed to the keyword extractor, which matches skills and keywords against a predefined skills taxonomy.
Finally, the scoring engine applies a weighted formula to calculate a score for each candidate and produces a ranked list of resumes.
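The JD parser’s role can be sketched as a simple section scanner. This is a minimal sketch, assuming the job description labels its sections with headings such as “Required Skills:” and “Preferred Skills:” (as in the sample JD used later in this article); the `parse_jd` name and return shape are illustrative, not the repository’s exact API:

```python
import re

def parse_jd(jd_text: str) -> dict:
    """Split a job description into required and preferred skill lists,
    assuming labeled sections with one '- skill' bullet per line."""
    sections = {"required": [], "preferred": []}
    current = None
    for line in jd_text.splitlines():
        line = line.strip()
        if re.match(r"(?i)required skills?:", line):
            current = "required"
        elif re.match(r"(?i)preferred skills?:", line):
            current = "preferred"
        elif line.startswith("-") and current:
            # Collect bullet items under the active section, normalized to lowercase.
            sections[current].append(line.lstrip("- ").lower())
        elif line:
            # Any other non-bullet line starts a new, unlabeled section.
            current = None
    return sections
```

A real parser would need to be more forgiving about heading variants, but the scanner pattern stays the same.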
Project structure
resume_screening_system/
├── app.py # Streamlit web interface
├── main.py # Command-line interface
├── parsers/
│ ├── resume_parser.py # PDF/DOCX text extraction
│ └── jd_parser.py # Job description parsing
├── extractors/
│ └── keyword_extractor.py # Skills and experience extraction
├── matcher/
│ └── scorer.py # Scoring algorithm
├── data/
│ ├── config.json # Scoring weights
│ └── skills_taxonomy.json # Skills database
└── requirements.txt # Dependencies
The project is organized into clear, modular directories. Parsing logic, keyword extraction, and scoring are separated into their own folders, while configuration files and data are kept apart. This structure makes the code base easy to navigate, maintain, and extend.
Step 1: Set up the project
Create the folder structure and set up the virtual environment:
mkdir resume_screening_system
cd resume_screening_system
mkdir parsers extractors matcher data input output
python -m venv venv
Then go ahead and activate the virtual environment:
# On Windows (Git Bash):
source venv/Scripts/activate
# On macOS/Linux:
source venv/bin/activate
Install the required dependencies like this:
pip install PyPDF2 python-docx streamlit pandas
Step 2: Create the resume parser
The resume parser handles different file formats by using a separate extraction method for each type.
For PDF files, the parser opens the document page by page and extracts the text from each page using a PDF reader. The extracted text is concatenated into a single string for further processing.
For DOCX files, the parser reads each paragraph in the document and joins the paragraph text into a single block. This ensures consistent text output regardless of resume format.
By converting all resumes to plain text, the parser allows downstream components such as keyword extraction and scoring to work efficiently.
file: parsers/resume_parser.py
import PyPDF2
from pathlib import Path

def _extract_pdf(self, file_path: Path) -> str:
    """Extract text from a PDF, page by page."""
    text = ""
    with open(file_path, "rb") as file:
        pdf_reader = PyPDF2.PdfReader(file)
        for page in pdf_reader.pages:
            page_text = page.extract_text()
            if page_text:
                text += page_text + "\n"
    return text.strip()

def _extract_docx(self, file_path: Path) -> str:
    """Extract text from a DOCX by joining all paragraphs."""
    from docx import Document
    doc = Document(file_path)
    return "\n".join(
        para.text for para in doc.paragraphs
    ).strip()
Step 3: Build the keyword extractor
A resume dataset from Kaggle is used in this project to ensure the logic works with real-world professional data. The keyword extractor identifies skills by scanning the resume text.
The resume text is first converted to lowercase so that matching is case-insensitive. A skills taxonomy stores each skill with its possible variations, and the extractor checks the resume text against these variations to find matches.
Word boundaries are used during matching to avoid partial matches, such as matching “java” within “javascript”. Matching skills are stored in a set to prevent duplication.
This approach ensures consistent and controlled skill detection across resumes.
file: extractors/keyword_extractor.py
import re
from typing import Set

def extract_skills(self, text: str) -> Set[str]:
    text_lower = text.lower()
    found_skills = set()
    for category, skills_dict in self.skills_taxonomy.items():
        for skill_name, variations in skills_dict.items():
            for variation in variations:
                # Word boundaries prevent partial matches
                # (e.g. "java" inside "javascript").
                pattern = r"\b" + re.escape(variation) + r"\b"
                if re.search(pattern, text_lower):
                    found_skills.add(skill_name)
                    break
    return found_skills
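The taxonomy the extractor iterates over lives in data/skills_taxonomy.json. The entries below are a hypothetical illustration of its shape (category → skill → variations); the actual categories and skills in the repository may differ:

```json
{
  "programming_languages": {
    "python": ["python", "python3"],
    "javascript": ["javascript", "js", "node.js"]
  },
  "databases": {
    "sql": ["sql", "mysql"],
    "postgresql": ["postgresql", "postgres"]
  }
}
```

Storing variations per skill is what lets “postgres” on a resume count as a match for the canonical skill name “postgresql”.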
Step 4: Implement the scoring engine
To generate an objective ranking, the system uses a weighted scoring formula.
| Component | Weight | Reasoning |
| --- | --- | --- |
| Required skills | 50% | Core technical requirements |
| Preferred skills | 25% | Competitive differentiation |
| Experience | 15% | Professional depth |
| Keywords | 10% | Domain relevance |
Total Score =
(S_req × 0.50) +
(S_pref × 0.25) +
(E_exp × 0.15) +
(K_key × 0.10)
The scoring engine calculates a final score for each resume using the weighted values.
It counts how many required skills, preferred skills, experience indicators, and keywords appear in a resume. Each count is multiplied by its assigned weight, with required skills contributing the most.
The weighted values are summed to produce a single score. Resumes are then sorted by this score to produce a ranked list of candidates.
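Because each resume is scored independently, the batch can be parallelized with Python’s multiprocessing module, which is where the speed promised in the introduction comes from. This is a minimal sketch: `score_resume` is a placeholder worker with illustrative scoring only, standing in for the real parse-extract-score pipeline:

```python
from multiprocessing import Pool

def score_resume(args):
    """Placeholder worker: in the real system this would parse the file,
    extract keywords, and apply the full weighted formula."""
    name, required_matches = args
    return name, required_matches * 0.50 * 20  # illustrative scoring only

def rank_candidates(candidates):
    """Score all candidates in parallel, then sort best-first."""
    with Pool() as pool:
        results = pool.map(score_resume, candidates)
    return sorted(results, key=lambda r: r[1], reverse=True)

if __name__ == "__main__":
    print(rank_candidates([("alice.pdf", 4), ("bob.pdf", 2)]))
```

Since resumes never depend on one another, this is an embarrassingly parallel workload and scales nearly linearly with CPU cores.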
Step 5: Create the web interface
Streamlit provides a simple web interface for interacting with the resume screening system.
A text area allows users to paste a job description, while a file uploader lets them upload multiple resume files. When the button is clicked, Streamlit triggers the back-end logic to parse resumes, extract data, and calculate scores.
The results are then displayed in the browser, allowing users to run the screening process without using the command line.
file: app.py
import streamlit as st
jd_text = st.text_area(
"Paste the job description here:",
height=300
)
uploaded_files = st.file_uploader(
"Upload resume files:",
type=("pdf", "docx", "txt"),
accept_multiple_files=True
)
if st.button("Screen Resumes", type="primary"):
st.success("Processing resumes...")
Run the application:
streamlit run app.py
The app will be available at http://localhost:8501.
Step 6: Test the system
Sample job description input
Below is an example of a job description you can use for system testing:
We are looking for a Senior Python Developer with strong experience in backend development.
Required Skills:
- Python
- Django
- REST APIs
- SQL
Preferred Skills:
- PostgreSQL
- Docker
- AWS
Experience:
- 3+ years of professional Python development
- Experience building web applications
This input helps the system identify required skills, preferred skills, and experience keywords, which the scoring engine then uses to rank resumes.
Run the command-line interface:
python main.py
Sample output
============================================================
SCREENING RESULTS
============================================================
Rank #1: Alice Johnson | Score: 85.42/100 | Matched: python, django, postgresql
Rank #2: Carol Davis | Score: 72.50/100 | Matched: python, django
Step 7: Deploy the application
To make the system publicly accessible:
Push the code to GitHub
Create a new app on Streamlit Community Cloud and select your repository
Choose app.py as the main file and deploy the application
Your app will then be live at a public URL.
Conclusion
In this tutorial, you built a complete resume screening system from scratch using Python. By combining text processing, structured scoring, and automation, the project demonstrates how manual resume screening can be transformed into an efficient and objective workflow.
This system helps reduce bias, save time, and evaluate candidates more consistently. Happy coding!