

Photo by Author | Canva
What if there were a way to make your code faster? __slots__ is easy to implement and can improve your code’s performance by reducing memory use.
In this article, we will walk through how it works using a real-world data science project that Allegro uses as a challenge in its data science recruitment process. However, before getting into this project, let’s build a solid understanding of what __slots__ does.
What is __slots__?
In Python, every object has a dictionary of its attributes. This lets you add, change, or delete them, but it also comes at a cost: extra memory and slower attribute access.
The __slots__ declaration tells Python that these are the only attributes the object will ever need. This is a kind of limitation, but it will save us time. Let’s look at an example.
class WithoutSlots:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class WithSlots:
    # Declare the only attributes instances of this class may have
    __slots__ = ('name', 'age')

    def __init__(self, name, age):
        self.name = name
        self.age = age
In the second class, __slots__ tells Python not to create a dictionary for each object. Instead, it reserves a fixed spot in memory for the name and age values, which makes attribute access faster and reduces memory use.
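As a quick check of how __slots__ behaves, the minimal sketch below reuses the two classes from the example and shows that a slotted instance has no __dict__ and rejects attributes that were not declared. The sample values and the nickname attribute are made up for illustration.

```python
class WithoutSlots:
    def __init__(self, name, age):
        self.name = name
        self.age = age


class WithSlots:
    __slots__ = ('name', 'age')

    def __init__(self, name, age):
        self.name = name
        self.age = age


regular = WithoutSlots('Ada', 36)
slotted = WithSlots('Ada', 36)

# A regular instance stores its attributes in a per-instance dictionary.
print(regular.__dict__)              # {'name': 'Ada', 'age': 36}

# A slotted instance has no __dict__ at all.
print(hasattr(slotted, '__dict__'))  # False

# New attributes can be attached to a regular instance at any time...
regular.nickname = 'Countess'

# ...but a slotted instance only accepts the names listed in __slots__.
try:
    slotted.nickname = 'Countess'
except AttributeError as exc:
    print('AttributeError:', exc)
```

The AttributeError is the "limit" mentioned above: a misspelled or unplanned attribute fails loudly instead of being created silently.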
Why use __slots__?
Now, before starting the data project, let’s go over the reasons why you should use __slots__.
- Memory: Objects take up less space because Python does not create a dictionary for each one.
- Speed: Accessing values is faster because Python knows where each value is stored.
- Bugs: This structure avoids silent bugs because only the declared attributes are allowed.
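To make the memory point concrete, here is a minimal sketch that uses the standard-library tracemalloc module to compare the memory allocated for many regular and slotted instances. The allocated_bytes helper and the instance count are assumptions for illustration, and the exact numbers depend on the Python version.

```python
import tracemalloc


class WithoutSlots:
    def __init__(self, name, age):
        self.name = name
        self.age = age


class WithSlots:
    __slots__ = ('name', 'age')

    def __init__(self, name, age):
        self.name = name
        self.age = age


def allocated_bytes(cls, count=100_000):
    """Return the bytes allocated while building `count` instances of cls."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    # Keep the instances alive in a list until the measurement is taken.
    instances = [cls('Ada', 36) for _ in range(count)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


regular_bytes = allocated_bytes(WithoutSlots)
slotted_bytes = allocated_bytes(WithSlots)

print(f"Regular instances: {regular_bytes:,} bytes")
print(f"Slotted instances: {slotted_bytes:,} bytes")
```

Because every instance shares the same 'Ada' and 36 objects, the difference between the two totals comes from the per-instance storage itself.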
Example Use Case: Allegro’s Data Science Challenge
In this data project, Allegro asked data science candidates to build a machine learning model that predicts laptop prices.
There are three different datasets:
- train_dataset.json
- val_dataset.json
- test_dataset.json
Good, let’s continue with the data exploration process.
Data Exploration
Now let’s load one of them to see the data structure.
import json
import pandas as pd

# Load the training set and drop rows with missing values
with open('train_dataset.json', 'r') as f:
    train_data = json.load(f)
df = pd.DataFrame(train_data).dropna().reset_index(drop=True)
df.head()
Here is the output.
Good, let’s look at the columns.
Here is the output.
Now, let’s check the numerical columns.
Here is the output.
Data Exploration with __slots__ vs Regular Classes
Let’s create a class called SlottedDataExploration, which uses the __slots__ attribute to allow only one attribute, called df. Let’s see the code.
class SlottedDataExploration:
    # One-element tuple: the trailing comma is what makes it a tuple
    __slots__ = ('df',)

    def __init__(self, df):
        self.df = df

    def info(self):
        return self.df.info()

    def head(self, n=5):
        return self.df.head(n)

    def tail(self, n=5):
        return self.df.tail(n)

    def describe(self):
        return self.df.describe(include="all")
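One detail of single-attribute declarations is worth a quick check: ('df') is just the string 'df', while ('df',) is a one-element tuple. Python accepts a bare string as a single slot name, so both forms behave the same here, but the tuple form makes the intent explicit and is safer when more attributes are added later. The StringSlots and TupleSlots class names below are illustrative only.

```python
# Parentheses alone do not create a tuple; the trailing comma does.
print(type(('df')))    # <class 'str'>
print(type(('df',)))   # <class 'tuple'>


class StringSlots:
    __slots__ = 'df'       # a bare string is treated as one slot name


class TupleSlots:
    __slots__ = ('df',)    # explicit one-element tuple


for cls in (StringSlots, TupleSlots):
    obj = cls()
    obj.df = 'placeholder'   # the declared slot works in both cases
    print(cls.__name__, obj.df, hasattr(obj, '__dict__'))
```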
Now let’s see the same implementation again, this time using a regular class instead of __slots__.
class DataExploration:
    def __init__(self, df):
        self.df = df

    def info(self):
        return self.df.info()

    def head(self, n=5):
        return self.df.head(n)

    def tail(self, n=5):
        return self.df.tail(n)

    def describe(self):
        return self.df.describe(include="all")
If you want to read more about how class methods work, check out this Python Class Methods guide.
Performance Comparison: Time Benchmark
Now let’s measure the performance in terms of time and memory.
import time
from pympler import asizeof # memory measurement
start_normal = time.time()
de = DataExploration(df)
_ = de.head()
_ = de.tail()
_ = de.describe()
_ = de.info()
end_normal = time.time()
normal_duration = end_normal - start_normal
normal_memory = asizeof.asizeof(de)
start_slotted = time.time()
sde = SlottedDataExploration(df)
_ = sde.head()
_ = sde.tail()
_ = sde.describe()
_ = sde.info()
end_slotted = time.time()
slotted_duration = end_slotted - start_slotted
slotted_memory = asizeof.asizeof(sde)
print(f"⏱️ Normal class duration: {normal_duration:.4f} seconds")
print(f"⏱️ Slotted class duration: {slotted_duration:.4f} seconds")
print(f"📦 Normal class memory usage: {normal_memory:.2f} bytes")
print(f"📦 Slotted class memory usage: {slotted_memory:.2f} bytes")
Let’s see the result.
The slotted class is 46.45% faster, but memory usage is the same in this example.
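A single time.time() measurement can be noisy, so as a hedged complement, the sketch below isolates attribute reads with the standard-library timeit module and keeps the best of several repeats. The Regular and Slotted classes, the best_access_time helper, and the repeat counts are assumptions chosen for illustration; results will vary by machine and Python version.

```python
import timeit


class Regular:
    def __init__(self, df):
        self.df = df


class Slotted:
    __slots__ = ('df',)

    def __init__(self, df):
        self.df = df


def best_access_time(obj, number=1_000_000, repeat=5):
    """Return the fastest of `repeat` timings for `number` reads of obj.df."""
    timer = timeit.Timer('obj.df', globals={'obj': obj})
    return min(timer.repeat(repeat=repeat, number=number))


regular_time = best_access_time(Regular('placeholder'))
slotted_time = best_access_time(Slotted('placeholder'))

print(f"Regular attribute access: {regular_time:.4f} s per 1,000,000 reads")
print(f"Slotted attribute access: {slotted_time:.4f} s per 1,000,000 reads")
```

Taking the minimum of several repeats reduces the influence of background activity on the comparison.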
Machine Learning in Action
Now, in this section, let’s continue with machine learning. But before doing so, let’s do a train and test split.
Train and Test Split
Now we have three different datasets, train, val, and test, so let’s first find their indices.
train_indeces = train_df.dropna().index
val_indeces = val_df.dropna().index
test_indeces = test_df.dropna().index
Now it is time to use these indices to select the datasets easily in the next step.
train_df = new_df.loc[train_indeces]
val_df = new_df.loc[val_indeces]
test_df = new_df.loc[test_indeces]
Great, now let’s format these data frames, because NumPy wants the flat (n,) format instead of (n, 1). To do this, we need to use .ravel() after to_numpy().
X_train = train_df[selected_features].to_numpy()
X_val = val_df[selected_features].to_numpy()
X_test = test_df[selected_features].to_numpy()
y_train = df.loc[train_indeces][label_col].to_numpy().ravel()
y_val = df.loc[val_indeces][label_col].to_numpy().ravel()
y_test = df.loc[test_indeces][label_col].to_numpy().ravel()
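To illustrate the shape change, here is a minimal sketch with a made-up price column. It assumes the label was selected as a single-column table, which is what yields the (n, 1) shape; the values are not from the dataset.

```python
import numpy as np

# A single-column selection converted to NumPy has the 2-D shape (n, 1).
column = np.array([[1200.0], [899.0], [1550.0]])
print(column.shape)   # (3, 1)

# ravel() flattens it to the 1-D shape (n,) expected for the target.
flat = column.ravel()
print(flat.shape)     # (3,)
```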
Applying Machine Learning Models
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import VotingRegressor
from sklearn import linear_model
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, MaxAbsScaler
import matplotlib.pyplot as plt
from sklearn import tree
import seaborn as sns
def rmse(y_true, y_pred):
    # Take the root of the MSE directly; the squared=False argument
    # is deprecated in newer scikit-learn releases
    return np.sqrt(mean_squared_error(y_true, y_pred))

def regression(regressor_name, regressor):
    # Scale the features, fit on the training set, and predict on the test set
    pipe = make_pipeline(MaxAbsScaler(), regressor)
    pipe.fit(X_train, y_train)
    predicted = pipe.predict(X_test)
    rmse_val = rmse(y_test, predicted)
    print(regressor_name, ':', rmse_val)
    pred_df[regressor_name + '_Pred'] = predicted
    # Plot predicted against actual values
    plt.figure(regressor_name)
    plt.title(regressor_name)
    plt.xlabel('predicted')
    plt.ylabel('actual')
    sns.regplot(y=y_test, x=predicted)
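For readers who want to see what the RMSE metric computes, the following sketch works it out with the standard library only. The rmse_manual name and the prices are made up for illustration and are not part of the project code.

```python
import math


def rmse_manual(y_true, y_pred):
    """Root mean squared error computed with the standard library only."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up prices: the errors are 100, 0 and -200.
actual = [1000.0, 1500.0, 2000.0]
predicted = [900.0, 1500.0, 2200.0]

# Mean of the squared errors is (10000 + 0 + 40000) / 3; RMSE is its root.
print(rmse_manual(actual, predicted))
```

Because the errors are squared before averaging, the single 200 miss weighs more than the 100 miss, which is why RMSE penalizes large errors more heavily.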
Next, we will define a dictionary of regressors and run each model.
regressors = {
    'Linear': LinearRegression(),
    'MLP': MLPRegressor(random_state=42, max_iter=500, learning_rate="constant", learning_rate_init=0.6),
    'DecisionTree': DecisionTreeRegressor(max_depth=15, random_state=42),
    'RandomForest': RandomForestRegressor(random_state=42),
    'GradientBoosting': GradientBoostingRegressor(random_state=42, criterion='squared_error',
                                                  loss="squared_error", learning_rate=0.6, warm_start=True),
    'ExtraTrees': ExtraTreesRegressor(n_estimators=100, random_state=42),
}
# Collect the actual values and each model's predictions side by side
pred_df = pd.DataFrame(columns=["Actual"])
pred_df["Actual"] = y_test
for key in regressors.keys():
    regression(key, regressors[key])
Here are the results.
Now, let’s implement this with both slotted and regular classes.
Machine Learning with __slots__ vs Regular Classes
Now let’s check the code with slots.
class SlottedMachineLearning:
    # Only these five attributes can ever be set on an instance
    __slots__ = ('X_train', 'y_train', 'X_test', 'y_test', 'pred_df')

    def __init__(self, X_train, y_train, X_test, y_test):
        self.X_train = X_train
        self.y_train = y_train
        self.X_test = X_test
        self.y_test = y_test
        self.pred_df = pd.DataFrame({'Actual': y_test})

    def rmse(self, y_true, y_pred):
        # Root of the MSE; avoids the deprecated squared=False argument
        return np.sqrt(mean_squared_error(y_true, y_pred))

    def regression(self, name, model):
        pipe = make_pipeline(MaxAbsScaler(), model)
        pipe.fit(self.X_train, self.y_train)
        predicted = pipe.predict(self.X_test)
        self.pred_df[name + '_Pred'] = predicted
        score = self.rmse(self.y_test, predicted)
        print(f"{name} RMSE:", score)
        plt.figure(figsize=(6, 4))
        sns.regplot(x=predicted, y=self.y_test, scatter_kws={"s": 10})
        plt.xlabel('Predicted')
        plt.ylabel('Actual')
        plt.title(f'{name} Predictions')
        plt.grid(True)
        plt.show()

    def run_all(self):
        models = {
            'Linear': LinearRegression(),
            'MLP': MLPRegressor(random_state=42, max_iter=500, learning_rate="constant", learning_rate_init=0.6),
            'DecisionTree': DecisionTreeRegressor(max_depth=15, random_state=42),
            'RandomForest': RandomForestRegressor(random_state=42),
            'GradientBoosting': GradientBoostingRegressor(random_state=42, learning_rate=0.6, warm_start=True),
            'ExtraTrees': ExtraTreesRegressor(n_estimators=100, random_state=42)
        }
        for name, model in models.items():
            self.regression(name, model)
Here is the regular class implementation.
class MachineLearning:
    def __init__(self, X_train, y_train, X_test, y_test):
        self.X_train = X_train
        self.y_train = y_train
        self.X_test = X_test
        self.y_test = y_test
        self.pred_df = pd.DataFrame({'Actual': y_test})

    def rmse(self, y_true, y_pred):
        # Root of the MSE; avoids the deprecated squared=False argument
        return np.sqrt(mean_squared_error(y_true, y_pred))

    def regression(self, name, model):
        pipe = make_pipeline(MaxAbsScaler(), model)
        pipe.fit(self.X_train, self.y_train)
        predicted = pipe.predict(self.X_test)
        self.pred_df[name + '_Pred'] = predicted
        score = self.rmse(self.y_test, predicted)
        print(f"{name} RMSE:", score)
        plt.figure(figsize=(6, 4))
        sns.regplot(x=predicted, y=self.y_test, scatter_kws={"s": 10})
        plt.xlabel('Predicted')
        plt.ylabel('Actual')
        plt.title(f'{name} Predictions')
        plt.grid(True)
        plt.show()

    def run_all(self):
        models = {
            'Linear': LinearRegression(),
            'MLP': MLPRegressor(random_state=42, max_iter=500, learning_rate="constant", learning_rate_init=0.6),
            'DecisionTree': DecisionTreeRegressor(max_depth=15, random_state=42),
            'RandomForest': RandomForestRegressor(random_state=42),
            'GradientBoosting': GradientBoostingRegressor(random_state=42, learning_rate=0.6, warm_start=True),
            'ExtraTrees': ExtraTreesRegressor(n_estimators=100, random_state=42)
        }
        for name, model in models.items():
            self.regression(name, model)
Performance Comparison: Time Benchmark
Now let’s compare the two implementations, as we did in the previous section.
import time

# Time the regular class
start_normal = time.time()
ml = MachineLearning(X_train, y_train, X_test, y_test)
ml.run_all()
end_normal = time.time()
normal_duration = end_normal - start_normal
# Memory is reported as the size of the stored arrays
normal_memory = (
    ml.X_train.nbytes +
    ml.X_test.nbytes +
    ml.y_train.nbytes +
    ml.y_test.nbytes
)

# Time the slotted class
start_slotted = time.time()
sml = SlottedMachineLearning(X_train, y_train, X_test, y_test)
sml.run_all()
end_slotted = time.time()
slotted_duration = end_slotted - start_slotted
slotted_memory = (
    sml.X_train.nbytes +
    sml.X_test.nbytes +
    sml.y_train.nbytes +
    sml.y_test.nbytes
)

print(f"⏱️ Normal ML class duration: {normal_duration:.4f} seconds")
print(f"⏱️ Slotted ML class duration: {slotted_duration:.4f} seconds")
print(f"📦 Normal ML class memory usage: {normal_memory:.2f} bytes")
print(f"📦 Slotted ML class memory usage: {slotted_memory:.2f} bytes")

time_diff = normal_duration - slotted_duration
percent_faster = (time_diff / normal_duration) * 100
if percent_faster > 0:
    print(f"✅ Slotted ML class is {percent_faster:.2f}% faster than the regular ML class.")
else:
    print("ℹ️ No speed improvement with slots in this run.")

memory_diff = normal_memory - slotted_memory
percent_smaller = (memory_diff / normal_memory) * 100
if percent_smaller > 0:
    print(f"✅ Slotted ML class uses {percent_smaller:.2f}% less memory than the regular ML class.")
else:
    print("ℹ️ No memory savings with slots in this run.")
Here is the output.
Conclusion
By preventing the dynamic creation of __dict__ for each instance, Python’s __slots__ is great at reducing memory use and speeding up attribute access. You saw how it works in practice through both data exploration and machine learning, using Allegro’s real recruitment project.
On small datasets, the improvements can be modest. But as the data scales, the benefits become more noticeable, especially in memory-bound or performance-critical applications.
Nate Rosidi is a data scientist and works in product strategy. He is also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform that helps data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.