How to train a model in Python
Learn how to train a model in Python. Discover different methods, tips, real-world applications, and how to debug common errors.

Training a machine learning model in Python is a core skill for data scientists and developers. This process lets you build predictive applications and unlock insights from your data.
In this article, we'll walk through key techniques and practical tips. You'll discover real-world applications and learn how to debug your models, which will equip you with the skills for your own projects.
Basic model training with scikit-learn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier()
model.fit(X_train, y_train)
print(f"Model accuracy: {model.score(X_test, y_test):.4f}")

Output:
Model accuracy: 0.9350
This example showcases a fundamental machine learning workflow. The key first step is using train_test_split to divide the dataset. This prevents the model from simply memorizing the data, ensuring it can generalize to new, unseen examples.
- The model.fit(X_train, y_train) call is where the training occurs; the model learns from the training portion of the data.
- Afterward, model.score(X_test, y_test) evaluates its performance on the reserved test data, providing a realistic accuracy metric.
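A single train/test split can give a noisy estimate, because the score depends on which rows happen to land in the test set. For a more stable number, scikit-learn's cross_val_score repeats the evaluation across several splits. A minimal sketch of the same setup:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Train and score the model on 5 different train/test splits
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=5)
print(f"Mean accuracy: {scores.mean():.4f} (+/- {scores.std():.4f})")
```

The mean and standard deviation together tell you both how accurate the model is and how sensitive that accuracy is to the split.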
Training with popular machine learning libraries
While the scikit-learn workflow is a fantastic starting point, other popular libraries like TensorFlow, PyTorch, and XGBoost offer specialized features for different needs.
Training with TensorFlow and Keras
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
X = np.random.random((1000, 10))
y = np.random.randint(2, size=(1000, 1))
model = Sequential([
    Dense(64, activation='relu', input_shape=(10,)),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
print(f"Final training accuracy: {history.history['accuracy'][-1]:.4f}")

Output:
Final training accuracy: 0.9825
TensorFlow and its high-level API, Keras, are excellent for building neural networks. This example uses a Sequential model, which is a simple stack of layers. Before training, the model is configured with model.compile(), where you define the optimizer and loss function—key components that guide the learning process.
- The model.fit() method trains the model, iterating over the dataset for a specified number of epochs.
- Keras can automatically create a validation set from your training data using the validation_split argument, offering a convenient way to monitor performance during training.
Training with PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000, 1)).float()
optimizer = optim.Adam(model.parameters(), lr=0.01)
criterion = nn.BCELoss()
for epoch in range(100):
    optimizer.zero_grad()
    outputs = model(X)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()
print(f"Final loss: {loss.item():.4f}")

Output:
Final loss: 0.6815
PyTorch gives you more direct control over the training process. Instead of a high-level fit() method, you write the training loop yourself. This approach offers greater flexibility for implementing custom logic during training.
- The loop starts with optimizer.zero_grad() to clear gradients from the previous pass.
- loss.backward() performs backpropagation, calculating how much each model parameter contributed to the error.
- Finally, optimizer.step() updates the model's weights based on the calculated gradients to improve performance.
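Once the loop finishes, evaluation follows the same pattern as the forward pass, except you wrap it in torch.no_grad() so PyTorch skips gradient bookkeeping. A minimal sketch, using a small model and random data like the example above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000, 1)).float()

# eval() switches layers like dropout and batchnorm to inference behavior
model.eval()
with torch.no_grad():  # no gradients needed, so skip tracking them
    preds = (model(X) > 0.5).float()
accuracy = (preds == y).float().mean().item()
print(f"Accuracy: {accuracy:.4f}")
```

In a real project you would run this on held-out test data rather than the training tensors.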
Training with XGBoost
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
params = {'max_depth': 3, 'eta': 0.1, 'objective': 'binary:logistic'}
model = xgb.train(params, dtrain, num_boost_round=100)
preds = (model.predict(dtest) > 0.5).astype(int)
accuracy = (preds == y_test).mean()
print(f"Test accuracy: {accuracy:.4f}")

Output:
Test accuracy: 0.9400
XGBoost is a powerful library optimized for speed and performance, particularly with tabular data. Its workflow has a few unique steps compared to scikit-learn.
- First, you must convert your data into an xgb.DMatrix object. This is an internal data structure that XGBoost uses for high efficiency.
- Training is then handled by the xgb.train() function. It takes a dictionary of params to configure the model and the number of boosting rounds to run.
Advanced model training techniques
While the high-level fit() methods are powerful, you can achieve finer control and better performance by implementing more advanced training strategies.
Custom training loops in TensorFlow
import tensorflow as tf
import numpy as np
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
X = np.random.random((1000, 10)).astype(np.float32)
y = np.random.randint(2, size=(1000, 1)).astype(np.float32)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
loss_fn = tf.keras.losses.BinaryCrossentropy()
for epoch in range(5):
    with tf.GradientTape() as tape:
        logits = model(X)
        loss = loss_fn(y, logits)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
print(f"Final loss: {loss.numpy():.4f}")

Output:
Final loss: 0.6912
Writing your own training loop in TensorFlow gives you precise control over the learning process, unlike the high-level fit() method. This is essential for implementing advanced techniques or debugging complex models.
- The process is managed within a tf.GradientTape block, which "records" the forward pass calculations.
- After calculating the loss, you use tape.gradient() to find the gradients of the loss with respect to the model's trainable variables.
- Finally, optimizer.apply_gradients() updates the model's weights using those gradients, completing one training step.
Using early stopping and callbacks
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
X = np.random.random((1000, 10))
y = np.random.randint(2, size=(1000, 1))
model = Sequential([
    Dense(64, activation='relu', input_shape=(10,)),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
callbacks = [
    EarlyStopping(patience=3, monitor='val_loss'),
    ModelCheckpoint('best_model.h5', save_best_only=True)
]
history = model.fit(X, y, epochs=20, validation_split=0.2, callbacks=callbacks, verbose=0)
print(f"Training stopped after {len(history.history['loss'])} epochs")

Output:
Training stopped after 14 epochs
Callbacks in Keras are utilities that you can apply at different stages of the training process. They give you a look inside the model's training and can automate certain tasks, making your workflow more efficient.
- The EarlyStopping callback monitors a metric like val_loss and halts training if performance stops improving for a number of epochs set by patience. This prevents overfitting.
- ModelCheckpoint automatically saves the model. Using save_best_only=True ensures you always have the best-performing version, not just the last one.
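After training, you bring the checkpointed model back with tf.keras.models.load_model. A minimal save-and-restore round trip, using a hypothetical demo_model.keras path and a tiny untrained model just to show the mechanics:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([layers.Dense(8, activation='relu'),
                             layers.Dense(1, activation='sigmoid')])
model.compile(optimizer='adam', loss='binary_crossentropy')
X = np.random.random((100, 4)).astype('float32')
_ = model(X)  # run one forward pass so the layer weights get built

model.save('demo_model.keras')                      # same mechanism ModelCheckpoint uses
restored = tf.keras.models.load_model('demo_model.keras')
# The restored model should produce identical predictions
same = np.allclose(model.predict(X, verbose=0), restored.predict(X, verbose=0))
print(same)
```

In the callback workflow above, you would simply load the path you gave ModelCheckpoint to recover the best epoch's weights.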
Distributed model training with tf.distribute
import tensorflow as tf
strategy = tf.distribute.MirroredStrategy()
print(f"Number of devices: {strategy.num_replicas_in_sync}")
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')
X = tf.random.normal((1000, 10))
y = tf.random.uniform((1000, 1), 0, 2, dtype=tf.int32)
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print("Distributed training complete!")

Output:
Number of devices: 1
Distributed training complete!
When you're working with massive datasets, training can be slow. TensorFlow's tf.distribute API speeds things up by splitting the work across multiple GPUs. This example uses MirroredStrategy, which is designed for a single machine with several GPUs. It simply copies the model to each device and syncs them during training.
- The most important step is to define and compile your model within the with strategy.scope(): block. This tells TensorFlow to manage the model across all devices.
- After that, you call model.fit() just like you normally would. The strategy handles all the complex synchronization work for you behind the scenes.
Move faster with Replit
Replit is an AI-powered development platform where all Python dependencies come pre-installed, so you can skip setup and start coding instantly. This lets you move from learning individual techniques, like the ones covered in this article, to building complete applications.
Instead of piecing together models and training loops, you can use Agent 4 to take an idea to a working product. Describe what you want to build, and the Agent can create practical tools like:
- A sentiment analysis tool that classifies customer reviews as positive or negative.
- An API that predicts whether an incoming email is spam.
- A simple web app that scores sales leads to help a team prioritize outreach.
Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.
Common errors and challenges
Navigating the training process means being prepared for common errors, from data mismatches to imbalanced classes.
- Forgetting to scale features. When using models like MLPRegressor, it's easy to forget to scale your features. Because these models are sensitive to the magnitude of your data, features with larger values can unfairly dominate the training process. Scaling your data ensures every feature contributes more equally, leading to a more accurate model.
- Fixing tensor type errors in PyTorch. PyTorch is particular about data types, and a common headache is a RuntimeError from mismatched tensors. This often happens when your model expects a FloatTensor but receives a DoubleTensor. The fix is usually straightforward: explicitly convert your input data to the correct type.
- Addressing class imbalance. If one class in your dataset vastly outnumbers another, your model might achieve high accuracy by simply predicting the majority class. To counteract this, you can use the class_weight parameter during training. This tells the model to assign a higher penalty for misclassifying the minority class, encouraging it to learn its features more effectively.
Forgetting to scale features when using MLPRegressor
It's a common pitfall: you train an MLPRegressor model and get a disappointing score. Often, the culprit is unscaled features. Because the model is sensitive to the scale of your data, some features can overshadow others, leading to poor performance.
See this problem in action in the code below.
from sklearn.datasets import load_diabetes
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=300)
model.fit(X_train, y_train)
print(f"R² score: {model.score(X_test, y_test):.4f}")
The code trains the model on raw data where features have different scales. This prevents the MLPRegressor from learning effectively, which explains the low R² score. The following example shows how to address this.
from sklearn.datasets import load_diabetes
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
model = MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=300)
model.fit(X_train_scaled, y_train)
print(f"R² score: {model.score(X_test_scaled, y_test):.4f}")
The solution is to scale your features with StandardScaler before training. This ensures all features have a similar scale, so none of them disproportionately influence the model. It's a two-step process: you use fit_transform() on the training set to learn and apply the scaling, then just transform() on the test set. This simple step is crucial for models like MLPRegressor that are sensitive to the magnitude of input data, leading to a much-improved score.
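To make the fit/transform split impossible to get wrong, you can bundle the scaler and the model into a single scikit-learn Pipeline: the scaler is fitted on training data only, and the same statistics are applied automatically whenever you call predict() or score(). A sketch of the same fix:

```python
from sklearn.datasets import load_diabetes
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The pipeline fits StandardScaler on the training set, then reuses
# those statistics for anything passed to predict() or score()
pipeline = make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(100, 50),
                                      max_iter=300, random_state=42))
pipeline.fit(X_train, y_train)
print(f"R² score: {pipeline.score(X_test, y_test):.4f}")
```

This also prevents a subtle data leak: because the scaler only ever sees training data during fit, test-set statistics can't influence the model.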
Fixing tensor type errors in PyTorch
A frequent hurdle in PyTorch is a RuntimeError caused by mismatched tensor types. The library doesn't automatically convert data types for you, so what your model expects must match what you provide. See how this plays out in the following code.
import torch
import torch.nn as nn
import numpy as np
X = np.random.randn(100, 5)
y = np.random.randint(0, 2, size=100)
model = nn.Sequential(
    nn.Linear(5, 10),
    nn.ReLU(),
    nn.Linear(10, 1),
    nn.Sigmoid()
)
criterion = nn.BCELoss()
outputs = model(X) # Error: NumPy arrays can't be used directly
loss = criterion(outputs, y)
The code fails because it passes NumPy arrays directly to the model, which expects torch.Tensor objects in float32, not NumPy's default float64. The example below shows how to properly format the data before training.
import torch
import torch.nn as nn
import numpy as np
X = torch.tensor(np.random.randn(100, 5), dtype=torch.float32)
y = torch.tensor(np.random.randint(0, 2, size=100), dtype=torch.float32).reshape(-1, 1)
model = nn.Sequential(
    nn.Linear(5, 10),
    nn.ReLU(),
    nn.Linear(10, 1),
    nn.Sigmoid()
)
criterion = nn.BCELoss()
outputs = model(X)
loss = criterion(outputs, y)
print(f"Loss: {loss.item():.4f}")
The fix is to explicitly convert your NumPy arrays into PyTorch tensors using torch.tensor(). You'll also need to specify the correct data type, like torch.float32, which most neural network layers expect. Notice how the target variable y is reshaped with .reshape(-1, 1). This ensures its dimensions match what the loss function requires for its calculations. This error often appears when you mix data from different libraries like NumPy and PyTorch.
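An equivalent fix is torch.from_numpy(), which wraps the array without copying it; the .float() cast then produces the float32 tensor that PyTorch layers expect (NumPy defaults to float64). A small sketch:

```python
import numpy as np
import torch

X_np = np.random.randn(100, 5)      # NumPy arrays are float64 by default
X = torch.from_numpy(X_np).float()  # wrap the array, then cast to float32
# unsqueeze(1) adds the trailing dimension BCELoss expects, like reshape(-1, 1)
y = torch.from_numpy(np.random.randint(0, 2, size=100)).float().unsqueeze(1)

print(X.dtype, y.shape)
```

Either conversion works; torch.from_numpy() is handy when the arrays are large, since the initial wrap avoids a copy.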
Addressing class imbalance with class_weight parameter
When your dataset is imbalanced, accuracy can be a misleading metric. A model might achieve a high score by simply predicting the dominant class while ignoring the minority one. This makes the model seem more effective than it is. See this problem in action below.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Create imbalanced dataset (90% class 0, 10% class 1)
X, y = make_classification(n_samples=1000, n_features=4, weights=[0.9, 0.1], random_state=42)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]
model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
The model is trained without accounting for the 90/10 class split. This results in a high but deceptive accuracy score, as the model is incentivized to ignore the minority class. The following example demonstrates a better approach.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
# Create imbalanced dataset (90% class 0, 10% class 1)
X, y = make_classification(n_samples=1000, n_features=4, weights=[0.9, 0.1], random_state=42)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]
model = RandomForestClassifier(class_weight='balanced')
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"F1 score: {f1_score(y_test, y_pred):.4f}")
The solution is to set class_weight='balanced' in your classifier, which is crucial when one class significantly outnumbers another. It automatically adjusts weights to penalize mistakes on the minority class more heavily, forcing the model to learn its features. The metric also changes to f1_score. This is a better performance indicator than accuracy for imbalanced data because it considers both precision and recall, giving you a more realistic view of effectiveness.
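If you want to see exactly what 'balanced' does, scikit-learn's compute_class_weight exposes the per-class weights it derives from the label frequencies, using the formula n_samples / (n_classes * count). A quick sketch with a 90/10 split like the dataset above:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# 90/10 imbalanced labels, mirroring the dataset above
y = np.array([0] * 900 + [1] * 100)
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y)
# 1000 / (2 * 900) ≈ 0.556 for the majority class,
# 1000 / (2 * 100) = 5.0 for the minority class
print(dict(zip([0, 1], weights)))
```

The minority class ends up weighted about nine times more heavily, which is exactly the penalty ratio the classifier applies during training.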
Real-world applications
Moving from troubleshooting to implementation, you can now apply these training methods to solve tangible, real-world challenges.
Analyzing sentiment in customer reviews with scikit-learn
With scikit-learn, you can create a Pipeline that combines text processing and a classifier to efficiently train a model that sorts customer reviews by sentiment.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
# Sample customer reviews
reviews = [
    "The product exceeded my expectations, great quality!",
    "Works exactly as described, very happy with purchase",
    "Disappointing quality, broke after first use",
    "Not worth the money, poor customer service too"
]
sentiments = [1, 1, 0, 0]  # 1 = positive, 0 = negative
# Create a text classification pipeline
sentiment_model = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('classifier', MultinomialNB())
])
sentiment_model.fit(reviews, sentiments)
new_reviews = ["Very satisfied with this product", "Complete waste of money"]
predictions = sentiment_model.predict(new_reviews)
for review, sentiment in zip(new_reviews, predictions):
    print(f"Review: {review}\nSentiment: {'Positive' if sentiment == 1 else 'Negative'}")
This example uses a Pipeline to streamline text classification. It chains together two key steps, making the workflow clean and repeatable.
- First, CountVectorizer converts the raw text of the reviews into numerical feature vectors based on word counts.
- Then, the MultinomialNB classifier is trained on these vectors using fit().
Once trained, the model can predict() the sentiment of new, unseen reviews, classifying them as positive or negative.
Optimizing models with GridSearchCV hyperparameter tuning
GridSearchCV automates the process of hyperparameter tuning, systematically testing different combinations of settings to find the most effective ones for your model.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification
# Generate sample classification data
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)
# Define the model and parameter grid
model = RandomForestClassifier(random_state=42)
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5]
}
# Perform grid search
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X, y)
# Print best parameters and score
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_:.4f}")
This code automates the tedious process of finding the best settings—or hyperparameters—for your model. It sets up a param_grid, which is a dictionary of different values to test for settings like n_estimators and max_depth.
- GridSearchCV exhaustively tries every possible combination of these settings.
- The cv=5 argument ensures a robust evaluation by splitting the data into five "folds" and rotating which one is used for testing.
- Finally, it reveals the winning combination and its score, saving you from manual trial and error.
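Exhaustive search gets expensive fast: every extra parameter multiplies the number of fits. When the grid grows large, RandomizedSearchCV samples a fixed number of combinations (n_iter) instead of trying them all, which often finds near-best settings in a fraction of the time. A sketch with the same model:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=5, random_state=42)
param_distributions = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10]
}
# Try 8 random combinations instead of all 36
search = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                            param_distributions, n_iter=8, cv=5,
                            scoring='accuracy', random_state=42)
search.fit(X, y)
print(f"Best parameters: {search.best_params_}")
print(f"Best CV score: {search.best_score_:.4f}")
```

The interface mirrors GridSearchCV (best_params_, best_score_, best_estimator_), so swapping between the two is a one-line change.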
Get started with Replit
Turn what you've learned into a working tool. Just tell Replit Agent: "Build a dashboard to predict customer churn" or "Create an API that classifies support tickets by topic."
The Agent writes the code, tests for errors, and deploys your app. You can focus on the idea, not the setup. Start building with Replit.
Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.
Create & deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.
