How to train a model in Python

Learn how to train a model in Python. Discover different methods, tips, real-world applications, and how to debug common errors.

Published on: Tue, Mar 3, 2026
Updated on: Fri, Mar 6, 2026
The Replit Team

Training machine learning models in Python is a core skill for data scientists. The process transforms raw data into predictive insights, and Python's libraries simplify this complex task with powerful, accessible tools.

In this article, you'll explore key techniques and practical tips. You will see real-world applications and learn effective ways to debug. This helps you refine your models and solve common problems.

Basic model training with scikit-learn

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier()
model.fit(X_train, y_train)
print(f"Model accuracy: {model.score(X_test, y_test):.4f}")

--OUTPUT--
Model accuracy: 0.9350

The code snippet uses scikit-learn to demonstrate a fundamental training workflow. The process involves a few key steps:

  • Data Splitting: Using train_test_split is critical. It reserves a portion of the data for testing, which ensures the model is evaluated on information it hasn't seen before.
  • Training: The model.fit() method is where the RandomForestClassifier learns patterns from the training data.
  • Evaluation: Finally, model.score() checks the model's accuracy using the reserved test data. This gives you a reliable measure of how well your model can make predictions on new inputs.
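Beyond the accuracy score, the trained model can label individual inputs directly. A minimal, self-contained sketch (variable names are illustrative) showing predict() and predict_proba() on held-out rows:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# predict() returns hard class labels; predict_proba() returns per-class probabilities
labels = model.predict(X_test[:3])
probs = model.predict_proba(X_test[:3])
print(labels.shape, probs.shape)  # (3,) and (3, 2) for a binary problem
```

The probability output is often more useful than the hard label when you need to rank predictions or set a custom decision threshold.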

Training with popular machine learning libraries

While the scikit-learn workflow is a great start, you'll often turn to other specialized libraries for more complex deep learning or gradient boosting tasks.

Training with TensorFlow and Keras

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

X = np.random.random((1000, 10))
y = np.random.randint(2, size=(1000, 1))
model = Sequential([
   Dense(64, activation='relu', input_shape=(10,)),
   Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
print(f"Final training accuracy: {history.history['accuracy'][-1]:.4f}")

--OUTPUT--
Final training accuracy: 0.9825

For deep learning, TensorFlow with its Keras API is a popular choice. The code builds a simple neural network using the Sequential model, which lets you stack layers in order. The process is defined by two main steps:

  • Configuration: The model.compile() method sets up the training process. You define the optimizer, the loss function, and what metrics to track.
  • Training: model.fit() executes the training over a set number of epochs. Keras conveniently handles splitting the data for validation with the validation_split parameter.

Training with PyTorch

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000, 1)).float()

optimizer = optim.Adam(model.parameters(), lr=0.01)
criterion = nn.BCELoss()
for epoch in range(100):
   optimizer.zero_grad()
   outputs = model(X)
   loss = criterion(outputs, y)
   loss.backward()
   optimizer.step()
print(f"Final loss: {loss.item():.4f}")

--OUTPUT--
Final loss: 0.6815

PyTorch offers a more hands-on training loop than Keras. While model definition with nn.Sequential is familiar, you manage the training process inside a manual loop. This gives you granular control over the learning process, which involves a few key steps on each pass:

  • The optimizer.zero_grad() method resets gradients from the previous iteration.
  • loss.backward() calculates new gradients through backpropagation.
  • optimizer.step() updates the model’s weights based on those gradients.
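Once the loop finishes, inference follows a standard pattern: switch the model to evaluation mode and disable gradient tracking. A minimal sketch of that pattern (the model here is untrained, with fresh random weights, just to show the mechanics):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
X = torch.randn(200, 10)
y = torch.randint(0, 2, (200, 1)).float()

model.eval()               # puts layers like dropout/batchnorm into inference behavior
with torch.no_grad():      # skips gradient bookkeeping, saving memory and time
    preds = (model(X) > 0.5).float()

accuracy = (preds == y).float().mean().item()
print(f"Accuracy: {accuracy:.4f}")
```

Forgetting torch.no_grad() during evaluation won't change the numbers, but it wastes memory building a computation graph you never backpropagate through.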

Training with XGBoost

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
params = {'max_depth': 3, 'eta': 0.1, 'objective': 'binary:logistic'}
model = xgb.train(params, dtrain, num_boost_round=100)
preds = (model.predict(dtest) > 0.5).astype(int)
accuracy = (preds == y_test).mean()
print(f"Test accuracy: {accuracy:.4f}")

--OUTPUT--
Test accuracy: 0.9400

XGBoost is a go-to for gradient boosting, known for its speed and performance. Unlike other libraries, it uses its own optimized data structure called DMatrix. You'll need to convert your training and test sets into this format before you can start training.

  • Data Structure: The xgb.DMatrix is an internal data structure that XGBoost uses for high efficiency.
  • Training: The core training happens with the xgb.train() function, which takes your parameters and the prepared DMatrix.
  • Prediction: Once trained, model.predict() returns raw probability scores, which you then convert into class labels for evaluation.

Advanced model training techniques

Beyond the standard fit methods, you can gain finer control and boost performance by implementing more advanced training strategies in your workflow.

Custom training loops in TensorFlow

import tensorflow as tf
import numpy as np

model = tf.keras.Sequential([
   tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
   tf.keras.layers.Dense(1, activation='sigmoid')
])
X = np.random.random((1000, 10)).astype(np.float32)
y = np.random.randint(2, size=(1000, 1)).astype(np.float32)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
loss_fn = tf.keras.losses.BinaryCrossentropy()
for epoch in range(5):
   with tf.GradientTape() as tape:
       preds = model(X)  # sigmoid output: probabilities, not raw logits
       loss = loss_fn(y, preds)
   gradients = tape.gradient(loss, model.trainable_variables)
   optimizer.apply_gradients(zip(gradients, model.trainable_variables))
print(f"Final loss: {loss.numpy():.4f}")

--OUTPUT--
Final loss: 0.6912

While model.fit() is convenient, a custom training loop gives you granular control. This approach is essential for implementing complex model behaviors. The process revolves around TensorFlow's tf.GradientTape, which automatically tracks operations to compute gradients.

  • The tf.GradientTape block records the forward pass—calculating predictions and loss.
  • tape.gradient() uses this recording to compute the gradients of the loss with respect to the model's weights.
  • Finally, optimizer.apply_gradients() uses these gradients to update the model, completing one training step.

Using early stopping and callbacks

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

X = np.random.random((1000, 10))
y = np.random.randint(2, size=(1000, 1))
model = Sequential([
   Dense(64, activation='relu', input_shape=(10,)),
   Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
callbacks = [
   EarlyStopping(patience=3, monitor='val_loss'),
   ModelCheckpoint('best_model.h5', save_best_only=True)
]
history = model.fit(X, y, epochs=20, validation_split=0.2, callbacks=callbacks, verbose=0)
print(f"Training stopped after {len(history.history['loss'])} epochs")

--OUTPUT--
Training stopped after 14 epochs

Callbacks are powerful tools in Keras that automate actions during training. By passing them to model.fit(), you can make your training process smarter and more efficient. This code uses two key callbacks:

  • EarlyStopping monitors a metric like val_loss and stops training if it doesn't improve for a set number of epochs—defined by patience. This saves time and prevents overfitting.
  • ModelCheckpoint saves the best version of your model as it trains. Using save_best_only=True ensures you always have the top-performing model state, even if performance later declines.
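Under the hood, EarlyStopping's patience logic is simple. A plain-Python sketch of the idea (this is an illustration of the concept, not the Keras implementation):

```python
def should_stop(val_losses, patience=3):
    """Return True once val_loss has gone `patience` epochs without a new minimum."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_epoch >= patience

# Best loss at epoch 2 (0.6); four epochs pass without improvement, so stop
losses = [0.9, 0.7, 0.6, 0.65, 0.64, 0.66, 0.7]
print(should_stop(losses, patience=3))  # True
```

Keras additionally supports a min_delta threshold, so that tiny improvements below it don't reset the patience counter.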

Distributed model training with tf.distribute

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print(f"Number of devices: {strategy.num_replicas_in_sync}")

with strategy.scope():
   model = tf.keras.Sequential([
       tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
       tf.keras.layers.Dense(1, activation='sigmoid')
   ])
   model.compile(optimizer='adam', loss='binary_crossentropy')

X = tf.random.normal((1000, 10))
y = tf.random.uniform((1000, 1), 0, 2, dtype=tf.int32)
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print("Distributed training complete!")

--OUTPUT--
Number of devices: 1
Distributed training complete!

When models get bigger, training on one GPU can become a bottleneck. TensorFlow's tf.distribute API lets you scale your training across multiple devices to speed things up. The code uses MirroredStrategy, which is designed for training on all the GPUs in a single machine.

  • You simply define and compile your model inside a with strategy.scope(): block.
  • TensorFlow automatically handles the complex work of replicating the model, distributing data, and syncing updates.
  • Best of all, your call to model.fit() doesn't need to change.

Move faster with Replit

Replit is an AI-powered development platform that transforms natural language into working applications. The training workflows you've explored are the foundation for powerful tools. Replit Agent can take these concepts and build them into production applications—complete with databases, APIs, and deployment—directly from your descriptions.

For example, you could use Replit Agent to:

  • Build a spam detection API that uses a trained classifier to filter incoming messages.
  • Create a stock trend dashboard that trains a model with EarlyStopping to prevent overfitting on historical price data.
  • Deploy a customer churn prediction tool that uses an XGBoost model to identify at-risk accounts from user activity.

You provide the idea, and Replit Agent handles the coding, testing, and debugging. Start building your next AI application with Replit Agent.

Common errors and challenges

Training a model often involves navigating a few common hurdles, but they're usually straightforward to fix once you know what to look for.

  • Forgetting to scale features: Neural networks, like scikit-learn's MLPRegressor, are sensitive to the scale of your data. If one feature ranges from 0 to 1 and another from 0 to 100,000, the model may struggle to learn. Scaling your features so they share a similar range helps the model learn more efficiently.
  • Fixing tensor type errors: PyTorch is very strict about data types. A common error is a dtype mismatch, where you might feed integer data to a layer expecting floating-point numbers. You can fix this by explicitly converting your tensors to the required type, often with a simple call to .float().
  • Addressing class imbalance: If your dataset has far more examples of one class than another, your model might just learn to predict the majority class. To solve this, you can use the class_weight parameter in libraries like Keras or scikit-learn. This tells the model to pay more attention to the underrepresented class during training.

Forgetting to scale features when using MLPRegressor

The MLPRegressor is powerful but sensitive to input data scale. When features have vastly different ranges, the model struggles to learn effectively, often resulting in poor performance. The following code demonstrates this common pitfall by training a model on unscaled data.

from sklearn.datasets import load_diabetes
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=300)
model.fit(X_train, y_train)
print(f"R² score: {model.score(X_test, y_test):.4f}")

The low R² score is a direct result of training the MLPRegressor on raw, unscaled data, which highlights the model's sensitivity to feature ranges. The following code shows how to properly prepare the data for better results.

from sklearn.datasets import load_diabetes
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=300)
model.fit(X_train_scaled, y_train)
print(f"R² score: {model.score(X_test_scaled, y_test):.4f}")

The fix is to scale your features before training. By using StandardScaler, you transform the data so every feature has a similar range. You'll want to fit_transform() the training data to learn the scaling parameters and apply them. Then, you use transform() to apply that same scaling to the test data. This simple preprocessing step dramatically improves the model's R² score, showing how crucial it is for algorithms like MLPRegressor.
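The reason you call transform() rather than fit_transform() on the test set is that the scaler must reuse the training set's mean and standard deviation. A small sketch verifying exactly that (the synthetic data here is illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=50, scale=10, size=(100, 3))
X_test = rng.normal(loc=50, scale=10, size=(20, 3))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean_ and scale_ from training data
X_test_scaled = scaler.transform(X_test)        # reuses those same statistics

# Equivalent by hand: subtract the *training* mean, divide by the *training* std
manual = (X_test - scaler.mean_) / scaler.scale_
print(np.allclose(X_test_scaled, manual))  # True
```

Fitting the scaler on the test set instead would leak information about the test distribution into preprocessing, a subtle form of data leakage.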

Fixing tensor type errors in PyTorch

PyTorch is particular about data types, a common source of runtime errors. You can't simply pass a NumPy array to a model that expects a PyTorch tensor. This strictness ensures efficiency but requires you to be mindful of your data. The following code triggers this exact error.

import torch
import torch.nn as nn
import numpy as np

X = np.random.randn(100, 5)
y = np.random.randint(0, 2, size=100)

model = nn.Sequential(
   nn.Linear(5, 10),
   nn.ReLU(),
   nn.Linear(10, 1),
   nn.Sigmoid()
)

criterion = nn.BCELoss()
outputs = model(X)  # Error: NumPy arrays can't be used directly
loss = criterion(outputs, y)

The error is triggered when the model(X) call receives a NumPy array. PyTorch models don't operate on NumPy data directly—they require their input to be PyTorch tensors. The corrected code below shows how to resolve this.

import torch
import torch.nn as nn
import numpy as np

X = torch.tensor(np.random.randn(100, 5), dtype=torch.float32)
y = torch.tensor(np.random.randint(0, 2, size=100), dtype=torch.float32).reshape(-1, 1)

model = nn.Sequential(
   nn.Linear(5, 10),
   nn.ReLU(),
   nn.Linear(10, 1),
   nn.Sigmoid()
)

criterion = nn.BCELoss()
outputs = model(X)
loss = criterion(outputs, y)
print(f"Loss: {loss.item():.4f}")

The solution is to explicitly convert your data into PyTorch tensors with the correct data type. You wrap your NumPy arrays with torch.tensor() and set the dtype to torch.float32, as neural network layers expect floating-point numbers. You also need to ensure the target tensor y has the right shape for the loss function, which is easily done with .reshape(-1, 1). This prevents mismatches when the model processes the data.
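If the NumPy array is already the right dtype, torch.from_numpy() offers a zero-copy alternative to torch.tensor(), which copies the data. A quick sketch of the difference:

```python
import numpy as np
import torch

X_np = np.random.randn(100, 5).astype(np.float32)  # cast to float32 on the NumPy side
X_t = torch.from_numpy(X_np)                       # shares memory with X_np, no copy

X_np[0, 0] = 42.0
print(X_t[0, 0].item())  # 42.0 — the tensor sees the change because memory is shared
```

The shared memory is a performance win for large arrays, but it also means mutating the array mutates the tensor; use torch.tensor() when you want an independent copy.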

Addressing class imbalance with class_weight parameter

When one class heavily outweighs another, accuracy becomes a misleading metric. Your model might achieve a high score by simply predicting the majority class, effectively ignoring the minority group. The following code demonstrates how this imbalance can create a false sense of performance.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Create imbalanced dataset (90% class 0, 10% class 1)
X, y = make_classification(n_samples=1000, n_features=4, weights=[0.9, 0.1], random_state=42)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")

The high accuracy score is deceptive. The model appears effective, but it's mostly just predicting the majority class. The following code shows how to adjust the training process for a more reliable outcome.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Create imbalanced dataset (90% class 0, 10% class 1)
X, y = make_classification(n_samples=1000, n_features=4, weights=[0.9, 0.1], random_state=42)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

model = RandomForestClassifier(class_weight='balanced')
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"F1 score: {f1_score(y_test, y_pred):.4f}")

The fix is to use the class_weight='balanced' parameter. This tells the classifier to give more importance to the minority class during training, effectively penalizing mistakes on it more heavily. As a result, the model learns to identify both classes better, not just the dominant one. The example switches to f1_score for evaluation because it's a more balanced measure of performance than accuracy when you're dealing with imbalanced data.
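If you're curious what weights 'balanced' actually produces, scikit-learn exposes the computation directly. A quick sketch with a 90/10 split:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# 90/10 imbalance: 900 zeros, 100 ones
y = np.array([0] * 900 + [1] * 100)

# 'balanced' weighs each class by n_samples / (n_classes * class_count)
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y)
print(weights)  # class 0: 1000/(2*900) ≈ 0.556, class 1: 1000/(2*100) = 5.0
```

Errors on the minority class are thus weighted 9x more heavily, which is what pushes the model to stop ignoring it.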

Real-world applications

With these training and debugging skills, you can build applications that solve real-world problems, from analyzing customer sentiment to optimizing model performance.

Analyzing sentiment in customer reviews with scikit-learn

With scikit-learn, you can build a simple Pipeline to automatically classify customer reviews by positive or negative sentiment.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Sample customer reviews
reviews = [
   "The product exceeded my expectations, great quality!",
   "Works exactly as described, very happy with purchase",
   "Disappointing quality, broke after first use",
   "Not worth the money, poor customer service too"
]
sentiments = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# Create a text classification pipeline
sentiment_model = Pipeline([
   ('vectorizer', CountVectorizer()),
   ('classifier', MultinomialNB())
])

sentiment_model.fit(reviews, sentiments)
new_reviews = ["Very satisfied with this product", "Complete waste of money"]
predictions = sentiment_model.predict(new_reviews)
for review, sentiment in zip(new_reviews, predictions):
   print(f"Review: {review}\nSentiment: {'Positive' if sentiment == 1 else 'Negative'}")

This code builds a text classification model by chaining two steps together in a scikit-learn Pipeline. This approach automates the entire process, making your code cleaner and less error-prone.

  • The CountVectorizer first converts the raw text of customer reviews into numerical data based on word counts.
  • Then, a MultinomialNB classifier is trained on this data to recognize patterns associated with positive and negative sentiment.

After training with fit(), the model can instantly predict() the sentiment of new reviews.

Optimizing models with GridSearchCV hyperparameter tuning

Instead of manually tweaking your model’s settings, you can use GridSearchCV to automatically search through a grid of hyperparameters and find the combination that performs best.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

# Generate sample classification data
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)

# Define the model and parameter grid
model = RandomForestClassifier(random_state=42)
param_grid = {
   'n_estimators': [50, 100],
   'max_depth': [None, 10, 20],
   'min_samples_split': [2, 5]
}

# Perform grid search
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X, y)

# Print best parameters and score
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_:.4f}")

This code demonstrates how GridSearchCV methodically finds the best hyperparameters for a model. You define a param_grid dictionary containing different values to test for settings like n_estimators and max_depth.

  • GridSearchCV then trains and evaluates a model for every possible combination of these parameters.
  • The cv=5 argument ensures robust evaluation by using 5-fold cross-validation, which helps prevent the model from simply memorizing the training data.

The result is the set of parameters that yielded the best performance during the search.
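When the grid gets large, trying every combination becomes expensive. RandomizedSearchCV samples a fixed number of combinations instead; a sketch of the swap (the parameter values here are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=5, random_state=42)

param_distributions = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10],
}

# n_iter caps how many random combinations are evaluated (here 5 of the 36 possible)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions, n_iter=5, cv=3, random_state=42,
)
search.fit(X, y)
print(search.best_params_)
```

For continuous hyperparameters, you can also pass scipy.stats distributions instead of lists, letting the search sample values rather than pick from a fixed set.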

Get started with Replit

Turn your knowledge into a real tool with Replit Agent. Describe what you want to build, like “create a sentiment analysis tool for customer reviews” or “build a dashboard that uses EarlyStopping to train a model.”

The agent writes the code, tests for errors, and deploys your application from your description. Start building with Replit.

Get started free

Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.
