How to train a model in Python

Learn how to train a model in Python. Discover different methods, tips, real-world applications, and how to debug common errors.

Published on: Tue, Mar 3, 2026
Updated on: Fri, Mar 6, 2026
The Replit Team

Training machine learning models in Python is a core skill for data scientists. The process transforms raw data into predictive insights, and Python's libraries simplify this complex task with powerful, accessible tools.

In this article, you'll explore key techniques and practical tips. You will see real-world applications and learn effective ways to debug. This helps you refine your models and solve common problems.

Basic model training with scikit-learn

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier()
model.fit(X_train, y_train)
print(f"Model accuracy: {model.score(X_test, y_test):.4f}")

--OUTPUT--
Model accuracy: 0.9350

The code snippet uses scikit-learn to demonstrate a fundamental training workflow. The process involves a few key steps:

  • Data Splitting: Using train_test_split is critical. It reserves a portion of the data for testing, which ensures the model is evaluated on information it hasn't seen before.
  • Training: The model.fit() method is where the RandomForestClassifier learns patterns from the training data.
  • Evaluation: Finally, model.score() checks the model's accuracy using the reserved test data. This gives you a reliable measure of how well your model can make predictions on new inputs.
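Beyond the accuracy score, the trained model can label individual inputs directly. A minimal, self-contained sketch (variable names are illustrative) showing predict() and predict_proba() on held-out rows:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# predict() returns hard class labels; predict_proba() returns per-class probabilities
labels = model.predict(X_test[:3])
probs = model.predict_proba(X_test[:3])
print(labels.shape, probs.shape)  # (3,) and (3, 2) for a binary problem
```

The probability output is often more useful than the hard label when you need to rank predictions or set a custom decision threshold.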

Training with popular machine learning libraries

While the scikit-learn workflow is a great start, you'll often turn to other specialized libraries for more complex deep learning or gradient boosting tasks.

Training with TensorFlow and Keras

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

X = np.random.random((1000, 10))
y = np.random.randint(2, size=(1000, 1))
model = Sequential([
   Dense(64, activation='relu', input_shape=(10,)),
   Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
print(f"Final training accuracy: {history.history['accuracy'][-1]:.4f}")

--OUTPUT--
Final training accuracy: 0.9825

For deep learning, TensorFlow with its Keras API is a popular choice. The code builds a simple neural network using the Sequential model, which lets you stack layers in order. The process is defined by two main steps:

  • Configuration: The model.compile() method sets up the training process. You define the optimizer, the loss function, and what metrics to track.
  • Training: model.fit() executes the training over a set number of epochs. Keras conveniently handles splitting the data for validation with the validation_split parameter.

Training with PyTorch

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000, 1)).float()

optimizer = optim.Adam(model.parameters(), lr=0.01)
criterion = nn.BCELoss()
for epoch in range(100):
   optimizer.zero_grad()
   outputs = model(X)
   loss = criterion(outputs, y)
   loss.backward()
   optimizer.step()
print(f"Final loss: {loss.item():.4f}")

--OUTPUT--
Final loss: 0.6815

PyTorch offers a more hands-on training loop than Keras. While model definition with nn.Sequential is familiar, you manage the training process inside a manual loop. This gives you granular control over the learning process, which involves a few key steps on each pass:

  • The optimizer.zero_grad() method resets gradients from the previous iteration.
  • loss.backward() calculates new gradients through backpropagation.
  • optimizer.step() updates the model’s weights based on those gradients.
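Once the loop finishes, inference follows a standard pattern: switch the model to evaluation mode and disable gradient tracking. A minimal sketch of that pattern (the model here is untrained, with fresh random weights, just to show the mechanics):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
X = torch.randn(200, 10)
y = torch.randint(0, 2, (200, 1)).float()

model.eval()               # puts layers like dropout/batchnorm into inference behavior
with torch.no_grad():      # skips gradient bookkeeping, saving memory and time
    preds = (model(X) > 0.5).float()

accuracy = (preds == y).float().mean().item()
print(f"Accuracy: {accuracy:.4f}")
```

Forgetting torch.no_grad() during evaluation won't change the numbers, but it wastes memory building a computation graph you never backpropagate through.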

Training with XGBoost

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
params = {'max_depth': 3, 'eta': 0.1, 'objective': 'binary:logistic'}
model = xgb.train(params, dtrain, num_boost_round=100)
preds = (model.predict(dtest) > 0.5).astype(int)
accuracy = (preds == y_test).mean()
print(f"Test accuracy: {accuracy:.4f}")

--OUTPUT--
Test accuracy: 0.9400

XGBoost is a go-to for gradient boosting, known for its speed and performance. Unlike other libraries, it uses its own optimized data structure called DMatrix. You'll need to convert your training and test sets into this format before you can start training.

  • Data Structure: The xgb.DMatrix is an internal data structure that XGBoost uses for high efficiency.
  • Training: The core training happens with the xgb.train() function, which takes your parameters and the prepared DMatrix.
  • Prediction: Once trained, model.predict() returns raw probability scores, which you then convert into class labels for evaluation.

Advanced model training techniques

Beyond the standard fit methods, you can gain finer control and boost performance by implementing more advanced training strategies in your workflow.

Custom training loops in TensorFlow

import tensorflow as tf
import numpy as np

model = tf.keras.Sequential([
   tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
   tf.keras.layers.Dense(1, activation='sigmoid')
])
X = np.random.random((1000, 10)).astype(np.float32)
y = np.random.randint(2, size=(1000, 1)).astype(np.float32)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
loss_fn = tf.keras.losses.BinaryCrossentropy()
for epoch in range(5):
   with tf.GradientTape() as tape:
       preds = model(X)  # sigmoid output: probabilities, not raw logits
       loss = loss_fn(y, preds)
   gradients = tape.gradient(loss, model.trainable_variables)
   optimizer.apply_gradients(zip(gradients, model.trainable_variables))
print(f"Final loss: {loss.numpy():.4f}")

--OUTPUT--
Final loss: 0.6912

While model.fit() is convenient, a custom training loop gives you granular control. This approach is essential for implementing complex model behaviors. The process revolves around TensorFlow's tf.GradientTape, which automatically tracks operations to compute gradients.

  • The tf.GradientTape block records the forward pass—calculating predictions and loss.
  • tape.gradient() uses this recording to compute the gradients of the loss with respect to the model's weights.
  • Finally, optimizer.apply_gradients() uses these gradients to update the model, completing one training step.

Using early stopping and callbacks

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

X = np.random.random((1000, 10))
y = np.random.randint(2, size=(1000, 1))
model = Sequential([
   Dense(64, activation='relu', input_shape=(10,)),
   Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
callbacks = [
   EarlyStopping(patience=3, monitor='val_loss'),
   ModelCheckpoint('best_model.h5', save_best_only=True)
]
history = model.fit(X, y, epochs=20, validation_split=0.2, callbacks=callbacks, verbose=0)
print(f"Training stopped after {len(history.history['loss'])} epochs")

--OUTPUT--
Training stopped after 14 epochs

Callbacks are powerful tools in Keras that automate actions during training. By passing them to model.fit(), you can make your training process smarter and more efficient. This code uses two key callbacks:

  • EarlyStopping monitors a metric like val_loss and stops training if it doesn't improve for a set number of epochs—defined by patience. This saves time and prevents overfitting.
  • ModelCheckpoint saves the best version of your model as it trains. Using save_best_only=True ensures you always have the top-performing model state, even if performance later declines.
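Under the hood, EarlyStopping's patience logic is simple. A plain-Python sketch of the idea (this is an illustration of the concept, not the Keras implementation):

```python
def should_stop(val_losses, patience=3):
    """Return True once val_loss has gone `patience` epochs without a new minimum."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_epoch >= patience

# Best loss at epoch 2 (0.6); four epochs pass without improvement, so stop
losses = [0.9, 0.7, 0.6, 0.65, 0.64, 0.66, 0.7]
print(should_stop(losses, patience=3))  # True
```

Keras additionally supports a min_delta threshold, so that tiny improvements below it don't reset the patience counter.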

Distributed model training with tf.distribute

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print(f"Number of devices: {strategy.num_replicas_in_sync}")

with strategy.scope():
   model = tf.keras.Sequential([
       tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
       tf.keras.layers.Dense(1, activation='sigmoid')
   ])
   model.compile(optimizer='adam', loss='binary_crossentropy')

X = tf.random.normal((1000, 10))
y = tf.random.uniform((1000, 1), 0, 2, dtype=tf.int32)
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print("Distributed training complete!")

--OUTPUT--
Number of devices: 1
Distributed training complete!

When models get bigger, training on one GPU can become a bottleneck. TensorFlow's tf.distribute API lets you scale your training across multiple devices to speed things up. The code uses MirroredStrategy, which is designed for training on all the GPUs in a single machine.

  • You simply define and compile your model inside a with strategy.scope(): block.
  • TensorFlow automatically handles the complex work of replicating the model, distributing data, and syncing updates.
  • Best of all, your call to model.fit() doesn't need to change.

Move faster with Replit

Replit is an AI-powered development platform that transforms natural language into working applications. The training workflows you've explored are the foundation for powerful tools. Replit Agent can take these concepts and build them into production applications—complete with databases, APIs, and deployment—directly from your descriptions.

For example, you could use Replit Agent to:

  • Build a spam detection API that uses a trained classifier to filter incoming messages.
  • Create a stock trend dashboard that trains a model with EarlyStopping to prevent overfitting on historical price data.
  • Deploy a customer churn prediction tool that uses an XGBoost model to identify at-risk accounts from user activity.

You provide the idea, and Replit Agent handles the coding, testing, and debugging. Start building your next AI application with Replit Agent.

Common errors and challenges

Training a model often involves navigating a few common hurdles, but they're usually straightforward to fix once you know what to look for.

  • Forgetting to scale features: Neural networks, like scikit-learn's MLPRegressor, are sensitive to the scale of your data. If one feature ranges from 0 to 1 and another from 0 to 100,000, the model may struggle to learn. Scaling your features so they share a similar range helps the model learn more efficiently.
  • Fixing tensor type errors: PyTorch is very strict about data types. A common error is a dtype mismatch, where you might feed integer data to a layer expecting floating-point numbers. You can fix this by explicitly converting your tensors to the required type, often with a simple call to .float().
  • Addressing class imbalance: If your dataset has far more examples of one class than another, your model might just learn to predict the majority class. To solve this, you can use the class_weight parameter in libraries like Keras or scikit-learn. This tells the model to pay more attention to the underrepresented class during training.

Forgetting to scale features when using MLPRegressor

The MLPRegressor is powerful but sensitive to input data scale. When features have vastly different ranges, the model struggles to learn effectively, often resulting in poor performance. The following code demonstrates this common pitfall by training a model on unscaled data.

from sklearn.datasets import load_diabetes
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=300)
model.fit(X_train, y_train)
print(f"R² score: {model.score(X_test, y_test):.4f}")

The low R² score is a direct result of training the MLPRegressor on raw, unscaled data, which highlights the model's sensitivity to feature ranges. The following code shows how to properly prepare the data for better results.

from sklearn.datasets import load_diabetes
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=300)
model.fit(X_train_scaled, y_train)
print(f"R² score: {model.score(X_test_scaled, y_test):.4f}")

The fix is to scale your features before training. By using StandardScaler, you transform the data so every feature has a similar range. You'll want to fit_transform() the training data to learn the scaling parameters and apply them. Then, you use transform() to apply that same scaling to the test data. This simple preprocessing step dramatically improves the model's R² score, showing how crucial it is for algorithms like MLPRegressor.
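The reason you call transform() rather than fit_transform() on the test set is that the scaler must reuse the training set's mean and standard deviation. A small sketch verifying exactly that (the synthetic data here is illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=50, scale=10, size=(100, 3))
X_test = rng.normal(loc=50, scale=10, size=(20, 3))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean_ and scale_ from training data
X_test_scaled = scaler.transform(X_test)        # reuses those same statistics

# Equivalent by hand: subtract the *training* mean, divide by the *training* std
manual = (X_test - scaler.mean_) / scaler.scale_
print(np.allclose(X_test_scaled, manual))  # True
```

Fitting the scaler on the test set instead would leak information about the test distribution into preprocessing, a subtle form of data leakage.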

Fixing tensor type errors in PyTorch

PyTorch is particular about data types, a common source of runtime errors. You can't simply pass a NumPy array to a model that expects a PyTorch tensor. This strictness ensures efficiency but requires you to be mindful of your data. The following code triggers this exact error.

import torch
import torch.nn as nn
import numpy as np

X = np.random.randn(100, 5)
y = np.random.randint(0, 2, size=100)

model = nn.Sequential(
   nn.Linear(5, 10),
   nn.ReLU(),
   nn.Linear(10, 1),
   nn.Sigmoid()
)

criterion = nn.BCELoss()
outputs = model(X)  # Error: NumPy arrays can't be used directly
loss = criterion(outputs, y)

The error is triggered when the model(X) call receives a NumPy array. PyTorch models don't operate on NumPy data directly—they require their input to be PyTorch tensors. The corrected code below shows how to resolve this.

import torch
import torch.nn as nn
import numpy as np

X = torch.tensor(np.random.randn(100, 5), dtype=torch.float32)
y = torch.tensor(np.random.randint(0, 2, size=100), dtype=torch.float32).reshape(-1, 1)

model = nn.Sequential(
   nn.Linear(5, 10),
   nn.ReLU(),
   nn.Linear(10, 1),
   nn.Sigmoid()
)

criterion = nn.BCELoss()
outputs = model(X)
loss = criterion(outputs, y)
print(f"Loss: {loss.item():.4f}")

The solution is to explicitly convert your data into PyTorch tensors with the correct data type. You wrap your NumPy arrays with torch.tensor() and set the dtype to torch.float32, as neural network layers expect floating-point numbers. You also need to ensure the target tensor y has the right shape for the loss function, which is easily done with .reshape(-1, 1). This prevents mismatches when the model processes the data.
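If the NumPy array is already the right dtype, torch.from_numpy() offers a zero-copy alternative to torch.tensor(), which copies the data. A quick sketch of the difference:

```python
import numpy as np
import torch

X_np = np.random.randn(100, 5).astype(np.float32)  # cast to float32 on the NumPy side
X_t = torch.from_numpy(X_np)                       # shares memory with X_np, no copy

X_np[0, 0] = 42.0
print(X_t[0, 0].item())  # 42.0 — the tensor sees the change because memory is shared
```

The shared memory is a performance win for large arrays, but it also means mutating the array mutates the tensor; use torch.tensor() when you want an independent copy.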

Addressing class imbalance with class_weight parameter

When one class heavily outweighs another, accuracy becomes a misleading metric. Your model might achieve a high score by simply predicting the majority class, effectively ignoring the minority group. The following code demonstrates how this imbalance can create a false sense of performance.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Create imbalanced dataset (90% class 0, 10% class 1)
X, y = make_classification(n_samples=1000, n_features=4, weights=[0.9, 0.1], random_state=42)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")

The high accuracy score is deceptive. The model appears effective, but it's mostly just predicting the majority class. The following code shows how to adjust the training process for a more reliable outcome.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Create imbalanced dataset (90% class 0, 10% class 1)
X, y = make_classification(n_samples=1000, n_features=4, weights=[0.9, 0.1], random_state=42)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

model = RandomForestClassifier(class_weight='balanced')
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"F1 score: {f1_score(y_test, y_pred):.4f}")

The fix is to use the class_weight='balanced' parameter. This tells the classifier to give more importance to the minority class during training, effectively penalizing mistakes on it more heavily. As a result, the model learns to identify both classes better, not just the dominant one. The example switches to f1_score for evaluation because it's a more balanced measure of performance than accuracy when you're dealing with imbalanced data.
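If you're curious what weights 'balanced' actually produces, scikit-learn exposes the computation directly. A quick sketch with a 90/10 split:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# 90/10 imbalance: 900 zeros, 100 ones
y = np.array([0] * 900 + [1] * 100)

# 'balanced' weighs each class by n_samples / (n_classes * class_count)
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y)
print(weights)  # class 0: 1000/(2*900) ≈ 0.556, class 1: 1000/(2*100) = 5.0
```

Errors on the minority class are thus weighted 9x more heavily, which is what pushes the model to stop ignoring it.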

Real-world applications

With these training and debugging skills, you can build applications that solve real-world problems, from analyzing customer sentiment to optimizing model performance.

Analyzing sentiment in customer reviews with scikit-learn

With scikit-learn, you can build a simple Pipeline to automatically classify customer reviews by positive or negative sentiment.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Sample customer reviews
reviews = [
   "The product exceeded my expectations, great quality!",
   "Works exactly as described, very happy with purchase",
   "Disappointing quality, broke after first use",
   "Not worth the money, poor customer service too"
]
sentiments = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# Create a text classification pipeline
sentiment_model = Pipeline([
   ('vectorizer', CountVectorizer()),
   ('classifier', MultinomialNB())
])

sentiment_model.fit(reviews, sentiments)
new_reviews = ["Very satisfied with this product", "Complete waste of money"]
predictions = sentiment_model.predict(new_reviews)
for review, sentiment in zip(new_reviews, predictions):
   print(f"Review: {review}\nSentiment: {'Positive' if sentiment == 1 else 'Negative'}")

This code builds a text classification model by chaining two steps together in a scikit-learn Pipeline. This approach automates the entire process, making your code cleaner and less error-prone.

  • The CountVectorizer first converts the raw text of customer reviews into numerical data based on word counts.
  • Then, a MultinomialNB classifier is trained on this data to recognize patterns associated with positive and negative sentiment.

After training with fit(), the model can instantly predict() the sentiment of new reviews.

Optimizing models with GridSearchCV hyperparameter tuning

Instead of manually tweaking your model’s settings, you can use GridSearchCV to automatically search through a grid of hyperparameters and find the combination that performs best.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

# Generate sample classification data
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)

# Define the model and parameter grid
model = RandomForestClassifier(random_state=42)
param_grid = {
   'n_estimators': [50, 100],
   'max_depth': [None, 10, 20],
   'min_samples_split': [2, 5]
}

# Perform grid search
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X, y)

# Print best parameters and score
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_:.4f}")

This code demonstrates how GridSearchCV methodically finds the best hyperparameters for a model. You define a param_grid dictionary containing different values to test for settings like n_estimators and max_depth.

  • GridSearchCV then trains and evaluates a model for every possible combination of these parameters.
  • The cv=5 argument ensures robust evaluation by using 5-fold cross-validation, which helps prevent the model from simply memorizing the training data.

The result is the set of parameters that yielded the best performance during the search.
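When the grid gets large, trying every combination becomes expensive. RandomizedSearchCV samples a fixed number of combinations instead; a sketch of the swap (the parameter values here are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=5, random_state=42)

param_distributions = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10],
}

# n_iter caps how many random combinations are evaluated (here 5 of the 36 possible)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions, n_iter=5, cv=3, random_state=42,
)
search.fit(X, y)
print(search.best_params_)
```

For continuous hyperparameters, you can also pass scipy.stats distributions instead of lists, letting the search sample values rather than pick from a fixed set.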

Get started with Replit

Turn your knowledge into a real tool with Replit Agent. Describe what you want to build, like “create a sentiment analysis tool for customer reviews” or “build a dashboard that uses EarlyStopping to train a model.”

The agent writes the code, tests for errors, and deploys your application from your description. Start building with Replit.

Get started free

Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.
