
XGBoost Deployment and Usage Guide

Prerequisites

Before installing XGBoost, ensure you have the following:

  • C++ Compiler: A C++ compiler that supports C++11 (GCC 4.8.2 or later, Clang 3.3 or later, or Visual Studio 2015 or later)
  • Python (for Python interface): Python 3.6+ with pip
  • R (for R interface): R 3.2+
  • Java (for JVM interfaces): Java 8+
  • CMake: Version 3.11+ for building from source
  • Git: For cloning the repository

Installation

Option 1: Install from Package Managers (Recommended)

Python

pip install xgboost

R

install.packages("xgboost")

Conda

conda install py-xgboost -c conda-forge

Java/Scala

<!-- Add to your Maven pom.xml -->
<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j</artifactId>
  <version>1.7.5</version>
</dependency>

Option 2: Build from Source

# Clone the repository
git clone --recursive https://github.com/dmlc/xgboost.git
cd xgboost

# Create build directory and configure
mkdir build && cd build
cmake ..

# Build
cmake --build . --config Release

# Install
sudo cmake --build . --config Release --target install

Configuration

Environment Variables

  • XGBOOST_BUILD_CONFIG: Custom build configuration (optional)
  • JAVA_HOME: Required for JVM interfaces (set to your JDK path)

Configuration Files

XGBoost uses parameter dictionaries for model configuration. Example parameters:

params = {
    'objective': 'reg:squarederror',
    'max_depth': 6,
    'learning_rate': 0.1
}
# Note: n_estimators belongs to the scikit-learn wrapper API;
# with xgb.train, pass num_boost_round as a separate argument instead.

Build & Run

Local Development

Python Example

import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create DMatrix
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Train model (iris has 3 classes, so use a multi-class objective)
params = {'objective': 'multi:softmax', 'num_class': 3, 'max_depth': 4}
model = xgb.train(params, dtrain, num_boost_round=10)

# Make predictions
predictions = model.predict(dtest)

R Example

library(xgboost)

# Load data
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

# Create DMatrix
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)

# Train model
model <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 2),
  data = dtrain,
  nrounds = 2
)

# Make predictions
pred <- predict(model, dtest)

Production

For production deployments, consider:

  • Model Serialization: Use model.save_model('model.json') in Python or xgb.save(model, 'model.json') in R
  • Cross-Platform Deployment: XGBoost models can be deployed across different platforms using the same model file

Deployment

Cloud Platforms

AWS

  • SageMaker: XGBoost is natively supported
  • Lambda: Package with your application code
  • EC2: Install via package manager or build from source

Google Cloud

  • AI Platform: Native XGBoost support
  • Cloud Functions: Include XGBoost in deployment package
  • Compute Engine: Standard installation

Azure

  • Machine Learning Service: XGBoost integration
  • Functions: Package with application

Distributed Environments

Spark/PySpark

from xgboost.spark import SparkXGBClassifier

# Configure the distributed Spark estimator (available in xgboost >= 1.7;
# expects a vector "features" column, e.g. produced by VectorAssembler)
xgb_clf = SparkXGBClassifier(
    features_col="features",
    label_col="label",
    max_depth=5,
    n_estimators=100
)

Dask

import xgboost as xgb
import dask.dataframe as dd
from dask.distributed import Client

client = Client()  # connects to a Dask cluster (local by default)

# Load data with Dask and split features/target
df = dd.read_csv('data.csv')
X, y = df.drop('target', axis=1), df['target']

# Train through XGBoost's Dask interface
dtrain = xgb.dask.DaskDMatrix(client, X, y)
output = xgb.dask.train(client, {'objective': 'reg:squarederror'}, dtrain, num_boost_round=10)
booster = output['booster']

Troubleshooting

Common Issues

  1. Build Failures

    • Ensure C++11 support: g++ --version should be 4.8.2+
    • Check CMake version: cmake --version should be 3.11+
    • For macOS: Install OpenMP with brew install libomp
  2. Python Import Errors

    # Reinstall with correct Python version
    pip install --force-reinstall xgboost
    
  3. Memory Issues

    • Increase system memory for large datasets
    • Use tree_method='hist' or tree_method='approx' for memory efficiency
  4. Version Compatibility

    • Check Python/R version compatibility in documentation
    • Use compatible versions: Python 3.6+, R 3.2+
  5. Distributed Setup Issues

    • Ensure consistent XGBoost versions across cluster nodes
    • Check network connectivity between nodes
    • Verify Java version for JVM interfaces
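To catch version mismatches before a distributed job fails mid-run, a small helper like this (hypothetical, not part of XGBoost) can assert a minimum installed version on each node:

```python
from importlib.metadata import version, PackageNotFoundError

def check_min_version(pkg, minimum):
    """Return True if `pkg` is installed at version >= `minimum` (naive numeric compare)."""
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        return False
    # Compare up to the first three numeric components, e.g. "1.7.5" -> (1, 7, 5)
    as_tuple = lambda v: tuple(int(p) for p in v.split('.')[:3] if p.isdigit())
    return as_tuple(installed) >= as_tuple(minimum)

# Example: fail fast if xgboost is missing or too old on this node
print(check_min_version('xgboost', '1.7'))
```

Running this at worker startup surfaces inconsistent environments immediately rather than as an opaque serialization error.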

Performance Optimization

# Use GPU acceleration if available
params = {
    'tree_method': 'gpu_hist',  # XGBoost 1.x; in 2.0+ use tree_method='hist' with device='cuda'
    'max_depth': 6,
    'learning_rate': 0.1
}

Getting Help