# Python Machine Learning (1st Edition) - Setup and Usage Guide
Comprehensive guide for running the code examples from the "Python Machine Learning" book by Sebastian Raschka.
## 1. Prerequisites
**Required:**
- Python 3.6+ (Python 2.7 supported but deprecated)
- Git
- pip or conda package manager
**Core Dependencies:**
- NumPy ≥1.9.1
- SciPy ≥0.14
- scikit-learn ≥0.15 (0.18+ recommended)
- matplotlib ≥1.4.0
- pandas ≥0.16
- Jupyter Notebook or JupyterLab
**Optional (for specific chapters):**
- Theano ≥0.7 (Chapter 13 - Neural Networks) *Note: Theano development ceased in 2017*
- Flask ≥0.10.1 (Chapter 9 - Web Application)
- PyYAML (Chapter 9)
## 2. Installation
### Clone Repository
```bash
git clone https://github.com/rasbt/python-machine-learning-book.git
cd python-machine-learning-book
```
### Option A: Using pip (Recommended)
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
# Install dependencies
pip install numpy scipy scikit-learn matplotlib pandas jupyter
pip install flask pyyaml  # For Chapter 9
pip install theano  # Optional, for Chapter 13
```
### Option B: Using Conda
```bash
conda create -n pyml python=3.8
conda activate pyml
conda install numpy scipy scikit-learn matplotlib pandas jupyter flask pyyaml
conda install theano  # Optional
```
### Verify Installation
```bash
python -c "import sklearn; print(sklearn.__version__)"
jupyter --version
```
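
To check every core dependency against the minimum versions listed under Prerequisites in one pass, a small script like the following may help. It is a sketch: `check_versions` and `version_tuple` are hypothetical helpers, the dictionary uses import names (`sklearn`, not `scikit-learn`), and the simple parser only compares the first three numeric components, ignoring pre-release tags such as `rc1`.

```python
import importlib
import re

# Minimum versions from the Core Dependencies list (import names, not pip names)
MIN_VERSIONS = {
    "numpy": "1.9.1",
    "scipy": "0.14",
    "sklearn": "0.15",
    "matplotlib": "1.4.0",
    "pandas": "0.16",
}


def version_tuple(version):
    """Convert '1.24.3' or '0.18rc2' into a 3-tuple of leading integers."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # leading digits only, e.g. '18rc2' -> 18
        parts.append(int(match.group()) if match else 0)
    return tuple(parts + [0] * (3 - len(parts)))  # pad so '0.14' == '0.14.0'


def check_versions(min_versions=MIN_VERSIONS):
    """Map each module name to (installed version or None, meets minimum?)."""
    report = {}
    for name, minimum in min_versions.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = (None, False)  # not installed in this environment
            continue
        installed = getattr(module, "__version__", "0")
        report[name] = (installed, version_tuple(installed) >= version_tuple(minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "OK" if ok else "MISSING OR TOO OLD"
        print(f"{name:<12} {installed or '-':<12} {status}")
```

Run it with the same interpreter (or Jupyter kernel) you use for the notebooks, so it inspects the right environment.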
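## 3. Configuration
### Theano Configuration (Chapter 13 only)
Create `~/.theanorc` to set Theano's default device and floating-point precision. The following runs on the CPU with 32-bit floats; change `device` to target a GPU:
```ini
[global]
device = cpu
floatX = float32
```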
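
If you prefer to generate this file from Python (for example, from a setup notebook), here is a minimal sketch using the standard-library `configparser`. The helper name `write_theanorc` and its `overwrite` guard are illustrative assumptions, not part of Theano.

```python
import configparser
import os


def write_theanorc(path=None, device="cpu", floatx="float32", overwrite=False):
    """Write a minimal Theano config file with a [global] section.

    `write_theanorc` is a hypothetical helper, not part of Theano.
    """
    path = path or os.path.expanduser("~/.theanorc")
    if os.path.exists(path) and not overwrite:
        # Refuse to clobber an existing configuration by accident
        raise FileExistsError(f"{path} already exists; pass overwrite=True to replace it")

    config = configparser.ConfigParser()
    config.optionxform = str  # keep "floatX" capitalized instead of lowercasing it
    config["global"] = {"device": device, "floatX": floatx}
    with open(path, "w") as f:
        config.write(f)
    return path
```

Calling `write_theanorc()` with no arguments writes `~/.theanorc` with the same `[global]` settings shown above.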
### Data Paths
Datasets are downloaded automatically by most notebooks. For Chapter 9 (Web Application), make sure you have write permission for the `code/ch09/` directory, where the app stores its SQLite database.
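
Downloading on every run fails when you are offline. One way to handle this, an assumption rather than how the notebooks themselves work, is to cache each file locally and download it only when the cached copy is missing. `fetch_cached` is a hypothetical helper built on the standard library.

```python
import os
import urllib.request


def fetch_cached(url, dest):
    """Download `url` to `dest` unless `dest` already exists; return the local path.

    `fetch_cached` is a hypothetical helper for illustration.
    """
    if not os.path.exists(dest):
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        tmp = dest + ".part"
        urllib.request.urlretrieve(url, tmp)  # download to a temporary name first
        os.replace(tmp, dest)  # so an interrupted download never leaves a partial cache file
    return dest
```

In a notebook, you would pass the URL the notebook already reads from and a local path of your choice, e.g. `pd.read_csv(fetch_cached(data_url, "data/dataset.csv"))`, where `data_url` and the path are placeholders.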
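### Environment Variables (Optional)
Run this from the repository root so that `$(pwd)/code` points at the right directory:
```bash
export PYTHONPATH="${PYTHONPATH}:$(pwd)/code"
```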
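
The `export` only affects processes started from that shell. Inside an already-running notebook, a comparable effect can be had by extending `sys.path`. This is a minimal sketch: `add_code_dir` is a hypothetical helper, and `repo_root` is assumed to point at your clone of the repository.

```python
import os
import sys


def add_code_dir(repo_root):
    """Append <repo_root>/code to sys.path once and return the absolute path.

    Affects the current interpreter only. `add_code_dir` is a hypothetical helper.
    """
    code_dir = os.path.abspath(os.path.join(repo_root, "code"))
    if code_dir not in sys.path:
        sys.path.append(code_dir)  # searched after entries already on sys.path
    return code_dir
```

From a notebook in `code/chXX/`, the repository root is two levels up, so `add_code_dir("../..")` would add the `code` directory.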
## 4. Build & Run
### Launch Jupyter Notebook
```bash
jupyter notebook
```
Navigate to the `code/chXX/` directories and open the `.ipynb` files.
### Alternative: JupyterLab
```bash
jupyter lab
```
### Running Individual Scripts
Some chapters contain standalone `.py` files:
```bash
cd code/ch09
python app.py  # Starts the movie review classifier web service
```
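### Chapter 9 Web Application Deployment
The Chapter 9 example includes a Flask application, which can also be started with the `flask` command-line interface (available since Flask 0.11):
```bash
cd code/ch09
export FLASK_APP=app.py
export FLASK_DEBUG=1  # debugger and auto-reload; FLASK_ENV was removed in Flask 2.3
flask run --host=0.0.0.0 --port=5000
```
Then open http://localhost:5000 in your browser. Passing `--host=0.0.0.0` also makes the server reachable from other machines on your network; omit it to listen on localhost only.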
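
Once the server is running, a quick smoke test from another terminal or a notebook can confirm that it responds. This sketch uses only the standard library; `check_server` is a hypothetical helper, and the default URL assumes the host and port used above.

```python
import urllib.error
import urllib.request


def check_server(url="http://localhost:5000", timeout=5.0):
    """Return the HTTP status code served at `url`, or None if nothing answers.

    `check_server` is a hypothetical helper for illustration.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server is up but answered with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timed out, or host unreachable
```

`print(check_server())` prints `200` if the index page was served, an error code if the app is up but failing, and `None` if the server is not running.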
## 5. Deployment Options
### Static Notebook Viewing (No Installation)
View rendered notebooks via NbViewer (read-only):
- Chapter 1: http://nbviewer.ipython.org/github/rasbt/python-machine-learning-book/blob/master/code/ch01/ch01.ipynb
- Replace `ch01` with `ch02` through `ch13` for subsequent chapters
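### Production Deployment (Chapter 9 Web App)
For deploying the sentiment analysis web application:

**Heroku:**
```bash
cd code/ch09
pip install gunicorn
echo "web: gunicorn app:app" > Procfile
pip freeze > requirements.txt
git add Procfile requirements.txt && git commit -m "Add Heroku deployment files"
heroku create your-ml-app
# Heroku builds from the repository root, so push only the ch09 subdirectory
cd ../.. && git subtree push --prefix code/ch09 heroku master
```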
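**Docker:**
```dockerfile
# Build and run from the repository root:
#   docker build -t pyml-ch09 . && docker run -p 5000:5000 pyml-ch09
FROM python:3.8-slim
WORKDIR /app
# Assumes code/ch09/requirements.txt exists (e.g. created with pip freeze as above)
COPY code/ch09/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY code/ch09/ .
ENV FLASK_APP=app.py
EXPOSE 5000
# Bind to 0.0.0.0; the default 127.0.0.1 is unreachable from outside the container
CMD ["flask", "run", "--host=0.0.0.0", "--port=5000"]
```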
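### Model Serialization
Export trained models for production use:
```python
import pickle
# `model` is any fitted estimator from a notebook cell (e.g. a trained classifier)
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
# Reload later with the same scikit-learn version; only unpickle trusted files
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)
```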
## 6. Troubleshooting
### Theano Issues (Chapter 13)
**Problem:** `ImportError: No module named theano` or GPU errors

**Solution:** Theano is no longer maintained. Install it explicitly if it is missing (`pip install theano`), run it in CPU mode only, or migrate to TensorFlow/PyTorch for production. To force CPU mode from a notebook:
```python
import os
# THEANO_FLAGS is read when theano is imported, so set it first
os.environ["THEANO_FLAGS"] = "device=cpu,floatX=float32"
import theano
```
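### Scikit-learn API Changes
**Problem:** `ModuleNotFoundError: No module named 'sklearn.cross_validation'` (or `AttributeError: 'module' object has no attribute 'cross_validation'`)

**Solution:** The book uses older scikit-learn APIs. The `sklearn.cross_validation`, `sklearn.grid_search`, and `sklearn.learning_curve` modules were deprecated in 0.18 and removed in 0.20; their contents now live in `sklearn.model_selection`. Update imports:
```python
# Old (book code)
# from sklearn.cross_validation import train_test_split
# New (modern sklearn)
from sklearn.model_selection import train_test_split
```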
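
Because the book's notebooks and scripts reference these modules in many places, a mechanical rewrite of the import paths can save time. The sketch below is a hypothetical helper (`modernize_imports`), not a scikit-learn tool. It only fixes module paths: some classes also changed their interface (for example, `StratifiedKFold` and `KFold` no longer take the labels or sample count in their constructor and instead provide a `split(X, y)` method), so review the rewritten code.

```python
import re

# Modules deprecated in scikit-learn 0.18 and removed in 0.20
RENAMED_MODULES = {
    "sklearn.cross_validation": "sklearn.model_selection",
    "sklearn.grid_search": "sklearn.model_selection",
    "sklearn.learning_curve": "sklearn.model_selection",
}

# Match any old module path as a whole word, e.g. in "from sklearn.grid_search import ..."
_OLD_MODULE = re.compile(r"\b(" + "|".join(re.escape(m) for m in RENAMED_MODULES) + r")\b")


def modernize_imports(source):
    """Rewrite old scikit-learn module paths in `source` to their new location."""
    return _OLD_MODULE.sub(lambda m: RENAMED_MODULES[m.group(1)], source)
```

Applied to the raw text of a `.py` or `.ipynb` file (read with `open(path).read()`), it rewrites only the matching module paths and leaves everything else untouched; write the result back after reviewing the diff.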
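### Python 2 vs 3 Compatibility
**Problem:** `SyntaxError` on `print` statements

**Solution:** The failing code uses Python 2 `print` statements, which are a syntax error in Python 3. Update `print "text"` to `print("text")`, or run the code in a Python 2.7 environment.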
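
To find which files still contain Python 2 syntax, you can try compiling each one under Python 3. A minimal sketch with hypothetical helpers `first_syntax_error` and `scan_directory`: `compile()` stops at the first error in a file, so fix it and rerun until nothing is reported. It is meant for `.py` files; notebook cells containing IPython magics such as `%matplotlib inline` are not plain Python and would also be reported.

```python
import pathlib


def first_syntax_error(source, filename="<source>"):
    """Return 'filename:line: message' for the first Python 3 syntax error, or None."""
    try:
        compile(source, filename, "exec")  # parse and compile only; nothing is executed
    except SyntaxError as err:
        return f"{filename}:{err.lineno}: {err.msg}"
    return None


def scan_directory(root):
    """Yield the first syntax error of every .py file below `root`."""
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        error = first_syntax_error(path.read_text(encoding="utf-8"), str(path))
        if error:
            yield error
```

For example, `for problem in scan_directory("code"): print(problem)` lists one offending line per file.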
### Missing Data Files
**Problem:** `FileNotFoundError` for CSV/datasets

**Solution:** Run notebooks from their respective chapter directories so that relative data paths resolve:
```bash
cd code/ch08
jupyter notebook ch08.ipynb
```
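
If a notebook still cannot find a file, printing the working directory and searching the repository for the expected file name usually reveals the mismatch. A minimal sketch; `find_data_file` is a hypothetical helper, and the search root you pass is an assumption to adjust for your checkout.

```python
import os
import pathlib


def find_data_file(name, search_root="."):
    """Return every path below `search_root` whose file name equals `name`."""
    return sorted(str(p) for p in pathlib.Path(search_root).rglob(name) if p.is_file())


if __name__ == "__main__":
    # Relative paths in a notebook are resolved against this directory
    print("Working directory:", os.getcwd())
```

From a notebook in `code/chXX/`, `find_data_file("movie_data.csv", "../..")` would search the whole repository for that (example) file name; an empty list means the file was never created or downloaded.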
### Permission Errors (macOS/Linux)
**Problem:** Cannot write model pickles or database files

**Solution:** Make the chapter directory writable by its owner:
```bash
chmod 755 code/ch09/
```
### Jupyter Kernel Issues
**Problem:** Module not found in notebook but available in terminal

**Solution:** Register a kernel for the correct environment. With that environment (e.g. `pyml`) activated, run:
```bash
pip install ipykernel
python -m ipykernel install --user --name=pyml --display-name "Python ML"
# Then select Kernel > Change kernel > Python ML in Jupyter
```
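
To confirm which interpreter a notebook kernel is actually using, run the following in a notebook cell and compare the executable path with `which python` (or `where python` on Windows) in the terminal where the import works. `kernel_info` is a hypothetical helper built on the standard `sys` module.

```python
import sys


def kernel_info():
    """Describe the interpreter running this code (run it in a notebook cell)."""
    return {
        "executable": sys.executable,  # path of the Python binary behind the kernel
        "version": sys.version.split()[0],
        "prefix": sys.prefix,  # root directory of the active environment
        # A venv sets sys.prefix != sys.base_prefix; conda envs show up via prefix instead
        "in_venv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
    }


for key, value in kernel_info().items():
    print(f"{key:<10} {value}")
```

If the paths differ, the kernel belongs to another environment; install the package there or switch kernels.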
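### Memory Errors (Large Datasets)
**Problem:** Kernel dies processing large arrays

**Solution:** The kernel is limited by available system RAM, so reduce the dataset size in the notebook (e.g., subsample rows or use `float32` instead of `float64` arrays). The following option raises the notebook server's buffer limit (in bytes, 2 GiB here), which matters for large uploads; it does not give the kernel more memory:
```bash
jupyter notebook --NotebookApp.max_buffer_size=2147483648
```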
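
Before loading or converting a large array, it helps to estimate its footprint as number of elements times bytes per element. A minimal sketch with hypothetical helpers `array_nbytes` and `format_mib`; for an existing NumPy array the same number is available as `arr.nbytes`, and `arr.astype(np.float32)` halves it relative to `float64` at reduced precision.

```python
import operator
from functools import reduce


def array_nbytes(shape, itemsize=8):
    """Bytes needed for a dense array of `shape` with `itemsize` bytes per element.

    itemsize=8 corresponds to float64, 4 to float32.
    """
    return reduce(operator.mul, shape, 1) * itemsize


def format_mib(nbytes):
    """Human-readable size in mebibytes (2**20 bytes)."""
    return f"{nbytes / 2**20:.1f} MiB"


# Example: a 60,000 x 784 matrix (e.g., 60,000 flattened 28x28 images)
for dtype, size in (("float64", 8), ("float32", 4)):
    print(dtype, format_mib(array_nbytes((60000, 784), size)))
```

This prints about 358.9 MiB for `float64` and 179.4 MiB for `float32`; intermediate copies made during preprocessing can multiply the peak usage.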