FastText Deployment and Usage Guide
Prerequisites
- Compiler: C++11 compatible compiler (g++-4.7.2 or newer, or clang-3.3 or newer)
- Build Tools: GNU Make (for make-based build) or CMake 2.8.9 or newer (for cmake-based build)
- Operating System: Modern Mac OS or Linux distributions (tested on Debian jessie and newer)
- Python Dependencies (for Python bindings):
  - Python 2.7 or >=3.4
  - NumPy & SciPy
  - pybind11
Installation
Option 1: Using Make (Recommended)
# Download the latest stable release
wget https://github.com/facebookresearch/fastText/archive/v0.9.2.zip
# Extract the archive
unzip v0.9.2.zip
# Navigate to the extracted directory
cd fastText-0.9.2
# Build the project
make
Option 2: Using CMake
# Clone the repository (master branch for latest features)
git clone https://github.com/facebookresearch/fastText.git
# Navigate to the project directory
cd fastText
# Create a build directory and navigate to it
mkdir build && cd build
# Configure with CMake
cmake ..
# Build and install
make && make install
Option 3: Python Bindings
# Clone the repository (master branch)
git clone https://github.com/facebookresearch/fastText.git
# Navigate to the project directory
cd fastText
# Install Python package
pip install .
Configuration
- Compiler Selection: To use a compiler other than the default system-wide compiler, update the CC and INCLUDES macros at the beginning of the Makefile.
- Word Vector Parameters: fastText uses character n-grams from 3 to 6 characters by default. You can change this range with the -minn and -maxn options when running the training commands.
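To make the n-gram range concrete, here is a minimal Python sketch of how character n-grams of length 3 to 6 can be enumerated for a single word. It assumes the word is wrapped in < and > boundary markers before slicing, as described for fastText's subword model. The function name char_ngrams is hypothetical and is not part of the fastText API; the real implementation works in C++ and hashes the n-grams into buckets.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of `word` for lengths minn..maxn.

    Illustrative only: the word is wrapped in '<' and '>' boundary
    markers first, which is an assumption about the subword scheme.
    """
    wrapped = "<" + word + ">"
    ngrams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n over the wrapped word.
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams


# Restricting the range to exactly 3 characters keeps the output short.
print(char_ngrams("where", minn=3, maxn=3))
# → ['<wh', 'whe', 'her', 'ere', 're>']
```

Widening the range increases the number of n-grams per word, which is one reason training time and model size react to these two parameters.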
Build & Run
Local Development
After building with make, you can run fastText commands directly:
# Word representation learning (skipgram model)
./fasttext skipgram -input data.txt -output model
# Word representation learning (cbow model)
./fasttext cbow -input data.txt -output model
# Text classification
./fasttext supervised -input train.txt -output model
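The supervised command reads one example per line, with each label marked by a prefix. The Python sketch below shows one way to produce such lines, assuming the default __label__ prefix. The helper name to_fasttext_line and the sample data are hypothetical.

```python
def to_fasttext_line(labels, text, prefix="__label__"):
    """Format one training example as a single line for supervised training.

    Assumes the default label prefix; whitespace in the text (including
    newlines) is collapsed so each example stays on one line.
    """
    label_part = " ".join(prefix + label for label in labels)
    clean_text = " ".join(text.split())
    return label_part + " " + clean_text


# Hypothetical sample data: (labels, text) pairs.
examples = [
    (["positive"], "Great library,\nvery fast"),
    (["negative", "bug"], "Crashes on start"),
]
lines = [to_fasttext_line(labels, text) for labels, text in examples]
print("\n".join(lines))
```

Writing these lines to train.txt, encoded as UTF-8, gives a file in the shape the supervised command above expects.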
Production Usage
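For production deployments, consider the CMake build, which builds the fastText libraries in addition to the command-line binary. Training commands are unchanged; the example below sets the number of epochs and the learning rate explicitly: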
# After cmake build
./fasttext supervised -input train.txt -output model -epoch 25 -lr 1.0
Deployment
FastText is a C++ library that can be deployed in various environments:
Web Applications
- Docker: Containerize your application with the fastText binary and required dependencies
- Cloud Functions: Deploy as serverless functions (AWS Lambda, Google Cloud Functions) with the binary included
API Services
- Flask/FastAPI: Create REST APIs wrapping fastText functionality
- Node.js: Use child processes to call the fastText binary from Node applications
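The child-process approach works the same way from Python, which can be convenient inside a Flask or FastAPI handler. The sketch below is a minimal example under some assumptions: the binary is at ./fasttext, a trained model exists at model.bin, and the predict-prob command reads input from standard input when the test file is given as "-". The function names build_predict_command and predict are hypothetical.

```python
import subprocess


def build_predict_command(binary, model_path, k=1):
    """Build the argument list for a predict-prob call that reads stdin."""
    # "-" in place of the test file asks the binary to read from stdin
    # (an assumption to verify against your fastText version).
    return [binary, "predict-prob", model_path, "-", str(k)]


def predict(binary, model_path, texts, k=1):
    """Run the fastText binary once and return one output line per input text."""
    command = build_predict_command(binary, model_path, k)
    # One example per line; collapse internal whitespace and newlines.
    stdin_data = "\n".join(" ".join(t.split()) for t in texts) + "\n"
    result = subprocess.run(
        command,
        input=stdin_data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.splitlines()


# Only the command is built here; predict() needs a real binary and model.
print(build_predict_command("./fasttext", "model.bin", k=2))
```

Spawning a process per request reloads the model each time, so batching several texts into one call, as predict() does, usually reduces overhead.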
Machine Learning Pipelines
- Apache Spark: Integrate fastText as a transformer in ML pipelines
- Kubeflow: Deploy as part of MLOps workflows
Troubleshooting
Common Issues and Solutions
1. Compilation Errors
Issue: "error: 'to_string' is not a member of 'std'"
Solution: Your compiler doesn't support C++11. Upgrade to g++-4.7.2 or newer, or clang-3.3 or newer.
2. Missing Dependencies
Issue: "make: *** No rule to make target 'fasttext', needed by 'all'. Stop."
Solution: Ensure you have GNU Make installed and run make from the root of the extracted source tree (the directory that contains the Makefile).
3. Python Bindings Not Working
Issue: "ImportError: No module named 'fasttext'"
Solution: Verify that you ran pip install . from the root of the cloned repository, using the same Python environment you are importing from, and that this environment has the required dependencies (NumPy, SciPy, pybind11).
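A quick way to check the environment is to ask Python which of the required modules it can locate. The sketch below uses only the standard library; the helper name missing_modules is hypothetical, and the module names are the usual import names for the dependencies listed above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed for the dependencies and the bindings themselves.
required = ["numpy", "scipy", "pybind11", "fasttext"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Run it with the same interpreter that raises the ImportError; a different result between two interpreters usually points to the package being installed into another environment.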
4. Memory Issues During Training
Issue: "std::bad_alloc" or the system running out of memory
Solution: Reduce the training data size, decrease the number of dimensions, or use a machine with more RAM.
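To see why lowering the number of dimensions helps, a rough estimate of the model's matrix sizes can be useful. The sketch below assumes the dominant cost is two dense matrices of 4-byte floats: an input matrix with one row per vocabulary word plus one per n-gram bucket, and an output matrix with one row per word (or per label for classification). That layout and the default of 2,000,000 buckets are assumptions to check against your fastText version, and the function name is hypothetical. It ignores the dictionary and any training buffers.

```python
def estimate_matrix_bytes(vocab_size, dim, bucket=2_000_000,
                          output_rows=None, bytes_per_value=4):
    """Rough lower bound on memory used by the input and output matrices.

    Assumed layout (not confirmed by this guide):
      input matrix:  (vocab_size + bucket) x dim
      output matrix: output_rows x dim (defaults to vocab_size)
    """
    if output_rows is None:
        output_rows = vocab_size
    input_values = (vocab_size + bucket) * dim
    output_values = output_rows * dim
    return (input_values + output_values) * bytes_per_value


# Hypothetical vocabulary of one million words at two vector sizes.
for dim in (300, 100):
    size = estimate_matrix_bytes(vocab_size=1_000_000, dim=dim)
    print(f"dim={dim}: about {size / 1e9:.1f} GB")
```

Under these assumptions the estimate scales linearly with the dimension, so going from 300 to 100 dimensions cuts the matrix memory to a third.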
5. Text Encoding Problems
Issue: Characters not displaying correctly or training failing
Solution: Ensure your input text files are UTF-8 encoded, as fastText expects UTF-8 input.
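Before training, it can help to locate the first line that is not valid UTF-8. The sketch below is a minimal standard-library check; the function name first_invalid_utf8_line is hypothetical, and the temporary file stands in for your own training file.

```python
import os
import tempfile


def first_invalid_utf8_line(path):
    """Return the 1-based number of the first line that is not valid UTF-8,
    or None if the whole file decodes cleanly."""
    with open(path, "rb") as handle:
        # Reading bytes line by line is safe: the newline byte never
        # appears inside a multi-byte UTF-8 sequence.
        for line_number, raw_line in enumerate(handle, start=1):
            try:
                raw_line.decode("utf-8")
            except UnicodeDecodeError:
                return line_number
    return None


# Demonstration on a temporary file whose second line holds invalid bytes.
with tempfile.NamedTemporaryFile(delete=False) as sample:
    sample.write("première ligne\n".encode("utf-8"))
    sample.write(b"\xff\xfe broken line\n")
    sample_path = sample.name

print(first_invalid_utf8_line(sample_path))  # → 2
os.remove(sample_path)
```

Once the offending line is known, re-encode the file from its actual source encoding rather than dropping bytes, so no training text is silently lost.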
6. CMake Configuration Issues
Issue: CMake configuration fails
Solution: Ensure you have CMake 2.8.9 or newer installed, and that you're running cmake from within a build directory.
For additional help, visit the fastText FAQ or join the fastText community.