Natural Node.js NLP Library - Deployment & Usage Guide
Prerequisites
- Node.js: Version 12.x or higher (LTS recommended)
- npm: Version 6.x or higher (or Yarn 1.22+)
- Git: Required only for development/contributing
- WordNet Database (optional): Required only for WordNet lexical lookups. Download from Princeton WordNet
Installation
As a Project Dependency
npm install natural
For TypeScript support (types included):
npm install --save-dev @types/natural # if needed, though the package includes TypeScript support
Development Setup (Contributors)
git clone https://github.com/NaturalNode/natural.git
cd natural
npm install
Configuration
WordNet Configuration (Optional)
If using WordNet features, set the path to the WordNet database files:
const natural = require('natural');
const wordnet = new natural.WordNet('/path/to/wordnet/dict');
Or use environment variables:
export WORDNET_PATH=/usr/local/share/wordnet/dict
TypeScript Configuration
Add to your tsconfig.json (if strict mode is enabled):
{
"compilerOptions": {
"esModuleInterop": true,
"moduleResolution": "node"
}
}
Build & Run
Running Tests
# Run full test suite
npm test
# Run with coverage
npm run coverage
Linting
The project uses JavaScript Standard Style:
npm run lint
# or
npx standard
Building (if modifying source)
# Compile TypeScript definitions (if applicable)
npm run build
# Clean build artifacts
npm run clean
Usage Examples
Tokenization
const natural = require('natural');
// Word tokenizer
const tokenizer = new natural.WordTokenizer();
console.log(tokenizer.tokenize("your text here"));
// Japanese tokenizer
const jaTokenizer = new natural.TokenizerJa();
console.log(jaTokenizer.tokenize("日本語のテキスト"));
// Treebank tokenizer
const treebankTokenizer = new natural.TreebankWordTokenizer();
Stemming
// Porter stemmer (English)
const stemmer = natural.PorterStemmer;
console.log(stemmer.stem("words")); // 'word'
// Indonesian stemmer
const IndonesianStemmer = natural.StemmerId;
const stemmerId = new IndonesianStemmer();
console.log(stemmerId.stem("memperkenankan"));
// German stemmer
const germanStemmer = natural.StemmerDe;
Part-of-Speech Tagging
const BrillPOS = require('natural/lib/natural/brill_pos_tagger');
// Load lexicon and rules
const lexicon = new BrillPOS.Lexicon('EN', 'N');
const rules = new BrillPOS.RuleSet('EN');
const tagger = new BrillPOS.BrillPOSTagger(lexicon, rules);
const sentence = ["I", "saw", "the", "man", "with", "the", "telescope"];
console.log(tagger.tag(sentence));
Japanese Transliteration
const transliterator = require('natural/lib/natural/transliterators/ja');
// Convert Katakana/Hiragana to Hepburn romanization
const result = transliterator('カタカナ'); // 'katakana'
WordNet Lookup
const wordnet = new natural.WordNet();
wordnet.lookup('node', function(results) {
results.forEach(function(result) {
console.log(result.synsetOffset + ': ' + result.gloss);
});
});
Deployment
Production Usage in Applications
-
Install as dependency:
npm install --production natural -
Memory Management: For high-throughput applications, initialize heavy objects (tokenizers, classifiers) once and reuse:
// Initialize once const tokenizer = new natural.WordTokenizer(); // Use in request handler app.post('/analyze', (req, res) => { const tokens = tokenizer.tokenize(req.body.text); // ... }); -
Bundling: For browser usage, use Webpack or Rollup with Node.js polyfills:
// webpack.config.js resolve: { fallback: { "path": require.resolve("path-browserify"), "fs": false } }
Publishing to npm (Maintainers)
# Version bump
npm version patch|minor|major
# Publish
npm publish
# Verify
npm view natural
Platform Recommendations
- Serverless (AWS Lambda/GCP Functions): Package includes all necessary files; ensure
WORDNET_PATHis set if using WordNet - Docker: Use official Node.js images:
FROM node:18-alpine COPY package*.json ./ RUN npm ci --only=production COPY . . CMD ["node", "index.js"] - Heroku/Railway: Standard Node.js buildpack works; add
naturalto dependencies
Troubleshooting
"Module not found" errors for specific languages
Issue: Error when requiring specific stemmers/tokenizers
Solution: Use full path or ensure proper installation:
// Instead of relying on main export
const IndonesianStemmer = require('natural/lib/natural/stemmers/indonesian');
WordNet "ENOENT" errors
Issue: Error: ENOENT: no such file or directory when using WordNet
Fix: Download WordNet 3.0 and set correct path:
wget http://wordnetcode.princeton.edu/wn3.1.dict.tar.gz
tar -xzf wn3.1.dict.tar.gz
export WORDNET_PATH=$(pwd)/dict
Memory issues with large corpora
Issue: Heap out of memory when processing large texts
Solution: Increase Node.js memory limit:
node --max-old-space-size=4096 app.js
TypeScript type errors
Issue: Cannot find module 'natural' or type errors
Fix: Ensure esModuleInterop is enabled in tsconfig.json or use:
import * as natural from 'natural';
Performance degradation
Issue: Slow processing on first run
Cause: Some tokenizers (like Japanese) load dictionaries on first use
Solution: Warm up on application startup:
// In app initialization
new natural.TokenizerJa().tokenize('warmup');