Cheerio Deployment and Usage Guide
Prerequisites
- Node.js (version 16 or higher)
- npm (version 8 or higher)
- TypeScript (for development, if modifying source code)
Installation
Using npm (Recommended)
npm install cheerio
Using yarn
yarn add cheerio
From Source
- Clone the repository:
git clone https://github.com/cheeriojs/cheerio.git
cd cheerio
- Install dependencies:
npm install
- Build the project:
npm run build
Configuration
Cheerio needs no configuration for basic usage. To customize parsing, pass an options object as the second argument to cheerio.load:
const cheerio = require('cheerio');
const $ = cheerio.load('<h2 class="title">Hello world</h2>', {
// Options
xmlMode: false, // Use HTML parsing mode
decodeEntities: true, // Decode HTML entities
lowerCaseTags: false, // Don't convert tags to lowercase
recognizeSelfClosing: true, // Recognize self-closing tags
});
Build & Run
Development
When working on the Cheerio source itself, the repository's npm scripts cover testing, linting, and building:
# Run tests
npm test
# Run linting
npm run lint
# Build TypeScript
npm run build
Basic Usage Example
const cheerio = require('cheerio');
const $ = cheerio.load('<h2 class="title">Hello world</h2>');
// Get text content
console.log($('h2.title').text()); // "Hello world"
// Get HTML content
console.log($('h2').html()); // "Hello world"
// Set attributes
$('h2').attr('id', 'title');
console.log($('h2').attr('id')); // "title"
// Add classes
$('h2').addClass('main-title');
console.log($('h2').hasClass('main-title')); // true
Deployment
Cheerio is a library for server-side HTML parsing and manipulation, so it is deployed as a dependency of the application that uses it rather than on its own. Common scenarios:
For Node.js Applications
- Package your application with Cheerio as a dependency:
npm install cheerio
- Deploy to any Node.js hosting platform:
- Vercel (for serverless functions)
- Heroku (traditional Node.js hosting)
- AWS Lambda (serverless)
- DigitalOcean App Platform
- Railway
For CLI Tools
If building a CLI tool with Cheerio:
- Package as an npm module:
npm publish
- Or distribute as a standalone binary using tools like:
- pkg (for Node.js)
- nexe (for Node.js)
For Browser Usage
Cheerio is primarily designed for server-side use. For browser usage, consider:
- Bundling with webpack/rollup for browser compatibility
- Using alternatives like jQuery for browser-based DOM manipulation
Troubleshooting
Common Issues
1. "cheerio is not defined" or "Cannot find module 'cheerio'"
Solution: Ensure Cheerio is installed in your project and loaded with require (or import) before it is used:
npm install cheerio
2. HTML parsing issues with malformed HTML
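Solution: Cheerio's default HTML parser recovers from malformed markup much as a browser does, so most broken HTML loads without extra options. Enable XML mode only when the input is actually XML (for example, an RSS feed), where tag case and self-closing tags must be preserved:
const $ = cheerio.load(xml, {
// Parse the input as XML instead of HTML
xmlMode: true,
// Treat <tag/> as self-closing
recognizeSelfClosing: true,
});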
3. Performance issues with large documents
Solution: Cheerio builds the whole document tree in memory, so keep each query cheap by using specific selectors and limiting traversal depth:
// Use more specific selectors
const items = $('.container .item');
// Limit traversal depth
const directChildren = $('.parent > .child');
4. Attribute manipulation not working
Solution: Confirm that the selector matches at least one element. Calling .attr() on an empty selection silently does nothing:
// Select once and check that something matched
const $el = $('selector');
if ($el.length) {
// Manipulate attributes
$el.attr('data-custom', 'value');
}
5. TypeScript compilation errors
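Solution: Cheerio ships its own type definitions, so it needs no separate @types package. If the errors mention Node.js built-ins, install the Node.js type definitions: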
npm install --save-dev @types/node
Debugging Tips
- Log the HTML structure:
console.log($.html());
- Check element existence:
console.log($('selector').length);
- Use Cheerio's built-in methods:
// Check if element has class
console.log($('selector').hasClass('classname'));
// Get all attributes
console.log($('selector').attr());
Performance Optimization
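- Parse each document once and reuse the resulting instance for all queries on it. Loading another document creates a separate instance:
const $first = cheerio.load(html1);
// Run every query for html1 against $first
const $second = cheerio.load(html2); // New instance for the second document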
- Use efficient selectors:
// Better than traversing entire DOM
const items = $('.container .item');
- Limit DOM traversal:
// Use direct child selectors
const directChildren = $('.parent > .child');
This guide provides a comprehensive overview of installing, configuring, and using Cheerio for HTML parsing and manipulation in your projects.