
How to Deploy & Use cheeriojs/cheerio

Cheerio Deployment and Usage Guide

Prerequisites

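  • Node.js: Cheerio 1.0 requires Node.js 18.17 or later (newer releases may require more; check the engines field of the version you install)
  • npm (bundled with Node.js) or yarn
  • TypeScript (only if you modify the Cheerio source; Cheerio ships its own type definitions for TypeScript users)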

Installation

Using npm (Recommended)

npm install cheerio

Using yarn

yarn add cheerio

From Source

  1. Clone the repository:
git clone https://github.com/cheeriojs/cheerio.git
cd cheerio
  2. Install dependencies:
npm install
  3. Build the project:
npm run build

Configuration

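Cheerio needs no configuration for basic usage. To customize parsing, pass an options object as the second argument to cheerio.load(). The four options below are htmlparser2 parser options; Cheerio 1.x parses HTML with parse5 by default, so they mainly take effect when htmlparser2 is in use, for example in XML mode:

const cheerio = require('cheerio');

const $ = cheerio.load('<h2 class="title">Hello world</h2>', {
  xmlMode: false,             // true parses the input as XML with htmlparser2
  decodeEntities: true,       // Decode entities such as &amp; into characters
  lowerCaseTags: false,       // Keep tag names in their original case
  recognizeSelfClosing: true, // Treat <tag/> as a self-closing element
});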

Build & Run

Development

When working on the Cheerio source, these npm scripts cover the usual tasks:

# Run tests
npm test

# Run linting
npm run lint

# Build TypeScript
npm run build

Basic Usage Example

const cheerio = require('cheerio'); // ES modules: import * as cheerio from 'cheerio';
const $ = cheerio.load('<h2 class="title">Hello world</h2>');

// Get text content
console.log($('h2.title').text()); // "Hello world"

// Get inner HTML (the element's contents, not the tag itself)
console.log($('h2').html()); // "Hello world"

// Set attributes
$('h2').attr('id', 'title');
console.log($('h2').attr('id')); // "title"

// Add classes
$('h2').addClass('main-title');
console.log($('h2').hasClass('main-title')); // true

Deployment

Cheerio is a library for server-side HTML parsing and manipulation, not a service, so there is nothing to deploy on its own: it ships as a dependency of the application that uses it. Common scenarios:

For Node.js Applications

  1. Package your application with Cheerio as a dependency:
npm install cheerio
  2. Deploy to any Node.js hosting platform:
  • Vercel (for serverless functions)
  • Heroku (traditional Node.js hosting)
  • AWS Lambda (serverless)
  • DigitalOcean App Platform
  • Railway

For CLI Tools

If building a CLI tool with Cheerio:

  1. Package as an npm module:
npm publish
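  2. Or distribute as a standalone binary using tools like:
  • pkg (for Node.js; now deprecated by its maintainers, with Node.js's built-in single executable applications feature as an alternative)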
  • nexe (for Node.js)

For Browser Usage

Cheerio is designed for server-side use; in the browser, the native DOM APIs already provide parsing and querying. If you still need Cheerio client-side, consider:

  1. Bundling it with webpack or rollup
  2. Using jQuery or the native DOM APIs for browser-based DOM manipulation

Troubleshooting

Common Issues

1. "cheerio is not defined" or "Cannot find module 'cheerio'"

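Solution: Make sure Cheerio is installed in the project that runs the code and that the file imports it, with require('cheerio') or import * as cheerio from 'cheerio':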

npm install cheerio

2. HTML parsing issues with malformed HTML

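Solution: Cheerio does not throw on malformed HTML. Its default parser (parse5) applies the HTML standard's error-recovery rules, so unclosed or misnested tags are repaired the way a browser would repair them. Log $.html() to see the repaired tree and write selectors against that. Switch to XML mode only for input that really is XML; it does not make HTML parsing stricter, and it drops HTML rules such as implied end tags:

const $ = cheerio.load(xml, {
  // Parse the input as XML (htmlparser2) instead of HTML
  xmlMode: true,
  // Treat <tag/> as a self-closing element
  recognizeSelfClosing: true,
});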

3. Performance issues with large documents

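Solution: Cheerio parses the whole document into memory before you can query it, so speed comes from doing less work per query: keep selectors specific and avoid repeating document-wide searches: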

// Use more specific selectors
const items = $('.container .item');

// Limit traversal depth
const directChildren = $('.parent > .child');

4. Attribute manipulation not working

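Solution: Attribute methods act on whatever the selector matched; on an empty selection they do nothing and report no error. Check that the selector matches before manipulating it: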

// Check if element exists
if ($('selector').length) {
  // Manipulate attributes
  $('selector').attr('data-custom', 'value');
}

5. TypeScript compilation errors

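Solution: Cheerio ships its own type definitions, so you do not need @types/cheerio (an outdated stub package; uninstall it if it conflicts). If the errors mention Node.js built-ins such as Buffer or streams, install the Node.js type definitions: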

npm install --save-dev @types/node

Debugging Tips

  1. Log the HTML structure:
console.log($.html());
  2. Check element existence:
console.log($('selector').length);
  3. Inspect matched elements with Cheerio's own methods:
// Check whether any matched element has the class
console.log($('selector').hasClass('classname'));

// Get all attributes of the first matched element as an object
console.log($('selector').attr());

Performance Optimization

  1. Load each document with its own cheerio.load() call. Every call returns a new $ bound to that document, so keep the extracted data rather than the $ objects:
const $first = cheerio.load(html1);
// Process html1 with $first
const $second = cheerio.load(html2);
// Process html2 with $second
  2. Use specific selectors:
// Match only items inside .container
const items = $('.container .item');
  3. Limit DOM traversal:
// Use direct child selectors
const directChildren = $('.parent > .child');

This guide covers installing, configuring, and using Cheerio for HTML parsing and manipulation in your projects.