Welcome to Cheerio!

Let's get a quick overview of Cheerio in under 5 minutes.

Getting Started

Let's install Cheerio and its dependencies.

Setting up Node.js

To install Cheerio, you need Node.js installed on your system.

  • Download the latest version of Node.js.

    • When installing Node.js, we recommend checking all checkboxes related to dependencies.

Installing Cheerio

Once Node.js is set up, install Cheerio with the following command:

npm install cheerio

Importing Cheerio

Once Cheerio is installed, you can import it into your JavaScript code with an import statement:

import * as cheerio from 'cheerio';

If you are in an older environment (or prefer CommonJS), you can use the require function instead:

const cheerio = require('cheerio');

Using Cheerio

After importing Cheerio, you can start using it to manipulate and scrape web page data.

Loading a Document

The easiest way to load HTML is the load function:

const $ = cheerio.load('<h2 class="title">Hello world</h2>');

This loads the HTML string into Cheerio and returns a Cheerio object, assigned to $ above. You can then use this object to traverse the DOM and manipulate the data.

Learn more about loading documents.

Note

Cheerio is not a web browser. Cheerio parses markup and provides an API for traversing and manipulating the resulting data structure. It does not interpret the result the way a web browser does. Specifically, it does not produce a visual rendering, apply CSS, load external resources, or execute JavaScript, which single-page applications (SPAs) commonly depend on. This makes Cheerio much, much faster than other solutions. If your use case requires any of this functionality, consider browser automation software such as Puppeteer or Playwright, or DOM emulation projects such as JSDom.

Selecting Elements

Once you have loaded a document, you can use the returned function to select elements from it.

Here, we select the h2 element with the class title and then get its text:

$('h2.title').text(); // "Hello world"

Learn more about selecting elements.

Traversing the DOM

The $ function returns a Cheerio object, which is similar to an array of DOM elements. You can use this object as a starting point for traversing the DOM further. For example, the find function selects elements within the selected elements:

$('h2.title').find('.subtitle').text();

Many other functions are available for traversing the DOM. Learn more about traversing the DOM.

Manipulating Elements

Once you have selected an element, you can use the Cheerio object to manipulate it.

Here, we select the h2 element with the class title and change the text inside it. We also add a new h3 element to the document:

$('h2.title').text('Hello there!');

$('h2').after('<h3>How are you?</h3>');

Learn more about manipulating elements.