Welcome to Cheerio!
Let's get a quick overview of Cheerio in less than 5 minutes.
Getting Started
Let's install Cheerio and its dependencies.
Setting up Node.js
To install Cheerio, you need to have Node.js installed on your system.
- Download the latest version of Node.js.
- When installing Node.js, we recommend checking all checkboxes related to dependencies.
Installing Cheerio
Once you have set up Node.js, install Cheerio with the command for your package manager:
- npm: npm install cheerio
- Yarn: yarn add cheerio
- pnpm: pnpm add cheerio
Importing Cheerio
Once Cheerio is installed, you can import it into your JavaScript code using the import statement:
import * as cheerio from 'cheerio';
If you are on an older environment (or prefer CommonJS), you can use the require function instead:
const cheerio = require('cheerio');
Using Cheerio
After importing Cheerio, you can start using it to manipulate and scrape web page data.
Loading a Document
The easiest way to load HTML is to use the load function:
const $ = cheerio.load('<h2 class="title">Hello world</h2>');
This loads the HTML string into Cheerio and returns a Cheerio object, conventionally named $. You can then use this object to traverse the DOM and manipulate the data.
Learn more about loading documents.
Cheerio is not a web browser. Cheerio parses markup and provides an API for traversing and manipulating the resulting data structure. It does not interpret the result as a web browser does. Specifically, it does not produce a visual rendering, apply CSS, load external resources, or execute JavaScript, which is common for a SPA (single-page application). Skipping this work makes Cheerio much faster than other solutions. If your use case requires any of this functionality, consider browser automation software such as Puppeteer or Playwright, or a DOM emulation project such as jsdom.
Selecting Elements
Once you have loaded a document, you can use the returned function to select elements from the document.
Here, we select the h2 element with the class title and then get its text:
$('h2.title').text(); // "Hello world"
Learn more about selecting elements.
Traversing the DOM
The $ function returns a Cheerio object, which is similar to an array of DOM elements. You can use this object as a starting point to traverse the DOM further. For example, you can use the find function to select elements within the selected elements:
$('h2.title').find('.subtitle').text();
There are many other functions that can be used to traverse the DOM. Learn more about traversing the DOM.
Manipulating Elements
Once you have selected an element, you can use the Cheerio object to manipulate it.
Here, we select the h2 element with the class title and change the text inside it. We also add a new h3 element to the document, directly after the h2:
$('h2.title').text('Hello there!');
$('h2').after('<h3>How are you?</h3>');
Learn more about manipulating elements.