Loading Documents

In this guide, we'll look at how to load documents with Cheerio and when to use each of the loading methods.

Tip

If you're familiar with jQuery, this step will be new to you. jQuery operates on the single DOM that is built into the browser. With Cheerio, we need to pass in the HTML document ourselves.

Availability of methods

The loadBuffer, stringStream, decodeStream, and fromURL methods are not available in the browser environment. Instead, use the load method to parse HTML strings.

load

The load method is the most basic way to parse an HTML or XML document with Cheerio. It takes a string containing the document as its argument and returns a Cheerio object that you can use to traverse and manipulate the document.

Here's an example of how to use the load method:

import * as cheerio from 'cheerio';

const $ = cheerio.load('<h1>Hello, world!</h1>');

console.log($('h1').text());
// Output: Hello, world!

Tip

As in a web browser, load will add <html>, <head>, and <body> elements if they are not already present. You can set load's third argument to false to disable this.

const $ = cheerio.load('<ul id="fruits">...</ul>', null, false);

$.html();
//=> '<ul id="fruits">...</ul>'

Learn more about the load method in the API documentation.

loadBuffer

The loadBuffer method is similar to the load method, but it takes a buffer containing the document as its argument instead of a string. Cheerio runs the HTML encoding sniffing algorithm to determine the document's encoding. This is useful when you have the document in binary form, such as when you read it from a file or receive it over a network connection.

Here's an example of how to use the loadBuffer method:

import * as cheerio from 'cheerio';
import * as fs from 'fs';

const buffer = fs.readFileSync('document.html');

const $ = cheerio.loadBuffer(buffer);

console.log($('title').text());
// Output: Hello, world!

Learn more about the loadBuffer method in the API documentation.

stringStream

When you load an HTML document from a stream and its encoding is known, you can use the stringStream method to parse it into a Cheerio object.

import * as cheerio from 'cheerio';
import * as fs from 'fs';

const writeStream = cheerio.stringStream({}, (err, $) => {
  if (err) {
    // Handle the error and stop here: $ is not usable in this case
    console.error(err);
    return;
  }

  console.log($('title').text());
  // Output: Hello, world!
});

fs.createReadStream('document.html', { encoding: 'utf8' }).pipe(writeStream);

Learn more about the stringStream method in the API documentation.

decodeStream

When you load an HTML document from a stream and its encoding is not known, you can use the decodeStream method to parse it into a Cheerio object. This method runs the HTML encoding sniffing algorithm to determine the document's encoding.

Here's an example of how to use the decodeStream method:

import * as cheerio from 'cheerio';
import * as fs from 'fs';

const writeStream = cheerio.decodeStream({}, (err, $) => {
  if (err) {
    // Handle the error and stop here: $ is not usable in this case
    console.error(err);
    return;
  }

  console.log($('title').text());
  // Output: Hello, world!
});

fs.createReadStream('document.html').pipe(writeStream);

Learn more about the decodeStream method in the API documentation.

fromURL

The fromURL method allows you to load a document from a URL. This method is asynchronous, so you need to use await (or a then block) to access the resulting Cheerio object.

import * as cheerio from 'cheerio';

const $ = await cheerio.fromURL('https://example.com');

Learn more about the fromURL method in the API documentation.

Conclusion

Cheerio provides several methods for loading HTML documents and parsing them into a DOM structure. Which one fits depends on the form and source of the HTML: a string, a buffer, a stream, or a URL. Read through each method and pick the one that best suits your needs.