Loading Documents
In this guide, we'll look at how to load documents with Cheerio and when to use each of the loading methods.
If you're familiar with jQuery, this step will be new to you. jQuery operates on the single DOM built into the browser. With Cheerio, we need to pass in the HTML document ourselves.
The loadBuffer, stringStream, decodeStream, and fromURL methods are not available in the browser environment. Instead, use the load method to parse HTML strings.
load
The load method is the most basic way to parse an HTML or XML document with Cheerio. It takes a string containing the document as its argument and returns a Cheerio object that you can use to traverse and manipulate the document.
Here's an example of how to use the load method:
import * as cheerio from 'cheerio';
const $ = cheerio.load('<h1>Hello, world!</h1>');
console.log($('h1').text());
// Output: Hello, world!
Similar to web browser contexts, load will introduce <html>, <head>, and <body> elements if they are not already present. You can set load's third argument to false to disable this.
const $ = cheerio.load('<ul id="fruits">...</ul>', null, false);
$.html();
//=> '<ul id="fruits">...</ul>'
Learn more about the load method in the API documentation.
loadBuffer
The loadBuffer method is similar to the load method, but it takes a buffer containing the document as its argument instead of a string. Cheerio will run the HTML encoding sniffing algorithm to determine the encoding of the document. This is useful when you have the document in binary form, such as when you're reading it from a file or receiving it over a network connection.
Here's an example of how to use the loadBuffer method:
import * as cheerio from 'cheerio';
import * as fs from 'fs';
const buffer = fs.readFileSync('document.html');
const $ = cheerio.loadBuffer(buffer);
console.log($('title').text());
// Output: Hello, world!
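To see why the encoding has to be determined before parsing, consider the sketch below. It does not call Cheerio at all; it only uses Node's built-in Buffer to decode the same bytes in two ways. The sample bytes and variable names are assumptions chosen for illustration.

```javascript
// The bytes for "café" in Latin-1 (ISO-8859-1), where 0xE9 is "é".
const bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// Wrong guess: a lone 0xE9 is not valid UTF-8, so the decoder
// substitutes the replacement character U+FFFD.
const asUtf8 = bytes.toString('utf8');

// Right guess: every byte maps to the intended character.
const asLatin1 = bytes.toString('latin1');

console.log(asUtf8 === 'caf\uFFFD'); // true
console.log(asLatin1 === 'caf\u00e9'); // true
```

Passing the raw buffer to loadBuffer lets Cheerio make this decision for you, rather than relying on a guess made while converting the bytes to a string.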
Learn more about the loadBuffer method in the API documentation.
stringStream
When loading an HTML document from a stream and the encoding is known, you can use the stringStream method to parse it into a Cheerio object.
import * as cheerio from 'cheerio';
import * as fs from 'fs';

// The callback runs once the whole stream has been parsed.
const writeStream = cheerio.stringStream({}, (err, $) => {
  if (err) {
    // Handle error
    return;
  }

  console.log($('title').text());
  // Output: Hello, world!
});

// The encoding is known, so the read stream emits decoded strings.
fs.createReadStream('document.html', { encoding: 'utf8' }).pipe(writeStream);
Learn more about the stringStream method in the API documentation.
decodeStream
When loading an HTML document from a stream and the encoding is not known, you can use the decodeStream method to parse it into a Cheerio object. This method runs the HTML encoding sniffing algorithm to determine the encoding of the document.
Here's an example of how to use the decodeStream method:
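import * as cheerio from 'cheerio';
import * as fs from 'fs';

// The callback runs once the whole stream has been decoded and parsed.
const writeStream = cheerio.decodeStream({}, (err, $) => {
  if (err) {
    // Handle error
    return;
  }

  console.log($('title').text());
  // Output: Hello, world!
});

// No encoding is set, so the read stream emits raw bytes for sniffing.
fs.createReadStream('document.html').pipe(writeStream);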
Learn more about the decodeStream method in the API documentation.
fromURL
The fromURL method allows you to load a document from a URL. This method is asynchronous, so you need to use await (or a then block) to access the resulting Cheerio object.
import * as cheerio from 'cheerio';
const $ = await cheerio.fromURL('https://example.com');
Learn more about the fromURL method in the API documentation.
Conclusion
Cheerio provides several methods for loading HTML documents and parsing them into a DOM structure. Which one to use depends on the form and source of the HTML data: load for strings, loadBuffer for binary data, stringStream and decodeStream for streams with known or unknown encoding, and fromURL for remote documents. Read through each method and pick the one that best suits your needs.