Charset: Limit _wp_scan_utf8() ASCII scan to remaining code points.

The ASCII fast path in `_wp_scan_utf8()` uses `strspn()` to skip past ASCII bytes. When a code point limit was provided without a byte limit, the scan still covered the rest of the input, even though no more than the remaining number of code points could be consumed. Because each ASCII character is a single-byte code point, the fast-path scan length can be bounded by the number of remaining code points. This avoids scanning bytes beyond the limit and improves performance when working with some large documents.
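
As a rough illustration of the bound, the sketch below compares an unbounded and a bounded `strspn()` scan over a long ASCII run. It is not the code from `_wp_scan_utf8()`: the helper `ascii_run_length()` and the variable names are hypothetical and exist only for this example.

```php
<?php
/**
 * Hypothetical helper, not part of WordPress: returns the length of the
 * run of ASCII bytes in $bytes starting at offset $at, examining at most
 * $max_bytes bytes.
 */
function ascii_run_length( string $bytes, int $at, int $max_bytes ): int {
	static $ascii = null;
	if ( null === $ascii ) {
		// Build a mask containing every byte from 0x00 through 0x7F.
		$ascii = implode( '', array_map( 'chr', range( 0, 0x7F ) ) );
	}

	// The fourth argument caps how many bytes strspn() will examine.
	return strspn( $bytes, $ascii, $at, $max_bytes );
}

// 1000 ASCII bytes followed by one two-byte UTF-8 character.
$text                  = str_repeat( 'a', 1000 ) . "\u{E9}";
$end                   = strlen( $text );
$remaining_code_points = 5;

// Without the bound, the scan walks the entire ASCII run.
$unbounded = ascii_run_length( $text, 0, $end );

// With the bound, the scan stops after the remaining code points,
// because each ASCII byte is exactly one code point.
$bounded = ascii_run_length( $text, 0, min( $end, $remaining_code_points ) );
```

When only five more code points are wanted, the bounded call examines five bytes rather than the whole run, which is the saving described above.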

Developed in https://github.com/WordPress/wordpress-develop/pull/12214.

Follow-up to [60768].

Props jonsurrell, dmsnell, zieladam.
Fixes #65483. See #63863.

Built from https://develop.svn.wordpress.org/trunk@62523


git-svn-id: http://core.svn.wordpress.org/trunk@61804 1a063a9b-81f0-0310-95a4-ce76da25c4cd
jonsurrell
2026-06-18 16:46:43 +00:00
parent b151a33cda
commit b6e1fd6b2a
2 changed files with 2 additions and 2 deletions
+1 -1
@@ -65,7 +65,7 @@ function _wp_scan_utf8( string $bytes, int &$at, int &$invalid_length, ?int $max
"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" .
" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f",
$i,
- $end - $i
+ min( $end - $i, $max_count - $count )
);
if ( $count + $ascii_byte_count >= $max_count ) {
+1 -1
@@ -16,7 +16,7 @@
*
* @global string $wp_version
*/
- $wp_version = '7.1-alpha-62522';
+ $wp_version = '7.1-alpha-62523';
/**
* Holds the WordPress DB revision, increments when changes are made to the WordPress DB schema.