PHP正则笔记

####中文字符 gbk编码下中文字符正则会出现乱码问题：解决方案是转码用utf8

中文字符匹配正则:

UTF-8:/[\u{4e00}-\u{9fff}]/u
GBK:/([\xb0-\xfe][\x00-xff])+/

utf8必须指定Unicode模式，gbk下匹配多个字符，必须添加括号将两个字节将两个字节作为一组

####. * 的滥用 <?php $str = «<EOF <h4>第1集</h4> aaa aaa <h4>第2集</h4> bbbbb EOF;

$reg1 = "/<h4>(.*?)<\/h4>(.*)/Us";

$reg2 = "/<h4>(.*?)<\/h4>(((?!<h4>).)*)/s";

preg_match_all($reg1, $str, $match1);
preg_match_all($reg2, $str, $match2);

$reg1用的是. * , h4下面的内容就匹配不到，改用$reg2就可以解决问题

####中文乱码的情况 <?php $str = «<EOF 中文，英文，懂中文 EOF;

$reg = '/[^->]中文/';
$replace = "<a>不懂中文</a>"

$str = preg_replace($reg, $replace, $str, 1);

str出现乱码,解决方案：

$reg = '(.*[^-|])(中文)';

$str = preg_replace($reg, "$1{$replace}", $str, 1);

第一种方法$reg是整体，替换就会把[^->]替换，所以要把$reg分成2个部分