字集字碼的問題真的是很麻煩的,沒有好的觀念實在很難解決這麼多字集字碼的問題,尤其在轉換資料的時候,最是痛苦,很多人解決不了也是妥協,但常看到別人的網站上出現一堆 \ ? 等字元,站長本身又無法解決時,想必也是很難過吧。以下這篇文章是非常精華的一篇關於如何將 MySQL 4.0 完整且正確的升級至 MySQL 4.1/5.0 的說明。 至於文章中的名人「許功蓋」相信大家都知道是誰吧!^_^
MySQL 關於 Character Sets and Collations 的詳細定義在此:
http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
概要說明一下:
mysql> set names utf8; set character set utf8;
mysql> SELECT name FROM `cms_content` WHERE `name` LIKE CONVERT( _big5 '%許功蓋%' USING utf8 );
CONVERT:
宣告 '%歐白%' 為 BIG5 字集,並轉成 UTF8 字碼,用於 SQL 查詢
真相:
只要用 casting 功能就可以下出正確的 SQL Command
e.g.
mysql> set names utf8;set character set utf8;
mysql> SELECT name FROM `cms_content` WHERE `name` LIKE _big5 '%許功蓋%';
mysql> set names big5;set character set big5;
mysql> SELECT name FROM `cms_content` WHERE `name` LIKE _big5 '%許功蓋%';
只要能指定「字串」的字集,MySQL 就能辨認字串的比對! 用 _big5 'XXXXX'
這兩段的結果都可以正確查到資料!
情況(一):轉移 MySQL 4.0 之前的資料
1. 先 CREATE DATABASE 指定字集為 big5
CREATE DATABASE `dbname1` DEFAULT CHARACTER SET UTF8;
2. 選擇資料庫
USE `dbname1`;
3. 再宣告接下來要輸入 SQL 所使用的字集 ( Big5 )
SET NAMES 'big5';
SET CHARACTER SET big5;
4. 再來,就新增資料
CREATE TABLE or INSERT INTO
情況(二):轉移網站 PHP 程式
在資料庫連線後,立即執行兩行 SQL 指令:
$conn->Execute("SET NAMES 'big5'");
$conn->Execute("SET CHARACTER SET big5");
情況(三):開發新的 PHP 網站
1. 資料庫、表格與欄位皆宣告為 UTF8
2. 訊息在處理的時候,程式要先宣告使用的字集: ( e.g. Big5 或 UTF-8 )
$conn->Execute("SET NAMES 'big5'");
$conn->Execute("SET CHARACTER SET big5");
3. 最好在輸入資料庫的時候,完全以 utf8 存入資料庫
※ 潛在問題:
1. 若輸入字集為 Big5,但 Browser 會將「非Big5」的字集變成 Unicode Entities,要如何正確存入 MySQL 5
- 策略:先將 BIG5 文字轉成 UTF-8,再將 Unicode Entities 轉成 UTF-8,就大功告成啦!我真是天才~ ^____^
- 解答:
$tmp = '我是Will! :: ク チ コ ミ 付 き 全 国 う ま い ラ';
$str = iconv("BIG5", "UTF-8", $tmp);
$str = unescape($str); # unescape 定義在下方
$conn->Execute("SET NAMES 'UTF8'");
$conn->Execute("SET CHARACTER SET UTF8");
$conn->Execute("INSERT INTO test VALUES ('[ ".date("Y-m-d H:i:s")." ] ".$str."');");
2. 若輸出的字集是 Big5,原本的 UTF-8 用 iconv 可能無法完全轉回 Big5,要如何將無法轉換的部分轉成 Entities 呢?
- 策略:先取出所有 UTF-8 的單字,一個一個用 iconv 轉換,轉換不成功的再轉成 Entities
- 解答:
$s = ddc_readfile("/home/demo/utf8.txt");
function utf8_to_big5($str)
{
#$to_charset = 'BIG5';
#$to_charset = 'GB2312';
#$to_charset = 'UTF-8';
$old_mb_internal_encoding = mb_internal_encoding();
mb_internal_encoding('UTF-8');
$len = mb_strlen($str);
$buf = '';
for($i=$len ; $i > 0 ; $i--)
{
$idx = $len - $i;
$char = mb_substr($str, $idx, 1);
$char_b5 = iconv("UTF-8", $to_charset, $char);
if($char_b5 == '') {
$char = utf8ToUnicodeEntities($char);
}
else {
$char = $char_b5;
}
$buf .= $char;
}
mb_internal_encoding($old_mb_internal_encoding);
return $buf;
}
# header("Content-Type: text/html; charset=UTF-8");
$n = utf8_to_big5($s);
echo($n);
※ 若應用程式是以 Big5 做為傳遞的依據,可以用以下方式確保輸入的字集都是 utf8
# 先判斷該字串是否為 UTF-8 字串,若不是則用 iconv 從 BIG5 轉換至 UTF-8
if (iconv('UTF-8', 'UTF-8', $xml) != $xml) {
$xml = iconv("BIG5", "UTF-8", $xml);
}
# 再將所有 Browser 自動轉換的 Entities 轉成 Unicode 字串
$xml = preg_replace_callback("/(&#[0-9]{1,5}+;)/", "preg_utf8Encode", $xml);
function preg_utf8Encode($matches)
{
#dp($matches[1]);
return utf8Encode($matches[1]);
}
function utf8Encode ($source)
{
$utf8Str = '';
$entityArray = explode ("&#", $source);
$size = count ($entityArray);
for ($i = 0; $i < $size; $i++)
{
$subStr = $entityArray[$i];
$nonEntity = strstr ($subStr, ';');
if ($nonEntity !== false)
{
/* Add by Will Huang - BEGIN */
if(substr ($subStr, 0, 1) == 'x') {
$unicode = intval(hexdec(substr ($subStr, 1, (strpos ($subStr, ';')+1))));
}
else
/* Add by Will Huang - END */
$unicode = intval (substr ($subStr, 0, (strpos ($subStr, ';') + 1)));
// determine how many chars are needed to reprsent this unicode char
if ($unicode < 128) {
$utf8Substring = chr ($unicode);
}
else if ($unicode >= 128 && $unicode < 2048) {
$binVal = str_pad (decbin ($unicode), 11, "0", STR_PAD_LEFT);
$binPart1 = substr ($binVal, 0, 5);
$binPart2 = substr ($binVal, 5);
$char1 = chr (192 + bindec ($binPart1));
$char2 = chr (128 + bindec ($binPart2));
$utf8Substring = $char1 . $char2;
}
else if ($unicode >= 2048 && $unicode < 65536) {
$binVal = str_pad (decbin ($unicode), 16, "0", STR_PAD_LEFT);
$binPart1 = substr ($binVal, 0, 4);
$binPart2 = substr ($binVal, 4, 6);
$binPart3 = substr ($binVal, 10);
$char1 = chr (224 + bindec ($binPart1));
$char2 = chr (128 + bindec ($binPart2));
$char3 = chr (128 + bindec ($binPart3));
$utf8Substring = $char1 . $char2 . $char3;
}
else {
$binVal = str_pad (decbin ($unicode), 21, "0", STR_PAD_LEFT);
$binPart1 = substr ($binVal, 0, 3);
$binPart2 = substr ($binVal, 3, 6);
$binPart3 = substr ($binVal, 9, 6);
$binPart4 = substr ($binVal, 15);
$char1 = chr (240 + bindec ($binPart1));
$char2 = chr (128 + bindec ($binPart2));
$char3 = chr (128 + bindec ($binPart3));
$char4 = chr (128 + bindec ($binPart4));
$utf8Substring = $char1 . $char2 . $char3 . $char4;
}
if (strlen ($nonEntity) > 1) {
$nonEntity = substr ($nonEntity, 1); // chop the first char (';')
}
else {
$nonEntity = '';
}
$utf8Str .= $utf8Substring . $nonEntity;
}
else {
$utf8Str .= $subStr;
}
}
return $utf8Str;
}
UTF-8 Notes:
http://php.nctu.edu.tw/manual/en/function.iconv.php
<?php
//script from http://zizi.kxup.com/
//javascript unesape
function unescape($str) {
$str = rawurldecode($str);
preg_match_all("/(?:%u.{4})|&#x.{4};|&#\d+;|.+/U",$str,$r);
$ar = $r[0];
print_r($ar);
foreach($ar as $k=>$v) {
if(substr($v,0,2) == "%u")
$ar[$k] = iconv("UCS-2","UTF-8",pack("H4",substr($v,-4)));
elseif(substr($v,0,3) == "&#x")
$ar[$k] = iconv("UCS-2","UTF-8",pack("H4",substr($v,3,-1)));
elseif(substr($v,0,2) == "&#") {
echo substr($v,2,-1)."<br>";
$ar[$k] = iconv("UCS-2","UTF-8",pack("n",substr($v,2,-1)));
}
}
return join("",$ar);
}
?>
**********************************************
使用 ADOdb for PHP4 如何順利將 PHP4 + MySQL 4.0 升級至 PHP4 + MySQL 4.1/5.0
※ 必要條件:要安裝 iconv、一定要是輸入 BIG5 的字
※ 開啟 adobe/adodb.inc.php 檔案,並修改以下兩個 Function !!!
function b2u_walk (&$item, $key) {
if(!is_array($item)) {
$item = iconv("BIG5", "UTF-8//IGNORE", $item);
}
}
function GetUpdateSQL(&$rs, $arrFields,$forceUpdate=false,$magicq=false)
{
array_walk($arrFields, 'b2u_walk');
global $ADODB_INCLUDED_LIB;
if (empty($ADODB_INCLUDED_LIB)) include_once(ADODB_DIR.'/adodb-lib.inc.php');
$tmp = _adodb_getupdatesql($this,$rs,$arrFields,$forceUpdate,$magicq);
$tmp = iconv("UTF-8", "BIG5//IGNORE", $tmp);
return $tmp;
}
function GetInsertSQL(&$rs, $arrFields,$magicq=false)
{
array_walk($arrFields, 'b2u_walk');
global $ADODB_INCLUDED_LIB;
if (empty($ADODB_INCLUDED_LIB)) include_once(ADODB_DIR.'/adodb-lib.inc.php');
$tmp = _adodb_getinsertsql($this,$rs,$arrFields,$magicq);
$tmp = iconv("UTF-8", "BIG5//IGNORE", $tmp);
return $tmp;
}