MySQL: Charset and Collation
1. Introduction
1) create table table_name (column_declaration) charset utf8;
2) set names gbk;
Comments:
1) What does each of these two statements mean?
2) What is the difference between them?
2. Charset
1) Charset hierarchy: (Server>Database>Table>Column)
1) Server default charset
2) Database default charset
3) Table default charset
4) Column default charset
2) Charset hierarchy policy:
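1) If a level does not declare a charset, it inherits the charset of its parent level.
2) The server level always has a charset: if none is configured, MySQL falls back to its compiled-in default (latin1 before MySQL 8.0, utf8mb4 from 8.0 on), so the chain of inheritance always ends at a concrete value.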
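The inheritance rule can be checked from any MySQL session. The following is a minimal sketch; the names demo_db, t_inherit and t_explicit are hypothetical and exist only for illustration.

```sql
# Server level: the default used when a database declares nothing.
SHOW VARIABLES LIKE 'character_set_server';

# Database level: declare gbk explicitly (demo_db is a hypothetical name).
CREATE DATABASE demo_db CHARACTER SET gbk;
USE demo_db;

# Table level: nothing declared, so the table inherits gbk from demo_db.
CREATE TABLE t_inherit (name VARCHAR(12));

# Table and column level: an explicit declaration overrides the parent.
CREATE TABLE t_explicit (
    a VARCHAR(12),                       # inherits utf8 from the table
    b VARCHAR(12) CHARACTER SET latin1   # overrides the table default
) CHARSET utf8;

# Inspect the charset each table actually ended up with.
SHOW CREATE TABLE t_inherit;
SHOW CREATE TABLE t_explicit;
```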
3) Understanding the translator:
1) The client/console has its own charset
2) The translator has its own charset
3) The database has its own charset
4) What the translator needs to know:
1) The charset of the input data ---> In the figure above, the client/console sends gbk. ---> character_set_client = gbk;
2) The charset of the data in transit ---> In the figure above, the transit data is utf-8 (marked in red). ---> character_set_connection = utf8;
3) The charset of the database/table ---> In the figure above, the database uses utf-8. ---> create table table_name (column_declaration) charset utf8;
4) The charset of the output data ---> In the figure above, the output is gbk. ---> character_set_results = gbk;
Comments: If character_set_client, character_set_connection and character_set_results all share the same value N, we can write "set names N" for short.
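As a rough illustration of that shorthand, the two forms below set the same three session variables. This is a sketch rather than an exact equivalence: SET NAMES also resets the connection collation to the default collation of the charset.

```sql
# Shorthand form.
SET NAMES gbk;

# Long form: the same three session variables, set one by one.
SET character_set_client     = gbk;
SET character_set_connection = gbk;
SET character_set_results    = gbk;

# Check the current session settings.
SHOW VARIABLES LIKE 'character_set_%';
```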
5) When does garbled text occur?
1) character_set_client does not match reality. The data typed at the console is encoded in gbk; if we declare character_set_client = utf8, the text is garbled.
2) character_set_results does not match reality. The web page displays output as utf8; if we declare character_set_results = gbk, the text is garbled.
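The effect of a wrong declaration can be reproduced without touching the session variables, by labelling the same bytes with different charsets. This is a minimal sketch that assumes the gbk charset is available on the server; it uses charset introducers and HEX() so that the result does not depend on how the terminal renders characters.

```sql
# 0xD6D0 is the gbk encoding of the character 中.
# Correct label: the bytes are read as gbk and convert cleanly to utf8.
SELECT HEX(CONVERT(_gbk 0xD6D0 USING utf8));     # E4B8AD, the utf8 bytes of 中

# Wrong label: the same bytes read as latin1 become two unrelated
# characters (Ö and Ð), which is what garbled text looks like.
SELECT HEX(CONVERT(_latin1 0xD6D0 USING utf8));  # C396C390
```

No byte is lost in the second query; the bytes are only misinterpreted, which is why garbling can usually be repaired by declaring the right charset.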
6) When is data lost?
1) The charset of character_set_connection or of the database cannot represent every character sent by the client.
Eg: gbk->latin1->gbk: data is lost while translating from the client to the transit data.
gbk->gbk->latin1: data is lost while translating from the transit data to the database.
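The loss can be observed directly with CONVERT(). The sketch below assumes the gbk charset is available on the server; a character that the target charset cannot represent is replaced by a question mark (0x3F), and converting back does not restore it.

```sql
# gbk -> latin1: latin1 has no code for 中, so '?' (0x3F) is substituted.
SELECT HEX(CONVERT(_gbk 0xD6D0 USING latin1));                     # 3F

# latin1 -> gbk: the round trip cannot recover the original character.
SELECT HEX(CONVERT(CONVERT(_gbk 0xD6D0 USING latin1) USING gbk));  # 3F
```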
7) Real world problem:
1) For some reason, the data is stored in the database in gbk, and that cannot be changed.
2) The client is a PHP page that sends its data in utf8.
3) Solution: set names utf8; create table table_name(column_declaration) charset gbk;
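A minimal sketch of this setup follows. The table name legacy_news is hypothetical, and the utf8 input is written as a hex literal so the example does not depend on the terminal encoding. It only works for characters that gbk can represent.

```sql
# Storage stays in gbk (legacy_news is a hypothetical table name).
CREATE TABLE legacy_news (title VARCHAR(50)) CHARSET gbk;

# The client speaks utf8, so declare that for the session.
SET NAMES utf8;

# 0xE4B8AD is 中 in utf8; it is converted to gbk when written.
INSERT INTO legacy_news VALUES (_utf8 0xE4B8AD);

# Stored bytes are gbk.
SELECT HEX(title) FROM legacy_news;                      # D6D0

# Converted back to utf8, the original bytes reappear.
SELECT HEX(CONVERT(title USING utf8)) FROM legacy_news;  # E4B8AD
```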
3. Collation
1) Introduction
# Create table
create table temp(name varchar(12));
# Insert data
insert into temp values('a'), ('B'), ('c'), ('D');
# Order data
select * from temp order by name asc;
+------+
| name |
+------+
| a    |
| B    |
| c    |
| D    |
+------+
# Q: a->97, B->66. Why a < B?
# A: Refer to collation.
2) What is collation?
1) To sort or compare the values in a column, we must specify a rule. That rule is the collation.
3) What is the relationship between charset and collation?
1) One charset may have many collations.
# Show all collations
show collation;
# Show the collations for utf8
show collation like 'utf8%';
# utf8 has about 40 collations.
2) The default collation for utf8 is 'utf8_general_ci', which is case insensitive.
'utf8_bin' orders by the binary code of each character (the same as ASCII order for ASCII letters).
# Create table
create table temp2(name varchar(11)) charset utf8 collate=utf8_bin;
# Insert data
insert into temp2 values('a'), ('B'), ('c'), ('D');
# Order data
select * from temp2 order by name asc;
+------+
| name |
+------+
| B    |
| D    |
| a    |
| c    |
+------+
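The collation declared on a table is only a default; a COLLATE clause can override it for a single query or comparison. The sketch below reuses the temp2 table from above and assumes the server still accepts the utf8_general_ci and utf8_bin names (newer MySQL versions report them as utf8mb3_general_ci and utf8mb3_bin).

```sql
# Sort the utf8_bin column with the case-insensitive rule: a, B, c, D.
SELECT * FROM temp2 ORDER BY name COLLATE utf8_general_ci ASC;

# Comparison follows the collation as well.
SELECT _utf8 'a' = _utf8 'A' COLLATE utf8_general_ci;  # 1 (equal)
SELECT _utf8 'a' = _utf8 'A' COLLATE utf8_bin;         # 0 (different)
```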