初识全文索引
通常来说,全文索引大多用在OLAP环境当中,全文索引擅长于词汇的快速搜索。 一、全文索引和普通b_tree索引对比 SQL create tablet1 (id int,name varchar(10)); Table created. SQL create indext1_ind on t1(name); Index created. SQL create tablet2 as s
通常来说,全文索引大多用在OLAP环境当中,全文索引擅长于词汇的快速搜索。
一、全文索引和普通b_tree索引对比
SQL> create tablet1 (id int,name varchar(10));
Table created.
SQL> create indext1_ind on t1(name);
Index created.
SQL> create tablet2 as select * from t1;
Table created.
SQL> create indext2_ind on t2(name) indextype is ctxsys.context;
Index created.
SQL> select *from t1 where name like '%tom%';
ID NAME
--------------------
1 tom
2 tom tom
2 tom tom
Execution Plan
----------------------------------------------------------
Plan hash value:3589342044
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 10 | 2 (0)| 00:00:01 |
|* 2 | INDEX FULL SCAN | T1_IND| 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
PredicateInformation (identified by operation id):
---------------------------------------------------
2 - filter("NAME" LIKE '%tom%' AND"NAME" IS NOT NULL)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
676 bytes sent via SQL*Net to client
519 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed
SQL> select *from t2 where contains(name,'tom')>0;
ID NAME
--------------------
1 tom
2 tom tom
2 tom tom
Execution Plan
----------------------------------------------------------
Plan hash value:785228215
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 30 | 7 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T2 | 3 | 30 | 7 (0)| 00:00:01 |
|* 2 | DOMAIN INDEX | T2_IND| | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
PredicateInformation (identified by operation id):
---------------------------------------------------
2 -access("CTXSYS"."CONTAINS"("NAME",'tom')>0)
Statistics
----------------------------------------------------------
10 recursive calls
0 db block gets
14 consistent gets
0 physical reads
0 redo size
676 bytes sent via SQL*Net to client
519 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed
SQL> selectobject_name,object_type from user_objects order by 1;
OBJECT_NAME OBJECT_TYPE
-------------------------------------------------
--DR开头的四张表为全文索引的基表
DR$T2_IND$X INDEX
DRC$T2_IND$R INDEX
SYS_IL0000236119C00006$$ INDEX
SYS_IL0000236124C00002$$ INDEX
SYS_IOT_TOP_236122 INDEX
SYS_IOT_TOP_236128 INDEX
SYS_LOB0000236119C00006$$ LOB
OBJECT_NAME OBJECT_TYPE
-------------------------------------------------
SYS_LOB0000236124C00002$$ LOB
T1 TABLE
T1_IND INDEX
T2 TABLE
T2_IND INDEX
二、DML操作对全文索引的影响
以context方式创建的全文索引并不是基于事务的,默认情况下,即使一个dml操作提交,信息也不会更新到全文索引中。
1、insert 操作
SQL> create tablet(name varchar2(30));
Table created.
SQL> create indext_ind on t(name) indextype is ctxsys.context;
Index created.
SQL> insert intot values('i am an oracle dba');
1 row created.
SQL> commit;
insert数据已提交,我们看看全文索引是否已更新
SQL> setautotrace on
SQL> select *from t where name like '%dba%';
NAME
------------------------------
i am an oracle dba
Execution Plan
----------------------------------------------------------
Plan hash value:1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 2 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 17 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------
PredicateInformation (identified by operation id):
---------------------------------------------------
1 - filter("NAME" IS NOT NULL AND"NAME" LIKE '%dba%')
Note
-----
- dynamic sampling used for this statement(level=2)
Statistics
----------------------------------------------------------
5 recursive calls
0 db block gets
15 consistent gets
0 physical reads
0 redo size
538 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> set line 200
SQL> select *from t where contains(name,'dba') >0;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value:315187259
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 29 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 29 | 4 (0)| 00:00:01 |
|* 2 | DOMAIN INDEX | T_IND| | | 4 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
PredicateInformation (identified by operation id):
---------------------------------------------------
2 -access("CTXSYS"."CONTAINS"("NAME",'dba')>0)
Note
-----
- dynamic sampling used for this statement(level=2)
Statistics
----------------------------------------------------------
1829 recursive calls
0 db block gets
2696 consistent gets
30 physical reads
0 redo size
332 bytes sent via SQL*Net to client
509 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
164 sorts (memory)
0 sorts (disk)
0 rows processed
以上发现,全文索引并没有自动更新,而是把记录存放在线面的ctxsys.dr$pending表中。
SQL> setautotrace off
SQL> select *from ctxsys.dr$pending;
PND_CID PND_PID PND_ROWID PND_TIMESTAMP P
-------------------- ------------------ ------------------ -
1084 0 AABGmVAAEAAAADmAAA 03-APR-14 N
SQL> insert intot values('he is an oracle dba');
1 row created.
SQL> commit;
Commit complete.
SQL> select *from ctxsys.dr$pending;
PND_CID PND_PID PND_ROWID PND_TIMESTAMP P
-------------------- ------------------ ------------------ -
1084 0 AABGmVAAEAAAADmAAA 03-APR-14 N
1084 0 AABGmVAAEAAAADmAAB 03-APR-14 N
SQL> select *from t where contains(name,'dba') >0;
no rows selected
为了把信息同步到全文索引中,我们需要手工同步:
SQL> alter indext_ind rebuild parameters ('sync');
Index altered.
SQL> select *from t where contains(name,'dba') >0;
NAME
------------------------------
i am an oracle dba
he is an oracle dba
SQL> select *from ctxsys.dr$pending;
no rows selected
2、delete操作
SQL> select *from t;
NAME
------------------------------
i am an oracle dba
he is an oracle dba
SQL> delete fromt where name='he is an oracle dba';
1 row deleted.
SQL> select *from t where contains(name,'dba') >0;
NAME
------------------------------
i am an oracle dba
SQL> select *from ctxsys.dr$pending;
no rows selected
SQL> select *from ctxsys.dr$delete;
DEL_IDX_IDDEL_IXP_ID DEL_DOCID
-------------------- ----------
1084 0 2
这里我们看到全文索引立即生效了,至于ctxsys.dr$delete里面的数据是delete操作的中间状态,用来维护一个事物,无论事物提交或者回滚。
SQL> rollback;
Rollback complete.
SQL> select *from t where contains(name,'dba') >0;
NAME
------------------------------
i am an oracle dba
he is an oracle dba
SQL> select *from ctxsys.dr$delete;
no rows selected
3、update操作
update操作相当于delete+insert操作,所以默认情况下需要手动刷新全文索引。
SQL> update t setname='oracle dba' where name='i am an oracle dba';
1 row updated.
SQL> select *from ctxsys.dr$delete;
DEL_IDX_IDDEL_IXP_ID DEL_DOCID
-------------------- ----------
1084 0 1
SQL> select *from ctxsys.dr$pending;
PND_CID PND_PID PND_ROWID PND_TIMESTAMP P
-------------------- ------------------ ------------------ -
1084 0 AABGmVAAEAAAADmAAA 03-APR-14 N
SQL> select *from t where contains(name,'dba') > 0;
NAME
------------------------------
he is an oracle dba
SQL> alter indext_ind rebuild parameters ('sync');
Index altered.
SQL> select *from t where contains(name,'dba') > 0;
NAME
------------------------------
he is an oracle dba
oracle dba
由于全文索引创建对象大多是海量数据的表,dml操作如果实时更新会影响到系统性能。创建全文索引同步的三个选项:
manual:默认选项
every:在一个时间段后更新索引
on commitdml:在事务提交后更新索引
语法如下:
create index t_indon t(name) indextype is ctxsys.context parameters('sync (on commit)');
查看全文索引信息和性能的工具包ctx_report