I.症状:
业务侧反馈应用缓慢,我方发现数据库缓慢,并伴随有LATCH:ROW CACHE OBJECTS等待事件,数据库CPU过高。
II.解决方案:
通过如下分析是由于故障时段有大量硬解析,硬解析需要去获取数据字典资源,这需要获得latch,若硬解析量太大,会造成严重的latch争用,占用非常多的资源,导致CPU使用过高,从而表现出数据库ROW CACHE OBJECT等待事件,最终导致数据库系统缓慢。
根据分析,我方有如下2点建议:
1. 对业务侧进行SQL整改,将未使用绑定变量改成绑定变量,成为软解析,减少获取数据字典的次数,从而降低LATCH争用次数,从根本解决问题(推荐)
2. 对数据库参数cursor_sharing进行调整,可部分缓解这个问题,但不能从根本上解决问题。
III.分析过程:
1.数据库存在LATCH等待情况,看到这几个等待事件,这个数据库的内存很大,我怀疑是硬解析或者热块争用有关;
2.SGA使用率,Shared pool还有17G,排除内存不足
select POOL,BYTES/1024/1024 FREE_MB from v$sgastat a where a.NAME like 'free%';
3. 通过AWR发现Parse CPU to Parse Elapsed为42%,Non-Parse CPU为57%,正常情况下98%以上,该值表示SQL解析时间占比,越高越好,如果过低说明解析中等待资源时间太长。软解析比例也过低,正常在98%以上。应该就是硬解析的问题了
4. 通过分析数据库内部资源,发现数据字典的各种LATCH争用过高,(GETS表示请求该资源的次数,MISSES表示请求失败重新请求的次数,SLEEPS表示请求失败进入睡眠队列),成功率最底12%,正常应为98%。
查询LATCH 的GET量,以及成为率
col LATCH_NAME for a20
SELECT a.addr,a.latch#,a.child#,a.level#,a.name LATCH_NAME,a.gets,a.misses,round((1-a.misses/a.gets)*100,2) SUCESS_PCT,a.sleeps
FROM v$latch_children a
WHERE a.name='row cache objects' AND a.gets <>0
ORDER BY a.gets desc;
4.查询ROW CACHE中的GET量及命中率
SELECT r.cache#,r.parameter name,r.TYPE,r.subordinate#,r.gets,r.GETMISSES,round((1 - r.GETMISSES/r.gets)*100,2) SUC_PCT FROM v$rowcache r where r.gets <>0 ORDER BY 5 desc;
争用最多的是 DC_OBJECTS,DC_OBJECTS_GRANTS,DC_HISTOGRAM_DATA,DC_HISTOGRAM_DEF
5.统计硬解析的SQL语句
查SQL硬解析次数1000以上的FORCE_MATCHING_SIGNATURE,若需要得到SQL语句用FORCE_MATCHING_SIGNATURE关联即可:
select to_char(FORCE_MATCHING_SIGNATURE) FORCE_MATCHING_SIGNATURE, count(1) counts from v$sql
where FORCE_MATCHING_SIGNATURE>0 and FORCE_MATCHING_SIGNATURE <> EXACT_MATCHING_SIGNATURE and EXECUTIONS<=5
group by FORCE_MATCHING_SIGNATURE
having count(1) > 1000
order by 2 desc;
官方v$SQL视图:
EXACT_MATCHING_SIGNATURE
|
NUMBER
|
Signature calculated on the normalized SQL text. The normalization includes the removal of white space and the uppercasing of all non-literal strings.
|
FORCE_MATCHING_SIGNATURE
|
NUMBER
|
The signature used when the CURSOR_SHARING parameter is set to FORCE
|
将变量硬编码至SQL中的游标,FORCE_MATCHING_SIGNATURE值完全相同,而EXACT_MATCHING_SIGNATURE值各有不同。FORCE_MATCHING_SIGNATURE值相同说明在游标共享FORCE模式下,这些游标满足CURSOR SHARING的条件
6.SQL解析到底哪一步访问了ROWCACHE,哪一步争用的latch?
这个问题我查询了好久,我之前以为只是在语义检查需要到row cache,其实是在生成执行计划的时候需要访问的数据字典次数更多,争用latch也就更频繁了,所以这里才是最慢的。
http://docs.oracle.com/cd/E11882_01/server.112/e40540/sqllangu.htm#CNCPT015
SQL execution flow
Parse operations fall into the following categories, depending on the type of statement submitted and the result of the hash check:
-
Hard parse
If Oracle Database cannot reuse existing code, then it must build a new executable version of the application code. This operation is known as ahard parse, or a library cache miss. The database always perform a hard parse of DDL.
During the hard parse, the database accesses the library cache and data dictionary cache numerous times to check the data dictionary. When the database accesses these areas, it uses a serialization device called a latch on required objects so that their definition does not change (see"Latches"). Latch contention increases statement execution time and decreases concurrency.
在硬解析期间,数据库需要访问 library cache and data dictionary cache 非常多次去检查数据字典,当数据库访问这些区域的时候,它用一个序列化的设备调用一个latch锁存这个对象,使其的定义不会被改变。latch 的争用会增加语句执行时间以及减少并发量。
-
Soft parse
A soft parse is any parse that is not a hard parse. If the submitted statement is the same as a reusable SQL statement in the shared pool, then Oracle Database reuses the existing code. This reuse of code is also called a library cache hit.
Soft parses can vary in the amount of work they perform. For example, configuring the session shared SQL area can sometimes reduce the amount of latching in the soft parses, making them "softer."
In general, a soft parse is preferable to a hard parse because the database skips the optimization and row source generation steps, proceeding straight to execution.
当然有时热块争用也会造成latch,后续我会再分析