12c New Features: A Summary of the In-Memory Column Store

Published: 2020-08-10 03:31:55 Author: lusklusklusk
Source: ITPUB blog

Official documentation

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/inmem/concepts-for-the-im-column-store.html#GUID-5A72B48A-8427-41AE-9220-E46042BC90C4

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/inmem/configuring-the-im-column-store.html#GUID-8844C889-E381-4B77-8A51-7AA6462B14D7

The IM column store encodes data in a columnar format: each column is a separate structure. The columns are stored contiguously, which optimizes them for analytic queries. The database buffer cache can modify objects that are also populated in the IM column store. However, the buffer cache stores data in the traditional row format. Data blocks store the rows contiguously, optimizing them for transactions.

When you enable an IM column store, the SGA manages data in separate locations: the In-Memory Area and the database buffer cache.

The IM column store maintains copies of tables, partitions, and individual columns in a special compressed columnar format that is optimized for rapid scans.

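The In-Memory Column Store stores data by column: each column is a separate structure. Its advantage is that a query only has to read the columns it references, whereas the database buffer cache keeps data in the traditional row format, so a scan has to read every column of each row. Objects populated in the IM column store can still be modified through the database buffer cache.

Once the feature is enabled, the database allocates a static memory pool in the SGA at startup, the In-Memory Area, which holds the user tables populated in columnar format.

The IM column store keeps copies of tables, partitions, and individual columns in a special compressed columnar format that is optimized for fast scans.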
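
As a rough illustration of the In-Memory Area described above, the sketch below sizes the pool and inspects it after a restart. The 1G value is an arbitrary assumption chosen for illustration, and the statement assumes the instance uses an spfile; INMEMORY_SIZE and V$INMEMORY_AREA are the documented parameter and view.

```sql
-- Assumption: 1G is an illustrative size only. The pool is static,
-- so the new value takes effect after the instance is restarted.
ALTER SYSTEM SET INMEMORY_SIZE = 1G SCOPE = SPFILE;

-- After the restart, the In-Memory Area is reported pool by pool.
SELECT pool, alloc_bytes, used_bytes, populate_status
FROM   v$inmemory_area;
```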

How data in the IM column store is kept in sync

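Once a table populated in the IM column store is changed by DML, some mechanism has to keep the in-memory data consistent, because in memory a DML statement only modifies the database buffer cache and the log buffer. How do those changes reach the IM column store? Oracle relies on the transaction journal. When DML modifies a table that is already populated in the IM column store, the database records metadata about the change, such as the rowid of the modified row, in the transaction journal and marks that row stale as of the SCN of the DML statement. A later query that needs the table then combines the columnar data already in the IM column store, the transaction journal, and the database buffer cache.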

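Of course, if DML keeps arriving, the transaction journal keeps growing, and most of the data in the IM column store may end up stale, which hurts in-memory query performance badly. Oracle therefore defines a staleness threshold: when the proportion of stale data in the IM column store reaches it, repopulation is triggered. By default Oracle checks every two minutes whether the threshold has been reached.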
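
The staleness threshold itself is internal, but the background capacity available for population and repopulation is governed by documented initialization parameters. The query below is a minimal sketch for looking at them, assuming the session has the privileges to read V$PARAMETER; it shows the current values only and does not recommend any setting.

```sql
-- Parameters that bound the background work used for population
-- and for trickle repopulation of stale IMCUs.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('inmemory_max_populate_servers',
                'inmemory_trickle_repopulate_servers_percent');
```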

Initialization parameters related to the IM column store

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/inmem/init-parameters-for-im-column-store.html#GUID-A67ABCAC-C6B9-499E-8AE0-BD7922B239BE

Views related to the IM column store

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/inmem/views-related-to-im-column-store.html#GUID-2EBF8D9B-FA9E-4D67-8934-5908E6018D4E

Some notes on In-Memory

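1. Two prerequisites for enabling the IM column store at the database level: INMEMORY_SIZE must be set to a nonzero value of at least 100M, and the COMPATIBLE parameter must be set to 12.1.0 or higher.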

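2. Tablespaces, tables, partitions, and materialized views can all be enabled for the IM column store. Once a tablespace is enabled, every table and materialized view created in it afterwards is enabled by default; objects that already existed in the tablespace are not affected. When enabling a tablespace, the INMEMORY keyword must be preceded by DEFAULT.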

3. A table is enabled for the IM column store by specifying INMEMORY in CREATE TABLE or ALTER TABLE.

4. To check whether a table is enabled for the IM column store, look at USER_TABLES.INMEMORY: the value 'ENABLED' means it is enabled.

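5. Enabling a table does not by itself load its data into the IM column store. Data is populated only at instance startup or when the object is accessed. To populate a table immediately, force a full table scan on it or call DBMS_INMEMORY.POPULATE. If the object's PRIORITY is anything other than NONE, it is populated automatically when the instance starts or when the PDB that owns it is opened. To see whether a table's data has reached the In-Memory Area, look for it in V$IM_SEGMENTS.SEGMENT_NAME. If a table is already listed in V$IM_SEGMENTS, its row disappears after TRUNCATE TABLE but remains after its rows are removed with DELETE.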

6. Starting with 12.2.0, ILM ADO policies can manage IM column store settings. ADO policies take effect at the database level, not the instance level. An Information Lifecycle Management (ILM) Automatic Data Optimization (ADO) policy decides for which table, when, and under what conditions the IM column store setting is applied or removed. The general form is ALTER TABLE TABLE_NAME ILM ADD POLICY SET|MODIFY|NO INMEMORY.

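7. In-Memory can be enabled for only some of a table's columns. Naming columns in an INMEMORY clause does not by itself exclude the others: every column that should stay out of the IM column store has to be listed in a NO INMEMORY clause. A column enabled for In-Memory cannot be a LONG or LONG RAW column, an out-of-line column (LOB, varray, nested table column), or an extended data type column. When only some columns of a table are enabled, the table cannot be found with USER_TABLES.INMEMORY='ENABLED'; use V$IM_COLUMN_LEVEL.INMEMORY_COMPRESSION<>'NO INMEMORY' instead.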

8. Objects that cannot use the IM column store: indexes, index-organized tables, hash clusters, and objects owned by the SYS user and stored in the SYSTEM or SYSAUX tablespace. In addition, if a table enabled for the IM column store contains any of the following column types, those columns are not populated: out-of-line columns (varrays, nested table columns, and out-of-line LOBs), columns that use the LONG or LONG RAW data types, and extended data type columns.

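9. If no PRIORITY is specified for INMEMORY, the default is NONE, and the object is populated only when it is accessed by a full table scan. Access through an index scan or by rowid does not populate it. If the priority is anything other than NONE, the object is populated automatically during database startup, in order of priority level.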

10. If no MEMCOMPRESS level is specified for INMEMORY, the default is MEMCOMPRESS FOR QUERY LOW.

11. If DUPLICATE is not specified, the default is NO DUPLICATE. DUPLICATE and DUPLICATE ALL work only in an Oracle RAC environment on an Oracle Engineered System; anywhere else they are accepted but have no effect and are treated as NO DUPLICATE.

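12. If DISTRIBUTE is not specified, the default is AUTO, and tables in the IM column store are distributed across the nodes. DISTRIBUTE applies only in an Oracle RAC environment.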

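13. On the difference between populate and repopulate: population converts existing on-disk data to columnar format and stores it in the IM column store, while repopulation loads new data into the IM column store. Put simply, population is the initial full load and repopulation is an incremental refresh.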
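
The notes above on enabling, populating, and checking a table can be sketched as one short sequence. This is an illustration only: it assumes a table named TABLE1 in the current schema and a session allowed to execute DBMS_INMEMORY and to query V$IM_SEGMENTS.

```sql
-- Enable the table; PRIORITY is left at its default (NONE), so nothing
-- is populated until the table is scanned or population is requested.
ALTER TABLE table1 INMEMORY;

-- Request population right away instead of waiting for a full table scan.
BEGIN
  DBMS_INMEMORY.POPULATE(schema_name => USER, table_name => 'TABLE1');
END;
/

-- Population runs in the background; BYTES_NOT_POPULATED reaches 0
-- once the whole segment is in the IM column store.
SELECT segment_name, populate_status, inmemory_size, bytes, bytes_not_populated
FROM   v$im_segments
WHERE  segment_name = 'TABLE1';
```

If the last query returns no row immediately after the call, rerunning it a little later is usually enough, since population is asynchronous.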

Some experiment results

1. Enabling INMEMORY on a tablespace

When creating a tablespace or altering one for INMEMORY, the INMEMORY keyword must be preceded by DEFAULT.

SQL> create tablespace tablespace1 datafile '/u02/data/tablespace2.dbf' size 100M inmemory;

ERROR at line 1:

ORA-02180: invalid option for CREATE TABLESPACE

SQL> create tablespace tablespace1 datafile '/u02/data/tablespace2.dbf' size 100M default inmemory;

Tablespace created.

SQL> alter tablespace USERS inmemory;

ERROR at line 1:

ORA-02142: missing or invalid ALTER TABLESPACE option

SQL> alter tablespace USERS default inmemory;

Tablespace altered.
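
To confirm the result of the two successful statements above, the tablespace defaults can be read from DBA_TABLESPACES. A minimal sketch, assuming access to the DBA views and the tablespace names used in this experiment:

```sql
-- DEF_INMEMORY shows whether new objects in the tablespace
-- are enabled for the IM column store by default.
SELECT tablespace_name, def_inmemory, def_inmemory_priority, def_inmemory_compression
FROM   dba_tablespaces
WHERE  tablespace_name IN ('TABLESPACE1', 'USERS');
```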

2. Enabling INMEMORY on a table

With CREATE TABLE ... AS SELECT, the INMEMORY keyword goes before AS.

create table table1 (hid number(10)) inmemory;

alter table table2 inmemory;

create table t4 inmemory as select * from t1; -- t4 is enabled for the IM column store

create table t5 as select * from t1 inmemory; -- t5 is NOT enabled (the trailing inmemory is presumably parsed as an alias for t1)
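
The difference between t4 and t5 can be checked in USER_TABLES, as described in note 4. A minimal sketch, assuming both tables were created in the current schema:

```sql
-- INMEMORY is 'ENABLED' for a table that carries the attribute
-- and 'DISABLED' otherwise.
SELECT table_name, inmemory, inmemory_priority, inmemory_compression
FROM   user_tables
WHERE  table_name IN ('T4', 'T5');
```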

3. Enabling INMEMORY on a materialized view

create materialized view mview1 inmemory as select * from table1;

alter materialized view mview2 inmemory;

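4. Enabling INMEMORY on selected partitions of a partitioned table

When the table is created, the last two partitions, SALES_Q4_2019 and SALES_Q1_2020, are not enabled for the IM column store (see user_tab_partitions.inmemory). The final statement then enables the SALES_Q4_2019 partition.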

CREATE TABLE sales1( prod_id NUMBER(6),time_id DATE,channel_id varchar2(100))

PARTITION BY RANGE (time_id)

(PARTITION SALES_Q1_2019

VALUES LESS THAN (TO_DATE('01-APR-2019','DD-MON-YYYY')) INMEMORY,

PARTITION SALES_Q2_2019

VALUES LESS THAN (TO_DATE('01-JUL-2019','DD-MON-YYYY')) INMEMORY,

PARTITION SALES_Q3_2019

VALUES LESS THAN (TO_DATE('01-OCT-2019','DD-MON-YYYY')) INMEMORY,

PARTITION SALES_Q4_2019

VALUES LESS THAN (TO_DATE('01-JAN-2020','DD-MON-YYYY')) NO INMEMORY,

PARTITION SALES_Q1_2020

VALUES LESS THAN (MAXVALUE));

alter table sales1 modify partition SALES_Q4_2019 inmemory;
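
The per-partition setting mentioned above can be listed before and after the final ALTER TABLE. A minimal sketch, assuming sales1 lives in the current schema:

```sql
-- One row per partition; INMEMORY is 'ENABLED' or 'DISABLED'.
SELECT partition_name, inmemory, inmemory_priority, inmemory_compression
FROM   user_tab_partitions
WHERE  table_name = 'SALES1'
ORDER  BY partition_position;
```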

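5. Enabling INMEMORY on selected columns

In table1 as created below, only the CREATED_APPID column is not enabled for the IM column store; all other columns are enabled.

So when only some columns of a table should be INMEMORY, the remaining columns must be listed explicitly with NO INMEMORY.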

create table table1 as select * from dba_objects;

alter table table1 inmemory (OWNER) no inmemory (CREATED_APPID);
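
The column-level outcome of this experiment can be inspected through V$IM_COLUMN_LEVEL, the view mentioned in note 7. A minimal sketch, assuming the session can query the view and that no other schema has a table with the same name:

```sql
-- Columns excluded from the IM column store show 'NO INMEMORY'.
SELECT column_name, inmemory_compression
FROM   v$im_column_level
WHERE  table_name = 'TABLE1'
ORDER  BY segment_column_id;
```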

When a database is restarted, all of the data for database objects with a priority level other than NONE are populated in the IM column store during startup.

Population

The operation of reading existing data blocks from data files, transforming the rows into columnar format, and then writing the columnar data to the IM column store. In contrast, loading refers to bringing new data into the database using DML or DDL.

Population, which transforms existing data on disk into columnar format, is different from repopulation, which loads new data into the IM column store. Because IMCUs are read-only structures, Oracle Database does not populate them when rows change. Rather, the database records the row changes in a transaction journal, and then creates new IMCUs as part of repopulation

IMCU

An In-Memory Compression Unit (IMCU) is a compressed, read-only storage unit that contains data for one or more columns.

Transaction journal

Metadata in a Snapshot Metadata Unit (SMU) that keeps the IM column store transactionally consistent.

Every SMU contains a transaction journal. The database uses the transaction journal to keep the IMCU transactionally consistent.

The database uses the buffer cache to process DML, just as when the IM column store is not enabled. For example, an UPDATE statement might modify a row in an IMCU. In this case, the database adds the rowid for the modified row to the transaction journal and marks it stale as of the SCN of the DML statement. If a query needs to access the new version of the row, then the database obtains the row from the database buffer cache.

The database achieves read consistency by merging the contents of the column, transaction journal, and buffer cache. When the IMCU is refreshed during repopulation, queries can access the up-to-date row directly from the IMCU.

Repopulation

The automatic refresh of a currently populated In-Memory Compression Unit (IMCU) after its data has been significantly modified. In contrast, population is the initial creation of IMCUs in the IM column store.
