您好,登录后才能下订单哦!
原文:
从Oracle到PostgreSQL:最全控制文件(上) https://www.enmotech.com/web/detail/1/770/1.html
从Oracle到PostgreSQL:最全控制文件(下) https://www.enmotech.com/web/detail/1/771/1.html
导读:本文介绍了Oracle和PostgreSQL控制文件基本内容,对如何重建PostgreSQL控制文件进行了详细描述并进行了恢复测试。
控制文件内容
Oracle控制文件内容
从官方文档上可以知道控制文件保存着下列信息:
由于控制文件是个二进制文件,无法直接打开查阅,可以将控制文件内容转储出来便于查看,可以使用以下命令来做转存。
SQL> alter session set events 'immediate trace name controlf level 8';Session altered.SQL> select value from v$diag_info where name='Default Trace File';VALUE--------------------------------------------------------------------------------/u01/app/oracle/diag/rdbms/rac12201/RAC122011/trace/RAC122011_ora_24813.trc
注意,从11g开始可以通过v$diag_info获得当前会话转储文件的名称。
打开跟踪文件可以清晰的看到控制文件的内容,最开始的一段是关于数据库ID、名称等的概要信息:
Trace file /u01/app/oracle/diag/rdbms/rac12201/RAC122011/trace/RAC122011_ora_24813.trcOracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit ProductionBuild label: RDBMS_12.2.0.1.0_LINUX.X64_170125ORACLE_HOME: /u01/app/oracle/product/12.2.0.1/dbhome_1System name: LinuxNode name: ractest1Release: 2.6.32-431.el6.x86_64Version: #1 SMP Sun Nov 10 22:19:54 EST 2013Machine: x86_64Instance name: RAC122011Redo thread mounted by this instance: 1Oracle process number: 96Unix process pid: 24813, image: oracle@ractest1 (TNS V1-V3)*** 2019-05-30T09:15:38.980823+08:00 (CDB$ROOT(1))*** SESSION ID:(59.49876) 2019-05-30T09:15:38.980878+08:00*** CLIENT ID:() 2019-05-30T09:15:38.980885+08:00*** SERVICE NAME:(SYS$USERS) 2019-05-30T09:15:38.980891+08:00*** MODULE NAME:(sqlplus@ractest1 (TNS V1-V3)) 2019-05-30T09:15:38.980897+08:00*** ACTION NAME:() 2019-05-30T09:15:38.980903+08:00*** CLIENT DRIVER:(SQL*PLUS) 2019-05-30T09:15:38.980908+08:00*** CONTAINER ID:(1) 2019-05-30T09:15:38.980914+08:00DUMP OF CONTROL FILES, Seq # 233771 = 0x3912bV10 STYLE FILE HEADER: Compatibility Vsn = 203424000=0xc200100 Db ID=1217928546=0x48981d62, Db Name='RAC12201' Activation ID=0=0x0 Control Seq=233771=0x3912b, File size=1216=0x4c0 File Number=0, Blksiz=16384, File Type=1 CONTROL
接下来是数据条目的详细信息,包括了数据的名称、数据文件及日志文件的数量、数据库的检查点及SCN信息等:
***************************************************************************DATABASE ENTRY***************************************************************************(size = 316, compat size = 316, section max = 1, section in-use = 1, last-recid= 0, old-recno = 0, last-recno = 0)(extent = 1, blkno = 1, numrecs = 1)03/31/2019 23:47:46DB Name "RAC12201"Database flags = 0x10406001 0x00001200 0x00000082Controlfile Creation Timestamp 03/31/2019 23:47:47Incmplt recovery scn: 0x0000000000000000Resetlogs scn: 0x0000000000157e2e Resetlogs Timestamp 03/31/2019 23:47:49Prior resetlogs scn: 0x0000000000000001 Prior resetlogs Timestamp 01/26/2017 13:52:29Redo Version: compatible=0xc200100#Data files = 28, #Online files = 25Database checkpoint: Thread=1 scn: 0x0000000002a1699eThreads: #Enabled=2, #Open=2, Head=1, Tail=2enabled threads: 01100000 00000000 00000000 00000000 00000000 00000000.......Max log members = 3, Max data members = 1Arch list: Head=1, Tail=9, Force scn: 0x00000000029c57a6scn: 0x0000000000000000Activation ID: 1217928802Snapshot Controlfile filename name #31: +DATA/snapcf_rac12201.fSnapshot Controlfile checkpoint scn: 0x00000000026d24dd 05/25/2019 22:40:30SCN compatibility 1Auto-rollover enabledControlfile Checkpointed at scn: 0x0000000002a231ff 05/30/2019 09:15:32thread:0 rba:(0x0.0.0)enabled threads: 00000000 00000000 00000000 00000000 00000000 00000000.......
再接下来是检查点记录信息,这部分内容包含了Low Cache RBA 和 On Disk RBA信息,在执行数据库实例恢复时,前者是恢复的起点,后者是恢复的终点,其分别指向了日志文件中的确定地址:
***************************************************************************CHECKPOINT PROGRESS RECORDS***************************************************************************(size = 8180, compat size = 8180, section max = 35, section in-use = 0, last-recid= 0, old-recno = 0, last-recno = 0)(extent = 1, blkno = 2, numrecs = 35)THREAD #1 - status:0x2 flags:0x0 dirty:54low cache rba:(0x13c.ec78.0) on disk rba:(0x13c.edda.0)on disk scn: 0x0000000002a232bc 05/30/2019 09:15:37resetlogs scn: 0x0000000000157e2e 03/31/2019 23:47:49heartbeat: 1009031373 mount id: 1222276307
控制文件还有跟多其它记录,大家可以转储出来仔细阅读接下来的每个条目。
接下来我们看看PostgreSQL控制文件都记录了什么。
PostgreSQL控制文件内容
相比Oracle的控制文件,PostgreSQL控制文件内容就少了很多,主要分为是三部分,初始化静态信息、WAL及检查点的动态信息、一些配置信息。
我们可以用过pg_controldata命令直接读取PostgreSQL控制文件内容:
[postgres@lsl-test1 ~]$ /usr/pgsql-11/bin/pg_controldata -D /pg/pg11/datapg_control version number: 1100Catalog version number: 201809051Database system identifier: 6691945724594983959Database cluster state: in productionpg_control last modified: Thu 30 May 2019 03:20:03 PM CSTLatest checkpoint location: 0/60001E8Latest checkpoint's REDO location: 0/60001E8Latest checkpoint's REDO WAL file: 000000010000000000000006Latest checkpoint's TimeLineID: 1Latest checkpoint's PrevTimeLineID: 1Latest checkpoint's full_page_writes: onLatest checkpoint's NextXID: 0:1048576Latest checkpoint's NextOID: 10000Latest checkpoint's NextMultiXactId: 65536Latest checkpoint's NextMultiOffset: 52352Latest checkpoint's oldestXID: 2296015872Latest checkpoint's oldestXID's DB: 0Latest checkpoint's oldestActiveXID: 0Latest checkpoint's oldestMultiXid: 65536Latest checkpoint's oldestMulti's DB: 0Latest checkpoint's oldestCommitTsXid:0Latest checkpoint's newestCommitTsXid:0Time of latest checkpoint: Thu 30 May 2019 03:20:03 PM CSTFake LSN counter for unlogged rels: 0/1Minimum recovery ending location: 0/0Min recovery ending loc's timeline: 0Backup start location: 0/0Backup end location: 0/0End-of-backup record required: nowal_level setting: replicawal_log_hints setting: offmax_connections setting: 100max_worker_processes setting: 8max_prepared_xacts setting: 0max_locks_per_xact setting: 64track_commit_timestamp setting: offMaximum data alignment: 8Database block size: 8192Blocks per segment of large relation: 131072WAL block size: 8192Bytes per WAL segment: 16777216Maximum length of identifiers: 64Maximum columns in an index: 32Maximum size of a TOAST chunk: 1996Size of a large-object chunk: 2048Date/time type storage: 64-bit integersFloat4 argument passing: by valueFloat8 argument passing: by valueData page checksum version: 0Mock authentication nonce: 0000000000000000000000000000000000000000000000000000000000000000
下面详细介绍下各参数含义。
创建时间可以通过to_timestamp转换查看到。
创建时间可以通过to_timestamp转换查看到。postgres=# SELECT to_timestamp(((6691945724594983959>>32) & (2^32 -1)::bigint)); to_timestamp ------------------------2019-05-17 18:47:10+08(1 row)
Database cluster state 记录实例的状态。源码文件中看到数据库的几种状态,源码pg_control.h中可以看到:
starting up:表示数据库正在启动状态。shut down: 数据库实例(非Standby)正常关闭后控制文件中就是此状态。shut down in recovery:Standby实例正常关闭后控制文件中就是此状态。shutting down:正常停库时,先做checkpoint,开始做checkpoint时,会把状态设置为此状态,做完后把状态设置为shut down。in crash recovery:数据库实例非异常停止后,重新启动后,会先进行实例的恢复,在实例恢复时的状态就是此状态。in archive recovery:Standby实例正常启动后,就是此状态。in production:数据库实例正常启动后就是此状态。Standby数据库正常启动后不是此状态
重建控制文件
如果控制文件损坏或丢失,数据库将运行异常,也无法启动。对于Oracle和PostgreSQL 控制文件同样重要。
Oracle控制文件重建
对于Oracle来说,当控制文件损坏无备份的情况下,可以通过手工重建控制文件的方法来恢复控制文件。
具体命令如下图:
详细信息可以查看官方文档:
https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/CREATE-CONTROLFILE.html#GUID-9B389F28-C4D0-405D-BFE6-48237E8BD791https://docs.oracle.com/en/database/oracle/oracle-database/12.2/admin/managing-control-files.html#GUID-98A05D29-DD80-4D87-9615-76CBCF8FE694
PostgreSQL控制文件重建
pg9.6前使用 pg_resetxlog,pg10之后使用pg_resetwal清理wal日志或重置控制文件中一些控制信息。
命令详细介绍可以查看官方文档:
https://www.postgresql.org/docs/11/app-pgresetwal.html
[postgres@lsl-test1 ~]$ /usr/pgsql-11/bin/pg_resetwal -n -D /pg/pg11/datapg_resetwal: lock file "postmaster.pid" existsIs a server running? If not, delete the lock file and try again.[postgres@lsl-test1 ~]$ ./pg_resetwal --helppg_resetwal resets the PostgreSQL write-ahead log.Usage: pg_resetwal [OPTION]... DATADIROptions: -c, --commit-timestamp-ids=XID,XID set oldest and newest transactions bearing commit timestamp (zero means no change)[-D, --pgdata=]DATADIR data directory -e, --epoch=XIDEPOCH set next transaction ID epoch -f, --force force update to be done -l, --next-wal-file=WALFILE set minimum starting location for new WAL -m, --multixact-ids=MXID,MXID set next and oldest multitransaction ID -n, --dry-run no update, just show what would be done -o, --next-oid=OID set next OID -O, --multixact-offset=OFFSET set next multitransaction offset -V, --version output version information, then exit -x, --next-transaction-id=XID set next transaction ID --wal-segsize=SIZE size of WAL segments, in megabytes -?, --help show this help, then exit
下面看下命令各个参数具体含义:
-c参数有两个参数值,一个最旧的事务号,一个最新的事务号。最旧的事务号的安全值,可以在pg_commit_ts目录查询最小的文件名;最新事务ID的安全值,可以在pg_commit_ts目录查询最大的文件名。文件名都是16进制。实际测试在11的版本pg_commit_ts目录下未发现文件。
[postgres@lsl-test1 data]$ cd pg_commit_ts/[postgres@lsl-test1 pg_commit_ts]$ ls -ltotal 0
-e参数是设置事务号的新纪元(epoch),除了pg_resetwal设置的字段之外,事务ID新纪元实际上并不存储在数据库的任何位置。您可能需要调整此值,以确保Slony或者Skytools等复制系统能够正确工作。如果是这样的话,应该可以从下游复制数据库的状态获得适当的值。
-l 参数通过指定下一个WAL段文件的名称,手动设置WAL启动位置。该选项使用WAL文件名,而不是LSN。下一个段的名字应该大于当前存在pg_wal目录下的任何WAL段文件名。
[postgres@lsl-test1 pg_commit_ts]$ cd ../pg_wal/[postgres@lsl-test1 pg_wal]$ ls -ltotal 32768-rw-------. 1 postgres postgres 16777216 May 30 17:26 000000010000000000000007-rw-------. 1 postgres postgres 16777216 May 30 17:26 000000010000000000000008drwx------. 2 postgres postgres 6 May 17 17:22 archive_status
这些名称也是十六进制的,文件名包含三部分 ,第一部分时间线号(timeline ID) ,第二部分逻辑日志号 ,第三部分日志段号。
-m参数也是两个值,一个是下一个多事务号,一个是最旧的多事务号。
下一个多事务号的安全值,可以在目录pg_multixact/offsets查找数值最大的文件名,加1然后乘以65536(0x10000)。
最旧的多事务号的安全值可以通过查询目录下数值最小的文件名乘以65536。文件名都是十六进制。
[postgres@lsl-test1 pg_wal]$ cd ../pg_multixact/offsets/[postgres@lsl-test1 offsets]$ ls -ltotal 8-rwx------. 1 postgres postgres 8192 May 17 18:04 0000
-o参数是设置下一个OID(OID,object 是pg内部使用,作为系统表的主键),我们恢复时可以不设置这个参数,因为设置一个超过数据库中最大值OID没有好的办法。
-O参数是设置下一个多事务偏移量。查找pg_multixact/members目录下数值最大的文件名,+1乘以52352 (0xCC80),可以计算出偏移量的安全值。目录下文件的文件名也是十六进制的。
[postgres@lsl-test1 offsets]$ ls -l ../members/total 8-rwx------. 1 postgres postgres 8192 May 17 17:22 0000
--wal-segsize参数设置新的WAL段大小 。
-x参数是手工设置下一个事务ID,pg_xact目录下可以查看数值最大的文件名,+1乘以 1048576 (0x100000),获取安全值。文件名也是十六进制。
[postgres@lsl-test1 offsets]$ ls -l ../../pg_xact/total 16-rwx------. 1 postgres postgres 8192 May 17 18:04 0000-rw-------. 1 postgres postgres 8192 May 30 17:26 0001
PostgreSQL控制文件恢复测试
测试过程如下(基于PostgreSQL 11.2) :
1. 新建测试数据, 用到with oids的表, 因为OID无法确定, 看看是否会有异常
postgres=# create table lsl_oid1(id int primary key) with oids; CREATE TABLEpostgres=# insert into lsl_oid1 select generate_series(1,100000);INSERT 0 100000postgres=# select min(oid),max(oid) from lsl_oid1 ; min | max -------+--------16400 | 116399(1 row)
2. 关闭数据,记录下控制文件信息
[postgres@lsl-test1 bin]$ /usr/pgsql-11/bin/pg_ctl stop -D /pg/pg11/datawaiting for server to shut down.... doneserver stopped
## 记下pg_controldata信息, 方便修复后进行比对
[postgres@lsl-test1 bin]$ /usr/pgsql-11/bin/pg_controldata -D /pg/pg11/datapg_control version number: 1100Catalog version number: 201809051Database system identifier: 6691945724594983959Database cluster state: shut downpg_control last modified: Thu 30 May 2019 05:26:41 PM CSTLatest checkpoint location: 0/79E9888Latest checkpoint's REDO location: 0/79E9888Latest checkpoint's REDO WAL file: 000000010000000000000007Latest checkpoint's TimeLineID: 1Latest checkpoint's PrevTimeLineID: 1Latest checkpoint's full_page_writes: onLatest checkpoint's NextXID: 0:1048585Latest checkpoint's NextOID: 116400Latest checkpoint's NextMultiXactId: 65536Latest checkpoint's NextMultiOffset: 52352Latest checkpoint's oldestXID: 561Latest checkpoint's oldestXID's DB: 13878Latest checkpoint's oldestActiveXID: 0Latest checkpoint's oldestMultiXid: 1Latest checkpoint's oldestMulti's DB: 13878Latest checkpoint's oldestCommitTsXid:0Latest checkpoint's newestCommitTsXid:0Time of latest checkpoint: Thu 30 May 2019 05:26:40 PM CSTFake LSN counter for unlogged rels: 0/1Minimum recovery ending location: 0/0Min recovery ending loc's timeline: 0Backup start location: 0/0Backup end location: 0/0End-of-backup record required: nowal_level setting: replicawal_log_hints setting: offmax_connections setting: 100max_worker_processes setting: 8max_prepared_xacts setting: 0max_locks_per_xact setting: 64track_commit_timestamp setting: offMaximum data alignment: 8Database block size: 8192Blocks per segment of large relation: 131072WAL block size: 8192Bytes per WAL segment: 16777216Maximum length of identifiers: 64Maximum columns in an index: 32Maximum size of a TOAST chunk: 1996Size of a large-object chunk: 2048Date/time type storage: 64-bit integersFloat4 argument passing: by valueFloat8 argument passing: by valueData page checksum version: 0Mock authentication nonce: 0000000000000000000000000000000000000000000000000000000000000000
3. 模拟控制文件故障,直接删除控制文件
[postgres@lsl-test1 global]$ pwd/pg/pg11/data/global[postgres@lsl-test1 global]$ rm -rf pg_control
4. 启动数据库,模拟控制文件丢失场景
[postgres@lsl-test1 global]$ /usr/pgsql-11/bin/pg_ctl startwaiting for server to start....postgres: could not find the database systemExpected to find it in the directory "/pg/pg11/data",but could not open file "/pg/pg11/data/global/pg_control": No such file or directorystopped waitingpg_ctl: could not start serverExamine the log output.
下面开始正式重建控制文件,使得数据库可以正常启动。
5. 首先创建一个名为pg_control的空文件
[postgres@lsl-test1 global]$ touch $PGDATA/global/pg_control
6. 使用pg_resetwal修复pg_control
确认pg_resetwal参数值。
首先确认-c参数,上面参数详细分析发现目录下为空,因此暂时忽略此参数。
-e参数是设置下一个事务号的新纪元,我们测试环境没有其它复制系统因此也可以忽略。
-l参数,查看pg_wal下文件文件名,大于文件名最大值即可。
[postgres@lsl-test1 ~]$ cd /pg/pg11/data/pg_wal[postgres@lsl-test1 pg_wal]$ ls -ltotal 32768-rw-------. 1 postgres postgres 16777216 May 30 17:26 000000010000000000000007-rw-------. 1 postgres postgres 16777216 May 30 17:26 000000010000000000000008drwx------. 2 postgres postgres 6 May 17 17:22 archive_status之类要大于最大值,因此我们可以去-l=000000010000000000000009-m参数取pg_multixact/offsets目录下最大值加1然后乘以65536(0x10000)和最小值 乘以65536(0x10000)[postgres@lsl-test1 pg_commit_ts]$ cd ../pg_multixact/offsets/[postgres@lsl-test1 offsets]$ ls -ltotal 8-rwx------. 1 postgres postgres 8192 May 17 18:04 0000
因此-m可以取0x10000,0x00000。
-o参数不确定时,由于测试没有复制软件因此可以暂时忽略。
-O 查找pg_multixact/members目录下数值最大的文件名,+1乘以52352 (0xCC80)。
[postgres@lsl-test1 offsets]$ cd ../members/[postgres@lsl-test1 members]$ ls -ltotal 8-rwx------. 1 postgres postgres 8192 May 17 17:22 0000
因此-O=0xCC80。
-x参数查找pg_xact目录下可以查看数值最大的文件名,+1乘以 1048576 (0x100000)。
[postgres@lsl-test1 members]$ cd ../../pg_xact/[postgres@lsl-test1 pg_xact]$ ls -ltotal 16-rwx------. 1 postgres postgres 8192 May 17 18:04 0000-rw-------. 1 postgres postgres 8192 May 30 17:26 0001
因此-x=0x200000。
不加-f参数时可以查看要写入控制文件中的参数内容。
[postgres@lsl-test1 pg_xact]$ /usr/pgsql-11/bin/pg_resetwal -l 000000010000000000000009 -m 0x10000,0x00000 -O 0xCC80 -x 0x200000 -D /pg/pg11/datapg_resetwal: oldest multitransaction ID (-m) must not be 0[postgres@lsl-test1 pg_xact]$ /usr/pgsql-11/bin/pg_resetwal -l 000000010000000000000009 -m 0x10000,0x00001 -O 0xCC80 -x 0x200000 -D /pg/pg11/datapg_resetwal: pg_control exists but is broken or wrong version; ignoring itGuessed pg_control values:pg_control version number: 1100Catalog version number: 201809051Database system identifier: 6696828635748080009Latest checkpoint's TimeLineID: 1Latest checkpoint's full_page_writes: offLatest checkpoint's NextXID: 0:3Latest checkpoint's NextOID: 10000Latest checkpoint's NextMultiXactId: 1Latest checkpoint's NextMultiOffset: 0Latest checkpoint's oldestXID: 3Latest checkpoint's oldestXID's DB: 0Latest checkpoint's oldestActiveXID: 0Latest checkpoint's oldestMultiXid: 1Latest checkpoint's oldestMulti's DB: 0Latest checkpoint's oldestCommitTsXid:0Latest checkpoint's newestCommitTsXid:0Maximum data alignment: 8Database block size: 8192Blocks per segment of large relation: 131072WAL block size: 8192Bytes per WAL segment: 16777216Maximum length of identifiers: 64Maximum columns in an index: 32Maximum size of a TOAST chunk: 1996Size of a large-object chunk: 2048Date/time type storage: 64-bit integersFloat4 argument passing: by valueFloat8 argument passing: by valueData page checksum version: 0Values to be changed:First log segment after reset: 000000010000000000000009NextMultiXactId: 65536OldestMultiXid: 1OldestMulti's DB: 0NextMultiOffset: 52352NextXID: 2097152OldestXID: 2297064448OldestXID's DB: 0If these values seem acceptable, use -f to force reset.[postgres@lsl-test1 pg_xact]$ cd ../global/[postgres@lsl-test1 global]$ ls -l pg_control-rw-r--r--. 1 postgres postgres 0 May 30 17:36 pg_control
7. 启动数据库
确认控制文件参数无误后加上-f会写入到控制文件里。
[postgres@lsl-test1 global]$ /usr/pgsql-11/bin/pg_resetwal -l 000000010000000000000009 -m 0x10000,0x00001 -O 0xCC80 -x 0x200000 -D /pg/pg11/data -fpg_resetwal: pg_control exists but is broken or wrong version; ignoring itWrite-ahead log reset
启动数据库。
[postgres@lsl-test1 global]$ /usr/pgsql-11/bin/pg_ctl start -D /pg/pg11/data/waiting for server to start....2019-05-30 22:42:50.946 CST [2471] LOG: listening on IPv6 address "::1", port 54322019-05-30 22:42:50.946 CST [2471] LOG: listening on IPv4 address "127.0.0.1", port 54322019-05-30 22:42:50.949 CST [2471] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"2019-05-30 22:42:50.961 CST [2471] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"2019-05-30 22:42:50.987 CST [2471] LOG: redirecting log output to logging collector process2019-05-30 22:42:50.987 CST [2471] HINT: Future log output will appear in directory "log".doneserver started
8. 查看测试数据是否正常,然后插入新数据看数据库是否可用。
[postgres@lsl-test1 global]$ psqlpsql (11.2)Type "help" for help.postgres=# select min(oid),max(oid),count(*) from lsl_oid1 ; min | max | count -------+--------+--------16400 | 116399 | 100000(1 row)postgres=# insert into lsl_oid1 select generate_series(100001,200000); INSERT 0 100000postgres=# select min(oid),max(oid),count(*) from lsl_oid1 ; min | max | count -------+--------+--------16384 | 116399 | 200000(1 row)
数据库可以正常访问。
至此,本文关于Oracle和PostgreSQL的控制文件内容介绍到这里。
想了解更多关于数据库、云技术的内容吗?
快来关注“数据和云”公众号、“云和恩墨”官方网站,我们期待与大家一同学习和进步!
(扫描上方二维码,关注“数据和云”公众号,即可查看更多科技文章)
免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。