CRS-0184 Cannot communicate with the CRS daemon

Published: 2020-08-11 04:39:46 Author: 贺子_DBA时代
Source: ITPUB blog

An Oracle RAC cluster ran into a problem and reported these errors:

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4534: Cannot communicate with Event Manager

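Problem analysis: because the website was moved to the cloud, one of our Oracle RAC clusters was moved from the IDC data center back to the company's own premises. The database was shut down by my manager, who only ran su - oracle followed by shu immediate. That stopped the Oracle instances, but the ASM instances were never shut down. The servers were then moved to the office, the network cables were plugged back into their original positions, and the machines were powered on. I only brought up the instance on ora101 and then left it alone. Later, when this database was needed, I finally checked the state of ora102 and realized that neither the database instance nor the ASM instance had started there. I tried to start them, but got the errors shown further below.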

First, a note on the procedure for shutting down the Oracle resources when an Oracle RAC server needs to be rebooted:

Method 1:

1) Stop the Oracle instances

[grid@ora102 ~]$ srvctl stop database -d ORCL

2) Stop the ASM instances

[grid@ora102 ~]$ srvctl stop asm -n ora102

[grid@ora102 ~]$ srvctl stop asm -n ora101

If the stop fails with an error like the following, force it:

[root@ora101 bin]# ./srvctl stop asm

PRCR-1065 : Failed to stop resource ora.asm

CRS-2529: Unable to act on 'ora.asm' because that would require stopping or relocating 'ora.DATA.dg', but the force option was not specified

Adding the force option is enough:

[grid@ora101 ~]$ srvctl stop asm -f

[grid@ora101 ~]$ srvctl status asm

ASM is not running.

3) Finally, CRS also has to be stopped

[root@ora101 bin]# ./crsctl stop cluster -all
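
The Method 1 sequence above can be written down as a small dry-run helper, which is handy as a checklist before a planned reboot. This is only a sketch: rac_shutdown_plan is a hypothetical function name, it prints the commands instead of running them, and the database name and node names are whatever you pass in.

```shell
#!/bin/sh
# Hypothetical helper: prints (does NOT execute) the Method 1 shutdown order.
# Usage: rac_shutdown_plan DB_NAME NODE [NODE...]
# The "grid:" / "root:" prefix shows which OS user the article runs each step as.
rac_shutdown_plan() {
    db="$1"
    shift
    # Step 1: stop the database instances on all nodes.
    echo "grid: srvctl stop database -d $db"
    # Step 2: stop the ASM instance on each node.
    for node in "$@"; do
        echo "grid: srvctl stop asm -n $node"
    done
    # Step 3: stop the clusterware stack on every node.
    echo "root: crsctl stop cluster -all"
}
```

Calling rac_shutdown_plan ORCL ora102 ora101 prints the four commands in the order used above. If stopping ASM fails with CRS-2529, the -f option shown earlier still has to be added by hand.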

Method 2:

1) Stop the Oracle instance; run this on both nodes

su - oracle

sqlplus / as sysdba

shu immediate

2) Stop the ASM instance; run this on both nodes

su - grid

sqlplus / as sysasm

shu immediate

If that does not work, force the shutdown from sqlplus with shu abort:

[grid@ora101 ~]$ sqlplus / as sysasm

SQL> shu abort

ASM instance shutdown

3) Finally, CRS also has to be stopped

[root@ora101 bin]# ./crsctl stop cluster -all

Check the status of the database and ASM instances, and the status of CRS:

[grid@ora101 ~]$ srvctl status asm

ASM is running on ora101,ora102

[grid@ora101 ~]$ srvctl status database -d ORCL

Instance orcl1 is not running on node ora101

Instance orcl2 is not running on node ora102
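
When there are more than a couple of instances, the status output above is easier to act on if it is filtered. A minimal sketch, assuming the output keeps the "Instance NAME is not running on node NODE" wording shown here; instances_not_running is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `srvctl status database -d <db>` output on stdin
# and prints the name of every instance reported as "not running".
instances_not_running() {
    # Field 1 is the literal word "Instance", field 2 is the instance name.
    awk '$1 == "Instance" && /is not running/ { print $2 }'
}
```

It would be used as srvctl status database -d ORCL | instances_not_running, which for the output above lists orcl1 and orcl2.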

Now, back to the actual problem.

[root@ora102 ~]# su - grid

[grid@ora102 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Wed Nov 29 22:28:20 2017

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Real Application Clusters and Automatic Storage Management options

SQL> startup

This fails with an error...

Checking the cluster service status on node ora102 also returns an error:

[root@ora102 ~]# /u01/app/11.2.0/grid/bin/crs_stat -t

CRS-0184: Cannot communicate with the CRS daemon.

From the error above, it is clear that CRS itself has a problem.

Trying to start it also fails (note that this must be run as root):

[root@ora102 ~]# /u01/app/11.2.0/grid/bin/crsctl start crs

CRS-4640: Oracle High Availability Services is already active

CRS-4000: Command Start failed, or completed with errors.

On a healthy node the output would be:

[root@ora102 bin]# /u01/app/11.2.0/grid/bin/crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

Checking the CRS services confirms that something is wrong:

[grid@ora102 ~]$ crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4530: Communications failure contacting Cluster Synchronization Services daemon

CRS-4534: Cannot communicate with Event Manager
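
The check output above says that the High Availability Services layer is up while three daemons behind it are unreachable. A small filter makes that explicit. This is a sketch only: crs_unreachable is a hypothetical name, and the mapping covers just the three message codes that appear in this article.

```shell
#!/bin/sh
# Hypothetical helper: reads `crsctl check crs` output on stdin and prints a
# short label for each component reported as unreachable.
# Only the codes seen in this article are mapped; other lines are ignored.
crs_unreachable() {
    while IFS= read -r line; do
        case "$line" in
            CRS-4535:*) echo "CRS" ;;  # Cluster Ready Services
            CRS-4530:*) echo "CSS" ;;  # Cluster Synchronization Services
            CRS-4534:*) echo "EVM" ;;  # Event Manager
        esac
    done
}
```

Run as crsctl check crs | crs_unreachable. For the output above it prints CRS, CSS and EVM, one per line; on a healthy node it should print nothing.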

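Next I looked at the IP addresses on node ora102 and found that both its VIP and the SCAN IP were gone. The VIP had moved to node ora101, which shows that ora102 had dropped out of the cluster.

Check the IP configuration...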

[root@ora102 ~]# cat /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.0.44 ora101

192.168.0.45 ora102

192.168.0.46 ora101-vip

192.168.0.47 ora102-vip

192.168.0.48 ora-cluster-scan

172.168.56.101 ora101-priv

172.168.56.102 ora102-priv
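
To see which of a node's own entries in this file are actually configured on the machine, the file can be compared against the list of addresses currently assigned. The following is a rough sketch under some assumptions: node_missing_addrs is a hypothetical helper, it assumes the NODE, NODE-vip, NODE-priv naming used in the hosts file above, and it expects the assigned addresses in a plain file with one IPv4 address per line.

```shell
#!/bin/sh
# Hypothetical helper.
# Usage: node_missing_addrs NODE HOSTS_FILE ADDR_FILE
#   HOSTS_FILE: /etc/hosts-style file ("IP NAME ..." per line)
#   ADDR_FILE:  one IPv4 address per line, e.g. produced with
#               ip -o -4 addr show | awk '{print $4}' | cut -d/ -f1
# Prints "IP NAME" for every entry named NODE or NODE-* whose IP
# does not appear in ADDR_FILE.
node_missing_addrs() {
    node="$1"
    hosts_file="$2"
    addr_file="$3"
    while read -r ip name rest; do
        case "$name" in
            "$node"|"$node"-*)
                # -x: whole-line match, -F: fixed string (dots are literal)
                grep -qxF "$ip" "$addr_file" || echo "$ip $name"
                ;;
        esac
    done < "$hosts_file"
}
```

Note that a VIP may legitimately be running on the other node after a failover, so a missing VIP in this list is a hint to check the cluster state, not a fault by itself.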

Looking at the addresses on the node shows that only the physical IP (192.168.0.45) is left.

[root@ora102 ~]# ip a

1: lo: mtu 65536 qdisc noqueue state UNKNOWN qlen 1

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

inet 127.0.0.1/8 scope host lo