The following message was being repeatedly reported in our alert log:
2012-12-05 04:44:36.067 [client(10553)]CRS-10051:CVU found following errors with Clusterware setup :
PRVF-5408 : NTP Time Server "22.214.171.124" is common only to the following nodes "ora005"
PRVF-5408 : NTP Time Server "126.96.36.199" is common only to the following nodes "ora005"
This is certainly a worrying message: imagine the mess the RAC database and cluster could get into if the system time were not synchronised!
The first step was to confirm that the nodes in the cluster were indeed using different NTP time servers. As the root user, the /usr/sbin/ntpq -pn command can be used to list all of the known time servers and indicate which one is in use:
[root@ora005 ~]# /usr/sbin/ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l   17   64  377    0.000    0.000   0.001
+188.8.131.52    184.108.40.206   2 u  977 1024  377   38.428    6.279   5.851
*220.127.116.11  18.104.22.168    2 u    3 1024  377   35.373   -1.448   0.805
And on the other node in the cluster:
[root@ora006 ~]# /usr/sbin/ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l   49   64  377    0.000    0.000   0.001
*22.214.171.124  126.96.36.199    2 u  415 1024  377   34.758   -0.536   0.156
+188.8.131.52    184.108.40.206   3 u  431 1024  377    4.487    2.902   0.271
The "*" in the first column marks the time server currently selected for synchronisation (a "+" marks a candidate that is not currently selected). In the output above, we can see that one node is using 220.127.116.11 and the other node is using 18.104.22.168, which explains the error message. The most obvious solution at this point is to ask your system administrator to configure both nodes in the cluster to use the same time server.
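Comparing the "*" rows by eye is easy with two nodes but gets tedious with more. The sketch below is one way to automate it, assuming you have saved the `ntpq -pn` output from each node into a file; the function names `selected_peer` and `compare_peers` are hypothetical helpers, not part of any Oracle or NTP tooling.

```shell
# selected_peer: print the address of the peer ntpd has selected for
# synchronisation, i.e. the `ntpq -pn` row whose first character is '*'.
# Reads ntpq output on standard input; prints nothing if no peer is selected.
selected_peer() {
    awk 'substr($0, 1, 1) == "*" { print substr($1, 2); exit }'
}

# compare_peers FILE_A FILE_B: compare the selected peers recorded in two
# files of saved `ntpq -pn` output (one file per cluster node).
compare_peers() {
    peer_a=$(selected_peer < "$1")
    peer_b=$(selected_peer < "$2")
    if [ -n "$peer_a" ] && [ "$peer_a" = "$peer_b" ]; then
        echo "same time server: $peer_a"
    else
        echo "different time servers: ${peer_a:-none} vs ${peer_b:-none}"
    fi
}
```

For example, run `/usr/sbin/ntpq -pn > ora005.peers` on ora005 and the equivalent on ora006, copy both files to one node, then run `compare_peers ora005.peers ora006.peers` (the file names are only illustrative). With the output shown above this would report different time servers.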
It is worth noting that an organisation can have any number of time servers, each obtaining its own time from another server, so that together they form a hierarchy. Provided both nodes in the cluster ultimately derive their time from a common server within that hierarchy, the error can safely be ignored.
The /usr/sbin/ntptrace -n command can be used to display the hierarchy of time servers, as shown below. Each server reports a stratum number, which indicates its position in the hierarchy. Here the local host (stratum 3) obtains its time from the server with a stratum of 2, which in turn obtains its time from the server with a stratum of 1 (the root server).
[root@ora005 ~]# /usr/sbin/ntptrace -n
127.0.0.1: stratum 3, offset 0.002072, synch distance 0.157163
22.214.171.124: stratum 2, offset -0.000556, synch distance 0.102195
126.96.36.199: stratum 1, offset -0.000587, synch distance 0.00607, refid 'GPS'
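To check whether two nodes really share an upstream server, you could save the `ntptrace -n` output from each node and look for addresses that appear in both chains. The sketch below does that; `common_servers` is a hypothetical helper, and it assumes IPv4 addresses in the `address: stratum N, ...` line format shown above.

```shell
# common_servers TRACE_A TRACE_B: print each server address that appears in
# both saved `ntptrace -n` outputs, ignoring the 127.0.0.1 (local host) entry.
# Empty output means the two chains share no upstream time server.
# Assumes IPv4 addresses, since the address is taken as the text before ':'.
common_servers() {
    awk -F: '
        /stratum/ && $1 != "127.0.0.1" {
            if (FILENAME == ARGV[1]) {
                seen[$1] = 1                  # servers in the first chain
            } else if (($1 in seen) && !($1 in printed)) {
                printed[$1] = 1               # report each match once
                print $1
            }
        }' "$1" "$2"
}
```

Any address printed is a server common to both nodes' synchronisation chains; no output suggests the nodes do not share a common time source.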
If the output from both nodes shows that the time is being obtained from a common server and you are unable to change your configuration, you can safely disable the CVU checks with:
srvctl disable cvu [-n <node_name>]
Note: Disabling the CVU service prevents all cluster checks from being performed, not just the time server checks. This really is a solution of last resort.