Sometime last week, one of my team members ran into an interesting error that read: Cluster resource ‘SQL Network Name’ of type ‘Network Name’ in clustered role ‘MSSQLSERVER’ failed. Troubleshooting it turned out to be interesting, and easy too, because we were able to locate the exact entries in the logs.
Log Name: System
Source: Microsoft-Windows-Kernel-Power
Date: 8/25/2017 12:24:34 AM
Event ID: 41
Task Category: (63)
Level: Critical
Keywords: (2)
User: SYSTEM
Computer: SQL2K1601.SSCITATION.COM
Description: The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
What happened?
One of the nodes in the cluster shut down unexpectedly, and the cluster was out of operation for the next 11 minutes, until Node1 (the same node that had shut down) came back. By default and by design, a cluster fails over its resources to another node when something like this happens. In this case it did not. When the ticket was assigned to us, we found that the SQL Cluster Network Name was offline and would not come up when we tried to bring it online. Below is what was written in the logs on Node2.
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 8/25/2017 12:35:10 AM
Event ID: 1227
Task Category: Network Name Resource
Level: Error
Keywords:
User: SYSTEM
Computer: SQL2K1602.SSCITATION.com
Description: Network Name resource 'SQL Network Name (SQL2K16SQL)' (with associated network name 'SQL2K16SQL') has Kerberos Authentication support enabled. Failed to add required credentials to the LSA - the associated error code is '1068'.
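The error code at the end of that description is the most useful clue in the entry. Win32 system error 1068 is ERROR_SERVICE_DEPENDENCY_FAIL, meaning a service that something depends on failed to start. As a rough sketch (not something we ran during the incident), the code can be pulled out of the description text and looked up like this. The function names and the small lookup table are hypothetical helpers written for this post, and the table holds only a hand-picked subset of codes.

```python
import re

# Hand-picked subset of Win32 system error codes (illustrative, not exhaustive).
WIN32_ERRORS = {
    5: "ERROR_ACCESS_DENIED: Access is denied.",
    1058: "ERROR_SERVICE_DISABLED: The service cannot be started because it is disabled.",
    1068: "ERROR_SERVICE_DEPENDENCY_FAIL: The dependency service or group failed to start.",
}


def extract_error_code(description):
    """Return the integer found in "error code is '<n>'", or None if absent."""
    match = re.search(r"error code is '(\d+)'", description)
    return int(match.group(1)) if match else None


def explain(description):
    """Map the error code in an event description to a readable meaning."""
    code = extract_error_code(description)
    if code is None:
        return "no error code found"
    return WIN32_ERRORS.get(code, f"unknown code {code}")


# Description text of Event ID 1227 as logged on Node2.
description = (
    "Network Name resource 'SQL Network Name (SQL2K16SQL)' "
    "(with associated network name 'SQL2K16SQL') has Kerberos "
    "Authentication support enabled. Failed to add required credentials "
    "to the LSA - the associated error code is '1068'."
)

print(explain(description))
```

For this event the lookup points at a failed service dependency, which is why the next thing to check on the node is its services rather than Kerberos or Active Directory.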
Resolving the issue: Cluster resource ‘SQL Network Name’ of type ‘Network Name’ in clustered role ‘MSSQLSERVER’ failed
During our investigation we noticed two things:
- The DHCP Client service was disabled on Node2
- As a result, the Network Location Awareness service could not start
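The same check can be scripted. Below is a minimal sketch that reads the start type of both services from the output of `sc qc`, assuming the usual short service names Dhcp (DHCP Client) and NlaSvc (Network Location Awareness). The helper functions are hypothetical, and the SAMPLE text is only shaped like `sc qc` output for illustration; it was not captured from our nodes.

```python
import re
import subprocess
import sys

# Short service names: Dhcp = DHCP Client, NlaSvc = Network Location Awareness.
SERVICES = ("Dhcp", "NlaSvc")


def parse_start_type(sc_qc_output):
    """Extract the START_TYPE label (for example 'DISABLED') from `sc qc` output."""
    match = re.search(r"START_TYPE\s*:\s*\d+\s+(\w+)", sc_qc_output)
    return match.group(1) if match else None


def disabled_services(outputs):
    """Given {service name: sc qc output}, return the names that are DISABLED."""
    return [
        name for name, text in outputs.items()
        if parse_start_type(text) == "DISABLED"
    ]


def query_services(names=SERVICES):
    """Run `sc qc <name>` for each service. Only meaningful on Windows."""
    return {
        name: subprocess.run(
            ["sc", "qc", name], capture_output=True, text=True
        ).stdout
        for name in names
    }


# Illustrative text shaped like `sc qc` output; not taken from the incident.
SAMPLE = {
    "Dhcp": "SERVICE_NAME: Dhcp\n        START_TYPE         : 4   DISABLED\n",
    "NlaSvc": "SERVICE_NAME: NlaSvc\n        START_TYPE         : 2   AUTO_START\n",
}

if sys.platform == "win32":
    print(disabled_services(query_services()))
else:
    print(disabled_services(SAMPLE))
```

Run on a cluster node, this lists any of the two services that are set to disabled; on other platforms it falls back to the sample text so the parsing can still be tried out.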
As soon as we fixed these two issues and tested fail-over and fail-back, everything was good. The cluster started functioning normally, as expected.
Side Note: You may want to read an article about CLIUSR and try it out. It is the cluster account that we can use to manage and run Cluster Services.