Lately, I have had the chance to work on the replication topology a lot. Few of our customers had implemented one or other topology. And, as I have said already, most of my blog post comes from my weekend on-call. Well, I am the on-call guy for my team, I work during Saturday and Sunday all the time. Last weekend one of our customers paged us as they are seeing a message publication is not available because it has not been fully generated.
Let me explain what was the case here. The customer has a very large database (VLDB), sized little over 2 Terabyte. The transactional replication was tested and implemented in QA environment already, it was working fine. The client has decided to implement this in the Production environment. And, that is where the issue has surfaced.
I was being paged and asked to join the conference call. The client has explained the issue and we have started troubleshooting. I have asked the customer to run the query to check publication status and to see what it return, the output will have a status column which should return values like 0 (inactive), 1 (Subscribed) or 2 (Active).
select publisher_database_id, publisher_id, publisher_db, publication_id,article_id, subscriber_id, subscriber_db, status from MSsubscriptions
The output was 3, that sounds like a corrupt publication/replication configuration to me. And I have advised them to drop and recreate everything. That’s when the customer told me that exact same configuration is working fine in QA environment. I have then started comparing the Publication properties for both the environments – The Production and The QA. It turned out that the issue was with the Publication Access.
For some reason, the client has forgotten to add service accounts to publication access. This was not allowing the agent jobs to read the snapshots and logs. As soon as I have added service accounts to publication access list the replication starts progressing. The publication status was also changed to 2.
The moral of the story is, having a checklist is a good idea when configuring a replication topology. Also, we need to keep patience, especially, when a database is vldb.
I assume you have like reading this blog post so far. In that case, you may also want to explore few other tips and tricks I have shared on High Availability and Disaster Recovery topic here.
photo credit: rawpixel.com Business people syncing data by mobile phone via photopin (license)