Wednesday, February 11, 2009

SQL 2008 Replication - debugging distributor agent error message

Check details of the replication error using replication monitor. For example, in case of a general no row found error at subscriber, we would find error description of something like below -

Command attempted:
if @@trancount > 0 rollback tran
(Transaction sequence number: 0x000019A0000032DF000800000000, Command ID: 1)

Error messages:
• The row was not found at the Subscriber when applying the replicated command. (Source: MSSQLServer, Error number: 20598)
Get help: http://help/20598
The row was not found at the Subscriber when applying the replicated command. (Source: MSSQLServer, Error number: 20598)
Get help: http://help/20598
Error doesn’t give the name of the table or the command text that resulted in this failure. Below are two ways to get this
information –

1. sp_browsereplcmds

In order to use this, we will need to know values for @xact_seqno_start ,@xact_seqno_end ,@publisher_database_id ,@article_id ,@command_id
@xact_seqno_start and @xact_seqno_end would be the same as we would be using @command_id.

Error message doesn’t give us the @publisher_database_id or @article_id, so we need to find it out.

Below querries should give us @article_id and @publisher_database_id

use distribution

go

select * from MSrepl_commands
where command_id = 1 /* from replication monitor*/
go
select * from MSpublisher_databases

once we get all variables we can execute the proc

sp_browsereplcmds @xact_seqno_start = '0x000019a0000032e50008'
,@xact_seqno_end = '0x000019a0000032e50008'
,@publisher_database_id = 1
,@article_id = 1
,@command_id= 1

In the above example, I get below value for command column -

{CALL [dbo].[sp_MSupd_dboPerson] (,'test',,,,,,,,,72328,0x0200)}

Person is the table name (part of the replication Proc name), 'test' is the changed value of second column for Primary Key Value of 72328

2.Using ‘OutputVerboseLevel’ and ‘Output’ options of Distribution agent

We can set distribution agent to log to a file with ‘OutputVerboseLevel’ value of ‘2’. This will log both error messages as progress report. I would suggest using this for debugging purpose only. ‘0’ doesn’t
seem to give detailed error message in SQL 2008 (RTM).
For the Distribution Job ‘Run Agent’ step add below two options
-Output C:\ReplDistbOutput.txt -OutputVerboseLevel 2

Here’s the portion of the log file -

2009-02-12 00:49:58.668 Last transaction timestamp: 0x000019a0000032df000800000000
Transaction seqno: 0x000019a0000032e50008
Command Id: 1
Partial: 0
Type: 30
Command: <>
2009-02-12 00:49:58.684 sp_MSget_repl_commands timestamp returned: 0x0x000019a0000032e500085a87a1e7, 2, local rowcount: 2
2009-02-12 00:49:58.699
42000 The row was not found at the Subscriber when applying the replicated command. 20598
2009-02-12 00:49:58.699 sp_MSget_repl_commands timestamp value is: 0x0x000019a0000032e5000800000000
2009-02-12 00:49:58.715
42000 The row was not found at the Subscriber when applying the replicated command. 20598
2009-02-12 00:49:58.762
Failed command = {CALL [dbo].[sp_MSupd_dboPerson] (,?,,,,,,,,,?,0x0200)} {CALL [dbo].[sp_MSupd_dboPerson] (,?,,,,,,,,,?,0x0200)}
2009-02-12 00:49:58.777 Parameterized values for above command(s): {{'dl1fh1l', 72328}, {'dl1fh111l', 72328}}
2009-02-12 00:49:58.809 Disconnecting from Subscriber 'SHAMSH\ECM1'
2009-02-12 00:49:58.824 Disconnecting from OLE DB Subscriber 'SHAMSH\ECM1'

2 comments:

Admir said...

All right
We found what transaction is puling
down the replication, But what We can do to remove that transaction or command so replication can Continue normaly

aSQLGuy said...

Admir,

In the above example, you will have to create a record at the subscriber table for the primary key value in question.