Paladin Consultants, LLC

"Good Information is just good business"

JA slide show
Home Resources Links And Tools
Print E-mail

All replication operations are based on the principle that the changing of data on one data source is followed by the reproduction of that new data on one or more additional distributed locations. This may be done to reduce network congestion, to partition data according to geographic or product lines, or to provide a failsafe duplication of servers for critical data.

The roles in a replication scenario are those of published, distributor, and subscriber. As we will see, one server can in some circumstances play all three roles, but must play at least one role. The publisher contains the articles or objects that are to be published. The distributor contains the data that is to be changed and promulgated. The distributor may or may not be the same server as the publisher. When it is NOT the same, it must receive changed data from the publisher before it forwards that data to the subscriber. It is the distribution database that does most of the work in a replication.

The subscriber receives the changed data from the distributor. There may be multiple subscribers. It is possible to have a configuration where the subscribers also can make changes to the data. This is known as an updating subscriber. The changes made by the updating subscriber are then sent to the publisher, onto the distributor, and then onto the other subscribers.

The items subscribed to are known as publications, which can be tablesm wueries, stored procedures, indexes, and virtually any SQ: Server object. You can also export filtered sets of data, unique to each subscriber. These data may be filtered vertically, in which case only certain rows are sent to certain subscribers, or horizontally, in which only certain columns are sent. Or you can filter in both directions, and you can extend the filtering to include joins of other tables.

Scenarios

There are 6 basic configurations possible in any replication scenario, and the choice of them is a function of the requirements of the business. They are:

  • Central publisher – Where a SQL Server is both the published and distributor of the data. This can be where a backup copy of the database is needed, or publication of master data to remote locations. With heavy server utilization and a lot of changes, this configuration is a weak choice.
  • Central Publisher with Remote Distributor – It is the same conditions as above. This will handle a heavier workload, and is a popular configuration.
  • Publishing Subscriber – The published is a distribution agent to one subscriber, then the subscriber distributes toJanuary 16, 2007eas, and on an expensive or slow connection.
  • Central Subscriber – Several publishers, one subscriber – perhaps when consolidating disparate data at a central site.
  • Multiple Subscribers or Multiple Publishers – Common tables on all servers, but each server maintains its own particular rows. Care needs to be taken not to overlap ownership.
  • Updating Subscribers – Subscriber can update a table to which it subscribes, and those updates are automatically sent to the subscriber, which then updates to other subscribers. Continuous updating is preferred, so this scenario has a lot of overhead in terms of network traffic, and data conflict resolution.

The subscriptions can be either push (managed by the publication server) win which articles are pushed to the subscribers, or pull (managed by the individual subscriber machines, in which articles are requested by the subscribers. A special case of anonymous subscriptions are common on the Internet is where anyone can subscribe and request articles. Publications are managed on whatever is the distribution server in the distribution database, which will contain all the transactions and all the information necessary to make the data transfers, publishers, articles, agents, subscribers, etc in special tables set up to monitor the process.

Agents

Replication is accomplished through SQL Server Agents which wake up and perform their tasks at programmed times. These can be seen and manipulated in the Enterprise Manager Console (Management|SQLServerAgent|Jobs)

  • Snapshot Agent – Responsible for initiating the data for any new subscriber. It synchronizer the data for new subscribers, making sure that all the data schemas on publisher and subscriber are congruent.
  • Log Reader Agent – Moves transactions from the publisher database to the distribution database each database published has its own log reader agent on the distribution server.
  • Distribution Agent – Used with a push subscription, the distribution agent is responsible for moving snapshots and transaction to the subscribers.
  • Merge Agent – In push subscriptions, the merge agent runs on the publisher, whereas in pull subscriptions, the merge agent runs on the subscriber. It is the merge agent that reconciles which data has been updated since the original initialization snapshot was made. Conflicts are resolved using a set of rules on the publisher database.

Replication Implementation

Some implementations require immediate replications, such as an airline reservation system. This is called immediate transaction consistency. To accomplish this a two phase commit is used to assure that the transactions are implemented on all servers. If they are not, then the transaction is rolled back. This type of processing requires a high speed LAN.

Other applications require a less real time synchronization. This is called latent transactional consistency, and it provides for synchronization on all the servers, though not an immediate synchronization. The degree of latency is dictated by the goodness of the connections between the servers, and the business requirements of the application. Site autonomy is also a consideration in implementing these types of replication configurations. The degree to which the individual sites are programmed to recognize each other and accept changes is also important to the application as a whole.

There are at least 6 different models of data distribution. Each type implies a different amount of latency and site autonomy.

  • Distributed transactions – All sites have the same data at all times.
  • Transactional replication with updating subscribers – Changes can be made at the local site and at the publisher, then replicated to the other subscribers. Technique combines distribution and replication.
  • Transactional replication – Data changed only at source – no conflicts can occur
  • Snapshot replication with updating subscribers – Like transactional replication with updating subscribers but the entire publication is updated to subscribers.
  • Snapshot replication – No subscriber updates permitted, and entire publication is sent to subscribers with updates.
  • Merge replication – Sites make update to their data, then update the publisher.

 

Snapshot replication is simply a copy of each table in a publication. That copy then is sent in its entirety to the subscribers. A schema and datafile are created and transmitted. It takes relatively a lot of bandwidth, but is simple to implement.

Transactional replication captures transactions from the transaction log and forwards it to the subscribers. It is an efficient method because only the minimum amount of data is moved. Updates are instantaneous, or at programmed intervals.

Merged replication allows updates at all sites, then the updated and added data is merged at the publishing site and to all subscribers. Because of multiple update possibilities, there are some times when conflicts will have to be resolved. This can be accomplished in a stored procedure that contains the logical workflow of the business rules. Transactional and merge replications are subject to row size limitations of 6000 and 8000 bytes.

In order to resolve conflicts and assure proper merging, timestamp information in each replicated table is advised. Additionally, unique identifier fields, and GUID columns are frequently used.

Methodology

  1. Replication scripts are all setup the same. Create a distributor to do the publishing. (It may be the same machine as the publisher)
  2. Setup Publishing
  3. Create publishing articles for distribution.
  4. Define subscribers, and subscribe.

The way to do these things is extremely easy. The scripts are created and managed from the Enterprise Manager under the Tools|Replication|Create and Manage Publications. There are wizards there that will walk you through the process.

After you have set up the replication scripts, you must then test and debug them, monitor them to ensure that the scripts you wrote do what you expect them to do. Again, the Enterprise Manager is the best tool. Look under the Replication branch in the console tree for Publishers, Agents, and Replication alerts. In addition to the Enterprise Manager, there are some system stored procedures which allow you to check up on various aspects of the replication process. Among the most important are these:

sp_helppublication ,
sp_helparticle,
sp_helpdistributor,
sp_helpsubscriberinfo,
sp_helpsubscription,
sp_helpreplcounters, and
sp_publicationvalidation.

Finally, the NT Performance Monitor will show activity witfrom the SQL Server objects added there on SQL Server installation. These objects are:

SQLServer:Replication Agents,
SQLServer:Replication Dist,
SQLServer:Replication Logreader,
SQLServer:Replication Merge, and
SQLServer:Replication Snapshot.

 

Resources

 

This resource discusses replication project at length.

This resource discusses two way replication.

Metro NY / NJ SQL Server Consultants

We specialize is custom database software. Call us for a free consultation (973) 635 0080

 

Computer Consultants: Database Development
Computer Programming: Web Site Services
Computer Programming: Custom Software
Computer Consulting: IT Consulting
Paladin Consultants, LLC Home Page
Computer Consulting: Contact Us

 

 
Bookmark and Share