基于Python的SQL Server数据库对象同步轻量级实现

程序员文章站 2022-03-22 12:54:10

缘由日常工作中经常遇到类似的问题：把某个服务器上的某些指定的表同步到另外一台服务器。类似需求用SSIS或者其他ETL工作很容易实现，比如用SSIS的话，就会会存在相当一部分反复的手工操作。建源的数据库信息，目标的数据库信息，如果是多个表，需要一个一个地拉source和target，然后一个一个地m ......

缘由

日常工作中经常遇到类似的问题：把某个服务器上的某些指定的表同步到另外一台服务器。
类似需求用ssis或者其他etl工作很容易实现，比如用ssis的话，就会会存在相当一部分反复的手工操作。
建源的数据库信息，目标的数据库信息，如果是多个表，需要一个一个地拉source和target，然后一个一个地mapping，然后运行实现数据同步。
然后很可能，这个workflow使用也就这么一次，就寿终正寝了，却一样要浪费时间去做这个etl。

快速数据同步实现

于是在想，可不可能快速实现类似需求，尽最大程度减少重复的手工操作？类似基于命令行的方式，简单快捷，不需要太多的手动操作。
于是就有了本文，基于python（目的是顺便熟悉一下python的语法），快速实现sql server的数据库之间的数据同步操作，
后面又稍微扩展了一下，可以实现不同服务器的数据库之间的表结构，表对应的数据，存储过程，函数，用户自定义类型表（user define table type）的同步
目前支持在两个sql server数据源之间：每次同步一张或者多张表/存储过程，也可以同步整个数据库的所有表/存储过程（以及表/存储过程依赖的其他数据库对象）。
支持sqlserver2012以上版本

需要考虑到一些基本的校验问题：在源服务器上，需要同步的对象是否存在，或者输入的对象是否存在于源服务器的数据库里。

在目标服务器上，对于表的同步：
1，表的存在依赖于schema，需要考虑到表的schema是否存在，如果不存在先在target库上创建表对应的schema
2，target表中是否有数据？如果有数据，是否以覆盖的方式执行
对于存储过程的同步：
1，类似于表，需要考虑存储过程的schema是否存在，如果不存在先在target库上创建表对应的schema
2，类似于表，arget数据库中是否已经存在对应的存储过程，是否以覆盖的方式执行
3，存储过程可能依赖于b表，某些函数，用户自定义表变量等等，同步存储过程的时候需要先同步依赖的对象，这一点比较复杂，实现过程中遇到在很多很多的坑
　可能存在对象a依赖于对象b，对象b依赖于对象c……，这里有点递归的意思
　这一点导致了重构大量的代码，一开始都是直来直去的同步，无法实现这个逻辑，切实体会到代码的“单一职责”原则

参数说明
参数说明如下，大的包括三类：
1，源服务器信息（服务器地址，实例名，数据库名称，用户名，密码），没有用户名密码的情况下，使用windows身份认证模式
2，目标服务器信息（服务器地址，实例名，数据库名称，用户名，密码），没有用户名密码的情况下，使用windows身份认证模式
3，同步的对象类型以及对象
4，同步的对象在目标服务器上存在的情况下，是否强制覆盖

基于Python的SQL Server数据库对象同步轻量级实现

其实在同步数据的时候，也可以把需要同步的行数提取出来做参数，比较简单，这里暂时没有做。
比如需要快速搭建一个测试环境，需要同步所有的表结构和每个表的一部分数据即可。

表以及数据同步

表同步的原理是，创建目标表，遍历源数据的表，生成insert into values(***),(***),(***)格式的sql，然后插入目标数据库，这里大概步骤如下：
1，表依赖于schema，所以同步表之前先同步schema
2，强制覆盖的情况下，会drop掉目标表（如果存在的话），方式目标表与源表结构不一致，非强制覆盖的情况下，如果字段不一致，抛出异常
3，同步表结构，包括字段，索引，约束等等，但是无法支持外键，刻意去掉了外键，想想为什么？因吹斯汀。
4，需要筛选出来非计算列字段，insert语句只能是非计算列字段（又导致重构了部分代码）
5，转义处理，在拼凑sql的时候，需要进行转义处理，否则会导致sql语句错误，目前处理了字符串中的’字符，二进制字段，时间字段的转义或者其他处理
6，鉴于insert into values(***),(***),(***)语法上允许的最大值是1000，因此每生成1000条数据，就同步一次
7，自增列的identity_insert 标识打开与关闭处理

使用如下参数，同步源数据库的三张表到目标数据库，因为这里是在本机命名实例下测试，因此实例名和端口号输入

python syncdatabaseobject.py -s_h="127.0.0.1" -s_i="sql2017" -s_p=49744 -s_d=db01 -t_h=127.0.0.1 -t_i="sql2017" -t_p=49744 -t_d="db02" -obj_type="tab" -obj="[dbo].[table01],schema1.table01,schema2.table01" -f="y"

执行同步的效果

基于Python的SQL Server数据库对象同步轻量级实现

说明：
1，如果输入obj_type="tab" 且-obj=为none的情况下，会同步源数据库中的所有表。
2，这个效率取决于机器性能和网络传输，本机测试的话，每秒中可以提交3到4次，也就是每秒钟可以提交3000~4000行左右的数据。

已知的问题：
1，当表的索引为filter index的时候，无法生成包含where条件的索引创建语句，那个看起来蛋疼的表结构导出语句，暂时没时间改它。
2，暂时不支持其他少用的类型字段，比如地理空间字段什么的。

存储过程对象的同步

存储过程同步的原理是，在源数据库上生成创建存储过程的语句，然后写入目标库，这里大概步骤如下：
1，存储过程依赖于schema，所以同步存储过程之前先同步schema（同表）
2，同步的过程会检查依赖对象，如果依赖其他对象，暂停当前对象同步，先同步依赖对象
3，重复第二步骤，直至完成
4，对于存储过程的同步，如果是强制覆盖的话，强制覆盖至对存储过程自己生效，对依赖对象并不生效，如果依赖对象不存在，就创建，否则不做任何事情

使用如下参数，同步源数据库的两个存储过程到目标数据库，因为这里是在本机命名实例下测试，因此实例名和端口号输入

python syncdatabaseobject.py -s_h="127.0.0.1" -s_i="sql2017" -s_p=49744 -s_d=db01 -t_h=127.0.0.1 -t_i="sql2017" -t_p=49744 -t_d="db02" -obj_type="sp" -obj="[dbo].[sp_test01],[dbo].[sp_test02]" -f="y"

说明：测试要同步的存储过程之一为[dbo].[sp_test01]，它依赖于其他两个对象：dbo.table01和dbo.fn_test01()

create proc [dbo].[sp_test01]
as
begin
 set nocount on;
 delete from dbo.table01 where id = 1000
 select dbo.fn_test01()
end

而dbo.fn_test01()的如下，依赖于另外一个对象：dbo.table02

create function [dbo].[fn_test01]
(
)
returns int
as
begin 
 declare @count int = 0
 select @count = count(1) from dbo.table02
 return @count
end

因此，这个测试的[dbo].[sp_test01]就依赖于其他对象，如果其依赖的对象不存在，同步的时候，仅仅同步这个存储过程本身，是没有意义的
同步某一个对象的依赖对象，使用过如下sql查出来的，因此这里就层层深入，同步依赖对象。
这里就类似于同步a的时候，a依赖于b和c，然后停止同步a，先同步b和c，同步b或者c的时候，可能又依赖于其他对象，然后继续先同步其依赖对象。

基于Python的SQL Server数据库对象同步轻量级实现

效果如下

基于Python的SQL Server数据库对象同步轻量级实现

如果输入obj_type="sp" 且-obj=为none的情况下，会同步源数据库中的所有存储过程以及其依赖对象

已知的问题：
1，加密的存储过程或者函数是无法实现同步的，因为无法生成创建对象的脚本
1，table type的同步也是一个蛋疼的过程，目前支持，但是支持的并不好，原因是创建table type之前，先删除依赖于table type的对象，否则无法删除与创建。

特别说明

依赖对象的解决，还是比较蛋疼的
如果在默认schema为dbo的对象，在存储过程或者函数中没有写schema（参考如下修改后的sp，不写相关表的schema dbo，dbo.test01==>test01），
使用 sys.dm_sql_referenced_entities这个系统函数是无法找到其依赖的对象的,奇葩的是可以找到schema的类型，却没有返回对象本身。
这一点导致在代码中层层深入，进行了长时间的debug，完全没有想到这个函数是这个鸟样子，因为这里找到依赖对象的类型，却找不到对象本身，次奥！！！
另外一种情况就是动态sql了，无法使用 sys.dm_sql_referenced_entities这个系统函数找到其依赖的对象。

基于Python的SQL Server数据库对象同步轻量级实现

其他对象的同步

　　支持其他数据库对象的同步，比如function，table type等，因为可以在同步其他存储过程对象的时候附带的同步function，table type，这个与表或者存储过程类似，不做过多说明。　

已知问题：

1，201906122030：经测试，目前暂时不支持sequence对象的同步。

需要改进的地方

1，代码结构优化，更加清晰和条例的结构（一开始用最直接简单粗暴的方式快速实现，后面重构了很多代码，现在自己看起来还有很多不舒服的痕迹）
2，数据同步的效率问题，对于多表的导入导出操作，依赖于单线程，多个大表导出串行的话，可能存在效率上的瓶颈，如何根据表的数据量，尽可能平均地分配多多个线程中，提升效率
3，更加友好清晰的异常提示以及日志记录，生成导出日志信息。
4，异构数据同步，mysql《==》sql server《==》oracle《==》pgsql

代码端午节写好了，这几天抽空进行了一些测试以及bug fix，应该还潜在不少未知的bug，工作量比想象中的大的多了去了。

# -*- coding: utf-8 -*-
# !/usr/bin/env python3

__author__ = 'mssql123'
__date__ = '2019-06-07 09:36'

import os
import sys
import time
import datetime
import pymssql
from decimal import decimal

usage = '''
         -----parameter explain-----
         
         source database parameter
         -s_h           : soure database host                                          ----- must require parameter
         -s_i           : soure database instace name                                  ----- default instance name mssql
         -s_d           : soure database name                                          ----- must require parameter
         -s_u           : soure database login                                         ----- default windows identifier
         -s_p           : soure database login password                                ----- must require when s_u is not null
         -s_p           : soure database instance port                                 ----- default port 1433
         
         target database parameter
         -t_h           : target database host                                         ----- must require parameter
         -t_i           : target database instace name                                 ----- default instance name mssql
         -t_d           : target database name                                         ----- must require parameter
         -t_u           : target database login                                        ----- default windows identifier
         -t_p           : target database login password                               ----- must require when s_u is not null
         -t_p           : target database instance port                                ----- default port 1433

         sync object parameter
         -obj_type      : table or sp or function or other databse object              ----- tab or sp or fn or tp
         -obj           : table|sp|function|type name                                  ----- whick table or sp sync
         
         overwirte parameter
         -f             : force overwirte target database object                           ----- f or n 


         --help: help document
         example:
         python datatransfer.py -s_h=127.0.0.1 -s_p=1433 -s_i="mssql" -s_d="db01" -obj_type="tab"  -obj="dbo.t1,dbo.t2"  -t_h=127.0.0.1 -t_p=1433 -t_i="mssql" -t_d="db02"  -f="y"                   
         python datatransfer.py -s_h=127.0.0.1 -s_p=1433 -s_i="mssql" -s_d="db01" -obj_type="sp"  -obj="dbo.sp1,dbo.sp2" -t_h=127.0.0.1 -t_p=1433 -t_i="mssql" -t_d="db02"  -f="y"
         '''

class syncdatabaseobject(object):
    # source databse
    s_h = none
    s_i = none
    s_p = none
    s_u = none
    s_p = none
    s_d = none

    # obj type
    s_obj_type = none
    # sync objects
    s_obj = none

    # target database
    t_h = none
    t_i = none
    t_p = none
    t_u = none
    t_p = none
    t_d = none

    f = none

    file_path = none

    def __init__(self, *args, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)



    # connect to sqlserver
    def get_connect(self, _h, _i, _p, _u, _p, _d):
        cursor = false
        try:
            if ( _u) and (_p):
                conn = pymssql.connect(host=_h,
                                       server=_i,
                                       port=_p,
                                       user = _u,
                                       password = _p,
                                       database=_d)
            else:
                conn = pymssql.connect(host=_h,
                                       server=_i,
                                       port=_p,
                                       database=_d)
            if (conn):
                return conn
        except:
            raise
        return conn

    # check connection
    def validated_connect(self, _h, _i, _p, _u,_p, _d):
        if not (self.get_connect(_h, _i, _p,_u,_p, _d)):
            print("connect to " + str(_h) + " failed,please check you parameter")
            exit(0)


    '''
    this is supposed to be a valid object name just like xxx_name,or dbo.xxx_name,or [schema].xxx_name or schema.[xxx_name]
    then transfer this kind of valid object name to format object name like [dbo].[xxx_name](give a default dbo schema name when no schema name)
    other format object name consider as unvalid,will be rasie error in process
    format object name
        1,xxx_name              ======> [dbo].[xxx_name]
        2,dbo.xxx_name          ======> [dbo].[xxx_name]
        3,[schema].xxx_name     ======> [dbo].[xxx_name]
        3,schema.xxx_name       ======> [schema].[xxx_name]
        4,[schema].[xxx_name]   ======> [schema].[xxx_name]
        5,[schema].[xxx_name    ======> rasie error format message
    '''
    @staticmethod
    def format_object_name(name):
        format_name = ""
        if ("." in name):
            schema_name = name[0:name.find(".")]
            object_name = name[name.find(".") + 1:]
            if not ("[" in schema_name):
                schema_name = "[" + schema_name + "]"
            if not ("[" in object_name):
                object_name = "[" + object_name + "]"
            format_name = schema_name + "." + object_name
        else:
            if ("[" in name):
                format_name = "[dbo]." + name
            else:
                format_name = "[dbo]." + "[" + name + "]"
        return format_name

    '''
    check user input object is a valid object
    '''
    def exits_object(self, conn, name):
        conn = conn
        cursor_source = conn.cursor()
        # get object by name from source db
        sql_script = r'''select top 1 1 from
                                (
                                    select  concat(quotename(schema_name(schema_id)),'.',quotename(name)) as obj_name from sys.objects  
                                    union all
                                    select  concat(quotename(schema_name(schema_id)),'.',quotename(name)) as obj_name  from sys.types  
                                )t where obj_name = '{0}'
                                 '''.format(self.format_object_name(name))
        cursor_source.execute(sql_script)
        result = cursor_source.fetchall()
        if not result:
            return 0
        else:
            return 1
        conn.cursor.close()
        conn.close()

    # table variable sync
    def sync_table_variable(self, tab_name,is_reference):
        conn_source = self.get_connect(self.s_h, self.s_i, self.s_p, self.s_u, self.s_p, self.s_d)
        conn_target = self.get_connect(self.t_h, self.t_i, self.t_p, self.t_u, self.t_p, self.t_d)
        cursor_source = conn_source.cursor()
        cursor_target = conn_target.cursor()

        if (self.exits_object(conn_source, self.format_object_name(tab_name))) > 0:
            pass
        else:
            print("----------------------- warning message -----------------------")
            print("--------warning: object " + tab_name + " not existing in source database ------------")
            print("----------------------- warning message -----------------------")
            print()
            return

        exists_in_target = 0
        sql_script = r'''select top 1 1
                             from sys.table_types tp
                             where is_user_defined = 1 
                             and concat(quotename(schema_name(tp.schema_id)),'.',quotename(tp.name)) = '{0}' ''' \
            .format((self.format_object_name(tab_name)))
        # if the table schema exists in target server,skip
        cursor_target.execute(sql_script)
        exists_in_target = cursor_target.fetchone()

        # weather exists in target server database
        if (self.f == "y"):
            if (is_reference != "y"):
                # skiped,table type can not drop when used by sp
                sql_script = r'''
                                         if object_id('{0}') is not null
                                             drop type {0}
                                     '''.format(self.format_object_name(tab_name))
                cursor_target.execute(sql_script)
                conn_target.commit()
        else:
            if exists_in_target:
                print("----------------------- warning message -----------------------")
                print("the target table type " + tab_name + " exists ,skiped sync table type from source")
                print("----------------------- warning message -----------------------")
                print()
                return

        sql_script = r'''      
                                   declare @sql nvarchar(max) = ''

                                   select @sql =
                                   'create type ' + '{0}' + 'as  table' + char(13) + '(' + char(13) +
                                   stuff((
                                                   select char(13) + '    , [' + c.name + '] ' +
                                                       case when c.is_computed = 1
                                                           then 'as ' + object_definition(c.[object_id], c.column_id)
                                                           else
                                                               case when c.system_type_id != c.user_type_id
                                                                   then '[' + schema_name(tp.[schema_id]) + '].[' + tp.name + ']'
                                                                   else '[' + upper(y.name) + ']'
                                                               end  +
                                                               case
                                                                   when y.name in ('varchar', 'char', 'varbinary', 'binary')
                                                                       then '(' + case when c.max_length = -1


                                                                                       then 'max'
                                                                                       else cast(c.max_length as varchar(5))
                                                                                   end + ')'
                                                                   when y.name in ('nvarchar', 'nchar')
                                                                       then '(' + case when c.max_length = -1
                                                                                       then 'max'
                                                                                       else cast(c.max_length / 2 as varchar(5))
                                                                                   end + ')'
                                                                   when y.name in ('datetime2', 'time2', 'datetimeoffset')
                                                                       then '(' + cast(c.scale as varchar(5)) + ')'
                                                                   when y.name = 'decimal'
                                                                       then '(' + cast(c.[precision] as varchar(5)) + ',' + cast(c.scale as varchar(5)) + ')'
                                                                   else ''
                                                               end +
                                                               case when c.collation_name is not null and c.system_type_id = c.user_type_id
                                                                   then ' collate ' + c.collation_name
                                                                   else ''
                                                               end +
                                                               case when c.is_nullable = 1
                                                                   then ' null'
                                                                   else ' not null'
                                                               end +
                                                               case when c.default_object_id != 0
                                                                   then ' constraint [' + object_name(c.default_object_id) + ']' +
                                                                           ' default ' + object_definition(c.default_object_id)
                                                                   else ''
                                                               end
                                                       end
                                                   from sys.table_types tp
                                                   inner join sys.columns c on c.object_id = tp.type_table_object_id
                                                   inner join sys.types y on y.system_type_id = c.system_type_id 
                                                   where tp.is_user_defined = 1 and y.name<>'sysname'
                                                   and concat(quotename(schema_name(tp.schema_id)),'.',quotename(tp.name)) = '{0}'
                                                   order by c.column_id

                                   for xml path(''), type).value('.', 'nvarchar(max)'), 1, 7, '      ')
                                       + ');'

                                   select @sql  as  script
                               '''.format(self.format_object_name(self.format_object_name((tab_name))))
        cursor_target = conn_target.cursor()
        cursor_source.execute(sql_script)
        row = cursor_source.fetchone()
        try:
            if not exists_in_target:
                # execute the script on target server
                cursor_target.execute(str(row[0]))  # drop current stored_procudre if exists
                conn_target.commit()
                print("*************table type " + self.format_object_name(tab_name) + " synced *********************")
                print()  # give a blank row when finish
        except:
            print("----------------------- error message -----------------------")
            print("-----------table type " + self.format_object_name(tab_name) + " synced error ---------------")
            print("----------------------- error message -----------------------")
            print()
            # raise



        cursor_source.close()
        conn_source.close()
        cursor_target.close()
        conn_target.close()

    #schema sync
    def sync_schema(self):
        conn_source = self.get_connect(self.s_h, self.s_i, self.s_p,self.s_u,self.s_p, self.s_d)
        conn_target = self.get_connect(self.t_h, self.t_i, self.t_p,self.t_u,self.t_p, self.t_d)
        cursor_source = conn_source.cursor()
        cursor_target = conn_target.cursor()
        arr_schema = []
        # get all table in database when not define table name
        schema_result = cursor_source.execute(r'''
                                                  select name from sys.schemas where schema_id>4 and schema_id<16384
                                                ''')
        for row in cursor_source.fetchall():
            cursor_target.execute(r''' if not exists(select * from sys.schemas where name = '{0}')
                                                  begin
                                                      exec('create schema [{0}]')
                                                  end
                                              '''.format(str(row[0])))
            conn_target.commit()

        cursor_source.close()
        conn_source.close()
        cursor_target.close()
        conn_target.close()


    def sync_table_schema_byname(self, tab_name,is_reference):
        conn_source = self.get_connect(self.s_h, self.s_i, self.s_p, self.s_u, self.s_p, self.s_d)
        conn_target = self.get_connect(self.t_h, self.t_i, self.t_p, self.t_u, self.t_p, self.t_d)
        cursor_source = conn_source.cursor()
        cursor_target = conn_target.cursor()

        if (self.exits_object(conn_source, self.format_object_name(tab_name))) == 0:
            print("----------------------- warning message -----------------------")
            print("---------------warning: object " + tab_name + " not existing in source database ----------------")
            print("----------------------- warning message -----------------------")
            print()

        # if exists a reference table for sp,not sync the table agagin
        if (self.exits_object(conn_target, self.format_object_name(tab_name))) > 0:
            if (is_reference == "y"):
                return
            if (self.f != "y"):
                print("----------------------- warning message -----------------------")
                print("---------------warning: object " + tab_name + " not existing in source database ----------------")
                print("----------------------- warning message -----------------------")
                print()

        sql_script = ''' select quotename(s.name)+'.'+ quotename(o.name)
                         from sys.objects o with (nowait)
                        join sys.schemas s with (nowait) on o.[schema_id] = s.[schema_id]
                        where  o.[type] = 'u'
                        and o.is_ms_shipped = 0
                        and quotename(s.name)+'.'+ quotename(o.name) = '{0}'
                    '''.format(self.format_object_name(tab_name))
        cursor_source.execute(sql_script)
        row = cursor_source.fetchone()

        sql_script = r''' select top 1 1 from sys.tables 
                                     where type_desc = 'user_table' 
                                     and concat(quotename(schema_name(schema_id)),'.',quotename(name)) = '{0}' 
                                  '''.format((self.format_object_name(tab_name)))
        # if the table schema exists in target server,skip
        cursor_target.execute(sql_script)
        exists_in_target = cursor_target.fetchone()
        if exists_in_target:
            if (self.f == "y"):
                if (is_reference != "y"):
                    cursor_target.execute("drop table {0}".format(tab_name))
            # print("----------------------- warning message -----------------------")
            # print("the target table type " + tab_name + " exists ,skiped sync table type from source")
            # print("----------------------- warning message -----------------------")
        else:
            sql_script = r'''      declare 
                                          @object_name sysname
                                        , @object_id int

                                    select 
                                          @object_name = '[' + s.name + '].[' + o.name + ']'
                                        , @object_id = o.[object_id]
                                    from sys.objects o with (nowait)
                                    join sys.schemas s with (nowait) on o.[schema_id] = s.[schema_id]
                                    where quotename(s.name) + '.' + quotename(o.name) = '{0}'
                                        and o.[type] = 'u'
                                        and o.is_ms_shipped = 0

                                    declare @sql nvarchar(max) = ''

                                    ;with index_column as 
                                    (
                                        select 
                                              ic.[object_id]
                                            , ic.index_id
                                            , ic.is_descending_key
                                            , ic.is_included_column
                                            , c.name
                                        from sys.index_columns ic with (nowait)
                                        join sys.columns c with (nowait) on ic.[object_id] = c.[object_id] and ic.column_id = c.column_id
                                        where ic.[object_id] = @object_id
                                    ),
                                    fk_columns as 
                                    (
                                         select 
                                              k.constraint_object_id
                                            , cname = c.name
                                            , rcname = rc.name
                                        from sys.foreign_key_columns k with (nowait)
                                        join sys.columns rc with (nowait) on rc.[object_id] = k.referenced_object_id and rc.column_id = k.referenced_column_id 
                                        join sys.columns c with (nowait) on c.[object_id] = k.parent_object_id and c.column_id = k.parent_column_id
                                        where k.parent_object_id = @object_id
                                    )
                                    select @sql = 'create table ' + @object_name + '' + '(' + '' + stuff((
                                        select '' + ', [' + c.name + '] ' + 
                                            case when c.is_computed = 1
                                                then 'as ' + cc.[definition] 
                                                else upper(tp.name) + 
                                                    case when tp.name in ('varchar', 'char', 'varbinary', 'binary', 'text')
                                                           then '(' + case when c.max_length = -1 then 'max' else cast(c.max_length as varchar(5)) end + ')'
                                                         when tp.name in ('nvarchar', 'nchar')
                                                           then '(' + case when c.max_length = -1 then 'max' else cast(c.max_length / 2 as varchar(5)) end + ')'
                                                         when tp.name in ('datetime2', 'time2', 'datetimeoffset') 
                                                           then '(' + cast(c.scale as varchar(5)) + ')'
                                                         when tp.name = 'decimal' 
                                                           then '(' + cast(c.[precision] as varchar(5)) + ',' + cast(c.scale as varchar(5)) + ')'
                                                        else ''
                                                    end +
                                                    case when c.collation_name is not null then ' collate ' + c.collation_name else '' end +
                                                    case when c.is_nullable = 1 then ' null' else ' not null' end +
                                                    case when dc.[definition] is not null then ' default' + dc.[definition] else '' end + 
                                                    case when ic.is_identity = 1 then ' identity(' + cast(isnull( /*ic.seed_value*/ 1, '0') as char(1)) + ',' + cast(isnull(ic.increment_value, '1') as char(1)) + ')' else '' end 
                                            end + ''
                                        from sys.columns c with (nowait)
                                        join sys.types tp with (nowait) on c.user_type_id = tp.user_type_id
                                        left join sys.computed_columns cc with (nowait) on c.[object_id] = cc.[object_id] and c.column_id = cc.column_id
                                        left join sys.default_constraints dc with (nowait) on c.default_object_id != 0 and c.[object_id] = dc.parent_object_id and c.column_id = dc.parent_column_id
                                        left join sys.identity_columns ic with (nowait) on c.is_identity = 1 and c.[object_id] = ic.[object_id] and c.column_id = ic.column_id
                                        where c.[object_id] = @object_id
                                        order by c.column_id
                                        for xml path(''), type).value('.', 'nvarchar(max)'), 1, 2, '' + ' ')
                                        + isnull((select '' + ', constraint [' + k.name + '] primary key (' + 
                                                        (select stuff((
                                                             select ', [' + c.name + '] ' + case when ic.is_descending_key = 1 then 'desc' else 'asc' end
                                                             from sys.index_columns ic with (nowait)
                                                             join sys.columns c with (nowait) on c.[object_id] = ic.[object_id] and c.column_id = ic.column_id
                                                             where ic.is_included_column = 0
                                                                 and ic.[object_id] = k.parent_object_id 
                                                                 and ic.index_id = k.unique_index_id     
                                                             for xml path(n''), type).value('.', 'nvarchar(max)'), 1, 2, ''))
                                                + ')' + ''
                                                from sys.key_constraints k with (nowait)
                                                where k.parent_object_id = @object_id 
                                                    and k.[type] = 'pk'), '') + ')'  + ''
                                        + isnull((select (
                                            select '' +
                                                 'alter table ' + @object_name + ' with' 
                                                + case when fk.is_not_trusted = 1 
                                                    then ' nocheck' 
                                                    else ' check' 
                                                  end + 
                                                  ' add constraint [' + fk.name  + '] foreign key(' 
                                                  + stuff((
                                                    select ', [' + k.cname + ']'
                                                    from fk_columns k
                                                    where k.constraint_object_id = fk.[object_id] and 1=2
                                                    for xml path(''), type).value('.', 'nvarchar(max)'), 1, 2, '')
                                                   + ')' +
                                                  ' references [' + schema_name(ro.[schema_id]) + '].[' + ro.name + '] ('
                                                  + stuff((
                                                    select ', [' + k.rcname + ']'
                                                    from fk_columns k
                                                    where k.constraint_object_id = fk.[object_id]
                                                    for xml path(''), type).value('.', 'nvarchar(max)'), 1, 2, '')
                                                   + ')'
                                                + case 
                                                    when fk.delete_referential_action = 1 then ' on delete cascade' 
                                                    when fk.delete_referential_action = 2 then ' on delete set null'
                                                    when fk.delete_referential_action = 3 then ' on delete set default' 
                                                    else '' 
                                                  end
                                                + case 
                                                    when fk.update_referential_action = 1 then ' on update cascade'
                                                    when fk.update_referential_action = 2 then ' on update set null'
                                                    when fk.update_referential_action = 3 then ' on update set default'  
                                                    else '' 
                                                  end 
                                                + '' + 'alter table ' + @object_name + ' check constraint [' + fk.name  + ']' + ''
                                            from sys.foreign_keys fk with (nowait)
                                            join sys.objects ro with (nowait) on ro.[object_id] = fk.referenced_object_id
                                            where fk.parent_object_id = @object_id
                                            for xml path(n''), type).value('.', 'nvarchar(max)')), '')
                                        + isnull(((select
                                             '' + 'create' + case when i.is_unique = 1 then ' unique' else '' end 
                                                    + ' nonclustered index [' + i.name + '] on ' + @object_name + ' (' +
                                                    stuff((
                                                    select ', [' + c.name + ']' + case when c.is_descending_key = 1 then ' desc' else ' asc' end
                                                    from index_column c
                                                    where c.is_included_column = 0
                                                        and c.index_id = i.index_id
                                                    for xml path(''), type).value('.', 'nvarchar(max)'), 1, 2, '') + ')'  
                                                    + isnull('' + 'include (' + 
                                                        stuff((
                                                        select ', [' + c.name + ']'
                                                        from index_column c
                                                        where c.is_included_column = 1
                                                            and c.index_id = i.index_id
                                                        for xml path(''), type).value('.', 'nvarchar(max)'), 1, 2, '') + ')', '')  + ''
                                            from sys.indexes i with (nowait)
                                            where i.[object_id] = @object_id
                                                and i.is_primary_key = 0
                                                and i.[type] = 2
                                            for xml path(''), type).value('.', 'nvarchar(max)')
                                        ), '')

                                    select @sql  as  script '''.format(self.format_object_name(tab_name))
            cursor_target = conn_target.cursor()
            cursor_source.execute(sql_script)
            row = cursor_source.fetchone()
            if not row[0]:
                return

            try:
                # print(row[0])
                # execute the script on target server
                if not exists_in_target:
                    cursor_target.execute(row[0])  # drop current table schema if exists
                    conn_target.commit()
                    print("*************schema " + self.format_object_name(tab_name) + " synced *************")
                    print()  # give a blank row when finish
            except:
                print("----------------------- warning message -----------------------")
                print("-----------schema " + self.format_object_name(tab_name) + " synced failed---------------")
                print("----------------------- warning message -----------------------")
                print()
                raise

        cursor_source.close()
        conn_source.close()
        cursor_target.close()
        conn_target.close()


    def get_table_column(self, conn, tab_name):
        column_names = ""
        conn = conn
        cursor_source = conn.cursor()
        # get object by name from source db
        sql_script = r'''select name from sys.columns 
                         where object_id = object_id('{0}') and is_computed=0 order by object_id
                                 '''.format(self.format_object_name(tab_name))
        cursor_source.execute(sql_script)
        result = cursor_source.fetchall()
        for row in result:
            column_names = column_names + row[0] + ","

        return column_names[0:len(column_names) - 1]
        conn.cursor.close()
        conn.close()

    def sync_table_schema(self):
        is_reference = "y"
        conn_source = self.get_connect(self.s_h, self.s_i, self.s_p, self.s_u, self.s_p, self.s_d)
        conn_target = self.get_connect(self.t_h, self.t_i, self.t_p, self.t_u, self.t_p, self.t_d)
        cursor_source = conn_source.cursor()
        cursor_target = conn_target.cursor()
        arr_table = []
        if (self.s_obj):
            for tab_name in self.s_obj.split(","):
                if (tab_name) and (self.exits_object(conn_source, tab_name)) > 0:
                    self.sync_table_schema_byname(tab_name,is_reference)
        else:
            # sync all tables
            # get all table in database when not define table name
            sql_script = ''' select quotename(s.name)+'.'+ quotename(o.name)
                             from sys.objects o with (nowait)
                             join sys.schemas s with (nowait) on o.[schema_id] = s.[schema_id]
                             where  o.[type] = 'u' and o.is_ms_shipped = 0
                         '''
            cursor_source.execute(sql_script)
            for row in cursor_source.fetchall():
                self.sync_table_schema_byname(str(row[0]),is_reference)

    # sync data from soure table to target table
    def sync_table_data(self):
        conn_source = self.get_connect(self.s_h, self.s_i, self.s_p, self.s_u, self.s_p, self.s_d)
        conn_target = self.get_connect(self.t_h, self.t_i, self.t_p, self.t_u, self.t_p, self.t_d)
        cursor_source = conn_source.cursor()
        cursor_target = conn_target.cursor()
        arr_table = []
        if (self.s_obj):
            arr_table = self.s_obj.split(',')
            for tab_name in arr_table:
                if (self.exits_object(conn_target, self.format_object_name(tab_name))) > 0:
                    if (self.f != "y"):
                        arr_table.remove(tab_name)
                        print("----------------------- warning message -----------------------")
                        print("-----------------warning: object " + tab_name + " existing in target database ---------------------")
                        print("----------------------- warning message -----------------------")
        else:
            # get all table in database when not define table name
            tab_result = cursor_source.execute(r''' select quotename(s.name)+'.'+ quotename(o.name)
                                                    from sys.objects o with (nowait)
                                                    join sys.schemas s with (nowait) on o.[schema_id] = s.[schema_id]
                                                    where  o.[type] = 'u'
                                                        and o.is_ms_shipped = 0
                                            ''')
            for row in cursor_source.fetchall():
                arr_table.append(str(row[0]))


        insert_columns = ""
        insert_columns = self.get_table_column(conn_source, tab_name)

        for tab_name in arr_table:
            if (self.f != "y"):
                sql_script = "select top 1 {0} from {1} ".format(insert_columns,tab_name)
                # if exists data in target table,break
                cursor_target.execute(sql_script)
                exists = cursor_target.fetchone()
                if exists:
                    print("----------------------- warning message -----------------------")
                    print("the target table type " + tab_name + " exists ,skiped sync table type from source")
                    print("----------------------- warning message -----------------------")
                    print()
                    continue
            else:
                sql_script = "truncate table {0} ".format(tab_name)
                # if exists data in target table,break
                cursor_target.execute(sql_script)
                conn_target.commit()

                
            insert_columns = ""
            insert_columns = self.get_table_column(conn_source, tab_name)

            insert_prefix = ""

            # weather has identity column
            cursor_source.execute(r'''select 1 from sys.columns 
                                    where object_id = object_id('{0}') and is_identity =1
                                    '''.format(tab_name))
            exists_identity = none
            exists_identity = cursor_source.fetchone()
            if (exists_identity):
                insert_prefix = "set identity_insert {0} on; ".format(tab_name)

            # data source
            insert_sql = ""
            values_sql = ""
            current_row = ""
            counter = 0
            sql_script = r''' select {0} from {1}   '''.format(insert_columns, tab_name)
            cursor_source.execute(sql_script)

            # create insert columns
            '''
            for field in cursor_source.description:
                insert_columns = insert_columns + str(field[0]) + ","
            insert_columns = insert_columns[0:len(insert_columns) - 1]
            '''
            insert_prefix = insert_prefix + "insert into {0} ({1}) values ".format(tab_name, insert_columns)

            for row in cursor_source.fetchall():
                counter = counter + 1
                for key in row:
                    if (str(key) == "none"):
                        current_row = current_row + r''' null, '''
                    else:
                        if (type(key) is datetime.datetime):
                            current_row = current_row + r''' '{0}', '''.format(str(key)[0:23])
                        elif (type(key) is str):
                            # 我槽，这里又有一个坑：https://blog.csdn.net/dadaowuque/article/details/81016127
                            current_row = current_row + r''' '{0}', '''.format(
                                key.replace("'", "''").replace('\u0000', '').replace('\x00', ''))
                        elif (type(key) is decimal):
                            d = decimal(key)
                            s = '{0:f}'.format(d)
                            current_row = current_row + r''' '{0}', '''.format(s)
                        elif (type(key) is bytes):
                            # print(hex(int.from_bytes(key, 'big', signed=true) ))
                            current_row = current_row + r''' {0}, '''.format(
                                hex(int.from_bytes(key, 'big', signed=false)))
                        else:
                            current_row = current_row + r''' '{0}', '''.format(key)

                current_row = current_row[0:len(current_row) - 2]  # remove the the last one char ","
                values_sql = values_sql + "(" + current_row + "),"
                current_row = ""

                # execute the one batch when
                if (counter == 1000):
                    insert_sql = insert_prefix + values_sql
                    insert_sql = insert_sql[0:len(insert_sql) - 1]  # remove the the last one char ","
                    if (exists_identity):
                        insert_sql = insert_sql + " ;set identity_insert {0} off;".format(tab_name)
                    try:
                        cursor_target.execute(insert_sql)
                    except:
                        print(
                            "----------------------error " + tab_name + " data synced failed-------------------------")
                        raise

                    conn_target.commit()
                    insert_sql = ""
                    values_sql = ""
                    current_row = ""
                    counter = 0
                    print(time.strftime("%y-%m-%d %h:%m:%s",
                                        time.localtime()) + "*************** " + self.format_object_name(
                        tab_name) +" "+ str(1000) + " rows synced *************")

            if (values_sql):
                insert_sql = insert_prefix + values_sql
                insert_sql = insert_sql[0:len(insert_sql) - 1]  # remove the the last one char ","
                if (exists_identity):
                    insert_sql = insert_sql + " ; set identity_insert {0} off;".format(tab_name)
                # execute the last batch
                try:
                    cursor_target.execute(insert_sql)
                except:
                    print("------------------error " + tab_name + " data synced failed------------------------")
                    raise

                conn_target.commit()
                insert_sql = ""
                values_sql = ""
                current_row = ""
                print(time.strftime("%y-%m-%d %h:%m:%s",
                                    time.localtime()) + "*************** " + self.format_object_name(tab_name) +" "+ str(
                    counter) + " rows synced *************")
                print(time.strftime("%y-%m-%d %h:%m:%s",
                                    time.localtime()) + "----------------synced " + self.format_object_name(tab_name) + " data finished---------------")
                print()
        cursor_source.close()
        conn_source.close()
        cursor_target.close()
        conn_target.close()

    def sync_dependent_object(self, obj_name):
        #如果是因为依赖对象而被同步的，先检查target中是否存在，如果存在就不继续同步，这里打一个标记来实现
        is_refernece = "y"
        conn_source = self.get_connect(self.s_h, self.s_i, self.s_p, self.s_u, self.s_p, self.s_d)
        conn_target = self.get_connect(self.t_h, self.t_i, self.t_p, self.t_u, self.t_p, self.t_d)
        cursor_source = conn_source.cursor()
        cursor_target = conn_target.cursor()
        '''
        find dependent objects
        if exists dependent objects,sync dependent objects objects in advance
        '''
        sql_check_dependent = r'''              
                             select * from
                            (
                                select
                                distinct rtrim(lower(s.type)) collate  chinese_prc_ci_as as obj_type, 
                                quotename(d.referenced_schema_name)+'.'+quotename(d.referenced_entity_name) collate  chinese_prc_ci_as as obj
                                from sys.dm_sql_referenced_entities('{0}','object')  as d
                                    inner join sys.sysobjects s on s.id = d.referenced_id
                                union all
                                select
                                distinct rtrim(lower(d.referenced_class_desc)) collate  chinese_prc_ci_as  as obj_type,
                                quotename(d.referenced_schema_name)+'.'+quotename(d.referenced_entity_name) collate  chinese_prc_ci_as as obj
                                from sys.dm_sql_referenced_entities('{0}','object')  as d
                                    inner join sys.types s on s.user_type_id = d.referenced_id
                            )t
                             '''.format(self.format_object_name(obj_name))

        cursor_source.execute(sql_check_dependent)
        result = cursor_source.fetchall()
        for row in result:
            if row[1]:
                if (row[0] == "u"):
                    if (row[1]):
                        self.sync_table_schema_byname(row[1],is_refernece)
                elif (row[0] == "fn" or row[0] == "if"):
                    if (row[1]):
                        self.sync_procudre_by_name("f", row[1],is_refernece)
                elif (row[0] == "type"):
                    if (row[1]):
                        self.sync_table_variable(row[1],is_refernece)

    def sync_procudre_by_name(self, type, obj_name,is_reference):
        conn_source = self.get_connect(self.s_h, self.s_i, self.s_p, self.s_u, self.s_p, self.s_d)
        conn_target = self.get_connect(self.t_h, self.t_i, self.t_p, self.t_u, self.t_p, self.t_d)
        cursor_source = conn_source.cursor()
        cursor_target = conn_target.cursor()

        if (self.exits_object(conn_source, self.format_object_name(obj_name))) == 0:
            print("---------------warning: object " + obj_name + " not existing in source database ----------------")

        if (self.exits_object(conn_target, self.format_object_name(obj_name))) > 0:
            if (self.f != "y"):
                print("---------------warning: object " + obj_name + " not existing in source database ----------------")
                return

        ''' 
        本来想直接生成删除语句的：    
        这里有一个该死的转义，怎么都弄不好，中午先去吃饭吧，
        下午回来想了一下，换一种方式，不要死磕转义问题了
        sql_script =  
                    select
                    'if object_id('+''''+quotename(schema_name(uid))+ '' + quotename(name)+''''+') is not null '
                    +' drop proc '+quotename(schema_name(uid))+ '.' + quotename(name) ,
                    object_definition(id)
                    from sys.sysobjects where xtype = 'p' and  uid not in (16,19)

        '''
        sql_script = r'''
                    select
                    quotename(schema_name(uid))+'.'+quotename(name),
                    object_definition(id)
                    from sys.sysobjects where xtype in ('p','if','fn') and  uid not in (16,19)
                  '''
        if (obj_name):
            sql_script = sql_script + " and  quotename(schema_name(uid))+ '.' + quotename(name) ='{0}' ".format(
                self.format_object_name(obj_name))



        cursor_source.execute(sql_script)
        row = cursor_source.fetchone()
        try:
            if type == "f":
                sql_script = r'''
                               if object_id('{0}') is not null  
                                    drop function {0}
                            '''.format(self.format_object_name(row[0]))
            elif type == "p":
                sql_script = r'''
                                if object_id('{0}') is not null  
                                    drop proc {0}
                            '''.format(self.format_object_name(row[0]))

            cursor_target.execute(sql_script)  # drop current stored_procudre if exists
            conn_target.commit()

            #sync dependent object
            if(is_reference !="n"):
                self.sync_dependent_object(self.format_object_name(row[0]))

            # sync object it self
            cursor_target.execute(str(row[1]))  # execute create stored_procudre script
            conn_target.commit()
            print("*************sync sp: " + self.format_object_name(row[0]) + " finished *****************")
        except:
            print("------------------  sync " + row[0] + "sp error --------------------------")

        cursor_source.close()
        conn_source.close()
        cursor_target.close()
        conn_target.close()

    def sync_procudre(self, type):
        is_reference = "n"
        conn_source = self.get_connect(self.s_h, self.s_i, self.s_p, self.s_u, self.s_p, self.s_d)
        conn_target = self.get_connect(self.t_h, self.t_i, self.t_p, self.t_u, self.t_p, self.t_d)
        cursor_source = conn_source.cursor()
        cursor_target = conn_target.cursor()
        if (self.s_obj):
            for proc_name in self.s_obj.split(","):
                self.sync_dependent_object(proc_name)
                self.sync_procudre_by_name(type, proc_name,is_reference)
        # sync all sp and function
        else:
            sql_script = r'''
                        select
                        quotename(schema_name(uid))+'.'+quotename(name),
                        object_definition(id)
                        from sys.sysobjects where xtype = upper('{0}') and  uid not in (16,19)
                     '''.format(type)
            cursor_source.execute(sql_script)
            for row in cursor_source.fetchall():
                self.sync_dependent_object(row[0])
                self.sync_procudre_by_name(type, row[0],is_reference)



if __name__ == "__main__":

    '''
    sync = syncdatabaseobject(s_h="127.0.0.1",
                              s_i = "sql2017",
                              s_p = 49744,
                              s_d="db01",

                              t_h="127.0.0.1",
                              t_i="sql2017",
                              t_p=49744,
                              t_d="db02",

                              s_obj_type = "sp",
                              s_obj = "dbo.sp_test01",

                              f="y")
    sync.sync_procudre("p")
    '''


    p_s_h = ""
    p_s_i = "mssql"
    p_s_p = 1433
    p_s_d = ""

    p_s_u = none
    p_s_p = none

    p_s_obj = ""
    p_type = ""


    p_t_s = ""
    p_t_i = "mssql"
    p_t_p = "1433"
    p_t_d = ""

    p_t_u = none
    p_t_p = none

    # force conver target database object,default not force cover target database object
    p_f = "n"

    # sync obj type table|sp
    p_obj_type = none
    # sync whick database object
    p_obj = none

    if len(sys.argv) == 1:
        print(usage)
        sys.exit(1)
    elif sys.argv[1] == '--help':
        print(usage)
        sys.exit()
    elif len(sys.argv) >= 2:
        for i in sys.argv[1:]:
            _argv = i.split('=')

            # source server name
            if _argv[0] == '-s_h':
                p_s_h = _argv[1]
            # source server instance name
            if _argv[0] == '-s_i':
                if (_argv[1]):
                    p_s_i = _argv[1]
            # source server instance port
            if _argv[0] == '-s_p':
                if (_argv[1]):
                    p_s_p = _argv[1]
            # source database name
            if _argv[0] == '-s_d':
                p_s_d = _argv[1]

            if _argv[0] == '-s_u':
                p_s_u = _argv[1]

            if _argv[0] == '-s_p':
                p_s_p = _argv[1]



            if _argv[0] == '-t_h':
                p_t_h = _argv[1]

            if _argv[0] == '-t_i':
                if (_argv[1]):
                    p_t_i = _argv[1]

            if _argv[0] == '-t_p':
                if (_argv[1]):
                    p_t_p = _argv[1]

            if _argv[0] == '-t_d':
                p_t_d = _argv[1]


            if _argv[0] == '-t_u':
                p_t_u = _argv[1]

            if _argv[0] == '-t_p':
                p_t_p = _argv[1]



            if _argv[0] == '-f':
                if (_argv[1]):
                    p_f = _argv[1]


            # object type
            if _argv[0] == '-obj_type':
                if not (_argv[1]):
                    print("-obj_type can not be null （-obj=tab|-obj=sp|-obj=fn|-obj=type）")
                    exit(0)
                else:
                    p_obj_type = _argv[1]

            # object name
            if _argv[0] == '-obj':
                if (_argv[1]):
                    p_obj = _argv[1]




        # require para
        if p_s_h.strip() == "":
            print("source server host cannot be null")
            exit(0)
        if p_s_d.strip() == "":
            print("source server host database name cannot be null")
            exit(0)

        if p_t_h.strip() == "":
            print("target server host cannot be null")
            exit(0)

        if p_t_d.strip() == "":
            print("target server host database name cannot be null")
            exit(0)

        sync = syncdatabaseobject(s_h=p_s_h,
                                  s_i=p_s_i,
                                  s_p=p_s_p,
                                  s_d=p_s_d,
                                  s_u=p_s_u,
                                  s_p=p_s_p,


                                  s_obj=p_obj,

                                  t_h=p_t_h,
                                  t_i=p_t_i,
                                  t_p=p_t_p,
                                  t_d=p_t_d,
                                  t_u=p_t_u,
                                  t_p=p_t_p,

                                  f=p_f)

        sync.validated_connect(p_s_h, p_s_i, p_s_p, p_s_d, p_s_u, p_s_p)
        sync.validated_connect(p_t_h, p_t_i, p_t_p, p_t_d, p_t_u, p_t_p)

        if(p_f.upper()=="y"):
            confirm = input("confirm you want to overwrite the target object? ")
            if confirm.upper() != "y":
                exit(0)



        print("-------------------------- sync begin ----------------------------------")
        print()

        if (p_obj_type == "tab"):
            # sync schema
            sync.sync_schema()
            # sync table schema
            sync.sync_table_schema()
            # sync data
            sync.sync_table_data()
        elif (p_obj_type == "sp"):
            # sync schema
            sync.sync_schema()
            # sync sp
            sync.sync_procudre("p")
        elif (p_obj_type == "fn"):
            # sync schema
            sync.sync_schema()
            # sync sp
            sync.sync_procudre("fn")
        elif (p_obj_type == "tp"):
            # sync schema
            sync.sync_schema()
            # sync sp
            sync.sync_table_variable()
        else:
            print("-obj_type is not validated")

        print()
        print("-------------------------- sync finish ----------------------------------")

上一篇：穷举（二）：直接确定区间穷举

下一篇：第四章爬虫进阶之图形验证码识别技术

基于Python的SQL Server数据库对象同步轻量级实现

基于Python的SQL Server数据库实现对象同步轻量级

基于Python的SQL Server数据库实现对象同步轻量级

Python实现从SQL型数据库读写dataframe型数据的方法【基于pandas】