Python 规则12 应用规则集_从mysql拉取数据
程序员文章站
2022-05-28 12:22:10
...
说明
正好有个需求需要从mysql拉取数据,可以实操一下之前定义的规则集。
内容
1 导入包
import FuncDict as fd
import pandas as pd
cur_func_dict = fd.FuncDict1('pull_mysql_data', pack_fpath='./funcs/',lmongo=fd.func_lmongo)
import funcs as fs
cur_func_dict.fs = fs
func_lmongo = fd.func_lmongo
2 读取规则集
query_res_list = func_lmongo.query_many(dbname='rules', cname='rules',kv_list = [{'rule_set':'PullMysqlByStep'}], max_recs=10000)[0]
PullMysqlByStep_df = pd.DataFrame(query_res_list)
我发现拉取id的默认参数是0~10000,步长1000,所以要重新实例化
3 读取参数模板
param_df_cols = fs.get_rule_data(func_lmongo, 'rules', 'meta', 'r_002')
para_df = pd.DataFrame(columns = param_df_cols)
# 需要根据id重新定制化
# min 18662016, max 50618618
# 修改r_0004 的参数
m001_params ={'prj_id':'prj_003',
'prj_name':'manual_pull_mysql_data1',
'rule_set':'PullMysqlByStep',
'rule_id':'r_0004',
'params':{'min_idx':18662016, 'max_idx':50618619, 'step':100000}}
para_df = para_df.append(m001_params, ignore_index=True)
# 实例化
the_rule_set = fs.gen_rule_set_df(PullMysqlByStep_df,para_df,fs)
4 输入参数
# 测试的变量空间
# mysql 连接
cfg_config ={}
cfg_config['user'] = 'xxx'
cfg_config['port'] = 1111
cfg_config['host'] = 'IP'
cfg_config['password'] = 'xxx'
cfg_config['db'] = 'xxx'
# 文件夹地址
folder_path = './mysql_pull_1/'
# 需要的字段 ','.join(query_var_list) -> '*'
query_var_list = ['the_id, the_content']
# 表名
query_table_name = 'the_table'
# id字段名
query_id_name = 'id'
cur_dict ={}
cur_dict['input_dict'] = {}
cur_dict['input_dict']['cfg_config'] = cfg_config
cur_dict['input_dict']['folder_path'] = folder_path
cur_dict['input_dict']['query_var_list'] = query_var_list
cur_dict['input_dict']['query_table_name'] = query_table_name
cur_dict['input_dict']['query_id_name'] = query_id_name
5 执行
fs.run_ruleset_v2('pulltest', the_rule_set, cur_dict, fs)
---
...
data save to pickle: ./mysql_pull_1/data/da40754e24ce365a075ec4e6266947cc/slice_000038662016_000038762016.pkl
data save to pickle: ./mysql_pull_1/data/da40754e24ce365a075ec4e6266947cc/slice_000038762016_000038862016.pkl
data save to pickle: ./mysql_pull_1/data/da40754e24ce365a075ec4e6266947cc/slice_000038862016_000038962016.pkl
data save to pickle: ./mysql_pull_1/data/da40754e24ce365a075ec4e6266947cc/slice_000038962016_000039062016.pkl
data save to pickle: ./mysql_pull_1/data/da40754e24ce365a075ec4e6266947cc/slice_000039062016_000039162016.pkl
data save to pickle: ./mysql_pull_1/data/da40754e24ce365a075ec4e6266947cc/slice_000039162016_000039262016.pkl
data save to pickle: ./mysql_pull_1/data/da40754e24ce365a075ec4e6266947cc/slice_000039262016_000039362016.pkl
...
6 总结
整体感觉用的还不错,不过合适的步长要自己找。例如本次id的范围从1千多万到8千多万,但实际只有200w有效数据。
用一次,规则集开发的成本就摊薄一次,挺好的。