欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

xgboost的pmml文件转为hive udf

程序员文章站 2024-02-27 14:41:45
...

1. 将项目拉到本地

git clone [email protected]:jpmml/jpmml-evaluator-hive.git

2. 进到目录中安装

mvn clean install

3. 将得到的其中一个runtime的`jar`包放到HDFS上

hdfs dfs -put jpmml-evaluator-hive-runtime-1.0-SNAPSHOT.jar somedir/

4. 在hive中加载

add jar {hdfs_home}/somedir/jpmml-evaluator-hive-runtime-1.0-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION BuildArchive AS 'org.jpmml.evaluator.hive.ArchiveBuilderUDF';
DESCRIBE FUNCTION BuildArchive;
DESCRIBE FUNCTION EXTENDED BuildArchive;

5. 处理pmml

将pmml文件中Output的`probability(1)`、`probability(0)`, 括号替换掉,否则hive无法识别

6. 生成模型jar

三个参数分别是,包名,pmml本地路径,输出jar包的本地路径

SELECT BuildArchive('com.mycompany.XGBPredictor', '/home/model/xgboost_model.pmml', '/home/model/XGBPredictor.jar');

7. 将jar包放到HDFS

hdfs dfs -put XGBPredictor.jar somedir/

8. 加载包

add jar {hdfs_home}/somedir/XGBPredictor.jar;
CREATE TEMPORARY FUNCTION XGBPredictor AS 'com.mycompany.XGBPredictor';
DESCRIBE FUNCTION ModelPredictor;
DESCRIBE FUNCTION EXTENDED T3ModelPredictor;

9. 更改serialization的传输格式,不支持kyro

set hive.plan.serialization.format=javaXML;

10. 测试

select
id,
request_time,
XGBPredictor(named_struct(
    'feature1', nvl(feature1, -1),
    'feature2', nvl(feature2, -1),
    'feature3', nvl(feature3, -1)
)) as proba_struct
from some_table
where id in (22, 23, 24)

得到输出:

id	request_time proba_struct
22	2020-01-31 11:55:24	{"_target":0,"proba_0":0.93001777,"proba_1":0.069982216}
23	2020-01-31 10:06:35	{"_target":0,"proba_0":0.9754708,"proba_1":0.024529234}
24	2020-01-31 10:39:16	{"_target":0,"proba_0":0.96165186,"proba_1":0.038348127}

遗留问题

对于需要MapReduce的任务,在reduce阶段会报错,暂时未解决,求大神指点

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1923)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        ... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:140)
        ... 22 more
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBaseArithmetic.initialize(GenericUDFBaseArithmetic.java:53)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:145)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
        at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorHead.initialize(ExprNodeEvaluatorHead.java:39)
        at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:979)
        at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1005)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:67)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:363)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
        at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:469)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119)
        ... 22 more

 

相关标签: 工程