Azkaban Troubleshooting Log

程序员文章站 2022-07-14 08:34:05

Troubleshooting case 1:

Azkaban reported the following error while running a Kettle job:

02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 - Error updating batch
02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 - Column 'en_name' cannot be null
02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 - 
02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 -    at org.pentaho.di.core.database.Database.createKettleDatabaseBatchException(Database.java:1425)
02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 -    at org.pentaho.di.core.database.Database.emptyAndCommit(Database.java:1414)
02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 -    at org.pentaho.di.trans.steps.tableoutput.TableOutput.dispose(TableOutput.java:586)
02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 -    at org.pentaho.di.trans.step.RunThread.run(RunThread.java:97)
02-08-2018 14:13:15 CST EXT_T_PMS_HR_EMPLOYEE_PMS INFO - 2018/08/02 14:13:15 - writeData.0 -    at java.lang.Thread.run(Thread.java:748)

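The cause: when Kettle runs, it automatically treats the empty string '' as null, so inserting that value into a column that cannot be null fails with the error above. I had already changed the relevant setting in kettle.properties (if you are not familiar with it, feel free to message me), yet the null error kept appearing, which was frustrating.
As we know, a kettle.properties file exists under the home directory of every user who runs Kettle. I had only modified the copy belonging to user guaishou, not the one belonging to root (or rather, root may not have had a kettle.properties file at all). You might ask: wouldn't editing kettle.properties in root's home directory fix the problem? It would, but we should not sidestep the underlying question.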
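To see whether the two accounts really differ, a small check along these lines may help. It is only a sketch: it assumes the default location ~/.kettle/kettle.properties, the home directories /home/guaishou and /root, and that the setting involved is KETTLE_EMPTY_STRING_DIFFERS_FROM_NULL. The post does not state any of these explicitly.

```shell
#!/bin/sh
# Assumption: Kettle reads .kettle/kettle.properties under the home directory
# of the user running the job (or under $KETTLE_HOME when that is set).
# Assumption: the setting in question is KETTLE_EMPTY_STRING_DIFFERS_FROM_NULL.

# Print the value of a key in a properties file; fail if the file is missing.
kettle_prop() {
    file=$1
    key=$2
    [ -f "$file" ] || return 1
    # Commented-out lines do not match; the last assignment wins.
    sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# Compare the account that was edited with the account Azkaban may run as.
# Both home directories are assumptions.
for home in /home/guaishou /root; do
    value=$(kettle_prop "$home/.kettle/kettle.properties" \
        KETTLE_EMPTY_STRING_DIFFERS_FROM_NULL || true)
    echo "$home: ${value:-<not set or file missing>}"
done
```

If the two lines of output differ, the job behaves differently depending on which account launches it, which matches the symptom described here.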
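Every problem deserves some thought before settling on a course of action.
When I ran the Azkaban job as guaishou, the error did not occur. Why? That suggests the job itself is fine and the fault may lie with Azkaban. But what could Azkaban be doing wrong?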
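Cause 1: Azkaban runs the job under a different user identity, which leads to the failure. A No such directory error can arise in a similar way.
Cause 2: Azkaban dispatches jobs to different executors. If some node does not have Kettle deployed, the job fails with the null error, because the two machines that can run Azkaban jobs do not share the same directory layout.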
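For Cause 1, one way to check which account the executor actually runs under is sketched below. It assumes a Linux host with /proc and GNU stat, and that the executor directory is /data/software/azkaban/exec with a currentpid file in it (both appear in the output shown in troubleshooting case 2). The helper name process_user is made up for this example.

```shell
#!/bin/sh
# Assumption: the executor directory; override EXEC_DIR if yours differs.
EXEC_DIR=${EXEC_DIR:-/data/software/azkaban/exec}

# Print the user that owns the process with the given PID.
# Linux-specific: /proc/<pid> is owned by the user running the process.
process_user() {
    stat -c %U "/proc/$1"
}

if [ -f "$EXEC_DIR/currentpid" ]; then
    pid=$(cat "$EXEC_DIR/currentpid")
    echo "executor pid $pid runs as: $(process_user "$pid")"
else
    echo "no currentpid file under $EXEC_DIR" >&2
fi
```

If the reported user is not the one whose kettle.properties was edited, jobs launched by the executor will not see that edit.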
We can look up the executors in the Azkaban database:

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| sys_azkaban        |
+--------------------+
2 rows in set (0.00 sec)

mysql> use sys_azkaban;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+--------------------------+
| Tables_in_sys_azkaban    |
+--------------------------+
| ...                      |
| executors                |
| ...                      |
+--------------------------+
28 rows in set (0.00 sec)

mysql> select * from executors;
+----+----------+-------+--------+
| id | host     | port  | active |
+----+----------+-------+--------+
| 12 | bi_5.109 | 12321 |      1 |
| 16 | bi_5.110 | 12321 |      1 |
+----+----------+-------+--------+
2 rows in set (0.00 sec)
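With the list of active executors in hand, it may be worth checking that the job directory exists on each of them. The following is a rough sketch, not a tested procedure: ssh access to the executors, the helper names run_on and report_missing, and the JOB_DIR placeholder are all assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: run a command on a remote host. ssh is an assumption;
# replace it with whatever remote access the cluster actually uses.
run_on() {
    host=$1
    shift
    ssh -o BatchMode=yes "$host" "$@"
}

# Read host names from stdin, one per line, and print those on which the
# given directory does not exist. A host that cannot be reached is also
# printed, since the check fails for it as well.
report_missing() {
    dir=$1
    while read -r host; do
        # </dev/null keeps ssh from swallowing the remaining host list.
        if ! run_on "$host" test -d "$dir" </dev/null; then
            echo "$host"
        fi
    done
}

# Example (not executed here): feed in the active executors from the table
# above. JOB_DIR is a placeholder for wherever the Kettle job files live,
# and the mysql client needs suitable credentials.
#   mysql -N -B -e 'SELECT host FROM executors WHERE active = 1' sys_azkaban \
#       | report_missing "$JOB_DIR"
```

Any host printed by report_missing is a candidate for the No such directory failure whenever Azkaban dispatches the flow to it.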

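As shown, bi_5.109 is also an active executor. Both the null problem and the script-not-found problem come from this executor: there are two executors, but my job script exists only on bi_5.110, hence the No such directory error. The log for the failed run shows this:
[Screenshot of the job log; image not preserved]
It shows that the job was executed on bi_5.109, which is why it failed.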

Troubleshooting case 2:

Azkaban failed while running a Kettle job (symptom: every run of the job ends as failed). The Azkaban backend log shows:

2018-08-02 15:34:04 INFO  ExecutorManager:265 - Successfully refreshed executor: bi_5.110:12321 (id: 16) with executor info : ExecutorInfo{remainingMemoryPercent=96.25209633680396, remainingMemoryInMB=30812, remainingFlowCapacity=30, numberOfAssignedFlows=0, lastDispatchedTime=1533192900505, cpuUsage=0.0}
2018-08-02 15:34:04 INFO  ExecutorManager:1813 - Using dispatcher for execution id :4222
2018-08-02 15:34:04 ERROR ExecutorManager:1392 - Rolling back executor assignment for execution id:4222
azkaban.executor.ExecutorManagerException: java.io.IOException: java.nio.file.FileSystemException: executions/4222/azkabanJob/batch101/batch101.job -> /data/software/azkaban/exec/projects/7.5/azkabanJob/batch101/batch101.job: Operation not permitted
    at azkaban.executor.ExecutorApiGateway.callWithExecutionId(ExecutorApiGateway.java:78)
    at azkaban.executor.ExecutorApiGateway.callWithExecutable(ExecutorApiGateway.java:43)
    at azkaban.executor.ExecutorManager.dispatch(ExecutorManager.java:1389)
    at azkaban.executor.ExecutorManager.access$1500(ExecutorManager.java:65)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.selectExecutorAndDispatchFlow(ExecutorManager.java:1750)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.processQueuedFlows(ExecutorManager.java:1730)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.run(ExecutorManager.java:1668)

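Check the ownership of the executor's directories under azkaban. Several project directories, including 7.5 from the error above, are owned by root rather than guaishou: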

[root@bi_5 projects]# ll
total 0
drwxr-xr-x 2 guaishou guaishou  61 Jul 27 14:09 1.1
drwxrwxr-x 2 guaishou guaishou  21 Jul 27 14:09 11.1
drwxr-xr-x 2 guaishou guaishou 109 Jul 27 14:09 1.2
drwxr-xr-x 2 guaishou guaishou  19 Jul 27 14:09 1.3
drwxrwxr-x 2 guaishou guaishou  39 Aug  2 14:55 13.1
drwxrwxr-x 2 guaishou guaishou  24 Jul 27 14:09 2.1
drwxr-xr-x 2 guaishou guaishou  23 Jul 27 14:09 3.1
drwxr-xr-x 2 root     root      39 Jul 27 20:11 4.1
drwxrwxr-x 2 guaishou guaishou  24 Jul 27 14:09 5.1
drwxr-xr-x 2 root     root      43 Jul 30 15:02 6.1
drwxr-xr-x 3 root     root      23 Jul 31 15:29 7.1
drwxr-xr-x 3 root     root      23 Aug  1 17:36 7.3
drwxr-xr-x 3 root     root      23 Aug  2 10:46 7.5
drwxrwxr-x 2 guaishou guaishou  24 Jul 27 14:09 8.1
[root@bi_5 projects]# cd ..
[root@bi_5 exec]# cd ..
[root@bi_5 azkaban]# ll
total 0
drwxr-xr-x 11 guaishou guaishou 155 Aug  2 14:31 exec

Change the ownership of everything under ./exec/ recursively, then check the result:
[root@bi_5 azkaban]# chown -R guaishou:guaishou ./exec/
[root@bi_5 azkaban]# cd exec/
[root@bi_5 exec]# ll
total 12
drwxr-xr-x  2 guaishou guaishou  107 Jul 27 14:09 bin
drwxr-xr-x  2 guaishou guaishou   78 Jul 27 17:35 conf
-rw-r--r--  1 guaishou guaishou    6 Aug  2 14:31 currentpid
drwxrwsr-x  3 guaishou guaishou   17 Aug  2 15:34 executions
-rw-r--r--  1 guaishou guaishou    6 Aug  2 14:31 executor.port
drwxr-xr-x  2 guaishou guaishou    6 Jul 27 14:09 extlib
drwxr-xr-x  2 guaishou guaishou 4096 Jul 27 14:09 lib
drwxr-xr-x  2 guaishou guaishou   89 Jul 27 14:09 logs
drwxr-xr-x  3 guaishou guaishou   21 Jul 27 14:09 plugins
drwxr-xr-x 16 guaishou guaishou  148 Aug  2 14:55 projects
drwxr-xr-x  2 guaishou guaishou    6 Aug  2 14:55 temp
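
The listing above only covers the top level. To confirm that nothing deeper in the tree is still owned by another user, a recursive check like the following sketch can be used. It assumes the executor lives under /data/software/azkaban/exec, as the path in the stack trace suggests; the helper name not_owned_by is made up for this example.

```shell
#!/bin/sh
# Assumption: the executor directory; override EXEC_DIR if yours differs.
EXEC_DIR=${EXEC_DIR:-/data/software/azkaban/exec}

# List everything under a tree that is NOT owned by the given user.
not_owned_by() {
    owner=$1
    tree=$2
    find "$tree" ! -user "$owner" -print
}

if [ -d "$EXEC_DIR" ]; then
    leftovers=$(not_owned_by guaishou "$EXEC_DIR")
    if [ -n "$leftovers" ]; then
        echo "still not owned by guaishou:"
        echo "$leftovers"
    else
        echo "everything under $EXEC_DIR is owned by guaishou"
    fi
fi
```

Running the same check before the chown would have listed the root-owned project directories such as 7.5.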

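The error page in the Web UI looked like this:
[Screenshot of the Web UI error page; image not preserved]
This pink page appears precisely because Azkaban lacks permission to run these jobs. Applying the ownership change above resolves it.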