Kettle: How to Write Java Code in Kettle

程序员文章站 2022-04-28 09:40:34

User Defined Java Class

You can use the User Defined Java Class step to enter your own Java class to drive the functionality of a complete step. You can program your own plugin into a step, yet the goal of this step is not to do full-scale Java development inside of a step. A whole plugin system is available to help with that part (see Embed and Extend PDI Functionality). The goal is for you to just define Java methods and logic. For this step, the Janino project libraries are used to compile Java code in the form of classes at runtime.

Not 100% Java

Janino and this step do not need a complete Java class. They only need the class body (such as the imports, constructors, and methods), not the full class declaration. The step was designed this way, rather than around a full class definition, to hide technical details and methods for ease of use.

You enter your main code into the Processor, which defines the processRow() method. In PDI, the following imports are already a part of the Processor code:

  • org.pentaho.di.trans.steps.userdefinedjavaclass.*
  • org.pentaho.di.trans.step.*
  • org.pentaho.di.core.row.*
  • org.pentaho.di.core.*
  • org.pentaho.di.core.exception.*

The imports listed above are only a part of the Processor code. They are not a part of any code blocks you might enter into additional Class code tabs.

If you need to add other imports to your Processor code, include them at the very top of the code you will create for this step, as shown in the following example:

import java.util.*;

Janino, essentially a Java compiler, only supports a subset of the Java 1.8.x specification. To see a complete list of the features and limitations, see the Janino homepage.

General

Enter the following information in the transformation step name field.

  • Step Name: Specifies the unique name of the transformation step on the canvas. The step name is set to ‘User Defined Java Class’ by default.

Use the Class code panel and the option tabs to enter your defined Java class. After you specify your Java class, click Test class to test it.

Class Code

Add your defined Java code directly in the Processor tab in the Class code panel. You can create additional tabs for more code blocks by right-clicking and selecting Add new. This menu also includes options for copying a tab, setting a transformation class, or removing a class type.

Process Rows

The Processor code defines the processRow() method, which is the heart of the step. This method is called by the transformation in a tight loop and will continue until false is returned.

The getRow() method must be called before the first get(Fields.In, FIELD_NAME) call. This avoids problems with unexpected field ordering in the data obtained from the previous step (such as Mapping input specification).

A very simple example that calculates firstname+" "+lastname and stores it into a nameField is shown in the following example Processor code block:

String firstnameField;
String lastnameField;
String nameField;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
// Let's look up parameters only once for performance reason.
//
if (first) {
  firstnameField = getParameter("FIRSTNAME_FIELD");
  lastnameField = getParameter("LASTNAME_FIELD");
  nameField = getParameter("NAME_FIELD");
  first=false;
}

// First, get a row from the default input hop
//
Object[] r = getRow();

// If the row object is null, we are done processing.
//
if (r == null) {
  setOutputDone();
  return false;
}

// It is always safest to call createOutputRow() to ensure that your output row's Object[] is large
// enough to handle any new fields you are creating in this step.
//
Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());

String firstname = get(Fields.In, firstnameField).getString(r);
String lastname = get(Fields.In, lastnameField).getString(r);

// Set the value in the output field
//
String name = firstname+" "+lastname;
get(Fields.Out, nameField).setValue(outputRow, name);

// putRow will send the row on to the default output hop.
//
putRow(data.outputRowMeta, outputRow);

return true;
}
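
The calling contract described above (one row per call, repeated until false is returned) can be imitated outside PDI. The following is a minimal, self-contained sketch in plain Java, not PDI code: ProcessRowLoopSketch, its queue-backed getRow(), and its list-backed putRow() are hypothetical stand-ins for the step runtime, and the row layout (first name at index 0, last name at index 1) is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the step runtime; none of these names are PDI classes.
final class ProcessRowLoopSketch {
    private final Deque<Object[]> input = new ArrayDeque<>();
    final List<Object[]> output = new ArrayList<>();
    boolean outputDone = false;

    ProcessRowLoopSketch(List<Object[]> rows) {
        input.addAll(rows);
    }

    // Returns the next input row, or null when the input is exhausted.
    private Object[] getRow() {
        return input.poll();
    }

    private void putRow(Object[] row) {
        output.add(row);
    }

    // Mirrors the shape of processRow(): one row per call, false when finished.
    boolean processRow() {
        Object[] r = getRow();
        if (r == null) {
            outputDone = true;
            return false;
        }
        // Copy the row with one extra slot for the new name field.
        Object[] out = Arrays.copyOf(r, r.length + 1);
        out[r.length] = r[0] + " " + r[1];
        putRow(out);
        return true;
    }

    // The caller keeps invoking processRow() until it returns false.
    static List<Object[]> run(List<Object[]> rows) {
        ProcessRowLoopSketch step = new ProcessRowLoopSketch(rows);
        while (step.processRow()) {
            // each call handles exactly one row
        }
        return step.output;
    }
}
```

Calling ProcessRowLoopSketch.run() with two input rows yields two output rows, each one element longer than its input row.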

Error Handling

If you want PDI to handle errors that may occur while running your class in a transformation, you must implement your own error handling code. Before adding any error handling code, right-click on the User Defined Java Class step in the PDI client canvas and select Error Handling in the menu that appears. The resulting Step error handling settings dialog box contains options for specifying an error target step and associated field names that you will use to implement error handling in your defined code.

The following try code block from the User Defined Java Class – Lambda Examples.ktr in the data-integration/samples/transformations directory contains an example of such error handling:

try {
    Object numList = strsList.stream()
                             .map( new ToInteger() )
                             .sorted( new ReverseCase() )
                             .collect( Collectors.toList() );

    get( Fields.Out, "reverseOrder" ).setValue( row, numList.toString() );

} catch (NumberFormatException ex) {
    // Number List contains a value that cannot be converted to an Integer.
    rowInError = true;
    errMsg = ex.getMessage();
    errCnt = errCnt + 1;
}

if ( !rowInError ) {
    putRow( data.outputRowMeta, row );
} else {
    // Output errors to the error hop. Right click on step and choose "Error Handling..."
    putError(data.outputRowMeta, row, errCnt, errMsg, "Not allowed", "DEC_0");
}

The try in the code sample above tests to see if numList contains valid numbers. If the list contains a number that is not valid, putError is used to handle the error and direct it to the wlog: ErrorPath step in the sample transformation. The ErrorPath step is also specified in the Target steps tab of the User Defined Java Class step.
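
The same routing pattern can be tried outside PDI. The following is a minimal sketch in plain Java for a regular compiler. The ToInteger and ReverseCase classes from the sample are not shown in this article, so Integer::valueOf and Comparator.reverseOrder() are used here as assumed stand-ins, and the good and errors lists are hypothetical replacements for putRow() and putError(). Inside the step itself, Janino's limitations may call for explicit classes as in the sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a step with an error hop; not PDI code.
final class ErrorRoutingSketch {
    final List<String> good = new ArrayList<>();    // plays the role of putRow()
    final List<String> errors = new ArrayList<>();  // plays the role of putError()
    long errCnt = 0;

    // Parses a comma-separated list, sorts it in descending order, and routes the result.
    void process(String csv) {
        try {
            List<Integer> numList = Arrays.stream(csv.split(","))
                    .map(String::trim)
                    .map(Integer::valueOf)              // throws NumberFormatException on bad input
                    .sorted(Comparator.reverseOrder())
                    .collect(Collectors.toList());
            good.add(numList.toString());
        } catch (NumberFormatException ex) {
            // The whole row goes to the error output, together with the reason.
            errCnt = errCnt + 1;
            errors.add(csv + " -> " + ex.getMessage());
        }
    }
}
```

A row such as "3, 1, 2" ends up in good, while a row such as "4, x, 5" ends up in errors and increments errCnt.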

Logging

You need to implement logging in your defined step if you want PDI to log data actions from your class, such as read, write, output, or update data. The following code is an example of how to implement logging:

putRow( data.outputRowMeta, r );

if ( checkFeedback( getLinesOutput() ) ) {
  if ( log.isBasic() ) {
    logBasic( "Have I got rows for you! " + getLinesOutput() );
  }
}
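
The point of guarding the log call is to write a progress message only every so often rather than once per row. The following plain-Java sketch illustrates that idea under a stated assumption: that the feedback check returns true once every fixed number of rows. FeedbackLoggingSketch, FEEDBACK_SIZE, and this version of checkFeedback() are hypothetical and are not PDI code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; FEEDBACK_SIZE and this checkFeedback() are assumptions, not PDI code.
final class FeedbackLoggingSketch {
    static final long FEEDBACK_SIZE = 3;
    final List<String> logLines = new ArrayList<>();
    long linesOutput = 0;

    // Assumed behavior: true on every FEEDBACK_SIZE-th row.
    boolean checkFeedback(long lines) {
        return lines > 0 && lines % FEEDBACK_SIZE == 0;
    }

    // Counts the row and logs progress only when the feedback check passes.
    void putRow(Object[] row) {
        linesOutput++;
        if (checkFeedback(linesOutput)) {
            logLines.add("Rows written so far: " + linesOutput);
        }
    }
}
```

With a feedback size of 3, seven rows produce two log lines (after rows 3 and 6).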

Classes and Code Fragments

You can navigate through your defined classes along with related code snippets and fields through the Classes and Code Fragments panel. You can right-click on any item in this tree to either Delete, Rename, or Show Sample.

Classes

The Classes folder indicates what classes have corresponding code block tabs in the Class Code panel.

Code Snippits

The Code Snippits folder shows the internal PDI code related to the User Defined Java Class step. These snippets are shown as a reference for the code of your class.

Input Fields

The Input fields folder contains any input fields you define in your code. While working with your defined code, you will be handling input and output fields. Many ways exist for handling input fields. For example, to start, examine the following description of an input row:

RowMetaInterface inputRowMeta = getInputRowMeta();

The inputRowMeta object contains the metadata of the input row. It includes all the fields, their data types, lengths, names, format masks, and more. You can use this object to look up input fields. For example, if you want to look for a field called customer, you would use the following code:

ValueMetaInterface customer = inputRowMeta.searchValueMeta("customer");

Because looking up field names can be slow if you need to do it for every row that passes through a transformation, you could look up field names in advance in a first block of code, as shown in the following example:

if (first) {
 first = false;
 yearIndex = getInputRowMeta().indexOfValue(getParameter("YEAR"));
 if (yearIndex<0) {
   throw new KettleException("Year field not found in the input row, check parameter 'YEAR'!");
 }
}

To get the Integer value contained in the year field, you can then use the following construct:

Object[] r = getRow();
...
Long year = getInputRowMeta().getInteger(r, yearIndex);

To make this process easier, you can use a shortcut in the following form:

Long year = get(Fields.In, "year").getInteger(r);

This method also takes into account the index-based optimization mentioned above.
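
To see why resolving the index once pays off, the following plain-Java sketch counts how often a name-based search runs. It is an illustration only: FieldIndexCacheSketch, its list of field names (standing in for the input row metadata), and the lookups counter are hypothetical and are not PDI classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for row metadata and a step; not PDI code.
final class FieldIndexCacheSketch {
    private final List<String> fieldNames;   // plays the role of the input row metadata
    private boolean first = true;
    private int yearIndex = -1;
    int lookups = 0;                         // counts name-based searches

    FieldIndexCacheSketch(String... fieldNames) {
        this.fieldNames = Arrays.asList(fieldNames);
    }

    private int indexOfValue(String name) {
        lookups++;
        return fieldNames.indexOf(name);     // linear search by name
    }

    // Reads the "year" field; the name lookup runs on the first row only.
    Long yearOf(Object[] row) {
        if (first) {
            first = false;
            yearIndex = indexOfValue("year");
            if (yearIndex < 0) {
                throw new IllegalStateException("Year field not found in the input row");
            }
        }
        return (Long) row[yearIndex];
    }
}
```

However many rows pass through yearOf(), lookups stays at 1, because later rows reuse the cached index.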

The Java data types that you get from previous steps always correspond to the PDI data type as described on the PDI Rows Of Data page.

Info Fields

The Info fields folder contains any information fields you define in your code. These fields will not appear in the Classes and Code Fragments panel until they are defined in your code. If no information fields are defined in your code, nothing will show in this folder.

Output Fields

You can define all the new fields you want to use as the output of the step in the Fields tab. Setting fields in this tab will automatically calculate the layout of the output row metadata and store it in data.outputRowMeta, which enables you to create the output row.

In cases where the step writes as many (or as few) rows as it reads, you can resize the row you get on input, as shown in the following example code:

Object[] outputRowData = RowDataUtil.resizeArray(r, data.outputRowMeta.size());

or in the following example code:

Object[] outputRowData = createOutputRow(r, data.outputRowMeta.size());

If you are copying the rows, create a separate copy for each one so that subsequent steps do not modify the same Object[] instance at the same time, as shown in the following example:

Object[] outputRowData = RowDataUtil.createResizedCopy(r, data.outputRowMeta.size());
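
The reason for copying is ordinary Java array aliasing. The following plain-Java sketch shows the difference between handing out the same array twice and handing out two copies. RowCopySketch is a hypothetical class, and Arrays.copyOf is used only as a stand-in for a resized row copy; it is not the PDI utility.

```java
import java.util.Arrays;

// Plain-Java illustration of array aliasing; Arrays.copyOf stands in for a row copy.
final class RowCopySketch {
    // Hands the same array to two consumers: a change made by one is seen by the other.
    static Object[][] shareRow(Object[] row) {
        return new Object[][] {row, row};
    }

    // Gives each consumer its own resized copy, so changes stay local.
    static Object[][] copyRow(Object[] row, int newSize) {
        return new Object[][] {Arrays.copyOf(row, newSize), Arrays.copyOf(row, newSize)};
    }
}
```

With shareRow(), writing to the first consumer's row also changes what the second consumer sees. With copyRow(), each consumer's row is independent, and the extra slots start out as null.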

As with accessing input fields, output fields can be addressed through the index in the output row, as shown in the following example:

outputRowData[getInputRowMeta().size()] = easterDate(year.intValue());

or using the shortcut that is shown in the following example:

get(Fields.Out, "easter").setValue(outputRowData, easterDate(year.intValue()));

The Java data types that you pass to subsequent steps always need to correspond to the PDI data type as described on the PDI Rows Of Data page.
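
The easterDate() method called in the examples above belongs to the sample transformation and is not reproduced in this article. Purely as an illustration of what such a helper might compute, the following sketch uses the well-known anonymous Gregorian algorithm. EasterSketch and its use of java.time.LocalDate are assumptions, not the sample's implementation.

```java
import java.time.LocalDate;

// Hypothetical helper; the sample transformation's own easterDate() is not reproduced here.
final class EasterSketch {
    // Anonymous Gregorian algorithm (Meeus/Jones/Butcher) for Western Easter Sunday.
    static LocalDate easterDate(int year) {
        int a = year % 19;
        int b = year / 100;
        int c = year % 100;
        int d = b / 4;
        int e = b % 4;
        int f = (b + 8) / 25;
        int g = (b - f + 1) / 3;
        int h = (19 * a + b - d - g + 15) % 30;
        int i = c / 4;
        int k = c % 4;
        int l = (32 + 2 * e + 2 * i - h - k) % 7;
        int m = (a + 11 * h + 22 * l) / 451;
        int month = (h + l - 7 * m + 114) / 31;   // 3 = March, 4 = April
        int day = (h + l - 7 * m + 114) % 31 + 1;
        return LocalDate.of(year, month, day);
    }
}
```

For example, EasterSketch.easterDate(2024) returns 2024-03-31. A value written to a PDI output field would still need to be converted to whichever Java type the PDI Rows Of Data page specifies for that field.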

Options

The User Defined Java Class step features several tabs. Each tab is described below.

Fields Tab

The Fields table defines the output fields you want to pass on to the next steps in the transformation. Any field specified in the table will appear in the Output fields folder in the Classes and Code Fragments panel.

Parameters Tab

You can use the Parameters table to avoid using hard-coded string values, such as field names (customer for example).

Another example is in the User Defined Java Class - Calculate the date of Easter.ktr sample transformation, which is in the data-integration/samples/transformations directory. That sample KTR has a parameter called YEAR. The YEAR parameter is referenced with the getParameter() method, as shown in the following example:

getParameter("YEAR")

At runtime, it will return the year String value.

Class Member Variables and the getVariable Function

When getting parameters that point to transformation parameters, the User Defined Java Class step behaves differently depending on when the getVariable function is called. If it is in the init() method, the parameters behave as expected. If the initialization is on a class member variable, the variable is not resolved by design, as shown in the following example:

private final String par = getVariable("somePar"); // DOES NOT resolve correctly
private String par2 = null;
 
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
   logBasic("Parameter value="+par+"[MEMBER INIT]");
   logBasic("Parameter value="+par2+"[INIT FUNCTION]");
   setOutputDone();
   return false;
}
 
public boolean init(StepMetaInterface stepMetaInterface, StepDataInterface stepDataInterface) {
   par2 = getVariable("somePar"); // WORKS FINE
   return parent.initImpl(stepMetaInterface, stepDataInterface);
}
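
One plausible way to picture this behavior, assuming the variables only become available to the step after the class instance has been constructed, is the ordinary Java rule that field initializers run during construction. The following plain-Java sketch demonstrates that rule. VariableSpaceSketch and MemberInitSketch are hypothetical classes and do not reproduce PDI internals.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a step base class; variables arrive after construction.
class VariableSpaceSketch {
    private Map<String, String> variables = new HashMap<>();

    void setVariables(Map<String, String> variables) {
        this.variables = variables;
    }

    String getVariable(String name) {
        return variables.get(name);
    }
}

final class MemberInitSketch extends VariableSpaceSketch {
    // Runs during construction, before setVariables() has been called.
    final String par = getVariable("somePar");
    String par2 = null;

    // Runs later, once the variables are available.
    boolean init() {
        par2 = getVariable("somePar");
        return true;
    }
}
```

After constructing the object, calling setVariables() with somePar defined, and then calling init(), par is still null while par2 holds the value.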

Info Steps Tab

Because the getRow() method returns the first row from any input stream (either an input stream or an info stream), use the Info steps table when the row metadata of the input stream and the info stream differ.

Read or get all the data values from the information stream before calling the getRow() method, as shown in the following code example:

if (first){
 first = false;
 
 /* TODO: Your code here. (Using info fields)
 
 FieldHelper infoField = get(Fields.Info, "info_field_name");
 
 RowSet infoStream = findInfoRowSet("info_stream_tag");
 
 Object[] infoRow = null;
 
 int infoRowCount = 0;
 
 // Read all rows from info step before calling getRow() method, which returns first row from any
 // input rowset. As rowMeta for info and input steps varies getRow() can lead to errors.
 while((infoRow = getRowFrom(infoStream)) != null){
 
   // do something with info data
   infoRowCount++;
 }
 */
}
 
Object[] r = getRow();
 
if (r == null) {
       setOutputDone();
       return false;
}
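
The ordering rule above (drain the info stream first, then read the main stream) can be tried in isolation. The following is a minimal plain-Java sketch, not PDI code: InfoStreamSketch, its two queues, and the key and label row layout are hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a step with one info stream and one main stream; not PDI code.
final class InfoStreamSketch {
    private final Deque<Object[]> infoStream;
    private final Deque<Object[]> mainStream;
    private final Map<Object, Object> lookup = new HashMap<>();
    private boolean first = true;
    final List<Object[]> output = new ArrayList<>();

    InfoStreamSketch(List<Object[]> infoRows, List<Object[]> mainRows) {
        this.infoStream = new ArrayDeque<>(infoRows);
        this.mainStream = new ArrayDeque<>(mainRows);
    }

    boolean processRow() {
        if (first) {
            first = false;
            // Drain the info stream completely before touching the main stream.
            Object[] infoRow;
            while ((infoRow = infoStream.poll()) != null) {
                lookup.put(infoRow[0], infoRow[1]);   // key -> label
            }
        }
        Object[] r = mainStream.poll();
        if (r == null) {
            return false;
        }
        // Enrich the main row with the value looked up from the info data.
        output.add(new Object[] {r[0], lookup.get(r[0])});
        return true;
    }
}
```

Because the info rows are loaded into the lookup map on the first call, every main row can be enriched regardless of the order in which the main rows arrive.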

Target Steps Tab

You can use the Target steps table to target the output of the User Defined Java Class to specific steps in your transformation.

Examples

The data-integration/samples/transformations directory contains the following example KTRs that show how to use this step:

Example KTR file: Description

  • User Defined Java Class - Calculate the date of Easter.ktr: Develops a class to calculate the date of Easter.
  • User Defined Java Class - Concatenate firstname and lastname.ktr: Develops a class to combine first and last names into a full name.
  • User Defined Java Class - Query the database catalog.ktr: Shows how a user-defined class would access a database.
  • User Defined Java Class - Real-time search on Twitter.ktr: Shows how to use a user-defined class in a real-time system.
  • User Defined Java Class - LambdaExamples.ktr: Shows how to use the Java streaming API with a User Defined Java Class step.

Last updated: Dec 8, 2018

We recommend starting with the User Defined Java Class - Calculate the date of Easter.ktr example transformation.

Metadata Injection Support

All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.

Reference

https://help.pentaho.com/Documentation/8.2/Products/Data_Integration/Transformation_Step_Reference/User_Defined_Java_Class