Spark ML: Binarizer
程序员文章站
2024-02-15 15:03:16
...
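- Binarizer: the binarization transformer
- Binarization is the process of thresholding numerical features into binary (0/1) features.
- Binarizer (the binarization transformer provided by Spark ML) takes the parameters inputCol, outputCol and threshold. Input feature values greater than the threshold are binarized to 1.0, and values less than or equal to the threshold are binarized to 0.0. inputCol supports both Vector and Double types.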
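The thresholding rule above can be illustrated in plain Scala, without Spark. This is a minimal sketch, not Spark's implementation: `BinarizeRule`, `binarize` and `binarizeAll` are hypothetical names, and an `Array[Double]` stands in for a Vector column so the rule can be shown element-wise.

```scala
// Hypothetical helper illustrating the Binarizer rule; NOT Spark's own code.
object BinarizeRule {
  // Strictly greater than the threshold -> 1.0; less than or equal -> 0.0.
  def binarize(x: Double, threshold: Double): Double =
    if (x > threshold) 1.0 else 0.0

  // Element-wise version; the Array stands in for a Vector column (assumption).
  def binarizeAll(xs: Array[Double], threshold: Double): Array[Double] =
    xs.map(x => binarize(x, threshold))

  def main(args: Array[String]): Unit = {
    println(binarize(0.8, 0.5)) // 1.0
    println(binarize(0.5, 0.5)) // 0.0 (equal to the threshold)
    println(binarizeAll(Array(0.1, 0.8, 0.2), 0.5).mkString(", ")) // 0.0, 1.0, 0.0
  }
}
```

Because the comparison is strict, a value exactly equal to the threshold maps to 0.0, which is the "less than or equal" case described above.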
Example:
```scala
import org.apache.spark.ml.feature.Binarizer
import org.apache.spark.sql.SparkSession

/**
 * Binarizer example
 *
 * @author wangjuncheng
 */
// Named BinarizerExample so the object does not shadow the imported
// org.apache.spark.ml.feature.Binarizer class.
object BinarizerExample {
  // Spark recommends an explicit main method rather than extending scala.App.
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("ml_learn")
      // .enableHiveSupport()
      .getOrCreate()

    // (id, feature) rows; "feature" is a Double column
    val data = Seq((0, 0.1), (1, 0.8), (2, 0.2))
    val dataFrame = spark.createDataFrame(data).toDF("id", "feature")

    // Values > 0.5 become 1.0; values <= 0.5 become 0.0
    val binarizer = new Binarizer()
      .setInputCol("feature")
      .setOutputCol("binarized_feature")
      .setThreshold(0.5)

    val binarizedDataFrame = binarizer.transform(dataFrame)

    // Results
    println(binarizer.getThreshold)
    binarizedDataFrame.show()

    spark.stop()
  }
}
```
Output: