ML.NET at a Glance
2022-05-11 14:28:14
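What is ML.NET?
ML.NET is an open-source machine learning framework created by Microsoft for .NET developers. It is cross-platform and runs on macOS, Linux, and Windows.
The machine learning pipeline
ML.NET composes the machine learning process as a pipeline. The pipeline has four parts:
- Load data
- Transform data
- Choose an algorithm
- Train the model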
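Example
Create a console project.
dotnet new console -o myApp
cd myApp
Add the ML.NET package. Note that the code in this example uses the pre-1.0 preview API (for instance MakePredictionFunction), so it may not compile against current releases of the package.
dotnet add package Microsoft.ML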
In the project folder, create a text file named iris-data.txt with the following content, one record per line:
5.1,3.5,1.4,0.2,iris-setosa
4.9,3.0,1.4,0.2,iris-setosa
4.7,3.2,1.3,0.2,iris-setosa
4.6,3.1,1.5,0.2,iris-setosa
5.0,3.6,1.4,0.2,iris-setosa
5.4,3.9,1.7,0.4,iris-setosa
4.6,3.4,1.4,0.3,iris-setosa
5.0,3.4,1.5,0.2,iris-setosa
4.4,2.9,1.4,0.2,iris-setosa
4.9,3.1,1.5,0.1,iris-setosa
5.4,3.7,1.5,0.2,iris-setosa
4.8,3.4,1.6,0.2,iris-setosa
4.8,3.0,1.4,0.1,iris-setosa
4.3,3.0,1.1,0.1,iris-setosa
5.8,4.0,1.2,0.2,iris-setosa
5.7,4.4,1.5,0.4,iris-setosa
5.4,3.9,1.3,0.4,iris-setosa
5.1,3.5,1.4,0.3,iris-setosa
5.7,3.8,1.7,0.3,iris-setosa
5.1,3.8,1.5,0.3,iris-setosa
5.4,3.4,1.7,0.2,iris-setosa
5.1,3.7,1.5,0.4,iris-setosa
4.6,3.6,1.0,0.2,iris-setosa
5.1,3.3,1.7,0.5,iris-setosa
4.8,3.4,1.9,0.2,iris-setosa
5.0,3.0,1.6,0.2,iris-setosa
5.0,3.4,1.6,0.4,iris-setosa
5.2,3.5,1.5,0.2,iris-setosa
5.2,3.4,1.4,0.2,iris-setosa
4.7,3.2,1.6,0.2,iris-setosa
4.8,3.1,1.6,0.2,iris-setosa
5.4,3.4,1.5,0.4,iris-setosa
5.2,4.1,1.5,0.1,iris-setosa
5.5,4.2,1.4,0.2,iris-setosa
4.9,3.1,1.5,0.1,iris-setosa
5.0,3.2,1.2,0.2,iris-setosa
5.5,3.5,1.3,0.2,iris-setosa
4.9,3.1,1.5,0.1,iris-setosa
4.4,3.0,1.3,0.2,iris-setosa
5.1,3.4,1.5,0.2,iris-setosa
5.0,3.5,1.3,0.3,iris-setosa
4.5,2.3,1.3,0.3,iris-setosa
4.4,3.2,1.3,0.2,iris-setosa
5.0,3.5,1.6,0.6,iris-setosa
5.1,3.8,1.9,0.4,iris-setosa
4.8,3.0,1.4,0.3,iris-setosa
5.1,3.8,1.6,0.2,iris-setosa
4.6,3.2,1.4,0.2,iris-setosa
5.3,3.7,1.5,0.2,iris-setosa
5.0,3.3,1.4,0.2,iris-setosa
7.0,3.2,4.7,1.4,iris-versicolor
6.4,3.2,4.5,1.5,iris-versicolor
6.9,3.1,4.9,1.5,iris-versicolor
5.5,2.3,4.0,1.3,iris-versicolor
6.5,2.8,4.6,1.5,iris-versicolor
5.7,2.8,4.5,1.3,iris-versicolor
6.3,3.3,4.7,1.6,iris-versicolor
4.9,2.4,3.3,1.0,iris-versicolor
6.6,2.9,4.6,1.3,iris-versicolor
5.2,2.7,3.9,1.4,iris-versicolor
5.0,2.0,3.5,1.0,iris-versicolor
5.9,3.0,4.2,1.5,iris-versicolor
6.0,2.2,4.0,1.0,iris-versicolor
6.1,2.9,4.7,1.4,iris-versicolor
5.6,2.9,3.6,1.3,iris-versicolor
6.7,3.1,4.4,1.4,iris-versicolor
5.6,3.0,4.5,1.5,iris-versicolor
5.8,2.7,4.1,1.0,iris-versicolor
6.2,2.2,4.5,1.5,iris-versicolor
5.6,2.5,3.9,1.1,iris-versicolor
5.9,3.2,4.8,1.8,iris-versicolor
6.1,2.8,4.0,1.3,iris-versicolor
6.3,2.5,4.9,1.5,iris-versicolor
6.1,2.8,4.7,1.2,iris-versicolor
6.4,2.9,4.3,1.3,iris-versicolor
6.6,3.0,4.4,1.4,iris-versicolor
6.8,2.8,4.8,1.4,iris-versicolor
6.7,3.0,5.0,1.7,iris-versicolor
6.0,2.9,4.5,1.5,iris-versicolor
5.7,2.6,3.5,1.0,iris-versicolor
5.5,2.4,3.8,1.1,iris-versicolor
5.5,2.4,3.7,1.0,iris-versicolor
5.8,2.7,3.9,1.2,iris-versicolor
6.0,2.7,5.1,1.6,iris-versicolor
5.4,3.0,4.5,1.5,iris-versicolor
6.0,3.4,4.5,1.6,iris-versicolor
6.7,3.1,4.7,1.5,iris-versicolor
6.3,2.3,4.4,1.3,iris-versicolor
5.6,3.0,4.1,1.3,iris-versicolor
5.5,2.5,4.0,1.3,iris-versicolor
5.5,2.6,4.4,1.2,iris-versicolor
6.1,3.0,4.6,1.4,iris-versicolor
5.8,2.6,4.0,1.2,iris-versicolor
5.0,2.3,3.3,1.0,iris-versicolor
5.6,2.7,4.2,1.3,iris-versicolor
5.7,3.0,4.2,1.2,iris-versicolor
5.7,2.9,4.2,1.3,iris-versicolor
6.2,2.9,4.3,1.3,iris-versicolor
5.1,2.5,3.0,1.1,iris-versicolor
5.7,2.8,4.1,1.3,iris-versicolor
6.3,3.3,6.0,2.5,iris-virginica
5.8,2.7,5.1,1.9,iris-virginica
7.1,3.0,5.9,2.1,iris-virginica
6.3,2.9,5.6,1.8,iris-virginica
6.5,3.0,5.8,2.2,iris-virginica
7.6,3.0,6.6,2.1,iris-virginica
4.9,2.5,4.5,1.7,iris-virginica
7.3,2.9,6.3,1.8,iris-virginica
6.7,2.5,5.8,1.8,iris-virginica
7.2,3.6,6.1,2.5,iris-virginica
6.5,3.2,5.1,2.0,iris-virginica
6.4,2.7,5.3,1.9,iris-virginica
6.8,3.0,5.5,2.1,iris-virginica
5.7,2.5,5.0,2.0,iris-virginica
5.8,2.8,5.1,2.4,iris-virginica
6.4,3.2,5.3,2.3,iris-virginica
6.5,3.0,5.5,1.8,iris-virginica
7.7,3.8,6.7,2.2,iris-virginica
7.7,2.6,6.9,2.3,iris-virginica
6.0,2.2,5.0,1.5,iris-virginica
6.9,3.2,5.7,2.3,iris-virginica
5.6,2.8,4.9,2.0,iris-virginica
7.7,2.8,6.7,2.0,iris-virginica
6.3,2.7,4.9,1.8,iris-virginica
6.7,3.3,5.7,2.1,iris-virginica
7.2,3.2,6.0,1.8,iris-virginica
6.2,2.8,4.8,1.8,iris-virginica
6.1,3.0,4.9,1.8,iris-virginica
6.4,2.8,5.6,2.1,iris-virginica
7.2,3.0,5.8,1.6,iris-virginica
7.4,2.8,6.1,1.9,iris-virginica
7.9,3.8,6.4,2.0,iris-virginica
6.4,2.8,5.6,2.2,iris-virginica
6.3,2.8,5.1,1.5,iris-virginica
6.1,2.6,5.6,1.4,iris-virginica
7.7,3.0,6.1,2.3,iris-virginica
6.3,3.4,5.6,2.4,iris-virginica
6.4,3.1,5.5,1.8,iris-virginica
6.0,3.0,4.8,1.8,iris-virginica
6.9,3.1,5.4,2.1,iris-virginica
6.7,3.1,5.6,2.4,iris-virginica
6.9,3.1,5.1,2.3,iris-virginica
5.8,2.7,5.1,1.9,iris-virginica
6.8,3.2,5.9,2.3,iris-virginica
6.7,3.3,5.7,2.5,iris-virginica
6.7,3.0,5.2,2.3,iris-virginica
6.3,2.5,5.0,1.9,iris-virginica
6.5,3.0,5.2,2.0,iris-virginica
6.2,3.4,5.4,2.3,iris-virginica
5.9,3.0,5.1,1.8,iris-virginica
Paste the following code into Program.cs.
using System;
using Microsoft.ML;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Runtime.Data;

namespace myApp
{
    class Program
    {
        // Input schema: four measurements plus the species label.
        public class IrisData
        {
            public float SepalLength;
            public float SepalWidth;
            public float PetalLength;
            public float PetalWidth;
            public string Label;
        }

        // Output schema: the predicted species, read from the PredictedLabel column.
        public class IrisPrediction
        {
            [ColumnName("PredictedLabel")]
            public string PredictedLabels;
        }

        static void Main(string[] args)
        {
            var mlContext = new MLContext();

            // Step 1: load the data.
            // Note: HasHeader = true makes the reader treat the first line of the file as a header.
            string dataPath = "iris-data.txt";
            var reader = mlContext.Data.TextReader(new TextLoader.Arguments()
            {
                Separator = ",",
                HasHeader = true,
                Column = new[]
                {
                    new TextLoader.Column("SepalLength", DataKind.R4, 0),
                    new TextLoader.Column("SepalWidth", DataKind.R4, 1),
                    new TextLoader.Column("PetalLength", DataKind.R4, 2),
                    new TextLoader.Column("PetalWidth", DataKind.R4, 3),
                    new TextLoader.Column("Label", DataKind.Text, 4)
                }
            });
            IDataView trainingDataView = reader.Read(new MultiFileSource(dataPath));

            // Steps 2 and 3: transform the data and choose the algorithm.
            var pipeline = mlContext.Transforms.Categorical.MapValueToKey("Label")
                .Append(mlContext.Transforms.Concatenate("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth"))
                .Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(label: "Label", features: "Features"))
                .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

            // Step 4: train the model.
            var model = pipeline.Fit(trainingDataView);

            // Use the trained model to predict the species of a single sample.
            var prediction = model.MakePredictionFunction<IrisData, IrisPrediction>(mlContext).Predict(
                new IrisData()
                {
                    SepalLength = 3.3f,
                    SepalWidth = 1.6f,
                    PetalLength = 0.2f,
                    PetalWidth = 5.1f,
                });

            Console.WriteLine($"Predicted flower type is: {prediction.PredictedLabels}");
        }
    }
}
Run the program with the dotnet run command to get the prediction.
Predicted flower type is: iris-virginica
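Walkthrough
The example defines two classes, IrisData and IrisPrediction. IrisData is the data structure used for training, while IrisPrediction holds the result of a prediction.
The MLContext class defines the ML.NET context; it can be thought of as the framework's runtime environment.
Next, a text reader is created to read the dataset file. Its arguments specify the format to read: the separator, and the name, type, and position of each column. This is the first step of the machine learning pipeline.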
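To make the column layout concrete, here is a minimal sketch in plain C# of what reading one record amounts to. It does not use ML.NET; IrisRow and IrisLineParser are hypothetical names invented for this illustration, and the real text reader is more general than this.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for one loaded record (not an ML.NET type).
public sealed class IrisRow
{
    public float SepalLength;
    public float SepalWidth;
    public float PetalLength;
    public float PetalWidth;
    public string Label;
}

public static class IrisLineParser
{
    // Splits one comma-separated line into four floats (columns 0-3)
    // and a text label (column 4), mirroring the column layout in the example.
    public static IrisRow Parse(string line)
    {
        string[] parts = line.Split(',');
        if (parts.Length != 5)
            throw new FormatException($"Expected 5 columns but found {parts.Length}.");

        return new IrisRow
        {
            SepalLength = float.Parse(parts[0], CultureInfo.InvariantCulture),
            SepalWidth = float.Parse(parts[1], CultureInfo.InvariantCulture),
            PetalLength = float.Parse(parts[2], CultureInfo.InvariantCulture),
            PetalWidth = float.Parse(parts[3], CultureInfo.InvariantCulture),
            Label = parts[4].Trim()
        };
    }

    public static void Main()
    {
        IrisRow row = Parse("5.1,3.5,1.4,0.2,iris-setosa");
        Console.WriteLine($"Label = {row.Label}, petal length = {row.PetalLength}");
    }
}
```

Parsing with CultureInfo.InvariantCulture keeps the decimal point from being misread on machines whose locale uses a decimal comma.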
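In the second step, the Label property of IrisData is converted to a numeric type, because only numeric data can be used in model training. SepalLength, SepalWidth, PetalLength, and PetalWidth are then concatenated into a single column that serves as the Features of the dataset.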
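Conceptually, this step does two things: it assigns each distinct label string a number, and it packs the four measurements into one vector. The sketch below shows the idea in plain C# without ML.NET. LabelEncodingSketch and its methods are hypothetical names, and numbering keys from 1 in order of first appearance is a choice made for this illustration, not a statement about how ML.NET implements its transforms.

```csharp
using System;
using System.Collections.Generic;

public static class LabelEncodingSketch
{
    // Assigns each distinct label an integer key, starting at 1,
    // in the order in which the labels first appear.
    public static Dictionary<string, uint> BuildKeyMap(IEnumerable<string> labels)
    {
        var keys = new Dictionary<string, uint>();
        foreach (string label in labels)
        {
            if (!keys.ContainsKey(label))
                keys[label] = (uint)(keys.Count + 1);
        }
        return keys;
    }

    // Packs the four measurements into a single feature vector.
    public static float[] Concatenate(float sepalLength, float sepalWidth,
                                      float petalLength, float petalWidth)
    {
        return new[] { sepalLength, sepalWidth, petalLength, petalWidth };
    }

    public static void Main()
    {
        var labels = new[] { "iris-setosa", "iris-versicolor", "iris-setosa", "iris-virginica" };
        Dictionary<string, uint> keys = BuildKeyMap(labels);
        Console.WriteLine($"Distinct labels: {keys.Count}");
        Console.WriteLine($"Key for iris-virginica: {keys["iris-virginica"]}");

        float[] features = Concatenate(5.1f, 3.5f, 1.4f, 0.2f);
        Console.WriteLine($"Feature vector length: {features.Length}");
    }
}
```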
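In the third step, a suitable algorithm is chosen for training and given the label (Label) and the features (Features).
In the fourth step, the model is trained.
Once the model has been trained, it can be used for prediction. Because the final prediction is expected as a string, a conversion has to be appended after the third step to turn the result from its numeric form back into a string.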
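The conversion back to a string is simply the inverse of the label-to-number mapping. As a rough illustration in plain C# (no ML.NET involved), assuming keys are numbered from 1 in the order the labels are listed, it could look like this. KeyDecodingSketch and KeyToValue are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;

public static class KeyDecodingSketch
{
    // Returns the label whose 1-based position in the list equals the key.
    public static string KeyToValue(IReadOnlyList<string> labelsInKeyOrder, uint key)
    {
        if (key == 0 || key > labelsInKeyOrder.Count)
            throw new ArgumentOutOfRangeException(nameof(key));
        return labelsInKeyOrder[(int)key - 1];
    }

    public static void Main()
    {
        var labels = new[] { "iris-setosa", "iris-versicolor", "iris-virginica" };
        Console.WriteLine(KeyToValue(labels, 3)); // prints "iris-virginica"
    }
}
```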