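Adding Resources
Before you can run jobs, you must add the external resources to SKIL. Before adding a resource, store its credentials file on one of the nodes in the SKIL cluster.
Storing Credentials
The format for storing the credentials of each supported resource type is shown below.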
Note
For HDFS and YARN, no credentials are required because they are configured locally. For YARN, you must set the SPARK_HOME environment variable and point it to the Spark root folder.
{
"accessKey": "<access_key>",
"secretKey": "<secret_key>"
}
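As a minimal sketch of this step, the following Python snippet writes a credentials file in the accessKey/secretKey format shown above and returns the file:// URI that can later be passed as the credentials URI. The function name write_credentials, the temporary directory, and the owner-only file permissions are assumptions made for illustration; they are not part of SKIL.

```python
import json
import os
import tempfile
from pathlib import Path


def write_credentials(path, access_key, secret_key):
    """Write an accessKey/secretKey credentials file and return its file:// URI."""
    target = Path(path).resolve()
    target.write_text(
        json.dumps({"accessKey": access_key, "secretKey": secret_key}, indent=2),
        encoding="utf-8",
    )
    # Assumption: restrict the file to its owner because it holds a secret
    # (effective on POSIX systems; has little effect on Windows).
    os.chmod(target, 0o600)
    return target.as_uri()


if __name__ == "__main__":
    # Hypothetical location and placeholder values, for illustration only.
    demo_path = Path(tempfile.mkdtemp()) / "credentials.json"
    print(write_credentials(demo_path, "<access_key>", "<secret_key>"))
```

The returned URI has the file:///path/to/credentials.json shape that the resource commands expect for the credentials URI.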
Where can I find the credentials?
Follow the links below to obtain the security credentials for the resources you need:
- AWS S3 and EMR
- Azure Storage and HDInsight
- Google Storage and Cloud DataProc: save this information in a file and supply its path as the value of the serviceaccountfile key, as described in the code snippet above.
Adding Resources
Once the credentials are stored, you can add the corresponding resource in any of the following ways:
- CLI
- REST endpoint
- UI
1. CLI
The skil resources command manages resources from the CLI. The following snippets show how to add each type of resource:
AWS S3
skil resources create-s3 --name <resource_name> --credentialUri <credentials_uri> --bucketId <bucket_id> --region <region>
AWS EMR
skil resources create-emr --name <resource_name> --credentialUri <credentials_uri> --clusterId <cluster_id> --region <region>
Google Storage
skil resources create-google-storage --name <resource_name> --credentialUri <credentials_uri> --projectId <project_id> --bucketName <bucket_name>
Google Cloud DataProc
skil resources create-dataproc --name <resource_name> --credentialUri <credentials_uri> --projectId <project_id> --sparkClusterName <spark_cluster_name> --region <region>
Azure Storage
skil resources create-azure-storage --name <resource_name> --credentialUri <credentials_uri> --containerName <container_name>
Azure HDInsight
skil resources create-hdinsight --name <resource_name> --credentialUri <credentials_uri> --subscriptionId <subscription_id> --resourceGroupName <resource_group_name> --clusterName <cluster_name>
HDFS
skil resources create-hdfs --name <resource_name> --credentialUri <credentials_uri> --nameNodeHost <name_node_host> --nameNodePort <name_node_port>
YARN
skil resources create-yarn --name <resource_name> --credentialUri <credentials_uri> --localSparkHome <local_spark_home>
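When resources are added from a script, the commands above can be assembled as argument lists instead of hand-written strings. The sketch below does this for the create-s3 variant, using only the flags listed above. The function name build_create_s3_command and the sample values are hypothetical, and the snippet only prints the command rather than running it.

```python
import shlex


def build_create_s3_command(name, credential_uri, bucket_id, region):
    """Return the argument list for `skil resources create-s3`, using the flags listed above."""
    return [
        "skil", "resources", "create-s3",
        "--name", name,
        "--credentialUri", credential_uri,
        "--bucketId", bucket_id,
        "--region", region,
    ]


if __name__ == "__main__":
    cmd = build_create_s3_command(
        "my-s3-resource",                    # hypothetical resource name
        "file:///path/to/credentials.json",  # URI of the stored credentials file
        "my-bucket",                         # hypothetical bucket
        "us-east-1",                         # example region
    )
    # Print a shell-safe command line. To execute it on a machine where the
    # skil CLI is installed, pass `cmd` to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Keeping the command as a list avoids quoting problems when a resource name or path contains spaces.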
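2. REST Endpoint
Using a tool such as curl, you can add a resource by sending a POST request to the http://host:port/resource endpoint. The general format for adding a resource through the REST endpoint is: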
curl -d '<resource_request_data>' -H "Authorization: Bearer <auth_token>" -H "Content-Type: application/json" -X POST http://host:port/resource
Note
You can obtain the <auth_token> by running the following curl request:
curl -d '{"userId":"<userId>", "password":"<password>"}' -H "Content-Type: application/json" -X POST http://localhost:9008/login
Here, <userId> and <password> are the credentials you use to log in to SKIL.
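The same login call can be sketched in Python with the standard library. The request body and the /login path come from the curl command above. The function names are hypothetical, and because the layout of the login response is not shown here, the sketch assumes only that the response is JSON and leaves it to the caller to locate the token in it.

```python
import json
import urllib.request


def build_login_request(base_url, user_id, password):
    """Build (but do not send) the POST /login request shown in the curl command."""
    body = json.dumps({"userId": user_id, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_login_response(request):
    """Send the request and return the decoded response body.

    Assumption: the server answers with JSON. The name of the field that
    holds the token is not stated in this document, so it is not parsed here.
    """
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Building the request needs no network access; sending it requires a running SKIL server.
    req = build_login_request("http://localhost:9008", "<userId>", "<password>")
    print(req.get_method(), req.full_url)
```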
For each type of resource, <resource_request_data> has the following format:
AWS S3
{
"resourceName":"<resource_name>",
"resourceDetails": {
"@class":"io.skymind.resource.model.subtypes.storage.AzureStorageResourceDetails",
"containerName":"<container_name>"
},
"credentialUri":"<credentials_uri>", // typically looks like "file:///path/to/credentials.json"
"type":"STORAGE",
"subType":"AzureStorage",
"credentialId":<credentials_id> // an integer
}
// You only need to provide either credentialUri or credentialId
AWS EMR
{
"resourceName":"<resource_name>",
"resourceDetails": {
"@class":"io.skymind.resource.model.subtypes.compute.EMRResourceDetails",
"clusterId":"<cluster_id>",
"region":"<region>"
},
"credentialUri":"<credentials_uri>", // typically looks like "file:///path/to/credentials.json"
"type":"COMPUTE",
"subType":"EMR",
"credentialId":<credentials_id> // an integer
}
// You only need to provide either credentialUri or credentialId
Google Storage
{
"resourceName":"<resource_name>",
"resourceDetails": {
"@class":"io.skymind.resource.model.subtypes.storage.GoogleStorageResourceDetails",
"projectId":"<project_id>",
"bucketName":"<bucket_name>"
},
"credentialUri":"<credentials_uri>", // typically looks like "file:///path/to/credentials.json"
"type":"STORAGE",
"subType":"GoogleStorage",
"credentialId":<credentials_id> // an integer
}
// You only need to provide either credentialUri or credentialId
Google Cloud DataProc
{
"resourceName":"<resource_name>",
"resourceDetails": {
"@class":"io.skymind.resource.model.subtypes.compute.DataProcResourceDetails",
"projectId":"<project_id>",
"region":"<region>",
"sparkClusterName":"<spark_cluster_name>"
},
"credentialUri":"<credentials_uri>", // typically looks like "file:///path/to/credentials.json"
"type":"COMPUTE",
"subType":"DataProc",
"credentialId":<credentials_id> // an integer
}
// You only need to provide either credentialUri or credentialId
Azure Storage
{
"resourceName":"<resource_name>",
"resourceDetails": {
"@class":"io.skymind.resource.model.subtypes.storage.AzureStorageResourceDetails",
"containerName":"<container_name>"
},
"credentialUri":"<credentials_uri>", // typically looks like "file:///path/to/credentials.json"
"type":"STORAGE",
"subType":"AzureStorage",
"credentialId":<credentials_id> // an integer
}
// You only need to provide either credentialUri or credentialId
Azure HDInsight
{
"resourceName":"<resource_name>",
"resourceDetails": {
"@class":"io.skymind.resource.model.subtypes.compute.HDInsightResourceDetails",
"subscriptionId":"<subscription_id>",
"resourceGroupName":"<resource_group_name>",
"clusterName":"<cluster_name>"
},
"credentialUri":"<credentials_uri>", // typically looks like "file:///path/to/credentials.json"
"type":"COMPUTE",
"subType":"HDInsight",
"credentialId":<credentials_id> // an integer
}
// You only need to provide either credentialUri or credentialId
HDFS
{
"resourceName":"<resource_name>",
"resourceDetails": {
"@class":"io.skymind.resource.model.subtypes.storage.HDFSResourceDetails",
"nameNodeHost":"<name_node_host>",
"nameNodePort":"<name_node_port>"
},
"credentialUri":"<credentials_uri>", // typically looks like "file:///path/to/credentials.json"
"type":"STORAGE",
"subType":"HDFS",
"credentialId":<credentials_id> // an integer
}
// You only need to provide either credentialUri or credentialId
YARN
{
"resourceName":"<resource_name>",
"resourceDetails": {
"@class":"io.skymind.resource.model.subtypes.compute.YARNResourceDetails",
"localSparkHome":"<local_spark_home>"
},
"credentialUri":"<credentials_uri>", // typically looks like "file:///path/to/credentials.json"
"type":"COMPUTE",
"subType":"YARN",
"credentialId":<credentials_id> // an integer
}
// You only need to provide either credentialUri or credentialId
Note
If you provide credentialUri, you can omit credentialId from the request, and vice versa.
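To illustrate the request format and the credential rule above, here is a minimal Python sketch that builds the AWS EMR request body and the matching POST request. The field names, the @class value, and the headers are taken from the snippets in this section. The function names are hypothetical, and rejecting a body that carries both or neither credential field is an assumption of this sketch rather than documented server behavior.

```python
import json
import urllib.request

EMR_DETAILS_CLASS = "io.skymind.resource.model.subtypes.compute.EMRResourceDetails"


def build_emr_resource_body(name, cluster_id, region, credential_uri=None, credential_id=None):
    """Build the EMR request body with exactly one of credentialUri / credentialId."""
    # Assumption of this sketch: insist on exactly one credential field.
    if (credential_uri is None) == (credential_id is None):
        raise ValueError("provide exactly one of credential_uri or credential_id")
    body = {
        "resourceName": name,
        "resourceDetails": {
            "@class": EMR_DETAILS_CLASS,
            "clusterId": cluster_id,
            "region": region,
        },
        "type": "COMPUTE",
        "subType": "EMR",
    }
    if credential_uri is not None:
        body["credentialUri"] = credential_uri
    else:
        body["credentialId"] = int(credential_id)  # the ID is sent as an integer
    return body


def build_add_resource_request(base_url, auth_token, body):
    """Build (but do not send) the POST /resource request with the bearer token."""
    return urllib.request.Request(
        url=f"{base_url}/resource",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; sending the request requires a running SKIL server,
    # for example via urllib.request.urlopen(request).
    body = build_emr_resource_body(
        "<resource_name>", "<cluster_id>", "<region>",
        credential_uri="file:///path/to/credentials.json",
    )
    request = build_add_resource_request("http://host:port", "<auth_token>", body)
    print(json.dumps(body, indent=2))
```

The other resource types follow the same pattern, with their own @class, type, subType, and resourceDetails fields as listed above.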
3. UI
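To open the UI for adding resources, click the gear icon in the top-right corner of the SKIL dashboard, then go to "Resources".
Click "Add Resource" to add a resource.
Select the type of resource you want to add.
Now fill in the details, then click "Add…" to add the resource.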