IBM Cloud Object Storage - Uploading Large Files from a Linux Host with rclone and the COS API

Cloud object storage has been widely adopted as a mainstream public-cloud data storage service. However, because it is based on the HTTP/HTTPS protocol (RESTful APIs), uses a flat data structure, and depends on the network, mounting it as a file system via tools such as s3fs runs into limitations in some archiving and backup scenarios, for example large file transfers or high-frequency I/O. Taking IBM Cloud Object Storage as the example, this article uses rclone and the ICOS API to achieve relatively stable file transfers with bandwidth control.

1. Rclone

IBM Cloud official documentation for configuring the rclone connection:
https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-rclone

[root@centos-s3fs ~]# yum install -y unzip
[root@centos-s3fs ~]# curl https://rclone.org/install.sh | sudo bash
...
rclone v1.53.3 has successfully installed.
Now run "rclone config" for setup. Check https://rclone.org/docs/ for more details.

[root@centos-s3fs ~]# rclone config
2020/12/19 09:55:26 NOTICE: Config file "/root/.config/rclone/rclone.conf" not found - using defaults
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> icos-test
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / 1Fichier
   \ "fichier"
 2 / Alias for an existing remote
   \ "alias"
 3 / Amazon Drive
   \ "amazon cloud drive"
 4 / Amazon S3 Compliant Storage Provider (AWS, Alibaba, Ceph, Digital Ocean, Dreamhost, IBM COS, Minio, Tencent COS, etc)
   \ "s3"
... ...
Storage> 4
** See help for s3 backend at: https://rclone.org/s3/ **

Choose your S3 provider.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Amazon Web Services (AWS) S3
   \ "AWS"
 2 / Alibaba Cloud Object Storage System (OSS) formerly Aliyun
   \ "Alibaba"
 3 / Ceph Object Storage
   \ "Ceph"
 4 / Digital Ocean Spaces
   \ "DigitalOcean"
 5 / Dreamhost DreamObjects
   \ "Dreamhost"
 6 / IBM COS S3
   \ "IBMCOS"
... ...
provider> 6
Get AWS credentials from runtime (environment variables or EC2/ECS meta data if no env vars).
Only applies if access_key_id and secret_access_key is blank.
Enter a boolean value (true or false). Press Enter for the default ("false").
Choose a number from below, or type in your own value
 1 / Enter AWS credentials in the next step
   \ "false"
 2 / Get AWS credentials from the environment (env vars or IAM)
   \ "true"
env_auth> 1
AWS Access Key ID.
Leave blank for anonymous access or runtime credentials.
Enter a string value. Press Enter for the default ("").
access_key_id> xxxxxxxx
AWS Secret Access Key (password)
Leave blank for anonymous access or runtime credentials.
Enter a string value. Press Enter for the default ("").
secret_access_key> xxxxxxxxxx
Region to connect to.
Leave blank if you are using an S3 clone and you don't have a region.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Use this if unsure. Will use v4 signatures and an empty region.
   \ ""
 2 / Use this only if v4 signatures don't work, eg pre Jewel/v10 CEPH.
   \ "other-v2-signature"
region> 2
Endpoint for IBM COS S3 API.
Specify if using an IBM COS On Premise.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / US Cross Region Endpoint
   \ "s3.us.cloud-object-storage.appdomain.cloud"
 2 / US Cross Region Dallas Endpoint
   \ "s3.dal.us.cloud-object-storage.appdomain.cloud"
 3 / US Cross Region Washington DC Endpoint
   \ "s3.wdc.us.cloud-object-storage.appdomain.cloud"
 4 / US Cross Region San Jose Endpoint
   \ "s3.sjc.us.cloud-object-storage.appdomain.cloud"
 5 / US Cross Region Private Endpoint
   \ "s3.private.us.cloud-object-storage.appdomain.cloud"
... ...
endpoint> s3.private.eu-de.cloud-object-storage.appdomain.cloud
Location constraint - must match endpoint when using IBM Cloud Public.
For on-prem COS, do not make a selection from this list, hit enter
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / US Cross Region Standard
   \ "us-standard"
 2 / US Cross Region Vault
   \ "us-vault"
 3 / US Cross Region Cold
   \ "us-cold"
 4 / US Cross Region Flex
   \ "us-flex"
 5 / US East Region Standard
   \ "us-east-standard"
 6 / US East Region Vault
   \ "us-east-vault"
... ...
location_constraint>
Canned ACL used when creating buckets and storing or copying objects.

This ACL is used for creating objects and if bucket_acl isn't set, for creating buckets too.

For more info visit https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl

Note that this ACL is applied when server side copying objects as S3
doesn't copy the ACL from the source but rather writes a fresh one.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Owner gets FULL_CONTROL. No one else has access rights (default). This acl is available on IBM Cloud (Infra), IBM Cloud (Storage), On-Premise COS
   \ "private"
 2 / Owner gets FULL_CONTROL. The AllUsers group gets READ access. This acl is available on IBM Cloud (Infra), IBM Cloud (Storage), On-Premise IBM COS
   \ "public-read"
 3 / Owner gets FULL_CONTROL. The AllUsers group gets READ and WRITE access. This acl is available on IBM Cloud (Infra), On-Premise IBM COS
   \ "public-read-write"
 4 / Owner gets FULL_CONTROL. The AuthenticatedUsers group gets READ access. Not supported on Buckets. This acl is available on IBM Cloud (Infra) and On-Premise IBM COS
   \ "authenticated-read"
acl> 2
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> n
Remote config
--------------------
[icos-test]
type = s3
provider = IBMCOS
env_auth = false
access_key_id = xxx
secret_access_key = xxx
region = other-v2-signature
endpoint = s3.private.eu-de.cloud-object-storage.appdomain.cloud
acl = public-read
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
icos-test            s3

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q>

This completes the rclone setup. The configuration above is saved in rclone.conf; for multiple clients, this config file can simply be deployed to the other Linux hosts instead of repeating the interactive setup on each one (see the example after the config listing below).

[root@centos-s3fs ~]# cat .config/rclone/rclone.conf
[icos-test]
type = s3
provider = IBMCOS
env_auth = false
access_key_id = xxx
secret_access_key = xxx
region = other-v2-signature
endpoint = s3.private.eu-de.cloud-object-storage.appdomain.cloud
acl = public-read
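
For example, a minimal sketch of pushing the existing configuration to a second host; the host name centos-client2 below is hypothetical, and rclone must already be installed there:

# Hypothetical second host: create the config directory first, then copy the file
[root@centos-s3fs ~]# ssh root@centos-client2 "mkdir -p /root/.config/rclone"
[root@centos-s3fs ~]# scp /root/.config/rclone/rclone.conf root@centos-client2:/root/.config/rclone/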

A quick test: uploading a 100 GB file

[root@centos-s3fs ~]# rclone copy /data/100G.file icos-test:eu-de-cold

[root@centos-s3fs ~]# rclone lsd icos-test:
          -1 2020-12-18 03:52:07        -1 eu-de-cold
          -1 2020-09-09 03:30:40        -1 liutao-cos
          -1 2020-11-25 12:56:39        -1 mariadb-backup
          -1 2020-05-21 13:52:16        -1 video-on-demand

[root@centos-s3fs ~]# rclone ls icos-test:eu-de-cold
107374182400 100G.file

[root@centos-s3fs ~]# rclone delete icos-test:eu-de-cold/100G.file

If the Linux host is also running other workloads, rclone will inevitably compete for part of the NIC's outbound capacity. The bandwidth used by the transfer can be controlled with the following three options:

  • --s3-chunk-size=16M                                            # multipart chunk size for the upload
  • --s3-upload-concurrency=10                                     # number of concurrent upload connections
  • --bwlimit="08:00,20M 12:00,30M 13:00,50M 18:00,80M 23:00,off"  # scheduled bandwidth limits

rclone-test-1:

[root@centos-s3fs ~]# rclone copy /data/100G.file icos-test:eu-de-cold --s3-chunk-size=52M --s3-upload-concurrency=15

# Due to the machine's resource limits, although we requested a concurrency of 15, the system actually only sustains 12 connections
[root@centos-s3fs ~]# netstat -anp |grep 10.1.129.58 | wc -l 
12

With a 52 MB chunk size and a concurrency of 15, the transfer rate averages about 220 MB/s, which is essentially the limit of this instance.
rclone-test-2:

[root@centos-s3fs ~]# rclone copy /data/100G.file icos-test:eu-de-cold --s3-chunk-size=16M --s3-upload-concurrency=10

# With the concurrency lowered to 10, ten transfer connections are observed
[root@centos-s3fs ~]# netstat -anp |grep 10.1.129.58 | wc -l 
10

With a 16 MB chunk size and a concurrency of 10, the transfer rate drops to about 120 MB/s. Combining these two options with the file size is usually enough to find the best settings for a given instance.
rclone-test-3:

[root@centos-s3fs ~]# rclone copy /data/100G.file icos-test:eu-de-cold --s3-chunk-size=16M --s3-upload-concurrency=10 --bwlimit "08:00,20M 12:00,30M 13:00,50M 18:00,80M 23:00,off"
2020/12/19 12:44:33 NOTICE: Scheduled bandwidth change. Limit set to 30MBytes/s

Because of the schedule configured for the current time window, the transfer rate drops to about 30 MB/s, as the interface traffic below shows:

Interface        RX      TX    12:50:11
eth0 	 416.318KB/s   32.2685MB/s
Interface        RX      TX    12:50:12
eth0 	 450.342KB/s   31.9155MB/s

2. ICOS API

IBM COS provides a very complete S3 API, and common languages such as Java, Python, Node.js, and Go all have corresponding SDKs, so developers can get started easily:
https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-sdk-about
Install the COS Python SDK from the IBM GitHub repository, https://github.com/IBM/ibm-cos-sdk-python. Below is the code sample from the IBM online documentation; only the COS endpoint, API key, and service instance CRN need to be changed to match your environment.
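
For reference, per the repository README the SDK can typically be installed with pip; this install step is not part of the original session:

[root@centos-s3fs ~]# pip install ibm-cos-sdk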

import ibm_boto3
from ibm_botocore.client import Config, ClientError

# Constants for IBM COS values
COS_ENDPOINT = "https://s3.private.us-south.cloud-object-storage.appdomain.cloud"
COS_API_KEY_ID = "xxx"
COS_INSTANCE_CRN = "xxx"

def upload_large_file(bucket_name, item_name, file_path):
    print("Starting large file upload for {0} to bucket: {1}".format(item_name, bucket_name))

    # set the chunk size to 5 MB
    part_size = 1024 * 1024 * 5

    # set the multipart threshold to 5 MB
    file_threshold = 1024 * 1024 * 5

    # Create client connection
    cos_cli = ibm_boto3.client("s3",
        ibm_api_key_id=COS_API_KEY_ID,
        ibm_service_instance_id=COS_INSTANCE_CRN,
        config=Config(signature_version="oauth"),
        endpoint_url=COS_ENDPOINT
    )
    
    # set the transfer threshold and chunk size in config settings
    transfer_config = ibm_boto3.s3.transfer.TransferConfig(
        multipart_threshold=file_threshold,
        multipart_chunksize=part_size
    )

    # create transfer manager
    transfer_mgr = ibm_boto3.s3.transfer.TransferManager(cos_cli, config=transfer_config)

    try:
        # initiate file upload
        future = transfer_mgr.upload(file_path, bucket_name, item_name)

        # wait for upload to complete
        future.result()

        print ("Large file upload complete!")
    except Exception as e:
        print("Unable to complete large file upload: {0}".format(e))
    finally:
        transfer_mgr.shutdown()
        
def main():
    upload_large_file('iso-image-bucket', '100G.file', '/data/100G.file' )

if __name__ == "__main__":
    main()

Run the upload. If it consumes too much bandwidth, the transfer speed can be adjusted by changing part_size and file_threshold; a short tuning sketch follows the run output below.

[root@centos-s3fs ~]# python upload_large_file.py
Starting large file upload for 100G.file to bucket: iso-image-bucket
Large file upload complete!

Upload successful!
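
As a rough illustration of that tuning, the TransferConfig in the script above could be given a larger chunk size and a lower upload concurrency. The max_concurrency parameter is assumed here from boto3's transfer API, which ibm_boto3 mirrors; treat this as a sketch rather than the exact settings used above:

# Sketch: replace the TransferConfig in upload_large_file() to trade speed for bandwidth.
# max_concurrency is assumed from boto3's TransferConfig (ibm_boto3 mirrors boto3).
transfer_config = ibm_boto3.s3.transfer.TransferConfig(
    multipart_threshold=1024 * 1024 * 20,   # only use multipart for files larger than 20 MB
    multipart_chunksize=1024 * 1024 * 20,   # 20 MB parts: fewer, larger requests
    max_concurrency=5                       # fewer parallel part uploads, lower peak bandwidth
)
transfer_mgr = ibm_boto3.s3.transfer.TransferManager(cos_cli, config=transfer_config)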

Summary:
Besides the approaches above, IBM COS also works with many S3-compatible tools such as Cyberduck, TntDrive, CloudBerry, and S3 Browser, as well as command-line tools like the AWS CLI (see the example below). Moreover, if the client has a workstation with a browser, no tool needs to be installed at all: the built-in, free Aspera browser plug-in can transfer files across regions over the public network, with transfer quality and speed that compare favorably to third-party tools.
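
For reference, a minimal AWS CLI sketch against the same bucket and endpoint used in the rclone tests above, assuming the aws CLI is installed and HMAC credentials have already been set up with "aws configure"; this command is not part of the original session:

# Assumes HMAC keys are configured via "aws configure"; bucket and endpoint reused from the rclone tests
[root@centos-s3fs ~]# aws --endpoint-url https://s3.private.eu-de.cloud-object-storage.appdomain.cloud s3 cp /data/100G.file s3://eu-de-cold/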

Happy Learning! :)

Original article: https://blog.csdn.net/weixin_42599323/article/details/111412387