IBM Cloud Object Storage - Uploading Large Files from a Linux Host with rclone and the COS API
Cloud object storage is widely used as a mainstream public-cloud data storage service, but its characteristics - an HTTP/HTTPS (RESTful API) protocol, a flat data structure, and network dependency - impose some limits in certain archiving and backup scenarios, especially when it is mounted as a file system through tools like s3fs, for example with large file transfers or high-frequency I/O. Taking IBM Cloud Object Storage as an example, this article shows how to use rclone and the ICOS API to achieve more stable file transfers with bandwidth control.
1. Rclone
IBM Cloud official documentation for configuring the connection:
https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-rclone
[root@centos-s3fs ~]# yum install -y unzip
[root@centos-s3fs ~]# curl https://rclone.org/install.sh | sudo bash
...
rclone v1.53.3 has successfully installed.
Now run "rclone config" for setup. Check https://rclone.org/docs/ for more details.
[root@centos-s3fs ~]# rclone config
2020/12/19 09:55:26 NOTICE: Config file "/root/.config/rclone/rclone.conf" not found - using defaults
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> icos-test
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / 1Fichier
\ "fichier"
2 / Alias for an existing remote
\ "alias"
3 / Amazon Drive
\ "amazon cloud drive"
4 / Amazon S3 Compliant Storage Provider (AWS, Alibaba, Ceph, Digital Ocean, Dreamhost, IBM COS, Minio, Tencent COS, etc)
\ "s3"
... ...
Storage> 4
** See help for s3 backend at: https://rclone.org/s3/ **
Choose your S3 provider.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / Amazon Web Services (AWS) S3
\ "AWS"
2 / Alibaba Cloud Object Storage System (OSS) formerly Aliyun
\ "Alibaba"
3 / Ceph Object Storage
\ "Ceph"
4 / Digital Ocean Spaces
\ "DigitalOcean"
5 / Dreamhost DreamObjects
\ "Dreamhost"
6 / IBM COS S3
\ "IBMCOS"
... ...
provider> 6
Get AWS credentials from runtime (environment variables or EC2/ECS meta data if no env vars).
Only applies if access_key_id and secret_access_key is blank.
Enter a boolean value (true or false). Press Enter for the default ("false").
Choose a number from below, or type in your own value
1 / Enter AWS credentials in the next step
\ "false"
2 / Get AWS credentials from the environment (env vars or IAM)
\ "true"
env_auth> 1
AWS Access Key ID.
Leave blank for anonymous access or runtime credentials.
Enter a string value. Press Enter for the default ("").
access_key_id> xxxxxxxx
AWS Secret Access Key (password)
Leave blank for anonymous access or runtime credentials.
Enter a string value. Press Enter for the default ("").
secret_access_key> xxxxxxxxxx
Region to connect to.
Leave blank if you are using an S3 clone and you don't have a region.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / Use this if unsure. Will use v4 signatures and an empty region.
\ ""
2 / Use this only if v4 signatures don't work, eg pre Jewel/v10 CEPH.
\ "other-v2-signature"
region> 2
Endpoint for IBM COS S3 API.
Specify if using an IBM COS On Premise.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / US Cross Region Endpoint
\ "s3.us.cloud-object-storage.appdomain.cloud"
2 / US Cross Region Dallas Endpoint
\ "s3.dal.us.cloud-object-storage.appdomain.cloud"
3 / US Cross Region Washington DC Endpoint
\ "s3.wdc.us.cloud-object-storage.appdomain.cloud"
4 / US Cross Region San Jose Endpoint
\ "s3.sjc.us.cloud-object-storage.appdomain.cloud"
5 / US Cross Region Private Endpoint
\ "s3.private.us.cloud-object-storage.appdomain.cloud"
... ...
endpoint> s3.private.eu-de.cloud-object-storage.appdomain.cloud
Location constraint - must match endpoint when using IBM Cloud Public.
For on-prem COS, do not make a selection from this list, hit enter
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / US Cross Region Standard
\ "us-standard"
2 / US Cross Region Vault
\ "us-vault"
3 / US Cross Region Cold
\ "us-cold"
4 / US Cross Region Flex
\ "us-flex"
5 / US East Region Standard
\ "us-east-standard"
6 / US East Region Vault
\ "us-east-vault"
... ...
location_constraint>
Canned ACL used when creating buckets and storing or copying objects.
This ACL is used for creating objects and if bucket_acl isn't set, for creating buckets too.
For more info visit https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl
Note that this ACL is applied when server side copying objects as S3
doesn't copy the ACL from the source but rather writes a fresh one.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / Owner gets FULL_CONTROL. No one else has access rights (default). This acl is available on IBM Cloud (Infra), IBM Cloud (Storage), On-Premise COS
\ "private"
2 / Owner gets FULL_CONTROL. The AllUsers group gets READ access. This acl is available on IBM Cloud (Infra), IBM Cloud (Storage), On-Premise IBM COS
\ "public-read"
3 / Owner gets FULL_CONTROL. The AllUsers group gets READ and WRITE access. This acl is available on IBM Cloud (Infra), On-Premise IBM COS
\ "public-read-write"
4 / Owner gets FULL_CONTROL. The AuthenticatedUsers group gets READ access. Not supported on Buckets. This acl is available on IBM Cloud (Infra) and On-Premise IBM COS
\ "authenticated-read"
acl> 2
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> n
Remote config
--------------------
[icos-test]
type = s3
provider = IBMCOS
env_auth = false
access_key_id = xxx
secret_access_key = xxx
region = other-v2-signature
endpoint = s3.private.eu-de.cloud-object-storage.appdomain.cloud
acl = public-read
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:
Name Type
==== ====
icos-test s3
e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q>
This completes the rclone setup. The configuration above is saved in rclone.conf; if you have multiple clients, you can deploy this config file (e.g. ~/.config/rclone/rclone.conf) to the other Linux hosts instead of repeating the interactive setup on each one.
[root@centos-s3fs ~]# cat .config/rclone/rclone.conf
[icos-test]
type = s3
provider = IBMCOS
env_auth = false
access_key_id = xxx
secret_access_key = xxx
region = other-v2-signature
endpoint = s3.private.eu-de.cloud-object-storage.appdomain.cloud
acl = public-read
A quick test: upload a 100 GB file.
[root@centos-s3fs ~]# rclone copy /data/100G.file icos-test:eu-de-cold
[root@centos-s3fs ~]# rclone lsd icos-test:
-1 2020-12-18 03:52:07 -1 eu-de-cold
-1 2020-09-09 03:30:40 -1 liutao-cos
-1 2020-11-25 12:56:39 -1 mariadb-backup
-1 2020-05-21 13:52:16 -1 video-on-demand
[root@centos-s3fs ~]# rclone ls icos-test:eu-de-cold
107374182400 100G.file
[root@centos-s3fs ~]# rclone delete icos-test:eu-de-cold/100G.file
If the Linux host is also running other workloads, rclone will inevitably compete for part of the NIC's outbound bandwidth. Bandwidth control for data transfers can be achieved with the following three parameters:
- --s3-chunk-size=16M # chunk size of each multipart upload part
- --s3-upload-concurrency=10 # number of concurrent upload connections
- --bwlimit="08:00,20M 12:00,30M 13:00,50M 18:00,80M 23:00,off" # scheduled bandwidth limits by time of day
rclone-test-1:
[root@centos-s3fs ~]# rclone copy /data/100G.file icos-test:eu-de-cold --s3-chunk-size=52M --s3-upload-concurrency=15
# Due to resource limits on this machine, although we requested a concurrency of 15, the system only sustains 12 connections
[root@centos-s3fs ~]# netstat -anp |grep 10.1.129.58 | wc -l
12
With a 52M chunk size and a concurrency of 15, the transfer rate averages about 220 MB/s, which is roughly the limit of this instance.
rclone-test-2:
[root@centos-s3fs ~]# rclone copy /data/100G.file icos-test:eu-de-cold --s3-chunk-size=16M --s3-upload-concurrency=10
# With concurrency reduced to 10, we can see 10 transfer connections
[root@centos-s3fs ~]# netstat -anp |grep 10.1.129.58 | wc -l
10
With a 16M chunk size and a concurrency of 10, the transfer rate drops to about 120 MB/s. Combining these two parameters with the file size is usually enough to find the best settings for a given instance.
rclone-test-3:
[root@centos-s3fs ~]# rclone copy /data/100G.file icos-test:eu-de-cold --s3-chunk-size=16M --s3-upload-concurrency=10 --bwlimit "08:00,20M 12:00,30M 13:00,50M 18:00,80M 23:00,off"
2020/12/19 12:44:33 NOTICE: Scheduled bandwidth change. Limit set to 30MBytes/s
Because of the schedule configured for the current time window, the transfer rate drops to around 30 MB/s:
Interface RX TX 12:50:11
eth0 416.318KB/s 32.2685MB/s
Interface RX TX 12:50:12
eth0 450.342KB/s 31.9155MB/s
2. ICOS API
IBM COS provides a very complete S3 API, with SDKs for common languages such as Java, Python, Node.js, and Go, so developers can get started easily:
https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-sdk-about
Install the COS Python SDK from IBM's GitHub repository, https://github.com/IBM/ibm-cos-sdk-python (it is typically installed with pip install ibm-cos-sdk). Below is the code sample from the IBM online documentation; just replace the COS endpoint, API key, and service instance CRN with the values for your environment.
import ibm_boto3
from ibm_botocore.client import Config, ClientError
# Constants for IBM COS values
COS_ENDPOINT = "https://s3.private.us-south.cloud-object-storage.appdomain.cloud"
COS_API_KEY_ID = "xxx"
COS_INSTANCE_CRN = "xxx"
def upload_large_file(bucket_name, item_name, file_path):
    print("Starting large file upload for {0} to bucket: {1}".format(item_name, bucket_name))
    # set the chunk size to 5 MB
    part_size = 1024 * 1024 * 5
    # set the multipart threshold to 5 MB
    file_threshold = 1024 * 1024 * 5
    # Create client connection
    cos_cli = ibm_boto3.client("s3",
                               ibm_api_key_id=COS_API_KEY_ID,
                               ibm_service_instance_id=COS_INSTANCE_CRN,
                               config=Config(signature_version="oauth"),
                               endpoint_url=COS_ENDPOINT
                               )
    # set the transfer threshold and chunk size in config settings
    transfer_config = ibm_boto3.s3.transfer.TransferConfig(
        multipart_threshold=file_threshold,
        multipart_chunksize=part_size
    )
    # create transfer manager
    transfer_mgr = ibm_boto3.s3.transfer.TransferManager(cos_cli, config=transfer_config)
    try:
        # initiate file upload
        future = transfer_mgr.upload(file_path, bucket_name, item_name)
        # wait for upload to complete
        future.result()
        print("Large file upload complete!")
    except Exception as e:
        print("Unable to complete large file upload: {0}".format(e))
    finally:
        transfer_mgr.shutdown()

def main():
    upload_large_file('iso-image-bucket', '100G.file', '/data/100G.file')

if __name__ == "__main__":
    main()
Run the upload. If it consumes too much bandwidth, the transfer speed can be adjusted by tuning part_size and file_threshold (see the sketch after the output below).
[root@centos-s3fs ~]# python upload_large_file.py
Starting large file upload for 100G.file to bucket: iso-image-bucket
Large file upload complete!
Upload successful!
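For reference, below is a minimal throttling sketch, assuming the same COS_ENDPOINT / COS_API_KEY_ID / COS_INSTANCE_CRN values as above; the helper name upload_large_file_throttled is just illustrative. It passes smaller part sizes and a lower max_concurrency to TransferConfig - the rough equivalent of rclone's --s3-chunk-size and --s3-upload-concurrency. max_concurrency is the standard boto3/s3transfer option that ibm_boto3 mirrors, so verify it is available in your ibm-cos-sdk release.
import ibm_boto3
from ibm_botocore.client import Config

COS_ENDPOINT = "https://s3.private.us-south.cloud-object-storage.appdomain.cloud"
COS_API_KEY_ID = "xxx"
COS_INSTANCE_CRN = "xxx"

def upload_large_file_throttled(bucket_name, item_name, file_path):
    # Create client connection (same pattern as the sample above)
    cos_cli = ibm_boto3.client("s3",
                               ibm_api_key_id=COS_API_KEY_ID,
                               ibm_service_instance_id=COS_INSTANCE_CRN,
                               config=Config(signature_version="oauth"),
                               endpoint_url=COS_ENDPOINT)
    # Smaller parts plus fewer parts in flight -> lower peak outbound bandwidth.
    # NOTE: max_concurrency is assumed to exist in your ibm_boto3 version
    # (it mirrors the boto3/s3transfer TransferConfig option).
    transfer_config = ibm_boto3.s3.transfer.TransferConfig(
        multipart_threshold=1024 * 1024 * 5,   # switch to multipart above 5 MB
        multipart_chunksize=1024 * 1024 * 5,   # 5 MB per part
        max_concurrency=2                      # at most 2 parts uploading in parallel
    )
    transfer_mgr = ibm_boto3.s3.transfer.TransferManager(cos_cli, config=transfer_config)
    try:
        transfer_mgr.upload(file_path, bucket_name, item_name).result()
        print("Throttled upload complete!")
    except Exception as e:
        print("Unable to complete upload: {0}".format(e))
    finally:
        transfer_mgr.shutdown()
Unlike rclone's --bwlimit, TransferConfig has no built-in time-of-day scheduling; if you need a hard rate cap rather than just lower concurrency, check whether your SDK version supports a max_bandwidth option, or shape traffic at the OS level.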
Summary:
Besides the approaches above, IBM COS also works with many S3-compatible tools such as Cyberduck, TntDrive, CloudBerry, and S3 Browser, as well as command-line tools like the AWS S3 CLI. And if you have a workstation with a browser, no tools need to be installed at all: the free built-in Aspera browser plug-in can transfer files across regions over the public internet, with transfer quality and speed that are well assured compared with third-party tools.
Happy Learning ! :)
Original post: https://blog.csdn.net/weixin_42599323/article/details/111412387