欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

CDH5.0.2升级至CDH5.2.0

程序员文章站 2024-01-19 11:06:40
...
升级需求
1.为支持spark kerberos安全机制
2.为满足impala trunc函数
3.为解决impala import时同时query导致impala hang问题
升级步骤
参考http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/installation_upgrade.html
优先升级cloudera manager,再升级cdh
1.准备工作:
统一集群root密码,需要运维帮忙操作下
agent自动重启关闭
事先下载好parcals包

2.CM升级
登录cmserver安装的主机,执行命令:
cat /etc/cloudera-scm-server/db.properties
备份CM数据:
  pg_dump -U scm -p 7432   > scm_server_db_backup.bak
  检查/tmp下是否有文件生成,期间保证tmp下文件不要被删除。
停止CM server :
  sudo service cloudera-scm-server stop
停止CM server依赖的数据库:
  sudo service cloudera-scm-server-db stop
如果这台CM server上有agent在运行也停止:
  sudo service cloudera-scm-agent stop
修改yum的 cloudera-manager.repo文件:
  sudo vim /etc/yum.repos.d/cloudera-manager.repo
   [cloudera-manager]
       # Packages for Cloudera Manager, Version 5, on RedHat or CentOS 6 x86_64
       name=Cloudera Manager
       baseurl=http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5/
       gpgkey = http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera
       gpgcheck = 1
安装:
  sudo
      yum clean all
      sudo yum upgrade 'cloudera-*'
检查:
  rpm  -qa 'cloudera-manager-*'
启动CM server 数据库:
  sudo  service cloudera-scm-server-db start
启动CM server:
  sudo  service cloudera-scm-server start
登录http://172.20.0.83:7180/
  安装agent(步骤略)
升级如果升级jdk,会改变java_home路径,导致java相关服务不可用,需要重新配置java_home
升级CM后需要重启CDH。
3.CDH升级
停止集群所有服务
备份namenode元数据:
  进入namenode dir,执行:
   tar -cvf /root/nn_backup_data.tar ./*
下载parcels
分发包->激活包->关闭(非重启)
开启zk服务
进入HDFS服务->升级hdfs metadata
  namenode上启动元数据
  启动剩余HDFS角色
  namenode响应RPC
  HDFS退出安全模式
备份hive metastore数据库
  mysqldump -h172.20.0.67 -ucdhhive -p111111 cdhhive > /tmp/database-backup.sql
进入hive服务->更新hive metastore
     database scheme
更新oozie sharelib:oozie->install
     oozie share lib
  创建 oozie user
      sharelib
  创建 oozie user
      Dir
更新sqoop:进入sqoop服务->update
     sqoop
  更新sqoop2 server
更新spark(略,可先卸载原来版本,升级后直接安装新版本)
启动集群所有服务:zk->hdfs->spark->flume->hbase->hive->impala->oozie->sqoop2->hue
分发客户端文件:deploy client
     configuration
  deploy hdfs client configuration
  deploy spark client configuration
  deploy hbase client configuration
  deploy yarn client configuration
  deploy hive client configuration
删除老版本包:
  sudo  yum remove bigtop-utils bigtop-jsvc bigtop-tomcat hue-common sqoop2-client
启动agent:
  sudo service cloudera-scm-agent restart
HDFS
  metadata update
  hdfs server->instance->namenode=>action->Finalize
      Metadata Upgrade

升级过程遇主要问题:
com.cloudera.server.cmf.FeatureUnavailableException: The feature Navigator Audit Server is not available.
        at com.cloudera.server.cmf.components.LicensedFeatureManager.check(LicensedFeatureManager.java:49)
        at com.cloudera.server.cmf.components.OperationsManagerImpl.setConfig(OperationsManagerImpl.java:1312)
        at com.cloudera.server.cmf.components.OperationsManagerImpl.setConfigUnsafe(OperationsManagerImpl.java:1352)
        at com.cloudera.api.dao.impl.ManagerDaoBase.updateConfigs(ManagerDaoBase.java:264)
        at com.cloudera.api.dao.impl.RoleConfigGroupManagerDaoImpl.updateConfigsHelper(RoleConfigGroupManagerDaoImpl.java:214)
        at com.cloudera.api.dao.impl.RoleConfigGroupManagerDaoImpl.updateRoleConfigGroup(RoleConfigGroupManagerDaoImpl.java:97)
        at com.cloudera.api.dao.impl.RoleConfigGroupManagerDaoImpl.updateRoleConfigGroup(RoleConfigGroupManagerDaoImpl.java:79)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.cloudera.api.dao.impl.ManagerDaoBase.invoke(ManagerDaoBase.java:208)
        at com.sun.proxy.$Proxy82.updateRoleConfigGroup(Unknown Source)
        at com.cloudera.api.v3.impl.RoleConfigGroupsResourceImpl.updateRoleConfigGroup(RoleConfigGroupsResourceImpl.java:69)
        at com.cloudera.api.v3.impl.MgmtServiceResourceV3Impl$RoleConfigGroupsResourceWrapper.updateRoleConfigGroup(MgmtServiceResourceV3Impl.java:54)
        at com.cloudera.cmf.service.upgrade.RemoveBetaFromRCG.upgrade(RemoveBetaFromRCG.java:80)
        at com.cloudera.cmf.service.upgrade.AbstractApiAutoUpgradeHandler.upgrade(AbstractApiAutoUpgradeHandler.java:36)
        at com.cloudera.cmf.service.upgrade.AutoUpgradeHandlerRegistry.performAutoUpgradesForOneVersion(AutoUpgradeHandlerRegistry.java:233)
        at com.cloudera.cmf.service.upgrade.AutoUpgradeHandlerRegistry.performAutoUpgrades(AutoUpgradeHandlerRegistry.java:167)
        at com.cloudera.cmf.service.upgrade.AutoUpgradeHandlerRegistry.performAutoUpgrades(AutoUpgradeHandlerRegistry.java:138)
        at com.cloudera.server.cmf.Main.run(Main.java:587)
        at com.cloudera.server.cmf.Main.main(Main.java:198)
2014-11-26 03:17:42,891 INFO ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloade
原先版本使用了60天试用企业版本,该期限已经过期,升级时Navigator服务启动不了,导致整个cloduera manager server启动失败
升级后问题
a.升级后flume原先提供的第三方jar丢失,需要将包重新放在/opt....下
b.sqoop导入mysql的驱动包找不到,需要将包重新放在/opt....下
c.hbase服务异常
Unhandled exception. Starting shutdown.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User hbase/ip-10-1-33-20.ec2.internal@YEAHMOBI.COM (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.hdfs.protocol.ClientProtocol, expected client Kerberos principal is null
at org.apache.hadoop.ipc.Client.call(Client.java:1409)
at org.apache.hadoop.ipc.Client.call(Client.java:1362)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy15.setSafeMode(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.setSafeMode(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setSafeMode(ClientNamenodeProtocolTranslatorPB.java:594)
at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:2224)
at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:993)
at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:977)
at org.apache.hadoop.hbase.util.FSUtils.isInSafeMode(FSUtils.java:432)
at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:851)
at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:435)
at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:146)
at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:127)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:789)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:606)
at java.lang.Thread.run(Thread.java:744)
通过cm将safe配置文件里的hbase.rpc.engine org.apache.hadoop.hbase.ipc.SecureRpcEngine去掉后重启成功。
后来发现是cm server的问题,之前修改了一个hostname,cloudera manager server未重启,重启后,加入该配置重启hbase不会有问题。
d.service monitor,zookeeper也有警告,其他服务都有部分红色异常
Exception in scheduled runnable.
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at com.cloudera.cmon.firehose.polling.CdhTask.checkClientConfigs(CdhTask.java:712)
at com.cloudera.cmon.firehose.polling.CdhTask.updateCacheIfNeeded(CdhTask.java:675)
at com.cloudera.cmon.firehose.polling.FirehoseServicesPoller.getDescriptorAndHandleChanges(FirehoseServicesPoller.java:615)
at com.cloudera.cmon.firehose.polling.FirehoseServicesPoller.run(FirehoseServicesPoller.java:179)
at com.cloudera.enterprise.PeriodicEnterpriseService$UnexceptionablePeriodicRunnable.run(PeriodicEnterpriseService.java:67)
at java.lang.Thread.run(Thread.java:745)
后来发现是cm server的问题,之前修改了一个hostname,cloudera manager server未重启,重启后,加入该配置重启hbase不会有问题。
e.mapreduce访问安全机制下的hbase失败
去除client hbase-site safe配置文件内容:hbase.rpc.protection privacy,旧版本中必须加此配置,而新版本文档中也提到需要加此配置,但经过测试加此配置后报如上异常。
14/11/27 12:38:26 INFO zookeeper.ClientCnxn: Socket connection established to ip-10-1-33-24.ec2.internal/10.1.33.24:2181, initiating session
14/11/27 12:38:26 INFO zookeeper.ClientCnxn: Session establishment complete on server ip-10-1-33-24.ec2.internal/10.1.33.24:2181, sessionid = 0x549ef6088f20309, negotiated timeout = 60000
14/11/27 12:38:41 WARN ipc.RpcClient: Couldn't setup connection for hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-31.ec2.internal@YEAHMOBI.COM
14/11/27 12:38:55 WARN ipc.RpcClient: Couldn't setup connection for hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-31.ec2.internal@YEAHMOBI.COM
14/11/27 12:39:15 WARN ipc.RpcClient: Couldn't setup connection for hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-31.ec2.internal@YEAHMOBI.COM
14/11/27 12:39:34 WARN ipc.RpcClient: Couldn't setup connection for hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-31.ec2.internal@YEAHMOBI.COM
14/11/27 12:39:55 WARN ipc.RpcClient: Couldn't setup connection for hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-31.ec2.internal@YEAHMOBI.COM
14/11/27 12:40:19 WARN ipc.RpcClient: Couldn't setup connection for hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-31.ec2.internal@YEAHMOBI.COM
14/11/27 12:40:36 WARN ipc.RpcClient: Couldn't setup connection for hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-31.ec2.internal@YEAHMOBI.COM


Caused by: java.io.IOException: Couldn't setup connection for hbase/ip-10-1-33-20.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-32.ec2.internal@YEAHMOBI.COM
at org.apache.hadoop.hbase.ipc.RpcClient$Connection$1.run(RpcClient.java:821)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.handleSaslConnectionFailure(RpcClient.java:796)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:898)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.execService(ClientProtos.java:30014)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1623)
at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:93)
at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:90)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
... 31 more
Caused by: javax.security.sasl.SaslException: No common protection layer between client and server
at com.sun.security.sasl.gsskerb.GssKrb5Client.doFinalHandshake(GssKrb5Client.java:252)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:187)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:210)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:770)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$600(RpcClient.java:357)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:891)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:888)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:888)
... 40 more
<property>
     <name>hbase.rpc.engine</name>
     <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
mr中使用http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/CDH5-Installation-Guide/cdh5ig_mapreduce_hbase.html  TableMapReduceUtil.addDependencyJars(job);方式加载。
并且使用user api加入例如:
hbase.master.kerberos.principal=hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM
hbase.keytab.path=/home/dev/1015q.keytab
f.升级后impala jdbc安全机制下不可用
java.sql.SQLException: Could not open connection to jdbc:hive2://ip-10-1-33-22.ec2.internal:21050/ym_system;principal=impala/ip-10-1-33-22.ec2.internal@YEAHMOBI.COM: GSS initiate failed
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:187)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:164)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:233)
at com.cloudera.example.ClouderaImpalaJdbcExample.main(ClouderaImpalaJdbcExample.java:37)
Caused by: org.apache.thrift.transport.TTransportException: GSS initiate failed
at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:221)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:297)
at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:185)
... 5 more
解决:
hadoop-auth-2.5.0-cdh5.2.0.jar
hive-shims-common-secure-0.13.1-cdh5.2.0.jar
两个包回退版本即可
相关标签: cloudera upgrade