Reading Hive data with Spark SQL in IDEA
1. Copy the hive-site.xml configuration file from Hive's conf directory into the project's resources directory;
2. Add the spark-hive dependency to the application's pom.xml:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
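The ${spark.version} placeholder assumes a matching entry in the pom's <properties> section; a minimal sketch (the version shown is an assumption, use the version running on your cluster):

<properties>
    <!-- assumed value; must match the cluster's Spark version -->
    <spark.version>2.4.4</spark.version>
</properties>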
3. Code:
import org.apache.spark.sql.SparkSession

object HiveReadApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName(this.getClass.getSimpleName)
      .master("local[2]")
      // have DataNodes advertised by hostname so the client can reach them
      .config("dfs.client.use.datanode.hostname", "true")
      // enable Hive support; picks up hive-site.xml from the classpath
      .enableHiveSupport()
      .getOrCreate()

    val sql = "show tables"
    val sql2 = "select * from employee"
    spark.sql(sql).show()          // list tables in the current database
    spark.sql(sql2).show()         // query via SQL
    spark.table("employee").show() // equivalent: load the table directly
    spark.stop()
  }
}
- Result:
+-------+----+
|user_id|name|
+-------+----+
| 1|小林|
+-------+----+
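With enableHiveSupport() the same session can also write back to Hive. A minimal sketch continuing from the session above; the employee_backup table name is hypothetical:

// Hypothetical: persist the query result as a new Hive table.
val df = spark.table("employee")
df.write
  .mode("overwrite")              // replace the table if it already exists
  .saveAsTable("employee_backup") // stored under hive.metastore.warehouse.dir

spark.sql("select * from employee_backup").show()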
4. hive-site.xml configuration:
<configuration>
    <property>
        <name>hive.exec.scratchdir</name>
        <value>hdfs://hadoop001:9000/tmp/hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop001:3306/hivedb?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>hdfs://hadoop001:9000/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>
    <property>
        <name>javax.jdo.option.Multithreaded</name>
        <value>true</value>
    </property>
</configuration>
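As an alternative to shipping hive-site.xml in resources, the connection can usually be supplied in code, provided a standalone metastore service is running. A minimal sketch; thrift://hadoop001:9083 is an assumption (9083 is the metastore's default port):

import org.apache.spark.sql.SparkSession

// Sketch: connect without a hive-site.xml on the classpath.
val spark = SparkSession.builder()
  .appName("HiveWithoutSiteXml")
  .master("local[2]")
  .config("hive.metastore.uris", "thrift://hadoop001:9083") // assumed metastore address
  .config("spark.sql.warehouse.dir", "hdfs://hadoop001:9000/hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()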
5. Common exceptions:
- 5.1 Without the config("dfs.client.use.datanode.hostname", "true") setting:
19/10/17 16:30:40 WARN DFSClient: Failed to connect to /192.168.0.3:50010 for block BP-744454093-192.168.0.3-1567066072363:blk_1073742661_1838, add to deadNodes and continue.
java.net.ConnectException: Connection timed out: no further information
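This timeout appears when the HDFS client receives the DataNodes' internal IP addresses (here 192.168.0.3), which are unreachable from the development machine; setting dfs.client.use.datanode.hostname to true makes the client connect by hostname instead, as in the code in step 3.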
- 5.2 Without the spark-hive_2.11 dependency in pom.xml:
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
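Adding the spark-hive dependency from step 2 resolves this. Note that the _2.11 suffix is the Scala binary version; it must match the Scala version of the project and of the other Spark artifacts in the pom.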