Fixing misaligned data when importing from Oracle into Hive with Sqoop

程序员文章站 (Programmer Article Station), 2022-03-08 23:05:40


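While using Sqoop to import data from Oracle into Hive, I validated the result and found that Hive contained more rows than Oracle. The extra rows turned out to be badly misaligned, with a large number of fields set to NULL.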

I suspected that some fields contained Hive's default delimiters, such as "\n", "\r", or "\01" (the Ctrl-A character, byte 0x01).

The fix is to add the --hive-drop-import-delims option to the import.

Hive will have problems using Sqoop-imported data if your database's rows contain string fields that have Hive's default row delimiters (\n and \r characters) or column delimiters (\01 characters) present in them. You can use the --hive-drop-import-delims option to drop those characters on import to give Hive-compatible text data. Alternatively, you can use the --hive-delims-replacement option to replace those characters with a user-defined string on import to give Hive-compatible text data. These options should only be used if you use Hive's default delimiters and should not be used if different delimiters are specified.
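To make the failure concrete, here is a minimal Python sketch of what happens when a text file is split on the default delimiters. It is only a simulation under simplifying assumptions, not Hive's actual reader: it handles "\n" but not "\r", and the column layout (id, note, name) and the sample rows are made up for illustration.

```python
FIELD_DELIM = "\x01"  # Hive's default column delimiter (written "\01" in the text)
ROW_DELIM = "\n"      # Hive's default row delimiter (this sketch ignores "\r")


def write_rows(rows):
    """Serialize rows as delimited text, without escaping anything."""
    return ROW_DELIM.join(FIELD_DELIM.join(fields) for fields in rows)


def read_rows(text, num_columns):
    """Split the text back into rows; missing columns become None (NULL in Hive)."""
    parsed = []
    for line in text.split(ROW_DELIM):
        fields = line.split(FIELD_DELIM)[:num_columns]
        fields += [None] * (num_columns - len(fields))
        parsed.append(fields)
    return parsed


# Hypothetical source rows (id, note, name); the second note contains a newline.
source_rows = [
    ["1", "fine", "alice"],
    ["2", "line one\nline two", "bob"],
]

hive_rows = read_rows(write_rows(source_rows), num_columns=3)

print("source rows:", len(source_rows))
print("rows after splitting:", len(hive_rows))
for row in hive_rows:
    print(row)
```

Two source rows come back as three: the row with the embedded newline is cut in half, the first half is padded with None, and the second half starts in the wrong column. This matches the symptoms described above (more rows than the source, misaligned values, many NULL fields).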

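As the quoted passage shows, "\n" and "\r" are Hive's default row delimiters and "\01" is its default column delimiter. If the data contains these characters and nothing is done about them, Hive splits the data in the wrong places, which causes both the misalignment and the extra rows. There are two ways to fix this: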

First, use --hive-drop-import-delims to remove these characters.

Second, use --hive-delims-replacement to replace these characters with a string of your choice.
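As a rough sketch of how the two options fit into an import command, the Python helper below assembles the Sqoop argument list and prints it as a shell command without running it. The helper name build_sqoop_import, the JDBC URL, the username, the password file path, and the table names are all placeholders assumed for illustration. The Sqoop flags themselves (--connect, --username, --password-file, --table, --hive-import, --hive-table, --hive-drop-import-delims, --hive-delims-replacement) are standard import arguments.

```python
import shlex


def build_sqoop_import(table, hive_table, delims_replacement=None):
    """Build the argv list for a Sqoop Hive import that sanitizes delimiters.

    With delims_replacement=None the delimiter characters are dropped;
    otherwise they are replaced with the given string.
    """
    args = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//db.example.com:1521/ORCL",  # placeholder URL
        "--username", "etl_user",                                      # placeholder user
        "--password-file", "/user/etl/oracle.pwd",                     # placeholder path
        "--table", table,
        "--hive-import",
        "--hive-table", hive_table,
    ]
    if delims_replacement is None:
        args.append("--hive-drop-import-delims")
    else:
        args += ["--hive-delims-replacement", delims_replacement]
    return args


# Option 1: drop \n, \r and \01 from string fields.
drop_cmd = build_sqoop_import("ORDERS", "orders")

# Option 2: replace them with a single space instead.
replace_cmd = build_sqoop_import("ORDERS", "orders", delims_replacement=" ")

print(shlex.join(drop_cmd))
print(shlex.join(replace_cmd))
```

Dropping is the simpler choice when the stray characters carry no meaning. Replacing them with a space keeps words that were separated by a line break from running together. As the quoted documentation notes, both options only apply when Hive's default delimiters are in use.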
