Scenario

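I've been working on big data projects lately, and sometimes I need to validate code locally before running it on the cluster, for example to check data formats.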

Local validation is painful in three ways:

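  1. How to configure the XML files so that local code can connect to the big data cluster.
  2. How to set up Kerberos authentication.
  3. Incompatible Java dependency jars.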
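For the first point, it helps to check what a hive-site.xml copied from the cluster actually contains before wiring it into HiveConf. Below is a minimal sketch that uses only the JDK XML parser. The class name SiteXmlReader is hypothetical. The property names printed in main (hive.metastore.uris, hive.metastore.sasl.enabled, hive.metastore.kerberos.principal) are standard Hive metastore settings that a Kerberized cluster's hive-site.xml usually carries, though your file may differ.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads <property><name>/<value> pairs from a
// Hadoop-style *-site.xml file using only the JDK XML parser.
class SiteXmlReader {

    static Map<String, String> readProperties(File xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xml);
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = text(p, "name");
            String value = text(p, "value");
            if (name != null) {
                props.put(name, value); // later duplicates overwrite earlier ones
            }
        }
        return props;
    }

    // Trimmed text of the first descendant element with this tag, or null if absent.
    private static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Path taken from the article; point it at your own copy of hive-site.xml.
        Map<String, String> props = readProperties(new File("/Users/Documents/utils/hive-conf/hive-site.xml"));
        String[] keys = {"hive.metastore.uris", "hive.metastore.sasl.enabled", "hive.metastore.kerberos.principal"};
        for (String key : keys) {
            System.out.println(key + " = " + props.get(key));
        }
    }
}
```

If hive.metastore.uris comes back null, the client has no remote metastore address to connect to, which is worth fixing before looking at Kerberos at all.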

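The best approach is to work through the pitfalls once and write down every detail, to the standard that someone else can reproduce the setup just by reading the notes.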

Code walkthrough

The code itself is simple: from Java, it uses HiveMetaStoreClient to read a Hive table's partition information, with Kerberos authentication configured.

Dependencies:

<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.0-cdh6.3.3</version>
  </dependency>
</dependencies>
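For the third point, dependency conflicts are much easier to chase down once you know which jar a class was actually loaded from. Below is a small diagnostic sketch that uses only standard JDK reflection. WhichJar is a hypothetical name, and the default class names in main are only examples of classes worth checking in this setup.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports which jar (or directory) a class was
// loaded from, to spot duplicate or conflicting versions on the classpath.
class WhichJar {

    static String locationOf(String className) {
        try {
            // 'false' = do not run static initializers just to inspect the class
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself (bootstrap loader) have no CodeSource.
            if (src == null || src.getLocation() == null) {
                return "<JDK / bootstrap>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        // Example class names only; pass your own as program arguments.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.hadoop.hive.metastore.HiveMetaStoreClient",
            "org.apache.hadoop.security.UserGroupInformation",
            "org.apache.thrift.TBase"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

Running it from the same classpath as the application shows, for example, whether the Thrift classes come from the jar you expect or from some other transitive dependency.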

Code:

package org.example;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.Partition;
import org.apache.hadoop.hive.metastore.api.Table;
import org.apache.hadoop.security.UserGroupInformation;

import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;


public class App {
    public static void main(String[] args) throws IOException {
        test();
    }

    public static void test() throws IOException {

        // Kerberos principal and the keytab that holds its key
        String keyUser = "test@EXAMPLE.COM";
        String keyPath = "/Users/Documents/EnvFile/CDH6.3.2/test.keytab";

        // krb5.conf copied from the cluster (realm and KDC settings)
        System.setProperty("java.security.krb5.conf", "/Users/Documents/EnvFile/CDH6.3.2/krb5.conf");
        System.setProperty("krb.principal", keyUser);

        HiveConf configuration = new HiveConf();
        // addResource(String) looks the name up on the classpath; a file-system
        // path has to be wrapped in a Hadoop Path to be loaded from disk.
        configuration.addResource(new Path("/Users/Documents/utils/hive-conf/hive-site.xml"));
        configuration.set("hadoop.security.authentication", "kerberos");
        configuration.set("kerberos.principal", keyUser);

        // Log in from the keytab before creating any metastore client
        UserGroupInformation.setConfiguration(configuration);
        UserGroupInformation.loginUserFromKeytab(keyUser, keyPath);

        HiveMetaStoreClient client = null;
        try {
            client = new HiveMetaStoreClient(configuration);
            List<String> allDatabases = client.getAllDatabases();
            for (String name : allDatabases) {
                System.out.println(name);
            }
            System.out.println("============ partition columns ========");
            Table table = client.getTable("default", "test_partition");
            List<FieldSchema> partitionColumns = table.getPartitionKeys();
            List<String> partitionKey = partitionColumns.stream().map(FieldSchema::getName).collect(Collectors.toList());
            for (FieldSchema item : partitionColumns) {
                System.out.println(item.getName());
            }
            System.out.println("=========== partition values ========");
            // (short) -1 means no limit: return every partition of the table
            List<Partition> partitions = client.listPartitions("default", "test_partition", (short) -1);

            List<Map<String, String>> result = new ArrayList<>();
            // Pair each partition column name with the matching value, in column order
            for (Partition item : partitions) {
                Map<String, String> kv = new LinkedHashMap<>();
                List<String> values = item.getValues();
                for (int i = 0; i < values.size(); i++) {
                    kv.put(partitionKey.get(i), values.get(i));
                }
                result.add(kv);
            }
            System.out.println("============ result ========");
            for (Map<String, String> item : result) {
                System.out.println(item);
            }

        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            // Release the Thrift connection to the metastore
            if (client != null) {
                client.close();
            }
        }
    }
}
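The code above hard-codes the principal and three local file paths, and a mistake in any of them tends to surface later as a confusing login or connection error. Below is a hypothetical pre-flight check that could run before the Kerberos login. The Preflight class is not part of any library, and its realm check is only a format test, not a validation against the KDC.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check: verifies the principal format and the local
// files (krb5.conf, keytab, hive-site.xml) before any Kerberos login.
class Preflight {

    static List<String> check(String principal, String... files) {
        List<String> problems = new ArrayList<>();
        // A Kerberos principal is expected to look like user@REALM (or user/host@REALM).
        int at = principal.lastIndexOf('@');
        if (at <= 0 || at == principal.length() - 1) {
            problems.add("principal has no realm: " + principal);
        }
        for (String f : files) {
            Path p = Paths.get(f);
            if (!Files.isRegularFile(p)) {
                problems.add("missing file: " + f);
            } else if (!Files.isReadable(p)) {
                problems.add("unreadable file: " + f);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Values taken from App.test() above.
        List<String> problems = check(
                "test@EXAMPLE.COM",
                "/Users/Documents/EnvFile/CDH6.3.2/krb5.conf",
                "/Users/Documents/EnvFile/CDH6.3.2/test.keytab",
                "/Users/Documents/utils/hive-conf/hive-site.xml");
        if (problems.isEmpty()) {
            System.out.println("preflight OK");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

Returning a list of problems rather than throwing on the first one means a single run reports every broken path at once.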

Contents of result:

{year=2000, month=1}
{year=2009, month=1}
{year=2010, month=1}
{year=2010, month=2}
{year=2011, month=1}
{year=2012, month=1}
{year=2022, month=1}
{year=2022, month=7}
{year=2023, month=6}
{year=2023, month=7}
{year=2023, month=9}
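The maps in result can also be turned back into the familiar Hive partition-path form, such as year=2000/month=1. Below is a minimal sketch; PartitionNames is a hypothetical helper. As far as I know, the metastore has its own helper for this (Warehouse.makePartName), which also escapes special characters in values. This sketch does not.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.stream.Collectors;

// Hypothetical helper: renders each partition map from `result` as a
// Hive-style partition name such as "year=2000/month=1".
// Unlike Hive's own naming, it does NOT escape special characters.
class PartitionNames {

    static String toPartName(Map<String, String> spec) {
        StringJoiner joiner = new StringJoiner("/");
        // Relies on the map's iteration order; the LinkedHashMap built in App
        // keeps the partition-column order from getPartitionKeys().
        spec.forEach((k, v) -> joiner.add(k + "=" + v));
        return joiner.toString();
    }

    static List<String> toPartNames(List<Map<String, String>> result) {
        return result.stream().map(PartitionNames::toPartName).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("year", "2000");
        p.put("month", "1");
        System.out.println(toPartName(p)); // prints year=2000/month=1
    }
}
```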