7. 使用sql处理json数据

在impala和hive中都有get_json_object函数

当表中有json数据需要处理,提取出其中的某个键对应的值的时候,可以使用get_json_object()函数来处理

7.1. IMPALA

  • 解析单个json字符串

    select get_json_object('{"a":"123", "b": "456"}', '$.a')
    -- 123
    
    select get_json_object('{"a":"123", "b": "456"}', '$.b')
    -- 456
    
  • 解析多个json字符串

    select get_json_object('[{"a":"123", "b": "456"},{"a":"23", "b": "56"}]', '$[0].a')
    -- 123
    
    select get_json_object('[{"a":"123", "b": "456"},{"a":"23", "b": "56"}]', '$[1].a')
    -- 23
    
  • 当 json 字符串中存在 $ 标志符,可用 replace 先替换成空,再进行值的提取

    select get_json_object(replace('{"$a":"123", "$b": "456"}', '$', ''), '$.a')
    -- 123
    

7.2. HIVE

有这样一个json字符串

data =
{
 "store":
        {
         "fruit":[{"weight":8,"type":"apple"}, {"weight":9,"type":"pear"}],  
         "bicycle":{"price":19.95,"color":"red"}
         }, 
 "email":"amy@only_for_json_udf_test.net", 
 "owner":"amy" 
}
  • 解析单层值

    select get_json_object(data, '$.owner') from test;
    结果:amy
    
  • 解析多层值

    select get_json_object(data, '$.store.bicycle.price') from test;
    结果:19.95
    
  • 取出数组值

    select get_json_object(data, '$.store.fruit[0]') from test;
    结果:{"weight":8,"type":"apple"}
    

7.3. SPARK

初次之外,spark还有别的处理json的函数,详见官网

-- get_json_object
SELECT get_json_object('{"a":"b"}', '$.a');
+-------------------------------+
|get_json_object({"a":"b"}, $.a)|
+-------------------------------+
|                              b|
+-------------------------------+