How to manage/handle schema changes while loading a JSON file into a BigQuery table
Problem description
Here is what my input file looks like:
{"Id": 1, "Address": {"Street":"MG Road","City":"Pune"}}
{"Id": 2, "Address": {"City":"Mumbai"}}
{"Id": 3, "Address": {"Street":"XYZ Road"}}
{"Id": 4}
{"Id": 5, "PhoneNumber": 12345678, "Address": {"Street":"ABCD Road", "City":"Bangalore"}}
In my Dataflow pipeline, how can I dynamically determine which fields are present in each row so that the output adheres to the BigQuery table schema?
For example, in row 2, Street is missing. I want the entry for column Address.Street in BigQuery to be "N/A" or null, and I don't want the pipeline to fail because of a schema change or missing data.
How can I handle this logic in Python in my Dataflow job before writing to BigQuery?
Recommended answer
I recommend writing your data into a temporary table that has just one field, line, of type STRING, so that each input row is stored as raw JSON text.
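As a rough illustration of that staging step, the Python sketch below wraps each raw input line into a one-column row. The helper name to_staging_row and the STAGING_SCHEMA constant are hypothetical and not part of the original answer. In an Apache Beam pipeline a function like this could presumably be applied with beam.Map before the BigQuery write, but that wiring is an assumption and is not shown here.

```python
def to_staging_row(raw_line):
    """Wrap one raw JSON line as a row for a single-column staging table."""
    # The JSON text is not parsed here; only surrounding whitespace is removed,
    # so rows with missing or extra fields can never fail at this stage.
    return {"line": raw_line.strip()}


# Hypothetical schema for the staging table: a single STRING column named "line".
STAGING_SCHEMA = "line:STRING"

raw_lines = [
    '{"Id": 2, "Address": {"City":"Mumbai"}}\n',
    '{"Id": 4}\n',
]
rows = [to_staging_row(raw) for raw in raw_lines]
```

Because the row is stored as opaque text, a new field such as PhoneNumber appearing in the input does not require any change to the staging table.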
Once the data is in the BigQuery temporary table, you can apply the schema logic in a query and write the result from the temporary table into your final table.
The example below, in BigQuery Standard SQL, shows how to apply schema logic to a table that holds each whole row in a single field:
#standardSQL
-- Sample rows standing in for the temporary table: one JSON document per line value.
WITH t AS (
  SELECT '{"Id": 1, "Address": {"Street":"MG Road","City":"Pune"}}' AS line UNION ALL
  SELECT '{"Id": 2, "Address": {"City":"Mumbai"}}' UNION ALL
  SELECT '{"Id": 3, "Address": {"Street":"XYZ Road"}}' UNION ALL
  SELECT '{"Id": 4}' UNION ALL
  SELECT '{"Id": 5, "PhoneNumber": 12345678, "Address": {"Street":"ABCD Road", "City":"Bangalore"}}'
)
-- A path that is missing from a row yields NULL rather than an error.
SELECT
  JSON_EXTRACT_SCALAR(line, '$.Id') AS id,
  JSON_EXTRACT_SCALAR(line, '$.PhoneNumber') AS PhoneNumber,
  JSON_EXTRACT_SCALAR(line, '$.Address.Street') AS Street,
  JSON_EXTRACT_SCALAR(line, '$.Address.City') AS City
FROM t
The result is as follows:
Row  id  PhoneNumber  Street     City
1    1   null         MG Road    Pune
2    2   null         null       Mumbai
3    3   null         XYZ Road   null
4    4   null         null       null
5    5   12345678     ABCD Road  Bangalore
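If the defaulting has to happen inside the Python pipeline itself, as the question asks, the same idea can be sketched with the standard json module. This is an alternative to the staging-table approach, not part of the original answer. The function name flatten_record and its default parameter are hypothetical, and the output columns simply mirror the query result above.

```python
import json


def flatten_record(raw_line, default=None):
    """Parse one JSON line and fill every expected column, using default for gaps."""
    record = json.loads(raw_line)
    # A row may have no "Address" object at all, so fall back to an empty dict.
    address = record.get("Address") or {}
    return {
        "Id": record.get("Id", default),
        "PhoneNumber": record.get("PhoneNumber", default),
        "Street": address.get("Street", default),
        "City": address.get("City", default),
    }


# Row 2 from the sample input: Street and PhoneNumber are missing.
row_2 = flatten_record('{"Id": 2, "Address": {"City":"Mumbai"}}')
# Row 4 with "N/A" as the placeholder instead of None.
row_4 = flatten_record('{"Id": 4}', default="N/A")
```

Unlike JSON_EXTRACT_SCALAR, which returns strings, this sketch keeps the JSON types (Id stays an integer), so the target column types would need to match. Fields that are not listed in the function, such as a newly added key, are silently dropped.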
