Why are timestamps date/time in Hive appearing different than Arcadia?

This usually has to do with the way HIVE writes timestamp vs how ARCENGINE (or another engine like IMPALA) read & interpret.

When Hive stores a timestamp value into Parquet format, it converts local time into UTC time, and when it reads data out, it converts back to local time. Arcengine & Impala, however on the other hand, do no conversion when reading the timestamp field out, hence, UTC time is returned instead of local time.

You can confirm this if you run the same query in Hue using Impala. You will see the same date offset. And can you confirm the data is being stored using Hive into parquet.

For more info refer to to this: https://stackoverflow.com/questions/33556226/impala-timestamps-dont-match-hive-a-timezone-issue

One option is to set in Arcengine / Impala two safety startup flags:

  1. use_local_tz_for_unix_timestamp_conversions=true
  2. convert_legacy_hive_parquet_utc_timestamps=true

Other option, you can force Hive when writing files to not perform the conversion, or to simply write the field as a string in standard time format YYYY-MM-DD HH:MM… By being string type, will prevent any conversion from being applied on either the write or read path.

1 Like