Reading Parquet-backed tables in Arcadia by column name instead of position

By default, Parquet files are read by position: each column is expected to sit at the same index in every file. When the columns are not at the same index/position, you might see an error like this:

File 'schema_evolution.db/t2/45331705_data.0.parq'
has an incompatible Parquet schema for column 'schema_evolution.t2.c4'.
Column type: TIMESTAMP, Parquet schema: optional int32 c1 [i:0 d:1 r:0]

When the column index/position in the Parquet files can't be guaranteed, you can add a runtime flag to Arcadia Analytics Engine so that it reads Parquet columns by name instead of by index/position. The flag is the PARQUET_FALLBACK_SCHEMA_RESOLUTION option.
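As a rough sketch of what this looks like in practice: PARQUET_FALLBACK_SCHEMA_RESOLUTION is also an Apache Impala query option, where it accepts `position` (the default) or `name`. Assuming Arcadia Analytics Engine accepts the same Impala-style syntax (an assumption, not confirmed in this post), the option could be set for a single session like this:

```sql
-- Assumption: Impala-style query option syntax is accepted by Arcadia Analytics Engine.
-- Switch Parquet column resolution from position (the default) to name.
SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;

-- Re-run the query that failed; the table and column come from the error message above.
SELECT c4 FROM schema_evolution.t2 LIMIT 10;
```

To apply the setting to every query rather than one session, Impala-style daemons take default query options as a startup flag, along the lines of `--default_query_options=PARQUET_FALLBACK_SCHEMA_RESOLUTION=name`. The exact configuration field where this goes in Cloudera Manager or Ambari depends on your installation.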

Below is an example of how to apply this setting in Cloudera Manager:

Below is an example of how to apply this setting in Ambari:

Click here for more information on this option.


You can also set this in the Arcadia UI console as a connection setting, as described here: