Analytical view on rolling time window

Arcadia can support Analytical Views on rolling time window. This is useful in situations where the underlying base table has a large history like 2 years, but the end users are limited to only viewing the most recent 3 or 6 months. So you want to build an analytical view that focuses on that shorted time period (that way it’s smaller in size) and still ensure incremental updates.

The solution to this requires a few key steps:

  1. A Logical View is required to limit the data to the time window of interest. Logical view definition needs to have time partition predicate in the WHERE clause. This will need to use the time partition key that matches your underlying base table.

  2. The Analytical View definition needs to have partition keys that map to all the ones in the underlying base table. If your base table has 2 partitions keys, we will need to include both of them in the analytical view. To create the analytical view requires manually creating at the command line through beeline or arcadia-shell. (UI creation not supported, but subsequent refreshes can be performed through the UI)

Here’s an example setup:

Logical View - date_string predicate rolling 3 months:

create view db.event_3month_lv
SELECT * FROM db.event_base_table
WHERE date_string >= from_unixtime(unix_timestamp(now() - INTERVAL 3 months), ‘yyyy-MM-dd’)

Analytical View - partitioned on date_string & region

create analytical view av_time_window
partitioned by (date_string, region)
stored as parquet as
select sum(1), sum(sales), custom_id, date_string, region
from db.event_3month_lv
group by 3,4,5

Remaining steps

Once the Analytical view is create & initially refreshed, future refreshes will be incremental over the 3 month rolling window. So we would remove older dates and add newer ones.

Here’s some docs on analytical views creation syntax via command line – using beeline to arcengine:

1 Like