Tracking query performance and usage statistics

Today you can track and analyze any query that’s executed through the Arcadia Analytics Engine by visiting port 35000 on any node that you’re connecting to and executing queries with.

You can also access and query previously completed queries through the arcadia_catalog.completed_queries table, which is created by default when installing Arcadia services and configuring an Arcadia connection:

For more optimal access to this data, its necessary to create a separate historical queries table (i.e. arcaida_catalog.historical_queries). Below are a series of steps you can use to create your historical queries table and also set a periodic schedule for updating and adding queries to this table.

NOTE: This process is exclusive to queries that are ran through the Arcadia Analytics Engine.

Step 1: Create table to store completed queries history (historical_queries)**

This will create the table and also populate it with all available completed queries data.


CREATE TABLE arcadia_catalog.historical_queries
PARTITIONED BY (start_date)
SORT BY (start_time, end_time)
STORED AS PARQUET AS SELECT *, to_date(start_time) as start_date FROM arcadia_catalog.completed_queries;

Step 2: Create a query to make updates to the completed queries history table (historical_queries)

Example SQL job ( update_historical_queries.sql ) :


INSERT INTO arcadia_views.historical_queries 

PARTITION (start_date) 

SELECT *, to_date(start_time) as start_date FROM arcadia_catalog.completed_queries WHERE end_time > (SELECT max(end_time) FROM arcadia_catalog.historical_queries);

COMPUTE INCREMENTAL STATS arcadia_catalog.historical_queries;

Step 3: Create a Cron job to run update script

Example crontab to select and store previous day’s query activity (Runs at 2am every day):

0  2  *  *  * arcadia /opt/cloudera/parcels/ARCADIAENTERPRISE/lib/arcengine/shell/arcadia-shell -f  ***</path/to/update_historical_queries.sql>***