Visual showing dynamically showing Top / Bottom K

We are going to combine two features in this how-to guide:

  1. Parameters: If you haven’t already watched our training video on parameters and filters please start by watching that:
    08. Intro to Parameters

  2. Derived Data : this allows us to generate multi-pass SQL
    09. Advanced - Derived Data

Ultimately what we want to do is setup a single visual that displays Top or Bottom K records based on filter selection by the user something like the following:

How to Video

top_bottom_report_v3

Expressions used in Video

1st Pass

Calculated fields:

count() over () as 'total_rows'

row_number () over ( order by sum ([sales]) desc) as 'rank'

2nd Pass

Filter expression:

case when '<<comparator:<>>' = '>=' then [rank] <<comparator:<>> [total_rows]-<<threshold:2>> else [rank] <<comparator:<>> <<threshold:2>> end

1 Like

@shaun,

I tried to make it work on a bar chart.
X axis has -> [dimension column] from derived data
Y axis has -> max([measure column]) from derived data
Filter is same as applied in Pass 2 here.

I am getting the error as below on refreshing visual:

Visual 298 - Could not load data
error: (HY000, None) AnalysisException: HAVING clause not produced by aggregation output (missing from GROUP BY clause?): (CASE WHEN ‘<=’ = ‘>=’ THEN (T1.rank) <= (T1.total_rows) - 10 ELSE (T1.rank) <= 10 END)

Kindly suggest.

Thanks!

@soumyaawasthi It seems something is incorrect in your visual setup.

Can you double check the query that’s being run on port 35000? that might help guide what’s going on.

It’s possible you are applying some sort of aggregation on the Filter shelf.

Here’s my setup for reference:

The SQL that’s get generated for my visual looks like this:

SQL

WITH T1 AS ( SELECT TA_0.`customer_name` as `sales_rank_customer`, sum((TA_0.`sales`)) as `sales_by_customer`, row_number () over ( order by sum ((TA_0.`sales`)) desc) as `rank`, count() over () as `total_rows` FROM `default`.`superstore_sales` TA_0 GROUP BY 1 ) SELECT T1.`sales_rank_customer` as `sales_rank_customer`, max((T1.`sales_by_customer`)) as `max(sales_by_customer)` FROM `T1` T1 WHERE (case when '>=' = '>=' then (T1.`rank`) >= (T1.`total_rows`)-8 else (T1.`rank`) >= 8 end) GROUP BY 1 ORDER BY 2 DESC LIMIT 5000

@shaun,

I checked the query. The difference I see is that instead of WHERE clause, HAVING clause is applied.

I followed the similar steps, so I am not sure why there is this difference in results.

Below is the query for your reference:

WITH T1 AS ( SELECT TA_0.username as agents_by_time_in_state, sum((TA_0.durationinsecondstalk))/60 as time_in_state_in_minutes, row_number () over ( order by sum((TA_0.durationinsecondstalk)) desc) as rank, count() over () as total_rows FROM default.impact_reports_agg_data TA_0 GROUP BY 1 ) SELECT T1.agents_by_time_in_state as agents_by_time_in_state, max((T1.time_in_state_in_minutes)) as max(time_in_state_in_minutes) FROM T1 T1 GROUP BY 1 HAVING (case when ‘<=’ = ‘>=’ then (T1.rank) < (T1.total_rows)-10 else (T1.rank) < 10 end) LIMIT 100

Kindly suggest.

@soumyaawasthi

  1. does this happen only when using bar chart? or does it work when you switch to table visual?

  2. Also does your derived data (green element) appear as dimension on the right side?

  3. Have you tried recreating the derived data from scratch and seeing if you see the same issue?

@shaun,

After fixing a few things, the chart rendered fine.

Although, there is one new case that is happening.
Requirement -> Chart should show only non-zero records for both highest and lowest.

Current behaviour ->

  • When not applying ‘measure > 0’ filter on derived data chart (pass2), chart shows users with no bars as by lowest sorting many have value 0.
  • When applying ‘measure > 0’ filter on derived data chart (pass2), chart shows ‘no results found’.

Then I updated derived data source to include a WHERE clause for ‘measure > 0’.
Query formed is as below: (any records with 0 value are excluded from this result set as expected)

SELECT TA_0.username as Agents, (sum((TA_0.durationinsecondsacw)) + sum((TA_0.durationinsecondstalk)) + sum((TA_0.durationinsecondshold)))/60 as time_in_state_in_minutes, row_number () over ( order by (sum((TA_0.durationinsecondsacw)) + sum((TA_0.durationinsecondstalk)) + sum((TA_0.durationinsecondshold))) desc) as rank, count() over () as total_rows FROM default.impact_reports_agg_data TA_0 GROUP BY 1 HAVING ((sum((TA_0.durationinsecondsacw)) + sum((TA_0.durationinsecondstalk)) + sum((TA_0.durationinsecondshold))) > 0) LIMIT 100

Now, when I clone this visual and create a new one as in Pass2, query is formed as below: (Error: AnalysisException: aggregate function not allowed in WHERE clause)

WITH T1 AS ( SELECT TA_0.username as agents_by_time_in_state, (sum((TA_0.durationinsecondsacw)) + sum((TA_0.durationinsecondstalk)) + sum((TA_0.durationinsecondshold)))/60 as time_in_state_in_minutes, row_number () over ( order by (sum((TA_0.durationinsecondsacw)) + sum((TA_0.durationinsecondstalk)) + sum((TA_0.durationinsecondshold))) desc) as rank, count() over () as total_rows FROM default.impact_reports_agg_data TA_0 WHERE ((sum((TA_0.durationinsecondsacw)) + sum((TA_0.durationinsecondstalk)) + sum((TA_0.durationinsecondshold))) > 0) GROUP BY 1 ) SELECT T1.agents_by_time_in_state as agents_by_time_in_state, max((T1.time_in_state_in_minutes)) as max(time_in_state_in_minutes) FROM default.impact_reports_agg_data TA_0 INNER JOIN T1 T1 ON ( TA_0.username = T1.agents_by_time_in_state ) GROUP BY 1 HAVING ((sum((TA_0.durationinsecondsacw)) + sum((TA_0.durationinsecondstalk)) + sum((TA_0.durationinsecondshold))) > 0) LIMIT 5000

Kindly suggest.

Thanks!

@soumyaawasthi I tried reproducing your setup with filter on the sum() in Pass1 and everything worked fine in Pass2 (see below).

Make sure to remove any sum() filters in pass 2. That might be causing your issues.

SQL

This is the final SQL that’s produced:

/* widget=15755, app=15244, ds=597 */ WITH T2 AS ( SELECT TA_0.`customer_name` as `customer_rank_having`, sum((TA_0.`sales`))+sum((TA_0.`discount`)) as `_sales_by_customer`, row_number () over ( order by sum ((TA_0.`sales`))+sum((TA_0.`discount`)) desc) as `_rank`, count() over () as `_total_rows` FROM `default`.`superstore_sales` TA_0 GROUP BY 1 HAVING ((sum((TA_0.`sales`))+sum((TA_0.`discount`)))>50000) ) SELECT T2.`customer_rank_having` as `customer_rank_having`, max((T2.`_sales_by_customer`)) as `max(_sales_by_customer)` FROM `T2` T2 GROUP BY 1 LIMIT 100

Alternative Approach

Another option if it’s easier, you can use our dataset from SQL to create these custom visual. So you can create the exact SQL you want, including any custom filter parameters coming from the dashboard, and just paste it into the dataset SQL.

http://documentation.arcadiadata.com/latest/#pages/topics/edit-dataset-from-query.html

@shaun,

I double checked and no filter is applied on Pass 2 report.

For more details, derived data definition is as below:
image

I cloned the visual, removed all dimensions, measures and filters, and added derived data source under dimensions. After refreshing the visual, I still get the same error. Refer screenshot below:

I am not sure what is going wrong. Kindly update.

Also, I tried working with the query based dataset approach. But it is failing when I update measure with filter. With fixed measure, it is working fine.

Thanks!

@soumyaawasthi for the alternative approach, can you share with me what SQL you used for your dataset from SQL ?

@shaun,

Below is the SQL query for dataset:

WITH T1 AS ( SELECT TA_0.username as agents_by_time_in_state, <ahtcomponent:0>/60 as time_in_state_in_minutes, row_number () over ( order by <ahtcomponent:sum(TA_0.`durationinsecondsacw`)> desc) as rank, count() over () as total_rows FROM default.impact_reports_agg_data TA_0 GROUP BY 1 HAVING (<ahtcomponent:0> > 0)) SELECT T1.agents_by_time_in_state as agents_by_time_in_state, max(T1.time_in_state_in_minutes) as time_in_state_in_minutes FROM T1 T1 WHERE (case when ‘<<sortorder:<>>’ = ‘>=’ then T1.rank <= T1.total_rows-10 else T1.rank <= 10 end) GROUP BY 1 ORDER BY 2 DESC

When I add a new visual for this dataset, it gives error as below:

Visual 319 - Could not load data
error: (HY000, None) AnalysisException: Syntax error in line 3:
…s_by_time_in_state`, (sum([durationinsecondsacw]) + su…
^
Encountered: [
Expected: ALL, CASE, CAST, DEFAULT, DISTINCT, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, REPLACE, TRUNCATE, TRUE, IDENTIFIER
CAUSED BY: Exception: Syntax error

‘ahtcomponent’ filter looks like below:

Please let me know if any other information is needed. Thanks!

@soumyaawasthi okay, I see what you are doing incorrect.

The Dataset from SQL requires raw SQL.

So your parameter values need to have fully qualified columns names, such as TA_0. username
Remove the square bracket syntax […] and include the table reference like TA_0.

For example, instead of sum([durationinsecondstalk]) —> you need to change to TA_0.`durationinsecondstalk

Make that change for all your custom parameter values

sum(TA_0.durationinsecondstalk+TA_0.durationinsecondshold+TA_0.durationinsecondsacw)

TA_0.durationinsecondstalk

TA_0.durationinsecondshold

TA_0.durationinsecondsacw