Refreshing Arcadia Dataset definitions from the UI and the command line

When an operation changes a table’s schema (for example, adding, changing, or removing a column), the Arcadia Dataset falls out of sync with the base table’s schema definition. You can fix this from the UI: open the Dataset, click the “Fields” tab, click “Edit Fields”, and then click the “Refresh” button.

After you click “Refresh”, you should see a message indicating that the column definitions in your dataset have been updated.

However, you may eventually want to automate this refresh as part of your ETL process. A Dataset definition refresh can be automated with the Arcviz command line utility. The steps below show how to access the utility in a Cloudera environment and how to invoke it to refresh a dataset by its dataset id (see this link for further documentation on the location of the Arcviz utility in other environments such as Hortonworks and MapR).

Step 1. Become the “arcadia” user

su arcadia

Step 2. Change to the ‘arcviz’ directory:

cd /opt/cloudera/parcels/ARCADIAENTERPRISE/lib/arcviz

Step 3. Run the ‘arcviz’ utility, invoking the “util” module and its “dataset” command, and pass a dataset id to the -r or --refresh argument:

./arcviz util dataset -r 753
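
To fold this into an ETL job, the steps above can be wrapped in a small shell function. The following is only a sketch under a few assumptions: refresh_datasets and ARCVIZ_DIR are hypothetical names introduced here, the default path is the Cloudera parcel location from Step 2, the script is run as the “arcadia” user, and the arcviz utility is assumed to exit with a non-zero status when a refresh fails (check this in your environment).

```shell
#!/bin/sh
# Sketch: refresh one or more Arcadia datasets at the end of an ETL job.
# ARCVIZ_DIR is a hypothetical variable; it defaults to the Cloudera parcel
# path from Step 2 and can be overridden for other environments.
ARCVIZ_DIR="${ARCVIZ_DIR:-/opt/cloudera/parcels/ARCADIAENTERPRISE/lib/arcviz}"

# refresh_datasets is a hypothetical helper; pass dataset ids as arguments.
refresh_datasets() {
    for dataset_id in "$@"; do
        # Same invocation as Step 3, run from the arcviz directory in a
        # subshell so the caller's working directory is left unchanged.
        # Assumption: arcviz returns a non-zero exit status on failure.
        if ! (cd "$ARCVIZ_DIR" && ./arcviz util dataset -r "$dataset_id"); then
            echo "refresh failed for dataset $dataset_id" >&2
            return 1
        fi
    done
}
```

With this in place, a call such as refresh_datasets 753 754 at the end of the ETL script would refresh each listed dataset in turn and stop at the first failure.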
