PSL can be used to define entities on two abstraction layers:
Relational (see: Physical Mapping) - defines schema tables that can be used in SQL queries. The relational model can span multiple data sources and provides uniform SQL access to your entire data landscape.
Multi-dimensional (see: Logical Model) - defines dimensions and measures on top of schema tables. It allows you to run aggregate and record type queries.
A simple example:
schema MySchema(dataSource = "Test") {measure unitSales(column = "sales.unit_sales")​dimension Customer(table = "customer") {attribute city(column = "city")}​table sales(physicalTable = "SALES_FACT_1998") {column customer_id(expression = "CUSTOMER_ID", tableRef = "customer.customer_id")column unit_sales(expression = "UNIT_SALES")}​table customer(physicalTable = "CUSTOMER") {column customer_id(expression = "CUSTOMER_ID")column city(expression = "CITY")}}
Now it is possible to query this schema for the measure unitSales
aggregated by the dimension attribute Customer.city
.
PSL definitions have the following structure:
<definition> := <type> <name> <parameters>? <body>?<parameters> := '(' <param> (',' <param>)* ')'<param> := <paramName> '=' <paramValue><body> := '{' <definition>* '}'
Top level element defining the complete data model.
'schema' <name> <parameters>? <body>?
Name | Description |
dataSource | Default data source for this schema. Data sources are referred to by name inside the project containing this schema. |
Contains the following definitions:
Zero or more Schema Expose​
Zero or more Dimensions​
Zero or more Measures​
Zero or more Tables​
At most one Filter​
schema MySchema(dataSource = "pg_prod") {dimension Time}
The following entities can be imported/exposed: schemas, dimensions, measures and tables.
The Import
definition imports names inside the current schema but does not expose them in the schema interface. Imported definitions can be used, for example, in conforms or other places where a reference to an existing definition is necessary.
Expose
definition exposes the imported definitions in the schema interface.
Each imported name can have an alias. If an alias is not provided the entity will be imported with last name component in the qualified name (i.e. unqualified name).
'import' IDENTIFIER'expose' <entity_type> <qualifiedName> [ <alias> ]'expose' <field_type>s (<wildcard>|<scope>) with (<prefix>|<namespace>)​Where:<field_type> := 'dimension' | 'measure' | 'table'<entity_type> := 'schema' | <field_type><wildcard> := qualifiedName.*<scope> := qualifiedName.{<qualifiedName> (',' <qualifiedName>)*}<pefix> := prefix IDENTIFIER<namespace> := namespace <qualifiedName><qualifiedName> := IDENTIFIER ('.' IDENTIFIER)*<alias> := as IDENTIFIER
/* 1. Importing */import Sales // importing schemaimport Inventoryschema Foodmart {​dimension Store {conforms Sales.Store // using imported nameconforms Inventory.Store}​measure sales {conforms Sales.sales // using imported name}}
/* 2. Expose */import Salesschema Foodmart {expose schema Sales // exposing schemaexpose schema Inventory as Invexpose measure Sales.storeSalesexpose dimension Sales.Store_D as Storeexpose tables Sales.*expose measures Sales.{m1, m2}expose dimensions Sales.* with prefix Foo_expose dimensions Sales.* with namespace Cusomer.Publicexpose dimensions Sales.{d1, d2} with namespace Cusomer.Publicexpose dimensions Sales.{d1, d2} with prefix Foo_​/** Exposed names: Store, Inv, storeSales, d1, d2, m1, m2, all tables and dimensions from Sales schema, etc.,* are accessible directly in queries to the Foodmart schema or when importing/exporting Foodmart schema itself.*/​dimension ConformingStore {conforms Store // exported names can be used inside this schema as wellconforms InvStore}}
Defines a set of related attributes which can be queried. Attributes can be further grouped in a dimension by hierarchies and levels.
A dimension defines a namespace for the set of elements it contains. The dimension name is used to prefix attribute names when querying. For example:
dimension Customer {attribute name}
Provides a dimension attribute called Customer.name
that can be used for querying.
'dimension' <name> <parameters>? <body>?
Name | Description |
table | Default table for this dimension. |
Contains the following definitions:
At most one Metadata​
Zero or more Conforms​
Either:
One or more Hierarchies​
One or more Levels​
One or more Attributes​
/** 1. Two different hierarchies* First hierarchy here is the default.* Levels have the following qualified names:* Customer.ageGroup, Customer.age, Customer.customer,* Customer.Geo.country, Customer.Geo.city, Customer.Geo.customer*/dimension Customer {hierarchy {level ageGrouplevel agelevel customer}​hierarchy Geo {level countrylevel citylevel customer}}​/** 2. Plain dimension structure* In this case there are no hierarchical relationships inside the dimension* Attributes have the following qualified names:* Person.id, Person.firstName, Person.lastName*/dimension Person {attribute idattribute firstNameattribute lastName}
Defines data attribute that can be queried using Contiamo.
'attribute' <name> <parameters>? <body>?
Name | Description |
table | ​table for this attribute. |
column | Reference to the column for this attribute. Can be either in form of |
columns | Array of equivalent references to columns for this attribute. Each element in the array corresponds to the rules described for the |
expression | ​ |
castType | Specifies SQL type to which the output value for this level is converted. |
Contains the following definitions:
attribute Email(column = "user_email")
Defines an optionally named group of levels. The order of levels inside a hierarchy defines hierarchical relationships from general to specific. Levels defined before are more broad than levels defined after. Hierarchies are used to group levels and attributes for organizational purposes. They have no impact on query execution.
'hierarchy' <name>? <parameters>? <body>
Note that the name is optional. In such a case, this is the default hierarchy inside a dimension. Only one hierarchy inside a dimension is allowed to have an empty name.
Name | Description |
table | Default table for this hierarchy. |
Contains the following definitions:
hierarchy {level Countrylevel Regionlevel City}
Defines groups of attributes. Levels as well as attributes represent specific data values. Unlike dimensions and hierarchies that cannot be queried directly and provide models, levels and attributes can be used in Contiamo queries.
In terms of modeling, the main difference between levels and attributes is that level is used to model hierarchies, while attribute does not participate in hierarchical relationships.
'level' <name> <parameters>? <body>?
Name | Description |
table | ​Table for this level. |
column | Reference to the column for this level. Can be either in form of |
columns | Array of equivalent references to columns for this level. Each element in the array corresponds to the rules described for the |
expression | ​ |
castType | Specifies SQL type to which the output value for this level is converted. |
Contains the following definitions:
At most one Metadata​
Zero or more Conforms​
Zero or more Attributes​
level Country(columns = ["country_id", "stats.country_id"]) {attribute nameattribute isoCountryCode}
Measure is an aggregatable piece of data.
Measures can be defined in 3 different flavors:
aggregated measure - measure that optionally applies filtering to dimensions and then aggregates underlying columns directly
filtered measure - measure that references other measures and applies filtering on dimensions afterwards
calculated measure - measure that implements some arithmetic expression on top of other measures
'measure' <name> <parameters>? <body>?
Name | Description |
aggregate | Aggregate function to use for this measure. Supported functions are: 'sum', 'avg', 'count', 'distinctCount', 'approxDistinctCount', 'max', 'min'. By default set to 'sum'. |
column | Reference to the column for this measure in form of |
columns | Array of equivalent references to columns for this measure. Each element in the array corresponds to the rules described for |
measure | Reference to other measure to base this value on. Used in filtered measures. |
filter | Filter to be applied when calculating this measure. |
calculation | Arithmetic expression on other measures. Used in calculated measures. |
Use the following rules to determine measure type:
If neither measure
nor calculation
are defined = aggregated measure. Aggregation function determined either from aggregate
parameter or "sum" by default.
If measure
is defined = filtered measure. In this case neither calculation
nor conforms
should be defined.
If calculation
= calculated measure. In this case aggregate
, measure
, filter
and conforms
are prohibited.
Can contain the following definitions:
// aggregated measuremeasure unitSales(aggregate="sum", column = "stats.unit_sales")​// filtered measuremeasure unitSalesx(measure = "unitSales", filter = "Customer.cars < 3")​// aggregated measure with filtermeasure numChildren(filter = "Customer.cars < 3")​// calculated measuremeasure calc (calculation = "storeSales + 100000 / (storeCost + storeSales) + 1")
A schema table can either be a direct mapping to a physical table, or a transformation combining multiple physical tables (which can reside in different data sources). Schema tables can be used in SQL queries and referred to in dimension and measure definitions.
By default the schema table name is the same as the physical table name. To combine multiple physical tables, or perform other transformations at query time, a standard SQL query can be defined with the sql
parameter. Table names in that query must be namespaced by the name of the data source.
'table' <name> <parameters>? <body>?
Name | Description |
dataSource | The name of the data source for this table. Can reference data sources from this schema's project. |
physicalTable | The name of underlying physical table for this schema table definition. Schema table name is used by default if this parameter is not provided. |
sql | Defines a SQL statement. Cannot be used with |
Can contain the following definitions:
Zero or more Columns​
// 1. Table with body. Physical table name is also "sales_fact_1998"table sales_fact_1998 {column customer_id (tableRef = "customer.customer_id")column unit_sales}​// 2. Using SQL to join sales to customer profiles which reside in different data sourcestable sales_with_time(sql = """select * from "foodmart"."sales_fact_1998" sf8 join "customer_warehouse"."customer_profile" profile on (sf8.customer_id = profile.cust_id)""") {columns *}
Column defines an elementary piece of data inside a schema table. This definition is also used to link the physical data structure with logical elements such as dimensions and measures, as well as to define relations between schema tables.
'column' <name> <parameters>?
Name | Description |
expression | SQL expression for this column. Expressions must be written in standard SQL. |
tableRef | Links to other schema columns. Schema columns shall be addressed using |
tableRefs | Array of links to other schema columns. Each element of this array contains a definition conforming to the rules for the |
table sales_fact_1998 {​// Provides link to another table/column which can be used to join target table when necessarycolumn customer_id (tableRef = "customer.customer_id")​// Column references multiple tablescolumn stat_id (tableRefs = ["inventoryStats.stat_id", "salesStats.stat_id"])​column unit_sales}​table calendar {// An example of using expressioncolumn day_of_week(expression = """extract(day from "the_date")""")}
This is a shorthand for being able to easily expose all columns in the physical table.
'columns' *
N/A
table sales_fact_1998 {// make all columns in the physical table availablecolumns *}​table customer {columns *​// overrides the definion of "customer_id" provided by "columns *"column customer_id (tableRef = "customer.customer_id")​// adds an additional column based on an expressioncolumn name(expression="fname || lname")}
Filter expressions have their own SQL-like syntax. A schema can have at most one filter.
'filter' '"'<expression> (<boolOp> <expression>)*'"'<expression> := <dimAttr> <operator> <value><boolOp> := 'and' | 'or'
Name | Description |
dimAttr | refers to a dimension attribute to filter by |
operator | one of: |
value | A dimension value (or values) to match |
filter "Date.Date > '2017-01-01'"​filter """Store.region.city in ('Tacoma', 'Seattle') and Store.store.type = 'Supermarket'"""
Used to identify that multiple measures, dimensions, levels, or attributes are the same (conforming) across tables and data sources. The conforms
relation is mono-directional, i.e., if some element defined as conforming to another element (conforming element) is queried, the result returned can only be found by the means of the element to which it conforms (conform element). To rephrase, when querying for the conforming element, the result returned is retrieved via the conform element.
'conforms' <qualifiedName>
import Sales.Storeimport Inventory.{Store => InvStore}import Sales.salesimport Inventory.{warehouseSales => invSales}​schema Foodmart {dimension ConformingStore {// defines conformance to 2 dimensions from other schemas// ConformingStore dimension will have intersection of hierarchies of dimension it conforms toconforms Storeconforms InvStore}​measure totalSales {// defines conformance to measure from other schemaconforms sales}​measure warehouseSales {// defines conformance to measure from other schemaconforms invSales}}
schema Foodmart {import {GoogleAnalytics => ga, FacebookStats => fb}​// In this case we explicitly define a set of attributes for a dimension.// If any defined attribute exists in one of the conformed dimensions, this will define conformance of attributes.dimension Date {conforms ga.Timeconforms fb.Time​attribute yearattribute dayattribute weekdayattribute date}}
Defines metadata for different types of elements. Metadata is not directly processed by Contiamo but stored for reference and for use in other tools. Metadata can be introspected to extract additional configuration or knowledge about the schema.
'metadata' <parameters>​Where:<parameters> := '(' <param> (',' <param>)* ')'<param> := <name> '=' (<simpleValue> | <arrayValue>)<arrayValue> := '[' <simpleValue> (',' <simpleValue>)* ']'<simpleValue> := BOOL | NUMBER | STRING
Parameter names here are arbitrary. They can be interpreted by third-party tools.
schema Foodmart {dimension D {metadata(d1 = ["d1", "dd1"], d2 = "d2")attribute A {metadata(a1 = ["a1", "aa1"], a2 = "a2")}}measure M {metadata(m1 = ["m1", "mm1"], m2 = "m2")}}