SELECT Queries

Represents a single SELECT query. Functionality is the same as Trino SQL. Any caveats are listed explicitly

SELECT

SELECT * is not supported right now.
Aliases are mandatory in case of conflicts, and qualified references to columns.
- They are also mandatory for function calls. SELECT SUM(a) AS b is a correct query, while SELECT SUM(a) is not.
- If referring to a column from a joined table, the column will include the qualifier too based on the joined table name or alias. It is recommended to use an appropriate alias in this case as well. For example, SELECT a.name as name or SELECT table_1.user_id as user_id
DISTINCT and COUNT(*) will work as expected
- While processing the distinct values of a column, all the NULLs are considered as one distinct value

FROM

Please use the table name.
Subqueries will work here. Aliases are mandatory for subqueries

WHERE

Conditions, including nested brackets, operator overloading and functions should work as expected
Subqueries should work here and aliases are NOT mandatory

GROUP BY

While grouping on a column, all the NULLs are put together in one bucket
GROUP BY with ordinal references to SELECT clauses are supported.

ORDER BY

It is mandatory to specify LIMIT or OFFSET Clause if ORDER BY is used inside a subquery or CTE

LIMIT

The LIMIT clause restricts the number of rows in the result set.

HAVING

Having works when a group by has been configured

JOIN

All join clauses are supported, including the different types of joins
Subqueries will work here but aliases are mandatory
When an alias is specified for a table name, then:
- If the column name is unique, the column can either directly be referenced or the alias must be used as the qualifier, not the table name.
  - For example, the following query is accurate: table_1 contains columns a, b, c and table_2 contains columns a, b, d
    select t1.a as a, t2.b as b, t1.c as c, d from table_1 t1 inner join table_2 t2 on t1.a = t2.a
  - For example, the following query is NOT accurate: table*1 contains columns a, b, c and table_2 contains columns a, b, d
    select t1.a as a, t2.b as b, table_1.c as c, d - - \_table_1.c is not valid* from table_1 t1 inner join table_2 t2 on t1.a = t2.a
- If a column is unique, it can be referenced directly or using the table name qualifier (for example, column d is unique but it is referenced as either d or t1.d). In the case the table name alias is used as a qualifier, an alias for the column is mandatory
- If a column name is not unique, the table alias must be used as a qualifier, not the table name. In this case, an alias for the column is mandatory

Null Handlings

Similar to <=> operator in spark, Trino has IS DISTINCT FROM and IS NOT DISTINCT FROM operators. In SQL a NULL value signifies an unknown value, so any comparison involving a NULL will produce NULL . These two operators treat NULL as a known value and both guarantee either a true or false outcome even in the presence of NULL input

SELECT * FROM memory.test_schema.dummy_table; -- Table 1
->
  id |  name   |       created_at
----+---------+-------------------------
  1 | A       | 2024-11-01 10:00:00.000
  2 | NULL    | 2024-11-02 11:30:00.000
  3 | C       | NULL
  4 | NULL    | NULL

SELECT * FROM memory.test_schema.dummy_table_2; -- Table 2
->
  id |  name   |       created_at        |  status  | score
----+---------+-------------------------+----------+-------
  1 | A       | 2024-11-01 10:00:00.000 | active   |  85.5
  2 | NULL    | 2024-11-02 11:30:00.000 | inactive |  NULL
  3 | C       | NULL                    | NULL     |  70.0
  4 | NULL    | NULL                    | active   |  NULL
  5 | E       | 2024-11-04 09:00:00.000 | NULL     |  90.0

`IS NOT DISTINCT FROM` VS `=`

Consider the following query which uses IS NOT DISTINCT FROM in the ON clause. There are null values in both name and created_at columns of both the tables.

SELECT a.id as a_id, b.id as b_id, a.name as a_name, b.name as b_name,
a.created_at as a_created_at, b.created_at as b_created_at, status, score
FROM memory.test_schema.dummy_table a
LEFT JOIN memory.test_schema.dummy_table_2 b
ON a.name IS NOT DISTINCT FROM b.name
AND a.created_at IS NOT DISTINCT FROM b.created_at;
->
  a_id | b_id | a_name  | b_name  |      a_created_at       |      b_created_at       |  status  | score
------+------+---------+---------+-------------------------+-------------------------+----------+-------
    1 |    1 | A       | A       | 2024-11-01 10:00:00.000 | 2024-11-01 10:00:00.000 | active   |  85.5
    2 |    2 | NULL    | NULL    | 2024-11-02 11:30:00.000 | 2024-11-02 11:30:00.000 | inactive |  NULL
    3 |    3 | C       | C       | NULL                    | NULL                    | NULL     |  70.0
    4 |    4 | NULL    | NULL    | NULL                    | NULL                    | active   |  NULL

This behaviour is compatible with spark’s <=> operator, which allows for NULL Safe comparisons. IS NOT DISTINCT FROM equates the NULLs from the left and the right tables while joining as opposed to = operator which when sees NULL in the joining column of the left table produces NULL as output in the columns of the right table.

SELECT a.id as a_id, b.id as b_id, a.name as a_name, b.name as b_name,
a.created_at as a_created_at, b.created_at as b_created_at, status, score
FROM memory.test_schema.dummy_table a
LEFT JOIN memory.test_schema.dummy_table_2 b
ON a.name = b.name AND a.created_at = b.created_at;
->
  a_id | b_id | a_name  | b_name |      a_created_at       |      b_created_at       | status | score
------+------+---------+--------+-------------------------+-------------------------+--------+-------
    1 |    1 | A       | A      | 2024-11-01 10:00:00.000 | 2024-11-01 10:00:00.000 | active |  85.5
    2 | NULL | NULL    | NULL   | 2024-11-02 11:30:00.000 | NULL                    | NULL   |  NULL
    3 | NULL | C       | NULL   | NULL                    | NULL                    | NULL   |  NULL
    4 | NULL | NULL    | NULL   | NULL                    | NULL                    | NULL   |  NULL

UNION

Two NULL values are considered equal

WINDOW

Window functions may have performance issues. They should work, but please ensure that you are testing the performance before committing to a window based approach.

OFFSET

The OFFSET clause is used to skip a specified number of rows before beginning to return rows from the query.

EXCEPT

Two NULL values are considered equal

INTERSECT

Two NULL values are considered equal

WITH

CTEs are supported.
Recursive CTEs are not supported.
WITH FUNCTION will not work

UNNEST

Only supports one column that can be exploded at a time

CASE

IN / NOT IN

IN expression can return a TRUE, FALSE or NULL value.

Null Handlings

The NULL semantics in IN can be deduced from the NULL value handling in comparison operators= and logical operators OR. If the subquery in where clause only returns NULL value, no rows will be selected by the IN operator even if NULL value is present in the column in question.

SELECT * FROM memory.test_schema.dummy_table;

id	name	created_at	status	score
1	A	2024-11-01 10:00:00	active	85.5
2	NULL	2024-11-02 11:30:00	inactive	NULL
3	C	NULL	NULL	70.0
4	NULL	NULL	active	NULL
5	E	2024-11-04 09:00:00	NULL	90.0

Consider the above table.

The subquery has only NULL value in its result set. Therefore, the IN predicate produces no rows as the result even though name column has NULL values

SELECT * FROM memory.test_schema.dummy_table WHERE name in (SELECT NULL);

id	name	created_at

The subquery has NULL value in the result set as well as a valid value ‘A’. Rows with name = ‘A’ are returned.

SELECT * FROM memory.test_schema.dummy_table
WHERE name in (SELECT name from (VALUES ('A'), (NULL)) AS sub(name));

id	name	created_at
1	A	2024-11-01 10:00:00

Compound Queries

Containing more than one SELECT queries.

SubQueries

SubQueries should work as expected
Aliases for subqueries are mandatory

SELECT​

FROM​

WHERE​

GROUP BY​

ORDER BY​

LIMIT​

HAVING​

JOIN​

Null Handlings​

IS NOT DISTINCT FROM VS =​

UNION​

WINDOW​

OFFSET​

EXCEPT​

INTERSECT​

WITH​

UNNEST​

CASE​

IN / NOT IN​

Null Handlings​

Compound Queries​

SubQueries​

SELECT

FROM

WHERE

GROUP BY

ORDER BY

LIMIT

HAVING

JOIN

Null Handlings

`IS NOT DISTINCT FROM` VS `=`

UNION

WINDOW

OFFSET

EXCEPT

INTERSECT

WITH

UNNEST

CASE

IN / NOT IN

Null Handlings

Compound Queries

SubQueries