Skip to main content

SELECT Queries

Represents a single SELECT query. Functionality is the same as Trino SQL. Any caveats are listed explicitly

SELECT

  • SELECT * is not supported right now.
  • Aliases are mandatory in case of conflicts, and qualified references to columns.
    • They are also mandatory for function calls. SELECT SUM(a) AS b is a correct query, while SELECT SUM(a) is not.
    • If referring to a column from a joined table, the column will include the qualifier too based on the joined table name or alias. It is recommended to use an appropriate alias in this case as well. For example, SELECT a.name as name or SELECT table_1.user_id as user_id
  • DISTINCT and COUNT(*) will work as expected
    • While processing the distinct values of a column, all the NULLs are considered as one distinct value

FROM

  • Please use the table name.
  • Subqueries will work here. Aliases are mandatory for subqueries

WHERE

  • Conditions, including nested brackets, operator overloading and functions should work as expected
  • Subqueries should work here and aliases are NOT mandatory

GROUP BY

  • While grouping on a column, all the NULLs are put together in one bucket
  • GROUP BY with ordinal references to SELECT clauses are supported.

ORDER BY

  • It is mandatory to specify LIMIT or OFFSET Clause if ORDER BY is used inside a subquery or CTE

LIMIT

The LIMIT clause restricts the number of rows in the result set.

HAVING

  • Having works when a group by has been configured

JOIN

  • All join clauses are supported, including the different types of joins
  • Subqueries will work here but aliases are mandatory
  • When an alias is specified for a table name, then:
    • If the column name is unique, the column can either directly be referenced or the alias must be used as the qualifier, not the table name.
      • For example, the following query is accurate: table_1 contains columns a, b, c and table_2 contains columns a, b, d
        select t1.a as a, t2.b as b, t1.c as c, d
        from
        table_1 t1
        inner join
        table_2 t2
        on t1.a = t2.a
      • For example, the following query is NOT accurate: table*1 contains columns a, b, c and table_2 contains columns a, b, d
        select t1.a as a, t2.b as b, table_1.c as c, d - - \_table_1.c is not valid*
        from
        table_1 t1
        inner join
        table_2 t2
        on t1.a = t2.a
    • If a column is unique, it can be referenced directly or using the table name qualifier (for example, column d is unique but it is referenced as either d or t1.d). In the case the table name alias is used as a qualifier, an alias for the column is mandatory
    • If a column name is not unique, the table alias must be used as a qualifier, not the table name. In this case, an alias for the column is mandatory

Null Handlings

Similar to <=> operator in spark, Trino has IS DISTINCT FROM and IS NOT DISTINCT FROM operators. In SQL a NULL value signifies an unknown value, so any comparison involving a NULL will produce NULL . These two operators treat NULL as a known value and both guarantee either a true or false outcome even in the presence of NULL input

SELECT * FROM memory.test_schema.dummy_table; -- Table 1
->
id | name | created_at
----+---------+-------------------------
1 | A | 2024-11-01 10:00:00.000
2 | NULL | 2024-11-02 11:30:00.000
3 | C | NULL
4 | NULL | NULL
SELECT * FROM memory.test_schema.dummy_table_2; -- Table 2
->
id | name | created_at | status | score
----+---------+-------------------------+----------+-------
1 | A | 2024-11-01 10:00:00.000 | active | 85.5
2 | NULL | 2024-11-02 11:30:00.000 | inactive | NULL
3 | C | NULL | NULL | 70.0
4 | NULL | NULL | active | NULL
5 | E | 2024-11-04 09:00:00.000 | NULL | 90.0
IS NOT DISTINCT FROM VS =

Consider the following query which uses IS NOT DISTINCT FROM in the ON clause. There are null values in both name and created_at columns of both the tables.

SELECT a.id as a_id, b.id as b_id, a.name as a_name, b.name as b_name,
a.created_at as a_created_at, b.created_at as b_created_at, status, score
FROM memory.test_schema.dummy_table a
LEFT JOIN memory.test_schema.dummy_table_2 b
ON a.name IS NOT DISTINCT FROM b.name
AND a.created_at IS NOT DISTINCT FROM b.created_at;
->
a_id | b_id | a_name | b_name | a_created_at | b_created_at | status | score
------+------+---------+---------+-------------------------+-------------------------+----------+-------
1 | 1 | A | A | 2024-11-01 10:00:00.000 | 2024-11-01 10:00:00.000 | active | 85.5
2 | 2 | NULL | NULL | 2024-11-02 11:30:00.000 | 2024-11-02 11:30:00.000 | inactive | NULL
3 | 3 | C | C | NULL | NULL | NULL | 70.0
4 | 4 | NULL | NULL | NULL | NULL | active | NULL

This behaviour is compatible with spark’s <=> operator, which allows for NULL Safe comparisons. IS NOT DISTINCT FROM equates the NULLs from the left and the right tables while joining as opposed to = operator which when sees NULL in the joining column of the left table produces NULL as output in the columns of the right table.

SELECT a.id as a_id, b.id as b_id, a.name as a_name, b.name as b_name,
a.created_at as a_created_at, b.created_at as b_created_at, status, score
FROM memory.test_schema.dummy_table a
LEFT JOIN memory.test_schema.dummy_table_2 b
ON a.name = b.name AND a.created_at = b.created_at;
->
a_id | b_id | a_name | b_name | a_created_at | b_created_at | status | score
------+------+---------+--------+-------------------------+-------------------------+--------+-------
1 | 1 | A | A | 2024-11-01 10:00:00.000 | 2024-11-01 10:00:00.000 | active | 85.5
2 | NULL | NULL | NULL | 2024-11-02 11:30:00.000 | NULL | NULL | NULL
3 | NULL | C | NULL | NULL | NULL | NULL | NULL
4 | NULL | NULL | NULL | NULL | NULL | NULL | NULL

UNION

  • Two NULL values are considered equal

WINDOW

  • Window functions may have performance issues. They should work, but please ensure that you are testing the performance before committing to a window based approach.

OFFSET

The OFFSET clause is used to skip a specified number of rows before beginning to return rows from the query.

EXCEPT

  • Two NULL values are considered equal

INTERSECT

  • Two NULL values are considered equal

WITH

  • CTEs are supported.
  • Recursive CTEs are not supported.
  • WITH FUNCTION will not work

UNNEST

Only supports one column that can be exploded at a time

CASE

IN / NOT IN

IN expression can return a TRUEFALSE or NULL value.

Null Handlings

The NULL semantics in IN can be deduced from the NULL value handling in comparison operators= and logical operators OR. If the subquery in where clause only returns NULL value, no rows will be selected by the IN operator even if NULL value is present in the column in question.

SELECT * FROM memory.test_schema.dummy_table;
idnamecreated_atstatusscore
1A2024-11-01 10:00:00active85.5
2NULL2024-11-02 11:30:00inactiveNULL
3CNULLNULL70.0
4NULLNULLactiveNULL
5E2024-11-04 09:00:00NULL90.0

Consider the above table.

  1. The subquery has only NULL value in its result set. Therefore, the IN predicate produces no rows as the result even though name column has NULL values
SELECT * FROM memory.test_schema.dummy_table WHERE name in (SELECT NULL);
idnamecreated_at
  1. The subquery has NULL value in the result set as well as a valid value ‘A’. Rows with name = ‘A’ are returned.
SELECT * FROM memory.test_schema.dummy_table
WHERE name in (SELECT name from (VALUES ('A'), (NULL)) AS sub(name));
idnamecreated_at
1A2024-11-01 10:00:00

Compound Queries

Containing more than one SELECT queries.

SubQueries

  • SubQueries should work as expected
  • Aliases for subqueries are mandatory