Immuta Best Practices (SQL)
Audience: Data Users
Content Summary: This page outlines some best practices for maximum performance from the Immuta Query Engine, which is an abstraction of your underlying database tables. This virtual layer is powerful as a single point to access all data across your organization and apply data policies consistently.
Overview
Immuta has the ability to push down many different queries, operations, and functions depending on the native database. Below are common best practices to consider when using the Immuta Query Engine in order to take advantage of the pushdown optimization and to avoid problematic queries that add latency to the response.
- Limit result sets.
- Push work down to the native database.
- Use unions and window functions sparingly.
- Avoid joining across servers or technologies.
- Keep data types consistent.
Limit result sets
Unless there’s a good reason to pull back a large result set, try to limit the response. The simplest way to
accomplish this is through a WHERE
clause that limits the response size or a LIMIT
. It is best to use WHERE
clauses
when possible, since whether or not a LIMIT
is pushed down depends on the query structure. Many business
intelligence tools append LIMIT
statements by default to avoid this issue. Treat Immuta similarly.
Push work down to the native database
You are limited to PostgreSQL syntax when querying through the Immuta Query Engine. For a function to be pushed down, an equivalent function must exist in the native database. Immuta maintains a "Rosetta Stone" of database functions that map PostgreSQL syntax to their analogs in all of the data sources that Immuta supports.
If there is specific database syntax that is not pushed down or does not exist in PostgreSQL, you can create an Immuta data source using the SQL statement option (rather than the SQL table option) and include the unsupported native syntax as the SQL statement for the data source. When you query the data source that was created with a SQL statement you will still query the Immuta layer with PostgreSQL syntax, but the underlying native syntax will also be executed as a subquery on the native database (similar to a view), effectively pushing the syntax down.
Use unions and window functions sparingly
Statements using UNION
and most window functions are not currently pushed down, so they will run in the Immuta layer
if you are not aggregating or setting a LIMIT
on the results.
To aggregate or LIMIT
results of the UNION
, the data must
flow back to Immuta. One solution to this problem is to create a view that represents the UNION
, expose that
view in Immuta, and query that view.
Avoid joining across servers or technologies
Immuta will push down joins between data sources that share the same source database as long as the data sources are created with the same connection details, which include hostname, port, database name, username, and schema.
If a JOIN
statement contains data sources that use Kerberos authentication, then the same Immuta user must have been
used to create each data source. If an Immuta user has the IMPERSONATE_HDFS_USER
permission, they may set their
principal to match any data sources that they want to be pushed down.
Simply put, at the time of data source creation,
all data source details must be the same in order to be pushed down.
Keep data types consistent
Casting, especially when done using CASE
statements, can be an expensive operation.
Therefore, it's best to have a schema set in the native database, and then allow Immuta to replicate that schema.
That is, if the source data are TEXT
, make sure that Immuta data type is TEXT
.
Inconsistencies, especially for DATE
data types, can cause performance issues with tools that
automatically generate queries.