Explicit Join vs. Implicit Join
When writing SQL there are two places you can define how tables are joined. The first is explicit, and is specified after the the keyword ON in the join clause like this:
FROM table_a as a
JOIN table_b as b ON a.id=b.id
The same join can also be written this way with implicit join syntax:
FROM table_a as a, table_b as b
Both of these work perfectly fine. Generally the implicit syntax is considered older while the explicit is considered newer. Personally, I vastly prefer the explicit syntax. The reason is simply code maintainablity. SQL often survives many application updates and is reused in many ways in many places. The where clause will often change over time, and be changed by different people at different times. More than once in my career, I’ve seen one of the changes accidentally remove the where clause, leading to significant performance problems and incorrect results.
Last week, someone was troubleshooting a very real performance problem and came to me with information from a monitoring tool, Dynatrace. The information showed that within a narrow time window being examined, a query expected to run in 3 seconds or less instead ran in 50 seconds. Considering the issue being worked on had to do with a web request timing out, this could be relevant. The monitoring tool presented the query like this:
select distinct USERS.USERS_ID,USERREG.LOGONID
where USERS.REGISTERTYPE in (?, ?, ..., ?)
and USERS.PROFILETYPE = ?
and (LCASE(USERREG.LOGONID) like ? escape '\'
I ran an explain on this query and found it was indeed exceptionally expensive, and in fact 50 seconds seemed rather speedy given that it was clearly creating a Cartesian Product:
Upon further inspection, we found the following representation of execution and response times for this query:
The green arrow points to the problem time identified – when this query ran for 50 seconds. The purple line indicates median execution duration. All other execution times for this query were much more reasonable. This graph alone ruled out our issue, as we were trying to help a vendor identify why a patch had started causing a particular function to time out on the website, and this query was occurring both before and after the patch, and generally was returning in a reasonable amount of time. However, I was still worried as not only would an abundance of these at once cause a potential performance problem, but it also wouldn’t return meaningful data since it would pair each logonid with each user_id, which is not meaningful.
After I reached out to our developers to discuss, someone realized that the analysis tool was TRUNCATING the query for some reason. After some research, it looks like the real query was:
SELECT DISTINCT USERS.USERS_ID, USERREG.LOGONID
FROM USERS, USERREG
IN (?, ?, ?) AND USERS.PROFILETYPE = ? AND (LCASE(USERREG.LOGONID) LIKE ? escape '\' ) AND (USERREG.USERS_ID = USERS.USERS_ID)
ORDER BY USERREG.LOGONID
Because the join condition was placed at the END of the WHERE clause, the part that got truncated was the where clause, leading the query to look much worse than it was. The actual explain plan looked like this:
While probably not the ideal, this is at least acceptable for a query that runs sporadically and less than once per minute.
First, there’s a reason why explicit join syntax is preferred. I would not have missed the tool truncating the query if the join condition were explicit.
Second, always check the data from a reporting tool to verify it means what you think it means. I would have spent less time on this if I had understood we were talking about a single execution and if I can correctly realized that we were not seeing a real-world cart.
Originally published on DataGeek Blog.