CROSS JOIN and UNION ALL
2024
Let’s say you have the following table:
user_id | start_date | end_date |
---|---|---|
1 | 2023-01-01 | 2023-01-10 |
2 | 2023-01-05 | 2023-01-15 |
3 | 2023-02-01 | 2023-02-10 |
Why CROSS JOIN?
- Purpose of the CROSS JOIN:
- The CROSS JOIN generates all possible pairs of rows from the
Subscription
table. - By pairing each row
u1
with every other rowu2
, it enables comparing subscription ranges between different users (u1.user_id != u2.user_id
).
- The CROSS JOIN generates all possible pairs of rows from the
- Condition Application:
- After generating all combinations, the
WHERE
clause filters the results to include only those where:- The users are different (
u1.user_id != u2.user_id
). - The subscription ranges overlap:sqlCopy code
u2.start_date BETWEEN u1.start_date AND u1.end_date OR u2.end_date BETWEEN u1.start_date AND u1.end_date
- The users are different (
- After generating all combinations, the
- Use Case for CROSS JOIN:
- It’s ideal for scenarios where you need pairwise comparisons between rows in the same table.
- Example: Checking for overlapping date ranges, finding duplicate rows, or performing similarity comparisons.
Why Not UNION ALL?
- Purpose of UNION ALL:
- Combines the results of multiple
SELECT
queries into a single result set. - Does not compare rows against each other—it simply stacks results from different queries.
- Combines the results of multiple
- Why UNION ALL Doesn’t Work Here:
- To detect overlaps, you need to compare subscriptions against each other (row by row). A
UNION ALL
cannot perform such comparisons because it doesn’t create pairs of rows. - Example
- To detect overlaps, you need to compare subscriptions against each other (row by row). A
SELECT user_id, start_date, end_date FROM Subscription
UNION ALL
SELECT user_id, start_date, end_date FROM Subscription;
- This just duplicates the table without any comparisons.
Key Limitation:
UNION ALL
is not designed for pairwise comparisons or filtering based on relationships between rows.
Alternative to CROSS JOIN (More Efficient Approach):
If you want to avoid the computational cost of a CROSS JOIN (it generates a large number of row pairs), you can use a self-join:
SELECT DISTINCT u1.user_id, 1 AS is_overlap
FROM Subscription u1
JOIN Subscription u2
ON u1.user_id != u2.user_id
AND (u2.start_date BETWEEN u1.start_date AND u1.end_date
OR u2.end_date BETWEEN u1.start_date AND u1.end_date);
This performs the same logic as the CROSS JOIN
but limits the number of row combinations by applying the join condition during the pairing step.
What CROSS JOIN Does:
A CROSS JOIN
generates all possible pairs of rows for comparison:
SELECT u1.user_id, u2.user_id
FROM Subscription u1
CROSS JOIN Subscription u2;
Resulting pairs:
u1.user_id | u2.user_id |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
2 | 1 |
2 | 2 |
2 | 3 |
3 | 1 |
3 | 2 |
3 | 3 |
You can then apply conditions to filter meaningful comparisons:
WHERE u1.user_id != u2.user_id
AND (u2.start_date BETWEEN u1.start_date AND u1.end_date OR
u2.end_date BETWEEN u1.start_date AND u1.end_date)
UNION ALL:
SELECT user_id, start_date, end_date FROM Subscription
UNION ALL
SELECT user_id, start_date, end_date FROM Subscription;
The result simply duplicates the rows (or combines rows from multiple queries). Adding a WHERE
clause like:
WHERE start_date BETWEEN ... OR end_date BETWEEN ...
would filter rows independently, but it wouldn’t allow you to compare one row against another because UNION
doesn’t pair rows.
Example to Clarify:
Task: Find Overlapping Subscriptions Between Users
user_id | start_date | end_date |
---|---|---|
1 | 2023-01-01 | 2023-01-10 |
2 | 2023-01-05 | 2023-01-15 |
3 | 2023-02-01 | 2023-02-10 |
Using CROSS JOIN:
SELECT u1.user_id AS user1, u2.user_id AS user2
FROM Subscription u1
CROSS JOIN Subscription u2
WHERE u1.user_id != u2.user_id
AND (u2.start_date BETWEEN u1.start_date AND u1.end_date
OR u2.end_date BETWEEN u1.start_date AND u1.end_date);
Result:
user1 | user2 |
---|---|
1 | 2 |
2 | 1 |
Using UNION:
If you try to use UNION ALL
:
SELECT user_id, start_date, end_date
FROM Subscription
UNION ALL
SELECT user_id, start_date, end_date
WHERE start_date BETWEEN ... OR end_date BETWEEN ...;
It simply duplicates rows. There’s no pairing or comparison between rows because UNION ALL
just stacks rows from the two queries.
1. What is UNION ALL
?
- Purpose: Combines the result sets of two or more queries into a single result.
- Key Feature: Keeps all rows from all queries, including duplicates (unlike
UNION
, which removes duplicates).
Example:
Table A
:
id | name |
---|---|
1 | Alice |
2 | Bob |
Table B
:
id | name |
---|---|
2 | Bob |
3 | Charlie |
Query:
SELECT * FROM A
UNION ALL
SELECT * FROM B;
Result:
id | name |
---|---|
1 | Alice |
2 | Bob |
2 | Bob |
3 | Charlie |
- Combines the rows from both tables without comparing or filtering relationships.