SQL questions
2024
- Finding data duplication
select order_amount, date, user_id, count(*) as c
from order
group by order_amount, date, user_id
having c > 1
2. To delete duplicates by keeping first record
WITH cte AS (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY user_id, amount, date ORDER BY date) AS row_number
FROM orders
)
DELETE FROM orders
WHERE id IN (
SELECT id
FROM cte
WHERE row_number > 1
);
3. List all employees whose salaries are above their manager’s salary
employee (emp_id, salary, dept_id, manager_id) , manager also could be an employee homself
-- Step 1: Get employees' details and join with their manager's salary
WITH employees_cte AS (
SELECT e.emp_id, e.salary AS employee_salary, e.manager_id
FROM employee e
)
-- Step 2: Join employees with their managers
SELECT cte.emp_id, cte.employee_salary, manager.salary AS manager_salary
FROM employees_cte cte
JOIN employee manager ON cte.manager_id = manager.emp_id
WHERE cte.employee_salary > manager.salary;
Explanation:
- CTE (
employees_cte
): We select the employee’semp_id
, theirsalary
(aliased asemployee_salary
), and theirmanager_id
to use in the main query. - Join: We join the
employees_cte
CTE with theemployee
table again, treating the second instance asmanager
. The join is done oncte.manager_id = manager.emp_id
to link the employee’s manager. - Where Clause: The
WHERE
clause filters the results where the employee’s salary is greater than their manager’s salary.
This query will return all employees whose salaries are higher than their respective managers’.
Cant we do like below ?
SELECT cte.emp_id, cte.employee_salary, manager.salary AS manager_salary
FROM employees_cte cte
JOIN employee manager ON cte.manager_id = manager.emp_id
having cte.employee_salary > manager.salary;
The query you are proposing to use the HAVING
clause instead of WHERE
is not quite appropriate for this use case. Here’s why:
Difference Between HAVING
and WHERE
:
WHERE
: It filters rows before aggregation (i.e., on raw data).HAVING
: It filters rows after aggregation, meaning it is used when you are applying aggregate functions likeSUM()
,COUNT()
,AVG()
, etc.
In your case, you’re simply comparing the employee’s salary with the manager’s salary, which doesn’t involve any aggregate function. Therefore, the correct approach is to use the WHERE
clause instead of HAVING
4. Top 5 Employees Per Department vs Top 5 Employees in the entire organization
SELECT emp_name, emp_id, MAX(salary) AS max_salary
FROM employee
WHERE year = 2023
GROUP BY department_id, emp_id, emp_name
ORDER BY max_salary DESC
LIMIT 5;
Breakdown:
- Selection: You’re selecting
emp_name
,emp_id
, and the maximum salary (MAX(salary)
) for each unique employee in the dataset. - GROUP BY:
- You group by
department_id
,emp_id
, andemp_name
, which means the result will include each employee’s maximum salary individually, effectively treating each employee as a separate group.
- You group by
- Result:
- You get a list of employees with their maximum salary, but since each employee is in their own group, you don’t get the maximum salary per department.
- ORDER BY and LIMIT:
- You order the results by
max_salary
in descending order and then limit the output to the top 5 employees overall, not per department.
- You order the results by
Top 5 Employees Per Department
WITH cte AS (
SELECT salary, emp_id, department_id,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
FROM employee
WHERE year = 2003
)
SELECT emp_id, salary
FROM cte
WHERE rn < 6;
Explanation:
- CTE (Common Table Expression):
- The CTE calculates the row number for each employee within their department (
PARTITION BY department_id
), ordering by their salary in descending order. - The filter
WHERE year = 2003
ensures only the data from 2003 is considered when generating row numbers.
- The CTE calculates the row number for each employee within their department (
- ROW_NUMBER():
- This assigns a row number (
rn
) to each employee within their department, starting from 1 for the employee with the highest salary.
- This assigns a row number (
- Outer Query:
- The outer query retrieves the
emp_id
andsalary
for all employees with a row number (rn
) less than 6, meaning it selects the top 5 employees by salary per department.
- The outer query retrieves the
This query will give you the top 5 employees by salary in each department for the year 2003.
5. Calculate the average sales for each department in 2023. What SQL query will help you compute and display this average? Employees with Sales Below Target
WITH cte AS (
SELECT emp_id, sales, dept_id,
AVG(sales) OVER (PARTITION BY dept_id) AS sales_average
FROM employee
WHERE year = 2023
)
SELECT emp_id, sales
FROM cte
WHERE sales < sales_average;
6. List employees whose sales in 2023 were below their target. What SQL query would you use to find these employees?
If the target is not explicitly available as a field but is instead the average of all sales for 2023, then we need to calculate the average sales for the year and compare each employee’s sales to that average.
WITH avg_sales_cte AS (
SELECT AVG(sales) AS avg_sales
FROM employee
WHERE year = 2023
)
SELECT emp_id, emp_name, sales
FROM employee, avg_sales_cte
WHERE year = 2023
AND sales < avg_sales_cte.avg_sales;
7.To compare sales across multiple years, you would typically want to analyze the sales data for different years for each employee
a) comparing multiple years for each employees
select sum(CASE WHEN year =2022 THEN sales ELSE 0 END) as sales_2022 ,
sum(CASE WHEN year =2023 THEN sales ELSE 0 END) as sales_2023,
emp_id
from Sales
group by emp_id
b) Comparing sales growth perecetage
select sum(CASE WHEN year = 2022 THEN sales ELSE 0 END) as sales_2022,
sum(CASE WHEN year = 2023 thEN sales ELSE 0 END) as sales_2023,
(sales_2023- sales_2022)/ NULLIF(sales_2022, 0) ) * 100