SQL questions

Finding data duplication

select order_amount, date, user_id, count(*) as c
from order
group by order_amount, date, user_id
having c > 1

2. To delete duplicates by keeping first record

WITH cte AS (
    SELECT id, 
           ROW_NUMBER() OVER (PARTITION BY user_id, amount, date ORDER BY date) AS row_number
    FROM orders
)
DELETE FROM orders
WHERE id IN (
    SELECT id 
    FROM cte 
    WHERE row_number > 1
);

3. List all employees whose salaries are above their manager’s salary

employee (emp_id, salary, dept_id, manager_id) , manager also could be an employee homself

-- Step 1: Get employees' details and join with their manager's salary
WITH employees_cte AS (
    SELECT e.emp_id, e.salary AS employee_salary, e.manager_id
    FROM employee e
)

-- Step 2: Join employees with their managers
SELECT cte.emp_id, cte.employee_salary, manager.salary AS manager_salary
FROM employees_cte cte
JOIN employee manager ON cte.manager_id = manager.emp_id
WHERE cte.employee_salary > manager.salary;

Explanation:

CTE (employees_cte): We select the employee’s emp_id, their salary (aliased as employee_salary), and their manager_id to use in the main query.
Join: We join the employees_cte CTE with the employee table again, treating the second instance as manager. The join is done on cte.manager_id = manager.emp_id to link the employee’s manager.
Where Clause: The WHERE clause filters the results where the employee’s salary is greater than their manager’s salary.

This query will return all employees whose salaries are higher than their respective managers’.

Cant we do like below ?

SELECT cte.emp_id, cte.employee_salary, manager.salary AS manager_salary
FROM employees_cte cte
JOIN employee manager ON cte.manager_id = manager.emp_id
having cte.employee_salary > manager.salary;

The query you are proposing to use the HAVING clause instead of WHERE is not quite appropriate for this use case. Here’s why:

Difference Between `HAVING` and `WHERE`:

WHERE: It filters rows before aggregation (i.e., on raw data).
HAVING: It filters rows after aggregation, meaning it is used when you are applying aggregate functions like SUM(), COUNT(), AVG(), etc.

In your case, you’re simply comparing the employee’s salary with the manager’s salary, which doesn’t involve any aggregate function. Therefore, the correct approach is to use the WHERE clause instead of HAVING

4. Top 5 Employees Per Department vs Top 5 Employees in the entire organization

SELECT emp_name, emp_id, MAX(salary) AS max_salary
FROM employee
WHERE year = 2023
GROUP BY department_id, emp_id, emp_name
ORDER BY max_salary DESC
LIMIT 5;

Breakdown:

Selection: You’re selecting emp_name, emp_id, and the maximum salary (MAX(salary)) for each unique employee in the dataset.
GROUP BY:
- You group by department_id, emp_id, and emp_name, which means the result will include each employee’s maximum salary individually, effectively treating each employee as a separate group.
Result:
- You get a list of employees with their maximum salary, but since each employee is in their own group, you don’t get the maximum salary per department.
ORDER BY and LIMIT:
- You order the results by max_salary in descending order and then limit the output to the top 5 employees overall, not per department.

Top 5 Employees Per Department

WITH cte AS (
    SELECT salary, emp_id, department_id, 
           ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
    FROM employee
    WHERE year = 2003
)
SELECT emp_id, salary
FROM cte
WHERE rn < 6;

Explanation:

CTE (Common Table Expression):
- The CTE calculates the row number for each employee within their department (PARTITION BY department_id), ordering by their salary in descending order.
- The filter WHERE year = 2003 ensures only the data from 2003 is considered when generating row numbers.
ROW_NUMBER():
- This assigns a row number (rn) to each employee within their department, starting from 1 for the employee with the highest salary.
Outer Query:
- The outer query retrieves the emp_id and salary for all employees with a row number (rn) less than 6, meaning it selects the top 5 employees by salary per department.

This query will give you the top 5 employees by salary in each department for the year 2003.

5. Calculate the average sales for each department in 2023. What SQL query will help you compute and display this average? Employees with Sales Below Target

WITH cte AS (
    SELECT emp_id, sales, dept_id, 
           AVG(sales) OVER (PARTITION BY dept_id) AS sales_average
    FROM employee
    WHERE year = 2023
)
SELECT emp_id, sales
FROM cte
WHERE sales < sales_average;

6. List employees whose sales in 2023 were below their target. What SQL query would you use to find these employees?

If the target is not explicitly available as a field but is instead the average of all sales for 2023, then we need to calculate the average sales for the year and compare each employee’s sales to that average.

WITH avg_sales_cte AS (
    SELECT AVG(sales) AS avg_sales
    FROM employee
    WHERE year = 2023
)
SELECT emp_id, emp_name, sales
FROM employee, avg_sales_cte
WHERE year = 2023
AND sales < avg_sales_cte.avg_sales;

7.To compare sales across multiple years, you would typically want to analyze the sales data for different years for each employee

a) comparing multiple years for each employees

select sum(CASE WHEN year =2022 THEN sales ELSE 0 END) as sales_2022 ,
       sum(CASE WHEN year =2023 THEN sales ELSE 0 END) as sales_2023,
 emp_id
from Sales
group by emp_id

b) Comparing sales growth perecetage

select sum(CASE WHEN year = 2022 THEN sales ELSE 0 END) as sales_2022,
       sum(CASE WHEN year = 2023 thEN sales ELSE 0 END) as sales_2023,
       (sales_2023- sales_2022)/ NULLIF(sales_2022, 0) ) * 100

Data Engineer Labs

Explanation:

Difference Between `HAVING` and `WHERE`:

Breakdown:

Explanation:

Leave a Reply Cancel reply

Recent Posts

Recent Comments

Archives

Categories

Data Engineer Labs

SQL questions

Explanation:

Difference Between HAVING and WHERE:

Breakdown:

Explanation:

Leave a Reply Cancel reply

Recent Posts

Recent Comments

Archives

Categories

Difference Between `HAVING` and `WHERE`: