SQL-problem from dataford
2024
URL :https://dataford.io/sql/Population-Density-Analysis

WITH DensityData AS (
SELECT
c.city,
c.population,
c.area,
ROUND(c.population / c.area) AS density
FROM
cities_population c
WHERE
c.area > 0 AND c.population > 0
)
SELECT
city,
population,
area,
density,
MAX(density) OVER () AS max_density,
MIN(density) OVER () AS min_density
FROM
DensityData
ORDER BY
density DESC;
To get the maximum and minimum densities globally and sort cities by their density, you can use a subquery or a CTE as previously suggested. Here’s a step-by-step correction:
- Calculate Density: Calculate the density directly.
- Filter Invalid Data: Ensure that
area
andpopulation
are non-zero. - Order by Density: Sort cities by density.
approach two:
SELECT
city,
population,
area,
ROUND(population / area) AS density,
MAX(ROUND(population / area)) OVER () AS max_density,
MIN(ROUND(population / area)) OVER () AS min_density
FROM
cities_population
WHERE
area > 0 AND population > 0
ORDER BY
density DESC;
explanation
Population Density Calculation: The ROUND(population / area)
computes the population density and rounds it to the nearest integer.Max and Min Density:
MAX(ROUND(population / area)) OVER ()
finds the global maximum density across all cities.MIN(ROUND(population / area)) OVER ()
finds the global minimum density across all cities.
Non-Zero Areas and Population: The WHERE
clause ensures that rows with zero area or population (which would cause division by zero or meaningless results) are excluded.Ordering: ORDER BY density DESC
sorts the cities from the highest population density to the lowest, ensuring cities with the same density are grouped together.
Conclusion:
- Query 1 (with CTE) is generally better in terms of readability, maintainability, and potentially performance (due to avoiding redundant calculations).
- Query 2 might be slightly simpler in cases where performance and redundancy aren’t concerns (e.g., small datasets or very simple computations), but it’s less optimal for larger, more complex datasets.
For a larger dataset or a more complex calculation, Query 1 is the preferred approach.