Delete duplicates based on string length criteria

2023-03-01 12:50 问答作者：

Background

Remove duplicate city names from a temporary table, based on the length of the name.

Problem

The following query returns 350,000 rows:

select
  tc.id,
  tc.name_lowercase,
  tc.population,
  tc.latitude_decimal,
  tc.longitude_decimal
from
  climate.temp_city tc
inner join (
  select
    tc2.latitude_decimal,
    tc2.longitude_decimal
  from
    climate.temp_city tc2
  group by
    tc2.latitude_decimal,
    tc2.longitude_decimal
  having
    count(*) > 3
) s on 
  tc.latitude_decimal = s.latitude_decimal and
  tc.longitude_decimal = s.longitude_decimal

Sample data:

940308;"sara"            ;;-53.4333333;-68.1833333
935665;"estancia la sara";;-53.4333333;-68.1833333
935697;"estancia sara"   ;;-53.4333333;-68.1833333
937204;"la sara"         ;;-53.4333333;-68.1833333
940350;"seccion gap"     ;;-52.1666667;-68.5666667
941448;"zanja pique"     ;;-52.1666667;-68.5666667
935941;"gap"             ;;-52.1666667;-68.5666667
935648;"estancia gap"    ;;-52.1666667;-68.5666667
939635;"ritchie"         ;;-51.9833333;-70.4
934948;"d.e. ritchie"    ;;-51.9833333;-70.4
934992;"diego richtie"   ;;-51.9833333;-70.4
934993;"diego ritchie"   ;;-51.9833333;-70.4
934990;"diego e. ritchie";;-51.9833333;-70.4

I would like to remove all duplicates, retaining the rows where:

population is not null; and
the name is the longest of the duplicates (max(tc.name_lowercase)); and
if neither of these conditions are met, retain max(tc.id).

From the given set of data, the remaining rows would be:

935665;"estancia la sara";;-53.4333333;-68.1833333
935648;"estancia gap"    ;;-52.1666667;-68.5666667
934990;"diego e. ritchie";;-51.9833333;-70.4

Question

How would y开发者_C百科ou select just the rows with duplicate lat/long values that meet the problem criteria?

Thank you!

I think you're looking for something like this:

SELECT t.id, t.name_lowercase, t.latitude_decimal, t.longitude_decimal
FROM (SELECT MAX(LENGTH(name_lowercase)) AS len, latitude_decimal, longitude_decimal FROM temp_city GROUP BY latitude_decimal, lng) AS max_length,
     temp_city t
WHERE max_length.latitude_decimal  = t.latitude_decimal
  AND max_length.longitude_decimal = t.longitude_decimal
  AND max_length.len = LENGTH(t.name_lowercase);

Where temp_city is the table that contains your sample results.

The above will run into problems if your temp_city also contains this row:

1 | xxxancia la sara | -53.4333333 | -68.1833333

You didn't offer a way to choose which row from amongst those whose name has the maximum length so both of these will be returned:

      1 | xxxancia la sara | -53.4333333 | -68.1833333
 935665 | estancia la sara | -53.4333333 | -68.1833333

UPDATE: If max(tc.id) is the extra criteria from trimming down the above duplicates, then you can wrap another layer on:

SELECT t.id, t.name_lowercase, t.latitude_decimal, t.longitude_decimal
FROM  
  (
    SELECT MAX(t.id) AS id
    FROM
      (
        SELECT MAX(LENGTH(name_lowercase)) AS len, latitude_decimal, longitude_decimal
        FROM temp_city
        GROUP BY latitude_decimal, longitude_decimal
      ) AS max_length,
      temp_city t
    WHERE max_length.latitude_decimal  = t.latitude_decimal
      AND max_length.longitude_decimal = t.longitude_decimal
      AND max_length.len               = LENGTH(t.name_lowercase)
    GROUP BY t.latitude_decimal, t.longitude_decimal, LENGTH(t.name_lowercase)
  ) AS tt, 
  temp_city t
WHERE t.id = tt.id

继续阅读：duplicates postgresql sql

Delete duplicates based on string length criteria

Background

Problem

Question

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Background

Problem

Question

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？