Addresses stored in SQL Server have many small variations (errors)

2023-02-03 03:19 Q&A author:

I have a table in my database which stores packing slips and their information. I'm trying to query that table and get each unique address. I've come close, but I still have many near misses and I'm looking for a way to exclude these near duplicates from my select.

Sample Data

CompanyCode   CompanyName                     Addr1                City         State   Zip
10033         UNITED DIE  CUTTING & FINISHIN  3610 HAMILTON AVE    CLEVELAND    Ohio    44114
10033         UNITED DIE CUTTING & FINISHING  3610 HAMILTON AVE    CLEVELAND    Ohio    44114
10033         UNITED DIE CUTTING & FINISHING  3610 HAMILTON AVE.   CLEVELAND    Ohio    44114
10033         UNITED DIE CUTTING & FINISHING  3610 HAMILTON AVENUE CLEVELAND    Ohio    44114
10033         UNITED DIECUTTING & FINISHING   3610 HAMILTON AVE    CLEVELAND    Ohio    44144
10033         UNITED FINISHING                3610 HAMILTON AVE    CLEVLAND     Ohio    44114
10033         UNITED FINISHING & DIE CUTTING  3610 HAMILTON AVE    CLEVELAND    Ohio    44114

And all I want is 1 record. Is there some way I can get the "average" record? Meaning, if most of the records say CLEVELAND instead of CLEVLAND, I want my 1 record to say CLEVELAND. Is there any way to pare this data down to what I'm looking for?

Desired Output

 CompanyCode   CompanyName                     Addr1                City         State   Zip
 10033         UNITED DIE CUTTING & FINISHING  3610 HAMILTON AVE    CLEVELAND    Ohio    44114

You are going to struggle. Personally I think I'd consider having a process that tries to update the data in the database and correct it.

You could change the system that inputs the data (or, if that's not possible, have an external process that runs over the data once it's in the db) so that it does something like the following:

Checks values against known lists of things like towns, states and countries to catch typos.
Checks for known regular mistakes and abbreviations, e.g. "Avenue"/"Ave." or "Street"/"St.", and normalises the values.
Validates at input time, and/or provides users with an address search/validation UI that lets them look up an address from some known values (zip/postal code etc). You can buy data like this from various suppliers depending on where you are in the world.
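
The first two checks could be sketched as a small external clean-up script. This is only an illustration, not a definitive implementation: the `KNOWN_CITIES` and `SUFFIXES` tables, the function names and the 0.8 similarity cutoff are all assumptions, and a real process would load purchased or curated reference data instead.

```python
import difflib

# Hypothetical reference data; a real process would load proper lists.
KNOWN_CITIES = ["CLEVELAND", "COLUMBUS", "CINCINNATI"]
SUFFIXES = {"AVENUE": "AVE", "AVE.": "AVE", "STREET": "ST", "ST.": "ST"}


def normalize_street(addr):
    """Upper-case, collapse whitespace, and map a known trailing suffix to one form."""
    words = addr.upper().split()
    if words:
        # Only the last word is treated as a suffix, so "ST. CLAIR AVE" keeps its "ST.".
        words[-1] = SUFFIXES.get(words[-1], words[-1])
    return " ".join(words)


def correct_city(city, cutoff=0.8):
    """Return the closest known city, or None so the value can be flagged for review."""
    candidate = city.upper().strip()
    matches = difflib.get_close_matches(candidate, KNOWN_CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(normalize_street("3610 HAMILTON AVENUE"))
print(normalize_street("3610 Hamilton  Ave."))
print(correct_city("CLEVLAND"))
print(correct_city("XYZ"))
```

Values for which `correct_city` returns `None` are the ones that cannot be auto-corrected and would need a human to look at them.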

If this all works (I doubt you'll get 100% unless you provide a mechanism whereby those things that can't be auto-corrected are flagged for human intervention), then your reporting is as simple as SELECT DISTINCT...

I hope this is a one-time job? It's impossible unless you can explain (in SQL terms) why the first record is the one you need. As a temporary solution I'd suggest the following query:

select C1.* from Company C1, 
(select CompanyCode, min(CompanyName) as CompanyNameSelected 
   from Company
   group by CompanyCode) C2
where 
   C1.CompanyCode = C2.CompanyCode and 
   C1.CompanyName = C2.CompanyNameSelected;

You could use any aggregate function instead of min (as long as it returns a CompanyName), or even write your own stored function, but one thing is required: you have to explain in query language why record #1 is better than record #2.
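
To see why the choice of rule matters, here is a small Python sketch (not SQL, and not part of the answer's query) that applies two different rules to the CompanyName values from the question. Python compares strings by code point, which may not match the ordering of your SQL Server collation, so treat the `min` result as illustrative only.

```python
from collections import Counter

# CompanyName values copied from the sample data in the question.
names = [
    "UNITED DIE  CUTTING & FINISHIN",
    "UNITED DIE CUTTING & FINISHING",
    "UNITED DIE CUTTING & FINISHING",
    "UNITED DIE CUTTING & FINISHING",
    "UNITED DIECUTTING & FINISHING",
    "UNITED FINISHING",
    "UNITED FINISHING & DIE CUTTING",
]

# Rule 1: alphabetical minimum, the analogue of min(CompanyName).
by_min = min(names)

# Rule 2: the spelling that occurs most often.
by_frequency = Counter(names).most_common(1)[0][0]

print(repr(by_min))
print(repr(by_frequency))
```

In this sketch the alphabetical minimum is the truncated, double-spaced variant, while the frequency rule picks the spelling the question asks for. Both rules are valid; they just encode different definitions of "better".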

You need to use subqueries in the select statement. Try this query:

 select CompanyCode,
    (select top 1 CompanyName from Table1 where CompanyCode=X.CompanyCode 
     group by CompanyName order by count(*) desc) CompanyName,
    (select top 1 Addr1 from Table1 where CompanyCode=X.CompanyCode 
     group by Addr1 order by count(*) desc) Addr1,
    (select top 1 City from Table1 where CompanyCode=X.CompanyCode 
     group by City order by count(*) desc) City,
    (select top 1 State from Table1 where CompanyCode=X.CompanyCode 
     group by State order by count(*) desc) State,
    (select top 1 Zip from Table1 where CompanyCode=X.CompanyCode 
     group by Zip order by count(*) desc) Zip
from    Table1 X
group by CompanyCode
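
The idea behind the query above is a per-column vote: each column independently takes its most frequent value. A minimal Python sketch of the same logic, run on the sample rows from the question (the function name `most_common_per_column` is made up for illustration):

```python
from collections import Counter

# (CompanyName, Addr1, City, State, Zip) for CompanyCode 10033, from the question.
rows = [
    ("UNITED DIE  CUTTING & FINISHIN", "3610 HAMILTON AVE", "CLEVELAND", "Ohio", "44114"),
    ("UNITED DIE CUTTING & FINISHING", "3610 HAMILTON AVE", "CLEVELAND", "Ohio", "44114"),
    ("UNITED DIE CUTTING & FINISHING", "3610 HAMILTON AVE.", "CLEVELAND", "Ohio", "44114"),
    ("UNITED DIE CUTTING & FINISHING", "3610 HAMILTON AVENUE", "CLEVELAND", "Ohio", "44114"),
    ("UNITED DIECUTTING & FINISHING", "3610 HAMILTON AVE", "CLEVELAND", "Ohio", "44144"),
    ("UNITED FINISHING", "3610 HAMILTON AVE", "CLEVLAND", "Ohio", "44114"),
    ("UNITED FINISHING & DIE CUTTING", "3610 HAMILTON AVE", "CLEVELAND", "Ohio", "44114"),
]


def most_common_per_column(rows):
    """For each column, return the value that appears in the most rows."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*rows))


consensus = most_common_per_column(rows)
print(consensus)
```

Because every column is voted on separately, the result can combine values that never appeared together in a single stored row. For this sample it happens to match an existing row.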

Tags: sql, sql-server-2000, tsql
