Partial matching in Oracle Database

2023-02-07 12:25 问答作者：

I am having very big table (more than 1 millions rows) , These rows are having product name and price from different sources.

There are many products which are same name but they differ in price.

Here is the problem ,

We have same products many times in row but their name will not be same for example

Row    Product name             price
-----  -----------------------  ---- 
Row 1 : XYZ - size information   $a
Row 2. XYZ -Brand information    $b
Row 3. 开发者_如何学Pythonxyz                       $c

I want to get all products which price differ. If name are same in row then I can easily go for self join as Table1.Product_Name = Table1.Product_name and Table1.Price != Table2.Price

But this will not work in this case :(

Can any one suggest a solution for it ?

You could try to use regexp_replace to go into the right direction:

create table tq84_products (
  name   varchar2(50),
  price  varchar2( 5)
);

Three products:

xyz
ABCD
efghi

of which ABCD has two records with the same price and all others have a different price.

insert into tq84_products values (' XYZ - size information', '$a');
insert into tq84_products values ('XYZ - brand information', '$b');
insert into tq84_products values ('xyz'                    , '$c');

insert into tq84_products values ('Product ABCD'           , '$d');
insert into tq84_products values ('Abcd is the best'       , '$d');

insert into tq84_products values ('efghi is cheap'         , '$f');
insert into tq84_products values ('no, efghi is expensive' , '$g');

The select statement with stop words to remove words that are typically found in product names.

with split_into_words as (
      select 
        name,
        price,
        upper (
        regexp_replace(name,
                             '\W*'  ||
                       '(\w+)?\W?+' ||
                       '(\w+)?\W?+' ||
                       '(\w+)?\W?+' ||
                       '(\w+)?\W?+' ||
                       '(\w+)?\W?+' ||
                       '(\w+)?\W?+' ||
                       '(\w+)?\W?+' ||
                       '(\w+)?\W?+' ||
                       '(\w+)?'     ||
                       '.*',
                       '\' || submatch.counter
                     ) 
        )                          word
         from
           tq84_products,
           (select
              rownum counter
            from 
              dual
            connect by
              level < 10
           ) submatch
  ),
  stop_words as (
    select 'IS'          word from dual union all
    select 'BRAND'       word from dual union all
    select 'INFORMATION' word from dual 
  )
  select
    w1.price,
    w2.price,
    w1.name,  
    w2.name
--  substr(w1.word, 1, 30)               common_word,
--  count(*) over (partition by w1.name) cnt
  from
    split_into_words w1,
    split_into_words w2
  where
    w1.word   = w2.word and
    w1.name  <  w2.name and
    w1.word is not null and
    w2.word is not null and
    w1.word not in (select word from stop_words) and
    w2.word not in (select word from stop_words) and
    w1.price != w2.price;

This then selects

$a    $b     XYZ - size information                            XYZ - brand information
$b    $c    XYZ - brand information                            xyz
$a    $c     XYZ - size information                            xyz
$f    $g    efghi is cheap                                     no, efghi is expensive

So, abcd is not returned while the others are.

继续阅读：oracle sql

Partial matching in Oracle Database

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？