开发者

Complex select query question for hardcore SQL designers

Very complex query been trying to construct it for few days with more real success.

I'm using SQL-SERVER 2005 Standard

What i need is : 5 CampaignVariants from Campaigns whereas 2 are with the largest PPU number set and 3 are random.

Next开发者_运维问答 condition is that CampaignDailyBudget and CampaignTotalBudget are below what is set in Campaign ( calculation is number of clicks in Visitors table connected to Campaigns via CampaignVariants on which users click)

Next condition CampaignLanguage, CampaignCategory, CampaignRegion and CampaignCountry must be the ones i send to this select with (languageID,categoryID,regionID and countryID).

Next condition is that IP address i send to this select statement won't be in IPs list for current Campaign ( i delete inactive for 24 hours IPs ).

In other words it gets 5 CampaignVariants for user that enters the site, when i take from user PublisherRegionUID,IP,Language,Country and Region

view diagram

more details

i get countryID, regionID, ipID, PublisherRegionUID and languageID from Visitor. This are filter parameters. While i first need to get what Publisher is about to show on his site by it's categories, language so on.... and then i filter all remaining Campaigns by Visitors's params with all parameters besides PublisherRegionUID.

So it has two actual fiters. One What Publisher wants to Publish and other one what Visitor can view...

campaignDailyBudget and campaignTotalBudget are values set by Users who creates a Campaign. Those two compared to (number of clicks per campaign)*(campaignPPU) while date filters obviously used to filter for campaignDailyBudget with from 12:00AM to 11:59PM of today. campaignTotalBudget is not filtered by date for obvious reasons

Demo of Stored Procedure

ALTER PROCEDURE dbo.CampaignsGetCampaignVariants4Visitor
    @publisherSiteRegionUID uniqueidentifier,
    @visitorIP varchar(15),
    @browserID tinyint,
    @countryID tinyint,
    @osID tinyint,
    @languageID tinyint,
    @acceptsCookies bit
AS
BEGIN
    SET NOCOUNT ON;

    -- check if such @publisherRegionUID exists
    if exists(select publisherSiteRegionID from PublisherSiteRegions where publisherSiteRegionUID=@publisherSiteRegionUID)
        begin

            declare @publisherSiteRegionID int
            select @publisherSiteRegionID = publisherSiteRegionID from PublisherSiteRegions where publisherSiteRegionUID=@publisherSiteRegionUID

            -- get CampaignVariants 
            -- ** choose 2 highest PPU and 3 random CampaignVariants from Campaigns list 
            -- where regionID,countryID,categoryID,languageID meets Publisher and Visitor requirements
            -- and Campaign.campaignDailyBudget<(sum of Clicks in Visitors per this Campaign)*Campaign.PPU during this day
            -- and Campaign.campaignTotalBudget<(sum of Clicks in Visitors per this Campaign)*Campaign.PPU
            -- and @visitorID does not appear in Campaigns2IPs with this Campaign

            -- insert visitor
            insert into Visitors (ipAddress,browserID,countryID,languageID,OSID,acceptsCookies)
            values (@visitorIP,@browserID,@countryID,@languageID,@OSID,@acceptsCookies)

            declare @visitorID int
            select @visitorID = IDENT_CURRENT('Visitors')

            -- add IP to pool Campaigns ** adding ip to all Campaigns whose CampaignVariants were chosen

            -- add PublisherRegion2Visitor relationship
            insert into PublisherSiteRegions2Visitors values (@visitorID,@publisherSiteRegionID)

            -- add CampaignVariant2Visitor relationship


        end



END
GO


I also make a number of assumptions about your oblique requirements. I’ll spell them out as I go along, along with explaining the code. Please note that I of course have no reasonable way of testing this code for typos or minor logic errors.

It might be possible to write this as a single ginormous query, but that would be awkward, ugly, and prone to performance issues as the SQL optimizer can have problems buliding plans for overly-large queries. An option would be to write it as a series of queries, populating temp tables for use in subsequent queries (which alows for much simpler debugging). I chose to write this as a large common table expression statement with a series of CTE tables, largely because it kind of “flows” better that way, and it'd probably perform better than the many-temp-tables version.

First assumption: there are several ciruclar references in there. Campaign has links to both Countries and Regions, so both of these parameter values must be checked—even though based on the table link from Countries to Region, this filter could possibly be simplified to just a check on Country (assuming that the country parameter value is always “in” the region parameter). The same applies to Language and Category, and perhaps to IPs and Visitors. This appears to be sloppy design; if it can be cleared up, or if assumptions on the validity of the data can be made, the query could be simplified.

Second assumption: Parameters are passed in as variables in the form of @Region, @Country, etc. Also, there is only one IP address being passed in; if not, then you’ll need to pass in multiple values, set up a temp table containing those values, and add that as a filter where I use the @IP parameter.

So, step 1 is a first pass identifying “eligible” campaigns, by pulling out all those that share the desired country, region, language, cateogory, and that do not have the one IP address associated with them:

WITH cteEligibleCampaigns (CampaignId)
 as (select CampaignId
      from Campaigns2Regions
      where RegionId = @RegionId
     intersect select CampaignId
      from Campaign2Countries
      where CountryId = @CountryId
     intersect select CampaignId
      from Campaign2Languages
      where LanguageId = @LanguageId
     intersect select CampaignId
      from Campaign2Categories
      where CategoryId = @CategoryId
     except select CampaignId
      from Campaigns2IPs
      where IPID = @IPId)

Next up, from these filter out those items where “CampaignDailyBudget and CampaignTotalBudget are below what is set in Campaign ( calculation is number of clicks in Visitors table connected to Campaigns via CampaignVariants on which users click)”. This requirement is not entirely clear to me. I have chosen to interpret it as “only include those campaigns where, if you count the number of visitors for those campaign’s CampaignVariants, the total count is less than both CampaignDailyBudget and CampaignTotalBudget”. Note that here I introduce a random value, used later on in selecting random rows.

,cteTargetCampaigns (CampaignId, RandomNumber)
  as (select CampaignId, checksum(newid() RandomNumber)
       from cteEligibleCampaigns ec
        inner join Campaigns ca
         on ca.CampgainId = ec.CampaignId
        inner join CampaignVariants cv
         on cv.CampgainId = ec.CampaignId
        inner join CampaignVariants2Visitors cvv
         on cvv.CampaignVariantId = cv. CampaignVariantId
       group by ec.CampaignId
       having count(*) < ca.CampaignDailyBudget
        and count(*) < CampaignTotalBudget)

Next up, identify the two “best” items.

,cteTopTwo (CampaignId, Ranking)
  as (select CampaignId, row_number() over (order by CampgainPPU desc)
       from cteTargetCampaigns tc
        inner join Campaigns ca
         on ca.CampaignId = tc.CampaignId)

Next, line up all other campaigns by the randomly assigned number:

,cteRandom (CampaignId, Ranking)
  as (select CampaignId, row_number() over (order by RandomNumber)
       from cteTargetCampaigns
       where CampaignId not in (select CampaignId
                                 from cteTopTwo
                                 where Ranking < 3))

And, at last, pull the data sets together:

 select CampaignId
  from cteTopTwo
  where Ranking <= 2
 union all select CampaignId
  from cteRandom
  where Ranking <= 3

Lump the above sections of code together, debug typos, invalid assumption, and missed requirements (such as order or flags identifying the top two items from the random ones), and you should be good.


I'm not sure I understand this portion of your post:

it gets 5 CampaignVariants for user that enters the site, when i take from user PublisherRegionUID,IP,Language,Country and Region

I'm assuming "it" is the query. The user given your second "Next Condition" is the IP? What does "when I take from user" mean? Does that mean that is the information you have at the time you execute your query or is that information you returned from your query? If the later, then there are a host of questions that would need to be answered since many of those columns are part of a Many:Many relationship.

Regardless, below is a means to get the 5 campaigns where, according to your second "Next condition", you have an IP address that you want filter out. I'm also assuming that you want five campaigns total which means that the three random ones cannot include the two "highest PPU" ones.

With 
    ValidCampaigns As
    (
    Select C.campaignId
    From Campaigns As C
        Left Join (Campaigns2IPs As CIP
            Join IPs
                On IPs.ipID = CIP.ipID
                    And IPs.ipAddress = @IPAddress)
            On CIP.campaignId = C.campaignId
    Where CIP.campaignID Is Null
    )
    CampaignPPURanks As
    (
    Select C.campaignId
        , Row_Number() Over ( Order By C.campaignPPU desc ) As ItemRank
    From ValidCampaigns As C
    )
    , RandomRanks As
    (
    Select campaignId
        , Row_Number() Over ( Order By newid() desc ) As ItemRank
    From ValidCampaigns As C
        Left Join CampaignPPURanks As CR
            On CR.campaignId = C.campaignId
                And CR.ItemRank <= 2
    Where CR.campaignId Is Null
    )
Select ...
From CampaignPPURanks As CPR
    Join CampaignVariants As CV
        On CV.campaignId = CPR.campaignId
            And CPR.ItemRank <= 2 
Union All           
Select ...
From RandomRanks As RR
    Join CampaignVariants As CV
        On CV.campaignId = RR.campaignId
            And RR.ItemRank <= 3 
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜