Complex select query question for hardcore SQL designers
Very complex query been trying to construct it for few days with more real success.
I'm using SQL-SERVER 2005 Standard
What i need is : 5 CampaignVariants from Campaigns whereas 2 are with the largest PPU number set and 3 are random.
Next开发者_运维问答 condition is that CampaignDailyBudget and CampaignTotalBudget are below what is set in Campaign ( calculation is number of clicks in Visitors table connected to Campaigns via CampaignVariants on which users click)
Next condition CampaignLanguage, CampaignCategory, CampaignRegion and CampaignCountry must be the ones i send to this select with (languageID,categoryID,regionID and countryID).
Next condition is that IP address i send to this select statement won't be in IPs list for current Campaign ( i delete inactive for 24 hours IPs ).
In other words it gets 5 CampaignVariants for user that enters the site, when i take from user PublisherRegionUID,IP,Language,Country and Region
view diagram
more details
i get countryID, regionID, ipID, PublisherRegionUID and languageID from Visitor. This are filter parameters. While i first need to get what Publisher is about to show on his site by it's categories, language so on.... and then i filter all remaining Campaigns by Visitors's params with all parameters besides PublisherRegionUID.
So it has two actual fiters. One What Publisher wants to Publish and other one what Visitor can view...
campaignDailyBudget and campaignTotalBudget are values set by Users who creates a Campaign. Those two compared to (number of clicks per campaign)*(campaignPPU) while date filters obviously used to filter for campaignDailyBudget with from 12:00AM to 11:59PM of today. campaignTotalBudget is not filtered by date for obvious reasons
Demo of Stored Procedure
ALTER PROCEDURE dbo.CampaignsGetCampaignVariants4Visitor
@publisherSiteRegionUID uniqueidentifier,
@visitorIP varchar(15),
@browserID tinyint,
@countryID tinyint,
@osID tinyint,
@languageID tinyint,
@acceptsCookies bit
AS
BEGIN
SET NOCOUNT ON;
-- check if such @publisherRegionUID exists
if exists(select publisherSiteRegionID from PublisherSiteRegions where publisherSiteRegionUID=@publisherSiteRegionUID)
begin
declare @publisherSiteRegionID int
select @publisherSiteRegionID = publisherSiteRegionID from PublisherSiteRegions where publisherSiteRegionUID=@publisherSiteRegionUID
-- get CampaignVariants
-- ** choose 2 highest PPU and 3 random CampaignVariants from Campaigns list
-- where regionID,countryID,categoryID,languageID meets Publisher and Visitor requirements
-- and Campaign.campaignDailyBudget<(sum of Clicks in Visitors per this Campaign)*Campaign.PPU during this day
-- and Campaign.campaignTotalBudget<(sum of Clicks in Visitors per this Campaign)*Campaign.PPU
-- and @visitorID does not appear in Campaigns2IPs with this Campaign
-- insert visitor
insert into Visitors (ipAddress,browserID,countryID,languageID,OSID,acceptsCookies)
values (@visitorIP,@browserID,@countryID,@languageID,@OSID,@acceptsCookies)
declare @visitorID int
select @visitorID = IDENT_CURRENT('Visitors')
-- add IP to pool Campaigns ** adding ip to all Campaigns whose CampaignVariants were chosen
-- add PublisherRegion2Visitor relationship
insert into PublisherSiteRegions2Visitors values (@visitorID,@publisherSiteRegionID)
-- add CampaignVariant2Visitor relationship
end
END
GO
I also make a number of assumptions about your oblique requirements. I’ll spell them out as I go along, along with explaining the code. Please note that I of course have no reasonable way of testing this code for typos or minor logic errors.
It might be possible to write this as a single ginormous query, but that would be awkward, ugly, and prone to performance issues as the SQL optimizer can have problems buliding plans for overly-large queries. An option would be to write it as a series of queries, populating temp tables for use in subsequent queries (which alows for much simpler debugging). I chose to write this as a large common table expression statement with a series of CTE tables, largely because it kind of “flows” better that way, and it'd probably perform better than the many-temp-tables version.
First assumption: there are several ciruclar references in there. Campaign has links to both Countries and Regions, so both of these parameter values must be checked—even though based on the table link from Countries to Region, this filter could possibly be simplified to just a check on Country (assuming that the country parameter value is always “in” the region parameter). The same applies to Language and Category, and perhaps to IPs and Visitors. This appears to be sloppy design; if it can be cleared up, or if assumptions on the validity of the data can be made, the query could be simplified.
Second assumption: Parameters are passed in as variables in the form of @Region, @Country, etc. Also, there is only one IP address being passed in; if not, then you’ll need to pass in multiple values, set up a temp table containing those values, and add that as a filter where I use the @IP parameter.
So, step 1 is a first pass identifying “eligible” campaigns, by pulling out all those that share the desired country, region, language, cateogory, and that do not have the one IP address associated with them:
WITH cteEligibleCampaigns (CampaignId)
as (select CampaignId
from Campaigns2Regions
where RegionId = @RegionId
intersect select CampaignId
from Campaign2Countries
where CountryId = @CountryId
intersect select CampaignId
from Campaign2Languages
where LanguageId = @LanguageId
intersect select CampaignId
from Campaign2Categories
where CategoryId = @CategoryId
except select CampaignId
from Campaigns2IPs
where IPID = @IPId)
Next up, from these filter out those items where “CampaignDailyBudget and CampaignTotalBudget are below what is set in Campaign ( calculation is number of clicks in Visitors table connected to Campaigns via CampaignVariants on which users click)”. This requirement is not entirely clear to me. I have chosen to interpret it as “only include those campaigns where, if you count the number of visitors for those campaign’s CampaignVariants, the total count is less than both CampaignDailyBudget and CampaignTotalBudget”. Note that here I introduce a random value, used later on in selecting random rows.
,cteTargetCampaigns (CampaignId, RandomNumber)
as (select CampaignId, checksum(newid() RandomNumber)
from cteEligibleCampaigns ec
inner join Campaigns ca
on ca.CampgainId = ec.CampaignId
inner join CampaignVariants cv
on cv.CampgainId = ec.CampaignId
inner join CampaignVariants2Visitors cvv
on cvv.CampaignVariantId = cv. CampaignVariantId
group by ec.CampaignId
having count(*) < ca.CampaignDailyBudget
and count(*) < CampaignTotalBudget)
Next up, identify the two “best” items.
,cteTopTwo (CampaignId, Ranking)
as (select CampaignId, row_number() over (order by CampgainPPU desc)
from cteTargetCampaigns tc
inner join Campaigns ca
on ca.CampaignId = tc.CampaignId)
Next, line up all other campaigns by the randomly assigned number:
,cteRandom (CampaignId, Ranking)
as (select CampaignId, row_number() over (order by RandomNumber)
from cteTargetCampaigns
where CampaignId not in (select CampaignId
from cteTopTwo
where Ranking < 3))
And, at last, pull the data sets together:
select CampaignId
from cteTopTwo
where Ranking <= 2
union all select CampaignId
from cteRandom
where Ranking <= 3
Lump the above sections of code together, debug typos, invalid assumption, and missed requirements (such as order or flags identifying the top two items from the random ones), and you should be good.
I'm not sure I understand this portion of your post:
it gets 5 CampaignVariants for user that enters the site, when i take from user PublisherRegionUID,IP,Language,Country and Region
I'm assuming "it" is the query. The user given your second "Next Condition" is the IP? What does "when I take from user" mean? Does that mean that is the information you have at the time you execute your query or is that information you returned from your query? If the later, then there are a host of questions that would need to be answered since many of those columns are part of a Many:Many relationship.
Regardless, below is a means to get the 5 campaigns where, according to your second "Next condition", you have an IP address that you want filter out. I'm also assuming that you want five campaigns total which means that the three random ones cannot include the two "highest PPU" ones.
With
ValidCampaigns As
(
Select C.campaignId
From Campaigns As C
Left Join (Campaigns2IPs As CIP
Join IPs
On IPs.ipID = CIP.ipID
And IPs.ipAddress = @IPAddress)
On CIP.campaignId = C.campaignId
Where CIP.campaignID Is Null
)
CampaignPPURanks As
(
Select C.campaignId
, Row_Number() Over ( Order By C.campaignPPU desc ) As ItemRank
From ValidCampaigns As C
)
, RandomRanks As
(
Select campaignId
, Row_Number() Over ( Order By newid() desc ) As ItemRank
From ValidCampaigns As C
Left Join CampaignPPURanks As CR
On CR.campaignId = C.campaignId
And CR.ItemRank <= 2
Where CR.campaignId Is Null
)
Select ...
From CampaignPPURanks As CPR
Join CampaignVariants As CV
On CV.campaignId = CPR.campaignId
And CPR.ItemRank <= 2
Union All
Select ...
From RandomRanks As RR
Join CampaignVariants As CV
On CV.campaignId = RR.campaignId
And RR.ItemRank <= 3
精彩评论