SQL Server Date Range Problem When Using Data From Web Server Logs
I have been importing my raw IIS log files into a SQL Server table using the Log Parser tool for several months now. This is done to enable the writing of SSRS reports based on this log data.
One of the reports that I am working on is supposed to determine the number of Visits from each unique IP address. A Visit is defined as an IP address hitting a page on the site and then making 4 more requests within an hour of each other. All of the 5 requests are within one Visit to the site. Later on that night the same IP address hits the site, except that now it's 3 hours later, so we count this new activity from the same IP as a new Visit. Here is a sample of the data:
IPAddress, RequestDateTime, UriStem
10.1.1.100, 2010-10-15 13:30:30, /
10.1.1.100, 2010-10-15 13:30:31, /style.css
10.1.1.100, 2010-10-15 13:30:31, /script.js
10.1.1.100, 2010-10-15 13:30:32, /funny.gif
10.1.1.100, 2010-10-15 13:30:33, /picture.jpg
10.1.1.101, 2010-10-15 13:40:50, /page2.html
10.1.1.101, 2010-10-15 13:40:51, /style.css
10.1.1.102, 2010-10-15 14:10:20, /page4.html
10.1.1.102, 2010-10-15 14:10:21, /style.css
10.1.1.100, 2010-10-15 16:55:10, /
10.1.1.100, 2010-10-15 16:55:11, /style.css
10.1.1.100, 2010-10-15 16:55:11, /scrip开发者_开发知识库t.js
10.1.1.100, 2010-10-15 16:55:12, /funny.gif
10.1.1.100, 2010-10-15 16:55:13, /picture.jpg
By looking at the data above I can easily discern that the 10.1.1.100
IP
address has visited the site twice and had 5 hits on each visit. However, I am
at a loss as to how to express that in SQL code. Is there an easy way to group
and count these date ranges by IP address?
I understand that this information can be captured by using tools such as AWStats, but I do not have the luxury of being able to install Perl on the systems we use.
Give the code below a trial run. The code groups and numbers the visits from each IP address. Then it looks to see how many "uristem" hits compared with the "threshold" value. I tested the code on a table named "Foo" and you need to check your table and column names prior to running the test.
DECLARE @threshold INT;
SET @threshold = 4; --this number should not include the initial visit
DECLARE @lookbackdays int;
SET @lookbackdays = 300;
;WITH postCTE as
(
SELECT
ipaddress,
uristem,
requestdatetime,
RowNumber = ROW_NUMBER() OVER (ORDER BY ipaddress,requestdatetime ASC)
FROM
Foo --put your table name here
WHERE
requestdatetime > GETDATE() - @lookbackdays
)
--select * from postCTE
SELECT
p1.ipaddress AS [ipaddress],
p2.RowNumber - p1.RowNumber +1 AS [Requests],
p1.requestdatetime AS [DateStart]
FROM
postCTE p1
INNER JOIN
postCTE p2
ON p1.ipaddress = p2.ipaddress
AND p1.Rownumber = p2.RowNumber - (@threshold )
WHERE
DATEDIFF(minute,p1.requestdatetime,p2.requestdatetime) <= 60
The output of my test on SQL 2008 is
ipaddress Requests DateStart
10.1.1.100 5 2010-10-15 13:30:30.000
10.1.1.100 5 2010-10-15 16:55:10.000
I think the best way to do this is to summarize your data first, then generate your report.
Here's how I'd do it.
Create a SummaryTable with the FACTS you want (e.g. UserIP, SessionStart, SessionEnd, PageViews)
Figure out what you consider a new visit (e.g. I think IIS default session timeout is 20 minutes, so any consecutive hit by an IP after 20 minutes I'll consider a new visit.)
Create a cursor to calculate the summarized data based on your rule.
-- Summary Data DECLARE @UserIP AS VARCHAR(15) DECLARE @SessionStart AS DateTime DECLARE @SessionEnd AS DateTime DECLARE @PageViews AS INT -- Current Values DECLARE @ThisUserIP AS VARCHAR(15) DECLARE @ThisVisitTime AS DateTime DECLARE @ThisPage AS VARCHAR(100) -- Declare Cusrsor DECLARE StatCursor CURSOR FAST_FORWARD FOR -- Query, make sure you sort by IP/Date so their data is in cronological order SELECT IPAddress, RequestDateTime, UriStem FROM Stats ORDER BY IPAddress, RequestDateTime OPEN StatCursor FETCH NEXT FROM StatCursor INTO @ThisUserIP, @ThisVisitTime, @ThisPage -- Start New Summary SELECT @UserIP = @ThisUserIP, @SessionStart = @ThisVisitTime, @SessionEnd = @ThisVisitTime, @PageViews = 1 FETCH NEXT FROM StatCursor INTO @ThisUserIP, @ThisVisitTime, @ThisPage WHILE @@FETCH_STATUS = 0 BEGIN
END CLOSE StatCursor DEALLOCATE StatCursor-- Check rule IF @UserIP = @ThisUserIP AND @ThisVisitTime <= DATEADD(MI,30,@SessionEnd) BEGIN -- Same User and Session / Add to Summary SELECT @PageViews = @PageViews + 1, @SessionEnd = @ThisVisitTime END ELSE BEGIN -- Different User or New User / Write Current Summary and Start New Summary INSERT INTO StatSummary (UserIP, SessionStart, SessionEnd, PageViews) VALUES (@UserIP, @SessionStart, @SessionEnd, @PageViews) SELECT @UserIP = @ThisUserIP, @SessionStart = @ThisVisitTime, @SessionEnd = @ThisVisitTime, @PageViews = 1 END FETCH NEXT FROM StatCursor INTO @ThisUserIP, @ThisVisitTime, @ThisPage
Create a query to get the data you need, example (All time Hits by IP).
SELECT UserIP, COUNT(UserIP) FROM StatSummary GROUP BY UserIP
精彩评论