开发者

SQL Server Date Range Problem When Using Data From Web Server Logs

I have been importing my raw IIS log files into a SQL Server table using the Log Parser tool for several months now. This is done to enable the writing of SSRS reports based on this log data.

One of the reports that I am working on is supposed to determine the number of Visits from each unique IP address. A Visit is defined as an IP address hitting a page on the site and then making 4 more requests within an hour of each other. All of the 5 requests are within one Visit to the site. Later on that night the same IP address hits the site, except that now it's 3 hours later, so we count this new activity from the same IP as a new Visit. Here is a sample of the data:

IPAddress,   RequestDateTime,     UriStem
10.1.1.100,  2010-10-15 13:30:30, /
10.1.1.100,  2010-10-15 13:30:31, /style.css
10.1.1.100,  2010-10-15 13:30:31, /script.js
10.1.1.100,  2010-10-15 13:30:32, /funny.gif
10.1.1.100,  2010-10-15 13:30:33, /picture.jpg
10.1.1.101,  2010-10-15 13:40:50, /page2.html
10.1.1.101,  2010-10-15 13:40:51, /style.css
10.1.1.102,  2010-10-15 14:10:20, /page4.html
10.1.1.102,  2010-10-15 14:10:21, /style.css
10.1.1.100,  2010-10-15 16:55:10, /
10.1.1.100,  2010-10-15 16:55:11, /style.css
10.1.1.100,  2010-10-15 16:55:11, /scrip开发者_开发知识库t.js
10.1.1.100,  2010-10-15 16:55:12, /funny.gif
10.1.1.100,  2010-10-15 16:55:13, /picture.jpg

By looking at the data above I can easily discern that the 10.1.1.100 IP address has visited the site twice and had 5 hits on each visit. However, I am at a loss as to how to express that in SQL code. Is there an easy way to group and count these date ranges by IP address?

I understand that this information can be captured by using tools such as AWStats, but I do not have the luxury of being able to install Perl on the systems we use.


Give the code below a trial run. The code groups and numbers the visits from each IP address. Then it looks to see how many "uristem" hits compared with the "threshold" value. I tested the code on a table named "Foo" and you need to check your table and column names prior to running the test.

DECLARE @threshold INT;  
SET @threshold = 4;  --this number should not include the initial visit
DECLARE @lookbackdays int; 
SET @lookbackdays = 300; 

;WITH postCTE as  
(  
SELECT   
    ipaddress,  
    uristem,  
    requestdatetime,  
    RowNumber = ROW_NUMBER() OVER (ORDER BY ipaddress,requestdatetime ASC)  
FROM  
    Foo  --put your table name here
WHERE  
    requestdatetime > GETDATE() - @lookbackdays 
)  
--select * from postCTE

SELECT   
    p1.ipaddress AS [ipaddress],   
    p2.RowNumber - p1.RowNumber +1 AS [Requests], 
    p1.requestdatetime AS [DateStart]
FROM  
    postCTE p1  
INNER JOIN  
    postCTE p2   
    ON p1.ipaddress = p2.ipaddress   
    AND p1.Rownumber = p2.RowNumber - (@threshold )  
WHERE  
    DATEDIFF(minute,p1.requestdatetime,p2.requestdatetime) <= 60 

The output of my test on SQL 2008 is

ipaddress   Requests    DateStart
10.1.1.100  5   2010-10-15 13:30:30.000
10.1.1.100  5   2010-10-15 16:55:10.000


I think the best way to do this is to summarize your data first, then generate your report.

Here's how I'd do it.

  1. Create a SummaryTable with the FACTS you want (e.g. UserIP, SessionStart, SessionEnd, PageViews)

  2. Figure out what you consider a new visit (e.g. I think IIS default session timeout is 20 minutes, so any consecutive hit by an IP after 20 minutes I'll consider a new visit.)

  3. Create a cursor to calculate the summarized data based on your rule.

    -- Summary Data
    DECLARE @UserIP AS VARCHAR(15)
    DECLARE @SessionStart AS DateTime
    DECLARE @SessionEnd AS DateTime
    DECLARE @PageViews AS INT
    
    
    -- Current Values
    DECLARE @ThisUserIP AS VARCHAR(15)
    DECLARE @ThisVisitTime AS DateTime
    DECLARE @ThisPage AS VARCHAR(100)
    
    
    -- Declare Cusrsor
    DECLARE StatCursor CURSOR FAST_FORWARD FOR
    -- Query, make sure you sort by IP/Date so their data is in cronological order
    SELECT IPAddress, RequestDateTime, UriStem
    FROM Stats
    ORDER BY IPAddress, RequestDateTime
    
    
    OPEN StatCursor
    FETCH NEXT FROM StatCursor
    INTO @ThisUserIP, @ThisVisitTime, @ThisPage
    
    
    -- Start New Summary
    SELECT @UserIP = @ThisUserIP, @SessionStart = @ThisVisitTime, @SessionEnd = @ThisVisitTime, @PageViews = 1
    
    
    FETCH NEXT FROM StatCursor
    INTO @ThisUserIP, @ThisVisitTime, @ThisPage
    
    
    WHILE @@FETCH_STATUS = 0
    BEGIN
    
    
    
    -- Check rule
    IF @UserIP = @ThisUserIP AND @ThisVisitTime &lt;= DATEADD(MI,30,@SessionEnd)
        BEGIN
            -- Same User and Session / Add to Summary
            SELECT @PageViews = @PageViews + 1, @SessionEnd = @ThisVisitTime
        END
    ELSE
        BEGIN
            -- Different User or New User / Write Current Summary and Start New Summary
            INSERT INTO StatSummary (UserIP, SessionStart, SessionEnd, PageViews) VALUES (@UserIP, @SessionStart, @SessionEnd, @PageViews)
            SELECT @UserIP = @ThisUserIP, @SessionStart = @ThisVisitTime, @SessionEnd = @ThisVisitTime, @PageViews = 1
        END
    
    
    FETCH NEXT FROM StatCursor
    INTO @ThisUserIP, @ThisVisitTime, @ThisPage
    
    END CLOSE StatCursor DEALLOCATE StatCursor
  4. Create a query to get the data you need, example (All time Hits by IP).

    SELECT UserIP, COUNT(UserIP) FROM StatSummary GROUP BY UserIP

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜