开发者

Website search on Excel

I have a spreadsheet on excel with a list of product names. What I want to do is (1) seperate each of these product names by 5 rows, and (2) setup a website search that extracts data from a given website (clinicaltrials.gov) and populates it in the rows underneath each spreadsheet.

(2) is much more important and challenging for me at the moment. I know I have to run a loop that goes through all the product names. But before I focus on the loop, I need help figuring out how to write code that executes a website search.

Some help that I received:

The following Excel VBA code snipet will take a Cell with a constructed URL in the form of:

开发者_运维知识库="URL;http://clinicaltrials.gov/ct2/show?term="& [Cell Reference to Drug name here] &"&rank=1"

And output 4 lines such as:

Estimated Enrollment:   40
Study Start Date:   Jan-11
Estimated Study Completion Date:    Apr-12
Estimated Primary Completion Date:  April 2012 (Final data collection date for primary outcome measure)


    With ActiveSheet.QueryTables.Add(Connection:= _
            ActiveCell.Text, Destination:=Cells(ActiveCell.Row, ActiveCell.Column + 1))
            .Name = "Clinical Trials"
            .FieldNames = True
            .RowNumbers = False
            .FillAdjacentFormulas = False
            .PreserveFormatting = True
            .RefreshOnFileOpen = False
            .BackgroundQuery = True
            .RefreshStyle = xlInsertDeleteCells
            .SavePassword = False
            .SaveData = True
            .AdjustColumnWidth = True
            .RefreshPeriod = 0
            .WebSelectionType = xlSpecifiedTables
            .WebFormatting = xlWebFormattingNone
            .WebTables = "12"
            .WebPreFormattedTextToColumns = True
            .WebConsecutiveDelimitersAsOne = True
            .WebSingleBlockTextImport = False
            .WebDisableDateRecognition = False
            .WebDisableRedirections = False
            .Refresh BackgroundQuery:=False
        End With
    


That URL you provided won't work. You need the NCT ID to get to the right page, not the drug name. Assume you have two drugs listed in A1:B2 and the proper NCT id is in column B

celebrex    NCT00571701
naproxen    NCT00586365

To use this code, set a reference to the Microsoft XML 5.0 library and the Microsoft Forms 2.0 library.

Sub GetClinical()

    Dim i As Long
    Dim lLast As Long
    Dim oHttp As MSXML2.XMLHTTP50
    Dim sHtml As String
    Dim lDataStart As Long, lTblStart As Long, lTblEnd As Long
    Dim doClip As DataObject

    'Find the last cell in column A
    lLast = Sheet1.Cells(Sheet1.Rows.Count, 1).End(xlUp).Row
    Set oHttp = New MSXML2.XMLHTTP50

    'Loop from the last cell to row 1 in column A
    For i = lLast To 1 Step -1
        'Insert 5 rows below
        Sheet1.Cells(i, 1).Offset(1, 0).Resize(5).EntireRow.Insert

        'get the web page
        oHttp.Open "GET", "http://clinicaltrials.gov/ct2/show/" & Sheet1.Cells(i, 2).Value & "?rank=1"
        oHttp.send
        sHtml = oHttp.responseText

        'Find the start and end to the table
        lDataStart = InStr(1, sHtml, "Estimated  Enrollment:")
        lTblStart = InStr(lDataStart - 200, sHtml, "<table")
        lTblEnd = InStr(lDataStart, sHtml, "</table>") + 8

        'put the table in the clipboard
        Set doClip = New DataObject
        doClip.SetText Mid$(sHtml, lTblStart, lTblEnd - lTblStart)
        doClip.PutInClipboard

        'paste the table as text
        Sheet1.Cells(i, 1).Offset(1, 0).Select
        Sheet1.PasteSpecial "Text", , , , , , True

    Next i

End Sub

If you don't have the NCT numbers, I don't think you'll be able to construct a workable URL. Also note that I find the table by looking for a particular string (Estimated Enrollment: - note the two spaces in between) and back up 200 characters. The 200 is arbitrary, but worked for both celebrex and naproxen. I can't guarantee their formatting will be consistent. They don't use table ids, so that makes it tough to find the right one.

Always make a backup of your data before running code that alters it.


If you run a search and look at the bottom of the results page, you'll see there's an option to download the results in various formats. For example this url will download all the fluoxetine results in tab-delimited format:

http://clinicaltrials.gov/ct2/results/download?down_stds=all&down_flds=all&down_fmt=tsv&term=fluoxetine

The only complication is that the results are zipped, so you'll need to save the file and unzip it first. Luckily for you I'd already had to do this... Create a folder called "files" in the same folder as your workbook, then add this code and test it out. Works OK for me.

Option Explicit

Sub Tester()

    FetchUnzipOpen "fluoxetine"

End Sub

Sub FetchUnzipOpen(DrugName As String)
   Dim s, sz 'don't dim these as strings-must be variants!
   s = ThisWorkbook.Path & "\files"
   sz = s & "\test.zip"
   FetchFile "http://clinicaltrials.gov/ct2/results/download?down_stds=all&" & _
              "down_flds=all&down_fmt=tsv&term=" & DrugName, sz
   Unzip s, sz
   'now you just need to open the data file (files/search_result.txt)
End Sub


Sub FetchFile(sURL As String, sPath)
 Dim oXHTTP As Object
 Dim oStream As Object

    Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
    Set oStream = CreateObject("ADODB.Stream")
    Application.StatusBar = "Fetching " & sURL & " as " & sPath
    oXHTTP.Open "GET", sURL, False
    oXHTTP.send
    With oStream
        .Type = 1 'adTypeBinary
        .Open
        .Write oXHTTP.responseBody
        .SaveToFile sPath, 2 'adSaveCreateOverWrite
        .Close
    End With
    Set oXHTTP = Nothing
    Set oStream = Nothing
    Application.StatusBar = False

End Sub

Sub Unzip(sDest, sZip)
 Dim o
 Set o = CreateObject("Shell.Application")
 o.NameSpace(sDest).CopyHere o.NameSpace(sZip).Items
End Sub
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜