
How to get a common category for a set of words using Wikipedia?

I have a problem extracting a common category for a group of words. I have several words in a cluster ("apple", "iMac", "snowleopard") and would like to retrieve the category those words share.

("apple","iMac","snowleopard") --> "Mac OS X"

I tried a lexical database such as WordNet, but it did not work. While looking for other methods, I found that Wikipedia may help. Is there a Java library for Wikipedia, and how can I accomplish the task described above? Thanks.


You can try using Wikipedia to extract some meaning from these terms. For example, the following query against the Wikipedia API:

http://en.wikipedia.org/w/api.php?action=query&prop=categories&format=json&clshow=!hidden&cllimit=10&generator=search&gsrsearch=apple%20iMac%20snowleopard%22&gsrnamespace=0&gsrprop=titlesnippet&gsrredirects=&gsrlimit=10

Yields the following result:

    {
        "query": {
            "searchinfo": {
                "totalhits": 3,
                "suggestion": "apple iMac snow leopard\"\""
            },
            "pages": {
                "2020710": {
                    "pageid": 2020710,
                    "ns": 0,
                    "title": "Apple's transition to Intel processors",
                    "categories": [
                        {
                            "ns": 14,
                            "title": "Category:Apple Inc."
                        },
                        {
                            "ns": 14,
                            "title": "Category:Intel Corporation"
                        },
                        {
                            "ns": 14,
                            "title": "Category:Mac OS X"
                        }
                    ]
                },
                "14059031": {
                    "pageid": 14059031,
                    "ns": 0,
                    "title": "Mac OS X Snow Leopard",
                    "categories": [
                        {
                            "ns": 14,
                            "title": "Category:2009 software"
                        },
                        {
                            "ns": 14,
                            "title": "Category:Mac OS X"
                        }
                    ]
                },
                "20640": {
                    "pageid": 20640,
                    "ns": 0,
                    "title": "OS X",
                    "categories": [
                        {
                            "ns": 14,
                            "title": "Category:1999 software"
                        },
                        {
                            "ns": 14,
                            "title": "Category:Apple Inc. operating systems"
                        },
                        {
                            "ns": 14,
                            "title": "Category:Apple Inc. software"
                        },
                        {
                            "ns": 14,
                            "title": "Category:Mac OS X"
                        },
                        {
                            "ns": 14,
                            "title": "Category:Mach"
                        }
                    ]
                }
            }
        },
        "query-continue": {
            "categories": {
                "clcontinue": "14059031|X86-64 operating systems"
            }
        }
    }

It may not be easy to determine the "correct" category from this data, but it's a start.
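One simple heuristic for picking a category is to count how many of the returned pages each category appears on and take the most frequent one. The following is a minimal sketch in Python, not a definitive implementation: the function names (`build_query_url`, `rank_categories`, `fetch`) and the `User-Agent` string are made up for illustration, the query keeps only the core parameters from the URL above, and it assumes the response has the `query.pages.*.categories` shape shown in the result.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(words, limit=10):
    """Build a search-plus-categories query URL for a list of words."""
    params = {
        "action": "query",
        "prop": "categories",
        "format": "json",
        "clshow": "!hidden",          # skip hidden maintenance categories
        "cllimit": limit,
        "generator": "search",
        "gsrsearch": " ".join(words),
        "gsrnamespace": 0,            # article namespace only
        "gsrlimit": limit,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def rank_categories(response):
    """Return (category, page_count) pairs, most frequent first."""
    counts = Counter()
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # A set ensures each category counts at most once per page.
        titles = {c["title"] for c in page.get("categories", [])}
        counts.update(titles)
    return counts.most_common()


def fetch(url):
    """Fetch and decode the JSON response (requires network access)."""
    # The User-Agent value is a placeholder; use one that identifies your tool.
    request = urllib.request.Request(
        url, headers={"User-Agent": "category-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


if __name__ == "__main__":
    url = build_query_url(["apple", "iMac", "snowleopard"])
    for title, count in rank_categories(fetch(url))[:5]:
        print(count, title)
```

Applied to the response shown above, this counting puts "Category:Mac OS X" first, because it is the only category listed on all three pages. Live results may differ, since Wikipedia's articles and categories change over time. The heuristic is also crude: the response is truncated by `cllimit` (note the continuation value at the end of the result), and broad categories can win simply because they are common.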
