Parser generation

I am doing a project on software plagiarism detection, and I intend to do it for the C language. For that I am supposed to create a token generator and a parser, but I don't know where to start. Can anyone help me out with this?

I created a database of tokens and separated the tokens from my program. The next thing I want to do is compare two programs to find out whether one is plagiarized or not. For that I need to create a syntax analyzer, and I don't know where to start.

That is, I want to create a parser for C programs in Python.


If you want to create a parser in Python you can look at these libraries:
PLY
pyparsing
and Lepl - new but very powerful


Building a real C parser by yourself is a really big task.

I suggest you either find one that is already done, e.g. pycparser, or define a really simple subset of C that is easily parsed.

You'll have plenty of work to do for your plagiarism detector after you are done parsing C.
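To give an idea of what a "really simple subset" might look like at the token level, here is a minimal sketch using only Python's `re` module. The keyword list, the token categories, and the idea of collapsing every identifier to `ID` and every number to `NUM` are my own assumptions for illustration, not something pycparser does. It skips comments and preprocessor lines and is nowhere near a full C lexer.

```python
import re

# Assumption: a small keyword set for a toy C subset, not the full C keyword list.
KEYWORDS = {"int", "char", "float", "double", "void",
            "if", "else", "while", "for", "return"}

TOKEN_RE = re.compile(r"""
    (?P<PREPROC>\#[^\n]*)                 # preprocessor line, skipped
  | (?P<COMMENT>/\*.*?\*/|//[^\n]*)       # block or line comment, skipped
  | (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<STRING>"(?:\\.|[^"\\])*")
  | (?P<CHAR>'(?:\\.|[^'\\])')
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<OP>==|!=|<=|>=|&&|\|\||\+\+|--|[-+*/%=<>!&|^~?:;,.(){}\[\]])
  | (?P<WS>\s+)
""", re.VERBOSE | re.DOTALL)

def tokenize(source):
    """Return a list of normalized tokens for a toy subset of C."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind in ("PREPROC", "COMMENT", "WS"):
            continue  # carries no structure we want to compare
        if kind == "IDENT":
            # Keep keywords, hide the actual variable/function names.
            tokens.append(text if text in KEYWORDS else "ID")
        elif kind == "NUMBER":
            tokens.append("NUM")
        elif kind == "STRING":
            tokens.append("STR")
        elif kind == "CHAR":
            tokens.append("CHR")
        else:
            tokens.append(text)
    return tokens

print(tokenize("int x = a + 1; // counter"))
```

Normalizing names this way means `int x = a + 1;` and `int total = count + 42;` produce the same token list, which is usually what you want when someone has only renamed variables.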


I'm not sure you need to parse the token stream to detect the features you're looking for. In fact, it's probably going to complicate things more than anything.

What you're really looking for is sequences of original source code that have a very strong similarity with a suspect sample code being tested. This sounds very similar to the purpose of a Bayes classifier, like those used in spam filtering and language detection.
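As a rough illustration of comparing token sequences without a parser, here is a sketch that is deliberately simpler than a Bayes classifier: it measures the overlap of token k-grams (Jaccard similarity) between two sources. The function names, the crude regex tokenizer, and the default `k=4` are assumptions made up for this example. A real detector would need better tokenization and some tuning of `k` and of the threshold you treat as suspicious.

```python
import re

def crude_tokens(source):
    # Assumption: identifiers, integer literals, and single punctuation
    # characters are good enough tokens for a first experiment.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)

def kgrams(tokens, k=4):
    """Set of all runs of k consecutive tokens."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def similarity(tokens_a, tokens_b, k=4):
    """Jaccard similarity of the two k-gram sets, between 0.0 and 1.0."""
    grams_a, grams_b = kgrams(tokens_a, k), kgrams(tokens_b, k)
    if not grams_a and not grams_b:
        return 0.0  # both inputs shorter than k tokens, nothing to compare
    return len(grams_a & grams_b) / len(grams_a | grams_b)

original = crude_tokens("for (i = 0; i < n; i++) sum += a[i];")
suspect = crude_tokens("for (i = 0; i < n; i++) total += a[i];")
print(similarity(original, suspect))
```

Because the comparison works on short runs of tokens, reordering whole statements or functions changes the score much less than it would change a line-by-line diff.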
