Text parsing library

A colleague of mine is working on a universal text parsing library based on C# lambdas. The core looks cool, but unfortunately for me he has hardcoded a grammar specific to his own task: evaluating math expressions. So I won't be using it as I had intended before I saw the API, and I'm now looking for another library that meets at least some of my requirements. It has to:

  1. Be able to load a grammar from an external file, e.g. XML, YAML or JSON.
  2. Return an AST, i.e. the parse tree built from arbitrary input text using the loaded grammar.
  3. Be fast enough to load a C# grammar and then parse a large source file.

I'd prefer a library whose grammar file format is simple enough to make writing a grammar for math expressions easy, that is open source, and that is written in C# or C++.
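To make requirements 1 and 2 concrete, here is a minimal sketch of a parser whose grammar lives in external JSON and which returns an AST. It is written in Python only because its standard library includes a JSON reader; it is not any existing library's API. The grammar format (`tokens`, `rules`, and nested lists meaning "zero or more repetitions"), the `Parser` class, and `evaluate` are hypothetical names invented for illustration. The engine is a naive backtracking recursive-descent interpreter with PEG-style ordered choice, so it cannot handle left-recursive rules and makes no attempt at the performance that point 3 asks for.

```python
import json
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

# HYPOTHETICAL grammar format invented for this sketch: "tokens" maps token
# names to regexes; "rules" maps rule names to ordered alternatives, and a
# nested list inside an alternative means "zero or more repetitions".
GRAMMAR_JSON = r'''{
  "start": "expr",
  "tokens": {"NUM": "\\d+(\\.\\d+)?", "ADD": "[+-]", "MUL": "[*/]", "LP": "\\(", "RP": "\\)"},
  "rules": {"expr":   [["term", ["ADD", "term"]]],
            "term":   [["factor", ["MUL", "factor"]]],
            "factor": [["NUM"], ["LP", "expr", "RP"]]}
}'''


def tokenize(text, tokens):
    """Split text into (token_name, lexeme) pairs, skipping whitespace."""
    compiled = [(name, re.compile(rx)) for name, rx in tokens.items()]
    pos, out = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for name, rx in compiled:  # first token regex that matches wins
            m = rx.match(text, pos)
            if m and m.end() > pos:
                out.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
    return out


class Parser:
    """Backtracking recursive descent over the JSON grammar (no left recursion)."""

    def __init__(self, grammar):
        self.tokens, self.rules = grammar["tokens"], grammar["rules"]
        self.start = grammar["start"]

    def parse(self, text):
        self.toks = tokenize(text, self.tokens)
        node, pos = self._rule(self.start, 0)
        if node is None or pos != len(self.toks):
            raise SyntaxError("input could not be fully parsed")
        return node

    def _rule(self, name, pos):
        for alt in self.rules[name]:  # ordered choice: first success wins
            children, new_pos = self._seq(alt, pos)
            if children is not None:  # AST node = (rule_name, children)
                return (name, children), new_pos
        return None, pos

    def _seq(self, seq, pos):
        children = []
        for item in seq:
            got, pos = self._item(item, pos)
            if got is None:
                return None, pos  # callers ignore pos on failure
            children.extend(got)
        return children, pos

    def _item(self, item, pos):
        if isinstance(item, list):  # zero-or-more repetition group
            children = []
            while True:
                got, new_pos = self._seq(item, pos)
                if got is None or new_pos == pos:
                    return children, pos
                children.extend(got)
                pos = new_pos
        if item in self.tokens:  # terminal: leaf = (token_name, lexeme)
            if pos < len(self.toks) and self.toks[pos][0] == item:
                return [self.toks[pos]], pos + 1
            return None, pos
        node, new_pos = self._rule(item, pos)  # nonterminal: recurse
        return ([node], new_pos) if node is not None else (None, pos)


def evaluate(node):
    """Evaluate an AST of the sample math grammar, folding left to right."""
    name, kids = node
    if name == "factor":
        return float(kids[0][1]) if kids[0][0] == "NUM" else evaluate(kids[1])
    value = evaluate(kids[0])  # "expr" / "term": first operand
    for (_, op), rhs in zip(kids[1::2], kids[2::2]):
        value = OPS[op](value, evaluate(rhs))
    return value


grammar = json.loads(GRAMMAR_JSON)  # from a file: json.load(open("math.json"))
ast = Parser(grammar).parse("1 + 2 * 3")
print(ast, evaluate(ast))
```

Because the grammar is plain data, switching to a different language means editing the JSON rather than recompiling the parser, which is the property point 1 is after. The resulting AST is nested `(rule_name, children)` tuples with `(token_name, lexeme)` leaves, which `evaluate` walks for the math case.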

Regards,

-- UPDATED: point 2 has been corrected.


You might check out Text Transformer, which claims to be a kind of universal text-processing language. I have no specific experience with it.

Building robust language front ends and usable processing tools is actually a lot of work. If you want to process computer languages in a generic way, you might consider our DMS Software Reengineering Toolkit, a kind of generalized compiler technology for parsing, analyzing, transforming, and/or generating code (or any other kind of formal document).

DMS accepts arbitrary context-free grammars, automatically builds an AST with no additional specification effort on your part, and is designed to handle not only large files but very large sets of files in a single computation. People who want to process code usually need pattern recognition, code analysis, and code transformation capabilities; DMS has all of these built in. It also has a variety of predefined, mature grammars for many computer languages, both well known (C, C++, C#, COBOL, Java, JavaScript, ...) and otherwise (Natural, EGL, Python, MATLAB, ...), and has been used to carry out massive automated analyses and transformations on programs in these languages.

DMS does not meet your open-source or C#/C++ implementation requirements. It is implemented as a set of domain-specific languages for describing grammars, analyzers, transformations, and prettyprinters, plus scripting that supports parallel execution, so complex analyses can run faster than they would as single-threaded programs.
