
Count distinct strings in C# code

I need to estimate the localization effort for a legacy project. I'm looking for a tool that I could point at a directory, and it would:

  • Parse all *.cs files in the directory structure
  • Extract all C# string literals from the code
  • Count total number of occurrences of the strings

Do you know any tool that could do that? Writing it would be simple, but if some time can be saved, then why not save it?
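As the question notes, such a tool is simple to write. Below is a minimal sketch in Python; the thread names no language, so that choice is an assumption. It uses a regular-expression scanner instead of a real C# parser, so treat its numbers as an estimate. It skips comments and char literals, and it handles regular, verbatim (@"...") and interpolated ($"...") strings. It does not handle C# 11 raw string literals ("""..."""), interpolated strings with quotes inside the braces, or code excluded by #if. Escape sequences such as \n are kept as written.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# One alternation, tried left to right at each position. Comments and char
# literals are matched (and then ignored) so quotes inside them are skipped.
TOKEN_RE = re.compile(
    r"//[^\n]*"                                    # line comment
    r"|/\*.*?\*/"                                  # block comment
    r"|'(?:\\.|[^'\\\n])+'"                        # char literal such as '"'
    r'|(?P<verbatim>(?:\$@|@\$|@)"(?:[^"]|"")*")'  # verbatim string; "" is an escaped quote
    r'|(?P<regular>\$?"(?:\\.|[^"\\\n])*")',       # regular or interpolated string
    re.DOTALL,
)


def extract_literals(source: str) -> list[str]:
    """Return the body of every string literal in one C# file, in order."""
    literals = []
    for m in TOKEN_RE.finditer(source):
        if m.group("verbatim") is not None:
            body = m.group("verbatim").lstrip("$@")[1:-1]
            literals.append(body.replace('""', '"'))
        elif m.group("regular") is not None:
            # Escape sequences such as \n are kept exactly as written in the source.
            literals.append(m.group("regular").lstrip("$")[1:-1])
    return literals


def count_literals(root: str) -> Counter:
    """Count every literal occurrence across all *.cs files under root."""
    counts = Counter()
    for path in Path(root).rglob("*.cs"):
        # utf-8-sig strips a BOM; errors="replace" tolerates legacy code pages.
        text = path.read_text(encoding="utf-8-sig", errors="replace")
        counts.update(extract_literals(text))
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    counts = count_literals(sys.argv[1])
    print(f"{sum(counts.values())} occurrences, {len(counts)} distinct strings")
    for literal, n in counts.most_common(20):
        print(f"{n:6}  {literal!r}")
```

Saved under a hypothetical name such as count_cs_strings.py, it would be run as python count_cs_strings.py path/to/project. It prints the total number of occurrences, the number of distinct strings, and the 20 most frequent literals.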


Use ILDASM to disassemble your .DLL / .EXE.

I just use the options to dump everything, and you get an .il file with a "User Strings" section:

User Strings
-------------------------------------------------------
70000001 : (14) L"Starting up..."
7000001f : (12) L"progressBar1"
70000039 : (21) L"$this.BackgroundImage"
70000065 : (10) L"$this.Icon"
7000007b : ( 6) L"Splash"
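To process that section programmatically, a small line parser is enough. Below is a sketch in Python. It assumes each entry sits on one line in exactly the token : (length) L"text" shape shown above. How ILDASM renders very long strings or strings with special characters has not been checked, so those entries may need extra handling. parse_user_strings is a hypothetical helper name.

```python
import re

# Matches lines like: 70000001 : (14) L"Starting up..."
USER_STRING_RE = re.compile(r'^([0-9a-fA-F]{8})\s*:\s*\(\s*(\d+)\)\s*L"(.*)"\s*$')


def parse_user_strings(il_text: str) -> dict[str, str]:
    """Map each user-string token (e.g. '70000001') to its text."""
    strings = {}
    for line in il_text.splitlines():
        m = USER_STRING_RE.match(line.strip())
        if m:  # header and separator lines simply do not match
            strings[m.group(1).lower()] = m.group(3)
    return strings
```

Each token appears once in this table, so the result is the set of distinct strings. It does not tell you how often each one is used.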

Now, if you want to know how many times a certain string is used, search for "ldstr" like this:

IL_003c:  /* 72   | (70)000001       */ ldstr      "Starting up..." /* 70000001 */

I think this will be a lot easier to parse than C# source.
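Counting the quoted ldstr operands gives the number of times each string is loaded. Below is a minimal sketch in Python. It assumes each ldstr and its quoted operand are on one line, as in the example above. ILDASM may write strings with non-ASCII characters as ldstr bytearray (...) rather than as a quoted literal, and this regex skips those forms, so check the dump for them. The counts reflect compiled code, so constant concatenations the compiler folded together (for example "a" + "b") appear as a single literal. count_ldstr is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the quoted operand of an ldstr instruction, e.g.
#   IL_003c:  /* 72 | (70)000001 */ ldstr      "Starting up..." /* 70000001 */
LDSTR_RE = re.compile(r'\bldstr\s+"((?:[^"\\]|\\.)*)"')


def count_ldstr(il_text: str) -> Counter:
    """Count how many times each string literal is loaded with ldstr."""
    return Counter(m.group(1) for m in LDSTR_RE.finditer(il_text))
```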


Doing a quick search, I found the following tool that may or may not be useful to you.

http://www.devincook.com/goldparser/

I also found another SO user who was trying to do something similar.

Regex to parse C# source code to find all strings


Well, if you have hard-coded strings, you first need to know what your i18n effort is (un-hardcoding them could be quite painful). Another issue: you need to count translatable words, not distinct strings, because word count is the input translation providers work from. And even if a string seems duplicated, it may be translated differently depending on the context. So you don't need to care about "distinct"; just count all the words. That's how localization works in my experience.
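Following that advice means summing words over every occurrence rather than counting distinct strings. Below is a minimal sketch in Python that assumes a crude definition of a word: a run of letters, optionally with an apostrophe part, with digits and punctuation ignored. Translation vendors apply their own counting rules, so treat the result as an estimate. Identifier-like literals from designer code (such as "$this.Icon") will also be counted unless you filter them out first. If the literals are held in a collections.Counter, pass counts.elements(), which repeats each string as many times as it occurred. count_words is a hypothetical helper name.

```python
import re

# Heuristic "word": a run of letters, optionally followed by one apostrophe
# part (so "Don't" counts as one word). Digits and punctuation are ignored.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def count_words(literals) -> int:
    """Total words over every occurrence; duplicates are deliberately counted again."""
    return sum(len(WORD_RE.findall(s)) for s in literals)
```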


As common practice, you should keep your strings outside your program's source code. In your case, could you spare the effort to extract the strings into a resource file?

If so, you can use the default localization solution in .NET, i.e. one resource file per locale:

resource.resx

resource.fr.resx

resource.es.resx

These store the strings for the different locales, with resource.resx holding the default (neutral) strings.
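The culture name goes between the base name and .resx, and lookups fall back from a specific culture to its parent and finally to the default resources. Below is a simplified Python sketch of that order. It assumes a parent culture is found by dropping the last hyphen-separated part of the name. That holds for names like fr-CA, but .NET has special parent rules for some cultures, such as the Chinese ones. At runtime, .NET reads compiled satellite assemblies built from these files, not the .resx files themselves. resx_fallback_chain is a hypothetical helper name.

```python
def resx_fallback_chain(base: str, culture: str) -> list[str]:
    """Candidate .resx names, most specific first (simplified parent rule)."""
    parts = culture.split("-") if culture else []
    # e.g. fr-CA -> ["fr-CA", "fr"]: drop the last part each step
    names = [f"{base}.{'-'.join(parts[:i])}.resx" for i in range(len(parts), 0, -1)]
    names.append(f"{base}.resx")  # default (neutral) resources come last
    return names
```

For example, resx_fallback_chain("resource", "fr-CA") gives resource.fr-CA.resx, then resource.fr.resx, then resource.resx.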

Update:

The actual implementation depends on your project's architecture and technology. Resource files aren't necessarily the best way to do this, but they are the easiest and the recommended way in .NET.

Like in this article

A few more tutorials
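If you do extract the strings, .resx files are plain XML, so a first draft can be generated from the literals you found. Below is a minimal sketch in Python. The resource names are yours to choose; SplashStartingUp is only a hypothetical example. The resheader values are the ones Visual Studio normally writes, and the xsd:schema block it also includes is omitted here. Compare the output with a .resx your project already has before relying on it. build_resx is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"  # serialized as xml:space
WINFORMS = "System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"

# Header entries as Visual Studio usually writes them (xsd:schema block omitted).
RESHEADERS = {
    "resmimetype": "text/microsoft-resx",
    "version": "2.0",
    "reader": "System.Resources.ResXResourceReader, " + WINFORMS,
    "writer": "System.Resources.ResXResourceWriter, " + WINFORMS,
}


def build_resx(strings: dict[str, str]) -> str:
    """Build a minimal .resx document mapping resource names to string values."""
    root = ET.Element("root")
    for name, value in RESHEADERS.items():
        header = ET.SubElement(root, "resheader", {"name": name})
        ET.SubElement(header, "value").text = value
    for name, text in strings.items():
        data = ET.SubElement(root, "data", {"name": name, XML_SPACE: "preserve"})
        ET.SubElement(data, "value").text = text  # ElementTree escapes <, & etc.
    return '<?xml version="1.0" encoding="utf-8"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical resource name for the "Starting up..." literal.
    print(build_resx({"SplashStartingUp": "Starting up..."}))
```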
