Frequency of symbols in programming languages

2023-01-10 17:08

I'm looking for some kind of reference which shows the frequency of symbols in popular programming languages. I'm trying to design an optimal keyboard layout for programming.

If there is no such reference, I wouldn't mind creating a simple utility that figures this out. However, I would need suggestions as to which files to analyze for each language.
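A utility like this can start very small. The following is only a sketch, assuming that "symbol" means ASCII punctuation and that frequency means the share of all characters in the text; the function name `symbol_frequencies` is made up for illustration.

```python
from collections import Counter
import string

def symbol_frequencies(text):
    """Return {symbol: percent of all characters} for ASCII punctuation in text."""
    total = len(text)
    if total == 0:
        return {}
    # Count only punctuation; letters, digits and whitespace still count
    # toward the total so the result is a share of all characters.
    counts = Counter(ch for ch in text if ch in string.punctuation)
    return {sym: 100.0 * n / total for sym, n in counts.most_common()}

# Tiny C-like sample, most frequent symbols first.
sample = "int main(void) { return a[0] + b[1]; }"
for sym, pct in symbol_frequencies(sample).items():
    print(f"{sym} {pct:.3f}")
```

Feeding it the contents of real source files instead of the one-line sample gives per-file or per-project figures.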

One problem I can foresee: if I take some Objective-C code that happens to be a simple program with no objects, the [ and ] keys will be far less frequent than in an average Objective-C file. So one guideline is that the sample code should be representative of an average file and use the most commonly used features of the language.
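One rough way to see how much a single unrepresentative file can skew things is to compute a symbol's share per file and compare the mean with the median. This is just a sketch; the `files` list holds hypothetical stand-ins for file contents, and `per_file_share` is an illustrative name.

```python
from statistics import mean, median

def per_file_share(texts, symbol):
    """Percent of characters equal to `symbol` in each non-empty text."""
    return [100.0 * t.count(symbol) / len(t) for t in texts if t]

# Hypothetical stand-ins for file contents; real input would be read from disk.
files = ["[[a b] c];", "int x = 1;", "y = f(x);"]
shares = per_file_share(files, "[")
print("mean:", mean(shares), "median:", median(shares))
```

A large gap between the two suggests that a few files dominate the figure for that symbol.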

Originally I was thinking that I should get the same code written in different languages, but I'm not sure if that's a good idea since some languages have different uses than others.

For large code samples to use for statistical analysis, you might try browsing popular open-source projects or searching on Koders by language.

I made some simple changes to a QWERTY layout a few years ago, and I've been using it ever since as my general-purpose layout:

Swap digits for their corresponding shift-symbols.
Swap _ and -: names with underscores are common, and now - and + both require Shift.
Swap [] and {}: blocks are more common than subscripts.

Plus two optional changes, to taste:

Swap ` and ~: destructors are common.
Swap ' and ": strings are more common than characters.

The last is the only one that typically would interfere with typing ordinary English text. The layout works beautifully for C++, Perl, and whatever else I've used in the past two or three years. The noticeable speed increase comes from the drastic reduction in the need to hit the Shift key. I find that using Shift for the numbers isn't a big deal since the number pad is usually faster anyway.
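To put a rough number on the Shift savings, one can add up the frequencies of the symbols that need Shift before and after the three core swaps. The sketch below uses the C figures quoted further down in this thread and assumes a standard US QWERTY layout as the baseline. Digits are ignored because that table has no figures for them, so the modified total is an underestimate.

```python
# Percentages copied from the C source table quoted in this thread.
FREQ = {
    "!": 0.102, '"': 0.376, "#": 0.175, "$": 0.005, "%": 0.105, "&": 0.237,
    "'": 0.101, "(": 1.372, ")": 1.373, "*": 1.769, "+": 0.182, ",": 1.565,
    "-": 1.176, ".": 1.512, "/": 0.718, ":": 0.192, ";": 1.276, "<": 0.118,
    "=": 1.039, ">": 0.587, "?": 0.022, "@": 0.009, "[": 0.163, "]": 0.163,
    "^": 0.003, "_": 2.550, "{": 0.303, "|": 0.098, "}": 0.210, "~": 0.002,
}

# Symbols that need Shift on a standard US QWERTY layout (assumed baseline).
QWERTY_SHIFTED = set('~!@#$%^&*()_+{}|:"<>?')

# After the three core swaps: the digit-row symbols, _ and {} lose Shift;
# - and [] gain it. The two optional swaps are not applied here.
MODIFIED_SHIFTED = (QWERTY_SHIFTED - set("!@#$%^&*()_{}")) | set("-[]")

def shift_load(shifted):
    """Sum of the percentages of all symbols in `shifted`."""
    return sum(pct for sym, pct in FREQ.items() if sym in shifted)

print(f"standard: {shift_load(QWERTY_SHIFTED):.3f}")
print(f"modified: {shift_load(MODIFIED_SHIFTED):.3f}")
```

On these figures the shifted symbols drop from about 9.79 to about 3.08 percent of all characters, with the caveat that digits now need Shift and are not counted.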

The book The New C Standard: An economic and cultural commentary contains a lot of measurements of C source usage. The usage figures and tables are available as a stand-alone PDF.

@Derek Jones cited The New C Standard: An economic and cultural commentary, which has the information. For quick reference, here are the character frequencies it reports (as percentages of all characters):

space 15.083
! 0.102
" 0.376
# 0.175
$ 0.005
% 0.105
& 0.237
' 0.101
( 1.372
) 1.373
* 1.769
+ 0.182
, 1.565
- 1.176
. 1.512
/ 0.718
: 0.192
; 1.276
< 0.118
= 1.039
> 0.587
? 0.022
@ 0.009
[ 0.163
\ 0.97
] 0.163
^ 0.003
_ 2.550
{ 0.303
| 0.098
} 0.210
~ 0.002

Here is the same sorted by frequency:

space 15.083
_ 2.550
* 1.769
, 1.565
. 1.512
) 1.373
( 1.372
; 1.276
- 1.176
= 1.039
/ 0.718
> 0.587
" 0.376
{ 0.303
& 0.237
} 0.210
: 0.192
+ 0.182
# 0.175
] 0.163
[ 0.163
< 0.118
% 0.105
! 0.102
' 0.101
| 0.098
? 0.022
@ 0.009
$ 0.005
^ 0.003
~ 0.002

There is a version of the Dvorak keyboard layout available, optimized for programmers.

http://www.kaufmann.no/roland/dvorak/

If you happen to use Ubuntu, it is already on your system.

There's a vast collection of open-source software that you could measure to gain some good data on character frequency. SourceForge and GitHub would be the places to look.
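Once a few projects are checked out locally, gathering the counts is mostly a matter of walking the directory tree. Here is a minimal sketch, assuming files are grouped by extension as a stand-in for language; the function name and the default extension list are illustrative only.

```python
import os
import string
from collections import Counter, defaultdict

def count_symbols_by_extension(root, extensions=(".c", ".h", ".py")):
    """Map each extension to a Counter of punctuation characters found under root."""
    totals = defaultdict(Counter)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext not in extensions:
                continue
            path = os.path.join(dirpath, name)
            try:
                # Replace undecodable bytes rather than failing on odd encodings.
                with open(path, encoding="utf-8", errors="replace") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip it
            totals[ext].update(ch for ch in text if ch in string.punctuation)
    return totals

if __name__ == "__main__":
    # Show the five most common symbols per extension under the current directory.
    for ext, counts in count_symbols_by_extension(".").items():
        print(ext, counts.most_common(5))
```

Extensions are a crude proxy for language (a .h file may be C, C++ or Objective-C), so the grouping may need refining for a serious survey.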

Developers don't just write code, though; they also write design documents, emails and answers to Stack Overflow questions. Maybe installing a key logger on a few consenting developers' computers would be the best way.

What you're looking for is a good corpus of programming languages. While nothing immediately turned up in a cursory Google search, the following links might prove useful if you do create your own tool.

A novel framework to detect source code plagiarism

Calgary Corpus

Generating an NLP Corpus from Java Source Code

A Computer Science Text Corpus/Search Engine X-Tec and Its Applications

Mining search topics from a code search engine usage log
