Get first character of each word and its position in a sentence/paragraph

I am trying to create a map from the first character of each word to its position in a sentence/paragraph. I am using a regex pattern to achieve this, but regex is a costly operation. Are there any other ways to achieve this?

Regex way:

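import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static Map<Character, Integer> getFirstChar(String paragraph) {
    // Matches an ASCII letter that immediately follows a word boundary
    Pattern pattern = Pattern.compile("(?<=\\b)[a-zA-Z]");
    Map<Character, Integer> newMap = new HashMap<Character, Integer>();

    Matcher fit = pattern.matcher(paragraph);
    while (fit.find()) {
        // A repeated first letter overwrites the earlier position
        newMap.put(fit.group().charAt(0), fit.start());
    }
    return newMap;
}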


You can do your own linear scan if you really need to squeeze every bit of performance:

                 //0123456789012345678901
    String text = "Hello,my name is=Helen";
    Map<Character,Integer> map = new HashMap<Character,Integer>();

    boolean lastIsLetter = false;
    for (int i = 0; i < text.length(); i++) {
        char ch = text.charAt(i);
        boolean currIsLetter = Character.isLetter(ch);
        if (!lastIsLetter && currIsLetter) {
            map.put(ch, i);
        }
        lastIsLetter = currIsLetter;
    }

    System.out.println(map);
    // prints e.g. "{H=17, i=14, m=6, n=9}"; HashMap iteration order is not guaranteed

API links

  • Character.isLetter


Python:

wmap = {}
prev = 0
for word in "the quick brown fox jumps over the lazy dog".split():
    wmap[word[0]] = prev
    prev += len(word) + 1

print(wmap)

If a letter appears more than once as the first letter of a word, it'll map to the last position. Note that the position arithmetic assumes words are separated by exactly one space. For a list of all positions, change wmap[word[0]] = prev to:

if word[0] in wmap:
    wmap[word[0]].append(prev)
else:
    wmap[word[0]] = [prev]
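
Since the question is about Java, here is a rough sketch of the same "all positions" idea applied to the linear scan shown earlier. The class name FirstLetterPositions and the method firstLetterPositions are made up for illustration, and a LinkedHashMap is used only so the printed order follows first appearance; treat it as one possible approach rather than the definitive one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstLetterPositions {

    // Hypothetical helper: maps each word-initial letter to every index
    // at which it starts a word (a letter preceded by a non-letter).
    public static Map<Character, List<Integer>> firstLetterPositions(String text) {
        // LinkedHashMap keeps keys in first-seen order (an illustrative choice)
        Map<Character, List<Integer>> positions = new LinkedHashMap<>();
        boolean lastIsLetter = false;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            boolean currIsLetter = Character.isLetter(ch);
            if (!lastIsLetter && currIsLetter) {
                // Create the list on first sight of this letter, then append
                positions.computeIfAbsent(ch, k -> new ArrayList<>()).add(i);
            }
            lastIsLetter = currIsLetter;
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(firstLetterPositions("Hello,my name is=Helen"));
        // prints "{H=[0, 17], m=[6], n=[9], i=[14]}"
    }
}
```

Keys are case-sensitive here, as in the answers above, so 'H' and 'h' would get separate lists.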