Get first character of each word and its position in a sentence/paragraph
I am trying to create a map from the first character of each word to its position in a sentence/paragraph. I am using a regex pattern to achieve this, but regex is a costly operation. Are there any other ways to achieve this?
Regex way:
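import java.util.*;
import java.util.regex.*;

public static Map<Character, Integer> getFirstChar(String paragraph) {
    // A letter that directly follows a word boundary starts a word
    Pattern pattern = Pattern.compile("(?<=\\b)[a-zA-Z]");
    Map<Character, Integer> newMap = new HashMap<>();
    Matcher fit = pattern.matcher(paragraph);
    while (fit.find()) {
        // group() is the matched letter; start() is its index in the paragraph
        newMap.put(fit.group().charAt(0), fit.start());
    }
    return newMap;
}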
You can do your own linear scan if you really need to squeeze every bit of performance:
//             0123456789012345678901
String text = "Hello,my name is=Helen";
Map<Character,Integer> map = new HashMap<Character,Integer>();
boolean lastIsLetter = false;
for (int i = 0; i < text.length(); i++) {
    char ch = text.charAt(i);
    boolean currIsLetter = Character.isLetter(ch);
    // a letter preceded by a non-letter (or by the start of the text) begins a word
    if (!lastIsLetter && currIsLetter) {
        map.put(ch, i);
    }
    lastIsLetter = currIsLetter;
}
System.out.println(map);
// prints "{n=9, m=6, H=17, i=14}" (entry order may vary; HashMap does not guarantee it)
API links
Character.isLetter
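The scan above can be wrapped in a method so it can stand in for getFirstChar. This is only a sketch: the class name FirstLetterScan and the method name firstLetterPositions are made up for illustration. Note that the two approaches are not strictly equivalent. Character.isLetter accepts non-ASCII letters and treats digits and underscores as separators, while \b treats digits and underscores as word characters, so an input like "a1b" gives different results.

```java
import java.util.HashMap;
import java.util.Map;

class FirstLetterScan {
    // Hypothetical helper: maps each word-initial letter to the index
    // of the last word that starts with it.
    static Map<Character, Integer> firstLetterPositions(String text) {
        Map<Character, Integer> map = new HashMap<>();
        boolean lastIsLetter = false;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            boolean currIsLetter = Character.isLetter(ch);
            if (!lastIsLetter && currIsLetter) {
                map.put(ch, i); // later words overwrite earlier ones
            }
            lastIsLetter = currIsLetter;
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Character, Integer> result = firstLetterPositions("Hello,my name is=Helen");
        System.out.println(result.get('H')); // 17
        System.out.println(result.get('m')); // 6
        // The digit acts as a separator here, unlike with \b in the regex version
        System.out.println(firstLetterPositions("a1b").get('b')); // 2
    }
}
```

If the first occurrence is wanted instead of the last, replacing map.put(ch, i) with map.putIfAbsent(ch, i) should be enough.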
Python:
wmap = {}
prev = 0
# assumes words are separated by exactly one space, hence the "+ 1"
for word in "the quick brown fox jumps over the lazy dog".split():
    wmap[word[0]] = prev
    prev += len(word) + 1
print(wmap)
If a letter appears more than once as the first letter of a word, it will map to the last position. To get a list of all positions, change wmap[word[0]] = prev to:
if word[0] in wmap:
    wmap[word[0]].append(prev)
else:
    wmap[word[0]] = [prev]
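For the Java question, the same "list of all positions" idea might look like the sketch below. It reuses the character scan from the Java answer rather than split(), so the indices stay correct even when words are separated by more than one character. The class name AllFirstLetterPositions and the method name allPositions are hypothetical, and a TreeMap is used only to make the printed order predictable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AllFirstLetterPositions {
    // Hypothetical helper: for each letter, every index at which it starts a word.
    static Map<Character, List<Integer>> allPositions(String text) {
        Map<Character, List<Integer>> map = new TreeMap<>();
        boolean lastIsLetter = false;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            boolean currIsLetter = Character.isLetter(ch);
            if (!lastIsLetter && currIsLetter) {
                // create the list on first sight of this letter, then append
                map.computeIfAbsent(ch, k -> new ArrayList<>()).add(i);
            }
            lastIsLetter = currIsLetter;
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(allPositions("the quick brown fox jumps over the lazy dog"));
        // prints "{b=[10], d=[40], f=[16], j=[20], l=[35], o=[26], q=[4], t=[0, 31]}"
    }
}
```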