Python: truncate text around a keyword

I have a string, and I want to search it for a keyword or phrase and return only a portion of the text before and after that keyword or phrase. Google does exactly what I'm talking about in its search results.

Here is a string I grabbed from the web:

"This filter truncates words like the original truncate words Django filter, but instead of being based on the number of words, it's based on the number of characters. I found the need for this when building a website where i'd have to show labels on really small text boxes and truncating by words didn't always gave me the best results (and truncating by character is...well...not that elegant)."

Now let's say I want to search this for the phrase "building a website" and then output something like this:

"... the need for this when building a website where i'd have to show ..."

Edit: I should have made this clearer. This has to work on multiple strings/phrases, not just this one.


Building on the other answers (especially cababunga's), I like a function that takes up to 25 (or however many) characters on each side, stops at the last word boundary, and returns a nice match:

import re

def find_with_context(haystack, needle, context_length, escape=True):
    if escape:
        needle = re.escape(needle)
    return re.findall(r'\b(.{,%d})\b(%s)\b(.{,%d})\b' % (context_length, needle, context_length), haystack)

# Returns a list of three-tuples, (context before, match, context after).

Usage:

>>> find_with_context(s, 'building a website', 25)
[(' the need for this when ', 'building a website', " where i'd have to show ")]
>>> # Compare this to what it would be without making sure it ends at word boundaries:
... # [('d the need for this when ', 'building a website', " where i'd have to show l")]
...
>>> for match in find_with_context(s, 'building a website', 25):
...     print('<p>...%s<strong>%s</strong>%s...</p>' % match)
... 
<p>... the need for this when <strong>building a website</strong> where i'd have to show ...</p>
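Since the question's edit asks for this to work on several strings and phrases, here is a minimal sketch that simply loops find_with_context over every haystack/needle pair. The function is copied from above so the block runs on its own; snippets and pages are hypothetical names, and the <strong> markup just mirrors the usage example.

```python
import re


def find_with_context(haystack, needle, context_length, escape=True):
    # Copied from the answer above: returns (before, match, after) tuples with
    # up to context_length characters of context, cut at word boundaries.
    if escape:
        needle = re.escape(needle)
    pattern = r'\b(.{,%d})\b(%s)\b(.{,%d})\b' % (context_length, needle, context_length)
    return re.findall(pattern, haystack)


def snippets(haystacks, needles, context_length=25):
    # Hypothetical helper: one formatted snippet per match,
    # for every needle in every haystack.
    results = []
    for haystack in haystacks:
        for needle in needles:
            for before, match, after in find_with_context(haystack, needle, context_length):
                results.append('...%s<strong>%s</strong>%s...' % (before, match, after))
    return results


pages = [
    "I found the need for this when building a website where i'd have to show labels",
    "truncating by character is not that elegant",
]
for snippet in snippets(pages, ["building a website", "elegant"]):
    print(snippet)
```

Each needle gets its own re.findall call, so snippets for different phrases may share context. Within a single call, however, findall returns non-overlapping matches, so a second occurrence of the same phrase that falls inside the first match's trailing context will not get its own snippet. Also, \b only matches next to a word character, so a needle that starts or ends with punctuation may not match at all.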


Use str.find() to get the index of the phrase you want, then slice the string up to N characters before and after that index. You could get fancy by looking for the whitespace closest to N characters away from that index on each side, so you get whole words.

See the Python string documentation to find the exact methods you need:

http://docs.python.org/py3k/library/strings.html
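A minimal sketch of the index-and-slice approach described in this answer, assuming "whole words" means snapping to the nearest plain space inside the window. truncate_around and its n parameter are hypothetical names. It only handles the first occurrence (the one str.find returns) and adds "..." only on a side where text was actually cut.

```python
def truncate_around(text, phrase, n=25):
    """Return up to n characters of context on each side of the first
    occurrence of phrase, trimmed to whole words, or None if not found."""
    index = text.find(phrase)
    if index == -1:
        return None

    start = max(0, index - n)
    end = min(len(text), index + len(phrase) + n)

    # If the character before the window is not a space, the window may
    # start mid-word: skip ahead past the first space inside the window.
    if start > 0 and not text[start - 1].isspace():
        space = text.find(' ', start, index)
        if space != -1:
            start = space + 1

    # If the first character after the window is not a space, the window
    # may end mid-word: cut back to the last space inside the window.
    if end < len(text) and not text[end].isspace():
        space = text.rfind(' ', index + len(phrase), end)
        if space != -1:
            end = space

    # Only add an ellipsis on a side where text was actually cut off.
    prefix = '... ' if start > 0 else ''
    suffix = ' ...' if end < len(text) else ''
    return prefix + text[start:end].strip() + suffix


text = "I found the need for this when building a website where i'd have to show labels"
print(truncate_around(text, "building a website"))
```

Snapping with str.find and str.rfind keeps the window inside the original N characters, so a long word at the edge is dropped rather than pulled in.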


>>> re.search(r'((?:\S+\s+){,5}\bbuilding a website\b(?:\s+\S+){,5})', s).groups()
("the need for this when building a website where i'd have to show",)
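The same word-window regex can be wrapped in a function that escapes the phrase with re.escape and uses re.findall to return every occurrence instead of just the first. This is a sketch; word_context and its words parameter are hypothetical names.

```python
import re


def word_context(text, phrase, words=5):
    # Up to `words` whitespace-separated words on each side of the phrase;
    # re.escape makes any punctuation in the phrase match literally.
    pattern = r'((?:\S+\s+){0,%d}\b%s\b(?:\s+\S+){0,%d})' % (words, re.escape(phrase), words)
    # The pattern has one capturing group, so findall returns a list of strings.
    return re.findall(pattern, text)


text = "I found the need for this when building a website where i'd have to show labels"
for phrase in ["building a website", "show labels"]:
    print(word_context(text, phrase))
```

To handle several phrases, call it once per phrase. Because findall returns non-overlapping matches, two occurrences of the same phrase closer together than the window may not each get their own snippet.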


Something like this maybe:

import re
# {0,25} rather than {25}, so a phrase near either end of the string still matches
mo = re.search(r"(.{0,25})\bbuilding a website\b(.{0,25})", text)
if mo:
    print(mo.group(1), "<b>building a website</b>", mo.group(2))