开发者

Analyzing and storing text in a data structure

I hope you understand what I want to do. It is hard to choose the best words, because English is not my first language and I distrust automatic translators. I will try to explain as well as I can.

I was thinking about analyzing a long text. Suppose, for example, that I have a string divided into paragraphs.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla vitae elit libero, a pharetra augue. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras mattis consectetur purus sit amet fermentum.

Duis mollis, est non commodo luctus, nisi erat porttitor ligula, eget lacinia odio sem nec elit. Aenean eu leo quam. Pellentesque ornare sem lacinia quam venenatis vestibulum. Cras justo odio, dapibus ac facilisis in, egestas eget quam. L开发者_如何学Pythonorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur blandit tempus porttitor. Maecenas sed diam eget risus varius blandit sit amet non magna.

I would like to store this string in an array or something similar, in a way I can find the length or location of the two paragraphs very quickly. For example (pseudocode):

Array => {

    paragraphs => {

        "Lorem ipsum dolor sit amet, [...] fermentum.",
        ...

    }

}

I don't really know whether this has a name. I suppose there is much theory about how to do this type of task. I am really interested in practices that take care about performance when processing a big amount of text. I would like to have something to study and read carefully.

Any help would be very appreciated. Thanks in advance,

—Alberto


Perhaps read into Apache's UIMA, it's all about analyzing unstructured information, text analysis being a major component of it.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜