
Complex string processing (well, complex to me)

I am calling a web service and all I get back is a giant blob of text, which I have to process myself. The problem is that not all lines have the same shape. Each line has two or three sections, and the lines are similar but not identical. Here are the most common forms:

text1 [text2] /text3/
text1/text3
text1[text2]/text3
text1 [text2] /text /3 here/

I am not sure how to approach this problem. I am not very good at anything advanced when it comes to manipulating strings.

I was thinking a regular expression might work, but I am not sure about that either. If I can get each of these three sections split out, the rest is easier from there. It's just that there doesn't seem to be any uniformity to the three main sections that I know how to work with.

EDIT: Thanks for pointing out that I didn't actually say what I wanted to do.

Basically, I want to split these three sections of text into separate strings, going from one single string to an array of 3 strings.

string[0] = text1
string[1] = text2
string[2] = text3

Here is some of the text I get back from a call, as an example:

スルホ基 [スルホき] /(n) sulfo group/
鋭いナイフ [するどいナイフ] /(n) sharp knife/
鋭い批判 [するどいひはん] /(n) sharp criticism/
スルナーイ /(n) (See ズルナ) (obsc) surnay (Anatolian woodwind instrument) (per:)/zurna/
スルピリン /(n) sulpyrine/
スルファミン /(n) sulfamine/
剃る [そる(P);する] /(v5r,vt) to shave/(P)/

Taking the first line as an example, I want to pull it out into an array:

string[0] = スルホ基
string[1] = [スルホき]
string[2] = /(n) sulfo group/


Those examples seem a bit random. There has to be some kind of order; isn't there a spec for the service? If not, I suggest posting more examples so that we can understand the rules.


Read up on some of the info here on finite state machines, and see if you can use some of the concepts on your input parsing problem.

If there is some order to the groups on each line, then maybe you can use a regex to separate the groups out.

Edit: after seeing your samples, you may get by with a regex that breaks on some of those specific delimiters. It will take maybe half an hour to test the theory: pick up a free regex tester, write a regex that isolates just one of those groups, and pump a few sample lines through it. If it performs reliably on the real data you have, expand it and see whether you can also isolate the other groups.
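As a rough illustration of that regex approach, here is a minimal Python sketch. The function name `split_line` and the pattern are my own assumptions, based only on the sample lines in the question: text1 is taken to be everything before the first `[` or `/`, text2 is the optional part inside square brackets, and text3 is everything after the first `/`, minus one trailing `/`. It returns the sections without their delimiters, and an empty string when there is no bracketed section.

```python
import re

# Pattern inferred from the question's samples only; not from any spec.
LINE_RE = re.compile(
    r"""^\s*
        ([^\[/]+?)            # text1: everything before the first '[' or '/'
        \s*
        (?:\[([^\]]*)\])?     # text2: optional, inside square brackets
        \s*
        /(.*?)/?              # text3: after the first '/', minus one trailing '/'
        \s*$""",
    re.VERBOSE,
)


def split_line(line):
    """Return [text1, text2, text3], or None if the line does not fit."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # group(2) is None when the bracketed section is absent.
    return [m.group(1), m.group(2) or "", m.group(3)]


if __name__ == "__main__":
    for sample in ("text1 [text2] /text3/", "スルピリン /(n) sulpyrine/"):
        print(split_line(sample))
```

Lines that do not match come back as `None`, which makes it easy to collect the exceptions and see how uniform the real data is before trusting the pattern.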

I should mention, though, that your regexes will break or become a nightmare if there are any vagaries in your data (and frequently there are). So test long and hard before settling on them. If you find you start to have exceptions in your data, you will need to choose some sort of parsing algorithm (the FSM I mentioned above is a pattern you can follow if you implement a parsing mechanism).
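If it comes to that, a hand-written state machine for this line shape can be quite small. The following Python sketch is one possible layout, not a definitive parser: the state names (`HEAD`, `READING`, `GAP`, `GLOSS`) and the function `parse_line` are hypothetical, and the rules are guessed from the sample lines alone. It walks the line one character at a time and raises `ValueError` on anything it does not expect, so odd lines surface instead of being silently mangled.

```python
from enum import Enum, auto


class State(Enum):
    HEAD = auto()     # reading text1, before any '[' or '/'
    READING = auto()  # inside the [ ... ] section (text2)
    GAP = auto()      # after ']', waiting for the '/' that opens text3
    GLOSS = auto()    # inside the / ... / section (text3)


def parse_line(line):
    """Split one line into [text1, text2, text3] with a small state machine."""
    state = State.HEAD
    head, reading, gloss = [], [], []
    for ch in line.strip():
        if state is State.HEAD:
            if ch == "[":
                state = State.READING
            elif ch == "/":
                state = State.GLOSS
            else:
                head.append(ch)
        elif state is State.READING:
            if ch == "]":
                state = State.GAP
            else:
                reading.append(ch)
        elif state is State.GAP:
            if ch == "/":
                state = State.GLOSS
            elif not ch.isspace():
                raise ValueError(f"unexpected {ch!r} after ']' in {line!r}")
        else:  # State.GLOSS: every remaining character belongs to text3
            gloss.append(ch)
    if state is not State.GLOSS:
        raise ValueError(f"no '/' section found in {line!r}")
    text3 = "".join(gloss)
    if text3.endswith("/"):
        text3 = text3[:-1]  # drop the single closing '/'
    return ["".join(head).strip(), "".join(reading), text3]
```

The benefit over a single regex is that each new exception in the data becomes one more branch or state, rather than another layer in an already dense pattern.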


The most naive answer is "use a regex", but more information is needed to give a better one.
