Complex String Processing - well complex to me

2022-12-25 03:31 Asked by:

I am calling a web service, and all I get back is a giant blob of text that I have to process myself. The problem is that not all lines have the same format. Each line has 2 or 3 sections, and the lines are similar to one another. Here are the most common examples:

text1 [text2] /text3/
text1/text3
text1[text2]/text3
text1 [text2] /text /3 here/

I am not exactly sure how to approach this problem. I am not very good at advanced string manipulation.

I was thinking a regular expression might work, but I'm not sure about that either. If I can break each line into its 3 sections, the rest is easy from there. It's just that there doesn't seem to be any uniformity to the 3 main sections that I know how to work with.

EDIT: Thanks for pointing out that I didn't actually say what I wanted to do.

Basically, I want to split these 3 sections of text into separate strings, going from one single string to an array of 3 strings:

string[0] = text1
string[1] = text2
string[2] = text3

Here is some of the text I get back from a call, as an example:

スルホ基 [スルホき] /(n) sulfo group/
鋭いナイフ [するどいナイフ] /(n) sharp knife/
鋭い批判 [するどいひはん] /(n) sharp criticism/
スルナーイ /(n) (See ズルナ) (obsc) surnay (Anatolian woodwind instrument) (per:)/zurna/
スルピリン /(n) sulpyrine/
スルファミン /(n) sulfamine/
剃る [そる(P);する] /(v5r,vt) to shave/(P)/

Taking the first line as an example, I want to pull it out into an array:

string[0] = スルホ基
string[1] = [スルホき]
string[2] = /(n) sulfo group/
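Since the question does not name a language, here is a minimal Python sketch of that split using only plain string methods (no regex). It assumes, based only on the examples above, that the first "/" starts the third section, that the optional second section is wrapped in [...], and that neither text1 nor text2 contains "/". It drops the delimiters, as in the text1/text2/text3 array near the top of the question. The function name split_entry is hypothetical.

```python
def split_entry(line):
    """Split 'text1 [text2] /text3/' into [text1, text2, text3].

    Assumption: the first '/' begins the third section, and text1/text2
    never contain '/'. text2 is '' when the line has no [...] section.
    """
    head, slash, tail = line.partition("/")
    if not slash:
        raise ValueError(f"no '/' section in {line!r}")

    text2 = ""
    if "[" in head:
        # Everything before '[' is text1; everything up to ']' is text2.
        head, _, bracketed = head.partition("[")
        text2 = bracketed.partition("]")[0]

    # Drop the single '/' that closes the third section, if there is one.
    text3 = tail.strip()
    if text3.endswith("/"):
        text3 = text3[:-1]

    return [head.strip(), text2.strip(), text3.strip()]


for sample in ["text1 [text2] /text3/", "text1/text3",
               "text1[text2]/text3", "スルホ基 [スルホき] /(n) sulfo group/"]:
    print(split_entry(sample))
```

Only the first "/" is treated as a separator, so a line like "text1 [text2] /text /3 here/" keeps its inner slash in the third part.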

Those examples seem a bit random; there has to be some kind of order. Isn't there a spec for the service? If not, I suggest posting more examples so that we can understand the rules.

Read up on finite state machines, and see if you can apply some of their concepts to your input-parsing problem.
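To make the finite-state-machine idea concrete, here is a minimal Python sketch of a character-by-character FSM for the line shapes in the question. The four states (HEAD, READING, GAP, GLOSS) and the function name fsm_split are hypothetical names chosen for illustration, and the transitions assume the layout "text1 [text2] /text3/" with the bracketed part optional.

```python
from enum import Enum, auto


class State(Enum):
    HEAD = auto()     # collecting text1
    READING = auto()  # inside [...], collecting text2
    GAP = auto()      # after ']' but before the first '/'
    GLOSS = auto()    # after the first '/', collecting text3


def fsm_split(line):
    """Walk the line one character at a time, switching state on delimiters."""
    state = State.HEAD
    buffers = {State.HEAD: [], State.READING: [], State.GLOSS: []}

    for ch in line:
        if state is State.HEAD:
            if ch == "[":
                state = State.READING
            elif ch == "/":
                state = State.GLOSS
            else:
                buffers[State.HEAD].append(ch)
        elif state is State.READING:
            if ch == "]":
                state = State.GAP
            else:
                buffers[State.READING].append(ch)
        elif state is State.GAP:
            if ch == "/":
                state = State.GLOSS
            elif not ch.isspace():
                # Anything other than whitespace here means an unexpected format.
                raise ValueError(f"unexpected {ch!r} after ']' in {line!r}")
        else:
            # State.GLOSS: every remaining character belongs to text3.
            buffers[State.GLOSS].append(ch)

    if state is not State.GLOSS:
        raise ValueError(f"line ended in state {state.name}: {line!r}")

    text3 = "".join(buffers[State.GLOSS]).strip()
    if text3.endswith("/"):
        text3 = text3[:-1]  # drop the '/' that closes the third section
    return ["".join(buffers[State.HEAD]).strip(),
            "".join(buffers[State.READING]).strip(),
            text3.strip()]
```

Unlike a permissive regex, this machine raises an error as soon as a line deviates from the expected shape (for example, stray text between "]" and "/"), which makes exceptions in the data easy to spot.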

If there is some order to the groups on each line, then maybe you can use a regex to separate the groups out.

Edit: after seeing your samples, you may get by with a regex that breaks on some of those specific delimiters. Testing the theory will take maybe half an hour: pick up a free regex tester, write a regex that isolates just one of those groups, and pump a few sample lines through it. If it performs reliably on the real data you have, expand it and see if you can also isolate the other groups.
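Following that advice, a minimal Python sketch of the two steps: first a regex that isolates only one group (the optional [...] reading) and is run over a few of the sample lines from the question, then an expanded pattern that captures all three groups. The patterns and the names READING_RE, ENTRY_RE and regex_split are assumptions built only from the posted samples, not from any spec for the service.

```python
import re

SAMPLES = [
    "スルホ基 [スルホき] /(n) sulfo group/",
    "スルピリン /(n) sulpyrine/",
    "剃る [そる(P);する] /(v5r,vt) to shave/(P)/",
]

# Step 1: isolate just one group, the optional [reading].
READING_RE = re.compile(r"\[([^\]]*)\]")
for line in SAMPLES:
    m = READING_RE.search(line)
    print(m.group(1) if m else None)

# Step 2: expand the pattern to capture all three groups.
ENTRY_RE = re.compile(
    r"""^\s*
    (?P<text1>[^\[/]+?)              # text1: up to the first '[' or '/'
    \s*(?:\[(?P<text2>[^\]]*)\])?    # optional [text2]
    \s*/(?P<text3>.*?)/?\s*$         # text3: after the first '/', minus a closing '/'
    """,
    re.VERBOSE,
)


def regex_split(line):
    """Return [text1, text2, text3], or None if the line doesn't fit the pattern."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    return [m["text1"], m["text2"] or "", m["text3"]]


for line in SAMPLES:
    print(regex_split(line))
```

Returning None instead of raising lets you run the pattern over a whole response and collect the lines it rejects.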

I should mention, though, that your regexes will break, or just become a nightmare, if there are any irregularities in your data (and there frequently are). So test long and hard before settling on them. If you start finding exceptions in your data, you will need to choose some sort of parsing algorithm (the FSM I mentioned above is a pattern you can follow if you implement a parsing mechanism).
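One way to test long and hard is to scan the entire blob for lines that break the assumed shape before trusting any regex. The following is a minimal heuristic sketch in Python; the checks, their messages, and the name find_irregular_lines are illustrative assumptions derived from the samples in the question, so extend them as you discover real exceptions.

```python
def find_irregular_lines(blob):
    """Return (line_number, reason, line) for lines that break the assumed shape."""
    problems = []
    for number, line in enumerate(blob.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if stripped.count("[") != stripped.count("]"):
            problems.append((number, "unbalanced brackets", line))
        elif stripped.count("[") > 1:
            problems.append((number, "more than one [...] section", line))
        elif "/" not in stripped:
            problems.append((number, "no /.../ section", line))
        elif not stripped.endswith("/"):
            problems.append((number, "does not end with '/'", line))
        elif "[" in stripped and stripped.index("[") > stripped.index("/"):
            problems.append((number, "[...] appears after the first '/'", line))
    return problems


blob = """スルホ基 [スルホき] /(n) sulfo group/
スルピリン /(n) sulpyrine/
broken [entry /(n) missing bracket/
no slash here"""

for number, reason, line in find_irregular_lines(blob):
    print(f"line {number}: {reason}: {line}")
```

Each check only flags a line for manual review; a flagged line is exactly the kind of exception that suggests moving from a regex to a real parser.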

The dumbest answer is "Use regex", but more information is needed for a better one.
