How do I parse this text in C#?
abc = tamaz feeo maa roo key gaera porla Xyz = gippaza eka jaguar ammaz te sanna.
I want to put each entry into a struct:
public struct word
{
public string Word;
public string Definition;
}
How can I parse the entries and build a List<word> in C#?
Thanks for the help, but the input is free text and an entry is not guaranteed to fit on a single line, so how should I handle newlines?
Read the input line by line and split by the equal sign.
class Entry
{
private string term;
private string definition;
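// The constructor must be public: without a modifier it is private and cannot be called from outside the class
public Entry(string term, string definition)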
{
this.term = term;
this.definition = definition;
}
}
// ...
string[] data = line.Split('=');
string word = data[0].Trim();
string definition = data[1].Trim();
Entry entry = new Entry(word, definition);
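To address the question about newlines, here is a minimal sketch of the line-by-line approach. It assumes (this is a guess about the input format, not something the question states) that a line without an equals sign continues the previous definition. The Term and Definition properties, the Append method, and the EntryParser class are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class Entry
{
    public string Term { get; }
    public string Definition { get; private set; }

    public Entry(string term, string definition)
    {
        Term = term;
        Definition = definition;
    }

    // Appends a continuation line to the definition, separated by a space.
    public void Append(string text)
    {
        Definition = Definition.Length == 0 ? text : Definition + " " + text;
    }
}

public static class EntryParser
{
    public static List<Entry> Parse(TextReader reader)
    {
        var entries = new List<Entry>();
        while (true)
        {
            var line = reader.ReadLine();
            if (line == null) break; // end of input

            int eq = line.IndexOf('=');
            if (eq >= 0)
            {
                // "term = definition": split at the first '=' only.
                entries.Add(new Entry(line.Substring(0, eq).Trim(),
                                      line.Substring(eq + 1).Trim()));
            }
            else if (entries.Count > 0 && line.Trim().Length > 0)
            {
                // Assumption: a line without '=' continues the previous definition.
                entries[entries.Count - 1].Append(line.Trim());
            }
        }
        return entries;
    }
}
```

For a file you could pass File.OpenText(path); for text already in memory, a StringReader works. Note the limitation: a continuation line that itself contains an equals sign would be read as a new entry.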
This can also be done using a very simple LINQ query:
var definitions =
from line in File.ReadAllLines(file)
let parts = line.Split('=')
select new word
{
Word = parts[0].Trim(),
Definition = parts[1].Trim()
};
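The query above throws an IndexOutOfRangeException on any line that has no equals sign, because parts[1] does not exist. A hedged variant that skips such lines and keeps extra equals signs inside the definition might look like this. WordEntry and WordQuery are hypothetical names; WordEntry stands in for the word struct from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct WordEntry
{
    public string Word;
    public string Definition;
}

public static class WordQuery
{
    public static List<WordEntry> Parse(IEnumerable<string> lines)
    {
        var query =
            from line in lines
            where line.Contains("=")                 // skip lines without a separator
            let parts = line.Split(new[] { '=' }, 2) // split at the first '=' only
            select new WordEntry
            {
                Word = parts[0].Trim(),
                Definition = parts[1].Trim()
            };
        return query.ToList();
    }
}
```

Calling WordQuery.Parse(File.ReadAllLines(file)) would give the same result as the original query for well-formed input.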
Using a regular expression, you can proceed in two ways, depending on your source input.
Example 1
Assuming you have read your source and stored each line in an array or list:
string[] input = { "abc = tamaz feeo maa roo key gaera porla", "Xyz = gippaza eka jaguar ammaz te sanna." };
Regex mySplit = new Regex("(\\w+)\\s*=\\s*((\\w+).*)");
List<word> mylist = new List<word>();
foreach (string wordDef in input)
{
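Match myMatch = mySplit.Match(wordDef);
// Skip lines that are not of the form "word = definition"; Captures[0] would throw on a failed match
if (!myMatch.Success) continue;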
word myWord;
myWord.Word = myMatch.Groups[1].Captures[0].Value;
myWord.Definition = myMatch.Groups[2].Captures[0].Value;
mylist.Add(myWord);
}
Example 2
Assuming you have read your source into a single string (with each line terminated by the line break character '\n'), you can use the same regexp "(\w+)\s*=\s*((\w+).*)", but with Regex.Matches:
string inputs = "abc = tamaz feeo maa roo, key gaera porla\r\nXyz = gippaza eka jaguar; ammaz: te sanna.";
MatchCollection myMatches = mySplit.Matches(inputs);
foreach (Match singleMatch in myMatches)
{
word myWord;
myWord.Word = singleMatch.Groups[1].Captures[0].Value;
// '.' stops at '\n' but still matches '\r', so trim the trailing carriage return
myWord.Definition = singleMatch.Groups[2].Captures[0].Value.Trim();
mylist.Add(myWord);
}
Lines that match or do not match the regexp "(\w+)\s*=\s*((\w+).*)":
- "abc = tamaz feeo maa roo key gaera porla,qsdsdsqdqsd\n" --> Match!
- "Xyz= gippaza eka jaguar ammaz te sanna. sdq=sqds \n" --> Match! The description may contain spaces (and further '=' signs) too.
- "qsdqsd=\nsdsdsd\n" --> Match! A pair split across two lines matches too.
- "sdqsd=\n" --> Does NOT match (description missing)
- "= sdq sqdqsd.\n" --> Does NOT match (word missing)
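The cases listed above can be checked with a small helper. This is only a sketch: RegexCheck and TryParse are hypothetical names, and the pattern is the one from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCheck
{
    // Same pattern as in the answer: word, '=', then a definition that starts with a word character.
    public static readonly Regex Pattern = new Regex(@"(\w+)\s*=\s*((\w+).*)");

    // Returns true and fills word/definition when the input contains a "word = definition" pair.
    public static bool TryParse(string input, out string word, out string definition)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            word = string.Empty;
            definition = string.Empty;
            return false;
        }
        word = m.Groups[1].Value;
        definition = m.Groups[2].Value.Trim(); // drop trailing spaces or '\r'
        return true;
    }
}
```

For example, TryParse("qsdqsd=\nsdsdsd\n", out w, out d) succeeds because \s* also consumes the line break, while TryParse("sdqsd=\n", out w, out d) fails because nothing follows the equals sign.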
// Split at an = sign. Take at most two parts (word and definition);
// ignore any = signs in the definition
string[] parts = line.Split(new[] { '=' }, 2);
word w = new word();
w.Word = parts[0].Trim();
// If the definition is missing then parts.Length == 1
if (parts.Length == 1)
w.Definition = string.Empty;
else
w.Definition = parts[1].Trim();
words.Add(w);
Use Regular Expressions