
How do I locate a particular word in a text file using .NET?

I am sending emails from ASP.NET (C#), using a template stored in a text file (.txt) like the one below:

User Name :<User Name>

Address : <Address>.

I have been replacing the words within the angle brackets in the text file using the code below:

StreamReader sr;
sr = File.OpenText(HttpContext.Current.Server.MapPath(txt));

copy = sr.ReadToEnd();

sr.Close(); //close the reader

copy = copy.Replace(word.ToUpper(),"#" + word.ToUpper()); //prefix the upper-cased word with '#'


//save new copy into existing text file

FileInfo newText = new FileInfo(HttpContext.Current.Server.MapPath(txt));

StreamWriter newCopy = newText.CreateText();
newCopy.WriteLine(copy);
newCopy.Write(newCopy.NewLine);
newCopy.Close();

Now I have a new problem:

users will be adding new words within angle brackets, for example <Salary>.

In that case I have to read the file and find the word <Salary>.

In other words, I have to find all the words that are enclosed in angle brackets (<>).

How do I do that?


Having a stream for your file, you can build something similar to a typical tokenizer.

In general terms, this works as a finite state machine: you need an enumeration for the states (in this case it could be simplified down to a boolean, but I'll give you the general approach so you can reuse it on similar tasks) and a function implementing the logic. C#'s iterators are a good fit for this problem, so I'll be using them in the snippet below. Your function will take a TextReader as an argument, will use an enumerated value and a character buffer internally, and will yield the strings one by one. You'll need this near the start of your code file:

using System.Collections.Generic;
using System.IO;
using System.Text;

And then, inside your class, something like this:

enum States {
    OUT,
    IN,
}
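IEnumerable<string> GetStrings(TextReader reader) {
    States state=States.OUT;
    // Initialized up front: the compiler cannot prove that the IN case
    // only runs after the OUT case has assigned the buffer.
    StringBuilder buffer=new StringBuilder();
    int ch;
    while((ch=reader.Read())>=0) {
        switch(state) {
            case States.OUT:
                if(ch=='<') {
                    // Opening bracket: start collecting a new word.
                    state=States.IN;
                    buffer=new StringBuilder();
                }
                break;
            case States.IN:
                if(ch=='>') {
                    // Closing bracket: emit the collected word.
                    state=States.OUT;
                    yield return buffer.ToString();
                } else {
                    // Read() returns a UTF-16 code unit, so a plain cast is enough
                    // (Char.ConvertFromUtf32 would throw on surrogate values).
                    buffer.Append((char)ch);
                }
                break;
        }
    }
}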

The finite-state machine model always has the same layout: while(READ_INPUT) { switch(STATE) {...}}: inside each case of the switch, you may be producing output and/or altering the state. Beyond that, the algorithm is defined in terms of states and state changes: for any given state and input combination, there is an exact new state and output combination (the output can be "nothing" on those states that trigger no output; and the state may be the same old state if no state change is triggered).

Hope this helps.

EDIT: forgot to mention a couple of things:

1) You get a TextReader to pass to the function by creating a StreamReader for a file, or a StringReader if you already have the file contents in a string.

2) The time cost of this approach is O(n), with n being the length of the file. Because the input is read one character at a time, the memory cost is bounded by the longest bracketed word (O(n) only in the worst case). That seems quite reasonable for this kind of task.
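To make point 1 concrete, here is a minimal sketch of both ways of feeding the scanner. The names PlaceholderScanner, FromText and FromFile are made up for this illustration, and the state enum is collapsed to a boolean as mentioned earlier; treat it as one possible arrangement rather than the only way to wire it up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper class; the names are illustrative only.
public static class PlaceholderScanner
{
    // Yields the text found between each '<' and the next '>'.
    public static IEnumerable<string> GetStrings(TextReader reader)
    {
        bool inside = false;                       // the two states reduced to a boolean
        StringBuilder buffer = new StringBuilder();
        int ch;
        while ((ch = reader.Read()) >= 0)
        {
            if (!inside)
            {
                if (ch == '<') { inside = true; buffer.Clear(); }
            }
            else if (ch == '>')
            {
                inside = false;
                yield return buffer.ToString();
            }
            else
            {
                buffer.Append((char)ch);
            }
        }
    }

    // Text already in memory: wrap it in a StringReader.
    public static List<string> FromText(string text)
    {
        using (var reader = new StringReader(text))
        {
            // Copy into a list before the reader is disposed,
            // because the iterator is evaluated lazily.
            return new List<string>(GetStrings(reader));
        }
    }

    // Text in a file: 'path' is assumed to be an already-mapped physical path.
    public static List<string> FromFile(string path)
    {
        using (var reader = new StreamReader(path))
        {
            return new List<string>(GetStrings(reader));
        }
    }
}
```

Calling PlaceholderScanner.FromText on the template from the question should give the two words User Name and Address. Note that the results are copied into a list inside the using block; returning the lazy sequence directly would let the reader be disposed before anything is read.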


Using regex.

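// Needs: using System.Collections.Generic; using System.Text.RegularExpressions;
// The lazy .*? stops at the first '>' after each '<'.
var matches = Regex.Matches(text, "<(.*?)>");
List<string> words = new List<string>();

foreach (Match match in matches)
{
    // Group 1 is the text between the brackets.
    words.Add(match.Groups[1].Value);
}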

Of course, this assumes you already have the file's text in a variable. Since you have to read the entire file to achieve that, you could look for the words as you are reading the stream, but I don't know what the performance trade off would be.
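If you would rather not hold the whole file in a string, one option is to apply the same pattern line by line while reading. The sketch below assumes a bracketed word never spans a line break; PlaceholderRegex and FindWords are names invented for this example, and I have not measured how it compares with reading everything at once.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class PlaceholderRegex
{
    // Same pattern as above: lazily capture everything between '<' and '>'.
    private static readonly Regex Pattern = new Regex("<(.*?)>");

    // Reads the input one line at a time and collects the bracketed words.
    // Assumption: a word in brackets never contains a line break.
    public static List<string> FindWords(TextReader reader)
    {
        var words = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (Match match in Pattern.Matches(line))
            {
                words.Add(match.Groups[1].Value);
            }
        }
        return words;
    }
}
```

You can pass it a StreamReader opened on the template file, or a StringReader for text you already have.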


This is not an answer, but comments can't hold formatted code:

You should place some of your objects into using blocks. Something like this:

using(StreamReader sr = File.OpenText(HttpContext.Current.Server.MapPath(txt)))
{
    copy = sr.ReadToEnd();
} // reader is closed by the end of the using block

//prefix the upper-cased word with '#'
copy = copy.Replace(word.ToUpper(), "#" + word.ToUpper());

//save new copy into existing text file 

FileInfo newText = new FileInfo(HttpContext.Current.Server.MapPath(txt));

using(var newCopy = newText.CreateText())
{
    newCopy.WriteLine(copy);
    newCopy.Write(newCopy.NewLine);
}

The using block ensures that resources are cleaned up even if an exception is thrown.
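If you want to see that behaviour for yourself, here is a small sketch. TrackedResource and UsingDemo are made-up names for this demonstration; the class only records whether Dispose was called and stands in for a real reader or writer.

```csharp
using System;

// Hypothetical stand-in for a reader or writer; it only records disposal.
public sealed class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        Disposed = true;
    }
}

public static class UsingDemo
{
    // Returns true if the resource was disposed even though the body threw.
    public static bool DisposedAfterException()
    {
        var resource = new TrackedResource();
        try
        {
            using (resource)
            {
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // Swallowed only so the demo can inspect the resource afterwards.
        }
        return resource.Disposed;
    }
}
```

The compiler expands a using block into a try/finally that calls Dispose, which is why the cleanup runs on both the normal and the exceptional path.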
