How to modify/update a set of html files with a standard header and footer

I have a set of HTML files that I want to modify by replacing the header and footer. The contents of each file are different, and I would like to use a regular expression (or something similar, if a regex can't handle multi-line matches).

As an example, one modification I want to make is to replace everything between <html> and </head> with a standard header.

Can this be done with a regular expression? What method would you use to perform a bulk search and replace like this in C#?

Can you provide an example of a regular expression that matches multiple lines?


Well, the simple answer is yes.

Regex can indeed help you, but you need a tool that copes with multiple files. I can't recommend one at the moment; try Googling "multiple file search and replace". Regex can cope with both multi-line and single-line matching.
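To illustrate the multi-line point, here is a minimal sketch in C#. The sample HTML and the class and method names are made up for the example. In .NET, the option that lets a match span lines is RegexOptions.Singleline, which makes '.' match newlines as well; RegexOptions.Multiline only changes where ^ and $ match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineDemo
{
    // Hypothetical sample input: the header spans several lines.
    public const string Sample =
        "<html>\n<head>\n<title>Old</title>\n</head>\n<body>text</body>\n</html>";

    // Everything from the opening <html> tag to the closing </head> tag.
    public const string Pattern = "<html[^>]*>.*?</head>";

    public static bool MatchesWith(RegexOptions options)
    {
        return Regex.IsMatch(Sample, Pattern, options);
    }

    public static void Main()
    {
        // Without Singleline, '.' stops at '\n', so the pattern cannot span lines.
        Console.WriteLine(MatchesWith(RegexOptions.None));       // False
        // With Singleline, '.' also matches '\n'.
        Console.WriteLine(MatchesWith(RegexOptions.Singleline)); // True
    }
}
```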

I use Notepad++, which can more or less do what you want: it can search and replace across multiple files (either the open ones or those within a directory tree). That's not its primary aim, but it works.

The hard part is defining your "match": wherever there are details you need to preserve, make sure the pattern has an appropriate capture group that you can refer to in your "replace" expression.
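As a rough sketch of that idea, assuming the one detail worth preserving is each page's title: the pattern below captures the title text in group 1, and the replacement puts it back inside a standard header via $1. The header contents and the names HeaderReplaceDemo and ReplaceHeader are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderReplaceDemo
{
    // Group 1 captures the existing <title> text so it survives the replacement.
    const string Pattern = @"<html[^>]*>.*?<title>(.*?)</title>.*?</head>";

    // Hypothetical standard header; $1 is replaced by whatever group 1 captured.
    const string Replacement =
        "<html>\n<head>\n<meta charset=\"utf-8\">\n<title>$1</title>\n</head>";

    public static string ReplaceHeader(string html)
    {
        return Regex.Replace(html, Pattern, Replacement,
                             RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string page = "<html lang=\"en\">\n<head>\n<title>My page</title>\n" +
                      "<link rel=\"stylesheet\" href=\"old.css\">\n</head>\n" +
                      "<body>Hi</body>\n</html>";
        Console.WriteLine(ReplaceHeader(page));
    }
}
```

A file with no title element at all will not match this pattern and comes back unchanged, so it is worth checking which files were actually modified.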

So, again, yes it can help, but your question is very high level.

For the C# part, it's simple once you have your regex defined.

using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Comment out everything between the opening HTML tag
        // and the end of the HEAD tag.
        string matchRegex = "<html[^>]*>(.*?)</head>";
        // In a .NET replacement string, $1 refers to the first capture group.
        string replaceExpression = "<html> <!-- $1 </head> -->";

        string pattern = "*.html";

        // DirectoryInfo is not disposable, so no using block is needed here.
        DirectoryInfo di = new DirectoryInfo(".");
        foreach (FileInfo fi in di.GetFiles(pattern))
        {
            string content;
            using (StreamReader sr = fi.OpenText())
            {
                content = sr.ReadToEnd();
            }

            // Treat as single-line so that the match can span
            // several lines.
            string newContent = Regex.Replace(content,
                                              matchRegex,
                                              replaceExpression,
                                              RegexOptions.Singleline);

            // Overwrite the file in place (the reader is already closed).
            // Back up the originals before running this.
            File.WriteAllText(fi.FullName, newContent);
        }
    }
}

You may find this page useful: Finding Comments in source code. In it, someone works through writing a regular expression to match comments, then extends it to handle multi-line comments, and so on, so it shows the regex thought process. The replace part is easy: put a capture group in the pattern and reference it by number or name in the replacement string ($1 or ${name} in .NET).
