Read and remove invalid characters outside XML elements in C# LINQ to XML
I am facing a problem while reading XML in C# with LINQ to XML.
When I try to read an XML document using the following statement:
XDocument xdoc = XDocument.Load(path);
It throws the following exception:
Data at the root level is invalid. Line 1, position 1.
When I opened the XML file that I was trying to read, I found an invalid character before the XML declaration. Here is the declaration:
?<?xml version="1.0" encoding="utf-8"?>
I know the question mark at the start of the declaration shouldn't be there.
I have three questions:
1) How can I read this invalid XML in C# with LINQ to XML?
2) How can I remove this kind of invalid character anywhere in the XML in C#?
3) How can I prevent this kind of invalid character when creating the XML in C# with LINQ to XML?
xml sample: ?<?
hex equivalent : 3f 3c 3f
And here is the code that I am using to create it:
XDocument xdoc = new XDocument();
xdoc.Add(new XElement("TaskAlert"));
AddParentNodeInTaskAlertXml(ref xdoc, userId);
and so on...
I can't understand why it sometimes adds this kind of character.
Here is some code that I am using to create or load the file:
public static void CreateUpdateTaskAlertXmlFile(int userId)
{
    try
    {
        string path = string.Format("{0}\\{1}\\{2}", Application.StartupPath, "Configuration",
                                    "TaskAlert.xml");
        if (userId.Equals(0))
            userId = Utility.Application.CurrentUser.UserId;
        XDocument xdoc;
        LoadTaskAlertXml(out xdoc, path, userId);
        xdoc.Save(path);
    }
    catch (Exception exception)
    {
        MSLib.HandleException(exception);
    }
}

public static void LoadTaskAlertXml(out XDocument xdoc, string path, int userId)
{
    xdoc = null;
    TaskCollection tasks = TaskEntity.GetOverDueTasks(userId);
    if (!File.Exists(path))
    {
        CreateTaskAlertXml(userId.ToString(), ref xdoc);
        AddOverDueTasksInTaskAlertXml(xdoc, userId.ToString(), tasks, false);
    }
    else
    {
        xdoc = XDocument.Load(path);
        XElement userElement =
            xdoc.Descendants("User").Where(x => x.Attribute("Id").Value.Equals(userId.ToString())).
                SingleOrDefault();
        if (userElement == null)
        {
            AddParentNodeInTaskAlertXml(ref xdoc, userId.ToString());
            AddOverDueTasksInTaskAlertXml(xdoc, userId.ToString(), tasks, false);
        }
        else
            AddOverDueTasksInTaskAlertXml(xdoc, userId.ToString(), tasks, true);
    }
}
LINQ to XML wouldn't create an invalid file to start with, so question 3 is moot.
LINQ to XML is only designed to read valid XML. You should find out why you've ended up with invalid XML to start with, and fix the root cause. It's generally a bad idea to try to fix an already-invalid file, especially without understanding the root cause to start with - you never know what other problems might be lurking round the corner.
I suspect that the extra character was originally a byte order mark, but that it's been mangled by something else. If you can give us more information about how you've created the file in the first place, that would help a lot. LINQ to XML can read files which start with a valid BOM with no problems.
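To illustrate one way a byte order mark could end up as a single question mark, here is a minimal sketch. It assumes the file was at some point read as text without stripping the BOM (leaving U+FEFF as the first character) and then written out in an encoding that can't represent that character, such as ASCII. This is only one possible mechanism, not a diagnosis of your file; the class and method names are made up for the example.

```csharp
using System;
using System.Text;

static class BomMangleDemo
{
    // Encodes the text as ASCII and returns the bytes as space-separated hex.
    // ASCII cannot represent U+FEFF, so the encoder substitutes '?' (0x3F).
    public static string ToHex(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    static void Main()
    {
        // U+FEFF is the character a BOM decodes to when it is not stripped on read.
        string text = "\uFEFF<?";
        Console.WriteLine(ToHex(text)); // prints "3F 3C 3F"
    }
}
```

The output matches the bytes quoted in the question, which is why a mangled BOM seems a plausible culprit.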
I suggest you look at the file in a binary editor and edit your question with exactly the bytes at the start of the file. A valid UTF-8 BOM would be 0xEF, 0xBB, 0xBF.
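If you don't have a binary editor to hand, something along these lines would show the leading bytes and check them against the UTF-8 BOM. It's a rough sketch: the class name, helper names and the default file name are assumptions for the example.

```csharp
using System;
using System.IO;

static class BomCheck
{
    static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Returns true if the data begins with the three UTF-8 BOM bytes.
    public static bool StartsWithUtf8Bom(byte[] data)
    {
        if (data.Length < Utf8Bom.Length) return false;
        for (int i = 0; i < Utf8Bom.Length; i++)
        {
            if (data[i] != Utf8Bom[i]) return false;
        }
        return true;
    }

    // Reads up to 'count' bytes from the start of the file.
    public static byte[] ReadHead(string path, int count)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] buffer = new byte[count];
            int read = stream.Read(buffer, 0, count);
            Array.Resize(ref buffer, read); // the file may be shorter than 'count'
            return buffer;
        }
    }

    static void Main(string[] args)
    {
        // Hypothetical default path; pass the real one as the first argument.
        string path = args.Length > 0 ? args[0] : "TaskAlert.xml";
        byte[] head = ReadHead(path, 8);
        Console.WriteLine(BitConverter.ToString(head));
        Console.WriteLine(StartsWithUtf8Bom(head) ? "Starts with a UTF-8 BOM" : "No UTF-8 BOM");
    }
}
```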
EDIT: It sounds like the bug is in the way you're creating the file. For example, this should be absolutely fine:
using System.Xml.Linq;

class Test
{
    static void Main()
    {
        XDocument doc = new XDocument();
        doc.Add(new XElement("Test"));
        doc.Save("test.xml");
    }
}
That creates a file with a valid byte order mark. Please show an equivalent program which doesn't, or investigate exactly what you're doing with the file, e.g. copying via FTP.
As an aside, do you really need to use ref in your call to AddParentNodeInTaskAlertXml? It seems unlikely to me. See my parameter passing article if you're not quite sure what ref really means.
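As a quick illustration of why ref is probably unnecessary: XDocument is a reference type, so a method can add nodes to the caller's document through an ordinary parameter. The AddUserNode method below is a hypothetical stand-in, since the body of AddParentNodeInTaskAlertXml isn't shown, and the "User"/"Id" names are just borrowed from the query in the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class RefDemo
{
    // No ref: the parameter is a copy of the reference, but it still points at
    // the caller's XDocument, so changes to the document are visible to the caller.
    public static void AddUserNode(XDocument doc, string userId)
    {
        doc.Root.Add(new XElement("User", new XAttribute("Id", userId)));
    }

    static void Main()
    {
        XDocument doc = new XDocument(new XElement("TaskAlert"));
        AddUserNode(doc, "42");
        Console.WriteLine(doc.Root.Elements("User").Count()); // prints 1
    }
}
```

You'd only need ref if the method had to make the caller's variable refer to a different XDocument object altogether.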